What is Pig?
Apache Pig is a platform for Apache Hadoop used to simplify MapReduce programming—the data processing module in Hadoop. Some applications of Pig include building data pipelines, building behavior prediction models, exploring raw data and building iterative processing models
Data flow language
Pig is a data flow language. Data flow languages allow you to define how data should be read, processed and stored. You can build workflows (like you do for data pipelines) for this purpose.
How does Pig work?
The language used to write scripts in Pig is called Pig Latin, which is a simple SQL-like language.
With Pig, you can write complex data manipulations without knowing Java.
200 lines of Java code (which takes hours to work) can be written in only 10 lines using the Pig Latin language (barely 15 minutes to write and execute)! 😱
Yahoo! originally developed Pig to make processing large data sets easier for its researchers.