What is Pig?

Apache Pig is a platform for Apache Hadoop used to simplify MapReduce programming—the data processing module in Hadoop. Some applications of Pig include building data pipelines, building behavior prediction models, exploring raw data and building iterative processing models

Data flow language

Pig is a data flow language. Data flow languages allow you to define how data should be read, processed and stored. You can build workflows (like you do for data pipelines) for this purpose.

How does Pig work?

Pig can work with data from various sources in a wide variety of formats—structured and unstructured.

The language used to write scripts in Pig is called Pig Latin, which is a simple SQL-like language.

With Pig, you can write complex data manipulations without knowing Java.

200 lines of Java code (which takes hours to work) can be written in only 10 lines using the Pig Latin language (barely 15 minutes to write and execute)! 😱

Pig translates your Pig Latin scripts into a series of MapReduce jobs for processing large volumes of data using Hadoop and stores the results into the HDFS.

Yahoo! originally developed Pig to make processing large data sets easier for its researchers.

Articles you might be interested in

  1. Beginner's guide to Apache Pig
  2. Introduction to Apache Pig
  3. Programming Pig