What is Hive?
Apache Hive is data warehouse software built on top of Hadoop for analyzing data stored in Hadoop clusters. Initially developed by Facebook, Hive is written in Java.
How does Hive work?
Hive provides a data query interface to Apache Hadoop. This means you can read, write and manage data by writing queries in Hive.
To write queries, Apache Hive offers a SQL-like language called HiveQL.
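To give a feel for what "SQL-like" means here, the sketch below runs a few statements through Python's built-in sqlite3 module. SQLite is only a stand-in engine, not Hive, and the table name page_views and its columns are made up for illustration. The statements use basic syntax that recent Hive versions should also accept, but they were not run against a Hive cluster.

```python
import sqlite3

# sqlite3 is only a stand-in engine for this sketch; it is not Hive.
conn = sqlite3.connect(":memory:")

# Manage: define a table (hypothetical name and columns).
conn.execute("CREATE TABLE page_views (user_id STRING, url STRING)")

# Write: load a few sample rows.
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("alice", "/home"), ("bob", "/home"), ("alice", "/docs")],
)

# Read: count views per URL, most viewed first.
QUERY = """
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url
ORDER BY views DESC
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [('/home', 2), ('/docs', 1)]
```

On a real cluster, the same kind of statements would typically be submitted through a Hive client rather than a local database connection.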
Hive makes MapReduce (the data processing module of Hadoop) programming easier because you don't have to write long Java programs yourself.
Instead, you write queries in HiveQL (easy if you're already familiar with SQL) and Hive generates the map functions (which read and transform the input data) and the reduce functions (which aggregate the results) for you.
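As a rough illustration of the work a query engine takes off your hands, here is a minimal pure-Python sketch of the map and reduce steps behind a "count rows per URL" query. It is a conceptual model only: it is not the code Hive generates, and the function names and sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical input rows: (user_id, url)
ROWS = [("alice", "/home"), ("bob", "/home"), ("alice", "/docs")]


def map_fn(row):
    """Map: emit a (key, value) pair for each input row."""
    _user_id, url = row
    yield url, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_fn(key, values):
    """Reduce: combine all values that share a key."""
    return key, sum(values)


def run_job(rows):
    """Run map, shuffle, and reduce in sequence on a small in-memory input."""
    pairs = (pair for row in rows for pair in map_fn(row))
    return dict(reduce_fn(k, v) for k, v in shuffle(pairs).items())


counts = run_job(ROWS)
print(counts)  # {'/home': 2, '/docs': 1}
```

With a query language, all of this collapses into a single GROUP BY clause, which is the convenience the paragraph above describes.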
How is Hive different from Pig?
Hive is used for querying, so it helps you describe the question you want answered. But it doesn't let you control how that question is answered.
For that, you need a data flow language (like Pig Latin), in which you define step by step how the data should be processed.
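The data flow style can be imitated in plain Python, where each intermediate result gets a name and the next step consumes it. This is only an analogy, not Pig itself: the sample records are hypothetical, and the Pig Latin in the comments is a rough, untested equivalent.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input records: (user_id, url)
records = [("alice", "/home"), ("bob", "/home"), ("alice", "/docs")]

# Step 1, load the data.
# Rough Pig Latin: views = LOAD 'page_views' AS (user_id, url);
views = list(records)

# Step 2, group by URL. itertools.groupby needs sorted input.
# Rough Pig Latin: grouped = GROUP views BY url;
ordered = sorted(views, key=itemgetter(1))
grouped = [(url, list(rows)) for url, rows in groupby(ordered, key=itemgetter(1))]

# Step 3, count the rows in each group.
# Rough Pig Latin: counts = FOREACH grouped GENERATE group, COUNT(views);
counts = [(url, len(rows)) for url, rows in grouped]

print(counts)  # [('/docs', 1), ('/home', 2)]
```

Each step is written out explicitly, so you decide the order of operations, whereas a single declarative query leaves that choice to the engine.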
Here are some other differences between Apache Hive and Apache Pig.
|Apache Hive|Apache Pig|
|---|---|
|Uses a query language (HiveQL)|Uses a data flow language (Pig Latin)|
|Works mainly with structured data|Can handle structured and unstructured data|
|Mainly used for data querying and reporting|Mainly used for building data pipelines and exploring raw data|
|Mainly used by analysts|Mainly used by researchers and engineers|
|Originally developed by Facebook|Originally developed by Yahoo!|