What is Apache Impala?
Impala's MPP architecture allows fast computations against data stored in a Hadoop cluster using SQL. It combines the experience of a tradition analytics database with the scalability of Apache Hadoop. (It means you can analyze lots of data quickly, using our friendly neighborhood SQL! 😍🚀)
What does Impala add to the Apache Hadoop ecosystem?
Impala can communicate directly with storage engines using Apache HBase and HDFS (Hadoop's file system). It uses the same file formats (like Parquet, Avro, RCFile) that are used by Hadoop. As a result, Impala avoids having to transport data into specialized systems/formats for analytics and seamlessly integrates with the Hadoop cluster.
Impala also uses the same metadata engines, security and resource management frameworks used by MapReduce, Apache Hive, Apache Pig and other Hadoop software. All data is immediately query-able, with no delays for ETL.
- Apache HBase
- Apache Hadoop