The Atlan Data Wiki

Hadoop

Hadoop is an open-source software ecosystem for big data processing and storage.

Apache ZooKeeper

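Apache ZooKeeper is an open-source, distributed coordination service that provides configuration management, naming, synchronization and group services for distributed applications. It helps you read, write and observe updates to small pieces of data shared across a distributed system.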

  • Ayswarrya G
Updated 3 years ago
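
To make the "read, write and observe updates" idea from the ZooKeeper summary concrete, here is a minimal in-memory sketch in Python. It is not the ZooKeeper client API: the class `ToyZNodeStore`, its methods and the `/config/db_url` path are all hypothetical names invented for illustration, and the one-shot behaviour of the callback is only loosely modelled on ZooKeeper's classic one-time watches.

```python
from typing import Callable, Dict, List

WatchCallback = Callable[[str, bytes], None]


class ToyZNodeStore:
    """In-memory stand-in for a coordination service (NOT the ZooKeeper API)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._watchers: Dict[str, List[WatchCallback]] = {}

    def write(self, path: str, value: bytes) -> None:
        """Store a value and notify anyone watching this path."""
        self._data[path] = value
        # Watches are one-shot here: remove them, then fire each one once.
        for callback in self._watchers.pop(path, []):
            callback(path, value)

    def read(self, path: str) -> bytes:
        """Return the current value stored at a path."""
        return self._data[path]

    def watch(self, path: str, callback: WatchCallback) -> None:
        """Register a callback for the next write to this path."""
        self._watchers.setdefault(path, []).append(callback)


store = ToyZNodeStore()
seen = []

store.write("/config/db_url", b"db-1:5432")
store.watch("/config/db_url", lambda path, value: seen.append((path, value)))
store.write("/config/db_url", b"db-2:5432")  # triggers the watch
store.write("/config/db_url", b"db-3:5432")  # watch already consumed
```

In this sketch a client that wants continuous updates would have to register the watch again after each notification. A real deployment would use an actual ZooKeeper client library against a running ensemble rather than a local dictionary.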

Apache Pig

Apache Pig is a high-level scripting platform used with Hadoop to simplify MapReduce programming. Its scripts are written in a language called Pig Latin.

  • Ayswarrya G
Updated 3 years ago
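
To show the kind of work that the Pig summary says is being simplified, the sketch below spells out the map, shuffle and reduce steps of a word count in plain Python. It runs on one machine and does not use Hadoop or Pig; the function names `map_phase`, `shuffle` and `reduce_phase` and the sample lines are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the grouped values to get one count per word."""
    return {key: sum(values) for key, values in groups.items()}


lines = ["big data big clusters", "data pipelines"]
counts = reduce_phase(shuffle(map_phase(lines)))
```

A higher-level scripting layer lets you describe the same load, group and count steps declaratively and leaves the generation of the underlying jobs to the platform.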

Apache Impala

Apache Impala is an open-source SQL query engine for processing large volumes of data stored in Hadoop clusters, whether that data lives in HDFS, HBase or an Amazon S3 bucket.

  • Ayswarrya G
Updated 3 years ago

Apache Hive

Apache Hive is data warehouse software built on top of Hadoop for analyzing data stored in Hadoop clusters. Initially developed by Facebook, Hive is written in Java.

  • Ayswarrya G
Updated 3 years ago

HBase

Apache HBase (Hadoop Database) is a NoSQL database that runs on top of HDFS (the Hadoop Distributed File System). It is natively integrated with Hadoop.

  • Ayswarrya G
Updated 3 years ago
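
As a rough illustration of what "NoSQL" means in the HBase summary, the Python sketch below models a wide-column table: each row key maps to cells addressed as `family:qualifier`, and each cell keeps timestamped versions. This is a toy dictionary, not the HBase client API; `ToyWideColumnTable`, its methods and the sample row keys are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Cell = Tuple[str, str]        # (column family, qualifier)
Version = Tuple[int, str]     # (timestamp, value)


class ToyWideColumnTable:
    """In-memory model of a wide-column table (NOT the HBase API)."""

    def __init__(self, families: Iterable[str]) -> None:
        # Column families are fixed when the table is created.
        self.families = set(families)
        self.rows: Dict[str, Dict[Cell, List[Version]]] = {}

    def put(self, row_key: str, column: str, value: str, ts: int) -> None:
        """Add a new version of a cell; qualifiers can be created freely."""
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        versions = self.rows.setdefault(row_key, {}).setdefault(
            (family, qualifier), []
        )
        versions.append((ts, value))
        versions.sort(key=lambda version: version[0], reverse=True)  # newest first

    def get(self, row_key: str, column: str) -> Optional[str]:
        """Return the newest value of a cell, or None if it does not exist."""
        family, qualifier = column.split(":", 1)
        versions = self.rows.get(row_key, {}).get((family, qualifier))
        return versions[0][1] if versions else None


table = ToyWideColumnTable(["info"])
table.put("user#1", "info:email", "a@example.com", ts=1)
table.put("user#1", "info:email", "b@example.com", ts=2)

latest = table.get("user#1", "info:email")
missing = table.get("user#2", "info:email")
```

Unlike a relational table, nothing here requires every row to have the same columns; only the column families are declared up front.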