What is Hadoop?
Apache Hadoop is an open-source software framework for storing and processing large data sets across clusters of computers. It is one of the most widely used frameworks for big data processing and storage.
What does Hadoop do?
Along with other tools and applications in its ecosystem, Hadoop helps collect, store, process, analyze and manage big data.
Hadoop processes massive amounts of data using distributed computing: a network of computers that don't share memory or disks. Because the data is spread across many machines and copied to more than one of them, rather than held in a single central repository, the failure of one machine does not normally cause data loss.
Doug Cutting, one of the original creators, named the framework 'Hadoop' after his son's beloved yellow toy elephant.🐘
What are the different modules of Hadoop?
Hadoop is made up of four main modules—HDFS, MapReduce, Hadoop Common and YARN. The primary components are HDFS and MapReduce.
1. HDFS (Hadoop Distributed File System)
In HDFS, files are broken down into blocks and distributed across a cluster, which is a system of linked computers. Each block is replicated on different machines (three copies by default) to avoid data loss due to a hardware malfunction.
The computers in the cluster are called nodes. A node can be a NameNode (which manages the file system metadata) or a DataNode (which stores the actual data).
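To make the block-and-replica idea concrete, here is a minimal sketch in plain Python. It does not use any Hadoop API. The function names, the node names (dn1 to dn4) and the round-robin placement are assumptions made for illustration; real HDFS placement also takes racks and free space into account.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default block size in recent Hadoop releases
REPLICATION = 3                 # default HDFS replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block a file of file_size would occupy."""
    full, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full
    if remainder:
        sizes.append(remainder)  # the last block may be smaller than the rest
    return sizes


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes, round-robin.

    Toy placement only (an assumption of this sketch, not the HDFS policy).
    The returned mapping plays the role of the metadata a NameNode tracks.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            datanodes[(block_id + i) % len(datanodes)] for i in range(replication)
        ]
    return placement


def surviving_blocks(placement, failed_node):
    """Return the blocks that still have a replica after one node fails."""
    return [
        block_id
        for block_id, nodes in placement.items()
        if any(node != failed_node for node in nodes)
    ]


sizes = split_into_blocks(300 * 1024 * 1024)           # a 300 MB file
placement = place_replicas(len(sizes), ["dn1", "dn2", "dn3", "dn4"])
print(len(sizes), "blocks:", placement)
print("still readable if dn2 fails:", surviving_blocks(placement, "dn2"))
```

In this toy model a 300 MB file becomes three blocks, and every block stays readable when any single DataNode fails because its other replicas live on different nodes.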
2. MapReduce
MapReduce is a programming model, originally described by Google, that Hadoop uses to process and analyze data in parallel. This takes place in two stages:
- Map: Read input data (typically from HDFS) and transform it into intermediate key-value pairs
- Reduce: Group the intermediate pairs by key and combine the values for each key, for example by summing or counting them
MapReduce jobs are most commonly written in Java, although Hadoop Streaming lets you write the map and reduce steps in other languages such as Python.
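The two stages can be illustrated without Hadoop at all. The sketch below is a single-process word count in plain Python, not the Hadoop MapReduce API. The function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example, and the explicit shuffle step stands in for the grouping by key that a framework would perform between the two stages.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key so each key reaches the reducer once."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data", "Big clusters"]))
```

On a real cluster the map calls would run in parallel on the nodes that hold the input blocks, and the reduce calls would run in parallel per key. Here everything runs in one process so the data flow is easy to follow.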
3. Hadoop Common
Hadoop Common contains the Java libraries and utilities that the other Hadoop modules depend on, including those used to access data stored in the Hadoop file system.
4. Apache YARN (Yet Another Resource Negotiator)
Apache YARN is Hadoop's resource manager. It is responsible for allocating computing resources across a Hadoop cluster and for scheduling jobs.
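As a rough illustration of what "managing resources and scheduling jobs" means, here is a toy first-come, first-served allocator in plain Python. It is not YARN's API or its actual scheduling logic; YARN's own schedulers are considerably more sophisticated. The function name, node names, job names and memory figures are all assumptions made for this sketch.

```python
def schedule(requests, nodes):
    """Assign each request to the first node with enough free memory.

    requests: list of (app_name, memory_mb) tuples in arrival order.
    nodes: dict of node name -> free memory in MB (left unmodified).
    Returns (assignments, pending), where pending lists the apps that
    could not be placed with the memory currently available.
    """
    free = dict(nodes)  # work on a copy so the caller's dict is untouched
    assignments = []
    pending = []
    for app, memory_mb in requests:
        for node, available in free.items():
            if available >= memory_mb:
                free[node] = available - memory_mb
                assignments.append((app, node))
                break
        else:
            # no node had enough free memory for this request
            pending.append(app)
    return assignments, pending


cluster = {"node1": 4096, "node2": 2048}
jobs = [("job-a", 3072), ("job-b", 2048), ("job-c", 2048)]
print(schedule(jobs, cluster))
```

In this example job-a and job-b are placed, while job-c has to wait because neither node has 2048 MB free once the first two jobs are running.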