The Atlan Data Wiki

Big Data and Big Data Technologies

Understand what big data is and take a look at some of the most popular big data technologies in use today.

Apache Spark

Apache Spark is an open-source, distributed computing framework for processing and analyzing big data.

  • Ayswarrya G
Updated a year ago
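Spark's core idea is to split a dataset into partitions, process each partition in parallel, and combine the partial results. The sketch below imitates that pattern with a word count in plain Python, using local threads in place of a cluster. It illustrates the split/map/reduce pattern only and is not Spark's API; every function name in it is hypothetical. In PySpark, the same job would typically be written with RDD methods such as `flatMap`, `map` and `reduceByKey`.

```python
# Hypothetical, simplified illustration of partitioned processing; not Spark's API.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(lines, num_partitions):
    """Divide the input into roughly equal chunks (Spark calls these partitions)."""
    return [lines[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """Map step: count words within one partition, independently of the others."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def merge_counts(a, b):
    """Reduce step: combine the partial counts from two partitions."""
    return a + b


def word_count(lines, num_partitions=4):
    partitions = split_into_partitions(lines, num_partitions)
    # Each partition is processed in parallel; Spark would ship these tasks
    # to executors on different machines instead of using local threads.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    return reduce(merge_counts, partial_counts, Counter())


if __name__ == "__main__":
    text = [
        "big data needs distributed processing",
        "spark processes big data in memory",
        "distributed data is split into partitions",
    ]
    print(word_count(text).most_common(3))
```

Changing `num_partitions` changes how the work is divided but not the result, which is the property that lets Spark spread the same job across many machines.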

Apache Kafka

Apache Kafka is a distributed event-streaming platform. It works like a large commit log: records are appended in sequence, in real time, as they arrive. A commit log is an ordered record of what has happened, such as a history of transactions.

  • Ayswarrya G
Updated a year ago
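To make the commit-log idea concrete, here is a minimal in-memory sketch, assuming a single log with no partitions, persistence or networking. `CommitLog` and `Consumer` are hypothetical names, not Kafka's client API. The sketch shows the two properties the description relies on: records are appended in order and identified by an offset, and reading does not remove them, so each consumer tracks its own position.

```python
# Hypothetical toy model of a commit log; not Kafka's client API.
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """An append-only log: records are only ever added to the end."""
    records: list = field(default_factory=list)

    def append(self, record):
        """Append a record and return its offset (its position in the log)."""
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at offset, without removing them."""
        return self.records[offset:offset + max_records]


class Consumer:
    """Each consumer remembers its own offset, so several can read the same log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        """Read the next batch and advance this consumer's offset past it."""
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


if __name__ == "__main__":
    log = CommitLog()
    for event in ["deposit 100", "withdraw 30", "deposit 50"]:
        log.append(event)
    billing, audit = Consumer(log), Consumer(log)
    print(billing.poll())
    print(audit.poll(max_records=1))
```

This is why several applications can read the same stream at different speeds: the log is shared, and only the offsets differ.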

Apache Flume

Apache Flume is an open-source, distributed service for collecting, aggregating and moving large amounts of log data.

  • Ayswarrya G
Updated a year ago
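A Flume agent moves events from a source, through a channel that buffers them, to a sink that delivers them onward. The sketch below mimics that flow in plain Python, assuming an in-memory channel (a bounded `queue.Queue`) and a Python list as the destination. The function names are hypothetical, and real Flume agents are configured through properties files rather than written in Python.

```python
# Hypothetical source -> channel -> sink pipeline; not Flume's API or configuration.
import queue
import threading


def source(lines, channel):
    """Source: read events (here, log lines) and put them on the channel."""
    for line in lines:
        channel.put(line)  # blocks while the channel is full
    channel.put(None)      # sentinel marking the end of the stream


def sink(channel, destination):
    """Sink: take events off the channel and deliver them to a destination."""
    while True:
        event = channel.get()
        if event is None:
            break
        destination.append(event)


def run_agent(lines, capacity=100):
    """Wire one source and one sink together through a bounded in-memory channel."""
    channel = queue.Queue(maxsize=capacity)
    delivered = []
    consumer = threading.Thread(target=sink, args=(channel, delivered))
    consumer.start()
    source(lines, channel)
    consumer.join()
    return delivered


if __name__ == "__main__":
    logs = [f"INFO request {i} served" for i in range(5)]
    print(run_agent(logs, capacity=2))
```

The bounded channel is the design point: when the sink falls behind, `put` blocks and the source slows down instead of dropping events.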

Apache Cassandra

Apache Cassandra is an open-source, distributed NoSQL DBMS that can quickly process large volumes of data spread across many servers.

  • Ayswarrya G
Updated a year ago
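Cassandra spreads data across servers by hashing each row's partition key onto a token ring: each node owns a range of tokens, and copies of a row are kept on the next nodes around the ring. The sketch below is a simplified model of that placement, assuming one token per node and a fixed replication factor; it omits virtual nodes and topology-aware replication. `TokenRing` and `replicas` are hypothetical names, and MD5 is used only because it is in the standard library (Cassandra's default partitioner uses Murmur3).

```python
# Hypothetical, simplified token-ring placement; not Cassandra's API.
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring using a stable hash (MD5 here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    def __init__(self, nodes, replication_factor=2):
        if replication_factor > len(nodes):
            raise ValueError("replication_factor cannot exceed the number of nodes")
        # Sort nodes by token so the ring can be walked clockwise.
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]
        self.replication_factor = replication_factor

    def replicas(self, partition_key):
        """Return the nodes that would store a row with this partition key."""
        # The first node whose token is >= the key's token owns the key;
        # wrap around to the start of the ring if no such node exists.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        # Further copies go to the following nodes, walking clockwise.
        return [
            self.ring[(start + i) % len(self.ring)][1]
            for i in range(self.replication_factor)
        ]


if __name__ == "__main__":
    ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
    for key in ["user:1", "user:2", "user:3"]:
        print(key, ring.replicas(key))
```

Because placement depends only on hashes, any node can compute where a key lives without a central lookup, and adding a node moves only the keys in the token range it takes over.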

Big Data

Big data refers to massive and complex volumes of structured, semi-structured or unstructured data. Examples include social media data, transactional data (stock prices, purchase histories), sensor data (location data, weather data) and satellite data.

  • Ayswarrya G
Updated a year ago