What is Kafka?

Apache Kafka is a distributed event-streaming platform. It works like a large commit log: an append-only record of events, stored in the order they arrive, as they happen.
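To make the commit-log idea concrete, here is a minimal in-memory sketch. It is not Kafka code: the `CommitLog` class and its `append` and `read_from` methods are hypothetical names chosen for illustration. It only shows the core property that records are added at the end and read back in order by position (offset).

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class CommitLog:
    """Toy append-only log (hypothetical, not part of Kafka)."""
    records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Store a record at the end and return its offset (position)."""
        self.records.append(record)
        return len(self.records) - 1

    def read_from(self, offset: int) -> List[Any]:
        """Return every record at or after the given offset, in order."""
        return self.records[offset:]


activity_log = CommitLog()
first = activity_log.append({"user": "alice", "action": "page_view"})
second = activity_log.append({"user": "bob", "action": "click"})
```

Because nothing is ever rewritten in place, a reader only needs to remember the last offset it saw to pick up where it left off.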

How does this work in the real world?

If you track the activity of your website visitors in real time, you're using a real-time streaming data architecture. This is where Kafka comes in.

Some applications of Kafka include analyzing data in real time, streaming data in real time to other systems, tracking website activity, ingesting data into Spark or Hadoop and collecting logs.

How does Kafka work?

Kafka uses a publish-subscribe model: applications publish (write) data to Kafka, and other systems or real-time applications subscribe to it to read that data.

Where does Kafka fit in? Image courtesy: Confluent

Kafka consists of four elements:

  1. Topics: Named streams of records in which Kafka messages are stored
  2. Consumers: Other applications that process Kafka messages (they read data from topics)
  3. Producers: The applications that publish data into the Kafka system (they write data to topics)
  4. Brokers: The servers that run Kafka (Kafka nodes)

Applications (producers) send messages (records) to a Kafka node (broker) and said messages are processed by other applications called consumers. Said messages get stored in a topic and consumers subscribe to the topic to receive new messages.

Stanislav Kozlovski, Software Engineer at Confluent
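The flow described above can be sketched with a toy, in-memory model. This is not the Kafka client API: `Broker`, `Producer`, `Consumer`, and their methods are hypothetical names used only to show how the four elements relate. The topic name `page-views` and the messages are made up.

```python
from collections import defaultdict
from typing import Dict, List


class Broker:
    """Toy stand-in for a Kafka node: keeps one list of messages per topic."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[str]] = defaultdict(list)

    def append(self, topic: str, message: str) -> None:
        self.topics[topic].append(message)

    def fetch(self, topic: str, offset: int) -> List[str]:
        return self.topics[topic][offset:]


class Producer:
    """Writes messages to a topic on the broker."""

    def __init__(self, broker: Broker) -> None:
        self.broker = broker

    def send(self, topic: str, message: str) -> None:
        self.broker.append(topic, message)


class Consumer:
    """Reads from one topic and remembers how far it has read."""

    def __init__(self, broker: Broker, topic: str) -> None:
        self.broker = broker
        self.topic = topic
        self.offset = 0

    def poll(self) -> List[str]:
        messages = self.broker.fetch(self.topic, self.offset)
        self.offset += len(messages)  # advance past what was just read
        return messages


broker = Broker()
producer = Producer(broker)
consumer = Consumer(broker, "page-views")

producer.send("page-views", "alice viewed /home")
producer.send("page-views", "bob viewed /pricing")
first_batch = consumer.poll()

producer.send("page-views", "alice viewed /docs")
second_batch = consumer.poll()
```

The producer and consumer never talk to each other directly. Each one only knows the broker and the topic name, which is what lets them be developed and scaled separately.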

Still racking your brains trying to figure out what's what? 🤯 Here's an illustration to help you out.

The elements of Kafka. Image courtesy: Cloudkarafka

Why should you use Kafka?

Kafka is horizontally scalable: you can add new machines (brokers) to the cluster when the existing ones become overloaded.
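One way to picture horizontal scaling: Kafka splits a topic into partitions, and those partitions are spread over the brokers in the cluster. The sketch below uses a simple round-robin rule, which is an assumption made for illustration and not Kafka's actual placement logic. The function name `assign_partitions` and the broker names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, brokers: List[str]) -> Dict[str, List[int]]:
    """Spread partition numbers across brokers round-robin (illustrative only)."""
    assignment: Dict[str, List[int]] = {broker: [] for broker in brokers}
    for partition in range(num_partitions):
        owner = brokers[partition % len(brokers)]
        assignment[owner].append(partition)
    return assignment


two_brokers = assign_partitions(6, ["broker-1", "broker-2"])
three_brokers = assign_partitions(6, ["broker-1", "broker-2", "broker-3"])
```

With two brokers, each one holds three of the six partitions. With a third broker added, each one holds two, so the same workload is shared by more machines.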

Because Kafka is distributed and keeps copies of data on multiple brokers, it is fault-tolerant: the failure of a single machine does not take the whole system down.
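The fault-tolerance idea can be sketched as a log that is copied to several brokers. This is a deliberately simplified toy: real Kafka replicates each partition through a leader and follower replicas, whereas the hypothetical `ReplicatedLog` class below just writes every record to every live broker. The broker names and records are made up.

```python
from typing import Dict, List


class ReplicatedLog:
    """Toy log that copies every record to all live brokers (not Kafka's protocol)."""

    def __init__(self, brokers: List[str]) -> None:
        self.copies: Dict[str, List[str]] = {name: [] for name in brokers}
        self.alive = set(brokers)

    def append(self, record: str) -> None:
        # Write the record to every broker that is still up.
        for name in self.alive:
            self.copies[name].append(record)

    def fail(self, broker: str) -> None:
        # Simulate a machine going down.
        self.alive.discard(broker)

    def read(self) -> List[str]:
        if not self.alive:
            raise RuntimeError("no live replica left")
        survivor = sorted(self.alive)[0]  # pick any live broker deterministically
        return list(self.copies[survivor])


replicated = ReplicatedLog(["broker-1", "broker-2", "broker-3"])
replicated.append("order-1")
replicated.append("order-2")
replicated.fail("broker-1")
replicated.append("order-3")
surviving_records = replicated.read()
```

Even after `broker-1` goes down, the remaining brokers still hold the full sequence of records, so reads and writes continue.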

Who uses Kafka?

Large organizations that deal with a lot of data use Kafka.

Initially developed by LinkedIn and donated to the Apache Software Foundation, Kafka is written in Scala and Java.

LinkedIn initially used Kafka to track activity data and operational metrics. Netflix uses Kafka to provide real-time recommendations to its subscribers. Uber uses Kafka to collect and transform data that it feeds into its pricing model (Uber's dynamic pricing is a result of this).

Several organizations worldwide such as Airbnb, Goldman Sachs, The New York Times, Apple, Cisco, PayPal and Walmart use Kafka.
