Apache Kafka: A Practical Cheat Sheet (Updated)

Updated: January 30, 2024 By: Guest Contributor

Apache Kafka is a distributed streaming platform that has established itself as a critical component for building real-time, fault-tolerant, and scalable messaging systems. With Kafka, developers can publish, subscribe to, store, and process streams of records reliably. As Kafka continues to evolve, staying up to date with its features is essential for developers and data engineers. In this cheat sheet, we cover the key commands, configurations, and concepts needed to work efficiently with Apache Kafka.

Introduction to Kafka Basics

Before diving into the practical commands, let’s ground our knowledge with some Kafka basics (a short broker configuration sketch follows this list):

  • Producer: An entity that publishes data to Kafka topics.
  • Consumer: An entity that subscribes to topics and processes the feed of published records.
  • Broker: A Kafka server that stores data and serves clients.
  • Topic: A category or feed name to which records are published.
  • Partition: Topics are split into partitions, which are ordered logs for a subset of the data.
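
To tie these concepts to the broker itself: each broker reads its identity and storage settings from config/server.properties. The keys below are standard; the values are the usual defaults shipped with the distribution, so treat this as a sketch and verify against your own file.

# config/server.properties (typical shipped defaults; verify against your own file)
# broker.id uniquely identifies this broker within the cluster
broker.id=0
# log.dirs is where partition data (the ordered logs) lives on disk
log.dirs=/tmp/kafka-logs
# num.partitions is the default partition count for auto-created topics
num.partitions=1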

Setting Up Kafka

To install Kafka, download the binaries, extract them, and start the Kafka environment (ZooKeeper first, then the broker):

wget https://archive.apache.org/dist/kafka/3.2.0/kafka_2.13-3.2.0.tgz
tar -xzf kafka_2.13-3.2.0.tgz
cd kafka_2.13-3.2.0
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &
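
Once both processes are up, a quick sanity check is to ask the broker which API versions it supports; a clean response confirms a broker is reachable on port 9092. This uses the stock kafka-broker-api-versions.sh tool:

bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092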

Working with Topics

Topics are the core abstraction around which the rest of Kafka is built. Let’s go through some essential topic operations:

Creating a Topic

bin/kafka-topics.sh --create --partitions 3 --replication-factor 1 --topic my_topic --bootstrap-server localhost:9092
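
Topic-level settings can be applied at creation time with --config. As a sketch, the command below creates a hypothetical my_logs topic whose records are retained for roughly one day (retention.ms is a standard topic config; 86400000 ms is 24 hours):

bin/kafka-topics.sh --create --partitions 3 --replication-factor 1 --topic my_logs --config retention.ms=86400000 --bootstrap-server localhost:9092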

Listing Topics

bin/kafka-topics.sh --list --bootstrap-server localhost:9092

Describing a Topic

bin/kafka-topics.sh --describe --topic my_topic --bootstrap-server localhost:9092

Deleting a Topic

bin/kafka-topics.sh --delete --topic my_topic --bootstrap-server localhost:9092
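
If a delete request appears to have no effect, check the broker-side setting delete.topic.enable in config/server.properties. It defaults to true in recent Kafka versions; when it is false, topics are only marked for deletion and never actually removed:

# config/server.properties
delete.topic.enable=true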

Producing and Consuming Messages

Producing Messages to a Topic

echo "Hello, Kafka!" | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my_topic

Consuming Messages from a Topic

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my_topic --from-beginning
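
The console consumer accepts similar --property flags for displaying keys, and --max-messages is handy when you only want to peek at a few records instead of tailing the topic indefinitely:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my_topic --from-beginning --property print.key=true --property key.separator=: --max-messages 10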

Advanced Kafka Operations

As we get more familiar with Kafka, we might need to perform more advanced operations, such as reconfiguring partitions, managing consumer groups, or increasing the replication factor of a topic (sketched below).
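
Increasing the replication factor, for instance, is done with kafka-reassign-partitions.sh and a JSON file describing the desired replica placement. The sketch below assumes a cluster with at least two brokers (ids 0 and 1); with the single-broker setup from earlier, there is nowhere to place a second replica:

cat > increase-replication.json <<'EOF'
{"version":1,"partitions":[
  {"topic":"my_topic","partition":0,"replicas":[0,1]},
  {"topic":"my_topic","partition":1,"replicas":[1,0]},
  {"topic":"my_topic","partition":2,"replicas":[0,1]}
]}
EOF
bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --reassignment-json-file increase-replication.json --execute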

Modifying Topic Partitions

bin/kafka-topics.sh --alter --bootstrap-server localhost:9092 --topic my_topic --partitions 4

Note that the partition count of an existing topic can only be increased, never decreased.

Consumer Groups

Consumers in Kafka are typically part of a consumer group. Let’s see how to manage these groups:
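
A consumer joins a group simply by supplying a group id; every consumer started with the same --group value shares the topic’s partitions with the other members. Here my_group is just an illustrative name:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my_topic --group my_group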

Listing Consumer Groups

bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092

Describing Consumer Groups

bin/kafka-consumer-groups.sh --describe --group my_group --bootstrap-server localhost:9092

Resetting Consumer Group Offsets

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my_group --reset-offsets --to-earliest --execute --topic my_topic
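
Several other reset strategies are available, and swapping --execute for --dry-run previews the resulting offsets without applying them. The sketch below uses --shift-by and --to-datetime, both standard options of kafka-consumer-groups.sh; note that offsets can only be reset while the group has no active members:

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my_group --reset-offsets --shift-by -100 --dry-run --topic my_topic
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my_group --reset-offsets --to-datetime 2024-01-01T00:00:00.000 --execute --topic my_topic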

Monitoring and Managing Kafka

Monitoring Kafka is critical for understanding the performance and the health of the cluster. Apache Kafka provides JMX metrics out of the box, and you may choose to monitor these with tools like JConsole or integrate them into monitoring solutions such as Prometheus.
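
The broker exposes its JMX metrics when the JMX_PORT environment variable is set before startup; 9999 below is just a conventional port choice. You can then point JConsole (or a Prometheus JMX exporter) at that port:

JMX_PORT=9999 bin/kafka-server-start.sh config/server.properties &
jconsole localhost:9999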

Conclusion

In this guide, we’ve explored key concepts, configurations, and commands that are essential for effective Kafka operations. From setting up a Kafka cluster to advanced consumer group management, these practical tips should serve as a rapid reference for your Kafka-related tasks. With the power of Kafka at your fingertips, you can build and maintain robust streaming applications well into the future.