Apache Kafka is a distributed streaming platform that has established itself as a critical component for building real-time, fault-tolerant, and scalable messaging systems. With Kafka, developers can publish, subscribe to, store, and process streams of records in a fault-tolerant manner. As Kafka continues to evolve, staying up-to-date with its features is essential for developers and data engineers. This cheat sheet covers the key commands, configurations, and concepts needed to work efficiently with Apache Kafka.
Introduction to Kafka Basics
Before diving into the practical commands, let’s ground our knowledge with some Kafka basics:
- Producer: An entity that publishes data to Kafka topics.
- Consumer: An entity that subscribes to topics and processes the feed of published records.
- Broker: A Kafka server that stores data and serves clients.
- Topic: A category or feed name to which records are published.
- Partition: Topics are split into partitions, which are ordered logs for a subset of the data.
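To make the producer/partition relationship concrete, here is a minimal sketch of how a record key is mapped to a partition. Kafka's default partitioner actually applies a murmur2 hash to the serialized key; CRC-32 stands in here purely to keep the example dependency-free, and the principle (a stable hash modulo the partition count) is the same. Records with no key are spread across partitions instead (round-robin, or "sticky" batching in newer clients).

```python
import zlib

def partition_for(key, num_partitions):
    """Map a record key to a partition index.

    Simplified stand-in for Kafka's default partitioner: Kafka uses
    murmur2 on the serialized key, but any stable hash mod the
    partition count illustrates the idea.
    """
    return zlib.crc32(key) % num_partitions

# The same key always lands in the same partition, which is what
# gives Kafka its per-key ordering guarantee.
p1 = partition_for(b"user-42", 3)
p2 = partition_for(b"user-42", 3)
assert p1 == p2
assert 0 <= p1 < 3
```

This is why choosing good keys matters: all records for one key share a partition, so one very hot key can skew load onto a single broker.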
Setting Up Kafka
To install Kafka, download the binaries, extract them, and start the environment. Note that ZooKeeper must be running before the Kafka broker starts (the trailing & runs each server in the background):
wget https://archive.apache.org/dist/kafka/3.2.0/kafka_2.13-3.2.0.tgz
tar -xzf kafka_2.13-3.2.0.tgz
cd kafka_2.13-3.2.0
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &
Working with Topics
Topics are the core abstraction around which the rest of Kafka is built. Let’s go through some essential topic operations:
Creating a Topic
bin/kafka-topics.sh --create --partitions 3 --replication-factor 1 --topic my_topic --bootstrap-server localhost:9092
Listing Topics
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
Describing a Topic
bin/kafka-topics.sh --describe --topic my_topic --bootstrap-server localhost:9092
Deleting a Topic
bin/kafka-topics.sh --delete --topic my_topic --bootstrap-server localhost:9092
Producing and Consuming Messages
Producing Messages to a Topic
echo "Hello, Kafka!" | bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic my_topic
Consuming Messages from a Topic
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my_topic --from-beginning
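The produce/consume cycle above can be sketched as a toy in-memory model: a topic is a set of append-only logs (one per partition), producing appends a record and assigns it the next offset, and consuming with --from-beginning simply reads a partition's log from offset 0. This is an illustration of the data model only, not client code; the CRC-32 hash stands in for Kafka's actual murmur2 partitioner.

```python
import zlib
from collections import defaultdict

class MiniTopic:
    """Toy in-memory model of a Kafka topic: an append-only log per partition."""

    def __init__(self, num_partitions=3):
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)

    def produce(self, key, value):
        # Hash the key to pick a partition, then append. The record's
        # offset is simply its position in that partition's log.
        p = zlib.crc32(key) % self.num_partitions
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1

    def consume(self, partition, from_offset=0):
        # from_offset=0 corresponds to the console consumer's --from-beginning.
        return self.partitions[partition][from_offset:]

topic = MiniTopic()
p, offset = topic.produce(b"user-42", b"Hello, Kafka!")
assert offset == 0
assert topic.consume(p) == [b"Hello, Kafka!"]
```

Note what the model captures: offsets are per-partition, not per-topic, and consuming never deletes a record, so any number of consumers can replay the same log independently.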
Advanced Kafka Operations
As we get more familiar with Kafka, we might need to perform more advanced operations like configuring partitions and consumer groups or conducting administrative tasks such as increasing replicas for topics.
Modifying Topic Partitions
bin/kafka-topics.sh --alter --bootstrap-server localhost:9092 --topic my_topic --partitions 4
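Two caveats apply when altering partitions: the partition count can only be increased, never decreased, and increasing it changes which partition a given key hashes to, so per-key ordering is only guaranteed among records produced under the same partition count. A quick sketch makes the remapping visible (again using CRC-32 as a stand-in for Kafka's murmur2 partitioner):

```python
import zlib

def partition_for(key, num_partitions):
    # Stand-in for Kafka's murmur2-based default partitioner.
    return zlib.crc32(key) % num_partitions

keys = [f"user-{i}".encode() for i in range(50)]
# Compare each key's partition under the old (3) and new (4) counts.
moved = [k for k in keys if partition_for(k, 3) != partition_for(k, 4)]
assert moved  # many keys re-hash to a different partition
```

In practice this means a key's old records stay in the partition chosen under the old count, while new records may go elsewhere, so consumers relying on strict per-key ordering should treat a partition increase as a breaking change.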
Consumer Groups
Consumers in Kafka are typically part of a consumer group. Let’s see how to manage these groups:
Listing Consumer Groups
bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
Describing Consumer Groups
bin/kafka-consumer-groups.sh --describe --group my_group --bootstrap-server localhost:9092
Resetting Consumer Group Offsets
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my_group --reset-offsets --to-earliest --execute --topic my_topic
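Conceptually, a reset only rewrites the group's committed offset per partition; the log itself is untouched. The "lag" shown by --describe is the gap between the partition's log-end offset and the group's committed offset, so resetting --to-earliest makes the group re-read everything still retained. A minimal sketch of that bookkeeping (offset values here are made up for illustration):

```python
# Per-partition positions for one consumer group, keyed by
# (topic, partition). Values are hypothetical.
log_end = {("my_topic", 0): 120, ("my_topic", 1): 95}
committed = {("my_topic", 0): 120, ("my_topic", 1): 95}  # fully caught up

def lag(tp):
    # What kafka-consumer-groups.sh --describe reports as LAG.
    return log_end[tp] - committed[tp]

assert lag(("my_topic", 0)) == 0  # caught up, no lag

# --reset-offsets --to-earliest moves each committed offset back to the
# log start (0 here, assuming no retention-based truncation yet).
for tp in committed:
    committed[tp] = 0

assert lag(("my_topic", 0)) == 120  # the group will re-consume everything
```

Note that a reset only succeeds when no consumers in the group are active, and the --execute flag is what actually applies the change (without it the tool performs a dry run).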
Monitoring and Managing Kafka
Monitoring Kafka is critical for understanding the performance and health of the cluster. Apache Kafka exposes JMX metrics out of the box, and you may choose to inspect these with tools like JConsole or integrate them into monitoring solutions such as Prometheus.
Conclusion
In this guide, we’ve explored key concepts, configurations, and commands that are essential for effective Kafka operations. From setting up a Kafka cluster to advanced consumer group management, these practical tips should serve as a rapid reference for your Kafka-related tasks. With the power of Kafka at your fingertips, you can build and maintain robust streaming applications well into the future.