How to Add and Manage Brokers in Kafka (with Examples)

Updated: January 31, 2024 By: Guest Contributor

Introduction

Apache Kafka is a distributed event streaming platform widely used for handling real-time data feeds. As businesses grow and data demands increase, the ability to scale and manage Kafka brokers becomes essential. This tutorial will guide you through the steps, from basic to advanced, required to add and manage brokers within a Kafka cluster.

What are Kafka Brokers?

Before delving into the management of brokers, it’s important to understand what a broker is. In Kafka, a broker is a server that stores data and serves client requests. A Kafka cluster consists of one or more brokers; running several brokers is what provides fault tolerance and high availability.

Let’s start with some prerequisites needed before adding or managing Kafka brokers (a quick way to verify them follows the list):

  • Java Runtime Environment (JRE) or Java Development Kit (JDK) installed.
  • Apache Kafka downloaded and extracted.
  • A basic understanding of Kafka architecture and concepts.
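
As a quick sanity check, assuming you have already downloaded a release archive (the version number below is only an example; adjust it to your release), you can verify Java and unpack Kafka like this:

# Check the Java version (Kafka requires Java 8 or newer)
java -version

# Extract the Kafka archive and move into it
tar -xzf kafka_2.13-3.6.1.tgz
cd kafka_2.13-3.6.1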

Setting Up Your First Broker

To set up a Kafka broker, you first need to configure the Kafka server properties file. This file is located in the config directory of your Kafka installation. For the first broker, you can use the default configuration.
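
For reference, the settings in config/server.properties that matter for this tutorial default to roughly the following (in the stock file the listeners line is commented out, and the broker falls back to port 9092):

broker.id=0
listeners=PLAINTEXT://:9092
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181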

# Start the ZooKeeper service
bin/zookeeper-server-start.sh config/zookeeper.properties

# Start the Kafka broker
bin/kafka-server-start.sh config/server.properties

This will start a single Kafka broker along with ZooKeeper, which manages broker coordination.
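
To verify that the broker is accepting requests, you can create and then list a test topic (the topic name here is arbitrary):

# Create a single-partition topic with one replica
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic test-topic --partitions 1 --replication-factor 1

# List topics to confirm the broker responded
bin/kafka-topics.sh --bootstrap-server localhost:9092 --list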

Adding a New Broker to the Cluster

To add a new broker to an existing Kafka cluster, you need to create a new server.properties file for the new broker and configure the following properties:

broker.id=2
listeners=PLAINTEXT://:9093
log.dirs=/tmp/kafka-logs-2

The broker.id must be unique within the cluster. You also need to point log.dirs at a separate path, and the listeners port must not clash with that of any other broker running on the same machine.
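
An easy way to produce this file is to copy the default configuration and then edit the three properties above (the file name server-2.properties is just a convention):

# Copy the default config as a starting point for the second broker
cp config/server.properties config/server-2.properties

# Then edit broker.id, listeners, and log.dirs in config/server-2.properties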

Start the new broker by running:

bin/kafka-server-start.sh config/server-2.properties
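
To confirm that the new broker has registered with the cluster, list the broker IDs stored in ZooKeeper:

bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids
# The output should end with the registered broker ids, e.g. [0, 2]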

Expanding the Cluster

Expanding your Kafka cluster simply means repeating the above process for each new broker you add. Note, however, that Kafka does not automatically move existing partitions onto new brokers: to spread load you must rebalance partitions with the partition reassignment tool, as sketched below. A full discussion of replication and rebalancing strategy is beyond the scope of this tutorial.
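
As a sketch, the tool can propose a plan that spreads the partitions of chosen topics across a broker list. The first line below is the content of a hypothetical topics-to-move.json file; --generate only prints a proposed plan, which you can save and later pass to --execute:

{"version": 1, "topics": [{"topic": "test-topic"}]}

bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --topics-to-move-json-file topics-to-move.json \
  --broker-list "0,2" --generate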

Monitoring Kafka Brokers

Once your brokers are up and running, monitoring is crucial. Kafka exposes a wide range of built-in metrics via JMX (Java Management Extensions). You can enable remote JMX access on a chosen port (9999 below) when starting a broker:

KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.management.jmxremote.rmi.port=9999 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false" bin/kafka-server-start.sh config/server.properties
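
With JMX enabled, individual metrics can be read from the command line using the JmxTool class that ships with Kafka; the example below reads the broker’s incoming message rate, assuming the port configured above (on recent releases the class may be org.apache.kafka.tools.JmxTool instead):

bin/kafka-run-class.sh kafka.tools.JmxTool \
  --object-name kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi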

There are also various community and commercial tools available for Kafka monitoring, such as Prometheus and Grafana, which provide visual dashboards to monitor your Kafka cluster health.

Removing Brokers from the Cluster

As with any dynamic system, sometimes you may need to remove brokers from the cluster. This could be for maintenance, scaling down, or decommissioning a server.

To remove a broker safely, you must first move every partition replica it hosts (not just partition leadership) onto the remaining brokers, which is done with Kafka’s partition reassignment tool.

bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file reassign.json --execute

Here, reassign.json contains the reassignment plan. Run the same command with --verify to confirm the reassignment has completed, and watch your system’s performance and availability before, during, and after the move; only shut the broker down once it no longer hosts any replicas.
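
For illustration, a minimal plan that moves the single partition of the test-topic used earlier onto broker 0, and therefore off broker 2, would look like this; a real plan lists every affected partition:

{
  "version": 1,
  "partitions": [
    {"topic": "test-topic", "partition": 0, "replicas": [0]}
  ]
}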

Conclusion

Managing brokers within a Kafka cluster is a key aspect of maintaining a scalable, reliable messaging system. By following the processes outlined in this tutorial, you should now have a good grasp of how to add new brokers, monitor them effectively, and remove them gracefully when necessary. As your Kafka ecosystem grows, these skills will be fundamental to designing and operating robust data systems.