Introduction
Apache Kafka is a distributed streaming platform with powerful publish-subscribe messaging capabilities and robust features for processing data streams. It has become a preferred choice for building real-time analytics and monitoring pipelines, event sourcing architectures, and log aggregation systems. In this tutorial, you’ll learn how to download and install Kafka on Ubuntu and set up the basic environment to get started with stream processing. Whether you’re a beginner or an experienced developer, these steps will guide you through the process.
Prerequisites
Before we dive into the installation process, ensure you meet the following prerequisites:
- An Ubuntu 18.04 or later server.
- A non-root user with sudo privileges set up on your server.
- The Java Development Kit (JDK), since Kafka runs on the JVM. OpenJDK or Oracle JDK, version 8 or later, will work; this tutorial installs OpenJDK 11 in Step 1.
Step-by-Step Instructions
Step #1 – Installing Java
Kafka is written in Java, so the first step is to install Java on your Ubuntu server. Install OpenJDK with the following command:
sudo apt update
sudo apt install openjdk-11-jdk -y
Verify the installation with:
java -version
You should see output indicating the installed version of Java, such as:
openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment (build 11.0.1+13-Ubuntu-1ubuntu218.04.4)
OpenJDK 64-Bit Server VM (build 11.0.1+13-Ubuntu-1ubuntu218.04.4, mixed mode, sharing)
Step #2 – Download Kafka
Next, we need to download Kafka. Go to the official Kafka download page and grab the latest binary release, or use curl to download Kafka directly to your server. Apache organizes downloads by version number (there is no latest path), so substitute the current version from the download page:
curl -O https://downloads.apache.org/kafka/3.0.0/kafka_2.13-3.0.0.tgz
Note that older releases are moved to archive.apache.org/dist/kafka/ once newer versions ship, so adjust the host if the version you want is no longer on the main download server.
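Apache publishes SHA512 checksums alongside each release, and verifying the download guards against corruption or tampering. Below is a minimal sketch of the sha512sum -c pattern, shown with a local stand-in file so it runs anywhere; on a real install you would fetch the .sha512 file published next to the Kafka tarball and check the tarball against it.

```shell
# Create a stand-in file to demonstrate the verification pattern.
echo "demo payload" > kafka-download-demo.tgz

# Normally this .sha512 file is downloaded from the Apache mirror next to the
# tarball; here we generate it locally so the example is self-contained.
sha512sum kafka-download-demo.tgz > kafka-download-demo.tgz.sha512

# -c re-computes the hash and compares it against the recorded one.
# Prints "kafka-download-demo.tgz: OK" on success.
sha512sum -c kafka-download-demo.tgz.sha512

# Clean up the demo files.
rm kafka-download-demo.tgz kafka-download-demo.tgz.sha512
```

A mismatch makes sha512sum -c exit non-zero, which is convenient for scripted installs.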
Step #3 – Install Kafka
Once Kafka is downloaded, it’s time to install it. Begin by extracting the tar file:
tar -xvzf kafka_2.13-3.0.0.tgz
Move the extracted folder to a proper directory, like /usr/local/kafka:
sudo mv kafka_2.13-3.0.0 /usr/local/kafka
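As a convenience (this is an assumption of the tutorial, not something Kafka requires), you can export the install location so the scripts under bin/ are on your PATH. KAFKA_HOME is a conventional variable name, not one Kafka itself reads:

```shell
# Conventional variable pointing at the install directory from the step above.
export KAFKA_HOME=/usr/local/kafka

# Put Kafka's command-line tools on the PATH for this shell session.
export PATH="$PATH:$KAFKA_HOME/bin"

# Confirm the variable is set.
echo "$KAFKA_HOME"
```

Add both export lines to your ~/.bashrc if you want them in every session; the later steps in this tutorial use full paths, so this is optional.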
Step #4 – Configuring Kafka
We now have Kafka on our server, but we need to configure it. Let’s create the directory Kafka will use for its log data (in Kafka, “logs” are the on-disk message data, not application logs):
sudo mkdir -p /var/lib/kafka/data
Then we must update the configuration file. Open server.properties:
sudo nano /usr/local/kafka/config/server.properties
In ‘server.properties’, set the ‘log.dirs’ variable to the directory created above:
log.dirs=/var/lib/kafka/data
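If you prefer to script this change rather than edit the file by hand, sed can rewrite the log.dirs line in place. A sketch, shown against a throwaway stand-in file so it runs anywhere; on a real install you would point it at /usr/local/kafka/config/server.properties (with sudo):

```shell
# Create a small stand-in server.properties so the example is self-contained.
printf 'broker.id=0\nlog.dirs=/tmp/kafka-logs\n' > server.properties.demo

# Replace whatever value log.dirs currently has with the new data directory.
# The | delimiter avoids escaping the slashes in the path.
sed -i 's|^log.dirs=.*|log.dirs=/var/lib/kafka/data|' server.properties.demo

# Confirm the change took effect.
grep '^log.dirs=' server.properties.demo

# Clean up the demo file.
rm server.properties.demo
```

The same one-liner is handy in provisioning scripts where an interactive editor isn’t available.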
Step #5 – Setting Up Kafka as a System Service
If you want Kafka to run as a service (which is typical in a production environment), you will need to create systemd unit files for Zookeeper and Kafka; the Kafka binary distribution bundles the Zookeeper scripts it needs. Start with Zookeeper by creating /etc/systemd/system/zookeeper.service with the following content:
[Unit]
Description=Zookeeper Service
Requires=network.target
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties
ExecStop=/usr/local/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Then, create a similar unit file for Kafka at /etc/systemd/system/kafka.service:
[Unit]
Description=Apache Kafka Server
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
ExecStart=/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
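Writing these unit files can also be scripted with a heredoc. A sketch, targeting a temporary path here so it runs without root; on a real server you would pipe the heredoc through sudo tee /etc/systemd/system/kafka.service instead:

```shell
# Write the Kafka unit file from the step above; in production, replace the
# redirect with: ... | sudo tee /etc/systemd/system/kafka.service
cat > kafka.service.demo <<'EOF'
[Unit]
Description=Apache Kafka Server
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
ExecStart=/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
EOF

# Sanity-check the file we just wrote.
grep '^ExecStart=' kafka.service.demo

# Clean up the demo file.
rm kafka.service.demo
```

The quoted 'EOF' delimiter keeps the shell from expanding anything inside the heredoc, so the unit file is written verbatim.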
Step #6 – Start Kafka
Once you’ve created the unit files, reload systemd so it picks them up, then enable and start Zookeeper followed by Kafka:
sudo systemctl daemon-reload
sudo systemctl enable zookeeper
sudo systemctl start zookeeper
sudo systemctl enable kafka
sudo systemctl start kafka
Confirm they’re running:
sudo systemctl status zookeeper
sudo systemctl status kafka
You should see active (running) in the output of both commands, telling you everything is set up correctly. If either service shows failed, inspect its logs with sudo journalctl -u zookeeper or sudo journalctl -u kafka.
Step #7 – Testing Kafka Installation
To test your installation, create a topic using Kafka’s built-in command-line tools:
/usr/local/kafka/bin/kafka-topics.sh --create --topic test --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1
Now you can write messages into the topic as a producer:
/usr/local/kafka/bin/kafka-console-producer.sh --topic test --bootstrap-server localhost:9092
The producer waits for input; type a few messages, one per line, then press CTRL+C to stop it. To read those messages back as a consumer:
/usr/local/kafka/bin/kafka-console-consumer.sh --topic test --from-beginning --bootstrap-server localhost:9092
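You should see the messages you typed echoed back; press CTRL+C to stop the consumer. The same kafka-topics.sh tool can also list and inspect topics, which is a quick way to confirm the broker’s view of what you created. A sketch, assuming the broker from the steps above is running on localhost:9092:

```shell
# List all topics known to the broker.
/usr/local/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092

# Show partition count, replication factor, and leader for the test topic.
/usr/local/kafka/bin/kafka-topics.sh --describe --topic test --bootstrap-server localhost:9092
```

The --describe output is especially useful later, when you add brokers and want to confirm how partitions and replicas are distributed.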
What’s Next?
To take your Kafka knowledge further, consider exploring the Kafka Streams API for stream processing, connecting Kafka to external systems with Kafka Connect, and ensuring high availability with multiple Kafka brokers and Zookeeper nodes.
Kafka also has a powerful REST proxy for building web applications that interact with Kafka clusters. Scaling your Kafka deployment and setting up cluster monitoring with tools like Kafka’s JMX metrics, Prometheus, and Grafana can provide deeper insights into the system’s performance.
Conclusion
You now have a fully functional Kafka environment on your Ubuntu server. This streaming platform is designed to handle real-time data feeds and build powerful streaming applications. With Kafka installed, you can start developing systems capable of processing large amounts of data with ease. The journey to mastering Kafka is ongoing, so continue learning and experimenting to leverage its full potential.