Introduction
Apache Kafka is a distributed streaming platform with powerful publish-subscribe messaging capabilities and robust features for processing data streams. It has become a preferred choice for building real-time analytics and monitoring pipelines, event sourcing architectures, and log aggregation systems. In this tutorial, you’ll learn how to download and install Kafka on Ubuntu and set up the basic environment to get started with stream processing. Whether you’re a beginner or an experienced developer, these steps will guide you through the process.
Prerequisites
Before we dive into the installation process, ensure you meet the following prerequisites:
- An Ubuntu 18.04 or later server.
- A non-root user with sudo privileges set up on your server.
- The Java Development Kit (JDK), since Kafka runs on the JVM. OpenJDK or Oracle JDK, version 8 or later, will work; this tutorial installs OpenJDK 11 in Step 1.
Step-by-Step Instructions
Step #1 – Installing Java
Kafka is written in Java, so the first step is to install Java on your Ubuntu server. Install OpenJDK with the following command:
sudo apt update
sudo apt install openjdk-11-jdk -y
Verify the installation with:
java -version
You should see output indicating the installed version of Java, such as:
openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment (build 11.0.1+13-Ubuntu-1ubuntu218.04.4)
OpenJDK 64-Bit Server VM (build 11.0.1+13-Ubuntu-1ubuntu218.04.4, mixed mode, sharing)
Step #2 – Download Kafka
Next, we need to download Kafka. Go to the official Kafka download page and grab the latest binary release, or use curl to download Kafka directly to your server. Apache organizes downloads by version number (there is no latest path), so substitute the current version from the download page:
curl -O https://downloads.apache.org/kafka/3.0.0/kafka_2.13-3.0.0.tgz
Note that older releases are moved to archive.apache.org/dist/kafka/ once newer versions ship, so adjust the host if the version you want is no longer on the main download server.
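Apache publishes SHA512 checksums alongside each release, and verifying the download guards against corruption or tampering. Below is a minimal sketch of the sha512sum -c pattern, shown with a local stand-in file so it runs anywhere; on a real install you would fetch the .sha512 file published next to the Kafka tarball and check the tarball against it.

```shell
# Create a stand-in file to demonstrate the verification pattern.
echo "demo payload" > kafka-download-demo.tgz

# Normally this .sha512 file is downloaded from the Apache mirror next to the
# tarball; here we generate it locally so the example is self-contained.
sha512sum kafka-download-demo.tgz > kafka-download-demo.tgz.sha512

# -c re-computes the hash and compares it against the recorded one.
# Prints "kafka-download-demo.tgz: OK" on success.
sha512sum -c kafka-download-demo.tgz.sha512

# Clean up the demo files.
rm kafka-download-demo.tgz kafka-download-demo.tgz.sha512
```

A mismatch makes sha512sum -c exit non-zero, which is convenient for scripted installs.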
Step #3 – Install Kafka
Once Kafka is downloaded, it’s time to install it. Begin by extracting the tar file:
tar -xvzf kafka_2.13-3.0.0.tgz
Move the extracted folder to a proper directory, like /usr/local/kafka:
sudo mv kafka_2.13-3.0.0 /usr/local/kafka
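As a convenience (this is an assumption of the tutorial, not something Kafka requires), you can export the install location so the scripts under bin/ are on your PATH. KAFKA_HOME is a conventional variable name, not one Kafka itself reads:

```shell
# Conventional variable pointing at the install directory from the step above.
export KAFKA_HOME=/usr/local/kafka

# Put Kafka's command-line tools on the PATH for this shell session.
export PATH="$PATH:$KAFKA_HOME/bin"

# Confirm the variable is set.
echo "$KAFKA_HOME"
```

Add both export lines to your ~/.bashrc if you want them in every session; the later steps in this tutorial use full paths, so this is optional.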
Step #4 – Configuring Kafka
We now have Kafka on our server, but we need to configure it. Let’s create the directory Kafka will use for its log data (in Kafka, “logs” are the on-disk message data, not application logs):
sudo mkdir -p /var/lib/kafka/data
Then we must update the configuration file. Open server.properties:
sudo nano /usr/local/kafka/config/server.properties
In ‘server.properties’, set the ‘log.dirs’ variable to the directory created above:
log.dirs=/var/lib/kafka/data
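If you prefer to script this change rather than edit the file by hand, sed can rewrite the log.dirs line in place. A sketch, shown against a throwaway stand-in file so it runs anywhere; on a real install you would point it at /usr/local/kafka/config/server.properties (with sudo):

```shell
# Create a small stand-in server.properties so the example is self-contained.
printf 'broker.id=0\nlog.dirs=/tmp/kafka-logs\n' > server.properties.demo

# Replace whatever value log.dirs currently has with the new data directory.
# The | delimiter avoids escaping the slashes in the path.
sed -i 's|^log.dirs=.*|log.dirs=/var/lib/kafka/data|' server.properties.demo

# Confirm the change took effect.
grep '^log.dirs=' server.properties.demo

# Clean up the demo file.
rm server.properties.demo
```

The same one-liner is handy in provisioning scripts where an interactive editor isn’t available.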
Step #5 – Setting Up Kafka as a System Service
If you want Kafka to run as a service (which is typical in a production environment), you will need to create systemd unit files for Zookeeper and Kafka; the Kafka binary distribution bundles the Zookeeper scripts it needs. Start with Zookeeper by creating /etc/systemd/system/zookeeper.service with the following content:
[Unit]
Description=Zookeeper Service
Requires=network.target
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties
ExecStop=/usr/local/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Then, create a similar unit file for Kafka at /etc/systemd/system/kafka.service:
[Unit]
Description=Apache Kafka Server
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
ExecStart=/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
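Writing these unit files can also be scripted with a heredoc. A sketch, targeting a temporary path here so it runs without root; on a real server you would pipe the heredoc through sudo tee /etc/systemd/system/kafka.service instead:

```shell
# Write the Kafka unit file from the step above; in production, replace the
# redirect with: ... | sudo tee /etc/systemd/system/kafka.service
cat > kafka.service.demo <<'EOF'
[Unit]
Description=Apache Kafka Server
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
ExecStart=/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target
EOF

# Sanity-check the file we just wrote.
grep '^ExecStart=' kafka.service.demo

# Clean up the demo file.
rm kafka.service.demo
```

The quoted 'EOF' delimiter keeps the shell from expanding anything inside the heredoc, so the unit file is written verbatim.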
Step #6 – Start Kafka
Once you’ve created the unit files, reload systemd so it picks them up, then enable and start Zookeeper followed by Kafka:
sudo systemctl daemon-reload
sudo systemctl enable zookeeper
sudo systemctl start zookeeper
sudo systemctl enable kafka
sudo systemctl start kafka
Confirm they’re running:
sudo systemctl status zookeeper
sudo systemctl status kafka
You should see active (running) in the output of both commands, telling you everything is set up correctly. If either service shows failed, inspect its logs with sudo journalctl -u zookeeper or sudo journalctl -u kafka.
Step #7 – Testing Kafka Installation
To test your installation, create a topic using Kafka’s built-in command-line tools:
/usr/local/kafka/bin/kafka-topics.sh --create --topic test --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1
Now you can write messages into the topic as a producer:
/usr/local/kafka/bin/kafka-console-producer.sh --topic test --bootstrap-server localhost:9092
The producer waits for input; type a few messages, one per line, then press CTRL+C to stop it. To read those messages back as a consumer:
/usr/local/kafka/bin/kafka-console-consumer.sh --topic test --from-beginning --bootstrap-server localhost:9092
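You should see the messages you typed echoed back; press CTRL+C to stop the consumer. The same kafka-topics.sh tool can also list and inspect topics, which is a quick way to confirm the broker’s view of what you created. A sketch, assuming the broker from the steps above is running on localhost:9092:

```shell
# List all topics known to the broker.
/usr/local/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092

# Show partition count, replication factor, and leader for the test topic.
/usr/local/kafka/bin/kafka-topics.sh --describe --topic test --bootstrap-server localhost:9092
```

The --describe output is especially useful later, when you add brokers and want to confirm how partitions and replicas are distributed.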
What’s Next?
To take your Kafka knowledge further, consider exploring the Kafka Streams API for stream processing, connecting Kafka to external systems with Kafka Connect, and ensuring high availability with multiple Kafka brokers and Zookeeper nodes.
Kafka also has a powerful REST proxy for building web applications that interact with Kafka clusters. Scaling your Kafka deployment and setting up cluster monitoring with tools like Kafka’s JMX metrics, Prometheus, and Grafana can provide deeper insights into the system’s performance.
Conclusion
You now have a fully functional Kafka environment on your Ubuntu server. This streaming platform is designed to handle real-time data feeds and build powerful streaming applications. With Kafka installed, you can start developing systems capable of processing large amounts of data with ease. The journey to mastering Kafka is ongoing, so continue learning and experimenting to leverage its full potential.