eric | May 16, 2023, 9:09 a.m.
A basic Apache Kafka test setup with three servers, each running three combined broker/controller nodes, using KRaft. The recommended setup for production is at least 3 brokers and 3 controllers.
If you don't have Scala and Java installed, you will need to install these first. It is probably best to go for Scala 2.13.x. With root:
# scala -version
bash: scala: command not found...
Download the Coursier command-line tool and use it to set up Scala (see also the Scala intro page):
# curl -fL https://github.com/coursier/coursier/releases/latest/download/cs-x86_64-pc-linux.gz | gzip -d > cs && chmod +x cs && ./cs setup
Use the tool to install the latest version of 2.13:
# ./cs install scala:2.13.15 scalac:2.13.15
# export PATH="$PATH:/home/{your username}/.local/share/coursier/bin"
# scala -version
Scala code runner version 2.13.15 -- Copyright 2002-2024, LAMP/EPFL and Lightbend, Inc.
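Kafka itself runs on the JVM, so check for Java as well. If it is missing, install a JDK; package names vary by distribution, and the OpenJDK 17 packages below are my assumption. On Rocky or Alma:
# dnf install java-17-openjdk-headless
On Ubuntu:
# apt install openjdk-17-jre-headless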
Download Kafka binaries from the Kafka downloads page.
Unpack the archive and move it somewhere SELinux will allow you to run it:
$ tar -xzf ~/Downloads/kafka_2.13-3.x.x.tgz -C ~/Downloads/
$ sudo mkdir -p /usr/local/bin/kafka
$ sudo mv ~/Downloads/kafka_2.13-3.x.x /usr/local/bin/kafka/
First off, copy the KRaft properties template once per node and edit each copy, starting with the first broker:
# cp /usr/local/bin/kafka/kafka_2.13-3.x.x/config/kraft/server.properties /usr/local/bin/kafka/kafka_2.13-3.x.x/config/kraft/server1.properties
# vim /usr/local/bin/kafka/kafka_2.13-3.x.x/config/kraft/server1.properties
For this example, we are using three combined broker/controller nodes on each of the three servers, with IPv4 addresses and ports as follows:
- Broker 1: IP address 10.0.0.20, listener on port 9092, controller on port 19092
- Broker 2: IP address 10.0.0.20, listener on port 9093, controller on port 19093
- Broker 3: IP address 10.0.0.20, listener on port 9094, controller on port 19094
- Broker 4: IP address 10.0.0.22, listener on port 9092, controller on port 19092
- Broker 5: IP address 10.0.0.22, listener on port 9093, controller on port 19093
- Broker 6: IP address 10.0.0.22, listener on port 9094, controller on port 19094
- Broker 7: IP address 10.0.0.24, listener on port 9092, controller on port 19092
- Broker 8: IP address 10.0.0.24, listener on port 9093, controller on port 19093
- Broker 9: IP address 10.0.0.24, listener on port 9094, controller on port 19094
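Each node acts as both broker and controller. The stock config/kraft/server.properties template should already contain the two lines below (worth verifying against your Kafka version), so only the per-node settings that follow need editing:
process.roles=broker,controller
controller.listener.names=CONTROLLER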
The following must be specified in the properties file for Broker 1 (server1.properties):
node.id=1
controller.quorum.voters=1@10.0.0.20:19092,2@10.0.0.20:19093,3@10.0.0.20:19094,4@10.0.0.22:19092,5@10.0.0.22:19093,6@10.0.0.22:19094,7@10.0.0.24:19092,8@10.0.0.24:19093,9@10.0.0.24:19094
listeners=PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:19092
advertised.listeners=PLAINTEXT://10.0.0.20:9092
The following must be specified in the properties file for Broker 2 (server2.properties):
node.id=2
controller.quorum.voters=1@10.0.0.20:19092,2@10.0.0.20:19093,3@10.0.0.20:19094,4@10.0.0.22:19092,5@10.0.0.22:19093,6@10.0.0.22:19094,7@10.0.0.24:19092,8@10.0.0.24:19093,9@10.0.0.24:19094
listeners=PLAINTEXT://0.0.0.0:9093,CONTROLLER://0.0.0.0:19093
advertised.listeners=PLAINTEXT://10.0.0.20:9093
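Nodes that share a machine also need separate data directories, or the storage format step below and the running nodes will trample each other. The KRaft template's default is log.dirs=/tmp/kraft-combined-logs; one simple scheme (the numbering is my own choice) is a per-node suffix, i.e. in server1.properties:
log.dirs=/tmp/kraft-combined-logs-1
with -2 in server2.properties, and so on.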
…and so forth for each node. For this to work, make sure your firewalls allow the ports above. On Ubuntu and the like:
# ufw allow 9092/tcp
Rinse and repeat for each port. On Rocky or Alma:
# firewall-cmd --zone=public --permanent --add-port=9092/tcp
# firewall-cmd --reload
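To save typing, a small loop can open all six ports on a server in one go (a sketch; adjust the port list to your layout):
# for p in 9092 9093 9094 19092 19093 19094; do firewall-cmd --zone=public --permanent --add-port=${p}/tcp; done
# firewall-cmd --reload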
Add a user that will run Kafka on each server:
$ sudo adduser --system kafka-user
$ sudo usermod -a -G adm kafka-user
$ sudo chown -R kafka-user /usr/local/bin/kafka
$ cd /usr/local/bin/kafka/kafka_?.??-?.?.?
$ KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
$ bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server1.properties
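The cluster ID must be identical for every node in the cluster, so format the remaining node configs on this server with the same ID, and copy the ID (echo $KAFKA_CLUSTER_ID) over to the other two servers instead of generating a new one there:
$ bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server2.properties
$ bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server3.properties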
In addition, the kafka user needs write access to the data directories under /tmp that the format step just created:
$ sudo chown -R kafka-user /tmp/kraft-combined-logs-*
To enable automatic restart after failures or system reboots, you can use a process manager like systemd. Create a systemd service file for Kafka on each server, for example /etc/systemd/system/kafka.service:
$ sudo touch /etc/systemd/system/kafka.service
Add the following contents to the kafka.service file for Broker 1, modifying the paths and options according to your setup:
[Unit]
Description=Apache Kafka Server
Wants=network.target
After=network.target
[Service]
Type=simple
Restart=always
RestartSec=1
User=kafka-user
ExecStart=/usr/local/bin/kafka/kafka_2.13-3.3.2/bin/kafka-server-start.sh /usr/local/bin/kafka/kafka_2.13-3.3.2/config/kraft/server1.properties
ExecStop=/usr/local/bin/kafka/kafka_2.13-3.3.2/bin/kafka-server-stop.sh
[Install]
WantedBy=multi-user.target
Repeat the process for each remaining node, pointing each service file (kafka1.service, kafka2.service, and so on) at its own properties file; alternatively, use a templated unit as sketched below.
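Since each server runs three nodes, a systemd template unit saves maintaining three near-identical files. A sketch, assuming the serverN.properties naming used above; save it as /etc/systemd/system/kafka@.service:
[Unit]
Description=Apache Kafka Server (node %i)
Wants=network.target
After=network.target
[Service]
Type=simple
Restart=always
RestartSec=1
User=kafka-user
ExecStart=/usr/local/bin/kafka/kafka_2.13-3.3.2/bin/kafka-server-start.sh /usr/local/bin/kafka/kafka_2.13-3.3.2/config/kraft/server%i.properties
[Install]
WantedBy=multi-user.target
ExecStop= is deliberately omitted here: kafka-server-stop.sh signals every Kafka process on the host, which is wrong with several nodes per server, while systemd's default SIGTERM shuts down just that node cleanly. Enable all three at once with:
$ sudo systemctl enable --now kafka@1 kafka@2 kafka@3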
With individual service files, enable and start the service on each server:
$ sudo systemctl enable kafka
$ sudo systemctl start kafka
$ sudo systemctl status kafka
That's it!
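To verify that the cluster works, create a test topic replicated across three nodes and inspect it (run from the Kafka directory on any server; the topic name is arbitrary):
$ bin/kafka-topics.sh --create --topic smoke-test --partitions 3 --replication-factor 3 --bootstrap-server 10.0.0.20:9092
$ bin/kafka-topics.sh --describe --topic smoke-test --bootstrap-server 10.0.0.20:9092
The describe output should show leaders and in-sync replicas spread across several node IDs.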
It is possible to harden the service significantly. The price is more complexity, including more complex fault-finding. The service file for Broker 1 could look like this:
[Unit]
Description=Apache Kafka Server
Wants=network.target
After=network.target
[Service]
Type=simple
User=kafka-user
Restart=always
RestartSec=1
ExecStart=/usr/local/bin/kafka/kafka_2.13-3.3.2/bin/kafka-server-start.sh /usr/local/bin/kafka/kafka_2.13-3.3.2/config/kraft/server1.properties
ExecStop=/usr/local/bin/kafka/kafka_2.13-3.3.2/bin/kafka-server-stop.sh
NoNewPrivileges=true
PrivateTmp=yes
RestrictNamespaces=uts ipc pid user cgroup
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
ProtectSystem=strict
ReadWritePaths=/usr/local/bin/kafka/kafka_2.13-3.3.2/logs
CapabilityBoundingSet=CAP_NET_BIND_SERVICE CAP_DAC_READ_SEARCH
[Install]
WantedBy=multi-user.target
The hardening chiefly consists of privilege restrictions (User=, CapabilityBoundingSet=), filesystem namespacing such as PrivateTmp= (a private temporary directory inaccessible to other services) and ProtectSystem=strict (the rest of the filesystem becomes read-only except for paths whitelisted with ReadWritePaths=), restricted namespaces (RestrictNamespaces=), and prevention of explicit kernel module loading (ProtectKernelModules=).
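Two caveats with this hardened sketch: ReadWritePaths= fails if the logs directory does not yet exist when the service starts, and PrivateTmp=yes gives the service its own empty /tmp, so data directories under /tmp (as above) become invisible to it. Either drop PrivateTmp= or, better, move log.dirs out of /tmp (the path below is my own choice) and whitelist it with another ReadWritePaths= line:
log.dirs=/var/lib/kafka/kraft-combined-logs-1
systemd can also score the unit's remaining exposure:
$ systemd-analyze security kafka.service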
Create a similar file for each remaining node. Remember to change the properties file name.
Enable the service and start it:
$ sudo systemctl enable kafka
$ sudo systemctl start kafka
$ sudo systemctl status kafka
If you change the systemd unit, reload the service daemon and restart the service; reformat the Kafka storage only when the storage configuration itself has changed (for example a new cluster ID or log.dirs):
$ bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server1.properties
$ sudo systemctl daemon-reload
$ sudo systemctl restart kafka
$ sudo systemctl status kafka
If Kafka fails to start after changes to the setup, check the error log. A message like the following:
Jul 21 16:50:41 broker02 kafka-server-start.sh[1975]: Could not rename log file '/usr/local/bin/kafka/kafka_2.13-3.3.2/bin/../logs/kafkaServer-gc.log' to '/usr/local/bin/kafka/kafka_2.13-3.3.2/bin/../logs/kafkaServer-gc.log.6' (Read-only file system).
means the service cannot write inside its installation directory, which the hardened unit's filesystem protections cause unless the logs directory is whitelisted with ReadWritePaths=. If the log instead complains about missing or mismatched meta.properties, format the kafka storage anew as set out above.
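When fault-finding, the service's stdout and stderr end up in the systemd journal, which is usually the quickest place to look:
$ journalctl -u kafka --since today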
Further reading:
Scala Introduction Page - https://docs.scala-lang.org/getting-started/index.html#using-the-scala-installer-recommended-way
Apache Kafka® Quick Start - https://developer.confluent.io/quickstart/kafka-local/
Kafka Command-Line Interface (CLI) Tools - https://docs.confluent.io/kafka/operations-tools/kafka-tools.html
Console Producer and Consumer Basics - https://developer.confluent.io/tutorials/kafka-console-consumer-producer-basics/kafka.html
Running Kafka in Production - https://docs.confluent.io/platform/current/kafka/deployment.html
Running Apache Kafka in Production (Podcast) - https://developer.confluent.io/podcast/running-apache-kafka-in-production/