eric | May 16, 2023, 9:09 a.m.
A basic Apache Kafka test setup with two servers using KRaft. The recommended setup for production is at least three brokers and three controllers. The procedure is the same as below; just add one more combined broker/controller node.
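With three nodes, the controller.quorum.voters line shown further down simply gains a third entry; a sketch, assuming a hypothetical third server at 10.0.0.24 (address and port are illustrative):
controller.quorum.voters=1@10.0.0.20:19092,2@10.0.0.22:19093,3@10.0.0.24:19094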
Download Kafka from the Kafka downloads page.
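If you prefer the command line, you can fetch and unpack a release directly; the URL below points at the Apache archive for the 3.3.2 build used throughout this post (adjust version and mirror to taste):
$ curl -O https://archive.apache.org/dist/kafka/3.3.2/kafka_2.13-3.3.2.tgz
$ tar -xzf kafka_2.13-3.3.2.tgz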
Move the files to somewhere where SELinux will allow you to run them:
$ sudo mv ~/Downloads/kafka /usr/local/bin/
First off, you need to create and edit the broker's properties file. The distribution ships a KRaft sample at config/kraft/server.properties; copy it and edit the copy:
$ cd /usr/local/bin/kafka/kafka_2.13-3.3.2
$ cp config/kraft/server.properties config/kraft/server1.properties
$ vim config/kraft/server1.properties
For this example, we are using two brokers on two different servers, with the following IPv4 addresses and ports:
- Broker 1: IP-address 10.0.0.20, listener on port 9092, controller on port 19092
- Broker 2: IP-address 10.0.0.22, listener on port 9093, controller on port 19093
The following must be specified in the properties file for Broker 1 (server1.properties):
node.id=1
controller.quorum.voters=1@10.0.0.20:19092,2@10.0.0.22:19093
listeners=PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:19092
advertised.listeners=PLAINTEXT://10.0.0.20:9092
The following must be specified in the properties file for Broker 2 (server2.properties):
node.id=2
controller.quorum.voters=1@10.0.0.20:19092,2@10.0.0.22:19093
listeners=PLAINTEXT://0.0.0.0:9093,CONTROLLER://0.0.0.0:19093
advertised.listeners=PLAINTEXT://10.0.0.22:9093
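Both files inherit the rest of their settings from the KRaft sample config, which already runs each node as a combined broker and controller. If you start from a different template, make sure these lines (present in the sample) are also set:
process.roles=broker,controller
controller.listener.names=CONTROLLER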
For this to work, make sure that your firewalls are open for the ports above. For Ubuntu and the like, on Broker 1:
$ sudo ufw allow 9092
$ sudo ufw allow 19092
and on Broker 2:
$ sudo ufw allow 9093
$ sudo ufw allow 19093
Add a user that will run Kafka on both brokers:
$ sudo adduser --system kafka-user
$ sudo usermod -a -G adm kafka-user
$ sudo chown -R kafka-user /usr/local/bin/kafka
$ cd /usr/local/bin/kafka/kafka_?.??-?.?.?
$ KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
$ bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server1.properties
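Note that the cluster ID must be identical on every node in the cluster: generate it once, then reuse the same value when formatting the storage on Broker 2 against its own properties file:
$ bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server2.properties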
In addition, kafka-user needs ownership of the Kafka data directory under /tmp (log.dirs in the properties file; the KRaft sample config defaults to /tmp/kraft-combined-logs). The format step above creates the directory, so it only needs a new owner:
$ sudo chown -R kafka-user /tmp/kraft-combined-logs
To enable automatic restart in case of failures or system restarts, you can use a process manager like systemd (assuming you are using a Linux-based operating system). Create a systemd service file for Kafka on each server. For example, create a file named kafka.service in the /etc/systemd/system/ directory.
$ sudo touch /etc/systemd/system/kafka.service
Add the following contents to the kafka.service file for Broker 1, modifying the paths and options according to your setup:
[Unit]
Description=Apache Kafka Server
Wants=network.target
After=network.target
[Service]
Type=simple
Restart=always
RestartSec=1
User=kafka-user
ExecStart=/usr/local/bin/kafka/kafka_2.13-3.3.2/bin/kafka-server-start.sh /usr/local/bin/kafka/kafka_2.13-3.3.2/config/kraft/server1.properties
ExecStop=/usr/local/bin/kafka/kafka_2.13-3.3.2/bin/kafka-server-stop.sh
[Install]
WantedBy=multi-user.target
Repeat the process for Broker 2.
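The only line that changes there is the properties file passed in ExecStart:
ExecStart=/usr/local/bin/kafka/kafka_2.13-3.3.2/bin/kafka-server-start.sh /usr/local/bin/kafka/kafka_2.13-3.3.2/config/kraft/server2.properties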
Reload systemd so it picks up the new unit, then enable the service and start it on each broker:
$ sudo systemctl daemon-reload
$ sudo systemctl enable kafka
$ sudo systemctl start kafka
$ sudo systemctl status kafka
That's it!
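To verify that the cluster is up, you can create a test topic replicated across both brokers and describe it (the topic name is just an example):
$ bin/kafka-topics.sh --create --topic test --bootstrap-server 10.0.0.20:9092 --replication-factor 2 --partitions 1
$ bin/kafka-topics.sh --describe --topic test --bootstrap-server 10.0.0.20:9092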
It is possible to harden the service significantly. The price is more complexity, including more complex fault-finding. The service file for Broker 1 could look like this:
[Unit]
Description=Apache Kafka Server
Wants=network.target
After=network.target
[Service]
Type=simple
User=kafka-user
Restart=always
RestartSec=1
ExecStart=/usr/local/bin/kafka/kafka_2.13-3.3.2/bin/kafka-server-start.sh /usr/local/bin/kafka/kafka_2.13-3.3.2/config/kraft/server1.properties
ExecStop=/usr/local/bin/kafka/kafka_2.13-3.3.2/bin/kafka-server-stop.sh
NoNewPrivileges=true
PrivateTmp=yes
RestrictNamespaces=uts ipc pid user cgroup
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
PrivateUsers=strict
CapabilityBoundingSet=CAP_NET_BIND_SERVICE CAP_DAC_READ_SEARCH
[Install]
WantedBy=multi-user.target
The hardening chiefly consists of privilege restrictions (User=, CapabilityBoundingSet=, NoNewPrivileges=), namespace and file system isolation (RestrictNamespaces=, PrivateUsers=, and a private temporary directory via PrivateTmp= that is inaccessible to other services), and protection of kernel tunables, kernel modules, and control groups. Note that PrivateTmp=yes gives the service its own empty /tmp, so if log.dirs still points into /tmp, move the data directory elsewhere (for example under /var/lib) before enabling this option.
Create a similar file for Broker 2. Remember to change the properties file name.
Enable the service and start it:
$ sudo systemctl enable kafka
$ sudo systemctl start kafka
$ sudo systemctl status kafka
If you change the kafka.service file, you need to reload the systemd daemon and restart the service; if the change also touches the Kafka storage setup, reformat the kafka storage first (use server2.properties on Broker 2):
$ bin/kafka-storage.sh format -t $KAFKA_CLUSTER_ID -c config/kraft/server1.properties
$ sudo systemctl daemon-reload
$ sudo systemctl restart kafka
$ sudo systemctl status kafka
If you forget to format the storage after changes to the kafka setup, you tend to get the following message in the error log:
Jul 21 16:50:41 broker02 kafka-server-start.sh[1975]: Could not rename log file '/usr/local/bin/kafka/kafka_2.13-3.3.2/bin/../logs/kafkaServer-gc.log' to '/usr/local/bin/kafka/kafka_2.13-3.3.2/bin/../logs/kafkaServer-gc.log.6' (Read-only file system).
Format the kafka storage anew as set out above.
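When troubleshooting, the systemd journal is usually the quickest place to look (assuming the service setup above):
$ sudo journalctl -u kafka -e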
Apache Kafka® Quick Start - https://developer.confluent.io/quickstart/kafka-local/
Kafka Command-Line Interface (CLI) Tools - https://docs.confluent.io/kafka/operations-tools/kafka-tools.html
Console Producer and Consumer Basics - https://developer.confluent.io/tutorials/kafka-console-consumer-producer-basics/kafka.html
Running Kafka in Production - https://docs.confluent.io/platform/current/kafka/deployment.html
Running Apache Kafka in Production (podcast) - https://developer.confluent.io/podcast/running-apache-kafka-in-production/