Tag: kafka

Unable to Use JMX Remotely for Kafka Stats

I noticed today that our Kafka Manager interface only shows details from one server: the one where we run Kafka Manager. We’ve done everything that should be needed to get this working. The port shows as open with nmap, and the command that launches Kafka includes all of the JMX settings. I’ve even tried setting the JMX hostname, but still just one server reports data.

Then I happened across an article online that detailed how JMX actually uses three ports — the configured port 9999 and two other randomly selected and non-configurable ports. I used netstat to list all of the ports in use by the Java PID running my Kafka server and, voila, there were two odd-ball high ports (30000’s and 40000’s). I added those additional ports to the firewall rules and … I’ve got data for all of the Kafka servers!

This is obviously a short-term solution, as the two randomly selected ports will be different when I restart the service next time. I’d prefer to leave the firewall in place (i.e., not just open all ports >1024 between the Kafka Manager host and all of the Kafka servers), so I might put together a script to identify the “oddball” ports associated with the Java PID and add them to transient firewalld rules. But the last server restart was back in 2021 … so I might just manually add them after the upgrade next week and worry about something ‘better’ next year!
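A minimal sketch of that script idea, in Python. It assumes the text printed by netstat -tlnp (from net-tools) is available as a string; the function names, the known_ports parameter, and the sample PID and ports are illustrative and not taken from any existing tool. It only builds the firewall-cmd lines rather than running them. Without --permanent, --add-port creates a runtime-only rule, which fits the "transient" goal, but it opens the port to the whole zone, so limiting access to the Kafka Manager host would need a rich rule or a dedicated zone instead.

```python
def listening_ports_for_pid(netstat_output, pid):
    """Return the sorted TCP ports that `pid` is listening on.

    Expects text in the layout printed by `netstat -tlnp`:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
    """
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP listener.
        if len(fields) < 7 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue
        # The last column looks like "1234/java"; compare only the PID part.
        if fields[6].split("/", 1)[0] != str(pid):
            continue
        # The port follows the final colon, for both 0.0.0.0:9092 and :::9092.
        ports.add(int(fields[3].rsplit(":", 1)[1]))
    return sorted(ports)


def firewalld_commands(ports, known_ports=frozenset({9092, 9999})):
    """Build runtime-only firewall-cmd lines for ports not already allowed."""
    return [f"firewall-cmd --add-port={port}/tcp"
            for port in ports if port not in known_ports]


if __name__ == "__main__":
    # Made-up sample output; on a real broker this text would come from netstat.
    sample = (
        "tcp6  0  0 :::9092   :::*  LISTEN  1234/java\n"
        "tcp6  0  0 :::9999   :::*  LISTEN  1234/java\n"
        "tcp6  0  0 :::34567  :::*  LISTEN  1234/java\n"
        "tcp6  0  0 :::41234  :::*  LISTEN  1234/java\n"
    )
    for command in firewalld_commands(listening_ports_for_pid(sample, 1234)):
        print(command)
```

The known_ports default holds the Kafka listener (9092) and the configured JMX port (9999) from this setup, so only the two random ports are reported.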

Upgrading Kafka from 2.5.0 to 3.2.3

Bidirectional compatibility between clients and brokers was introduced in 2017, which means my old experience, where you needed to upgrade the brokers first and then the clients, no longer applies. Rejoice!

Sandbox Setup

Two CentOS docker containers were provisioned as follows:

docker run -dit --name=kafka1 -p 9092:9092 centos:latest
docker run -dit --name=kafka2 -p 9093:9092 -p 9000:9000 centos:latest

# Shell into each container and do the following:

sed -i -e "s|mirrorlist=|#mirrorlist=|g" /etc/yum.repos.d/CentOS-*
sed -i -e "s|#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g" /etc/yum.repos.d/CentOS-*

# Get IPs and hostnames into /etc/hosts

172.17.0.2 40c2222cfea0
172.17.0.3 2923addbcb6d

# Update installed packages & install required tools

dnf update
yum install -y passwd vim net-tools wget git unzip
# Add a kafka user, make a kafka folder, and give the kafka user ownership of the kafka folder
useradd kafka
passwd kafka
usermod -aG wheel kafka

mkdir /kafka

chown kafka:kafka /kafka

# Install Kafka

su - kafka
cd /kafka
wget https://archive.apache.org/dist/kafka/2.5.0/kafka_2.12-2.5.0.tgz
tar vxzf kafka_2.12-2.5.0.tgz
rm kafka_2.12-2.5.0.tgz
ln -s /kafka/kafka_2.12-2.5.0 /kafka/kafka

# Configure zookeeper

vi /kafka/kafka/config/zookeeper.properties
dataDir=/kafka/zookeeperdata
server.1=172.17.0.2:2888:3888

# Start Zookeeper on the first server

screen -S zookeeper
/kafka/kafka/bin/zookeeper-server-start.sh /kafka/kafka/config/zookeeper.properties

# Configure the cluster

vi /kafka/kafka/config/server.properties

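# broker.id must be a unique number per cluster node (properties files do not support trailing comments, so keep this on its own line)
broker.id=1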
listeners=PLAINTEXT://:9092
zookeeper.connect=172.17.0.2:2181

# Start Kafka

screen -S kafka
/kafka/kafka/bin/kafka-server-start.sh /kafka/kafka/config/server.properties

# Edit producer.properties on a server

vi /kafka/kafka/config/producer.properties
bootstrap.servers=172.17.0.2:9092,172.17.0.3:9092

# Create test topic

/kafka/kafka/bin/kafka-topics.sh --create --zookeeper 172.17.0.2:2181 --replication-factor 2 --partitions 1 --topic ljrTest

# Post messages to the topic

/kafka/kafka/bin/kafka-console-producer.sh --broker-list 172.17.0.2:9092 --producer.config /kafka/kafka/config/producer.properties --topic ljrTest

# Retrieve messages from topic

/kafka/kafka/bin/kafka-console-consumer.sh --bootstrap-server 172.17.0.2:9092 --topic ljrTest --from-beginning
/kafka/kafka/bin/kafka-console-consumer.sh --bootstrap-server 172.17.0.3:9092 --topic ljrTest --from-beginning

Voila, a functional Kafka sandbox cluster.

Now we’ll install the cluster manager

cd /kafka
git clone --depth 1 --branch 3.0.0.6 https://github.com/yahoo/CMAK.git
cd CMAK
vi conf/application.conf
cmak.zkhosts="40c2222cfea0:2181"

# CMAK requires java > 1.8 … so getting 11 set up
cd /usr/lib/jvm
wget https://cdn.azul.com/zulu/bin/zulu11.58.23-ca-jdk11.0.16.1-linux_x64.zip
unzip zulu11.58.23-ca-jdk11.0.16.1-linux_x64.zip
mv zulu11.58.23-ca-jdk11.0.16.1-linux_x64 zulu-11
PATH=/usr/lib/jvm/zulu-11/bin:$PATH

./sbt -java-home /usr/lib/jvm/zulu-11 clean dist

cp /kafka/CMAK/target/universal/cmak-3.0.0.6.zip /kafka

cd /kafka
unzip cmak-3.0.0.6.zip
cd cmak-3.0.0.6
screen -S CMAK
bin/cmak -java-home /usr/lib/jvm/zulu-11 -Dconfig.file=/kafka/cmak-3.0.0.6/conf/application.conf -Dhttp.port=9000

Access it at http://cmak_host:9000

Sandbox Upgrade Process

# Back up the Kafka installation (excluding log files)

tar cvfzp /kafka/kafka-2.5.0.tar.gz --exclude logs /kafka/kafka_2.12-2.5.0

# Get newest Kafka version installed
# From another host where you can download the file, transfer it to the kafka server

scp kafka_2.12-3.2.3.tgz list@kafka1:/tmp/

# Back on the Kafka server, move the tgz file into the /kafka directory

mv /tmp/kafka_2.12-3.2.3.tgz /kafka/

# Verify Kafka data is stored outside of the install directory:

[kafka@40c2222cfea0 config]$ grep log.dir server.properties
log.dirs=/tmp/kafka-logs

# Verify zookeeper data is stored outside of the install directory:

[kafka@40c2222cfea0 config]$ grep dataDir zookeeper.properties
dataDir=/kafka/zookeeperdata
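The two grep checks above can be sketched as a small Python helper that flags any configured data directory sitting under the install folder. This is an illustration only: read_properties and dirs_inside_install are hypothetical names, the parser handles just simple key=value lines, and paths are compared literally, so a directory reached through the /kafka/kafka symlink would need to be checked against that path as well.

```python
from pathlib import PurePosixPath


def read_properties(text):
    """Parse simple key=value lines, ignoring blanks and comment lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def dirs_inside_install(props_text, key, install_dir):
    """Return the directories listed under `key` that live inside install_dir.

    Anything returned here would be left behind in the old install folder
    once the symlink points at the new version.
    """
    install = PurePosixPath(install_dir)
    offenders = []
    for entry in read_properties(props_text).get(key, "").split(","):
        entry = entry.strip()
        if not entry:
            continue
        path = PurePosixPath(entry)
        if path == install or install in path.parents:
            offenders.append(entry)
    return offenders


if __name__ == "__main__":
    # Values taken from the sandbox configuration shown above.
    print(dirs_inside_install("log.dirs=/tmp/kafka-logs", "log.dirs",
                              "/kafka/kafka_2.12-2.5.0"))
    print(dirs_inside_install("dataDir=/kafka/zookeeperdata", "dataDir",
                              "/kafka/kafka_2.12-2.5.0"))
```

An empty list for both files means the data survives the switch to the new folder.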

# Get the new version of Kafka – start with the zookeeper(s) then do the other nodes

cd /kafka
wget https://archive.apache.org/dist/kafka/3.2.3/kafka_2.12-3.2.3.tgz
tar vxfz /kafka/kafka_2.12-3.2.3.tgz

# Copy config from old iteration to new

cp /kafka/kafka_2.12-2.5.0/config/* /kafka/kafka_2.12-3.2.3/config/

# Edit server.properties and add a configuration line to force the inter-broker protocol version to the currently running Kafka version
# This ensures your cluster is using the “old” version to communicate and you can, if needed, revert to the previous version

vi /kafka/kafka_2.12-3.2.3/config/server.properties
inter.broker.protocol.version=2.5.0

# Restart each Kafka server – waiting until it has come online before restarting the next one – with the new binaries
# Stop kafka

systemctl stop kafka

# Move symlink to new folder

unlink /kafka/kafka
ln -s /kafka/kafka_2.12-3.2.3 /kafka/kafka

# start kafka

systemctl start kafka

# Or, to watch it run,

/kafka/kafka/bin/kafka-server-start.sh /kafka/kafka/config/server.properties
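For the "wait until it has come online" part of the rolling restart, a rough Python sketch is below. It only checks that the broker's listener accepts TCP connections, which is a weaker signal than the broker having rejoined the cluster and caught up on its partitions, so treat it as a first gate rather than proof. The function name and the timeout values are assumptions.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=2.0):
    """Poll until host:port accepts a TCP connection or `timeout` seconds pass.

    Returns True as soon as a connection succeeds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Connect and immediately close; we only care that it is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example, using the first sandbox broker's address:
# if wait_for_port("172.17.0.2", 9092):
#     print("listener is up, safe to look at the next broker")
```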

# Finally, ensure you’ve still got ‘stuff’

/kafka/kafka/bin/kafka-console-consumer.sh --bootstrap-server 172.17.0.3:9092 --topic ljrTest --from-beginning

# And verify the version has updated

[kafka@40c2222cfea0 bin]$ ./kafka-topics.sh --version
3.2.3 (Commit:50029d3ed8ba576f)
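When there are more than a couple of brokers, the version check can be scripted. The sketch below parses the line printed by kafka-topics.sh --version and lists hosts that are not on the expected release. The helper names are hypothetical, and how the output is collected from each host (ssh, a config management tool, etc.) is left out. Note that this reports the version of the binaries the script was run from, which is what the symlink swap changes.

```python
import re

# Matches output such as "3.2.3 (Commit:50029d3ed8ba576f)".
VERSION_LINE = re.compile(r"\s*(\d+(?:\.\d+)+)\s+\(Commit:([0-9a-f]+)\)")


def parse_kafka_version(output):
    """Return (version, commit) from a `kafka-topics.sh --version` line."""
    match = VERSION_LINE.match(output)
    if match is None:
        raise ValueError(f"unrecognized version output: {output!r}")
    return match.group(1), match.group(2)


def hosts_not_on(expected_version, output_by_host):
    """Return the sorted hosts whose reported version differs from expected."""
    return sorted(host for host, output in output_by_host.items()
                  if parse_kafka_version(output)[0] != expected_version)


if __name__ == "__main__":
    version, commit = parse_kafka_version("3.2.3 (Commit:50029d3ed8ba576f)")
    print(version, commit)
```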

# Until this point, we can just roll back to the old folder & revert to the previous version of Kafka. That’s our backout plan.

# Once everything has been confirmed to be working, bump the inter-broker protocol version to the new version & restart Kafka (again, one broker at a time)

vi /kafka/kafka/config/server.properties
inter.broker.protocol.version=3.2

Kafka Troubleshooting (for those who enjoy reading network traces)

I finally had a revelation that allowed me to definitively prove that I am not doing anything strange that is causing duplicated messages to appear in the Kafka stream: the traffic isn’t encrypted! That means you can use Wireshark, tcpdump, etc. to capture everything that goes over the wire. The capture shows that the GUID I generated for the duplicated message only appears one time in the network trace. Whatever funky stuff is going on that makes the client see it twice? Not me 😊

I used tcpdump because the batch server doesn’t have tshark (and it’s not my server, so I’m not going to go requesting additional binaries if there’s something sufficient for my need already available). Ran tcpdump -w /srv/data/ljr.cap port 9092 to grab everything that transits port 9092 while my script executed. Once the batch completed, I stopped tcpdump and transferred the file over to my workstation to view the capture in Wireshark. Searched the packet bytes for my duplicated GUID … and there’s only one.
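If Wireshark isn't handy, the same "how many times does my GUID appear" check can be roughed out in Python by searching the raw capture bytes. This is a sketch with a hypothetical function name, and it has real limits: it reads the whole file into memory, it will miss a GUID that is split across two TCP segments or sits inside a compressed batch, and a TCP retransmission would be counted as an extra hit.

```python
from pathlib import Path


def count_in_capture(capture_path, text):
    """Count non-overlapping occurrences of `text` (UTF-8) in a capture file."""
    needle = text.encode("utf-8")
    if not needle:
        raise ValueError("search text must not be empty")
    return Path(capture_path).read_bytes().count(needle)


# Example, using the capture file from above and a placeholder GUID:
# print(count_in_capture("/srv/data/ljr.cap", "<the duplicated GUID>"))
```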

Confluent Kafka Queue Length

The documentation for the Python Confluent Kafka module includes a len function on the producer. I wanted to use the function because we’re getting a number of duplicated messages on the client, and I was trying to isolate what might be causing the problem. Unfortunately, calling producer.len() failed indicating there’s no len() method. I used dir(producer) to show that, no, there isn’t a len() method.

I realized today that the documentation is telling me that I can call the built-in len() function on a producer to get the queue length.

Code:

print(f"Before produce there are {len(producer)} messages awaiting delivery")
producer.produce(topic, key=bytes(str(int(cs.timestamp)), 'utf8'), value=cs.SerializeToString())
print(f"After produce there are {len(producer)} messages awaiting delivery")
producer.poll(0) # Per https://github.com/confluentinc/confluent-kafka-python/issues/16 for queue full error
print(f"After poll0 there are {len(producer)} messages awaiting delivery")

Output:

Before produce there are 160 messages awaiting delivery
After produce there are 161 messages awaiting delivery
After poll0 there are 155 messages awaiting delivery
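The reason dir(producer) showed no len() is that Python's built-in len() calls the object's __len__ special method, so that is the name that appears in dir(). The toy class below illustrates the mechanism only; ToyProducer is made up for this example and is not the confluent_kafka Producer.

```python
class ToyProducer:
    """Stand-in with a queue, to show how len() finds __len__."""

    def __init__(self):
        self._queue = []

    def produce(self, value):
        # Queue the message; nothing is actually sent anywhere.
        self._queue.append(value)

    def __len__(self):
        # This is what the built-in len(obj) calls under the hood.
        return len(self._queue)


if __name__ == "__main__":
    producer = ToyProducer()
    producer.produce(b"hello")
    print(len(producer))               # built-in len() works
    print(hasattr(producer, "len"))    # there is no .len() method
    print("__len__" in dir(producer))  # but __len__ is listed
```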

Troubleshooting Kafka

Our server metrics are fed into a Kafka bus, and various applications are able to pick up and process this data. The problem, however, is that not everything I’m sending ends up in the downstream system. The confluent_kafka module I’m using in Python reports that data is sent along its merry way, but the primary system that is used to present metrics to end users says they’re not consistently getting data across the channel. It’s not a total absence, as though something were outright wrong, but long periods of time where there’s no data followed by a cycle where data shows up.

I’ve exhausted all of the in-script debugging I can, and the messages are getting there. But I wondered if the async nature of Kafka might mean that the client’s “it got there” wouldn’t actually mean something arrived. So I had to figure out how to test a Kafka server the same way I test my MQTT server: how do I use a quick command line program to send a message, and how do I use a quick command line program to subscribe to various topics?

Turns out this is easier than anticipated: the binary build of Kafka includes Windows batch files. Download the latest Kafka binary. Untar/unzip it somewhere. This is easy if you have the Win32 port of the GNU utilities and can just run “tar vxfz kafka_2.13-2.8.0.tgz”.

In the .\kafka<version>\bin\windows folder, there are kafka-console-consumer.bat and kafka-console-producer.bat files that can be used for testing Kafka. You can open two command prompts — one for the producer (sending data to Kafka) and one for the consumer (watching Kafka for new messages). In the consumer window, run

kafka-console-consumer.bat --bootstrap-server yourkafkaserver.example.com:Port --topic Test

Then, in the producer, run

kafka-console-producer.bat --broker-list yourkafkaserver.example.com:Port --topic Test

The producer will bring you to a “>” prompt where you can type some strings and hit enter to send the message to Kafka. You should see the messages pop into the consumer window.

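To subscribe to multiple topics, replace “--topic” with “--whitelist” followed by a pipe-delimited list of topics. Put the list in double quotes so the command prompt doesn’t treat the pipes as command separators.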
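The pipe-delimited list works because the whitelist value is treated as a regular expression, with the pipe acting as "or". As far as I understand it, a topic is selected only when its whole name matches the pattern. The Python sketch below approximates that with re.fullmatch; Kafka itself uses Java regular expressions, so more exotic patterns may behave differently, and the topic names here are made up.

```python
import re


def topics_matching(whitelist, topics):
    """Return the topics whose full name matches the whitelist pattern."""
    pattern = re.compile(whitelist)
    return [topic for topic in topics if pattern.fullmatch(topic)]


if __name__ == "__main__":
    available = ["Test", "Metrics", "TestOther", "Other"]
    print(topics_matching("Test|Metrics", available))
```

Under that assumption, "Test|Metrics" selects Test and Metrics but not TestOther, while a pattern like "Test.*" would pick up both Test and TestOther.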