
Zookeeper: Finding the Leader

When restarting our ensemble of zookeepers, I restart the leader last (to avoid repeatedly reallocating the role), which means I’ve got to find the leader first. Luckily, each zookeeper will happily report whether it is the leader or a follower if you send the four-letter command ‘srvr’ to its client port.

jumpserver:~ # echo srvr | nc zcserver38.example.net 2181
Zookeeper version: 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:53 GMT
Latency min/avg/max: 0/0/1383
Received: 3783871
Sent: 3784761
Connections: 7
Outstanding: 0
Zxid: 0x800003d25
Mode: follower
Node count: 3715

Looking at the “Mode” line above, I can see that one is a follower. So I’ll check the next Zookeeper …

jumpserver:~ # echo srvr | nc zcserver39.example.net 2181
Zookeeper version: 3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:53 GMT
Latency min/avg/max: 0/0/1167
Received: 836866
Sent: 848235
Connections: 1
Outstanding: 0
Zxid: 0x800003d25
Mode: leader
Node count: 3715
Proposal sizes last/min/max: 36/32/19782

And that’s the leader — so 39 will be the last one rebooted.
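If there are more than a couple of zookeepers to poll, a quick script saves some typing. Here is a minimal Python sketch of the same check; the hostnames below are just the two servers from the example, so substitute your own ensemble members and client port:

import socket

# Placeholder list; substitute your own ensemble members.
ENSEMBLE = ["zcserver38.example.net", "zcserver39.example.net"]

def zk_mode(host, port=2181, timeout=5):
    """Send the four-letter 'srvr' command and return the reported Mode."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr\n")
        data = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    for line in data.decode(errors="replace").splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return "unknown"

for host in ENSEMBLE:
    print(f"{host}: {zk_mode(host)}")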

Kafka Troubleshooting (for those who enjoy reading network traces)

I finally had a revelation that let me definitively prove I’m not doing anything strange that causes duplicated messages to appear in the Kafka stream: unless you’ve layered TLS on top, the traffic crosses the wire in plain text. That means you can use Wireshark, tcpdump, etc. to capture everything that goes over the wire, and the capture shows that the GUID I generated for the duplicated message appears only once in the network trace. Whatever funky stuff is going on that makes the client see it twice? Not me 😊

I used tcpdump because the batch server doesn’t have tshark (and it’s not my server, so I’m not going to request additional binaries when something sufficient for my need is already available). I ran tcpdump -w /srv/data/ljr.cap port 9092 to grab everything transiting port 9092 while my script executed. Once the batch completed, I stopped tcpdump and transferred the file to my workstation to view the capture in Wireshark. I searched the packet bytes for my duplicated GUID … and there’s only one.
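If all you need is a count of how many times the value crossed the wire, you don’t even need Wireshark; a few lines of Python scanning the raw capture bytes will do. This is just a sketch: the GUID below is a placeholder, and it assumes the payload is uncompressed and not split across packets.

# Count raw occurrences of a value in the packet capture without decoding it.
# Placeholder GUID; substitute the one generated for the duplicated message.
guid = b"00000000-0000-0000-0000-000000000000"

with open("/srv/data/ljr.cap", "rb") as capture:
    data = capture.read()

print(f"GUID appears {data.count(guid)} time(s) in the capture")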

Confluent Kafka Queue Length

The documentation for the Python Confluent Kafka module lists a len function on the producer. I wanted to use it because we’re getting a number of duplicated messages on the client, and I was trying to isolate what might be causing the problem. Unfortunately, calling producer.len() failed, indicating there’s no len() method, and dir(producer) confirmed that, no, there isn’t a len() method.

I realized today that the documentation is telling me that I can call the built-in len() function on a producer to get the queue length.

Code:

print(f"Before produce there are {len(producer)} messages awaiting delivery")
producer.produce(topic, key=bytes(str(int(cs.timestamp)), 'utf8'), value=cs.SerializeToString())
print(f"After produce there are {len(producer)} messages awaiting delivery")
producer.poll(0)  # Per https://github.com/confluentinc/confluent-kafka-python/issues/16 for queue full error
print(f"After poll0 there are {len(producer)} messages awaiting delivery")

Output:

Before produce there are 160 messages awaiting delivery
After produce there are 161 messages awaiting delivery
After poll0 there are 155 messages awaiting delivery
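Since len(producer) only reports how many messages are still sitting in the local queue, a delivery report callback is a more direct way to see what the broker actually acknowledged (and to rule the producer in or out when chasing duplicates). A minimal sketch, assuming a placeholder broker and topic:

from confluent_kafka import Producer

# Placeholder broker and topic; substitute your own.
producer = Producer({"bootstrap.servers": "yourkafkaserver.example.com:9092"})

def delivery_report(err, msg):
    """Called once per message when the broker acknowledges (or rejects) it."""
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] at offset {msg.offset()}")

producer.produce("Test", key=b"12345", value=b"hello", callback=delivery_report)
producer.poll(0)   # serve delivery callbacks for already-acknowledged messages
print(f"{len(producer)} messages awaiting delivery")
producer.flush()   # block until everything queued is delivered (or fails)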

Troubleshooting Kafka

Our server metrics are fed into a Kafka bus, and various applications are able to pick up and process this data. The problem, however, is that not everything I’m sending ends up in the downstream system. The confluent_kafka module I’m using in Python reports that data is sent along its merry way, but the primary system used to present metrics to end users says it isn’t consistently getting data across the channel. It’s not that nothing ever arrives, which would suggest something outright wrong; instead there are long periods with no data, followed by a cycle where data shows up.

I’ve exhausted all of the in-script debugging I can do; as far as the client is concerned, the messages are getting there. But I wondered if the asynchronous nature of Kafka might mean the client’s “it got there” doesn’t actually mean anything arrived at the broker. So I had to figure out how to test a Kafka server the same way I test my MQTT server: how do I use a quick command line program to send a message, and how do I use a quick command line program to subscribe to various topics?

Turns out this is easier than anticipated: the binary build of Kafka includes Windows batch files. Download the latest Kafka binary and untar/unzip it somewhere, which is easy if you have the Win32 port of the GNU utilities and can just run “tar vxfz kafka_2.13-2.8.0.tgz”.

In the .\kafka_<version>\bin\windows folder, there are kafka-console-consumer.bat and kafka-console-producer.bat scripts that can be used for testing Kafka. Open two command prompts: one for the producer (sending data to Kafka) and one for the consumer (watching Kafka for new messages). In the consumer window, run

kafka-console-consumer.bat --bootstrap-server yourkafkaserver.example.com:Port --topic Test

Then, in the producer, run

kafka-console-producer.bat --broker-list yourkafkaserver.example.com:Port --topic Test

The producer will bring you to a “>” prompt where you can type some strings and hit enter to send the message to Kafka. You should see the messages pop into the consumer window.

To subscribe to multiple topics, use “--whitelist” followed by a pipe-delimited list of topics.
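For example (the topic names here are just placeholders), to watch two topics at once:

kafka-console-consumer.bat --bootstrap-server yourkafkaserver.example.com:Port --whitelist "Topic1|Topic2"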