Category: Java

Kafka Streams, Consumer Groups, and Stickiness

The Java application I recently inherited had a lot of … quirks. One of the strangest was that it calculated throughput statistics based on ‘start’ values in a cache that was only refreshed every four hours. So at a minute past the data refresh, the throughput is averaged out over that minute. At three hours and fifty nine minutes past the data refresh, the throughput is averaged out over three hours and fifty nine minutes. In the process of correcting this (reading directly from the cached data rather than using an in-memory copy of the cached data), I noticed that the running application paused a lot as the Kafka group was re-balanced.

Which is especially odd because I’ve got a stable number of clients in each consumer group. But pods restart occasionally, and there was nothing done to attempt to stabilize partition assignment.

Which was odd because Kafka has had mechanisms to reduce re-balancing — StickyAssignor added in 0.11

        // Set the partition assignment strategy to StickyAssignor
        config.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, "org.apache.kafka.clients.consumer.StickyAssignor");

And groupInstanceId in 2.3.0

        // Set the group instance ID
        String groupInstanceId = UUID.randomUUID().toString();
        config.put(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG, groupInstanceId);

Now, I’m certain that a UUID isn’t the best way to go about crafting your group instance ID name … but it produces a “name” that isn’t likely to be duplicated. Since deploying this change, I went from seeing three or four re-balance operations an hour to zero.

JPA/Hibernate Naming Strategies

One of the challenges of inheriting support of systems and code is reverse engineering what exactly you’ve got. In this case, I have Java code that reads from a Postgresql table named calculation_config & populates the information into a Redis cache. Except I could not find any text containing the string calculation_config. Started to wonder if grep was getting thrown off by line splits (although splitting a line in the middle of a table name is asking for future confusion), so was searching for sub-strings.

Which got me to the code that performs the operation — but the table is absolutely named calculationConfig in the code. ?????

package com.example.applicationmodel;
import lombok.Data;

import jakarta.persistence.*;

@Entity // This tells Hibernate to make a table out of this class
@Data // Lombok: adds getters and setters
@Table(name = "calculationConfig", schema = "components")
public class CalculationInfo {
    @Id
    private int functionId;
    private String dataCollectionGroup;
    private String component;
    private String metricInputs;
    private String metricName;
    private String functionDef;
    private String resourceType;
    private String metricDatatype;
    private String deviceModel;
    private String collectionSystem;
    private int status;
}

And today, I’ve learned about “naming strategies”. A mechanism used by the Hibernate ORM (Object-Relational Mapping) framework to map entities within Java code to table and column names. Other than obfuscation, why are we applying middleware principals to code?? Ostensibly because database naming “best practices” and code naming “best practices” vary. As an aside, I was taught the best naming best practice was one someone was likely to figure out with minimal confusion or research. Explicitly indicating the naming strategy might fit that requirement — ohh, here’s some strange name mapping thing in my code. Let me see what that means.

By default, Hibernate uses ImplicitNamingStrategy and PhysicalNamingStrategy to map Java names to database names. The default PhysicalNamingStrategyStandardImpl converts camelCase to snake_case.

So, for future reference … when I find table_name or field_name in my database, I should be grepping for tableName and fieldName in the code. That is … not super obvious.

Manually Running a JAR File

The java code I now maintain is normally executed through a k8s cluster — this means just testing a quick change requires running the entire deployment pipeline. Sometimes, though, I really just want to test something quickly. In such instances, you can manually run a jar file using “java -jar my_file.jar” —