Understanding session.timeout.ms in Kafka (through examples)

Updated: January 30, 2024 By: Guest Contributor

Introduction

Apache Kafka has become a pivotal piece of modern data-driven application architectures, enabling high-throughput, fault-tolerant messaging and stream processing. At the core of Kafka’s reliability are its distributed nature and the various configuration settings available to developers and administrators. One such configuration is session.timeout.ms. In this tutorial, we’re going to delve deep into understanding the session.timeout.ms setting in Apache Kafka, demonstrating its usage and impact on consumer group behavior through a series of examples.

What is session.timeout.ms?

The session.timeout.ms setting determines how long a Kafka broker will wait without receiving a heartbeat from a consumer before considering it dead and reassigning its partitions to other consumers in the group. The default is 10 seconds (10000 ms) in Kafka versions before 3.0 and 45 seconds (45000 ms) from 3.0 onward. This setting helps Kafka maintain stable consumer groups and manage partition rebalancing in the face of network issues or consumer process failures.

Understanding Consumer Groups

Before diving into session timeout nuances, it’s important to understand Kafka consumer groups. A consumer group contains one or more consumers that subscribe to a common set of topics and divide the partitions of those topics among themselves. Each partition is consumed by at most one consumer in the group, which effectively load-balances the data processing. If a consumer fails or is deemed inactive (as controlled by session.timeout.ms), Kafka triggers a rebalance, redistributing its partitions among the remaining active consumers in the group.
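To make the redistribution concrete, here is an illustrative, self-contained simulation of round-robin partition assignment. This is not Kafka's actual assignor (Kafka's built-in strategies such as range and cooperative-sticky are more sophisticated); the class and member names are hypothetical, and the sketch only shows how partitions move when a member drops out of the group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RebalanceSketch {
    // Hypothetical round-robin assignment: partition p goes to member (p mod groupSize).
    // Kafka's real assignors are pluggable and more nuanced; this only illustrates the idea.
    static Map<String, List<Integer>> assign(List<String> members, int partitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String m : members) assignment.put(m, new ArrayList<>());
        for (int p = 0; p < partitions; p++) {
            assignment.get(members.get(p % members.size())).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> members = new ArrayList<>(List.of("consumer-A", "consumer-B", "consumer-C"));
        // With three live members, six partitions are spread two per consumer.
        System.out.println(assign(members, 6));

        // consumer-B misses heartbeats for longer than session.timeout.ms and is dropped;
        // a rebalance reassigns all six partitions across the two survivors.
        members.remove("consumer-B");
        System.out.println(assign(members, 6));
    }
}
```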

Basic Example: Setting session.timeout.ms

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092,broker2:9092");
props.put("group.id", "my-consumer-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("session.timeout.ms", "15000"); // <-- setting the session timeout

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("my-topic"));

In this basic example, we’ve created a consumer with a session timeout of 15 seconds. If the broker doesn’t receive a heartbeat from this consumer within that interval, the consumer is considered lost and a partition rebalance is triggered.

Heartbeats and session.timeout.ms

Kafka consumers send heartbeats to the group coordinator to signal that they are alive (since Kafka 0.10.1, from a dedicated background thread). However, simply configuring session.timeout.ms isn’t enough. The heartbeat frequency is controlled by another parameter, heartbeat.interval.ms. To avoid unnecessary rebalances, the heartbeat interval should be significantly lower than session.timeout.ms; the Kafka documentation recommends no more than one-third of the session timeout.

Intermediate Example: Managing Heartbeats

Properties props = new Properties();
// ...other properties as above...
props.put("session.timeout.ms", "30000");
props.put("heartbeat.interval.ms", "10000"); // <-- one-third of the session timeout, as recommended

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
// ...consumer setup and usage...

With the above configuration, the consumer sends a heartbeat every 10 seconds, which should prevent session timeouts and unwanted rebalances during minor hiccups.

Advanced Example: Implementing a Robust Consumer

A robust Kafka consumer also manages its polling loop carefully. Since heartbeats are sent from a background thread, a slow polling loop no longer causes session timeouts directly; instead, the time between calls to poll() is bounded by a separate setting, max.poll.interval.ms, and exceeding it also triggers a rebalance.

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
// ...properties as configured above, including session timeout and heartbeat...

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("my-topic"));
try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
        for (ConsumerRecord<String, String> record : records) {
            processRecord(record); // Placeholder for record processing
        }
        consumer.commitAsync();
    }
} finally {
    consumer.close();
}

This piece of code shows a consumer that polls for messages every second. Notably, the total time spent processing a batch between successive poll() calls must stay below max.poll.interval.ms (not session.timeout.ms, which only governs heartbeats) to avoid unwanted rebalances.

Possible Pitfalls and Gotchas

Misconfiguring session.timeout.ms can lead to frequent, unnecessary rebalances, which degrade system throughput and latency. Ensure that session.timeout.ms is not too low, as minor network or processing delays could then trigger rebalances; conversely, setting it too high delays partition reassignment when a consumer genuinely fails.
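As a rough starting point, the trade-off above can be captured in a small config sketch. The values here are illustrative choices (and the helper class name is made up), not universal recommendations; the right numbers depend on your broker settings and workload.

```java
import java.util.Properties;

public class TimeoutConfigSketch {
    // Illustrative consumer timeout settings balancing the two failure modes
    // discussed above; tune these for your own environment.
    public static Properties consumerTimeouts() {
        Properties props = new Properties();
        // Too low (e.g. 3000 ms) risks rebalances on brief GC pauses or network blips;
        // too high (e.g. 300000 ms) delays recovery from genuine consumer crashes.
        props.setProperty("session.timeout.ms", "30000");
        // One-third of the session timeout, so a couple of heartbeats can be
        // missed before the broker declares the consumer dead.
        props.setProperty("heartbeat.interval.ms", "10000");
        return props;
    }
}
```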

Example in the Event of Failure

Let’s understand the impact of session.timeout.ms during a simulated consumer failure:

Properties props = new Properties();
// ...other properties...

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("my-topic"));
try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
        for (ConsumerRecord<String, String> record : records) {
            processRecord(record);
        }
        simulateFailure(); // This causes the consumer to stop polling
        consumer.commitAsync();
    }
} catch (WakeupException e) {
    // Expected when consumer.wakeup() is called during shutdown
} finally {
    consumer.close();
}

If simulateFailure() kills the process or otherwise stops heartbeats, the broker detects the failure once session.timeout.ms elapses; if it merely blocks the polling loop, the consumer is removed from the group once max.poll.interval.ms is exceeded. Either way, a rebalance of its partitions follows.
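The broker-side bookkeeping behind this detection can be sketched in plain Java. This is a simplified model for intuition only; the class below is hypothetical and Kafka's actual group coordinator is considerably more involved. It records each consumer's last heartbeat and declares a consumer dead once the gap exceeds the session timeout.

```java
import java.util.HashMap;
import java.util.Map;

public class SessionTracker {
    private final long sessionTimeoutMs;
    private final Map<String, Long> lastHeartbeat = new HashMap<>();

    public SessionTracker(long sessionTimeoutMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
    }

    // Called each time a heartbeat arrives from a consumer.
    public void heartbeat(String consumerId, long nowMs) {
        lastHeartbeat.put(consumerId, nowMs);
    }

    // A consumer whose last heartbeat is older than session.timeout.ms is
    // considered dead, which is what triggers a rebalance of its partitions.
    public boolean isDead(String consumerId, long nowMs) {
        Long last = lastHeartbeat.get(consumerId);
        return last == null || nowMs - last > sessionTimeoutMs;
    }
}
```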

Conclusion

In this tutorial, we have covered the significance of correctly configuring session.timeout.ms in Kafka consumers, a key to building resilient streaming applications. Understanding and correctly applying this configuration mitigates the risk of premature rebalancing due to transient issues while also ensuring the system promptly responds to actual consumer failures.

As with any sophisticated distributed system component, the best approach is careful calibration, diligent monitoring, and a readiness to adjust as you learn more about your specific use-case requirements.