Applications that need to read data from Kafka use a KafkaConsumer to subscribe to Kafka topics and receive messages from these topics.

Kafka Consumer Concepts

To understand how to read data from Kafka, we first need to understand consumers and consumer groups.

Consumers and Consumer Groups

Suppose you have an application that needs to read messages from a Kafka topic. The application will create a consumer object, subscribe to the appropriate topic, start receiving messages, and process them. However, if producers write messages to the topic faster than your application can process them, your application will fall further and further behind. This is especially problematic if you have only one consumer reading the topic. Just as multiple producers can write to the same topic, we need to allow multiple consumers to read from the same topic, splitting the data among them.

Kafka consumers are typically part of a consumer group. When multiple consumers are subscribed to a topic and belong to the same consumer group, each consumer in the group will receive messages from a different subset of the partitions in the topic.
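To make the idea of "a different subset of the partitions" concrete, here is a minimal sketch, assuming a single topic and a simplified range-style strategy in which sorted consumers receive contiguous blocks of partitions. The function name `assign_range` and the consumer names are hypothetical; this is not the Kafka client's own assignor code.

```python
def assign_range(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Split one topic's partitions into contiguous ranges, one range per consumer.

    Simplified range-style sketch: consumers are sorted by name, and the first
    (len(partitions) % len(consumers)) consumers receive one extra partition.
    """
    members = sorted(consumers)
    per_member, extra = divmod(len(partitions), len(members))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)
        assignment[member] = partitions[start:start + count]  # disjoint slice
        start += count
    return assignment


if __name__ == "__main__":
    # Hypothetical group with three consumers reading a topic with four partitions.
    print(assign_range([0, 1, 2, 3], ["c1", "c2", "c3"]))
    # {'c1': [0, 1], 'c2': [2], 'c3': [3]}
```

Every partition ends up with exactly one owner, and no two consumers in the group share a partition.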

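The main way to scale data consumption from a Kafka topic is to add more consumers to a consumer group. It is common for Kafka consumers to perform high-latency operations, such as writing to a database or running a time-consuming computation on the data, and in those cases a single consumer may not be able to keep up with the rate at which data flows into the topic. Keep in mind, however, that there is no point in adding more consumers than there are partitions in the topic: each partition is consumed by only one consumer in the group, so any extra consumers will sit idle.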
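The sizing argument above can be expressed as a small back-of-the-envelope calculation. This is a sketch under stated assumptions: the function `plan_group_size` is hypothetical, and both rates (messages per second) are numbers you would have to measure for your own producers and consumers.

```python
import math


def plan_group_size(produce_rate: float, per_consumer_rate: float, partitions: int) -> dict[str, int]:
    """Estimate how many consumers a group needs to keep up with a topic.

    produce_rate: messages/second written to the topic (assumed, measured by you).
    per_consumer_rate: messages/second one consumer can process (assumed).
    Consumers beyond the partition count receive no partitions, so the useful
    group size is capped at the number of partitions.
    """
    needed = math.ceil(produce_rate / per_consumer_rate)
    return {
        "needed": needed,                            # consumers required to match the write rate
        "useful": min(needed, partitions),           # consumers that can actually own a partition
        "idle_if_needed": max(0, needed - partitions),  # extra consumers that would sit idle
    }


if __name__ == "__main__":
    print(plan_group_size(10_000, 1_500, 6))
    # {'needed': 7, 'useful': 6, 'idle_if_needed': 1}
```

In this example the group would need a seventh consumer, but with only six partitions that consumer would receive nothing; the partition count, not the group size, is the ceiling on parallelism.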

In addition to adding consumers in order to scale a single application, it is common to have multiple applications that need to read data from the same topic. We want each application to get all of the messages, rather than just a subset. To achieve this, each application needs its own consumer group. Kafka delivers every message in the topic to each consumer group, but within a group, each partition is consumed by only one consumer.
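The following sketch simulates this delivery rule: every group receives every message, while inside a group the partitions are divided among the consumers. It assumes a round-robin split within each group purely for simplicity; the function `deliver` and the group and consumer names are hypothetical.

```python
def deliver(topic: dict[int, list[str]], groups: dict[str, list[str]]) -> dict[str, dict[str, list[str]]]:
    """Simulate which consumer in each group receives which messages.

    topic maps partition number -> messages stored in that partition.
    groups maps group id -> names of the consumers in that group.
    Every group reads every partition; within a group, each partition has
    exactly one owner (chosen round-robin here as a simplification).
    """
    received: dict[str, dict[str, list[str]]] = {}
    for group_id, consumers in groups.items():
        per_consumer: dict[str, list[str]] = {c: [] for c in consumers}
        for i, partition in enumerate(sorted(topic)):
            owner = consumers[i % len(consumers)]  # single owner per partition per group
            per_consumer[owner].extend(topic[partition])
        received[group_id] = per_consumer
    return received


if __name__ == "__main__":
    topic = {0: ["a", "b"], 1: ["c"], 2: ["d"], 3: ["e"]}
    print(deliver(topic, {"G1": ["c1", "c2"], "G2": ["c3"]}))
    # {'G1': {'c1': ['a', 'b', 'd'], 'c2': ['c', 'e']}, 'G2': {'c3': ['a', 'b', 'c', 'd', 'e']}}
```

Group G1 splits the messages between its two consumers, while group G2, with a single consumer, still sees all five messages.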

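To summarize: create a new consumer group for each application that needs all the messages from one or more topics, and add consumers to an existing consumer group to scale the reading and processing of those messages, keeping in mind that each additional consumer in a group receives only a subset of the messages.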

Consumer Groups and Partition Rebalance

Consumers in a consumer group share ownership of the partitions in the topics they subscribe to. When a new consumer is added to the group, it starts consuming messages from partitions previously consumed by another consumer. When a consumer leaves the group (for example, because it shuts down or crashes), the partitions it used to consume are taken over by one of the remaining consumers. Partitions are also reassigned when the topics the consumer group is consuming are modified (e.g., when an administrator adds new partitions).

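Moving partition ownership from one consumer to another is called a rebalance. Rebalances are important because they give the consumer group high availability and scalability, allowing consumers to be added and removed safely, but in the normal course of events they are undesirable because they can interrupt consumption for some or all of the group.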
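The effect of a rebalance can be seen by comparing partition ownership before and after group membership or the partition count changes. This is a minimal sketch assuming a naive round-robin assignment; `round_robin` and `moved_partitions` are hypothetical helpers, not Kafka APIs.

```python
from typing import Optional


def round_robin(partitions: list[int], consumers: list[str]) -> dict[int, str]:
    """Map each partition to one consumer, cycling through sorted consumer names."""
    members = sorted(consumers)
    return {p: members[i % len(members)] for i, p in enumerate(sorted(partitions))}


def moved_partitions(before: dict[int, str], after: dict[int, str]) -> dict[int, tuple[Optional[str], str]]:
    """Return {partition: (old_owner, new_owner)} for every partition whose owner changed.

    old_owner is None for partitions that did not exist before (e.g. newly added ones).
    """
    return {p: (before.get(p), owner) for p, owner in after.items() if before.get(p) != owner}


if __name__ == "__main__":
    before = round_robin([0, 1, 2, 3], ["c1", "c2"])
    after = round_robin([0, 1, 2, 3], ["c1", "c2", "c3"])  # consumer c3 joins
    print(moved_partitions(before, after))
    # {2: ('c1', 'c3'), 3: ('c2', 'c1')}
```

Note that this naive strategy moves partition 3 between two consumers that were already in the group, even though only c3 needed work; smarter assignment strategies try to minimize such unnecessary moves.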

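There are two types of rebalances, depending on the partition assignment strategy that the consumer group uses:

Eager rebalances: all consumers stop consuming, give up ownership of all their partitions, rejoin the consumer group, and receive a brand-new partition assignment. This amounts to a short window of unavailability for the entire consumer group.

Cooperative (incremental) rebalances: only a subset of the partitions is reassigned from one consumer to another, and consumers continue processing records from all the partitions that are not reassigned. Reaching a stable assignment may take a few iterations, but the group avoids the complete pause of an eager rebalance.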
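The practical difference between eager and cooperative rebalancing lies in which partitions each consumer must give up. The sketch below computes that set for both approaches, given the old and new assignments. The function name and the "eager"/"cooperative" string labels are hypothetical and are not the Kafka client API; it also ignores the multiple rounds a real cooperative rebalance may take.

```python
Assignment = dict[str, set[int]]


def revoked_partitions(old: Assignment, new: Assignment, protocol: str) -> Assignment:
    """Partitions each consumer must give up when the group moves from `old` to `new`.

    eager: every consumer revokes everything it owns, then waits for a fresh
           assignment, so the whole group stops consuming in the meantime.
    cooperative: a consumer revokes only the partitions that move to another
           consumer and keeps processing the rest.
    """
    if protocol == "eager":
        return {member: set(owned) for member, owned in old.items()}
    if protocol == "cooperative":
        return {member: owned - new.get(member, set()) for member, owned in old.items()}
    raise ValueError(f"unknown protocol: {protocol}")


if __name__ == "__main__":
    old = {"c1": {0, 1}, "c2": {2, 3}}
    new = {"c1": {0, 1}, "c2": {2}, "c3": {3}}  # c3 joined and takes partition 3
    print(revoked_partitions(old, new, "eager"))        # {'c1': {0, 1}, 'c2': {2, 3}}
    print(revoked_partitions(old, new, "cooperative"))  # {'c1': set(), 'c2': {3}}
```

With the eager approach all four partitions pause; with the cooperative approach only partition 3 changes hands, and c1 never stops working.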

Consumers maintain membership in a consumer group and ownership of the partitions assigned to them by sending heartbeats to a Kafka broker designated as the group coordinator. As long as the consumer sends heartbeats at regular intervals, it is assumed to be alive. If it stops sending heartbeats long enough for its session to time out, the group coordinator considers it dead and triggers a rebalance.
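The liveness rule can be sketched as a coordinator that remembers each member's last heartbeat and flags members whose session has expired. This is a simplified model, not broker code: the class `HeartbeatTracker` is hypothetical, and `session_timeout` only plays the role of the consumer's `session.timeout.ms` setting. Timestamps are passed in explicitly so the sketch is deterministic.

```python
class HeartbeatTracker:
    """Simplified model of how a group coordinator could judge consumer liveness."""

    def __init__(self, session_timeout: float) -> None:
        self.session_timeout = session_timeout      # seconds without a heartbeat before a member is dead
        self.last_heartbeat: dict[str, float] = {}  # member -> time of its latest heartbeat

    def heartbeat(self, member: str, now: float) -> None:
        """Record that `member` sent a heartbeat at time `now`."""
        self.last_heartbeat[member] = now

    def expired_members(self, now: float) -> list[str]:
        """Members whose session timed out; in Kafka this would trigger a rebalance."""
        return sorted(
            m for m, t in self.last_heartbeat.items() if now - t > self.session_timeout
        )


if __name__ == "__main__":
    tracker = HeartbeatTracker(session_timeout=10)
    tracker.heartbeat("c1", now=0)
    tracker.heartbeat("c2", now=0)
    tracker.heartbeat("c1", now=8)            # c1 keeps heartbeating, c2 goes silent
    print(tracker.expired_members(now=12))    # ['c2']
```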

Static Group Membership

By default, the identity of a consumer as a member of its consumer group is transient. When a consumer leaves its consumer group, the partitions that were assigned to it are revoked; when it rejoins, it is assigned a new member ID and a new set of partitions through the rebalance protocol.
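The contrast between transient and static identity can be modeled with a toy coordinator. Without an instance ID, every join produces a fresh member ID, so a restarted consumer looks like a brand-new member. With an instance ID (standing in for Kafka's `group.instance.id` consumer setting), a returning consumer is recognized and gets its previous partitions back. The class `GroupMembership` is hypothetical, and it deliberately ignores the session timeout that, in Kafka, bounds how long a static member's assignment is held.

```python
import itertools
from typing import Optional


class GroupMembership:
    """Toy model of transient versus static member identity in a consumer group."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self._static: dict[str, tuple[str, list[int]]] = {}  # instance id -> (member id, partitions)

    def join(self, new_partitions: list[int], instance_id: Optional[str] = None) -> tuple[str, list[int]]:
        """Return (member_id, partitions) for a consumer joining the group.

        new_partitions stands in for whatever a rebalance would assign right now.
        """
        if instance_id is not None and instance_id in self._static:
            # Returning static member: same member id and same partitions, no reassignment.
            return self._static[instance_id]
        member_id = f"member-{next(self._next_id)}"  # transient identity: always a fresh id
        if instance_id is not None:
            self._static[instance_id] = (member_id, new_partitions)
        return member_id, new_partitions


if __name__ == "__main__":
    group = GroupMembership()
    print(group.join([0, 1]))                           # ('member-1', [0, 1])
    print(group.join([2]))                              # ('member-2', [2])  restart: new identity
    print(group.join([0, 1], instance_id="consumer-a"))  # ('member-3', [0, 1])
    print(group.join([3], instance_id="consumer-a"))     # ('member-3', [0, 1])  recognized
```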