Important configurations to understand when using Kafka

Bruno Joaquim
5 min read · Dec 7, 2021


Kafka is one of the best tools for asynchronous integration between systems and for real-time data processing. Since its ecosystem provides scalability, low latency, and message durability, it has been heavily adopted by companies around the world.

A big mistake that I frequently see in companies trying to adopt Kafka, and one I made myself at the beginning of my journey, is not understanding some crucial consumer and producer configurations. That leads to subtle problems down the road and to not using Kafka's features to their full potential.

So in this post, I will summarize some important configurations related to producers and consumers.

Kafka Brokers and Cluster

Before we talk about producers and consumers, I would like to share an important setup of brokers in a cluster to consider when deploying Kafka to a production environment.

Brokers act as intermediaries between producers and consumers. They are responsible for writing and reading events in the form of messages: producers write messages to a broker, and consumers read them from the brokers. To increase Kafka's availability and fault tolerance, a production-ready cluster should consist of at least 3 brokers.

Multiple Kafka brokers allow us to replicate our data across several machines or even different regions. So when something bad happens to the broker currently acting as leader, traffic can be redirected to the other brokers without downtime, since the same data is replicated across them.
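As a rough sketch, this is how a replicated topic could be created with the Java Admin API, assuming a hypothetical orders topic, 3 partitions and a local bootstrap address (all placeholders, not anything prescribed by this article):

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed local bootstrap address; replace with your cluster's brokers.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // Hypothetical "orders" topic: 3 partitions, each replicated to 3 brokers.
            NewTopic topic = new NewTopic("orders", 3, (short) 3);
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```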

Producer configs

When producing messages to a broker, depending on the use case, we need some guarantee that the event produced was in fact received by the broker, so that a consumer can later read and process it.

The acks config specifies how many acknowledgments the producer must receive from the broker in order to consider a message successfully persisted in the broker's log. The acks config has three possible values: 0, 1 and all.

When set to 0, the producer sends the message to the broker and that alone is enough to consider it successfully produced; it does not wait for the broker to confirm that the message was received and written to its log. acks=0 can be a good choice for use cases with very high throughput where losing some messages is acceptable, since the producer never waits for a confirmation.
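A minimal fire-and-forget sketch with the Java producer client could look like the following; the orders topic, the key/value pair and the bootstrap address are illustrative placeholders:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class FireAndForgetProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=0: don't wait for any broker acknowledgment (maximum throughput,
        // but messages can be lost silently). "1" and "all" are set the same way.
        props.put(ProducerConfig.ACKS_CONFIG, "0");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "order-1", "created"));
        }
    }
}
```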

When set to 1, the producer waits for the leader broker to confirm that the event produced was in fact received and persisted to its log. In cases where the message must be delivered to the broker without hurting overall throughput too much, acks=1 can be a good match.

Last but not least, the all value basically tells the producer to wait until the message produced has been received and persisted by the leader and by the follower replicas. In theory the all option should work this way, but in practice that is not quite what happens: acks=all only waits for the replicas that are currently in-sync, and it does not specify how many in-sync replicas must exist. In other words, if two of the replicas fall out of sync, the leader broker, which is always in-sync with itself, can still signal to the producer that the message was "successfully" persisted.

So how can we be sure that the producer is only acknowledged when the event produced has been persisted on enough brokers in the cluster? Using the broker (or topic) configuration min.insync.replicas we specify how many replicas, including the leader, must be in-sync for a write to be accepted. In a cluster with 3 brokers, a common production setting is min.insync.replicas=2, which guarantees that every acknowledged event is written to at least two brokers. Now, if we produce an event and fewer than two replicas are available, a NotEnoughReplicasException is thrown and the producer retries until delivery.timeout.ms is reached.
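If the topic already exists, min.insync.replicas can also be set at the topic level. A sketch using the Java Admin API's incrementalAlterConfigs, again assuming the hypothetical orders topic from the earlier example:

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class SetMinInsyncReplicas {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // Require at least 2 in-sync replicas before a write is acknowledged
            // when the producer uses acks=all.
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");
            AlterConfigOp setMinIsr = new AlterConfigOp(
                    new ConfigEntry("min.insync.replicas", "2"), AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> alterations =
                    Map.of(topic, List.of(setMinIsr));
            admin.incrementalAlterConfigs(alterations).all().get();
        }
    }
}
```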

The all value along with min.insync.replicas should be used in cases where the message produced must be replicated inside the cluster before it is acknowledged. So if any single broker fails for some reason, the message can still be consumed in the future, since it lives on more than one broker.
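Putting it together, a producer sketch with acks=all and an explicit delivery.timeout.ms might look like the following; the topic name, key/value and timeout value are only illustrative:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DurableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=all: wait until the leader and all in-sync replicas have the record.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Stop retrying and fail the send after 2 minutes (illustrative value).
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120_000);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "order-1", "created"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            // e.g. NotEnoughReplicasException when fewer than
                            // min.insync.replicas replicas are available.
                            exception.printStackTrace();
                        }
                    });
            producer.flush();
        }
    }
}
```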

Consumer Configs

One consumer config that I struggled to understand at the beginning, and that is very important, is auto.offset.reset. When a consumer group is created, or a new consumer joins an existing group, and there is no committed offset for a partition, the consumer uses auto.offset.reset to determine where to start consuming messages.

Suppose you want to create a new consumer. If you set auto.offset.reset to earliest, the consumer will read and process all events still stored in the broker's log, starting from the oldest. This is particularly useful when you need to process the events in a new service or consume them in a new way.

On the other hand, latest tells the consumer to start from the end of the log, so only events produced after the consumer comes up will be read and processed. The latest reset is generally useful when you don't need to process old messages and only care about what is produced from now on.
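A consumer sketch with auto.offset.reset set to earliest, assuming a hypothetical orders topic and group id, could look like this:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReplayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-replay");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // With no committed offset for this group, start from the oldest record
        // still in the log; use "latest" to read only newly produced records.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```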

Another important config is the consumer's enable.auto.commit. Every time a consumer processes a message, it must eventually commit the message's offset to the broker, to signal that the message was processed and that the group's position in the partition should move forward.

The enable.auto.commit option is true by default: the consumer automatically commits offsets to the broker at the interval specified by auto.commit.interval.ms. But be aware, auto-commit can cause duplication of messages, since the consumer can fail after processing a message but before its offset is committed.

If the consumer must fully process a message before committing its offset, or the same message should not, as far as possible, be processed more than once, auto-commit should be disabled. This gives us more control over when to commit, since we now need to commit offsets manually. In all my experience with Kafka, this was the default scenario.

When using the synchronous commit API, the consumer blocks until the commit request returns successfully and cannot consume other messages until the commit finishes, which decreases overall throughput.
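A sketch of a consumer with auto-commit disabled and a synchronous commit after processing; the process method, group id and topic name are placeholders for your own logic:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-processor");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Disable auto-commit: offsets are committed only after processing succeeds.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // hypothetical business logic
                }
                // Blocks until the broker confirms the commit of the polled offsets.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.println(record.value());
    }
}
```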

We can also commit offsets using the asynchronous API. In this case overall throughput increases, since the consumer doesn't wait for the commit request to finish before reading more messages. The drawback of this approach is commit ordering. Suppose something goes wrong when committing offset 20; by the time the consumer notices the failure, other messages have been processed and the committed offset for the partition has moved forward. Retrying the older commit would rewind the committed position and could lead to message duplication after a rebalance or restart.
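A sketch of the asynchronous variant, using commitAsync with a callback that only logs failures (retrying inside the callback could overwrite a newer offset) and one final blocking commit before shutdown; names and addresses are again placeholders:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AsyncCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-processor");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("orders"));
        try {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                records.forEach(record -> System.out.println(record.value()));
                // Non-blocking commit: the callback only reports failures and does not
                // retry, because a retried commit could overwrite a newer offset.
                consumer.commitAsync((offsets, exception) -> {
                    if (exception != null) {
                        System.err.println("Commit failed for " + offsets + ": " + exception);
                    }
                });
            }
        } finally {
            try {
                consumer.commitSync(); // one last blocking commit before shutting down
            } finally {
                consumer.close();
            }
        }
    }
}
```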

That’s all for today, folks. I hope that this article has helped you in some way.

Take care and happy coding!
