Apache Kafka Producers write data to topics, and topics are made of partitions. A producer automatically knows which broker and partition to write to based on your message. If a Kafka broker in your cluster fails, the producers automatically recover from it, which is part of what makes Kafka resilient and so widely used today. To picture how data ends up in topic partitions, imagine a producer on the left-hand side sending data into each of the partitions of a topic. Read more in the article Apache Kafka Producer.
Prerequisites:
- Apache Kafka – Producer Acknowledgement
- Apache Kafka – Producer Retry and max.in.flight.requests.per.connection
In this article, we discuss another concept in Kafka: the Idempotent Producer.
What’s the Problem in Kafka Version < 0.11?
When the Producer sends messages to Kafka, network errors can introduce duplicate messages. Here is how it can happen. Please refer to the image below.
Start with a good request: the Producer sends data to Kafka, Kafka commits it to its log, and Kafka sends back an acknowledgment.
Now consider a request that ends up duplicated. The Producer sends a produce request, and Kafka commits the data to its log and sends back an acknowledgment, but the acknowledgment never reaches the Producer because of a network error. Because the Producer’s retries setting is greater than zero, it sends the same produce request again. Kafka sees the message as a new produce request, commits it a second time, and this time the acknowledgment reaches the Producer.
From the Producer’s perspective, the data was sent only once, because it received only one acknowledgment. From Kafka’s perspective, the data arrived twice and was committed twice, which created a duplicate. The Idempotent Producer solves this problem; the rest of this article explains how.
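The lost-acknowledgment scenario above can be sketched with a small simulation. This is only an illustration, not Kafka code: NaiveBroker, sendWithRetries and the record value are hypothetical names, and the "broker" is just an in-memory list that appends whatever it receives.

```java
import java.util.ArrayList;
import java.util.List;

class DuplicateOnRetryDemo {

    /** Hypothetical stand-in for a partition log with no de-duplication. */
    static class NaiveBroker {
        final List<String> log = new ArrayList<>();

        /** Commits the record, then reports whether the ack reached the producer. */
        boolean produce(String record, boolean ackLost) {
            log.add(record);   // the broker always commits what it receives
            return !ackLost;   // a lost ack looks like a failure to the producer
        }
    }

    /** Sends one record and retries until an ack arrives; returns the number of attempts. */
    static int sendWithRetries(NaiveBroker broker, String record,
                               boolean loseFirstAck, int maxRetries) {
        int attempts = 0;
        boolean acked = false;
        while (!acked && attempts <= maxRetries) {
            boolean ackLost = loseFirstAck && attempts == 0; // simulate the network error once
            acked = broker.produce(record, ackLost);
            attempts++;
        }
        return attempts;
    }

    public static void main(String[] args) {
        NaiveBroker broker = new NaiveBroker();
        sendWithRetries(broker, "order-42", true, 3);
        System.out.println(broker.log); // prints [order-42, order-42]
    }
}
```

The producer sees a single acknowledgment, yet the log holds the record twice, which is exactly the duplicate described above.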
Idempotent Producer
If you run Kafka version 0.11 or later, you can define an “Idempotent Producer”. Here is what happens. Please refer to the image below.
For a good request, nothing changes: produce, commit, ack. The difference shows up when an acknowledgment is lost. The Producer sends a produce request, Kafka commits the data, and the acknowledgment never reaches the Producer. The Producer retries, but the retry now carries a produce request ID, which is new in 0.11 (in the Kafka protocol this is a producer ID combined with a sequence number). Using that ID, the Kafka broker detects that the request is a duplicate. It does not commit the same produce request a second time, but it still sends back an acknowledgment saying, “Yes, we got it once already”.
From the Producer’s perspective, the data was sent once and acknowledged once. From Kafka’s perspective, de-duplication happened at the produce request level and the data was committed only once. You do not have to implement any of this yourself; it is a built-in mechanism. Idempotent Producers add little overhead, so if you want a stable and safe pipeline, use them.
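The de-duplication step can be sketched in the same spirit. This is a simplified model under stated assumptions, not the broker's actual implementation: DedupBroker is a hypothetical class, it covers a single partition, and it identifies a request by a producer ID plus a sequence number and remembers only the last sequence seen per producer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IdempotentBrokerDemo {

    /** Hypothetical broker that remembers the last sequence number per producer ID. */
    static class DedupBroker {
        final List<String> log = new ArrayList<>();
        private final Map<Long, Integer> lastSequence = new HashMap<>();

        /** Returns true if the record was appended, false if it was a detected duplicate. */
        boolean produce(long producerId, int sequence, String record) {
            Integer last = lastSequence.get(producerId);
            if (last != null && sequence <= last) {
                return false; // already committed: acknowledge without appending again
            }
            log.add(record);
            lastSequence.put(producerId, sequence);
            return true;
        }
    }

    public static void main(String[] args) {
        DedupBroker broker = new DedupBroker();
        long producerId = 7L;                      // assumed ID, for illustration only
        broker.produce(producerId, 0, "order-42"); // committed, but the ack is lost
        broker.produce(producerId, 0, "order-42"); // retry carries the same sequence
        System.out.println(broker.log.size());     // prints 1
    }
}
```

In both calls the broker can acknowledge the request; only the first one changes the log.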
What Does an Idempotent Producer Come With?
It comes with
retries = Integer.MAX_VALUE
which is 2147483647, so in practice your producer will retry indefinitely. It also comes with
max.in.flight.requests.per.connection = 1 (if you use Kafka 0.11 or later, but earlier than 1.1)
Or
max.in.flight.requests.per.connection = 5 (if you use Kafka 1.1 or later)
It also comes with
acks=all
so that we don’t lose data. To get all of these settings, we only have to set
producerProperties.put("enable.idempotence", true);
and that’s it. The Idempotent Producer is a simple way to avoid duplicates caused by retries in Apache Kafka.
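As a rough sketch, the settings discussed above could be collected into a Properties object like this. The class name, the build method, the localhost:9092 address and the String serializers are assumptions for illustration; the related settings are written out explicitly only to make them visible, and the value 5 assumes Kafka 1.1 or later.

```java
import java.util.Properties;

class IdempotentProducerConfig {

    /** Builds producer properties with idempotence enabled; the address is a placeholder. */
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // The one setting the article asks for.
        props.setProperty("enable.idempotence", "true");

        // Spelled out for readability; these match the values described in the article.
        props.setProperty("acks", "all");
        props.setProperty("retries", Integer.toString(Integer.MAX_VALUE));
        props.setProperty("max.in.flight.requests.per.connection", "5");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("localhost:9092");
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

With the kafka-clients library on the classpath, these properties could then be passed to the KafkaProducer constructor.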