Kafka producers write data to topics, and topics are made of partitions. A producer automatically knows which broker and partition to write to based on your message, and if a Kafka broker in your cluster fails, the producer automatically recovers from it. This resilience is a big part of why Kafka is so good and so widely used today. In a diagram of this, the producer sits on the left-hand side and sends data into each of the partitions of a topic.
So how does a producer know which partition to send the data to? For this, we can use message keys.
Kafka Message Keys
Alongside the message value, we can choose to send a message key, and that key can be whatever you want: a string, a number, and so on. If you don’t send a key (the key is null), then, to keep it simple, the data is sent in a round-robin fashion: your first message goes to partition 0, the second to partition 1, then partition 2, and so on, which is why it’s called round robin. (Newer producer clients, since Kafka 2.4, actually use a "sticky" partitioner for null keys that fills a batch for one partition before moving on to the next, but the data is still spread across the partitions.) If you do send a key with your message, all messages that share the same key will always go to the same partition, as long as the number of partitions in the topic does not change.
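To make the round-robin idea concrete, here is a minimal Python simulation of how keyless messages could be spread over a topic's partitions. It is a sketch under simplifying assumptions, not Kafka's producer code: `assign_round_robin` is a hypothetical helper, and real producer clients (particularly newer ones, which batch null-key messages with a sticky partitioner) do not follow this strict one-message-per-partition rotation.

```python
from itertools import cycle


def assign_round_robin(values: list[str], num_partitions: int) -> list[tuple[int, str]]:
    """Pair each keyless (null-key) message with a partition, cycling 0, 1, ..., n-1, 0, ...

    Hypothetical, simplified simulation; real producer clients may distribute differently.
    """
    partitions = cycle(range(num_partitions))
    return [(next(partitions), value) for value in values]


if __name__ == "__main__":
    for partition, value in assign_round_robin(["m1", "m2", "m3", "m4"], 3):
        print(f"{value} -> partition {partition}")
    # m1 -> partition 0
    # m2 -> partition 1
    # m3 -> partition 2
    # m4 -> partition 0
```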
This is a very important property of Kafka, because it means you can get ordering for a specific field. For example, if you have cars and you want all the GPS positions for a particular car in order, you need to set the message key to the unique identifier of the car, i.e. carID. In the car GPS example discussed in the article Topics, Partitions, and Offsets in Apache Kafka, we choose the message key to be carID so that all the positions for one specific car are kept in order within the same partition.
Note: Please refer to the topic example discussed in the article Topics, Partitions, and Offsets in Apache Kafka to see which example we are discussing here.
As a second example, suppose the producer sends data to a topic with 2 partitions and the key is carID. Then carID_123 will always go to partition 0, carID_234 will also always go to partition 0, and carID_345 and carID_456 will always go to partition 1. Which partition a given key lands on is decided by hashing the key, so this particular mapping is only an illustration; the point is that you will never find carID_123 data in partition 1, because of the key property we just mentioned.
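The key-to-partition rule can be sketched as a hash of the key modulo the number of partitions. The Python below is a hypothetical simulation (the helper names are made up for illustration), and it substitutes CRC32 for the murmur2 hash that Kafka's Java default partitioner applies to the serialized key, so it will not reproduce the exact partition numbers in the example above. What it does show is the property that matters: the same key always maps to the same partition, so one car's positions stay in the order they were sent.

```python
import zlib
from collections import defaultdict


def partition_for_key(key: str, num_partitions: int) -> int:
    """Deterministically map a key to a partition by hashing its bytes.

    ASSUMPTION: Kafka's Java default partitioner hashes the serialized key with
    murmur2; CRC32 from the standard library is swapped in here for brevity,
    so the partition numbers will not match a real cluster.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def send_gps_positions(events: list[tuple[str, int]], num_partitions: int) -> dict[int, list[tuple[str, int]]]:
    """Append each (car_id, position) event to the log of the partition its key maps to."""
    logs: dict[int, list[tuple[str, int]]] = defaultdict(list)
    for car_id, position in events:
        logs[partition_for_key(car_id, num_partitions)].append((car_id, position))
    return logs


if __name__ == "__main__":
    events = [("carID_123", 1), ("carID_345", 1), ("carID_123", 2),
              ("carID_456", 1), ("carID_123", 3), ("carID_345", 2)]
    logs = send_gps_positions(events, num_partitions=2)
    p = partition_for_key("carID_123", 2)
    # Every carID_123 event sits in partition p, in the order it was sent.
    print([pos for car, pos in logs[p] if car == "carID_123"])  # [1, 2, 3]
```

Because the mapping depends on `num_partitions`, adding partitions to an existing topic can send a key to a different partition than before, which is why key-based ordering only holds while the partition count stays fixed.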
So now let’s discuss what a Kafka message looks like.
Kafka Message Anatomy
Kafka messages are created by the producer, and the first fundamental concept we discussed is the key. The key can be null, and its type is binary, i.e. a sequence of bytes. Keys can still be strings or numbers, and we’ll see how a string or a number is converted into binary.
Please refer to the above image. A message has a key, which is a binary field that can be null, and a value, which is the content of your message and can also be null. The key and the value are the two most important parts of a message, but other things go into a message as well. For example, your message can be compressed, and the compression type is indicated alongside it (Kafka compresses messages together in batches). The type none means no compression, and Kafka offers the four compression types listed below.
- gzip
- snappy
- lz4
- zstd
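Compression pays off because a batch of similar messages contains a lot of repeated bytes. The sketch below is an illustration under stated assumptions, not Kafka's record-batch format: it packs messages with a hypothetical length-prefixed framing and compresses the whole batch with gzip, the one listed codec available in Python's standard library. The `compression.type` key is the real producer setting name; `encode_batch` and `decode_batch` are made-up helpers.

```python
import gzip
import json
import struct

# Kafka producers choose the codec with the `compression.type` setting,
# whose values are "none" or one of the codecs listed above.
producer_config = {"compression.type": "gzip"}


def encode_batch(messages: list[bytes]) -> bytes:
    """Length-prefix each message (4-byte big-endian) and concatenate them (hypothetical framing)."""
    out = bytearray()
    for message in messages:
        out += struct.pack(">i", len(message))
        out += message
    return bytes(out)


def decode_batch(data: bytes) -> list[bytes]:
    """Inverse of encode_batch: read each length prefix, then that many bytes."""
    messages, pos = [], 0
    while pos < len(data):
        (length,) = struct.unpack_from(">i", data, pos)
        pos += 4
        messages.append(data[pos:pos + length])
        pos += length
    return messages


if __name__ == "__main__":
    batch = [json.dumps({"carID": "carID_123", "seq": i}).encode("utf-8") for i in range(100)]
    raw = encode_batch(batch)
    compressed = gzip.compress(raw)
    print(len(raw), "bytes raw ->", len(compressed), "bytes gzip")
    assert decode_batch(gzip.decompress(compressed)) == batch
```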
Messages can also carry optional headers. Headers are key-value pairs, a single message can have many of them, and it is common to set them when you want to add metadata to your messages. A timestamp is also added alongside the message, either by the user or by the system. Once the message is sent to a Kafka topic, it receives a partition number and an offset ID, so the partition and the offset also become part of the Kafka message. Remember that both the key and the value are binary, but when we write messages to Kafka we naturally work with higher-level objects, so to transform these objects into binary we use a producer serializer.
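To tie these pieces together, the parts of a message described above can be written down as a small Python data structure. This is a hypothetical, simplified model for illustration (the class, field, and function names are invented), not the wire format Kafka uses or any client library's record type. In a real cluster the broker assigns offsets; here a helper function stamps the partition, the offset, and a fallback timestamp.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

COMPRESSION_TYPES = {"none", "gzip", "snappy", "lz4", "zstd"}


@dataclass
class KafkaMessage:
    """Hypothetical, simplified model of the parts of a Kafka message."""
    key: Optional[bytes]                 # binary, may be None
    value: Optional[bytes]               # binary content, may be None
    compression_type: str = "none"       # one of COMPRESSION_TYPES
    headers: list[tuple[str, bytes]] = field(default_factory=list)  # optional metadata pairs
    partition: Optional[int] = None      # known once the message is written
    offset: Optional[int] = None         # known once the message is written
    timestamp_ms: Optional[int] = None   # set by the user, or filled in by the system

    def __post_init__(self) -> None:
        if self.compression_type not in COMPRESSION_TYPES:
            raise ValueError(f"unknown compression type: {self.compression_type}")


def append_to_partition(log: list[KafkaMessage], partition: int, msg: KafkaMessage) -> KafkaMessage:
    """Simulate a write to one partition's log: stamp partition, next offset, and a timestamp."""
    msg.partition = partition
    msg.offset = len(log)  # offsets grow one by one within a partition
    if msg.timestamp_ms is None:
        msg.timestamp_ms = int(time.time() * 1000)
    log.append(msg)
    return msg


if __name__ == "__main__":
    partition_0: list[KafkaMessage] = []
    first = append_to_partition(partition_0, 0, KafkaMessage(
        b"carID_123", b"48.85,2.35", headers=[("source", b"gps-tracker")]))
    second = append_to_partition(partition_0, 0, KafkaMessage(b"carID_123", b"48.86,2.35"))
    print(first.offset, second.offset)  # 0 1
```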
Producer Serializer
A serializer indicates how to transform these objects into bytes, and serializers are used for both the key and the value. For example, say the value is "hello world" as a string and the key is 123 as an integer. In that case, we set the KeySerializer to be an IntegerSerializer, which internally converts the integer into bytes, and these bytes become the key, which is binary. The same goes for the value "hello world": we use a StringSerializer as the ValueSerializer to convert the string into bytes, which gives us the value as a binary field.
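Here is a small Python sketch of what those two serializers do to the example key and value. The functions are hypothetical stand-ins written for illustration: they mirror the byte layouts of Kafka's Java IntegerSerializer (4 bytes, big-endian) and StringSerializer (UTF-8 by default), but they are not part of any Kafka client library.

```python
import struct


def serialize_int(value: int) -> bytes:
    """4-byte big-endian two's complement, the layout Kafka's IntegerSerializer writes."""
    return struct.pack(">i", value)


def deserialize_int(data: bytes) -> int:
    return struct.unpack(">i", data)[0]


def serialize_str(value: str) -> bytes:
    """UTF-8 bytes, matching StringSerializer's default encoding."""
    return value.encode("utf-8")


def deserialize_str(data: bytes) -> str:
    return data.decode("utf-8")


if __name__ == "__main__":
    key_bytes = serialize_int(123)               # b'\x00\x00\x00{'  (0x7b is ord('{'))
    value_bytes = serialize_str("hello world")   # b'hello world'
    print(key_bytes, value_bytes)
    print(deserialize_int(key_bytes), deserialize_str(value_bytes))  # 123 hello world
```

In the Java client, the equivalent choice is made through the producer's `key.serializer` and `value.serializer` settings, for example `org.apache.kafka.common.serialization.IntegerSerializer` and `org.apache.kafka.common.serialization.StringSerializer`.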
Here are some common serializers:
- String (including JSON, if your data is JSON)
- Integer and Float for numbers
- Avro and Protobuf for more advanced kinds of data
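The first two rows of that list can be sketched the same way. The functions below are hypothetical illustrations, not Kafka classes: JSON is handled by turning the object into a JSON string and UTF-8 encoding it, and the float helper mirrors the 4-byte big-endian IEEE 754 layout of Kafka's FloatSerializer. Sorting the JSON keys is a choice made here only so the output is deterministic. Avro and Protobuf are left out because they need a schema and a third-party library.

```python
import json
import struct


def serialize_json(obj) -> bytes:
    """JSON carried as a string: dump to compact text, then UTF-8 encode like a string serializer."""
    return json.dumps(obj, separators=(",", ":"), sort_keys=True).encode("utf-8")


def deserialize_json(data: bytes):
    return json.loads(data.decode("utf-8"))


def serialize_float(value: float) -> bytes:
    """4-byte big-endian IEEE 754 single precision, the layout Kafka's FloatSerializer writes."""
    return struct.pack(">f", value)


if __name__ == "__main__":
    position = {"carID": "carID_123", "lat": 48.85}
    print(serialize_json(position))    # b'{"carID":"carID_123","lat":48.85}'
    print(serialize_float(1.5).hex())  # 3fc00000
```

Note that a 4-byte float cannot represent a value such as 48.85 exactly; Kafka also ships a DoubleSerializer for 8-byte doubles when that precision matters.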