We know that batching messages makes applications more efficient. Sometimes, however, a single message is too large to be passed between applications efficiently. One solution is to divide the message into multiple chunks (parts), each small enough to satisfy the size limit of the system that carries it. This technique is widely used with Apache Kafka, whose default maximum message size is about 1 MB, to split messages larger than 1 MB into multiple parts.
Approach
A large message can be split into multiple small messages, each wrapped in a message envelope that contains the chunk data and its metadata. The generateChunks method takes the input message (as a byte array) and chunkSize, and from these it calculates the total number of chunks it will create. Then, in a while loop, it walks through the payload, copies up to chunkSize bytes into a new MessageChunk envelope, and adds the envelope to a list. Once all chunks have been created and wrapped in envelopes, they are retrieved from the list and printed to the console.
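As a quick check of the arithmetic in this approach, the minimal sketch below computes the chunk count and the byte range of each chunk for a 37-byte payload and a 10-byte chunk size. The class name ChunkBoundaries and the helpers chunkCount and chunkBounds are hypothetical names chosen for illustration. The ceiling-division formula is an equivalent alternative to the quotient-plus-remainder check used in generateChunks, and it assumes the sizes are small enough that the addition does not overflow an int.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkBoundaries {

    // Ceiling division: number of chunks needed to cover payloadSize bytes.
    // Assumption: payloadSize + chunkSize fits in an int.
    static int chunkCount(int payloadSize, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        return (payloadSize + chunkSize - 1) / chunkSize;
    }

    // Returns one {start, end} pair per chunk; end is exclusive.
    static List<int[]> chunkBounds(int payloadSize, int chunkSize) {
        List<int[]> bounds = new ArrayList<>();
        final int count = chunkCount(payloadSize, chunkSize);
        for (int index = 0; index < count; index++) {
            int start = index * chunkSize;
            // The last chunk may be shorter than chunkSize.
            int end = Math.min(start + chunkSize, payloadSize);
            bounds.add(new int[] {start, end});
        }
        return bounds;
    }

    public static void main(String[] args) {
        int payloadSize = 37;
        int chunkSize = 10;
        System.out.println("Chunk count: " + chunkCount(payloadSize, chunkSize));
        for (int[] b : chunkBounds(payloadSize, chunkSize)) {
            System.out.println("[" + b[0] + ", " + b[1] + ") size=" + (b[1] - b[0]));
        }
    }
}
```

For these inputs the sketch reports 4 chunks covering [0, 10), [10, 20), [20, 30) and [30, 37), so the last chunk holds the remaining 7 bytes.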
Java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class GFG {

    public static void main(String[] args) {
        // Input message
        String message = "The world is beautiful and so are we.";

        // Size of each chunk in bytes
        int chunkSize = 10;

        List<MessageChunk> messageChunkList =
            generateChunks(message.getBytes(), chunkSize);

        for (MessageChunk chunk : messageChunkList) {
            System.out.println("Chunk-" + (chunk.getIndex() + 1)
                               + ":" + chunk.toString());
        }
    }

    public static List<MessageChunk> generateChunks(byte[] payload, int chunkSize) {
        // All chunks of one message share the same id and timestamp
        UUID uuid = UUID.randomUUID();
        long ts = System.currentTimeMillis();
        final int payloadSize = payload.length;
        System.out.println("Payload size:" + payload.length + " bytes");

        // Total number of chunks, based on chunk size and message size
        // (one extra chunk if there is a remainder)
        final int chunkCount = (payloadSize / chunkSize)
                               + (0 == (payloadSize % chunkSize) ? 0 : 1);
        System.out.println("Chunk Count: " + chunkCount);

        int start = 0;
        int index = 0;
        int i = 0;
        List<MessageChunk> subArray = new ArrayList<>();

        while (start < payloadSize) {
            ++i;
            // The last chunk ends at the end of the payload
            final int end = (start + chunkSize) < payloadSize
                                ? start + chunkSize
                                : payloadSize;
            final int size = end - start;
            final byte[] chunk = new byte[size];

            // Copying data from payload to message chunk
            System.arraycopy(payload, start, chunk, 0, size);

            MessageChunk msc = new MessageChunk(uuid.toString(), ts,
                                                chunkCount, index, chunk);
            subArray.add(msc);
            System.out.println("The value of " + i
                               + "th message chunk size: "
                               + msc.getBytes().length);
            index++;
            start = end;
        }
        return subArray;
    }

    // A message envelope which holds
    // chunk data and metadata
    static class MessageChunk {
        private String msgName;
        private long timestamp;
        private int totalPart;
        private int index;
        private byte[] bytes;

        public MessageChunk(String msgName, long timestamp,
                            int totalPart, int index, byte[] bytes) {
            this.msgName = msgName;
            this.timestamp = timestamp;
            this.totalPart = totalPart;
            this.index = index;
            this.bytes = bytes;
        }

        // Message envelope name
        public String getMessage() { return msgName; }

        // Time at which message chunking takes place
        public long getTimestamp() { return timestamp; }

        // Total number of parts into which the message is divided
        public int getTotalPart() { return totalPart; }

        // Position of this chunk in the sequence (zero-based)
        public int getIndex() { return index; }

        // Actual chunk data in bytes
        public byte[] getBytes() { return bytes; }

        @Override
        public String toString() {
            return "MessageChunk{"
                + "msgName='" + msgName + '\''
                + ", timestamp=" + timestamp
                + ", totalPart=" + totalPart
                + ", index=" + index
                + ", bytes=" + new String(bytes) + '}';
        }
    }
}
Output
Payload size:37 bytes
Chunk Count: 4
The value of 1th message chunk size: 10
The value of 2th message chunk size: 10
The value of 3th message chunk size: 10
The value of 4th message chunk size: 7
Chunk-1:MessageChunk{msgName='48148124-7557-467b-b507-2ca65e17cc6d', timestamp=1679749393594, totalPart=4, index=0, bytes=The world }
Chunk-2:MessageChunk{msgName='48148124-7557-467b-b507-2ca65e17cc6d', timestamp=1679749393594, totalPart=4, index=1, bytes=is beautif}
Chunk-3:MessageChunk{msgName='48148124-7557-467b-b507-2ca65e17cc6d', timestamp=1679749393594, totalPart=4, index=2, bytes=ul and so }
Chunk-4:MessageChunk{msgName='48148124-7557-467b-b507-2ca65e17cc6d', timestamp=1679749393594, totalPart=4, index=3, bytes=are we.}
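The index and totalPart fields in each envelope exist so that a receiver can put the chunks back in order. The following is a minimal sketch of that reverse step, not part of the original program: ChunkReassembler, its nested Chunk class, and the reassemble method are hypothetical names, and Chunk is a reduced stand-in for MessageChunk that keeps only the fields reassembly needs. It assumes all chunks passed in belong to the same message (in the article's envelope, the same msgName) and that the text is encoded as UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ChunkReassembler {

    // Hypothetical stand-in for MessageChunk: only the fields needed here.
    static class Chunk {
        final int totalPart;
        final int index;
        final byte[] bytes;

        Chunk(int totalPart, int index, byte[] bytes) {
            this.totalPart = totalPart;
            this.index = index;
            this.bytes = bytes;
        }
    }

    // Rebuilds the payload from chunks that may arrive in any order.
    // Assumption: every chunk belongs to the same original message.
    static byte[] reassemble(List<Chunk> chunks) {
        if (chunks.isEmpty()) {
            throw new IllegalArgumentException("no chunks to reassemble");
        }
        final int total = chunks.get(0).totalPart;
        if (total <= 0) {
            throw new IllegalArgumentException("invalid totalPart: " + total);
        }
        byte[][] parts = new byte[total][];
        for (Chunk c : chunks) {
            if (c.totalPart != total || c.index < 0 || c.index >= total) {
                throw new IllegalArgumentException("inconsistent chunk at index " + c.index);
            }
            if (parts[c.index] != null) {
                throw new IllegalArgumentException("duplicate chunk " + c.index);
            }
            parts[c.index] = c.bytes;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < total; i++) {
            if (parts[i] == null) {
                throw new IllegalStateException("missing chunk " + i);
            }
            out.write(parts[i], 0, parts[i].length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] payload = "The world is beautiful and so are we."
            .getBytes(StandardCharsets.UTF_8);
        int chunkSize = 10;
        int total = (payload.length + chunkSize - 1) / chunkSize;

        // Split the payload, then shuffle to simulate out-of-order arrival.
        List<Chunk> chunks = new ArrayList<>();
        for (int index = 0; index < total; index++) {
            int start = index * chunkSize;
            int end = Math.min(start + chunkSize, payload.length);
            chunks.add(new Chunk(total, index, Arrays.copyOfRange(payload, start, end)));
        }
        Collections.shuffle(chunks);

        System.out.println(new String(reassemble(chunks), StandardCharsets.UTF_8));
    }
}
```

Placing each chunk into an array slot by its index, rather than sorting the list, makes missing and duplicate chunks easy to detect before the bytes are joined. A real consumer would also need to group incoming chunks by message name and decide how long to wait for late chunks, which this sketch leaves out.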