Kafka message too long details

Time: 2020-02-16

The size of messages sent by Kafka

⚠ The Kafka version used in this experiment is 2.11

Message overview

A message in Kafka corresponds to a ProducerRecord. Besides the payload being sent, it also carries the following fields (a construction sketch follows the list):

  • Topic the record is sent to
  • Partition the record is sent to
  • Headers (header information)
  • Key data
  • Value data
  • Timestamp (a long)
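
For reference, a minimal sketch constructing a ProducerRecord that populates every field above (the topic name, key, value, and header are made up for illustration):

import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.internals.RecordHeaders;

ProducerRecord<String, String> record = new ProducerRecord<>(
        "topic",                    // topic the record is sent to
        0,                          // partition (use null to let the partitioner decide)
        System.currentTimeMillis(), // timestamp (long)
        "key",                      // key data
        "value",                    // value data
        new RecordHeaders().add("trace-id", "42".getBytes())); // headers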

Producer: sending a message that is too long

When a producer sends a message, not all of the information above counts toward the size of the sent message. The check happens inside the producer itself, as sketched below.
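
A simplified sketch of the relevant lines in KafkaProducer#doSend (based on the Kafka 2.x source; exact signatures vary by version):

// Inside KafkaProducer#doSend, simplified: serialize the key and value first.
byte[] serializedKey = keySerializer.serialize(record.topic(), record.headers(), record.key());
byte[] serializedValue = valueSerializer.serialize(record.topic(), record.headers(), record.value());

// Estimate the record size from the serialized key, value, and headers,
// then validate it against the producer limits before appending to a batch.
int serializedSize = AbstractRecords.estimateSizeInBytesUpperBound(
        apiVersions.maxUsableProduceMagic(), compressionType,
        serializedKey, serializedValue, record.headers().toArray());
ensureValidRecordSize(serializedSize);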

That code serializes the key and value into byte arrays; the topic, headers, key, and value all feed into the estimated serialized size. The method that verifies whether the record exceeds the allowed length is ensureValidRecordSize(serializedSize).

ensureValidRecordSize checks two limits: maxRequestSize (max.request.size) and totalMemorySize (buffer.memory). A message can be sent normally only when its serialized size is below both at the same time.

private void ensureValidRecordSize(int size) {
    if (size > this.maxRequestSize)
        throw new RecordTooLargeException("The message is " + size +
                " bytes when serialized which is larger than the maximum request size you have configured with the " +
                ProducerConfig.MAX_REQUEST_SIZE_CONFIG +
                " configuration.");
    if (size > this.totalMemorySize)
        throw new RecordTooLargeException("The message is " + size +
                " bytes when serialized which is larger than the total memory buffer you have configured with the " +
                ProducerConfig.BUFFER_MEMORY_CONFIG +
                " configuration.");
}
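
Both limits correspond to producer configuration entries. A sketch of setting them explicitly (the values shown are the defaults):

Properties props = new Properties();
// max.request.size: per-record upper bound, default 1048576 bytes (1 MB)
props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 1048576);
// buffer.memory: total memory of the send buffer, default 33554432 bytes (32 MB)
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 33554432L);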

If a single message is too long, the producer raises a RecordTooLargeException with one of the messages shown in the code above.

One point to note: if you simply fire-and-forget the send, neither registering a Callback nor reading the returned Future, the producer gives no visible indication that the message was too long.

Receiving results with a Future

Future<RecordMetadata> send = kafkaProducer.send(new ProducerRecord<>("topic", "key", "value"));
RecordMetadata recordMetadata = send.get();
System.out.println(recordMetadata);

The get() method of the Future interface declares @throws ExecutionException: if the computation threw an exception, get() rethrows it wrapped in an ExecutionException.

/**
 * Waits if necessary for the computation to complete, and then
 * retrieves its result.
 *
 * @return the computed result
 * @throws CancellationException if the computation was cancelled
 * @throws ExecutionException if the computation threw an
 * exception
 * @throws InterruptedException if the current thread was interrupted
 * while waiting
 */
V get() throws InterruptedException, ExecutionException;
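
A minimal sketch of surfacing a send failure through the Future (assuming the kafkaProducer from the snippet above):

try {
    RecordMetadata metadata = kafkaProducer.send(new ProducerRecord<>("topic", "key", "value")).get();
    System.out.println(metadata);
} catch (ExecutionException e) {
    // The cause is the exception raised during the send, e.g. RecordTooLargeException.
    e.getCause().printStackTrace();
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}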

Monitoring with a Callback

First, look at the interface Kafka defines specifically for send callbacks:

// Generally used for asynchronous callbacks: invoked once the server has acknowledged the sent record.
// Exactly one of the two parameters is non-null: if no exception occurred, metadata carries the data; otherwise it is the opposite.
public void onCompletion(RecordMetadata metadata, Exception exception);

kafkaProducer.send(new ProducerRecord<>("topic", "key", "value"), new Callback() {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        if (exception != null) {
            exception.printStackTrace();
        }
    }
});

Log level: DEBUG

Setting the log level to DEBUG will also cause this warning to be printed to standard output.

Future and callback summary

Comparing the two approaches: Future comes from the standard java.util.concurrent library and is not designed specifically for Kafka, so you must explicitly block and catch the exception yourself. The Callback interface is Kafka's own callback mechanism, so prefer it whenever possible.

Server-side limit on received messages

The producer has a parameter restricting message size, and the server has one as well: message.max.bytes, whose default value is 1000012 bytes (just under 1 MB), so by default the server cannot accept a full 1 MB of data. (In the new producer client, messages are always grouped into batches before being sent; see the RecordBatch interface for details.)

/**
 * A record batch is a container for records. In old versions of the record format (versions 0 and 1),
 * a batch consisted always of a single record if no compression was enabled, but could contain
 * many records otherwise. Newer versions (magic versions 2 and above) will generally contain many records
 * regardless of compression.
 * In short: in the old format an uncompressed batch held exactly one record, while the new
 * format generally packs many records into a batch regardless of compression.
 */
public interface RecordBatch extends Iterable<Record> {
    ...
}

Set the message size received by the broker

To change the maximum message size the broker can accept, add message.max.bytes=100000 to the broker's server.properties file; adjust the value to whatever you need. The unit is bytes.

What happens when the producer's message is larger than the broker's limit

What happens if the producer's maximum send size is set to 1 MB while the broker's maximum message size is set to 512 KB?
The answer: the broker rejects the message and the producer receives a RecordTooLargeException; the message is never stored, so consumers will never see it. The error reads: org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept.
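
A sketch of that mismatch scenario, assuming the broker's server.properties contains message.max.bytes=524288 (512 KB); the topic name and payload size are made up:

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 1024 * 1024); // client-side limit: 1 MB

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
// 700 KB payload: passes the client-side check but exceeds the broker's 512 KB limit.
String payload = new String(new char[700 * 1024]).replace('\0', 'a');
producer.send(new ProducerRecord<>("topic", payload), (metadata, exception) -> {
    if (exception != null)
        exception.printStackTrace(); // org.apache.kafka.common.errors.RecordTooLargeException
});
producer.flush();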

Consumer-side message restrictions

Consumers also place limits on messages. Here are three parameters that govern fetching:

  • fetch.max.bytes — the maximum amount of data the server returns for a single fetch (aggregated across multiple messages)
  • fetch.min.bytes — the minimum amount of data the server returns for a single fetch
  • fetch.max.wait.ms — the maximum time the server waits before responding

If the time set by fetch.max.wait.ms elapses, the server responds even if the accumulated message data has not reached the value set by fetch.min.bytes.

fetch.max.bytes set too small

What happens if fetch.max.bytes is set too small? Is a record that exceeds the limit simply never returned? The official documentation answers this:

The maximum amount of data the server should return for a fetch request. Records are fetched in batches by the consumer, and if the first record batch in the first non-empty partition of the fetch is larger than this value, the record batch will still be returned to ensure that the consumer can make progress.

In other words, fetch.max.bytes caps the total amount of message data the server returns. Records are returned to the consumer in batches, and if the first record batch in the first non-empty partition is larger than this value, the batch is still returned so the consumer can make progress.

The conclusion: this consumer parameter only affects how much data each fetch reads; it does not prevent an oversized message from being consumed.

Practice: fetch.max.bytes set too small

properties.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 1024);   // fetch.max.bytes
properties.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1024);   // fetch.min.bytes
properties.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 1);    // fetch.max.wait.ms
...
while (true) {
    ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofSeconds(Integer.MAX_VALUE));
    System.out.println(records.count());
}

Start a consumer with the three parameters above, which set the minimum and maximum fetch sizes and the maximum time a fetch may wait; then print the number of records returned by each poll to standard output.

Experimental result: because every message sent is larger than 1024 bytes, each fetch returns only a single record, so the program prints 1 on every iteration.
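
For completeness, a sketch of the producer side that could drive this experiment (the payload size is made up; anything over 1024 bytes behaves the same):

// Each value is 2 KB, larger than the consumer's fetch.max.bytes of 1024,
// so every fetch returns one record batch containing a single record.
String value = new String(new char[2048]).replace('\0', 'a');
for (int i = 0; i < 10; i++)
    kafkaProducer.send(new ProducerRecord<>("topic", Integer.toString(i), value));
kafkaProducer.flush();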