[big data practice] Kafka producer programming (2) – producer sending process

Time:2019-12-14

Preface

In the previous article, [big data practice] Kafka producer programming (1) – Kafka producer detailed explanation, I explained the functions of the KafkaProducer class in detail, but only covered some of its methods and did not go into the principles and mechanisms behind the producer. This article therefore tries to walk through the whole sending process of the Kafka producer. I am still learning Kafka myself, so some details may be inaccurate; corrections are welcome.

Producer message sending process

[Figure: Kafka producer message sending process]

Note: picture source: https://blog.csdn.net/zhanglh

Construct the KafkaProducer object

The previous article introduced the KafkaProducer constructor in detail; it mainly configures the producer's options. The configuration items can be found in the class ProducerConfig:

package org.apache.kafka.clients.producer;

public class ProducerConfig extends AbstractConfig {
...
} 

In addition to simple values, some configuration items take class names, either classes shipped with Kafka or our own custom classes, such as:

  • key.serializer: the key serialization class. Kafka implements a series of commonly used serialization and deserialization classes in the package org.apache.kafka.common.serialization. To customize serialization, implement the interface org.apache.kafka.common.serialization.Serializer. For example, the serialization class for Integer:

     package org.apache.kafka.common.serialization;

     import java.util.Map;

     public class IntegerSerializer implements Serializer<Integer> {
         public IntegerSerializer() {
         }

         public void configure(Map<String, ?> configs, boolean isKey) {
         }

         public byte[] serialize(String topic, Integer data) {
             return data == null ? null : new byte[]{
                 (byte)(data.intValue() >>> 24),
                 (byte)(data.intValue() >>> 16),
                 (byte)(data.intValue() >>> 8),
                 data.byteValue()};
         }

         public void close() {
         }
     }
  • value.serializer: the value serialization class; it is configured in the same way as key.serializer.
  • partitioner.class: the partitioner class, which decides the partition of the topic that each message is sent to, so that messages are spread evenly across the partitions. Kafka's default partitioner is org.apache.kafka.clients.producer.internals.DefaultPartitioner. To customize the load-balancing algorithm, implement the org.apache.kafka.clients.producer.Partitioner interface.
  • interceptor.classes: the interceptor list. Interceptors let the user apply some processing logic to the message record before it is sent, or to the callback information before the producer callback method is executed. Define your own interceptors by implementing the org.apache.kafka.clients.producer.ProducerInterceptor interface.
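
As a rough illustration of the items above, the following sketch fills a java.util.Properties object using the plain string keys. The broker address and the com.example class names are hypothetical placeholders, not classes that exist in Kafka; the custom classes would have to implement Partitioner and ProducerInterceptor respectively.

```java
import java.util.Properties;

class ProducerConfigSketch {
    // Builds producer properties using the plain string configuration keys.
    static Properties buildProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.IntegerSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Hypothetical custom classes, named here only for illustration.
        props.put("partitioner.class", "com.example.MyPartitioner");
        // interceptor.classes takes a comma-separated list; interceptors run in this order.
        props.put("interceptor.classes", "com.example.AuditInterceptor,com.example.MetricsInterceptor");
        return props;
    }

    public static void main(String[] args) {
        buildProps().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The resulting Properties object would then be passed to the KafkaProducer constructor, as in the official example later in this article.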

Construct the ProducerRecord

A ProducerRecord is a message record: it holds the message to be sent to the Kafka cluster together with the topic and partition it is destined for.

public class ProducerRecord<K, V> {
    private final String topic;
    private final Integer partition;
    private final Headers headers;
    private final K key;
    private final V value;
    private final Long timestamp;
    ...
}

  • topic: required field; the topic to which the message record is sent.
  • value: required field; the message content.
  • partition: optional field; the partition to which the record is sent.
  • key: optional field; the key of the message record, which can be used to calculate the partition.
  • timestamp: optional field; the creation time of the message record. If not specified, the producer's current time is used by default.
  • headers: optional field; a list of key/value headers that carry additional application-defined metadata along with the record.

Send the ProducerRecord

Asynchronous send & synchronous send

When sending asynchronously, send() puts the message record into the send buffer and returns immediately; a separate thread is responsible for sending the buffered messages. For an asynchronous send you can pass a Callback, whose onCompletion method is called once the broker's acknowledgement (ack) is received or the send fails. To send synchronously, call get() on the Future returned by send(), which blocks until the result is available. The following official Kafka example shows both the asynchronous and the synchronous way of sending:

package kafka.examples;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.IntegerSerializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;
import java.util.concurrent.ExecutionException;

public class Producer extends Thread {
    private final KafkaProducer<Integer, String> producer;
    private final String topic;
    private final Boolean isAsync;

    public Producer(String topic, Boolean isAsync) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, KafkaProperties.KAFKA_SERVER_URL + ":" + KafkaProperties.KAFKA_SERVER_PORT);
        props.put(ProducerConfig.CLIENT_ID_CONFIG, "DemoProducer");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, IntegerSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producer = new KafkaProducer<>(props);
        this.topic = topic;
        this.isAsync = isAsync;
    }

    public void run() {
        int messageNo = 1;
        while (true) {
            String messageStr = "Message_" + messageNo;
            long startTime = System.currentTimeMillis();
            if (isAsync) { // Send asynchronously
                producer.send(new ProducerRecord<>(topic,
                    messageNo,
                    messageStr), new DemoCallBack(startTime, messageNo, messageStr));
            } else { // Send synchronously
                try {
                    producer.send(new ProducerRecord<>(topic,
                        messageNo,
                        messageStr)).get();
                    System.out.println("Sent message: (" + messageNo + ", " + messageStr + ")");
                } catch (InterruptedException | ExecutionException e) {
                    e.printStackTrace();
                }
            }
            ++messageNo;
        }
    }
}

class DemoCallBack implements Callback {

    private final long startTime;
    private final int key;
    private final String message;

    public DemoCallBack(long startTime, int key, String message) {
        this.startTime = startTime;
        this.key = key;
        this.message = message;
    }

    /**
     * A callback method the user can implement to provide asynchronous handling of request completion. This method will
     * be called when the record sent to the server has been acknowledged. Exactly one of the arguments will be
     * non-null.
     *
     * @param metadata  The metadata for the record that was sent (i.e. the partition and offset). Null if an error
     *                  occurred.
     * @param exception The exception thrown during processing of this record. Null if no error occurred.
     */
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        long elapsedTime = System.currentTimeMillis() - startTime;
        if (metadata != null) {
            System.out.println(
                "message(" + key + ", " + message + ") sent to partition(" + metadata.partition() +
                    "), " +
                    "offset(" + metadata.offset() + ") in " + elapsedTime + " ms");
        } else {
            exception.printStackTrace();
        }
    }
}

Interceptor chain processes the ProducerRecord

When the send method is called, the interceptor chain intercepts the ProducerRecord first: the onSend method of each interceptor is called to process the message record and return the processed ProducerRecord.
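
The chaining idea can be sketched without the Kafka classes. In the sketch below, each interceptor is modelled as a UnaryOperator<String> (a stand-in for ProducerInterceptor.onSend, chosen only to keep the example self-contained), and a failing interceptor is logged and skipped rather than aborting the send, which is how I understand the client to behave.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

class InterceptorChainSketch {
    // Applies each interceptor in order; the output of one is the input of the next.
    // A failing interceptor is logged and skipped so the record still moves on.
    static String onSend(List<UnaryOperator<String>> interceptors, String record) {
        String current = record;
        for (UnaryOperator<String> interceptor : interceptors) {
            try {
                current = interceptor.apply(current);
            } catch (RuntimeException e) {
                System.err.println("interceptor failed, continuing: " + e);
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> chain = Arrays.asList(
            value -> value.trim(),                                   // normalizes the payload
            value -> { throw new IllegalStateException("boom"); },   // a broken interceptor
            value -> "[audited] " + value);                          // tags the payload
        System.out.println(onSend(chain, "  Message_1  "));
    }
}
```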

Key and value serialization of the ProducerRecord

The configured key and value serialization classes are called to serialize the key and value of the ProducerRecord into byte arrays, which are what is actually sent.
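
To make the byte layout concrete, here is a small standalone sketch that reproduces the shifting used by the IntegerSerializer shown earlier and encodes a String value as UTF-8 (the default encoding of StringSerializer). The class and method names are my own and not part of Kafka.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SerializationSketch {
    // Same shifting as IntegerSerializer: 4 bytes, most significant byte first.
    static byte[] serializeInt(Integer data) {
        if (data == null) {
            return null;
        }
        int v = data;
        return new byte[] {(byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v};
    }

    // String values become UTF-8 bytes.
    static byte[] serializeString(String data) {
        return data == null ? null : data.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = serializeInt(1);
        byte[] value = serializeString("Message_1");
        System.out.println("key bytes: " + Arrays.toString(key));
        System.out.println("value length: " + value.length);
        // ByteBuffer reads big-endian by default, so this recovers the original int.
        System.out.println("decoded key: " + ByteBuffer.wrap(key).getInt());
    }
}
```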

Set the partition of the ProducerRecord

If the ProducerRecord does not already specify a partition, the partition of the topic that the message is sent to is calculated by the partition method of the DefaultPartitioner class or of the custom partitioner class specified in the configuration, and that partition is used for the record.
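
The decision order can be sketched as follows. This is a simplified model, not the DefaultPartitioner itself: Arrays.hashCode stands in for the murmur2 hash Kafka applies to the serialized key, and a simple counter stands in for whatever strategy the client uses for records without a key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

class PartitionChoiceSketch {
    private static final AtomicInteger COUNTER = new AtomicInteger();

    static int choosePartition(Integer explicitPartition, byte[] keyBytes, int numPartitions) {
        if (explicitPartition != null) {
            return explicitPartition; // a partition set on the record always wins
        }
        if (keyBytes != null) {
            // Stand-in hash; masking with 0x7fffffff keeps the value non-negative.
            return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
        }
        // No key: spread records over the partitions with a counter (simplification).
        return (COUNTER.getAndIncrement() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = "user-42".getBytes(StandardCharsets.UTF_8);
        System.out.println("explicit: " + choosePartition(2, key, 4));
        System.out.println("by key:   " + choosePartition(null, key, 4));
        System.out.println("no key:   " + choosePartition(null, null, 4));
    }
}
```

Because the partition is derived from the key bytes, records with the same key land in the same partition as long as the number of partitions does not change.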

Check whether the ProducerRecord size exceeds the limit

The serialized record size is checked against the configuration items max.request.size and buffer.memory; if it exceeds either of them, an exception (RecordTooLargeException) is thrown.
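
A minimal sketch of such a check is shown below. The method name and the use of IllegalArgumentException are my own simplifications (the client throws RecordTooLargeException); the two constants are the documented defaults of max.request.size and buffer.memory.

```java
class RecordSizeCheckSketch {
    static final int DEFAULT_MAX_REQUEST_SIZE = 1024 * 1024;      // max.request.size default (1 MiB)
    static final long DEFAULT_BUFFER_MEMORY = 32L * 1024 * 1024;  // buffer.memory default (32 MiB)

    // Rejects a record whose serialized size exceeds either limit.
    static void ensureValidRecordSize(int size, int maxRequestSize, long bufferMemory) {
        if (size > maxRequestSize) {
            throw new IllegalArgumentException(
                "record is " + size + " bytes, larger than max.request.size " + maxRequestSize);
        }
        if (size > bufferMemory) {
            throw new IllegalArgumentException(
                "record is " + size + " bytes, larger than buffer.memory " + bufferMemory);
        }
    }

    public static void main(String[] args) {
        ensureValidRecordSize(512, DEFAULT_MAX_REQUEST_SIZE, DEFAULT_BUFFER_MEMORY);
        System.out.println("512-byte record accepted");
        try {
            ensureValidRecordSize(2 * 1024 * 1024, DEFAULT_MAX_REQUEST_SIZE, DEFAULT_BUFFER_MEMORY);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```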

Set the ProducerRecord timestamp

If a timestamp was specified when the ProducerRecord was built, that timestamp is used; otherwise the current time is used.

Put the ProducerRecord in the buffer

When the ProducerRecord is put into the buffer (maintained by the RecordAccumulator), message records sent to the same partition of the same topic are bundled into batches and compressed into a ProducerBatch. That is, one ProducerBatch may contain multiple ProducerRecords. The purpose is to send multiple records in one request, which improves performance.

The RecordAccumulator maintains a double-ended queue (Deque) for each TopicPartition:

ConcurrentMap<TopicPartition, Deque<ProducerBatch>> batches;

The ProducerBatch objects for the same partition of the same topic are placed in the corresponding queue.
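
The batching behaviour can be modelled with a small standalone sketch. Everything here is a simplification under my own assumptions: topic-partitions are plain strings, a batch is a list of byte arrays with a byte limit (playing the role of batch.size), and compression is left out. The two flags in the result correspond to the conditions under which the Sender is woken up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class AccumulatorSketch {
    // A batch holds records for one topic-partition up to a byte limit.
    static class Batch {
        final List<byte[]> records = new ArrayList<>();
        final int limitBytes;
        int usedBytes = 0;

        Batch(int limitBytes) { this.limitBytes = limitBytes; }

        boolean tryAppend(byte[] record) {
            // An empty batch always accepts one record; otherwise respect the limit.
            if (!records.isEmpty() && usedBytes + record.length > limitBytes) {
                return false;
            }
            records.add(record);
            usedBytes += record.length;
            return true;
        }

        boolean isFull() { return usedBytes >= limitBytes; }
    }

    static class AppendResult {
        final boolean batchIsFull;
        final boolean newBatchCreated;

        AppendResult(boolean batchIsFull, boolean newBatchCreated) {
            this.batchIsFull = batchIsFull;
            this.newBatchCreated = newBatchCreated;
        }
    }

    final ConcurrentMap<String, Deque<Batch>> batches = new ConcurrentHashMap<>();
    final int batchSizeBytes;

    AccumulatorSketch(int batchSizeBytes) { this.batchSizeBytes = batchSizeBytes; }

    AppendResult append(String topicPartition, byte[] record) {
        Deque<Batch> deque = batches.computeIfAbsent(topicPartition, tp -> new ArrayDeque<>());
        synchronized (deque) {
            Batch last = deque.peekLast();
            boolean created = false;
            if (last == null || !last.tryAppend(record)) {
                // No batch yet, or the last one has no room: start a new batch at the tail.
                last = new Batch(batchSizeBytes);
                last.tryAppend(record);
                deque.addLast(last);
                created = true;
            }
            // More than one batch queued means the older one is complete and ready to send.
            return new AppendResult(deque.size() > 1 || last.isFull(), created);
        }
    }

    public static void main(String[] args) {
        AccumulatorSketch accumulator = new AccumulatorSketch(10);
        for (int i = 0; i < 3; i++) {
            AppendResult r = accumulator.append("demo-0", new byte[4]);
            System.out.println("full=" + r.batchIsFull + " new=" + r.newBatchCreated);
        }
    }
}
```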

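The compression strategies, selected with the compression.type configuration item, are:

·none: no compression.
·gzip: compression rate of about 50%.
·snappy: compression rate of about 50%.
·lz4: compression rate of about 50%.

The 50% figures are rough estimates; the actual compression rate depends on the message content.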
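
To see how much the result depends on the data, the following sketch gzips a batch of repetitive messages using only the JDK's java.util.zip. It does not use Kafka's own compression code, and the sample payload is made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class CompressionRatioSketch {
    // Compresses the input with gzip and returns the compressed bytes.
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Builds a made-up batch of similar-looking messages, one per line.
    static byte[] sampleBatch(int records) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= records; i++) {
            sb.append("Message_").append(i).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleBatch(1000);
        byte[] compressed = gzip(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes, ratio=%.2f%n",
            raw.length, compressed.length, (double) compressed.length / raw.length);
    }
}
```

Highly repetitive payloads like this one compress far better than random or already-compressed data, which is why batching records before compressing them pays off.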

Wake up the Sender

When a ProducerBatch is full or a new ProducerBatch has been created, the Sender thread, which actually sends the message records, is woken up and sends the ProducerBatch to the Kafka cluster.

The sending logic of the Sender is as follows:

  1. Check whether the Kafka cluster has a leader for the partition of each ProducerBatch to be sent. If it does, the batch is considered sendable; if no leader is available (for example, because of a problem on the server side), the batch is held back for the time being.
  2. Filter out expired ProducerBatch objects. An expired ProducerBatch is failed, and the sending failure is recorded through the sensors and reported to the interceptors.
  3. Send the batches.
  4. Process the sending result: call the onCompletion method of the callback and the onAcknowledgement method of the interceptors.
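
The four steps can be sketched as a single pass over the pending batches. This is a simplified model under my own assumptions, not the real Sender: topic-partitions are strings, the set of partitions with a known leader is passed in, timeoutMs is a hypothetical expiry limit, and the expiry check is applied to every batch before the leader check decides between sending and waiting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SenderPassSketch {
    static class Batch {
        final String topicPartition;
        final long createdMs;

        Batch(String topicPartition, long createdMs) {
            this.topicPartition = topicPartition;
            this.createdMs = createdMs;
        }
    }

    static class PassResult {
        final List<Batch> sent = new ArrayList<>();
        final List<Batch> expired = new ArrayList<>();
        final List<Batch> waiting = new ArrayList<>();
    }

    static PassResult runOnce(List<Batch> batches, Set<String> partitionsWithLeader,
                              long nowMs, long timeoutMs) {
        PassResult result = new PassResult();
        for (Batch batch : batches) {
            boolean leaderKnown = partitionsWithLeader.contains(batch.topicPartition); // step 1
            boolean isExpired = nowMs - batch.createdMs > timeoutMs;                   // step 2
            if (isExpired) {
                result.expired.add(batch); // would be failed and reported as an error
            } else if (leaderKnown) {
                result.sent.add(batch);    // steps 3 and 4: send, then handle the result
            } else {
                result.waiting.add(batch); // no leader yet: retry on a later pass
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Batch> pending = Arrays.asList(
            new Batch("demo-0", 0L), new Batch("demo-1", 800L), new Batch("demo-0", 900L));
        Set<String> leaders = new HashSet<>(Arrays.asList("demo-0"));
        PassResult r = runOnce(pending, leaders, 1000L, 500L);
        System.out.println("sent=" + r.sent.size() + " expired=" + r.expired.size()
            + " waiting=" + r.waiting.size());
    }
}
```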

Summary

This article walks through the general sending process of producer messages. Some parts are not yet fully understood or described in detail; as my understanding improves, this article will be revised and supplemented. The following articles will introduce the custom interceptor and the custom partitioner that can be configured when building the producer and that take part in the sending process.
