Kafka: Producer (0.10.0.0)

Time: 2021-8-24

  1. Getting started with the producer API

     ProducerRecord description
  2. Asynchronous sending process

     2.1 The user thread calls send to put the record into the BufferPool
     2.2 Sending scheduling
  3. Producer design description

  4. Producer Configuration

1. Introduction to the producer API

KafkaProducer is the client API for sending records to a Kafka cluster. The class is thread safe; the common practice is for all threads in an application that send to the same cluster to share a single producer instance. If your program sends messages to multiple clusters, you need one producer per cluster.
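A producer is typically built from a `java.util.Properties` object. A minimal configuration sketch follows; the broker address is a placeholder, and since constructing the actual KafkaProducer requires the kafka-clients dependency and a reachable cluster, only the Properties are shown here.

```java
import java.util.Properties;

// Minimal producer configuration sketch. The broker address is a placeholder;
// the serializer class names are the built-in String serializers shipped with
// the Java client.
public class ProducerConfigSketch {
    static Properties baseConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        // With these Properties one would call: new KafkaProducer<String, String>(baseConfig())
        System.out.println(baseConfig().getProperty("bootstrap.servers"));
    }
}
```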


ProducerRecord description
The class that represents a message to be sent is ProducerRecord.


A record usually includes five fields:
·topic: the topic the record is sent to [required]
·partition: the partition the record is sent to [optional]
·key: a key for the record [optional]
·value: the content of the record [required]
·timestamp: a timestamp [optional]

By default:
If the user specifies a partition, the record is sent to that partition. If no partition is given, the partition is derived from the key. If no key is given either, the producer picks a partition in a round-robin-like fashion.
On the producer side, if the user specifies a timestamp, the record uses it; otherwise the producer's current time is used. On the broker side, if the topic's timestamp type is CreateTime, the timestamp carried in the record from the producer is kept; if it is LogAppendTime, the timestamp is rewritten when the broker writes the record to the log file.
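These defaulting rules can be sketched in plain Java. The names here are illustrative, not the actual client classes, and the real client hashes the key with murmur2; `Arrays.hashCode` stands in only to show the shape of the logic.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of the record defaulting rules described above.
public class RecordDefaults {
    private static final AtomicInteger ROUND_ROBIN = new AtomicInteger(0);

    // Partition: an explicit choice wins; else a hash of the key bytes;
    // else a round-robin-like fallback.
    static int choosePartition(Integer explicit, byte[] keyBytes, int numPartitions) {
        if (explicit != null) return explicit;
        if (keyBytes != null) return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
        return ROUND_ROBIN.getAndIncrement() % numPartitions;
    }

    // Timestamp: the user-supplied value, or the producer's current time.
    static long chooseTimestamp(Long explicit) {
        return explicit != null ? explicit : System.currentTimeMillis();
    }

    public static void main(String[] args) {
        System.out.println(choosePartition(2, null, 4));                // explicit partition wins
        System.out.println(choosePartition(null, "key".getBytes(), 4)); // keyed: deterministic
        System.out.println(chooseTimestamp(123L));                      // user-supplied timestamp
    }
}
```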


2. Asynchronous sending process

2.1 The user thread calls send to put the record into the BufferPool

Earlier Kafka client versions also offered a synchronous send. In this version (0.10.0.0), KafkaProducer#send() is asynchronous and returns a Future; callers who need synchronous behavior block on the returned Future. When the user sends a record using KafkaProducer#send(), the execution process is:

1. The interceptor chain processes the ProducerRecord before it is sent
The interceptor interface is ProducerInterceptor. Users can provide their own interceptor implementations.

The interceptor chain is initialized when the producer object is created and does not change afterwards. The interceptors in the chain are therefore shared by all sending threads; keep this in mind when writing custom interceptors.
ProducerInterceptor has two methods:
·onSend: executed when KafkaProducer#send is called.
·onAcknowledgement: called when the send fails, or when it succeeds (the broker acknowledges the record to the producer).

This stage executes the onSend method.
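The chain idea can be sketched without the Kafka interfaces: each interceptor may transform the record before passing it to the next one. The names below are illustrative; the real onSend receives and returns a ProducerRecord.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Illustrative sketch of the interceptor-chain idea, not the real Kafka
// interfaces: each hook may transform the record before the next one runs.
public class ChainDemo {
    static class InterceptorChain<R> {
        private final List<UnaryOperator<R>> onSendHooks;
        InterceptorChain(List<UnaryOperator<R>> hooks) { this.onSendHooks = hooks; }
        R onSend(R record) {
            R current = record;
            for (UnaryOperator<R> hook : onSendHooks) current = hook.apply(current);
            return current;
        }
    }

    // Applies an auditing hook, then an upper-casing hook, in chain order.
    static String demo(String payload) {
        InterceptorChain<String> chain = new InterceptorChain<>(
                List.of(r -> r + "|audited", r -> r.toUpperCase()));
        return chain.onSend(payload);
    }

    public static void main(String[] args) {
        System.out.println(demo("payload")); // PAYLOAD|AUDITED
    }
}
```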

2. Fetch broker cluster metadata, blocking if necessary
The broker information obtained via RPC is encapsulated in a Metadata class. It contains the essential state of the cluster, such as all broker information (id, host, port, etc.), all topic names, and each topic's partitions (id, leader node, replica nodes, ISR nodes, etc.).
Although this step can block, not every send triggers an RPC. Metadata is cached on the producer side and refreshed only when the topic specified in the record is unknown or when the metadata refresh interval expires.

3. Serialize the record's key and value
There is little to say about this step. Built-in serializers cover String, Integer, Long, Double, bytes, ByteBuffer and byte[].
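What a String serializer does can be sketched as follows; this mirrors the behavior of the built-in String serializer, but the class name is illustrative.

```java
import java.nio.charset.StandardCharsets;

// Sketch of what the built-in String serializer does: turn the value into a
// byte array before batching. The class name is illustrative.
public class SimpleStringSerializer {
    byte[] serialize(String topic, String data) {
        return data == null ? null : data.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(new SimpleStringSerializer().serialize("demo-topic", "hello").length); // 5
    }
}
```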

4. Set the record's partition
As mentioned earlier, partition is optional when creating a ProducerRecord. If the user did not specify one, the partition is computed by the partition calculation tool (the Partitioner interface), which can be customized. The built-in implementation works as follows:
·If a key is present, the partition is derived from a hash of the serialized key bytes, modulo the partition count.
·If no key is present, partitions are chosen in a round-robin-like manner (in practice the result is not a perfect rotation, but close to random).

5. Verify that the record's size does not exceed the thresholds
MAX_REQUEST_SIZE_CONFIG = "max.request.size"
BUFFER_MEMORY_CONFIG = "buffer.memory"
If either limit is exceeded, an exception is thrown.
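A hypothetical standalone version of this check is sketched below; the real client performs an analogous validation internally. The limits used here, 1048576 (1 MB) for max.request.size and 33554432 (32 MB) for buffer.memory, are the documented defaults.

```java
// Hypothetical standalone version of the record-size check; names and
// structure are illustrative, not the client's internal code.
public class SizeCheck {
    static void ensureValidRecordSize(int size, int maxRequestSize, long bufferMemory) {
        if (size > maxRequestSize)
            throw new IllegalArgumentException("record exceeds max.request.size: " + size);
        if (size > bufferMemory)
            throw new IllegalArgumentException("record exceeds buffer.memory: " + size);
    }

    public static void main(String[] args) {
        ensureValidRecordSize(1024, 1048576, 33554432L); // small record: accepted
        try {
            ensureValidRecordSize(2 * 1048576, 1048576, 33554432L);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage()); // oversized record
        }
    }
}
```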

6. Set the record's timestamp
If the user did not specify a timestamp when creating the ProducerRecord, it is set to the producer's current time.
Internally, the Java client defines a Time interface for obtaining the time, with a built-in implementation SystemTime; setting the record timestamp to the current time is done through SystemTime. To use a different time source in the producer, supply your own Time implementation.

7. Compress the record and put it into the BufferPool
This step is done by the RecordAccumulator, which maintains a double-ended queue Deque<RecordBatch> for each partition. The elements of the queue are RecordBatch instances (a RecordBatch packs, and optionally compresses, multiple records together). The RecordAccumulator's job is to append the record to the tail RecordBatch of the deque associated with the record's partition.

As for compression, the Kafka producer supports several modes:
·none: no compression.
·gzip: usually the highest compression ratio, at a higher CPU cost.
·snappy: fast, with a moderate compression ratio.
·lz4: fast, with a compression ratio comparable to snappy.
(The actual ratios depend heavily on the data being sent.)

Placing the record into the last RecordBatch of the deque works as follows: if the last RecordBatch has room, the record goes there; otherwise a new RecordBatch is created.
RecordBatches are stored in the BufferPool, so this process effectively puts the record into the BufferPool. When the BufferPool is created, its total size, the size of each RecordBatch, and other settings are specified.
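The append-or-create logic above can be sketched as follows. Batch capacity is counted in records here for brevity, whereas the real client limits batches by bytes; all names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified sketch of the accumulator append logic: reuse the last batch if
// it has room, otherwise start a new one.
public class AccumulatorSketch {
    private final Deque<List<byte[]>> deque = new ArrayDeque<>();
    private final int batchCapacity;

    AccumulatorSketch(int batchCapacity) { this.batchCapacity = batchCapacity; }

    /** Appends a record; returns true when a new batch had to be created. */
    boolean append(byte[] record) {
        List<byte[]> last = deque.peekLast();
        if (last == null || last.size() >= batchCapacity) {
            List<byte[]> batch = new ArrayList<>();
            batch.add(record);
            deque.addLast(batch);
            return true;
        }
        last.add(record);
        return false;
    }

    int batchCount() { return deque.size(); }

    public static void main(String[] args) {
        AccumulatorSketch acc = new AccumulatorSketch(2);
        acc.append(new byte[]{1});
        acc.append(new byte[]{2});
        acc.append(new byte[]{3}); // the full tail batch forces a second batch
        System.out.println(acc.batchCount()); // 2
    }
}
```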

8. Wake up the sending module
By this point, KafkaProducer#send's processing is basically complete. The purpose of this step is to wake up the NIO selector in the sender thread.

In addition, if any of steps 2 to 8 above fails, an exception is thrown. KafkaProducer catches it, records it in the error metrics (a Sensor), and notifies the interceptor chain from step 1 (onAcknowledgement / onSendError) so the user is informed.

public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
    // intercept the record, which can be potentially modified; this method does not throw exceptions
    ProducerRecord<K, V> interceptedRecord = this.interceptors == null ? record : this.interceptors.onSend(record);
    return doSend(interceptedRecord, callback);
}

/**
 * Implementation of asynchronously send a record to a topic. Equivalent to <code>send(record, null)</code>.
 * See {@link #send(ProducerRecord, Callback)} for details.
 */
private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
    TopicPartition tp = null;
    try {
        // first make sure the metadata for the topic is available
        long waitedOnMetadataMs = waitOnMetadata(record.topic(), this.maxBlockTimeMs);
        long remainingWaitMs = Math.max(0, this.maxBlockTimeMs - waitedOnMetadataMs);
        byte[] serializedKey;
        try {
            serializedKey = keySerializer.serialize(record.topic(), record.key());
        } catch (ClassCastException cce) {
            throw new SerializationException("Can't convert key of class " + record.key().getClass().getName() +
                    " to class " + producerConfig.getClass(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG).getName() +
                    " specified in key.serializer");
        }
        byte[] serializedValue;
        try {
            serializedValue = valueSerializer.serialize(record.topic(), record.value());
        } catch (ClassCastException cce) {
            throw new SerializationException("Can't convert value of class " + record.value().getClass().getName() +
                    " to class " + producerConfig.getClass(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG).getName() +
                    " specified in value.serializer");
        }
        int partition = partition(record, serializedKey, serializedValue, metadata.fetch());
        int serializedSize = Records.LOG_OVERHEAD + Record.recordSize(serializedKey, serializedValue);
        ensureValidRecordSize(serializedSize);
        tp = new TopicPartition(record.topic(), partition);
        long timestamp = record.timestamp() == null ? time.milliseconds() : record.timestamp();
        log.trace("Sending record {} with callback {} to topic {} partition {}", record, callback, record.topic(), partition);
        // producer callback will make sure to call both 'callback' and interceptor callback
        Callback interceptCallback = this.interceptors == null ? callback : new InterceptorCallback<>(callback, this.interceptors, tp);
        RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey, serializedValue, interceptCallback, remainingWaitMs);
        if (result.batchIsFull || result.newBatchCreated) {
            log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
            this.sender.wakeup();
        }
        return result.future;
        // handling exceptions and record the errors;
        // for API exceptions return them in the future,
        // for other exceptions throw directly
    } catch (ApiException e) {
        log.debug("Exception occurred during message send:", e);
        if (callback != null)
            callback.onCompletion(null, e);
        this.errors.record();
        if (this.interceptors != null)
            this.interceptors.onSendError(record, tp, e);
        return new FutureFailure(e);
    } catch (InterruptedException e) {
        this.errors.record();
        if (this.interceptors != null)
            this.interceptors.onSendError(record, tp, e);
        throw new InterruptException(e);
    } catch (BufferExhaustedException e) {
        this.errors.record();
        this.metrics.sensor("buffer-exhausted-records").record();
        if (this.interceptors != null)
            this.interceptors.onSendError(record, tp, e);
        throw e;
    } catch (KafkaException e) {
        this.errors.record();
        if (this.interceptors != null)
            this.interceptors.onSendError(record, tp, e);
        throw e;
    } catch (Exception e) {
        // we notify interceptor about all exceptions, since onSend is called before anything else in this method
        if (this.interceptors != null)
            this.interceptors.onSendError(record, tp, e);
        throw e;
    }
}

2.2 Sending scheduling

KafkaProducer#send only places the record into the BufferPool; it does not transmit it. The actual sending is scheduled by a separate thread (the Sender).
The Sender's execution process is:
1. Collect the records that are ready
This step checks whether the records waiting to be sent are ready: using the metadata maintained by KafkaProducer, it verifies that a leader node is known for each record's partition. If any leader is unknown, the metadata is marked as needing an update and such records are considered not ready, since every record must be sent to the leader of its partition.

2. Take out the RecordBatches and drop the expired ones
For expired RecordBatches, the interceptors are notified of the send failure and the failure is recorded in the error metrics.

3. Create a request for each RecordBatch to send
One RecordBatch maps to one ClientRequest.

4. Track the request and send it
The request object is kept in an in-flight-requests collection, which holds the requests currently being sent as a map from node to a deque of requests. When the send succeeds or fails, the request is removed.

5. Process the send result
If the send failed, a retry is attempted and the interceptors are notified of the failure.
If the send succeeded, the interceptors are likewise notified (onAcknowledgement).


3. Producer design description

The processing flow above reveals several designs in the Java client:
1. Interceptor chain: an extension point for custom plug-ins.
2. Metadata: the producer fetches the latest cluster state both on demand and periodically. With this information it can send each RecordBatch directly to the leader of the relevant partition; in other words, load balancing is completed on the client.
3. Partitioner: the partition-selection tool; together with metadata it completes the load balancing.
4. RecordBatch: records are batched (and optionally compressed) into RecordBatches on the client and sent one batch at a time, reducing the number of I/O operations and improving performance.
5. Asynchronous sending: improves the performance of user applications.


4. Producer Configuration

As explained at the beginning of the article, using the Kafka producer Java client only requires creating a KafkaProducer. At runtime it relies on many configuration items, which are resolved during KafkaProducer initialization.
Let's look at the main configuration items in the Java client:

·bootstrap.servers
Configures host/port pairs of brokers in the cluster. One or more entries may be given; you do not need to list every broker, because the client discovers the rest automatically.
Multiple entries are separated by commas: host1:port1,host2:port2,host3:port3

·key.serializer, value.serializer
The serializer class names. The specified classes must implement the Serializer interface.

·acks
To ensure that a message record is successfully received, the Kafka producer asks the broker to acknowledge the completion of each request (the request that carries a RecordBatch).
Kafka brokers support three acknowledgement modes:
1) no acknowledgement required;
2) the leader acknowledges on receipt;
3) acknowledge after all in-sync replicas have copied the record.
These represent increasing acknowledgement granularity. The Java producer client supports all three via the configuration values 0, 1 and -1; the value all is also accepted and is equivalent to -1.
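The mapping from the configured string to the numeric value can be sketched as follows; this is an illustrative helper, not the real client code.

```java
// Illustrative helper showing the acks value mapping described above:
// "all" is an alias for -1; "0" and "1" map to their numeric values.
public class AcksMapping {
    static short parseAcks(String acks) {
        return "all".equals(acks) ? (short) -1 : Short.parseShort(acks);
    }

    public static void main(String[] args) {
        System.out.println(parseAcks("all")); // -1
        System.out.println(parseAcks("1"));   // 1
    }
}
```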

How does the Kafka producer Java client implement these three acknowledgement modes?
1. When the request for a RecordBatch is created, the acks value is encapsulated in the request header.
2. After the request is sent (before the broker's response arrives), the client immediately checks the acks value to decide whether confirmation is needed (i.e. whether the RecordBatch must be acknowledged by the broker). If acks is 0, no confirmation is required and the request is considered successfully completed; since it is deemed successful, there will be no retry.
If acks is not 0, the client waits for the broker's response and judges success or failure from it.

The default value is 1, i.e. the leader responds after receiving the record.

·buffer.memory
The BufferPool size, i.e. the amount of memory for records waiting to be sent. The default is 33554432 bytes (32 MB).
The unit is bytes; the range is [0, ...].

·compression.type
Kafka provides a variety of compression types. There are four optional values: none, gzip, snappy, lz4. The default value is none.

·retries
When a RecordBatch fails to send, it can be sent again to ensure delivery. This setting is the number of retries, in the range [0, Integer.MAX_VALUE]. With 0, a failed batch is not retransmitted.
If retries are enabled (retries > 0) but max.in.flight.requests.per.connection is not set to 1, record order may change. For example, if the sender thread of a producer client sends two RecordBatches for the same partition in one poll, they go to the same broker over the same TCP connection. If RecordBatch1 is sent first but fails, while RecordBatch2, sent right after it, succeeds, RecordBatch1 will then be resent; the broker ends up receiving RecordBatch2 before RecordBatch1.
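Following that reasoning, a configuration intended to keep strict ordering combines retries with a single in-flight request per connection. A hedged sketch (the broker address is a placeholder):

```java
import java.util.Properties;

// Sketch of a configuration intended to preserve record order, per the
// reasoning above: retries enabled, but only one request in flight per
// connection. The broker address is a placeholder.
public class OrderedProducerConfig {
    static Properties orderedConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("retries", "3");
        props.put("max.in.flight.requests.per.connection", "1"); // keeps ordering across retries
        return props;
    }

    public static void main(String[] args) {
        System.out.println(orderedConfig().getProperty("max.in.flight.requests.per.connection")); // 1
    }
}
```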

·ssl.key.password
Password for the private key in the keystore file. Optional.

·ssl.keystore.location
The location of the keystore file. Optional.

·ssl.keystore.password
Password for keystore file. Optional.

·ssl.truststore.location
The location of the trust store file. Optional.

·ssl.truststore.password
Password for the trust store file. Optional.

·batch.size
Maximum capacity of a RecordBatch, in bytes. The default is 16384 (16 KB).

·client.id
A logical name the client passes to the broker with each request. The default is the empty string.

·connections.max.idle.ms
Maximum idle time for a connection. The default is 540000 (9 min).

·linger.ms
How long the producer waits for additional records to accumulate before sending a batch, trading latency for throughput. The default is 0, i.e. no added delay.

·max.block.ms
When KafkaProducer#send is executed before the required metadata is available (for example, the record's topic has no cached metadata on the client), the producer blocks waiting for the metadata to arrive. To avoid waiting forever, this setting bounds the blocking time. Range: [0, Long.MAX_VALUE].

·max.request.size
Maximum request length, verified before the record is compressed into a RecordBatch. Exceeding this size throws an exception.

·partitioner.class
Used to customize the partitioner algorithm. The default value is:
org.apache.kafka.clients.producer.internals.DefaultPartitioner

·receive.buffer.bytes
The size of the TCP receive buffer. Range: [-1, ...]. The default is 32768 (32 KB).
If set to -1, the operating system default is applied.

·request.timeout.ms
Maximum request duration. After a request is sent, the client waits for the broker's response; if it exceeds this time, the request is considered failed.

·timeout.ms
Configures how long the leader waits for acknowledgements from followers. This timeout is unrelated to the network round-trip of the producer's request.

·block.on.buffer.full
When the BufferPool is exhausted but the client keeps sending records via KafkaProducer, the BufferPool must either block or throw an exception.
The default is false: when the BufferPool is full, no BufferExhaustedException is thrown; instead the call blocks according to max.block.ms, and a TimeoutException is thrown on timeout.

If set to true, max.block.ms is effectively treated as Long.MAX_VALUE, and metadata.fetch.timeout.ms no longer takes effect.

·interceptor.classes
Custom interceptor classes. By default no interceptors are registered.

·max.in.flight.requests.per.connection
The maximum number of unacknowledged requests per connection. The default is 5. Range: [1, Integer.MAX_VALUE].

·metric.reporters
Implementation classes of MetricsReporter. By default JmxReporter is registered automatically.

·metrics.num.samples
The number of samples maintained to compute metrics. The default is 2. Range: [1, Integer.MAX_VALUE].

·metrics.sample.window.ms
The time window for metric samples. The default is 30000 (30 s). Range: [0, Long.MAX_VALUE].