[Big Data Practice] Kafka producer programming (3) — interceptor & partitioner

Time: 2020-10-28

Preface

In the previous article, [Big Data Practice] Kafka producer programming (2) – the producer sending process, custom interceptors and custom partitioners were mentioned briefly but not explained in depth. This article therefore covers some of the theory behind interceptors and partitioners and shows how to implement both as custom classes.

Producer interceptor and interception chain

Interface to implement

An interceptor lets the user apply custom logic to a message record before it is sent, or to the acknowledgement before the producer's callback runs. A producer interceptor implements the following interface:

package org.apache.kafka.clients.producer;

import org.apache.kafka.common.Configurable;

public interface ProducerInterceptor<K, V> extends Configurable {
    ProducerRecord<K, V> onSend(ProducerRecord<K, V> record);

    void onAcknowledgement(RecordMetadata metadata, Exception exception);

    void close();
}
  • onSend(): called before the message record is sent; it can process the ProducerRecord and must return the (possibly modified) ProducerRecord that will actually be sent.
  • onAcknowledgement(): called before the callback passed to send() is executed; it can process the send result (the RecordMetadata on success, or the Exception on failure).
  • close(): called when producer.close() is invoked, giving the interceptor a chance to release resources.

Interceptor chain: ProducerInterceptors

ProducerInterceptors wraps a list of interceptors, List<ProducerInterceptor<K, V>>. When the producer sends a message, receives an acknowledgement, or closes, it calls the chain's onSend, onAcknowledgement, and close methods, each of which in turn invokes the corresponding method of every interceptor in the list, one by one, like successive stations on a production line.

Location of interception chain class:

package org.apache.kafka.clients.producer.internals;

public class ProducerInterceptors<K, V> implements Closeable {}
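
For intuition, here is a minimal sketch of how such a chain can invoke onSend in list order. This is an illustrative simplification, not the actual Kafka source, and InterceptorChainSketch is a made-up name; the real class also runs onAcknowledgement and close over the same list.

package myproducers;

import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.List;

//Illustrative sketch only, not the real ProducerInterceptors
public class InterceptorChainSketch<K, V> {
    private final List<ProducerInterceptor<K, V>> interceptors;

    public InterceptorChainSketch(List<ProducerInterceptor<K, V>> interceptors) {
        this.interceptors = interceptors;
    }

    public ProducerRecord<K, V> onSend(ProducerRecord<K, V> record) {
        ProducerRecord<K, V> intercepted = record;
        for (ProducerInterceptor<K, V> interceptor : interceptors) {
            try {
                //Each interceptor receives the record returned by the previous one
                intercepted = interceptor.onSend(intercepted);
            } catch (Exception e) {
                //A failing interceptor is reported and skipped, so it does not break the chain
                System.err.println("Error executing interceptor onSend: " + e);
            }
        }
        return intercepted;
    }
}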

Custom interceptor

Customize a count interceptor as follows:

package myproducers; //must match the name registered in INTERCEPTOR_CLASSES_CONFIG

import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

import java.util.Map;

public class CounterInterceptor implements ProducerInterceptor<Integer, String> {
    public int sendCounter = 0;
    public int succCounter = 0;
    public int failCounter = 0;

    public void configure(Map<String, ?> configs) {
        //no configuration needed
    }

    public ProducerRecord<Integer, String> onSend(ProducerRecord<Integer, String> record) {
        System.out.println("onSend called in CounterInterceptor, key = " + record.key());
        sendCounter++;
        return record;
    }

    public void onAcknowledgement(RecordMetadata recordMetadata, Exception exception) {
        if (exception == null) {
            System.out.println("record send ok. topic = " + recordMetadata.topic() + ", partition = " + recordMetadata.partition());
            succCounter++;
        } else {
            System.out.println("record send failed. topic = " + recordMetadata.topic() + ", partition = " + recordMetadata.partition());
            failCounter++;
        }
    }

    public void close() {
        System.out.println("sendCounter = " + sendCounter + " succCounter = " + succCounter + " failCounter = " + failCounter);
    }

}

Assemble the interceptor into a custom producer:

package myproducers;

/**
 * Kafka message producer with the counter interceptor installed.
 */

import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.IntegerSerializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutionException;


public class GameRecordProducer {
    public static final String KAFKA_SERVER_URL = "localhost";
    public static final int KAFKA_SERVER_PORT = 9092;

    public GameRecordProducer() {}


    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, KAFKA_SERVER_URL + ":" + KAFKA_SERVER_PORT);
        props.put(ProducerConfig.CLIENT_ID_CONFIG, "myproducers.GameRecordProducer");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, IntegerSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        List<String> interceptors = new ArrayList<String>();
        interceptors.add("myproducers.CounterInterceptor");
        props.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG, interceptors);

        KafkaProducer<Integer, String> producer = new KafkaProducer<Integer, String>(props);

        try {
            producer.send(new ProducerRecord<Integer, String>("game-score", "message1")).get();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (ExecutionException e) {
            e.printStackTrace();
        } finally {
            //close() also invokes the interceptor chain's close(), which prints the counters
            producer.close();
        }
    }
}
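
More than one interceptor class can be registered at once; the chain calls them in list order. For example (SecondInterceptor is a hypothetical second implementation, named here only for illustration):

List<String> interceptors = new ArrayList<String>();
interceptors.add("myproducers.CounterInterceptor");
interceptors.add("myproducers.SecondInterceptor"); //hypothetical second interceptor
props.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG, interceptors);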

Message record class: ProducerRecord

A ProducerRecord holds the message content to be sent, together with the topic and, optionally, the partition it should go to. The class is defined as follows:

package org.apache.kafka.clients.producer;

public class ProducerRecord<K, V> {
    private final String topic;
    private final Integer partition;
    private final Headers headers;
    private final K key;
    private final V value;
    private final Long timestamp;

    //constructors and accessors omitted
}
  • topic: required field; the topic the record is sent to.
  • value: required field; the message content.
  • partition: optional field; the partition to send to.

    • If a partition is set on the record, the record is sent to that partition.
    • If no partition is set but a key is given, the partition is derived from the hash (murmur2) of the serialized key bytes, modulo the number of partitions.
    • If neither partition nor key is set, the producer cycles through the partitions, starting from a random value (see below).
  • key: optional field; the record key, which may be used to compute the target partition.
  • timestamp: optional field; the creation time of the record. If unspecified, the producer's current time is used.
  • headers: optional field.
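
To make the optional fields concrete, here are the main ProducerRecord constructors in use (the topic name "game-score" is reused from the example above):

import org.apache.kafka.clients.producer.ProducerRecord;

public class RecordExamples {
    public static void main(String[] args) {
        //topic + value: no key, partition chosen by cycling through partitions
        ProducerRecord<Integer, String> r1 = new ProducerRecord<Integer, String>("game-score", "message1");

        //topic + key + value: partition derived from the hash of the serialized key
        ProducerRecord<Integer, String> r2 = new ProducerRecord<Integer, String>("game-score", 42, "message2");

        //topic + partition + key + value: partition 0 is used directly
        ProducerRecord<Integer, String> r3 = new ProducerRecord<Integer, String>("game-score", 0, 42, "message3");

        //topic + partition + timestamp + key + value: explicit creation timestamp
        ProducerRecord<Integer, String> r4 = new ProducerRecord<Integer, String>("game-score", 0, System.currentTimeMillis(), 42, "message4");
    }
}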

Default partition algorithm

The Kafka producer's partitioning strategy is as follows:

  • If a partition is set in the record, the record is sent to that partition.
  • If no partition is set but a key is given, the partition is obtained by hashing the serialized key bytes and taking the result modulo the number of partitions.
  • If neither partition nor key is set, the producer cycles through the available partitions, starting from a random value (round-robin-like rather than strict round-robin).

The specific algorithm source code is as follows:

package org.apache.kafka.clients.producer.internals;

import ...

public class DefaultPartitioner implements Partitioner {

    // ...
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        //Get the topic partition list and the number of partitions from the cluster.
        List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
        int numPartitions = partitions.size();
       
        if (keyBytes == null) {
            //The serialized key is null (no key was given): advance the per-topic round-robin counter
            int nextValue = this.nextValue(topic);
            //Get the list of partitions available for the topic
            List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
            if (availablePartitions.size() > 0) {
                //At least one partition is available: take the counter modulo the available-partition count
                int part = Utils.toPositive(nextValue) % availablePartitions.size();
                return ((PartitionInfo)availablePartitions.get(part)).partition();
            } else {
                //toPositive ensures a non-negative result; Math.abs(Integer.MIN_VALUE) is itself negative, so it cannot be used here.
                // toPositive(Integer.MIN_VALUE) == 0
                // toPositive(-1) == 2147483647
                //Take the remainder over the total partition count
                return Utils.toPositive(nextValue) % numPartitions;
            }
        } else {
            //Hash the serialized key with murmur2, make the result non-negative, and take the remainder
            return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        }
    }

    //Returns the next per-topic counter value, seeded with a random number on first use
    private int nextValue(String topic) {
        
        AtomicInteger counter = (AtomicInteger)this.topicCounterMap.get(topic);
        if (null == counter) {
            counter = new AtomicInteger(ThreadLocalRandom.current().nextInt());
            AtomicInteger currentCounter = (AtomicInteger)this.topicCounterMap.putIfAbsent(topic, counter);
            if (currentCounter != null) {
                counter = currentCounter;
            }
        }

        return counter.getAndIncrement();
    }

    ...
}
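
As a sanity check on the keyed branch, the same computation can be reproduced by hand. This sketch assumes the Utils helpers shipped with kafka-clients (org.apache.kafka.common.utils.Utils) are on the classpath and uses an assumed partition count of 3:

import org.apache.kafka.common.serialization.IntegerSerializer;
import org.apache.kafka.common.utils.Utils;

public class PartitionCheck {
    public static void main(String[] args) {
        int numPartitions = 3; //assumed partition count, for illustration only
        IntegerSerializer serializer = new IntegerSerializer();
        byte[] keyBytes = serializer.serialize("game-score", 42);
        //Same formula as DefaultPartitioner's keyed branch
        int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
        System.out.println("key 42 -> partition " + partition);
    }
}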

Custom partitioner

In addition to the default partitioner, you can supply a custom partitioner, for example to achieve a partition balance better suited to your data.

package myproducers;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

import java.util.Map;

public class ConstantPartitioner implements Partitioner {
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        //Always return 1, so every record lands in partition 1
        return 1;
    }

    public void close() {

    }

    public void configure(Map<String, ?> configs) {
    }
}

When building the KafkaProducer, register the custom partitioner class in the configuration:

kafkaProps.put("partitioner.class", "myproducer.ConstantPartitioner");

Summary

This article introduced two independent extension points of the Kafka producer that can be hooked into during real development. The next article will continue with the producer's configuration options, to uncover more details and mechanisms of the Kafka sending process.