Kafka series — 4.1, basic introduction to consumers

Time: 2020-12-5

1. Consumer demo

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties prop = new Properties();
prop.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
prop.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
prop.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
prop.put(ConsumerConfig.GROUP_ID_CONFIG, "testConsumer");
prop.put(ConsumerConfig.CLIENT_ID_CONFIG, "consumerDemo");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(prop);
consumer.subscribe(Collections.singleton("test"));
while (true) {
    // poll blocks for up to 1s waiting for records
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
    for (ConsumerRecord<String, String> record : records) {
        String key = record.key();
        String value = record.value();
        System.err.println(record.toString());
    }
}

2. Basic concepts of consumers

Kafka consumers consume as part of a group; the consumer group is the basic unit of consumption. The consumption model is as follows:

(Figure: Kafka consumption model)

A single topic may be consumed by multiple consumer groups. Again, Kafka consumption happens in groups.

prop.put(ConsumerConfig.GROUP_ID_CONFIG, "testConsumer");

The line of code above sets the consumer group.

2.1. Partition distribution

A topic is a logical concept, while a partition is a physical one. After looking at the consumption model diagram above, you may be confused: when there are multiple consumers in a group, how does each consumer consume?

First of all: partitions are distributed evenly among the consumers in a group (a code sketch of this distribution follows the two examples below).

Hypothesis 1: topic1 has three partitions, P1 – P3. The three consumers in groupA then consume the following partitions:

instance1: p1
instance2: p2
instance3: p3

Hypothesis 2: topic1 has eight partitions, P1 – P8. Each consumer in groupA is then assigned partitions like this:

instance1: p1,p2,p3
instance2: p4,p5,p6
instance3: p7,p8
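
For intuition, here is a minimal sketch in Java of the even, range-style distribution described above. It is a hypothetical helper for illustration, not Kafka's actual RangeAssignor implementation (partitions are numbered from 0 here):

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RangeAssignSketch {
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int quota = partitionCount / consumers.size(); // every consumer gets at least this many
        int extra = partitionCount % consumers.size(); // the first `extra` consumers get one more
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = quota + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                owned.add(next++);
            }
            result.put(consumers.get(i), owned);
        }
        return result;
    }

    public static void main(String[] args) {
        // 8 partitions over 3 consumers -> {instance1=[0, 1, 2], instance2=[3, 4, 5], instance3=[6, 7]}
        System.out.println(assign(List.of("instance1", "instance2", "instance3"), 8));
    }
}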

2.2. Partition redistribution

Hypothesis 3: topic1 has eight partitions, P1 – P8, and groupA has three consumers: C1, C2, C3. The partition assignment is as follows:

c1: p1,p2,p3
c2: p4,p5,p6
c3: p7,p8

If a new consumer now joins groupA, what happens? The partitions are redistributed:

c1: p1,p2
c2: p3,p4
c3: p5,p6
c4: p7,p8

3. Introduction to the consumer-side API

3.1. Subscribing to topics

void subscribe(Collection<String> topics);
void subscribe(Collection<String> topics, ConsumerRebalanceListener callback);

From these method signatures, we can see that Kafka allows a consumer to subscribe to multiple topics.

void subscribe(Pattern pattern);
void subscribe(Pattern pattern, ConsumerRebalanceListener callback);

The Pattern parameter means a regular expression can be used to match multiple topics. Example code:

Pattern pattern = Pattern.compile("test?");
consumer.subscribe(pattern);

You can subscribe to a topic, and naturally you can also unsubscribe:

consumer.unsubscribe();

Of course, you can also directly fetch the topics the consumer is subscribed to:

Set<String> topics = consumer.subscription();

A topic has more than one partition; is it possible to specify exactly which partitions to consume? The answer is yes:

TopicPartition p1 = new TopicPartition("test1", 0);
TopicPartition p2 = new TopicPartition("test1", 1);
consumer.assign(Arrays.asList(p1, p2));

Note, however, that once partitions are assigned manually like this, the consumer no longer participates in automatic rebalancing.

3.2. Message consumption

ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));

This single line of consumer-side code shows that Kafka uses a pull model for message consumption. When no messages are available, the call blocks (for up to the given timeout).

The ConsumerRecords returned by the poll method implements the Iterable interface and is an iterator over ConsumerRecord. The properties of ConsumerRecord are relatively simple:

public class ConsumerRecord<K, V> {
    private final String topic;                // topic
    private final int partition;               // partition
    private final long offset;                 // the message's offset within the partition
    private final long timestamp;              // timestamp
    private final TimestampType timestampType; // two types: message create time, or log append time
    private final int serializedKeySize;
    private final int serializedValueSize;
    private final Headers headers;             // headers sent with the message
    private final K key;                       // the key sent
    private final V value;                     // the content sent
    private volatile Long checksum;            // CRC32 checksum
}

3.3. Offset commits

Every message in a partition has a unique offset that marks its position within the partition. On the consuming side there is also a consumption-progress marker, likewise called the offset (the consumer's position).
Kafka stores this consumption progress in the internal topic __consumer_offsets.
By default, Kafka commits the consumption progress every 5s; the interval can be configured via auto.commit.interval.ms.
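
For example, to keep auto-commit enabled but shorten the interval to 1s, the relevant settings look like this (a sketch; both property constants are real Kafka consumer configs):

prop.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");      // on by default
prop.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000"); // commit progress every 1s instead of 5s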

Kafka also provides a manual-commit API. Here is a demo:

Properties prop = new Properties();
prop.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
prop.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
prop.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
prop.put(ConsumerConfig.GROUP_ID_CONFIG, "testConsumer");
prop.put(ConsumerConfig.CLIENT_ID_CONFIG, "consumerDemo");
prop.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(prop);
consumer.subscribe(Collections.singleton("test"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
    for (ConsumerRecord<String, String> record : records) {
        System.out.println("consumed: " + record.toString());
    }
    consumer.commitSync();
}

Note that enable.auto.commit must be set to false here; otherwise, offsets are committed automatically and the manual commit is redundant.
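
commitSync blocks until the broker acknowledges the commit. Kafka also offers a non-blocking commitAsync with an optional callback; a minimal sketch:

consumer.commitAsync((offsets, exception) -> {
    if (exception != null) {
        // the async commit failed; log it (blindly retrying here can reorder commits)
        System.err.println("commit failed for " + offsets + ": " + exception.getMessage());
    }
});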

3.4. Where a new consumer group starts consuming

Kafka decides where a brand-new consumer group starts consuming via the auto.offset.reset configuration, which has the following three options (a config sketch follows the list):

  • latest (the default)

Start consuming from the latest position.

  • earliest

Start consuming from the earliest position. With this setting, Kafka prints logs like: Resetting offset for partition

  • none

If the consumer group has no committed consumption progress, a NoOffsetForPartitionException is thrown directly.
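
Setting the option is a single line of configuration (a sketch using the real Kafka config constant):

prop.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // default is "latest"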

Kafka also provides seek(TopicPartition partition, long offset), which allows a consumer to start consuming from an arbitrary position.

// Partition assignment happens during poll(), so we must poll at least once
// before we can seek to a consumption offset
Set<TopicPartition> assignment = new HashSet<>();

while (assignment.isEmpty()) {
    consumer.poll(Duration.ofMillis(100));
    assignment = consumer.assignment();
}

for (TopicPartition tp : assignment) {
    consumer.seek(tp, 50); // start consuming this partition from offset 50
}
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
    for (ConsumerRecord<String, String> record : records) {
        System.err.println("consumed: " + record.toString());
    }
}
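
Besides seeking to an absolute offset, the consumer also exposes seekToBeginning and seekToEnd for jumping to either end of the assigned partitions:

consumer.seekToBeginning(assignment); // jump to the earliest available offset
// or:
consumer.seekToEnd(assignment);       // jump to the log end offset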

More often, we want a consumer group to start consuming from a specified point in time:

Map<TopicPartition, Long> timestampToSearch = new HashMap<>();
for (TopicPartition tp : assignment) {
    // Start consuming from one day ago
    timestampToSearch.put(tp, System.currentTimeMillis() - 1 * 24 * 3600 * 1000);
}

Map<TopicPartition, OffsetAndTimestamp> offsets = consumer.offsetsForTimes(timestampToSearch);

for (TopicPartition tp : assignment) {
    OffsetAndTimestamp timestamp = offsets.get(tp);
    if (null != timestamp) {
        consumer.seek(tp, timestamp.offset());
    }
}

3.5. Partition rebalancing

During a partition rebalance, the consumers in the group cannot read messages. Moreover, if a consumer fails to commit its consumption progress in time before the rebalance, messages will be consumed repeatedly afterwards.

Kafka's subscribe method accepts a callback that lets us hook into the rebalance process:

void subscribe(Collection<String> topics, ConsumerRebalanceListener listener)

Let's glance at the interface ConsumerRebalanceListener defines:

// Called before the rebalance starts, after the consumer has stopped reading messages;
// can be used to commit the consumption offset
void onPartitionsRevoked(Collection<TopicPartition> partitions);

// Called after partitions have been reassigned, before the consumer starts reading messages
void onPartitionsAssigned(Collection<TopicPartition> partitions);

Here is how to commit the consumption offset before a rebalance:

consumer.subscribe(Collections.singleton("test"), new ConsumerRebalanceListener() {

    // Called before the rebalance starts, after the consumer has stopped reading messages;
    // used here to commit the consumption offset
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // commit the consumption offset
        consumer.commitSync();
    }

    // Called after partitions have been reassigned, before the consumer starts reading messages
    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
    }
});
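
A common refinement of this pattern (a sketch, not from the original demo) is to track the latest processed position per partition and commit exactly those offsets when partitions are revoked:

Map<TopicPartition, OffsetAndMetadata> currentOffsets = new HashMap<>();

consumer.subscribe(Collections.singleton("test"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        consumer.commitSync(currentOffsets); // commit only what was actually processed
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
    }
});

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
    for (ConsumerRecord<String, String> record : records) {
        // process the record, then remember the next offset to commit for its partition
        currentOffsets.put(new TopicPartition(record.topic(), record.partition()),
                new OffsetAndMetadata(record.offset() + 1));
    }
}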

3.6. Consumer interceptors

Interceptors let you hook in before consumption, after the consumption offset is committed, and before the consumer is closed. Multiple interceptors form an interceptor chain, and their class names must be separated by ','.
Let's look at the interface an interceptor implements:

public interface ConsumerInterceptor<K, V> extends Configurable {
    // Called before records are handed to the application
    ConsumerRecords<K, V> onConsume(ConsumerRecords<K, V> records);

    // Called after offsets are committed
    void onCommit(Map<TopicPartition, OffsetAndMetadata> offsets);

    // Called before the consumer is closed
    void close();
}

Interceptors are registered through configuration:

Properties prop = new Properties();
prop.put(ConsumerConfig.INTERCEPTOR_CLASSES_CONFIG, MyConsumerInterceptor.class.getName() + "," + MyConsumerInterceptor2.class.getName());
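
MyConsumerInterceptor above is just a placeholder class name; a minimal implementation might look like this (a sketch that passes records through unchanged):

public class MyConsumerInterceptor implements ConsumerInterceptor<String, String> {
    @Override
    public ConsumerRecords<String, String> onConsume(ConsumerRecords<String, String> records) {
        // inspect or filter records here before the application sees them
        return records;
    }

    @Override
    public void onCommit(Map<TopicPartition, OffsetAndMetadata> offsets) {
        // invoked after offsets have been committed
    }

    @Override
    public void close() {
        // release any resources held by the interceptor
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // receives the consumer's configuration on startup (from Configurable)
    }
}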

3.7. Important consumer parameters

  • fetch.min.bytes

Default 1B; the minimum amount of data a poll pulls.

  • fetch.max.bytes

Default 52428800B (50MB); the maximum amount of data a poll pulls.

  • fetch.max.wait.ms

Default 500ms; if there is not yet enough data to satisfy a fetch, the broker waits at most this long before responding.

  • max.partition.fetch.bytes

Default 1048576B (1MB); the maximum amount of data pulled per partition.

  • max.poll.records

Default 500; the maximum number of messages returned by one poll.

  • connections.max.idle.ms

Default 540000ms (9 minutes); how long before idle connections are closed.

  • receive.buffer.bytes

Default 65536B (64KB); the socket receive buffer size (SO_RCVBUF).

  • request.timeout.ms

Default 30000ms; the maximum time the consumer waits for a request response.

  • metadata.max.age.ms

Default 300000ms (5 minutes); the metadata expiry time. If metadata has not been refreshed within this period, a refresh is forced.

  • reconnect.backoff.ms

Default 50ms; how long to wait before retrying a connection to a given host, to avoid reconnecting too frequently.

  • retry.backoff.ms

Default 100ms; the interval between two retries after a failed request.
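
To make the mapping to code concrete, here is how a few of these parameters are set (a sketch; the values shown are the defaults listed above):

prop.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "1");                 // fetch.min.bytes
prop.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "500");             // fetch.max.wait.ms
prop.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "1048576"); // max.partition.fetch.bytes
prop.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "500");              // max.poll.records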

4. Summary

  1. Kafka consumption is group-based, and a consumer group may subscribe to multiple topics.
  2. The partition redistribution algorithm is an even (average) distribution.
  3. KafkaConsumer is not thread-safe, so poll() pulls messages only on the current thread; implementing multi-threaded pulling with Kafka is comparatively troublesome.
  4. The Kafka consumer-side API is very flexible: it allows consuming from a specified position and manually committing the consumption offset for a partition.
  5. Kafka provides a consumer interceptor chain that allows hooks before consumption and after the consumption offset is committed.

5. Similarities and differences with RocketMQ

  1. RocketMQ recommends that one consumer group consume only one topic; in practice, a RocketMQ consumer that subscribes to multiple topics does not work properly. A Kafka consumer, by contrast, can subscribe to multiple topics.
  2. RocketMQ can guarantee that messages are not lost during consumption; Kafka offers no such guarantee.
  3. RocketMQ implements multi-threaded consumption on the consumer side; Kafka does not.
  4. Kafka persists consumption progress every 5s by default, and so does RocketMQ. However, RocketMQ commits the smallest in-flight offset. For example, if thread A has consumed up to message 20 and thread B only up to message 10, then when consumption progress is committed, 10 is committed rather than 20. This is also how RocketMQ ensures that messages are not lost during consumption.
  5. RocketMQ's rebalance is the equivalent of Kafka's redistribution, and it defaults to the same average-allocation algorithm. However, RocketMQ allows user-defined redistribution algorithms and ships rich algorithm support.
  6. Both RocketMQ and Kafka have the problem of repeated consumption.
  7. Judging from the exposed APIs, the Kafka client is more flexible than RocketMQ's.
  8. Kafka places no extra restrictions on where a new consumer group starts consuming; RocketMQ's equivalent setting only takes effect when a large backlog of old messages has piled up.
