1. Consumer demo
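import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties prop = new Properties();
// Deserialize both keys and values as strings
prop.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
prop.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
prop.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
prop.put(ConsumerConfig.GROUP_ID_CONFIG, "testConsumer"); // consumer group this consumer joins
prop.put(ConsumerConfig.CLIENT_ID_CONFIG, "consumerDemo");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(prop);
consumer.subscribe(Collections.singleton("test"));
while (true) {
    // Wait up to 1 second for a batch of records
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
    for (ConsumerRecord<String, String> record : records) {
        String key = record.key();
        String value = record.value();
        System.out.printf("key=%s, value=%s, offset=%d%n", key, value, record.offset());
    }
}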
2. Basic concepts of consumers
In Kafka, the consumer group is the basic unit of consumption. The consumption model is as follows: one topic can be consumed by multiple consumer groups. In other words, Kafka consumption is organized by group.
prop.put(ConsumerConfig.GROUP_ID_CONFIG, "testConsumer");
The line above sets the consumer group.
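To make "consumption is organized by group" concrete, here is a minimal in-memory sketch. It is plain Java, not Kafka code, and GroupOffsetsDemo and its methods are hypothetical names. Each group keeps its own read position in the same partition log, so every group sees every message independently of the others. In real Kafka, these per-group positions are the committed offsets covered in 3.3.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model (not Kafka code): one partition log read by several groups.
class GroupOffsetsDemo {
    private final List<String> log = new ArrayList<>();               // messages in one partition
    private final Map<String, Integer> groupOffsets = new HashMap<>(); // next offset to read, per group

    void append(String message) {
        log.add(message);
    }

    // Returns every message this group has not read yet and advances only this group's offset.
    List<String> poll(String groupId) {
        int from = groupOffsets.getOrDefault(groupId, 0);
        List<String> batch = new ArrayList<>(log.subList(from, log.size()));
        groupOffsets.put(groupId, log.size());
        return batch;
    }

    public static void main(String[] args) {
        GroupOffsetsDemo topic = new GroupOffsetsDemo();
        topic.append("m0");
        topic.append("m1");
        System.out.println(topic.poll("groupA")); // [m0, m1]
        System.out.println(topic.poll("groupB")); // [m0, m1] (groupB has its own offset)
        topic.append("m2");
        System.out.println(topic.poll("groupA")); // [m2]
    }
}
```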
2.1. Partition assignment
A topic is a logical concept, while a partition is a physical one. Having seen the consumption model above, you may wonder: when a group contains several consumers, how do they share the work?
First, partitions are distributed evenly among the consumers of a group, and each partition is consumed by only one consumer in that group.
Hypothesis 1: topic1 has three partitions, P1 to P3, and groupA has three consumers. The partitions they consume are:
instance1: p1
instance2: p2
instance3: p3
Hypothesis 2: topic1 has eight partitions, P1 to P8. Each consumer in groupA then consumes:
instance1: p1,p2,p3
instance2: p4,p5,p6
instance3: p7,p8
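A few lines of arithmetic reproduce the even split in the two hypotheses above. The sketch follows the idea behind Kafka's range-style assignment: each consumer gets partitions / consumers partitions, and the first partitions % consumers consumers each get one extra. It is a simplified, hypothetical helper, not Kafka's assignor class. The strategy Kafka actually uses depends on the partition.assignment.strategy setting and the client version.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical sketch of an even "range" split (not Kafka's RangeAssignor class):
// contiguous blocks of partitions, with the first (partitions % consumers) consumers getting one extra.
class RangeAssignmentDemo {
    // Returns, for each consumer index, the 1-based partition numbers it owns.
    static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        int perConsumer = numPartitions / numConsumers;
        int withExtra = numPartitions % numConsumers;
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < numConsumers; i++) {
            int start = i * perConsumer + Math.min(i, withExtra);
            int length = perConsumer + (i < withExtra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int p = start; p < start + length; p++) {
                owned.add(p + 1); // p1, p2, ... as in the text
            }
            result.add(owned);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assign(3, 3)); // [[1], [2], [3]]
        System.out.println(assign(8, 3)); // [[1, 2, 3], [4, 5, 6], [7, 8]]
    }
}
```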
2.2. Partition reassignment
Hypothesis 3: topic1 has eight partitions, P1 to P8, and groupA has three consumers, C1, C2 and C3. The partitions are assigned as follows:
c1: p1,p2,p3
c2: p4,p5,p6
c3: p7,p8
What happens if a new consumer, C4, joins groupA at this point? The partitions are redistributed:
c1: p1,p2
c2: p3,p4
c3: p5,p6
c4: p7,p8
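Running the same even split for three consumers and then for four shows which partitions change owner when c4 joins. This is a hypothetical sketch, and RebalanceDiffDemo is not a Kafka class. Under Kafka's classic (eager) rebalance protocol, every consumer gives up all of its partitions during the rebalance. The diff below therefore only shows which partitions end up with a different consumer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Kafka code): recompute an even split when a consumer joins
// and report, per existing consumer, which partitions it no longer owns.
class RebalanceDiffDemo {
    // Even split of partitions 1..numPartitions over consumers c1..cN, in contiguous blocks.
    static Map<String, List<Integer>> assign(int numPartitions, int numConsumers) {
        int perConsumer = numPartitions / numConsumers;
        int withExtra = numPartitions % numConsumers;
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int next = 1;
        for (int i = 0; i < numConsumers; i++) {
            int length = perConsumer + (i < withExtra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int k = 0; k < length; k++) {
                owned.add(next++);
            }
            result.put("c" + (i + 1), owned);
        }
        return result;
    }

    // Partitions each consumer owned before the rebalance but not after it.
    static Map<String, List<Integer>> revoked(Map<String, List<Integer>> before,
                                              Map<String, List<Integer>> after) {
        Map<String, List<Integer>> lost = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : before.entrySet()) {
            List<Integer> gone = new ArrayList<>(e.getValue());
            gone.removeAll(after.getOrDefault(e.getKey(), List.of()));
            lost.put(e.getKey(), gone);
        }
        return lost;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> before = assign(8, 3);
        Map<String, List<Integer>> after = assign(8, 4);
        System.out.println(after);                  // {c1=[1, 2], c2=[3, 4], c3=[5, 6], c4=[7, 8]}
        System.out.println(revoked(before, after)); // {c1=[3], c2=[5, 6], c3=[7, 8]}
    }
}
```

The moved partitions are the ones whose progress must be committed before the handover. Otherwise the new owner re-reads messages, which is the duplicate consumption discussed in 3.5.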
3. Consumer API
3.1. Subscribing to topics
void subscribe(Collection<String> topics);
void subscribe(Collection<String> topics, ConsumerRebalanceListener callback);
These method signatures show that Kafka allows a consumer to subscribe to multiple topics.
void subscribe(Pattern pattern);
void subscribe(Pattern pattern, ConsumerRebalanceListener callback);
The Pattern parameter means a regular expression can be used to match multiple topics. Example:
// ".*" matches any suffix, so this selects test, test1, test2, ...
Pattern pattern = Pattern.compile("test.*");
consumer.subscribe(pattern);
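One detail can be checked with plain java.util.regex. In a regular expression, ? makes the previous character optional, so it does not work as a wildcard. The sketch below assumes, as the Kafka client does, that the pattern must match the whole topic name (Matcher.matches() rather than find()). TopicPatternDemo is a hypothetical helper that filters a list of topic names under that assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: which topic names a subscription pattern would select,
// assuming the whole topic name must match (Matcher.matches(), not find()).
class TopicPatternDemo {
    static List<String> matching(Pattern pattern, List<String> topics) {
        List<String> selected = new ArrayList<>();
        for (String topic : topics) {
            if (pattern.matcher(topic).matches()) {
                selected.add(topic);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> topics = List.of("tes", "test", "test1", "test2", "mytest");
        // "?" makes the preceding 't' optional: this is NOT a wildcard
        System.out.println(matching(Pattern.compile("test?"), topics));  // [tes, test]
        // ".*" matches any suffix
        System.out.println(matching(Pattern.compile("test.*"), topics)); // [test, test1, test2]
    }
}
```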
If you can subscribe to topics, you can naturally unsubscribe as well:
consumer.unsubscribe();
You can also get the set of topics the consumer is currently subscribed to:
Set<String> topics = consumer.subscription();
A topic has multiple partitions. Can you specify exactly which partitions to consume? Yes:
TopicPartition p1 = new TopicPartition("test1", 0);
TopicPartition p2 = new TopicPartition("test1", 1);
consumer.assign(Arrays.asList(p1, p2));
Note, however, that when partitions are assigned manually with assign(), the consumer no longer takes part in the group's automatic rebalancing.
3.2. Message consumption
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
This line of consumer code shows that Kafka uses a pull model for message consumption. If no records are available, poll() blocks the calling thread for up to the given timeout (1 second here) and then returns an empty batch.
The ConsumerRecords returned by poll() implements the Iterable interface, so it can be iterated as a sequence of ConsumerRecord objects. The properties of ConsumerRecord are relatively simple:
public class ConsumerRecord<K, V> {
    // (main fields only)
    private final String topic;                // topic
    private final int partition;               // partition
    private final long offset;                 // offset of the message within the partition
    private final long timestamp;              // timestamp
    private final TimestampType timestampType; // two types: message creation time or time the message was appended to the log
    private final int serializedKeySize;
    private final int serializedValueSize;
    private final Headers headers;             // headers sent with the message
    private final K key;                       // message key
    private final V value;                     // message content
    private volatile Long checksum;            // CRC32 checksum
}
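The following sketch shows the Iterable relationship described above. SketchRecord and SketchRecords are a stripped-down, hypothetical pair of classes that keep only a few of the listed fields. As in the real ConsumerRecords, which also offers records(TopicPartition), the batch is grouped by partition and can be walked with a for-each loop. The real class has more features than this sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for ConsumerRecord (only a few fields kept).
class SketchRecord {
    final String topic;
    final int partition;
    final long offset;
    final String value;

    SketchRecord(String topic, int partition, long offset, String value) {
        this.topic = topic;
        this.partition = partition;
        this.offset = offset;
        this.value = value;
    }
}

// Hypothetical, simplified stand-in for ConsumerRecords: grouped by partition, iterable as one sequence.
class SketchRecords implements Iterable<SketchRecord> {
    private final Map<Integer, List<SketchRecord>> byPartition = new LinkedHashMap<>();

    void add(SketchRecord r) {
        byPartition.computeIfAbsent(r.partition, p -> new ArrayList<>()).add(r);
    }

    // Records of a single partition (empty list if none)
    List<SketchRecord> records(int partition) {
        return byPartition.getOrDefault(partition, List.of());
    }

    // Iterates over all records, partition by partition, in the order partitions were first seen
    @Override
    public Iterator<SketchRecord> iterator() {
        List<SketchRecord> all = new ArrayList<>();
        byPartition.values().forEach(all::addAll);
        return all.iterator();
    }
}
```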
3.3. Offset commits
Each message in a partition has a unique offset that marks its position in that partition. Consumption progress is also tracked as an offset, called the consumer offset. Kafka stores consumption progress in the internal topic __consumer_offsets. By default (enable.auto.commit=true), Kafka commits consumption progress automatically every 5 s. The interval is configured with auto.commit.interval.ms.
Kafka also provides an API for committing offsets manually, demonstrated below.
Properties prop = new Properties();
prop.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
prop.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
prop.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
prop.put(ConsumerConfig.GROUP_ID_CONFIG, "testConsumer");
prop.put(ConsumerConfig.CLIENT_ID_CONFIG, "consumerDemo");
// Disable auto commit so offsets are only committed by commitSync()
prop.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(prop);
consumer.subscribe(Collections.singleton("test"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
    for (ConsumerRecord<String, String> record : records) {
        System.out.println("consumed: " + record.toString());
    }
    // Synchronously commit the offsets returned by the last poll()
    consumer.commitSync();
}
Note that enable.auto.commit must be set to false when committing manually.
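Called without arguments, commitSync() commits the positions returned by the last poll(). If you want to commit exactly what has been processed, commitSync also accepts a Map<TopicPartition, OffsetAndMetadata>. The value to commit is the next offset to read, which is the last processed offset + 1. The sketch below builds that map with plain ints standing in for TopicPartition, so it runs without a broker. CommitOffsetsDemo and Processed are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: compute per-partition commit positions from processed messages.
// Plain ints stand in for TopicPartition so this runs without Kafka.
class CommitOffsetsDemo {
    // One processed message: the partition it came from and its offset
    static final class Processed {
        final int partition;
        final long offset;

        Processed(int partition, long offset) {
            this.partition = partition;
            this.offset = offset;
        }
    }

    // Offset to commit per partition = highest processed offset + 1 (the next offset to read)
    static Map<Integer, Long> offsetsToCommit(List<Processed> processed) {
        Map<Integer, Long> commit = new HashMap<>();
        for (Processed p : processed) {
            commit.merge(p.partition, p.offset + 1, Long::max);
        }
        return commit;
    }

    public static void main(String[] args) {
        List<Processed> done = List.of(new Processed(0, 10), new Processed(1, 3), new Processed(0, 11));
        System.out.println(offsetsToCommit(done)); // {0=12, 1=4}
    }
}
```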
3.4. Where a new consumer group starts consuming
The auto.offset.reset setting controls where a new consumer group (one without committed progress) starts consuming. It accepts three values:
latest (default): start consuming from the latest position, i.e. only messages written after the consumer starts.
earliest: start consuming from the earliest available position. With this setting, Kafka prints a log line beginning with "Resetting offset for partition".
none: if the group has no committed progress for a partition, throw NoOffsetForPartitionException.
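The three auto.offset.reset values reduce to a small decision, sketched below with hypothetical names. Committed progress always takes priority, and the policy applies only when there is none. Kafka also applies the policy when a committed offset no longer exists on the broker, for example after retention deleted the data, but this sketch does not model that case. IllegalStateException stands in for Kafka's NoOffsetForPartitionException.

```java
// Hypothetical sketch of choosing a partition's starting offset under auto.offset.reset.
// IllegalStateException stands in for Kafka's NoOffsetForPartitionException.
class OffsetResetDemo {
    static long startingOffset(Long committed, long logStartOffset, long logEndOffset, String policy) {
        if (committed != null) {
            return committed; // existing progress always wins
        }
        switch (policy) {
            case "earliest":
                return logStartOffset; // oldest message still in the log
            case "latest":
                return logEndOffset;   // only messages written from now on
            case "none":
                throw new IllegalStateException("no committed offset for partition");
            default:
                throw new IllegalArgumentException("unknown policy: " + policy);
        }
    }

    public static void main(String[] args) {
        System.out.println(startingOffset(null, 0, 120, "earliest")); // 0
        System.out.println(startingOffset(null, 0, 120, "latest"));   // 120
        System.out.println(startingOffset(42L, 0, 120, "latest"));    // 42
    }
}
```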
Kafka also provides seek(TopicPartition partition, long offset), which lets a consumer start consuming from a new position.
// Partitions are assigned inside poll(), so poll first and wait for an assignment before seeking
Set<TopicPartition> assignment = new HashSet<>();
while (assignment.isEmpty()) {
    consumer.poll(Duration.ofMillis(100));
    assignment = consumer.assignment();
}
// Start every assigned partition at offset 50
for (TopicPartition tp : assignment) {
    consumer.seek(tp, 50);
}
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
    for (ConsumerRecord<String, String> record : records) {
        System.out.println("consumed: " + record.toString());
    }
}
More often, we may want a consumer group to start consuming from a specific point in time:
Map<TopicPartition, Long> timestampToSearch = new HashMap<>();
for (TopicPartition tp : assignment) {
    // Start consuming from one day ago
    timestampToSearch.put(tp, System.currentTimeMillis() - 1 * 24 * 3600 * 1000);
}
Map<TopicPartition, OffsetAndTimestamp> offsets = consumer.offsetsForTimes(timestampToSearch);
for (TopicPartition tp : assignment) {
    OffsetAndTimestamp timestamp = offsets.get(tp);
    // null means no message in this partition has a timestamp at or after the target
    if (null != timestamp) {
        consumer.seek(tp, timestamp.offset());
    }
}
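For each partition, offsetsForTimes() returns the earliest offset whose timestamp is greater than or equal to the requested timestamp, or null when no such message exists. That is why the code above checks for null. The sketch below performs the same lookup on an in-memory array. It assumes timestamps never decrease as offsets grow, which allows a binary search. Kafka uses a time index for this, so the sketch is only an approximation. It also computes "one day ago" with java.time.Duration. The expression 1 * 24 * 3600 * 1000 still fits in an int, but the same expression for 30 days (2,592,000,000) would overflow int arithmetic. OffsetForTimeDemo is a hypothetical name.

```java
import java.time.Duration;

// Hypothetical in-memory sketch of an offsetsForTimes()-style lookup on one partition.
// Assumption: timestamps are non-decreasing with offset, so binary search is valid.
class OffsetForTimeDemo {
    // timestamps[i] is the timestamp of the message at offset baseOffset + i.
    // Returns the earliest offset with timestamp >= target, or null if there is none.
    static Long earliestOffsetAtOrAfter(long[] timestamps, long baseOffset, long target) {
        int lo = 0;
        int hi = timestamps.length; // search window is [lo, hi)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (timestamps[mid] >= target) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo < timestamps.length ? Long.valueOf(baseOffset + lo) : null;
    }

    // "One day ago" in milliseconds, computed in long arithmetic via Duration
    static long oneDayAgo(long nowMillis) {
        return nowMillis - Duration.ofDays(1).toMillis();
    }

    public static void main(String[] args) {
        long[] timestamps = {1_000, 2_000, 2_000, 3_000}; // messages at offsets 100..103
        System.out.println(earliestOffsetAtOrAfter(timestamps, 100, 1_500)); // 101
        System.out.println(earliestOffsetAtOrAfter(timestamps, 100, 5_000)); // null
    }
}
```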
3.5. Partition rebalancing
During a partition rebalance, the consumers in the group cannot read messages. In addition, if the consumer that previously owned a partition has not committed its progress in time, some messages will be consumed again.
Kafka's subscribe method accepts a callback that lets us take action when a rebalance is triggered:
void subscribe(Collection<String> topics, ConsumerRebalanceListener listener);
Here is the interface defined by ConsumerRebalanceListener:
// Called before the rebalance starts and after the consumer stops fetching data; can be used to commit consumer offsets
void onPartitionsRevoked(Collection<TopicPartition> partitions);
// Called after partitions are reassigned and before the consumer starts fetching data
void onPartitionsAssigned(Collection<TopicPartition> partitions);
Here is how to commit consumer offsets before a rebalance:
consumer.subscribe(Collections.singleton("test"), new ConsumerRebalanceListener() {
    // Called before the rebalance starts and after the consumer stops fetching data;
    // a good place to commit consumer offsets
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // Commit consumer offsets
        consumer.commitSync();
    }

    // Called after partitions are reassigned and before the consumer starts fetching data
    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
    }
});
3.6. Consumer interceptors
Consumer interceptors let you hook in before records are consumed, after consumer offsets are committed, and before the consumer is closed. Multiple interceptors form an interceptor chain. To register several interceptors, list their class names separated by commas (',').
The interceptor interface is defined as follows:
public interface ConsumerInterceptor<K, V> extends Configurable {
    // Called before records are returned by poll()
    ConsumerRecords<K, V> onConsume(ConsumerRecords<K, V> records);
    // Called after offsets have been committed
    void onCommit(Map<TopicPartition, OffsetAndMetadata> offsets);
    // Called before the interceptor is closed
    void close();
}
Properties prop = new Properties();
prop.put(ConsumerConfig.INTERCEPTOR_CLASSES_CONFIG, MyConsumerInterceptor.class.getName() + "," + MyConsumerInterceptor2.class.getName());
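The chaining behavior can be modeled with a simplified analogue built on UnaryOperator. This is not Kafka's ConsumerInterceptor interface, and InterceptorChainDemo is a hypothetical name. Each interceptor receives what the previous one returned, in the order the class names appear in the comma-separated interceptor.classes value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Simplified, hypothetical analogue of a consumer interceptor chain (NOT Kafka's interface):
// each interceptor transforms the records returned by the previous one, in configuration order.
class InterceptorChainDemo {
    private final List<UnaryOperator<List<String>>> chain = new ArrayList<>();

    InterceptorChainDemo add(UnaryOperator<List<String>> interceptor) {
        chain.add(interceptor);
        return this;
    }

    // Mirrors the role of onConsume(): runs just before records reach the application
    List<String> onConsume(List<String> records) {
        List<String> current = records;
        for (UnaryOperator<List<String>> interceptor : chain) {
            current = interceptor.apply(current);
        }
        return current;
    }

    // A comma-separated class list, like the interceptor.classes value; order defines chain order
    static List<String> parseClassNames(String configValue) {
        List<String> names = new ArrayList<>();
        for (String part : configValue.split(",")) {
            if (!part.trim().isEmpty()) {
                names.add(part.trim());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        InterceptorChainDemo demo = new InterceptorChainDemo()
                .add(records -> { // first interceptor: drop empty values
                    List<String> kept = new ArrayList<>();
                    for (String r : records) {
                        if (!r.isEmpty()) {
                            kept.add(r);
                        }
                    }
                    return kept;
                })
                .add(records -> { // second interceptor: upper-case what is left
                    List<String> out = new ArrayList<>();
                    for (String r : records) {
                        out.add(r.toUpperCase());
                    }
                    return out;
                });
        System.out.println(demo.onConsume(List.of("a", "", "b")));                 // [A, B]
        System.out.println(parseClassNames("com.example.A, com.example.B"));  // [com.example.A, com.example.B]
    }
}
```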
3.7. Important consumer parameters
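fetch.min.bytes: default 1 B. The minimum amount of data the broker returns for a fetch request made by poll(). If less is available, the broker waits (see fetch.max.wait.ms).
fetch.max.bytes: default 52428800 B (50 MB). The maximum amount of data returned for one fetch request.
fetch.max.wait.ms: default 500 ms. If there is not yet enough data to satisfy fetch.min.bytes, the broker waits at most this long before answering the fetch request.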
max.partition.fetch.bytes: default 1048576 B (1 MB). The maximum amount of data returned per partition.
max.poll.records: default 500. The maximum number of records returned by a single poll().
connections.max.idle.ms: default 540000 ms (9 minutes). Connections that stay idle for this long are closed.
receive.buffer.bytes: default 65536 B (64 KB). Size of the socket receive buffer (SO_RCVBUF).
request.timeout.ms: default 30000 ms. The maximum time the consumer waits for the response to a request.
metadata.max.age.ms: default 300000 ms (5 minutes). Metadata expiration time: if the metadata has not been refreshed within this period, a refresh is forced.
reconnect.backoff.ms: default 50 ms. How long to wait before trying to reconnect to a given host, to avoid reconnecting to it in a tight loop.
retry.backoff.ms: default 100 ms. How long to wait before retrying a failed request.
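The names above are the plain string keys behind the ConsumerConfig constants. For example, ConsumerConfig.MAX_POLL_RECORDS_CONFIG is "max.poll.records", so the keys can also be set directly. The values below are illustrative only, not recommendations, and ConsumerTuningDemo is a hypothetical name. fetch.min.bytes and fetch.max.wait.ms work together: the broker answers a fetch as soon as enough data is available or the wait time runs out, whichever comes first.

```java
import java.util.Properties;

// Hypothetical sketch: overriding a few consumer parameters by their string keys.
// Values are illustrative, not recommendations; merge them with the other consumer settings.
class ConsumerTuningDemo {
    static Properties tuning() {
        Properties prop = new Properties();
        // Ask the broker to wait until at least 1 KB is available...
        prop.put("fetch.min.bytes", "1024");
        // ...but never longer than 200 ms before answering the fetch
        prop.put("fetch.max.wait.ms", "200");
        // Cap the data returned per partition at 2 MB
        prop.put("max.partition.fetch.bytes", String.valueOf(2 * 1024 * 1024));
        // Hand at most 100 records to the application per poll()
        prop.put("max.poll.records", "100");
        return prop;
    }

    public static void main(String[] args) {
        tuning().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```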
4. Summary
- Kafka consumption is organized by consumer group, and a consumer group may subscribe to more than one topic.
- By default, partitions are (re)distributed with an even-split (average) algorithm.
- KafkaConsumer is not thread-safe, so poll() pulls messages only on the calling thread. Implementing multi-threaded consumption with Kafka is relatively cumbersome.
- The Kafka consumer API is very flexible: it allows consuming from a specified position and manually committing consumer offsets per partition.
- Kafka provides a chain of consumer interceptors that hooks in before consumption and after consumer offsets are committed.
5. Similarities and differences with RocketMQ
- RocketMQ recommends that a consumer group consume only one topic, and in practice a consumer that subscribes to more than one topic does not work properly. Kafka consumers can subscribe to more than one topic.
- RocketMQ can ensure that messages are not lost during consumption; Kafka cannot guarantee this.
- RocketMQ implements multi-threaded consumption on the consumer side; Kafka does not.
- Kafka persists consumption progress every 5 s by default, and so does RocketMQ. However, RocketMQ commits the smallest offset. For example, if thread A has consumed 20 messages and thread B has consumed 10, the progress committed is 10, not 20. This is why RocketMQ can ensure that messages are not lost during consumption.
- When RocketMQ rebalances (the counterpart of Kafka's partition reassignment), it uses an average allocation algorithm by default, as Kafka does. However, RocketMQ also lets users define their own allocation algorithm and ships with a rich set of built-in ones.
- Both RocketMQ and Kafka are subject to duplicate consumption.
- Judging from the exposed APIs, the Kafka client is more flexible than RocketMQ's.
- Kafka places no additional restrictions on where a new consumer group starts consuming. In RocketMQ, the corresponding setting only takes effect when a large backlog of old messages has built up.
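The RocketMQ commit rule described above (commit the smallest offset still being processed) can be modeled with a sorted set of in-flight offsets. This is a simplified, hypothetical sketch of the idea, not RocketMQ's implementation. Messages may finish out of order on different threads, but the committable offset never moves past an unfinished message. A restart therefore re-delivers messages instead of skipping them.

```java
import java.util.TreeSet;

// Simplified, hypothetical model (not RocketMQ code): with several threads working on one queue,
// the offset that is safe to commit is the smallest offset still in flight.
class MinOffsetCommitDemo {
    private final TreeSet<Long> inFlight = new TreeSet<>();
    private long nextToDispatch = 0;

    // Hand the next message to some worker thread
    synchronized long dispatch() {
        long offset = nextToDispatch++;
        inFlight.add(offset);
        return offset;
    }

    // A worker thread finished processing this offset
    synchronized void complete(long offset) {
        inFlight.remove(offset);
    }

    // Smallest unfinished offset, or the next one to dispatch if everything is done
    synchronized long committableOffset() {
        return inFlight.isEmpty() ? nextToDispatch : inFlight.first();
    }

    public static void main(String[] args) {
        MinOffsetCommitDemo queue = new MinOffsetCommitDemo();
        for (int i = 0; i < 30; i++) {
            queue.dispatch();                     // offsets 0..29 handed to workers
        }
        for (long o = 0; o < 10; o++) {
            queue.complete(o);                    // one thread finished 0..9
        }
        for (long o = 15; o < 30; o++) {
            queue.complete(o);                    // another finished 15..29
        }
        System.out.println(queue.committableOffset()); // 10: offsets 10..14 are still running
    }
}
```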