Kafka offset management

Time: 2019-12-10

0. Offset management

  • An offset is a long value that uniquely identifies a single message within a partition.
  • The consumer commits offsets to a special internal topic, __consumer_offsets. For a given consumer group, the partition of that topic that records its commits is computed as follows:
partition = Math.abs(groupId.hashCode()) % partitionCount
//partitionCount is the number of partitions of __consumer_offsets, 50 by default
//(broker config offsets.topic.num.partitions)
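The mapping can be sketched in Java. The class name below is illustrative; note that Kafka's own implementation uses a masked absolute value (`hash & 0x7fffffff`) rather than `Math.abs`, which avoids the negative result for `Integer.MIN_VALUE`:

```java
public class OffsetsTopicPartitioner {
    // Default number of partitions of __consumer_offsets
    // (broker config offsets.topic.num.partitions)
    static final int PARTITION_COUNT = 50;

    // Returns the __consumer_offsets partition that stores commits
    // for the given group id. The mask makes the hash non-negative,
    // including for the Integer.MIN_VALUE edge case.
    static int partitionFor(String groupId) {
        return (groupId.hashCode() & 0x7fffffff) % PARTITION_COUNT;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("my-group"));
    }
}
```

All consumers in the same group therefore commit to the same partition, and the leader of that partition acts as the group's coordinator.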

1. Auto submit offset

Setting enable.auto.commit to true makes the consumer commit offsets automatically. Each time the consumer calls poll(), it checks whether auto.commit.interval.ms (default 5000 ms, i.e. 5 s) has elapsed; if so, it commits the offsets returned by the previous poll. This can lead to duplicate consumption: if offsets are committed every 5 s but a rebalance happens at the 3 s mark, the latest offsets have not yet been committed, so the messages consumed during those 3 seconds will be consumed again.
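As a minimal sketch, these are the relevant consumer properties; the config keys are the standard consumer configs, while the class name, broker address, and group id are placeholders:

```java
import java.util.Properties;

public class AutoCommitConfig {
    // Builds consumer properties with auto-commit enabled.
    // Broker address and group id are placeholders.
    public static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "demo-group");              // placeholder
        props.put("enable.auto.commit", "true");          // commit inside poll()
        props.put("auto.commit.interval.ms", "5000");     // 5000 ms is the default
        return props;
    }
}
```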

2. Manual submission

(1) commitSync(), synchronous commit

The commitSync() method commits the offset and blocks waiting for the broker's response. Unless an unrecoverable error occurs, commitSync() retries until the commit succeeds.

    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(100);

        for (ConsumerRecord<String, String> record : records) {
            // process the record here
        }
        try {
            // blocks until the broker acknowledges the commit,
            // retrying internally on recoverable errors
            consumer.commitSync();
        } catch (CommitFailedException e) {
            // unrecoverable, e.g. the group has already rebalanced
            System.err.println("commit failed: " + e.getMessage());
        }
    }
(2) commitAsync(), asynchronous commit
  • The commitAsync() method sends the commit request without blocking, but it tries only once and does not retry on failure. Retrying would be unsafe: suppose the first commit fails, the consumer then processes a new batch and successfully commits a larger offset, and afterwards a retry of the old, smaller offset also succeeds. If a rebalance happens at that point, consumption resumes from the smaller offset and messages are consumed twice. commitAsync() supports a callback, which makes safe failure handling possible, as in the example below.
(3) combined synchronous and asynchronous commit
try {
        AtomicInteger atomicInteger = new AtomicInteger(0);
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(5);
            for (ConsumerRecord<String, String> record : records) {
            }
            consumer.commitAsync(new OffsetCommitCallback() {
                private int marker = atomicInteger.incrementAndGet();
                @Override
                public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets,
                                       Exception exception) {
                    //Retry only if this is still the latest commit: marker captures
                    //the sequence number from AtomicInteger.incrementAndGet() at
                    //submit time; if a newer commit has been issued since, the
                    //current value of atomicInteger is larger and we must not retry.
                    if (exception != null) {
                        if (marker == atomicInteger.get()) consumer.commitAsync(this);
                    }
                }
            });
        }
    } catch (WakeupException e) {
        // ignore for shutdown
    } finally {
        consumer.commitSync(); //blocking final commit before closing
        consumer.close();
        System.out.println("Closed consumer and we are done");
    }
(4) commit a specific offset
        Map<TopicPartition, OffsetAndMetadata> currentOffsets = new HashMap<>();
        int count = 0;
        try {
            while(true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("topic = %s, partition = %s, offset = %s, customer = %s%n",
                            record.topic(), record.partition(), record.offset(), record.key());
                    // the committed offset is the NEXT offset to consume, hence +1
                    currentOffsets.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1));
                    if (count % 1000 == 0) {
                        // commit every 1000 records; null means no callback
                        consumer.commitAsync(currentOffsets, null);
                    }
                    count++;
                }
            }
        } finally {
            try {
                consumer.commitSync();
            } finally {
                consumer.close();
            }
        }

3. Offset cache

Each broker that acts as a group coordinator keeps an in-memory cache of the latest committed offset for every (group, topic, partition) it is responsible for. When a consumer commits an offset, the broker updates this cache and appends the commit to the corresponding __consumer_offsets log. If the coordinator broker goes down, the broker that takes over as leader of that __consumer_offsets partition rebuilds the cache by replaying the log.
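The cache-plus-log pattern can be illustrated with a toy sketch. This is not Kafka's actual GroupMetadataManager; the class and method names are illustrative only:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the coordinator's offset cache: commits update an
// in-memory map and are appended to a log; after a crash the cache
// is rebuilt by replaying the log.
public class OffsetCacheSketch {
    // key "group:topic:partition" -> latest committed offset
    private final Map<String, Long> cache = new HashMap<>();
    // stands in for the __consumer_offsets log
    private final List<String[]> log = new ArrayList<>();

    public void commit(String group, String topic, int partition, long offset) {
        String key = group + ":" + topic + ":" + partition;
        log.add(new String[] {key, Long.toString(offset)}); // append to log
        cache.put(key, offset);                             // update cache
    }

    public Long fetch(String group, String topic, int partition) {
        return cache.get(group + ":" + topic + ":" + partition);
    }

    // Rebuild the cache from a log, keeping the latest entry per key.
    public static OffsetCacheSketch recover(List<String[]> log) {
        OffsetCacheSketch fresh = new OffsetCacheSketch();
        for (String[] entry : log) {
            fresh.cache.put(entry[0], Long.parseLong(entry[1]));
            fresh.log.add(entry);
        }
        return fresh;
    }

    public List<String[]> log() { return log; }
}
```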