Kafka log cleanup


At present, Kafka offers two main log cleanup strategies:

  1. Log deletion
  2. Log compaction: compaction retains only the most recent message for each key.

Kafka exposes this choice through the log.cleanup.policy configuration. The default value is delete; compact can also be selected.
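To make the compact policy concrete, here is a minimal Python sketch (not Kafka's actual implementation) of what compaction does to a partition: only the latest value per key survives, and the surviving records keep their relative order.

```python
def compact(records):
    """Simulate log compaction: keep only the last (key, value) per key."""
    latest = {}
    for key, value in records:
        latest[key] = value  # later records overwrite earlier ones
    # Order surviving records by the offset of each key's last write.
    last_offset = {key: i for i, (key, _) in enumerate(records)}
    return sorted(latest.items(), key=lambda kv: last_offset[kv[0]])

log = [("k1", "v1"), ("k2", "v1"), ("k1", "v2"), ("k3", "v1"), ("k2", "v2")]
print(compact(log))  # [('k1', 'v2'), ('k3', 'v1'), ('k2', 'v2')]
```

Note that real compaction also removes a key entirely when its latest record is a tombstone (a null value); this sketch omits that detail.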

Log deletion

Configuration                     Default value        Description
log.retention.check.interval.ms   300000 (5 minutes)   How often to check whether log segments are eligible for deletion
log.retention.hours               168 (7 days)         Log retention time in hours
log.retention.minutes             (not set)            Log retention time in minutes (overrides hours)
log.retention.ms                  (not set)            Log retention time in milliseconds (overrides minutes)
file.delete.delay.ms              60000 (1 minute)     Delay before a file marked for deletion is physically removed
log.retention.bytes               -1 (unlimited)       Maximum total size of the log before old segments are deleted
log.segment.bytes                 1073741824 (1 GB)    Maximum size of a single log segment file
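The three time-based settings form a precedence chain: the finer-grained setting wins when it is set. A small sketch of that resolution (function name is illustrative, not from Kafka):

```python
def effective_retention_ms(retention_ms=None, retention_minutes=None,
                           retention_hours=168):
    """Resolve the effective retention window: ms > minutes > hours."""
    if retention_ms is not None:
        return retention_ms
    if retention_minutes is not None:
        return retention_minutes * 60 * 1000
    return retention_hours * 60 * 60 * 1000

print(effective_retention_ms())                      # 604800000 (7 days)
print(effective_retention_ms(retention_minutes=30))  # 1800000
```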

Kafka checks whether logs need to be deleted at the configured detection interval (log.retention.check.interval.ms). The main log deletion strategies are:

  • Time-based deletion policy
  • Size-based deletion policy
  • Deletion based on the log start offset

Time-based deletion

For each log segment, Kafka looks up the last record in the segment's timestamp index file. If that timestamp is less than 0 (i.e. invalid), the file's latest modification time is used instead. Segments whose resulting timestamp falls outside the retention window are marked for deletion.
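The timestamp selection and expiry check described above can be sketched as follows (a simplified Python model, not Kafka's Scala code):

```python
RETENTION_MS = 7 * 24 * 60 * 60 * 1000  # log.retention.hours = 168

def segment_timestamp(largest_index_timestamp, mtime_ms):
    """Prefer the last timestamp-index entry; fall back to mtime if invalid."""
    return largest_index_timestamp if largest_index_timestamp >= 0 else mtime_ms

def is_expired(largest_index_timestamp, mtime_ms, now_ms):
    """A segment expires when its timestamp is older than the retention window."""
    return now_ms - segment_timestamp(largest_index_timestamp, mtime_ms) > RETENTION_MS
```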

After the log segments to be deleted have been identified, deletion proceeds as follows:

  1. Remove the segments from the ConcurrentSkipListMap of log segments maintained in the Log object, so that no thread can read them anymore.
  2. Rename all files of each segment by appending a .deleted suffix.
  3. A delayed task named "delete-file" then physically deletes the invalid log data after file.delete.delay.ms.
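The three steps above can be simulated with a small Python sketch (the map, function name, and timer-based scheduling are stand-ins for Kafka's internal ConcurrentSkipListMap and scheduler, not its real API):

```python
import os
import threading

FILE_DELETE_DELAY_MS = 60_000  # file.delete.delay.ms

def async_delete(segments, base_offset, delay_ms=FILE_DELETE_DELAY_MS):
    """Simulate Kafka's asynchronous segment deletion."""
    # 1. Remove the segment from the in-memory map so no reader finds it.
    path = segments.pop(base_offset)
    # 2. Rename the file with a .deleted suffix (invisible to normal reads).
    deleted_path = path + ".deleted"
    os.rename(path, deleted_path)
    # 3. Schedule the physical delete to run after the configured delay.
    timer = threading.Timer(delay_ms / 1000, os.remove, args=[deleted_path])
    timer.start()
    return timer
```

The rename-then-delay design means a crash between steps 2 and 3 leaves only harmless .deleted files, which can be cleaned up on restart.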

If the currently active log segment contains data that needs to be deleted, Kafka first rolls the log, creating a new active segment, and then deletes the old one.

Size-based deletion

  1. Compute the total size that must be deleted (current log size minus the maximum allowed log size, log.retention.bytes).
  2. Starting from the first (oldest) log segment, collect segments until that excess is covered.
  3. Finally, delete the collected segments.
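These steps can be sketched in Python (a simplified model; the function name is illustrative, and the active segment is assumed to be the last one, which is never deleted):

```python
def segments_to_delete(segment_sizes, retention_bytes):
    """Return indices of oldest segments to delete so the log fits the quota.

    segment_sizes: sizes in bytes, oldest first; the last entry is the
    active segment and is never deleted.
    """
    excess = sum(segment_sizes) - retention_bytes  # bytes that must go
    deletable = []
    for i, size in enumerate(segment_sizes[:-1]):
        if excess <= 0:
            break
        deletable.append(i)
        excess -= size
    return deletable

# 350 bytes total, quota 180: the two oldest 100-byte segments go.
print(segments_to_delete([100, 100, 100, 50], retention_bytes=180))  # [0, 1]
```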
Deletion based on the log start offset

By default, the log start offset equals the base offset of the first log segment, but it changes as data is deleted.

The rule for this strategy: if the base offset of the log segment following a given segment is less than or equal to the log start offset, the given segment lies entirely below the log start offset, so it can be added to the deletion queue and eventually deleted.
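This rule can be sketched as follows (a simplified Python model of the check, not Kafka's actual code):

```python
def deletable_by_start_offset(base_offsets, log_start_offset):
    """Return base offsets of segments that lie entirely below the log start offset.

    base_offsets: base offsets of all segments, in ascending order.
    A segment is deletable when the NEXT segment's base offset is <= the
    log start offset; the active (last) segment is never considered.
    """
    deletable = []
    for i in range(len(base_offsets) - 1):
        if base_offsets[i + 1] <= log_start_offset:
            deletable.append(base_offsets[i])
        else:
            break
    return deletable

# Segments starting at 0 and 100 end before offset 250, so both go.
print(deletable_by_start_offset([0, 100, 200, 300], log_start_offset=250))  # [0, 100]
```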