Kafka compression details (first draft)

Time: 2020-02-11

Kafka compression

Overview

To understand Kafka compression, you first need to understand Kafka's storage format.

Kafka storage format

RecordBatch

baseOffset: int64
batchLength: int32
partitionLeaderEpoch: int32
magic: int8 (current magic value is 2)
crc: int32
attributes: int16
    bit 0~2:
        0: no compression
        1: gzip
        2: snappy
        3: lz4
        4: zstd
    bit 3: timestampType
    bit 4: isTransactional (0 means not transactional)
    bit 5: isControlBatch (0 means not a control batch)
    bit 6~15: unused
lastOffsetDelta: int32
firstTimestamp: int64
maxTimestamp: int64
producerId: int64
producerEpoch: int16
baseSequence: int32
records: [Record]
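To make the attributes bit layout above concrete, here is a minimal Python sketch that splits the 16-bit value into its flags. The function name `decode_batch_attributes` and the dictionary keys are illustrative choices, and the CreateTime/LogAppendTime labels for bit 3 are taken from Kafka's documentation rather than from the listing above.

```python
# Compression codec ids stored in bits 0~2 of the batch attributes.
COMPRESSION_CODECS = {0: "none", 1: "gzip", 2: "snappy", 3: "lz4", 4: "zstd"}


def decode_batch_attributes(attributes: int) -> dict:
    """Split the int16 attributes value into the flags listed above.

    Hypothetical helper for illustration; not part of any Kafka client API.
    """
    codec_id = attributes & 0x07  # bits 0~2
    return {
        "compression": COMPRESSION_CODECS.get(codec_id, "unknown(%d)" % codec_id),
        # bit 3: 0 is assumed to mean CreateTime, 1 LogAppendTime
        "timestamp_type": "LogAppendTime" if attributes & 0x08 else "CreateTime",
        "is_transactional": bool(attributes & 0x10),  # bit 4
        "is_control_batch": bool(attributes & 0x20),  # bit 5
    }


if __name__ == "__main__":
    print(decode_batch_attributes(0x0000))  # batch written without compression
    print(decode_batch_attributes(0x0001))  # batch written with gzip
```

Only the low three bits select the codec, so a reader can decide how to decode the records section from the header alone.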

Record

length: varint
attributes: int8
    bit 0~7: unused
timestampDelta: varint
offsetDelta: varint
keyLength: varint
key: byte[]
valueLen: varint
value: byte[]
Headers => [Header]

Record Header

headerKeyLength: varint
headerKey: String
headerValueLength: varint
Value: byte[]
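The varint fields in Record and Record Header use a variable-length, zigzag-encoded integer (the same scheme as Protocol Buffers), so small values, including small negative ones, fit in a single byte. The following is a rough sketch of a decoder; `read_varint` is a name made up for this example.

```python
from typing import Tuple


def read_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one zigzag varint starting at buf[pos].

    Returns (value, next_pos). Illustrative helper, not a Kafka API.
    """
    raw = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift  # low 7 bits carry payload
        if byte < 0x80:                # high bit clear: last byte
            break
        shift += 7
    # Zigzag: even raw values map to 0, 1, 2, ... and odd ones to -1, -2, ...
    value = (raw >> 1) ^ -(raw & 1)
    return value, pos


if __name__ == "__main__":
    for sample in (b"\x1c", b"\x06", b"\x0a", b"\x01"):
        print(sample.hex(), "=>", read_varint(sample)[0])
```

Because of the zigzag step, the byte value on disk is twice the decoded length for non-negative numbers, which explains why a 3-byte key is stored with a length byte of 06.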

Note: the diagram of this layout comes from an article that is recommended reading for understanding the evolution of the Kafka message format.

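Message comparison

The first dump below is a batch written without compression; the second holds the same key and value written with gzip compression enabled.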

0000 0000 0000 0000 0000 0040 0000 0000
02e3 0171 9400 0000 0000 0000 0001 6ad9
0153 7e00 0001 6ad9 0153 7eff ffff ffff
ffff ffff ffff ffff ff00 0000 011c 0000
0006 6b65 790a 7661 6c75 6500

0000 0000 0000 0001    0000 0054 0000 0000
02e5 cb48 0600 0100 0000 0000 0001 6ad9
5427 af00 0001 6ad9 5427 afff ffff ffff
ffff ffff ffff ffff ff00 0000 011f 8b08
0000 0000 0000 0093 6160 6060 cb4e ade4
2a4b cc29 4d65 0000 55dc 0454 0f00 0000 

Field-by-field analysis

============ Batch header (61 of the batch's 76 bytes) ============
0000 0000 0000 0000      =>    base offset               =>    0
0000 0040                =>    batch length              =>    64
0000 0000                =>    partition leader epoch    =>    0
02                       =>    magic                     =>    2
e3 0171 94               =>    crc (CRC-32C)             =>    3808522644
00 00                    =>    attributes                =>    0 (no compression)
00 0000 00               =>    last offset delta         =>    0
00 0001 6ad9 0153 7e     =>    first timestamp           =>    1558418903934
00 0001 6ad9 0153 7e     =>    max timestamp             =>    1558418903934
ff ffff ffff ffff ff     =>    producer id               =>    -1
ff ff                    =>    producer epoch            =>    -1
ff ffff ff               =>    base sequence             =>    -1
00 0000 01               =>    record count              =>    1
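The header breakdown above can be reproduced with Python's `struct` module. This is a sketch under the assumption that the dump is a single magic-2 batch; the `FIELDS` names and `parse_batch_header` are invented for the example, and the 4-byte record count is read as part of the fixed header.

```python
import struct

# Big-endian layout of the fixed part of a magic-2 RecordBatch (61 bytes).
HEADER = struct.Struct(">qiibIhiqqqhii")
FIELDS = (
    "base_offset", "batch_length", "partition_leader_epoch", "magic", "crc",
    "attributes", "last_offset_delta", "first_timestamp", "max_timestamp",
    "producer_id", "producer_epoch", "base_sequence", "record_count",
)

# The uncompressed batch from the dump above.
BATCH_HEX = """
0000 0000 0000 0000 0000 0040 0000 0000
02e3 0171 9400 0000 0000 0000 0001 6ad9
0153 7e00 0001 6ad9 0153 7eff ffff ffff
ffff ffff ffff ffff ff00 0000 011c 0000
0006 6b65 790a 7661 6c75 6500
"""


def parse_batch_header(data: bytes) -> dict:
    """Unpack the fixed-size header fields into a dict (illustrative helper)."""
    return dict(zip(FIELDS, HEADER.unpack_from(data, 0)))


if __name__ == "__main__":
    batch = bytes.fromhex("".join(BATCH_HEX.split()))
    for name, value in parse_batch_header(batch).items():
        print("%-24s %d" % (name, value))
    # batch_length counts everything after the length field itself,
    # so the full batch is 8 (offset) + 4 (length) + batch_length bytes.
    print("total bytes:", len(batch))
```

Everything after the first `HEADER.size` bytes is the records section, which is the only part affected by compression.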

======================= Records (15 bytes) =======================
(varints are zigzag-encoded, so 1c = 28 decodes to 14)
1c                       =>    length                    =>    14
00                       =>    attributes                =>    unused
00                       =>    timestamp delta           =>    0
00                       =>    offset delta              =>    0
06                       =>    key length                =>    3
6b65 79                  =>    key                       =>    "key"
0a                       =>    value length              =>    5
7661 6c75 65             =>    value                     =>    "value"
00                       =>    header count              =>    0
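The record walk-through above can be sketched in Python as follows. All function names are made up for this example, and treating a negative length as a null key or value is an assumption based on how Kafka encodes nulls, not something shown in the dump.

```python
from typing import Optional, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one zigzag varint; return (value, next_pos)."""
    raw, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift
        if byte < 0x80:
            return (raw >> 1) ^ -(raw & 1), pos
        shift += 7


def read_bytes(buf: bytes, pos: int) -> Tuple[Optional[bytes], int]:
    """Read a varint length followed by that many bytes (None if length < 0)."""
    length, pos = read_varint(buf, pos)
    if length < 0:
        return None, pos
    return buf[pos:pos + length], pos + length


def parse_record(buf: bytes, pos: int = 0) -> dict:
    """Parse a single uncompressed Record in the field order listed above."""
    length, pos = read_varint(buf, pos)
    end = pos + length
    attributes = buf[pos]  # int8, currently unused
    pos += 1
    timestamp_delta, pos = read_varint(buf, pos)
    offset_delta, pos = read_varint(buf, pos)
    key, pos = read_bytes(buf, pos)
    value, pos = read_bytes(buf, pos)
    header_count, pos = read_varint(buf, pos)
    headers = []
    for _ in range(header_count):
        header_key, pos = read_bytes(buf, pos)
        header_value, pos = read_bytes(buf, pos)
        headers.append((header_key.decode("utf-8"), header_value))
    if pos != end:
        raise ValueError("record length does not match its contents")
    return {
        "length": length, "attributes": attributes,
        "timestamp_delta": timestamp_delta, "offset_delta": offset_delta,
        "key": key, "value": value, "headers": headers,
    }


if __name__ == "__main__":
    record_bytes = bytes.fromhex("1c000000066b65790a76616c756500")
    print(parse_record(record_bytes))
```

The deltas are relative to the batch's base offset and first timestamp, which is why a single-record batch stores zero for both.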

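With message compression turned on

The log now contains the earlier uncompressed batch, followed by a second batch written with compression enabled: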

0000 0000 0000 0000 0000 0040 0000 0000
02e3 0171 9400 0000 0000 0000 0001 6ad9
0153 7e00 0001 6ad9 0153 7eff ffff ffff
ffff ffff ffff ffff ff00 0000 011c 0000
0006 6b65 790a 7661 6c75 6500
============ The 76 bytes above are the uncompressed batch ============
0000 0000 0000 0001      =>    base offset               =>    1
0000 0054                =>    batch length              =>    84
0000 0000                =>    partition leader epoch    =>    0
02                       =>    magic                     =>    2
e5 cb48 06               =>    crc (CRC-32C)             =>    3855304710
00 01                    =>    attributes                =>    1 (gzip)
00 0000 00               =>    last offset delta         =>    0
00 0001 6ad9 5427 af     =>    first timestamp           =>    1558424332207
00 0001 6ad9 5427 af     =>    max timestamp             =>    1558424332207
ff ffff ffff ffff ff     =>    producer id               =>    -1
ff ff                    =>    producer epoch            =>    -1
ff ffff ff               =>    base sequence             =>    -1
00 0000 01               =>    record count              =>    1

============ Records (35 bytes, gzip; 1f 8b is the gzip magic number) ============
1f 8b08 0000 0000 0000 0093 6160 6060 cb4e ade4 2a4b cc29 4d65 0000 55dc 0454 0f00 0000
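To check that these 35 bytes really are the earlier record in compressed form, the payload can be inflated with Python's `zlib`. This sketch assumes a plain gzip member with no optional header fields (true for this dump); `gunzip_member` and `from_hex` are names invented for the example.

```python
import struct
import zlib

# Records section of the gzip-compressed batch, copied from the dump above.
PAYLOAD_HEX = (
    "1f 8b08 0000 0000 0000 0093 6160 6060 cb4e ade4 "
    "2a4b cc29 4d65 0000 55dc 0454 0f00 0000"
)
# Records section of the uncompressed batch, for comparison.
PLAIN_RECORD_HEX = "1c 0000 0006 6b65 790a 7661 6c75 6500"


def from_hex(dump: str) -> bytes:
    """Convert a whitespace-separated hex dump to bytes."""
    return bytes.fromhex("".join(dump.split()))


def gunzip_member(payload: bytes):
    """Inflate one gzip member; return (data, size recorded in the trailer)."""
    if payload[:2] != b"\x1f\x8b" or payload[2] != 8:
        raise ValueError("not a gzip/deflate stream")
    if payload[3] != 0:
        raise ValueError("optional gzip header fields are not handled here")
    inflater = zlib.decompressobj(-zlib.MAX_WBITS)  # raw deflate, no wrapper
    data = inflater.decompress(payload[10:])        # skip the 10-byte header
    # The 8-byte trailer (CRC-32, then ISIZE, both little-endian) is left over.
    (isize,) = struct.unpack("<I", inflater.unused_data[4:8])
    return data, isize


if __name__ == "__main__":
    data, isize = gunzip_member(from_hex(PAYLOAD_HEX))
    print("inflated:", data.hex())
    print("matches uncompressed record:", data == from_hex(PLAIN_RECORD_HEX))
    print("size in trailer:", isize)
```

Note that for a single small record the compressed batch is larger than the plain one (96 bytes versus 76), since the 10-byte gzip header and 8-byte trailer outweigh what deflate saves on 15 bytes of input.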

Conclusion

Message compression applies only to the records section; the batch header is always stored uncompressed.
