H264 bitstream structure
Whether a video file is being parsed or transmitted over a network, it is ultimately a sequence of bytes. An H264 bitstream is a sequence of bytes organized according to specific rules
An intuitive view
A functional view of the bitstream
Functionally, the bitstream is divided into two layers: the NAL layer and the VCL layer
- NAL (Network Abstraction Layer): packages the data in the form appropriate for the network or storage medium that carries it
- VCL (Video Coding Layer): contains the core compression engine and the syntax definitions at the block, macroblock and slice levels. Its design goal is to encode efficiently while staying as independent of the network as possible
Bit stream analysis
The bitstream can be understood as a sequence of NALUs (NAL units), one after another
A NALU is divided into two parts: the NAL header and the RBSP (raw byte sequence payload)
The coded pictures mentioned above (I frames, P frames and B frames) are carried in NALUs, with one picture occupying one or more slice NALUs. Besides image data, a NALU can also carry other types of data, such as SPS and PPS. The details are listed in the next section
The VCL layer mentioned earlier, or VCL data, is the compressed bit sequence produced by video coding, which is called the SODB (string of data bits).
The RBSP is built from the SODB: it contains the SODB followed by trailing bits (a stop bit of 1, then zero bits up to the next byte boundary)
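To make the SODB / RBSP relationship concrete, here is a minimal sketch of how trailing bits turn a SODB into a byte-aligned RBSP. The function name `sodb_to_rbsp` and the choice to represent the SODB as a string of '0' / '1' characters are my own assumptions for illustration; a real encoder works on a bit writer, not on strings.

```python
def sodb_to_rbsp(sodb_bits: str) -> bytes:
    """Append RBSP trailing bits to a SODB given as a string of '0'/'1'.

    Hypothetical helper for illustration only.
    """
    bits = sodb_bits + "1"              # stop bit, always 1
    bits += "0" * (-len(bits) % 8)      # zero bits up to the byte boundary
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


# Five data bits become one byte: 10110 + 1 + 00
print(sodb_to_rbsp("10110").hex())      # b4
```

Note that a SODB which is already byte-aligned still gets a full extra byte (0x80), because the stop bit is always written.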
A NALU consists of a fixed-length (one-byte) header and the RBSP
The NAL header is one byte, made up of three fields:
- forbidden_zero_bit (1 bit): set to 1 when an error occurs in network transmission, telling the receiver to discard the unit; otherwise 0
- nal_ref_idc (2 bits): indicates the importance of the current NALU. The larger the value, the more important it is. A NALU with importance 0 is not used as a reference, so a decoder that cannot keep up may discard it
- nal_unit_type (5 bits): indicates the type of data in the NALU, including the following:
The following should be noted:
- 1-4: slice data of non-IDR pictures (1 is a coded slice, 2-4 are slice data partitions A, B and C). These carry I / P / B slices; the slice type itself is stored in the slice header, not in the NAL header. A nal_ref_idc of 0 only means that the picture is not used as a reference
- 5: IDR frame, a special kind of I frame. It tells the decoder that nothing after it depends on earlier pictures, so the decoder can discard its previous reference pictures and pick up the parameter sets (SPS, PPS, described next) afresh.
- 6: SEI, short for supplemental enhancement information, which provides a way to add extra information to the video bitstream.
- 7: SPS, short for sequence parameter set. It stores a set of global parameters for a coded video sequence.
- 8: PPS, short for picture parameter set. It stores parameters that apply to whole pictures.
- 9: AU delimiter. AU stands for access unit, a collection of one or more NALUs that represents one complete frame; the delimiter marks the start of a new access unit.
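The header layout and the type values above can be sketched in a few lines. This is only an illustration: the function name `parse_nal_header` and the `NAL_TYPE_NAMES` table are hypothetical, the table covers only the types discussed here, and the field names follow the H.264 specification (1 bit forbidden_zero_bit, 2 bits nal_ref_idc, 5 bits nal_unit_type).

```python
# Names for the nal_unit_type values discussed above (not the full table).
NAL_TYPE_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "AU delimiter",
}


def parse_nal_header(header_byte: int) -> dict:
    """Split the one-byte NAL header into its three fields."""
    nal_unit_type = header_byte & 0x1F                    # low 5 bits
    return {
        "forbidden_zero_bit": (header_byte >> 7) & 0x01,  # top bit
        "nal_ref_idc": (header_byte >> 5) & 0x03,         # next 2 bits
        "nal_unit_type": nal_unit_type,
        "name": NAL_TYPE_NAMES.get(nal_unit_type, "other"),
    }


print(parse_nal_header(0x41))
```

Only shifts and masks are needed, because the header never spans more than one byte.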
SPS and PPS must arrive before the I-frame that uses them, otherwise the decoder cannot decode it. How often they are sent depends on the application. In a local H264 stream they may appear only once, before the first I-frame, but in a live stream SPS and PPS should be inserted before each I-frame, because a client may join the stream at any time
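As a rough way to check this on a stream, the sketch below scans a list of nal_unit_type values and reports the IDR NALUs at which a newly joined client could start decoding, that is, those preceded by both an SPS and a PPS since the last slice. The function name `join_points` is hypothetical, and the rule is a simplification: for an IDR picture split into several slices, only the first slice is reported.

```python
SPS, PPS, IDR = 7, 8, 5


def join_points(nal_types):
    """Return indices of IDR NALUs that follow both an SPS and a PPS."""
    points = []
    seen = set()                      # parameter sets seen since the last slice
    for i, t in enumerate(nal_types):
        if t in (SPS, PPS):
            seen.add(t)
        elif t == IDR:
            if seen == {SPS, PPS}:
                points.append(i)
            seen = set()
        elif 1 <= t <= 4:             # any other slice data resets the state
            seen = set()
    return points


# Live-stream style: parameter sets repeated before the second IDR only
print(join_points([7, 8, 5, 1, 1, 5, 1, 7, 8, 5]))   # [2, 9]
```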
The structure of RBSP is as follows:
I have not studied this part yet. ffmpeg appears to take care of it during parsing
Each NALU is preceded by a start code, 0x00 00 01 or 0x00 00 00 01, which acts as the delimiter between NALUs
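Splitting a stream on start codes can be sketched as follows. The function name `split_nalus` is my own, and the sample bytes are made up for the test rather than taken from a real file. The sketch searches for the 3-byte pattern only: a 4-byte start code is the same pattern with one extra leading zero, which shows up as a trailing zero on the previous NALU and is stripped.

```python
from typing import List


def split_nalus(stream: bytes) -> List[bytes]:
    """Split a start-code-delimited H264 byte stream into NALUs."""
    start_code = b"\x00\x00\x01"
    nalus = []
    pos = stream.find(start_code)
    while pos != -1:
        begin = pos + len(start_code)
        nxt = stream.find(start_code, begin)
        end = len(stream) if nxt == -1 else nxt
        # Zeros left at the end belong to the next (4-byte) start code
        nalu = stream[begin:end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
        pos = nxt
    return nalus


# Made-up sample: two 4-byte start codes and one 3-byte start code
sample = bytes.fromhex("00000001 6742 00000001 68ce 000001 6588")
for nalu in split_nalus(sample):
    print(nalu.hex(), "type", nalu[0] & 0x1F)
```

The first byte of each returned NALU is the NAL header, so `nalu[0] & 0x1F` gives its nal_unit_type.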
The following is an H264 code stream:
Three representative NALUs are analyzed:
- 00 00 00 01 67
00 00 00 01 is the start code and 67 is the NAL header. 0x67 in binary is 0110 0111, so nal_unit_type is 00111 = 7, which is an SPS
- 00 00 00 01 68
0x68 in binary is 0110 1000, so nal_unit_type is 01000 = 8, which is a PPS
- 00 00 00 01 65
0x65 in binary is 0110 0101, so nal_unit_type is 00101 = 5, which is an IDR frame
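The arithmetic for these three header bytes can be double-checked by printing the binary form and reading off the last five bits. The helper name `nal_type_from_bits` is hypothetical; it deliberately works on the bit string, the same way the bytes are read by hand here.

```python
def nal_type_from_bits(header: int):
    """Return (all 8 bits, the 5 type bits, the type as an integer)."""
    bits = format(header, "08b")      # e.g. 0x67 -> "01100111"
    type_bits = bits[3:]              # last five bits: nal_unit_type
    return bits, type_bits, int(type_bits, 2)


for header in (0x67, 0x68, 0x65):
    bits, type_bits, nal_type = nal_type_from_bits(header)
    print(f"{header:02x}: {bits} -> {type_bits} = {nal_type}")
# 67: 01100111 -> 00111 = 7
# 68: 01101000 -> 01000 = 8
# 65: 01100101 -> 00101 = 5
```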
H264 more detailed hierarchy
I haven’t worked out how a frame is divided into slices yet, so I’ll revise this part later