H264 series: bitstream composition and layered structure

Time: 2021-12-5

Topic: Android FFmpeg

H264 bitstream structure

Whether a video is read from a file or transmitted over the network, it is ultimately just a sequence of bytes. An H264 bitstream is a sequence of bytes organized according to specific rules.

An intuitive view

From largest to smallest, it is divided into: video sequence, frame (picture), slice, macroblock and sub-block.

[Figure: the hierarchy from video sequence down to sub-block]

A functional view of the bitstream

Functionally, the bitstream is divided into two layers: the NAL layer and the VCL layer.

  • NAL (Network Abstraction Layer): packages the data in the form the network requires and is responsible for transmitting it.
  • VCL (Video Coding Layer): contains the core compression engine and the syntax definitions of blocks, macroblocks and slices. Its design goal is to encode efficiently while staying as independent of the network as possible.

Bitstream analysis

The bitstream can be viewed as a series of NALUs (NAL units) laid out one after another.

[Figure: an H264 bitstream as a sequence of NALUs]

A NALU is divided into two parts: the NAL header and the RBSP (Raw Byte Sequence Payload).

A frame (I, P or B frame) mentioned above is carried in NALUs. Each slice of a frame is stored in its own NALU, so a frame coded as a single slice is exactly one NALU. Besides picture data, a NALU can also carry other kinds of data, such as SPS and PPS. The types are listed in the next section.

The VCL data mentioned earlier is the compressed bit string produced by video coding, called the SODB (String Of Data Bits).
The SODB is the raw content of the RBSP. The RBSP is the SODB followed by rbsp_trailing_bits: a single 1 bit (the stop bit), then 0 bits up to the next byte boundary.
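To make the SODB/RBSP relationship concrete, here is a minimal Python sketch. The function names are hypothetical, and bits are modelled as a '0'/'1' string for readability. It only shows the trailing-bits rule described above, not a full RBSP writer.

```python
def sodb_to_rbsp(sodb_bits: str) -> bytes:
    """Append rbsp_trailing_bits to an SODB given as a '0'/'1' string."""
    if not sodb_bits:
        return b""  # an empty SODB gives an empty RBSP
    bits = sodb_bits + "1"          # rbsp_stop_one_bit
    bits += "0" * (-len(bits) % 8)  # rbsp_alignment_zero_bit(s) up to a byte boundary
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


def rbsp_to_sodb(rbsp: bytes) -> str:
    """Strip the trailing bits: the SODB is everything before the last 1 bit."""
    bits = "".join(f"{b:08b}" for b in rbsp)
    return bits[: bits.rindex("1")] if "1" in bits else ""


print(sodb_to_rbsp("101").hex())       # b0  (101 + 1 + 0000)
print(sodb_to_rbsp("11111111").hex())  # ff80 (a full byte still gets a stop bit)
```

The second call shows why the stop bit is always present. Even when the SODB already ends on a byte boundary, a whole extra byte (0x80) is added, so a reader can always find where the SODB ends.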

NALU

A NALU consists of a fixed-length header followed by the RBSP.

[Figure: NALU structure, NAL header followed by RBSP]

HEADER

The NAL header is one byte, laid out as follows:

[Figure: bit layout of the NAL header]
  • forbidden_zero_bit (1 bit)
    Always 0 in a valid stream. If an error occurs during network transmission, it can be set to 1 to tell the receiver to discard the unit.
  • nal_ref_idc (2 bits)
    Indicates how important the current NALU is, from 0 to 3. The larger the value, the more important the NALU.
    A value of 0 means the NALU is not used as a reference, so a decoder that cannot keep up can discard NALUs with nal_ref_idc 0 without breaking other frames.
  • nal_unit_type (5 bits)
    Indicates the type of data the NALU carries. The values are:

    [Figure: table of nal_unit_type values]
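Splitting the header byte into its three fields is plain bit masking. A minimal Python sketch follows, assuming the header byte has already been read (the names NalHeader and parse_nal_header are hypothetical):

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int  # top bit
    nal_ref_idc: int         # next 2 bits
    nal_unit_type: int       # low 5 bits


def parse_nal_header(byte: int) -> NalHeader:
    """Split the one-byte NAL header into its three fields."""
    return NalHeader(
        forbidden_zero_bit=(byte >> 7) & 0x01,
        nal_ref_idc=(byte >> 5) & 0x03,
        nal_unit_type=byte & 0x1F,
    )


print(parse_nal_header(0x67))
# NalHeader(forbidden_zero_bit=0, nal_ref_idc=3, nal_unit_type=7)
```

The same masks applied to 0x68 and 0x65 give nal_unit_type 8 and 5.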

The following values deserve attention:

  • 1-4: slices of non-IDR pictures. Type 1 is a complete slice; types 2-4 are data partitions A, B and C of a slice. Whether a slice is I, P or B is given by slice_type in the slice header, not by nal_ref_idc. A nal_ref_idc of 0 only means the picture is not used as a reference.
  • 5: IDR (Instantaneous Decoding Refresh) slice, a special kind of I frame. On an IDR picture the decoder clears its reference picture buffer, so no later frame can reference a frame before it. It is also the only point where a new SPS (usually sent, together with the PPS, just before it) can take effect.
  • 6: SEI (Supplemental Enhancement Information): provides a way to add extra information to the video bitstream.
  • 7: SPS (Sequence Parameter Set): stores the global parameters of a coded video sequence, such as profile, level and picture size.
  • 8: PPS (Picture Parameter Set): stores parameters that apply to whole pictures, such as the entropy coding mode.
  • 9: AU delimiter, where AU stands for access unit. An access unit is the set of one or more NALUs that together make up one complete picture; the delimiter marks where an access unit starts.
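The values above map naturally onto a lookup table. The following Python sketch is hypothetical: it covers only the values listed here and treats types 1-5 as VCL NALUs, which is true for basic H264 but ignores the SVC/MVC extension types.

```python
# Names for the nal_unit_type values listed above (a subset of the full table).
NAL_TYPE_NAMES = {
    1: "non-IDR slice",
    2: "slice data partition A",
    3: "slice data partition B",
    4: "slice data partition C",
    5: "IDR slice",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "AU delimiter",
}


def describe_nal_type(nal_type: int) -> str:
    """Readable name, falling back to the raw number for types not listed."""
    return NAL_TYPE_NAMES.get(nal_type, f"other ({nal_type})")


def is_vcl(nal_type: int) -> bool:
    """Types 1-5 carry coded slice data (VCL layer); 6-9 above are non-VCL."""
    return 1 <= nal_type <= 5


for t in (7, 8, 5, 1):
    print(t, describe_nal_type(t), "VCL" if is_vcl(t) else "non-VCL")
```

The split in is_vcl mirrors the NAL/VCL layering: SPS, PPS, SEI and AU delimiters are NAL-level data that carry no picture content themselves.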

SPS and PPS must appear before the I frame (IDR) that uses them, otherwise the decoder cannot decode it. How often they appear depends on the application. A local H264 file may carry them only once, before the first I frame. A live stream, however, should insert SPS and PPS before every I frame, because a client can join the stream at any moment and needs them before it can start decoding.
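Here is a Python sketch of what this means for a client that joins mid-stream. It is a simplified model that looks only at nal_unit_type values (5 = IDR, 7 = SPS, 8 = PPS) and assumes any SPS/PPS will do. A real decoder matches the parameter-set IDs referenced by each slice. first_decodable_index is a hypothetical helper, not an FFmpeg API.

```python
from typing import Optional, Sequence

IDR, SPS, PPS = 5, 7, 8


def first_decodable_index(nal_types: Sequence[int]) -> Optional[int]:
    """Index of the first IDR NALU a decoder starting at nal_types[0] can decode,
    i.e. the first IDR preceded by at least one SPS and one PPS; else None."""
    seen_sps = seen_pps = False
    for i, t in enumerate(nal_types):
        if t == SPS:
            seen_sps = True
        elif t == PPS:
            seen_pps = True
        elif t == IDR and seen_sps and seen_pps:
            return i
    return None


# Live stream repeating SPS/PPS before each IDR: a late joiner recovers at index 4.
print(first_decodable_index([1, 1, 7, 8, 5, 1]))
# SPS/PPS sent only once at the start: joining after them never recovers (None).
print(first_decodable_index([1, 5, 1, 5, 1]))
```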

RBSP

The structure of the RBSP is as follows:

[Figure: RBSP structure]

I have not studied this part in detail yet; FFmpeg seems to handle it internally when parsing.
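One step a parser has to perform before it can read the RBSP is worth sketching. NALUs are delimited by the start code pattern 00 00 01, so the encoder must ensure that pattern never appears inside a NALU. Whenever the RBSP contains two zero bytes followed by a byte from 0x00 to 0x03, the encoder inserts an emulation_prevention_three_byte (0x03) after the two zeros. The minimal Python sketch below removes those bytes again. The function name is hypothetical, and this is an illustration, not FFmpeg's implementation.

```python
def nalu_payload_to_rbsp(payload: bytes) -> bytes:
    """Drop each 0x03 that directly follows two 0x00 bytes (emulation prevention)."""
    rbsp = bytearray()
    zeros = 0  # number of consecutive 0x00 bytes just copied
    for b in payload:
        if zeros >= 2 and b == 0x03:
            zeros = 0  # skip the escape byte; the run of zeros is broken
            continue
        rbsp.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(rbsp)


print(nalu_payload_to_rbsp(bytes.fromhex("aa00000301bb")).hex())  # aa000001bb
```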

In an Annex B byte stream, each NALU is preceded by a start code, 0x00 00 00 01 or 0x00 00 01, which serves as the delimiter between NALUs.
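Splitting a stream on these start codes can be sketched in a few lines of Python. This assumes the whole stream is already in memory; split_annexb is a hypothetical name, and the sample bytes are made up rather than taken from a real encoder. The approach relies on two rules of the format. Emulation prevention bytes (0x03) keep 00 00 01 from ever occurring inside a NALU. A NALU never ends in a 0x00 byte, so zeros right before a start code belong to the start code or to padding.

```python
START_CODE = b"\x00\x00\x01"


def split_annexb(stream: bytes) -> list[bytes]:
    """Split an Annex B byte stream into NALUs (header byte + payload)."""
    nalus = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # Trailing zeros are the extra leading 0x00 of a 4-byte start code
        # (00 00 00 01) or padding, never NALU data.
        nalus.append(stream[begin:end].rstrip(b"\x00"))
        pos = nxt
    return nalus


# Made-up sample: 4-byte, 4-byte and 3-byte start codes.
stream = bytes.fromhex("0000000167640028" "0000000168ee3c80" "0000016588840f")
for nalu in split_annexb(stream):
    print(nalu.hex(), "nal_unit_type =", nalu[0] & 0x1F)
```

Both start-code lengths are handled by searching only for the 3-byte pattern and trimming the leftover zero.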

The following is a hex dump of an H264 bitstream:

[Figure: hex dump of an H264 bitstream]

Three representative NALUs are analyzed:

  • 00 00 00 01 67
    00 00 00 01 is the start code and 67 is the NAL header. Its binary form is 0110 0111, so nal_ref_idc is 11 (3) and nal_unit_type is 00111, i.e. 7: this is an SPS.
  • 00 00 00 01 68
    68 in binary is 0110 1000, so nal_unit_type is 01000, i.e. 8: this is a PPS.
  • 00 00 00 01 65
    65 in binary is 0110 0101, so nal_unit_type is 00101, i.e. 5: this is an IDR slice.

A more detailed H264 hierarchy

[Figure: detailed H264 hierarchy]

I have not yet figured out how slices are organized, so I will revise this part later.
