Audio and video development tour (56) – H264 / AVC basic structure

Time: 2022-5-25

Starting with this article, we begin studying and practicing H.264. The work is divided into three stages:

  1. Learn the basic structure and bitstream syntax of H.264;
  2. Understand the specific coding and compression techniques;
  3. Analyze the related open-source libraries x264 and h264bitstream.

In this article, we learn the basic structure of H.264.

Contents

  1. Objectives and scheme of H.264/AVC
  2. H.264 layers: VCL and NAL
  3. NALU header parsing
  4. NALU payload
  5. Characteristics of I/P/B frames
  6. Slices and macroblocks
  7. References
  8. Takeaways

1、 Objectives and scheme of H.264/AVC

Audio and video coding standards are developed by standards organizations, chiefly two: ISO/IEC (the International Organization for Standardization and the International Electrotechnical Commission) and ITU-T (the Telecommunication Standardization Sector of the International Telecommunication Union).

MPEG-1, MPEG-2 and MPEG-4 Part 2 were developed by ISO/IEC.
H.261, H.262 and H.263 were developed by ITU-T (H.262 is the same standard as MPEG-2 Video, developed jointly with ISO/IEC).
H.264/MPEG-4 Part 10 (AVC) and HEVC (H.265) were developed jointly by ISO/IEC and ITU-T.

The main objectives of H.264:

1) High compression efficiency, roughly twice that of H.263 and MPEG-4 Part 2, a goal that has largely been achieved;
2) Network friendliness, so that it can be carried over a wide variety of transmission networks.
To serve both goals, H.264 splits its functions into two layers: the video coding layer (VCL) and the network abstraction layer (NAL).
VCL data is the output of the encoding process and represents the compressed video data sequence.
Before VCL data is transmitted or stored, it is mapped or encapsulated into NAL units.

**Main process of H.264 encoding**

[Figure: main process of H.264 encoding]

Compression techniques
H.264/MPEG-4 Part 10 is a widely used video coding standard.
To achieve its goals, H.264 adds the following compression techniques on top of H.263:

1. Bidirectional motion compensation
2. Variable block-size motion compensation with small blocks
3. Quarter-pixel motion compensation
4. In-loop deblocking filter
5. Context-adaptive variable-length coding
6. Weighted prediction
7. Scalable video coding
8. Multi-view coding, etc.

2、 H.264 layers: VCL and NAL

In the previous section, we saw that H.264 splits its functions into two layers to meet its two goals: the video coding layer (VCL) and the network abstraction layer (NAL).
The VCL contains the core compression engine and the syntax definitions for blocks, macroblocks and slices. Its design goal is to encode as efficiently as possible, independently of the network, and it is responsible for efficiently representing the video content.
The NAL is responsible for adapting the bit string produced by the VCL to various networks and environments, and covers all syntax levels above the slice level. A NAL unit (NALU) is usually composed of [NALU header] + [NALU payload].
In other words, the NAL encapsulates the VCL data.

[Figure: VCL and NAL layers]

Picture from: VCL & NAL (H.264/AVC)

3、 NALU header parsing

To analyze H.264, we first extract the raw video stream with one of the following commands:

Keep the original encoding (stream copy): ffmpeg -i test.mp4 -vcodec copy -an test_copy.h264

Force re-encoding with x264: ffmpeg -i test.mp4 -vcodec libx264 -an test.h264

(In both commands, -an drops the audio track.)

Then open the extracted H.264 file in 010 Editor, as shown below:

[Figure: hex view of the extracted .h264 file in 010 Editor]

H.264 has two stream formats: the Annex B byte-stream format (the format shown in the figure above) and the packet-based format used for RTP transport.
Annex B is the default output format. NAL units are delimited by a start code ([startcode], 0x000001 or 0x00000001) placed in front of each unit.
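The start-code framing can be sketched in a few lines of Python. This is a minimal illustration, assuming the whole Annex B stream fits in memory; `split_annexb` is a hypothetical helper, not part of any library. It relies on two guarantees from the standard: emulation prevention (section 4) ensures that 00 00 01 never occurs inside a NAL unit, and a NAL unit never ends with a 0x00 byte. Because of the second guarantee, trailing zeros in front of a start code (such as the extra leading zero of a 4-byte start code) can safely be stripped.

```python
def split_annexb(stream: bytes) -> list[bytes]:
    """Split an Annex B byte stream into NAL units, with start codes removed."""
    chunks = []
    start = None  # index of the first byte after the most recent start code
    i = 0
    while i + 2 < len(stream):
        if stream[i] == 0 and stream[i + 1] == 0 and stream[i + 2] == 1:
            if start is not None:
                chunks.append(stream[start:i])
            i += 3
            start = i
        else:
            i += 1
    if start is not None:
        chunks.append(stream[start:])
    # A NAL unit never ends with 0x00, so trailing zeros belong to the next
    # start code (e.g. the extra zero of 00 00 00 01) and can be stripped.
    nalus = []
    for chunk in chunks:
        trimmed = chunk.rstrip(b"\x00")
        if trimmed:
            nalus.append(trimmed)
    return nalus


stream = bytes([0, 0, 0, 1, 0x67, 0x42, 0, 0, 0, 1, 0x68, 0xCE, 0, 0, 1, 0x65, 0x88, 0x84])
print([n.hex() for n in split_annexb(stream)])  # ['6742', '68ce', '658884']
```

Both start-code lengths are handled by the same 3-byte search. A 4-byte start code is simply a 3-byte start code with one extra leading zero.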

A NALU consists of [NALU header] [NALU payload].

The NALU header is the single byte immediately after the start code. Its bits are divided into three fields.

[Figure: NALU header bit layout]

Bit 1 (the most significant bit) is forbidden_zero_bit. It must be 0. A value of 1 means the NAL unit contains an error and should not be used.
Bits 2-3 are the reference level nal_ref_idc (NRI), which indicates importance: the larger the value, the more important the unit. For example, when frames must be dropped, these two bits tell the decoder whether other frames depend on this one, and therefore whether it can be discarded. A value of 0 means no other picture uses it as a reference.
The last five bits are nal_unit_type, the type of the NALU. The meaning of each value is shown in the table below.

[Figure: table of nal_unit_type values]

Picture from: https://zhuanlan.zhihu.com/p/71928833
NAL types fall into two categories: VCL and non-VCL. Units that contain picture data (coded slices) are VCL NAL units. SPS, PPS and SEI are non-VCL NAL units.
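The table above is only an image, so here is a text stand-in for the base values. It is a minimal sketch that lists nal_unit_type 0 to 12 as defined in the H.264 base specification (Table 7-1) and classifies types 1 to 5 as VCL. Extension types (13 and above, used by SVC/MVC and later amendments) are deliberately left out. `NAL_UNIT_TYPES`, `is_vcl` and `describe` are illustrative names.

```python
# nal_unit_type values 0-12 from the H.264 base specification (Table 7-1).
NAL_UNIT_TYPES = {
    0: "Unspecified",
    1: "Coded slice of a non-IDR picture",
    2: "Coded slice data partition A",
    3: "Coded slice data partition B",
    4: "Coded slice data partition C",
    5: "Coded slice of an IDR picture",
    6: "SEI (supplemental enhancement information)",
    7: "SPS (sequence parameter set)",
    8: "PPS (picture parameter set)",
    9: "Access unit delimiter",
    10: "End of sequence",
    11: "End of stream",
    12: "Filler data",
}


def is_vcl(nal_unit_type: int) -> bool:
    """Types 1-5 carry coded slice data (VCL); extension types are ignored here."""
    return 1 <= nal_unit_type <= 5


def describe(header: int) -> str:
    nal_unit_type = header & 0x1F  # last five bits of the NALU header byte
    name = NAL_UNIT_TYPES.get(nal_unit_type, "Reserved / extension type")
    category = "VCL" if is_vcl(nal_unit_type) else "non-VCL"
    return f"{nal_unit_type}: {name} ({category})"


for header in (0x06, 0x67, 0x68, 0x65, 0x41):
    print(describe(header))
```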

Now let's look at what 06, 67 and 68 mean in our H.264 screenshot.

0x06 → binary "0000 0110" → last five bits "00110" = 6. According to the table, this is SEI (Supplemental Enhancement Information), a way of adding extra information to a video bitstream.

0x67 → binary "0110 0111" → last five bits "00111" = 7. This is the SPS (Sequence Parameter Set), which stores the global parameters of a coded video sequence.

0x68 → binary "0110 1000" → last five bits "01000" = 8. This is the PPS (Picture Parameter Set), which stores parameters that apply to whole pictures.

Besides these, most of the remaining units start with 00 00 01 41 or 00 00 01 01. What do 41 and 01 mean?

0x41 → binary "0100 0001" → last five bits "00001" = 1. This is a slice of a non-IDR picture, which can be an I, P or B frame. Its nal_ref_idc is "10" = 2, so it is used as a reference.

An IDR (Instantaneous Decoding Refresh) picture is a special I frame (nal_unit_type 5) that tells the decoder to refresh. The decoder discards all previously stored reference pictures, so no later picture depends on anything before it, and a new SPS/PPS can take effect from this point.

0x01 → binary "0000 0001" → last five bits "00001" = 1. This is also a non-IDR slice (I, P or B). Unlike 0x41, its nal_ref_idc is "00" = 0: no other picture references it, so it has low importance and can be discarded.

[Figure: NAL unit headers in the hex view]
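The bit-by-bit walkthrough above can be reproduced with shifts and masks. The sketch below splits the header byte into its three fields and applies the frame-dropping idea from the nal_ref_idc description. The policy in `can_drop` is an assumption for illustration: it drops only slice NAL units (types 1 to 5) whose nal_ref_idc is 0, and it never drops parameter sets or SEI. Both function names are hypothetical.

```python
def parse_nal_header(header: int) -> tuple[int, int, int]:
    """Split the 1-byte NALU header into (forbidden_zero_bit, nal_ref_idc, nal_unit_type)."""
    forbidden_zero_bit = (header >> 7) & 0x01  # bit 1 (MSB): must be 0
    nal_ref_idc = (header >> 5) & 0x03         # bits 2-3: importance / reference level
    nal_unit_type = header & 0x1F              # last five bits: NALU type
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type


def can_drop(header: int) -> bool:
    """Sketch policy: drop only slice NALUs (types 1-5) that no picture references."""
    _, nal_ref_idc, nal_unit_type = parse_nal_header(header)
    return 1 <= nal_unit_type <= 5 and nal_ref_idc == 0


for b in (0x06, 0x67, 0x68, 0x41, 0x01):
    print(hex(b), parse_nal_header(b), "droppable" if can_drop(b) else "keep")
```

Of the five header bytes discussed above, only 0x01 is reported as droppable. 0x06 (SEI) also has nal_ref_idc 0, but it is not a slice, so this policy keeps it.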

4、 NALU payload

The NALU payload involves three important terms: EBSP, RBSP and SODB. The EBSP is exactly the NALU payload, and the three nest as follows:
EBSP contains RBSP, and RBSP contains SODB.

**SODB (String Of Data Bits)**: the raw bit string produced directly by encoding/compression.

**RBSP (Raw Byte Sequence Payload)**: its relationship to the SODB is:
RBSP = SODB + RBSP trailing bits
The RBSP trailing bits (a single 1 bit followed by 0 bits) pad the SODB out to a whole number of bytes.
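The formula RBSP = SODB + RBSP trailing bits can be made concrete. This is a minimal sketch that represents the SODB as a string of '0'/'1' characters for readability (an assumption; real encoders work on packed bits). `sodb_to_rbsp` is a hypothetical name. It appends the rbsp_stop_one_bit and then as many rbsp_alignment_zero_bit values as needed to reach a byte boundary.

```python
def sodb_to_rbsp(sodb_bits: str) -> bytes:
    """Append rbsp_trailing_bits to a SODB given as a string of '0'/'1' characters."""
    bits = sodb_bits + "1"            # rbsp_stop_one_bit
    bits += "0" * (-len(bits) % 8)    # rbsp_alignment_zero_bit until byte-aligned
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


print(sodb_to_rbsp("101").hex())       # b0
print(sodb_to_rbsp("11001100").hex())  # cc80
```

The stop bit is always added, even when the SODB is already byte-aligned, which is why an 8-bit SODB grows to 2 bytes. It also means a decoder can locate the end of the SODB by searching for the last 1 bit.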

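**EBSP (Encapsulated Byte Sequence Payload)**: also called the extended byte sequence payload.

What if the RBSP itself contains a start code pattern (0x000001 or 0x00000001)? A decoder would mistake it for the start of a new NAL unit. This is why the emulation prevention byte (0x03) exists.

During encoding, the RBSP is scanned. Whenever two consecutive 0x00 bytes are followed by a byte in the range 0x00 to 0x03, an emulation prevention byte 0x03 is inserted right after the two zeros. During decoding, the EBSP is scanned in the same way and the operation is reversed: a 0x03 that follows two 0x00 bytes is removed.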

[Figure: NALU payload structure]

Picture from: Video and video frames: H264 coding format sorting

[Figure: EBSP, RBSP and SODB]

Picture from: H264 / AVC syntax and semantic explanation (III): Nalu explanation II (EBSP, RBSP and sodb)
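The emulation prevention rule can be sketched as a pair of byte scanners. This is a minimal illustration with hypothetical function names. It omits one special case in the standard: when the RBSP ends with a 0x00 byte, which can only happen with CABAC zero words, a final 0x03 is also appended.

```python
def rbsp_to_ebsp(rbsp: bytes) -> bytes:
    """Insert emulation prevention bytes (0x03) into an RBSP."""
    out = bytearray()
    zeros = 0  # number of consecutive 0x00 bytes just written
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)  # break up the 00 00 0x pattern
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def ebsp_to_rbsp(ebsp: bytes) -> bytes:
    """Remove emulation prevention bytes: drop every 0x03 that follows 00 00."""
    out = bytearray()
    zeros = 0
    for b in ebsp:
        if zeros >= 2 and b == 0x03:
            zeros = 0  # skip the emulation prevention byte itself
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


print(rbsp_to_ebsp(b"\x00\x00\x01").hex())      # 00000301
print(ebsp_to_rbsp(b"\x00\x00\x03\x01").hex())  # 000001
```

The zero counter is reset after an inserted 0x03. As a result, an RBSP like 00 00 00 00 becomes 00 00 03 00 00: the trailing pair of zeros only gets protected if it is followed by another byte in the range 0x00 to 0x03.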

5、 Characteristics of I / P / B frames

In section 3, the NAL type table showed that 5 denotes an IDR slice and 1 a non-IDR slice. In this section, let's look at the I, P and B frames of a video.

[Figure: I, P and B frames]

I frame (intra picture, key frame): usually the first frame of each GOP. It is compressed using intra prediction only and serves as a reference frame for P and B frames.
The characteristics of I frames are as follows:

    It is a fully intra-coded frame: the whole picture is compressed and transmitted, somewhat like a JPEG image; 
    When decoding, the complete picture can be reconstructed from the I frame's data alone; 
    An I frame describes the details of both the background and the moving subjects; 
    An I frame is generated without referring to other pictures; 
    An I frame is the reference for P and B frames (its quality directly affects the quality of later frames in the same group); 
    An I frame is the base (first) frame of a GOP, and a group typically contains only one I frame; 
    Motion vectors are not needed in an I frame; 
    An I frame carries a relatively large amount of data.

P frame (forward predictive frame): used for inter-frame coding. It references the preceding I or P frame to remove temporal redundancy. A P frame has no complete picture data, only the differences from the preceding I/P picture. During decoding, this frame's difference data is added to the previously buffered I/P frame to produce the final picture.
The characteristics of P frames are as follows:

    P frames are coded frames that typically follow the I frame at intervals of 1 to 2 frames; 
    A P frame uses motion compensation to transmit its difference from the previous I or P frame (the prediction error) together with motion vectors; 
    When decoding, the prediction taken from the reference frame must be added to the prediction error before the complete P frame picture can be reconstructed; 
    A P frame uses forward-predicted inter coding. In the classic model it refers only to the nearest preceding I or P frame (H.264 additionally allows multiple reference frames); 
    A P frame can be the reference for later P frames, or for the B frames before and after it; 
    Because a P frame can be a reference, a decoding error in it can propagate to later frames; 
    Because only differences are transmitted, P frames compress relatively well.

B frame (bi-directional interpolated prediction frame): reduces the amount of transmitted data by exploiting temporal redundancy with both the preceding I/P frame and the following P frame. To decode a B frame, both the previously buffered picture and the following picture must already be decoded. The final picture is obtained by combining the predictions from both with this frame's data. B frames compress well, but they cost more CPU to decode.
The characteristics of B frames are as follows:

    A B frame is predicted from the previous I or P frame and the following P frame; 
    A B frame transmits its prediction error and motion vectors relative to the previous I or P frame and the following P frame; 
    A B frame is a bidirectionally predicted coded frame; 
    B frames have the highest compression ratio, because they only encode the change of moving subjects between the two reference frames, and the prediction is more accurate; 
    In the classic model a B frame is not a reference frame, so it does not propagate decoding errors (H.264 does allow B frames to be used as references).
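Because a B frame needs the following I/P picture, frames are decoded in a different order from the one in which they are displayed. The sketch below shows this reordering for the classic model described above, assuming B frames are never used as references (so no hierarchical B-pyramid). `display_to_decode_order` and the "B1"-style labels are hypothetical.

```python
def display_to_decode_order(frames: list[str]) -> list[str]:
    """Reorder frames given in display order (labels like "B1") into decode order."""
    decode_order, waiting_b = [], []
    for frame in frames:
        if frame.startswith("B"):
            waiting_b.append(frame)         # needs the *next* anchor, so it must wait
        else:
            decode_order.append(frame)      # the I/P anchor is decoded first...
            decode_order.extend(waiting_b)  # ...then the B frames displayed before it
            waiting_b.clear()
    decode_order.extend(waiting_b)          # trailing B frames are simply appended here
    return decode_order


print(display_to_decode_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

In a real open GOP, trailing B frames would wait for the next GOP's I frame. This sketch just appends them so that every frame appears once.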

GOP (group of pictures): the sequence of pictures from one I frame to the next, mainly used to describe how many frames lie between two I frames. The first picture of a coded sequence is an IDR (Instantaneous Decoding Refresh) picture, and IDR pictures are always I pictures.
Lengthening the GOP can effectively reduce the size of the encoded video, but it can also lower video quality.
The longer the GOP, the higher the proportion of inter-predicted (P and B) frames, and the better the rate-distortion performance of the encoding.
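The GOP definition above can be applied directly to a sequence of frame types. This is a minimal sketch, assuming the frame types are already known, for example as read off an analysis tool. It treats every I frame as the start of a new GOP, matching the definition given here. Both helper names are hypothetical.

```python
def gop_lengths(frame_types: str) -> list[int]:
    """Length of each GOP, treating every 'I' as the start of a new GOP."""
    starts = [i for i, t in enumerate(frame_types) if t == "I"]
    ends = starts[1:] + [len(frame_types)]
    return [end - start for start, end in zip(starts, ends)]


def inter_frame_ratio(frame_types: str) -> float:
    """Fraction of frames that are inter-predicted (P or B); input must be non-empty."""
    return sum(t in "PB" for t in frame_types) / len(frame_types)


print(gop_lengths("IBBPBBPIBBPBBP"))  # [7, 7]
print(inter_frame_ratio("IPPP"))      # 0.75
```

With one I frame per GOP of N frames, the inter-predicted share is (N - 1) / N. This share grows as the GOP gets longer, which is the effect described above.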

Use the H.264 Video ES Viewer tool to open a test .264 file and view each NAL unit's type, whether it is a VCL unit, its data size and the frame type.

[Figure: test file opened in H.264 Video ES Viewer]

6、 Slices and macroblocks

Relationship among GOP, frame, slice and macroblock

[Figure: relationship among GOP, frame, slice and macroblock]

The main purpose of a slice is to carry macroblocks and to limit the spread of transmission errors.
How does a slice limit error propagation?
Slices are transmitted and decoded independently of one another. Prediction within a slice (both intra prediction and the prediction of coding parameters from neighbouring macroblocks) cannot use macroblocks from other slices of the same picture as references.

Each slice also consists of two parts, a header and data:

[Figure: slice header and slice data]

The slice header contains information such as the slice type, the macroblock types in the slice, the frame number, which picture the slice belongs to, and the settings and parameters of the corresponding frame.
The slice data holds the macroblocks, which is where the pixel (YUV) data is stored.
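To show how the slice type is actually read, here is a minimal sketch that decodes the first three slice-header fields. It assumes the NALU header byte and any emulation prevention bytes have already been removed. In the specification these first fields are first_mb_in_slice, slice_type and pic_parameter_set_id, each coded as an unsigned Exp-Golomb value, ue(v). slice_type values 5 to 9 mean the same as 0 to 4 but also signal that every slice of the picture has that type. `BitReader` and `parse_slice_header_start` are hypothetical names.

```python
SLICE_TYPES = ["P", "B", "I", "SP", "SI"]  # slice_type % 5 indexes this list


class BitReader:
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # bit position, counted from the MSB of data[0]

    def read_bit(self) -> int:
        byte = self.data[self.pos // 8]
        self.pos += 1
        return (byte >> (7 - (self.pos - 1) % 8)) & 1

    def read_ue(self) -> int:
        """Unsigned Exp-Golomb: count leading zeros, then read that many more bits."""
        zeros = 0
        while self.read_bit() == 0:
            zeros += 1
        suffix = 0
        for _ in range(zeros):
            suffix = (suffix << 1) | self.read_bit()
        return (1 << zeros) - 1 + suffix


def parse_slice_header_start(rbsp: bytes) -> dict:
    r = BitReader(rbsp)
    first_mb_in_slice = r.read_ue()
    slice_type = r.read_ue()
    pic_parameter_set_id = r.read_ue()
    return {
        "first_mb_in_slice": first_mb_in_slice,
        "slice_type": SLICE_TYPES[slice_type % 5],
        "pic_parameter_set_id": pic_parameter_set_id,
    }


print(parse_slice_header_start(bytes([0x88, 0x84])))
# {'first_mb_in_slice': 0, 'slice_type': 'I', 'pic_parameter_set_id': 0}
```

In this example, 0x88 0x84 starts with the bit strings "1" (ue = 0), "0001000" (ue = 7, so an I slice) and "1" (ue = 0).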

What is a macroblock?

[Figure: macroblock structure]

A macroblock is the main carrier of video information, because it contains the luma and chroma information of each pixel. The main job of video decoding is to recover the pixel arrays of the macroblocks from the bitstream efficiently.

Composition: with 4:2:0 sampling, a macroblock consists of a 16 × 16 block of luma samples plus one 8 × 8 Cb and one 8 × 8 Cr chroma block. In each picture, macroblocks are grouped into slices.
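A quick calculation shows what this composition means for a real frame. The sketch below counts macroblocks for a given resolution and the samples in one 4:2:0 macroblock. It assumes progressive (frame) coding and rounds partial macroblocks up. The helper names are hypothetical.

```python
MB_SIZE = 16  # a macroblock covers 16 x 16 luma samples


def macroblock_grid(width: int, height: int) -> tuple[int, int, int]:
    """Return (columns, rows, total) of macroblocks, rounding partial blocks up."""
    cols = (width + MB_SIZE - 1) // MB_SIZE
    rows = (height + MB_SIZE - 1) // MB_SIZE
    return cols, rows, cols * rows


def samples_per_macroblock_420() -> int:
    """Luma plus chroma samples in one 4:2:0 macroblock."""
    luma = 16 * 16      # Y
    chroma = 2 * 8 * 8  # one 8x8 Cb block + one 8x8 Cr block
    return luma + chroma


print(macroblock_grid(1920, 1080))   # (120, 68, 8160)
print(samples_per_macroblock_420())  # 384
```

Because 1080 is not a multiple of 16, a 1080p stream is coded as 68 macroblock rows (1088 lines), and the SPS frame-cropping fields tell the decoder to remove the extra 8 lines.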

**Common macroblock types:**
I macroblock: uses intra prediction. It can appear in I, P or B frames, because intra prediction can also be used in P and B frames.
P macroblock: uses unidirectional inter prediction. It appears only in P frames.
B macroblock: uses bidirectional inter prediction. It appears only in B frames.

**Relationship between slice type and macroblock type**

I slice: contains only I macroblocks. An I macroblock uses already-decoded pixels in the current slice as the reference for intra prediction (decoded pixels in other slices cannot be used).
P slice: may contain P and I macroblocks. A P macroblock uses previously encoded pictures as references for inter prediction. An inter-coded macroblock can be partitioned into 16 × 16, 16 × 8, 8 × 16 or 8 × 8 luma blocks (with the accompanying chroma). If 8 × 8 is chosen, each 8 × 8 sub-macroblock can be further split into 8 × 8, 8 × 4, 4 × 8 or 4 × 4 luma blocks (with the accompanying chroma).
B slice: may contain B and I macroblocks. A B macroblock uses bidirectional references (previous and following encoded pictures) for inter prediction.
SP slice (switching P): used to switch between different coded streams; contains P and/or I macroblocks.
SI slice (switching I): the switching slice required in the Extended profile. It contains a special type of coded macroblock called the SI macroblock. SI is a mandatory feature of the Extended profile.
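The P-slice partitioning described above forms a two-level tree: a macroblock partition, and then, for 8 × 8, a sub-macroblock partition. The sketch below lists the resulting luma blocks as (x, y, width, height). Each block carries its own motion vector (per reference list). One simplification is an assumption: the standard lets each of the four 8 × 8 sub-macroblocks choose its own split, but here all four use the same `sub_mode`. All names are hypothetical.

```python
MB_PARTITIONS = {"16x16": (16, 16), "16x8": (16, 8), "8x16": (8, 16), "8x8": (8, 8)}
SUB_MB_PARTITIONS = {"8x8": (8, 8), "8x4": (8, 4), "4x8": (4, 8), "4x4": (4, 4)}


def tile(x0: int, y0: int, size: int, w: int, h: int) -> list[tuple[int, int, int, int]]:
    """Cover a size x size square at (x0, y0) with w x h blocks, in raster order."""
    return [(x0 + x, y0 + y, w, h) for y in range(0, size, h) for x in range(0, size, w)]


def partition_macroblock(mb_mode: str, sub_mode: str = "8x8") -> list[tuple[int, int, int, int]]:
    """Return the luma blocks (x, y, w, h) of an inter-coded 16x16 macroblock."""
    w, h = MB_PARTITIONS[mb_mode]
    blocks = tile(0, 0, 16, w, h)
    if mb_mode != "8x8":
        return blocks
    # Simplification: every 8x8 sub-macroblock uses the same sub_mode here.
    sw, sh = SUB_MB_PARTITIONS[sub_mode]
    return [sub for (x, y, _, _) in blocks for sub in tile(x, y, 8, sw, sh)]


print(partition_macroblock("16x8"))            # [(0, 0, 16, 8), (0, 8, 16, 8)]
print(len(partition_macroblock("8x8", "4x4")))  # 16
```

This shows the trade-off behind variable block sizes. A 16 × 16 partition needs a single motion vector, while the finest split needs 16, which costs more bits but follows irregular motion more closely.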

We use the H264Visa tool to view the NALUs, slices, macroblocks and YUV data of an H.264 stream, as follows:

[Figure: H.264 stream in H264Visa]

7、 References

This article covers a lot of ground. I studied the following resources, organized and described them based on my own understanding, and checked and analyzed them with bitstream analysis tools. Thanks to the authors below for their work.

  1. Book: “Full Angle Explanation of Video Coding”
  2. Book: “New Generation Video Compression Coding Standard: H.264/AVC”
  3. Li Chao: Basic principles of H.264
  4. Understand video coding H264 structure in simple terms
  5. Video and video frames: H264 coding format sorting
  6. H264 coding summary
  7. VCL & NAL (H.264/AVC)

8、 Takeaways

Through this article, we:

  1. Understood the structure of H.264 and its layering into VCL and NAL
  2. Understood the NALU header byte and the meaning of types such as SPS, PPS, SEI, IDR and non-IDR, as well as the characteristics of I/P/B frames
  3. Understood the structure of the NALU payload
  4. Understood the definition and purpose of slices and macroblocks
  5. Deepened that understanding in practice with bitstream analysis tools

Thank you for reading.
In the next article, we will study intra prediction in H.264 coding. Follow the official account “audio and video development journey” to learn and grow together.
Comments and discussion are welcome.
