Audio and video learning notes


These are my study notes from an audio and video course on XX.com, mainly covering an introduction to audio and video and hands-on FFmpeg practice. The notes follow the order of the classes. Everyone's foundation is different, so I only write down what I think I need to record; I am coming to this with five years of iOS development experience~~

When I finish the whole course, I will reorganize the notes. They are packed with practical content, so I recommend bookmarking them!

Getting started with audio and video FAQs

April 23, 2021

FFmpeg related commands:

  1. Push a stream: ffmpeg -re -i <video address> -c copy -f flv <streaming server address>

    -re: read the input at its native rate, which avoids failures caused by pushing at a rate different from the original audio and video
    -c:v copy: copy the original video encoding without re-encoding, which avoids a blurry stream

  2. Play a stream: to be improved and supplemented later

  3. Capture audio: ffmpeg -f avfoundation -i :0 <file name>
    -f: format name; -i: device name or index

  4. Play PCM data: ffplay -ar 44100 -ac 2 -f f32le <file name>
    -ar: sampling rate; -ac: number of channels; -f: sample format

  5. Generate an AAC file with FFmpeg: ffmpeg -i xxx.mp4 -vn -c:a libfdk_aac -ar 44100 -channels 2 -profile:a aac_he_v2 xxx.aac
    -i: input file; -vn: video none, drops the video stream; -c:a: codec for audio, here libfdk_aac (the best encoder); -profile:a: sets the codec profile, where a = audio
    FFmpeg does not include the libfdk_aac encoder by default, so I had to download FDK and then reinstall FFmpeg. Installation on my Mac was quite tortuous. I needed to run brew install homebrew-ffmpeg/ffmpeg/ffmpeg --with-fdk-aac. If the brew version is too old, the download fails or some components are not downloaded, so brew has to be updated first. After the update finished, it reported that the command line tools version was too low and needed updating as well. I ran the install command suggested by brew, which updated the system directly, and the whole thing took me an entire afternoon;

For more parameters of libfdk codec, please visit:

Linux Basics:

1. Basic commands

    1. ls: list the contents of the current directory

    2. cd: enter a directory

    3. pwd: print the current directory

    4. mkdir: create a folder

    5. cp: copy a file to another place

    6. rm: delete a single file; rm -rf recursively deletes a directory and its subdirectories

    7. sudo: run a command as administrator

    8. pkg-config: look up the compile and link flags of a C/C++ library

    9. echo: write data to a file with output redirection. If the file does not exist it is created, for example echo "dada" > test.txt

    10. cat: view file content

    11. which: print the path of the executable that a command resolves to

    12. | grep: filter the output of a command for matching lines

2. Basic vim commands. Detailed command reference: [https://](https://)

    1. :w save

    2. :q exit

    3. i enter insert mode to edit the document

    4. h moves the cursor left, j moves down, k moves up, l moves right

3. Environment variables in Linux 

Location of environment variables on macOS: ~/.bash_profile. Use the source command to make changes take effect. The pkg-config command looks for .pc files in the directories listed in the PKG_CONFIG_PATH environment variable, for example: pkg-config --libs --cflags <library name>, where --libs prints the linker flags for the library and --cflags prints the compiler flags for its header files

For more details, see:

    1. PATH: directories of executable binaries; once a directory is added, its binaries can be called as commands

    2. PKG_CONFIG_PATH: directories where .pc files are placed, set in the same way as LD_LIBRARY_PATH

    3. LD_LIBRARY_PATH: directories where .so libraries are placed. If an installed .so library is not under /usr/lib:/usr/lib64:/lib:/lib64:/usr/local/lib:/usr/local/lib64, its directory needs to be added to this environment variable before the system can load the library

Compile and install ffmpeg under mac

Download ffmpeg address:[ ]( )

Compile ffmpeg  

cd into the downloaded ffmpeg directory and execute the following commands

./configure --prefix=/usr/local/ffmpeg --enable-debug=3 --disable-static --enable-shared
make -j 4
make install

--prefix sets the installation path, --enable-debug=3 allows debugging, and -j specifies how many processes run concurrently to speed up compilation

C compilation and execution

1. Compilation: gcc/clang -g -o bin xxx.c, where -g enables debug information and -o specifies the name of the output executable (bin here);

2. Execution: ./<file name>, for example ./bin

Fundamentals of C language:

1. Pointers: you can operate on the pointer itself or obtain the content it points to. To obtain the content, dereference it: *variable_name

2. Allocation and release of heap memory

    1. Allocate memory: void *v = malloc(size); malloc is declared in <stdlib.h>;

    2. Free memory: free(v); v = NULL;

3. Some difficult uses of C language

    1. Function pointer declaration: return_type (*pointer_name)(parameter list). Polymorphism in object-oriented languages is actually implemented with function pointers, and the message_invoke and message_send methods of the runtime in iOS development also use function pointers to send messages;

    2. File operations

        1. Declare a file handle: FILE *file;

        2. Open the file: file = fopen(path, mode), where mode "w" = write and "r" = read;

        3. Write to the file: fwrite(data pointer, size of each element, number of elements, file);

        4. Read from the file: fread(buffer, size of each element, number of elements, file);

        5. Close the file: fclose(file);
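
To tie the C points above together, here is a minimal sketch of a function pointer used for dispatch and of a heap buffer filled from a file. The names binary_op, apply and round_trip are made up for this illustration; they are not from the course.

```c
#include <stdio.h>
#include <stdlib.h>

/* Function pointer type: return_type (*name)(parameter list). */
typedef int (*binary_op)(int, int);

static int add(int a, int b) { return a + b; }
static int mul(int a, int b) { return a * b; }

/* Calls whichever function the pointer refers to. Choosing the function
 * at run time like this is the mechanism behind polymorphism. */
static int apply(binary_op op, int a, int b) { return op(a, b); }

/* Writes len bytes to path, then reads them back into a heap buffer.
 * Returns a NUL-terminated buffer (the caller frees it) or NULL on failure. */
static char *round_trip(const char *path, const char *data, size_t len) {
    FILE *file = fopen(path, "wb");
    if (file == NULL) return NULL;
    size_t written = fwrite(data, 1, len, file);
    fflush(file);                      /* push buffered bytes to the file */
    fclose(file);
    if (written != len) return NULL;

    char *buffer = malloc(len + 1);    /* heap allocation from <stdlib.h> */
    if (buffer == NULL) return NULL;
    file = fopen(path, "rb");
    if (file == NULL) { free(buffer); return NULL; }
    size_t got = fread(buffer, 1, len, file);
    fclose(file);
    if (got != len) { free(buffer); return NULL; }
    buffer[len] = '\0';
    return buffer;
}
```

A caller would use it as apply(add, 2, 3), or as char *copy = round_trip("demo.txt", "dada", 4); followed by free(copy); copy = NULL; once the buffer is no longer needed.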

Audio processing flow:

1. Processing flow of live broadcast client

    1. Audio and video acquisition 

    2. Audio and video encoding: lossy encoding or lossless encoding

    3. Transmit to the viewing end 

    4. Viewing end decoding rendering

2. Audio data flow

    1. Convert the captured analog signal into a digital signal; the data format is generally PCM;

    2. Compress and encode it as AAC, MP3, etc.;

    3. Generate a multimedia file, which amounts to wrapping the encoded data in a container such as MP4 or FLV;

Fundamentals of sound:

1. Human hearing range: 20 Hz to 20 kHz

2. The frequency of normal human speech is 85 Hz to 1100 Hz

3. Three elements of sound:

    1. Pitch: how fast the sound source vibrates, i.e. its frequency. From low to high: men, women, children. The higher the frequency, the higher the pitch;

    2. Volume: the amplitude of the vibration;

    3. Timbre: determined by harmonics; a sound is made up of many different frequencies.

Analog-to-digital conversion: converting analog signals into digital signals, that is, into square waves that the computer can recognize

1. Sample and quantize the sound. For example (as in the original figure), the waveform is sampled once every 0.25. In practice a typical sampling rate is 48000 samples per second. The higher the sampling rate, the larger the data and the more faithful the reconstruction

2. Sample size: how many bits are used to store one sample. 16 bits is common;

3. Sampling rate: common values are 8 kHz, 16 kHz, 32 kHz, 44.1 kHz, and 48 kHz;

4. Number of channels: mono, stereo, or multichannel

PCM data rate (bits per second) = sample size * sampling rate * number of channels;
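
As a quick check of the formula, here is a small sketch that computes the PCM data rate. The function names are my own, chosen only for this example.

```c
#include <stdint.h>

/* Bits of raw PCM produced per second:
 * sample size (bits) * sampling rate (Hz) * number of channels. */
static uint64_t pcm_bits_per_second(uint32_t bits_per_sample,
                                    uint32_t sample_rate,
                                    uint32_t channels) {
    return (uint64_t)bits_per_sample * sample_rate * channels;
}

/* Bytes needed to store `seconds` of raw PCM. Assumes the per-second
 * bit count is a multiple of 8, which holds for 8/16/24/32-bit samples. */
static uint64_t pcm_bytes_for(uint32_t bits_per_sample, uint32_t sample_rate,
                              uint32_t channels, uint32_t seconds) {
    return pcm_bits_per_second(bits_per_sample, sample_rate, channels) / 8 * seconds;
}
```

For 16-bit stereo at 44100 Hz this gives 16 * 44100 * 2 = 1411200 bits per second, which is 176400 bytes per second, so one minute of raw audio takes about 10 MB. That is why the data is compressed before it is stored or transmitted.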

Audio raw data

PCM raw audio data

WAV can store both raw data and compressed data. For raw data, it simply adds a header in front of the PCM data to make it easy to identify and process;
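
To make "a header in front of the PCM data" concrete, here is a sketch that fills the common 44-byte WAV header. It assumes integer PCM (format tag 1) and the simplest layout with a single fmt chunk followed by the data chunk; 32-bit float data would use format tag 3 instead. The helper names are my own.

```c
#include <stdint.h>
#include <string.h>

/* Store values in little-endian byte order, as WAV requires. */
static void put_le16(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}
static void put_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Fill the 44-byte header that precedes data_bytes of integer PCM. */
static void build_wav_header(uint8_t out[44], uint32_t sample_rate,
                             uint16_t channels, uint16_t bits_per_sample,
                             uint32_t data_bytes) {
    uint16_t block_align = (uint16_t)(channels * bits_per_sample / 8);
    memcpy(out, "RIFF", 4);
    put_le32(out + 4, 36 + data_bytes);            /* file size minus 8 */
    memcpy(out + 8, "WAVE", 4);
    memcpy(out + 12, "fmt ", 4);
    put_le32(out + 16, 16);                        /* size of the fmt chunk */
    put_le16(out + 20, 1);                         /* 1 = integer PCM */
    put_le16(out + 22, channels);
    put_le32(out + 24, sample_rate);
    put_le32(out + 28, sample_rate * block_align); /* bytes per second */
    put_le16(out + 32, block_align);               /* bytes per sample frame */
    put_le16(out + 34, bits_per_sample);
    memcpy(out + 36, "data", 4);
    put_le32(out + 40, data_bytes);
}
```

Writing these 44 bytes to a file and then appending the raw PCM bytes yields a file that ordinary players can open, because the header carries the sampling rate, sample size and channel count that bare PCM lacks.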

April 25, 2021

audio recording

  1. Audio acquisition

    • open device
      When importing the FFmpeg dynamic libraries, they need to be code-signed. Even after signing, the compiler still reported an error saying there was no signature. The workaround was strange: I had to delete the libraries from Xcode and then import them again one by one to avoid the error. If I imported them all at once, the error came back
      void openDevice(AVFormatContext **context) {
        //Register devices
        avdevice_register_all();
        //Set the capture method: avfoundation on macOS, dshow on Windows, alsa on Linux
        AVInputFormat *format = av_find_input_format("avfoundation");
        //Open the audio device
        //The device string format is [video device]:[audio device]; ":0" selects the first audio device
        char *deviceName = ":0";
        AVDictionary *options = NULL;
        int result = avformat_open_input(context, deviceName, format, &options);
        if (result != 0) {
            char error[1024];
            char *errorStr = av_make_error_string(error, 1024, result);
            printf("failed to open audio device: %s\n", errorStr);
        }
      }
    • Read data
      //Read data from context
      void read_audio_frame(AVFormatContext **context) {
        AVPacket pkt;
        int result = 0;
        int count = 0;
        while (count < 500) {
            result = av_read_frame(*context, &pkt);
            if (result == 0 && pkt.size > 0) {
                av_log(NULL, AV_LOG_INFO, "audio frame size == %d  data == %p  count == %d\n", pkt.size, pkt.data, count);
                count++;
                //Release the packet buffer before reading the next frame
                av_packet_unref(&pkt);
            } else {
                //EAGAIN (-35 on macOS) means "Resource temporarily unavailable": the device is not ready and is still processing data because frames are requested too frequently
                if (result != AVERROR(EAGAIN)) {
                    char error[1024] = {0,};
                    av_make_error_string(error, 1024, result);
                    av_log(NULL, AV_LOG_ERROR, "read audio frame failed: %s    errorcode == %d\n", error, result);
                    break;
                }
            }
        }
        //Stop audio recording
        av_log(NULL, AV_LOG_DEBUG, "recorder finished\n");
      }
    • Store data to a file
      Open file
      const char *adrress = "/Users/wangning/Desktop/learning/ady_audio.pcm";
      //Create and open the file: w = write, b = binary, + = also allow reading
      FILE *file = fopen(adrress, "wb+");

      Start writing

      //Write data to file
      void write_audio_file(FILE *file, AVPacket pkt) {
         fwrite(pkt.data, pkt.size, 1, file);
         //To improve efficiency, the system does not write to the file immediately; it stores data in a buffer and copies it to the file once a certain amount has accumulated. If something goes wrong with the device in the meantime, data may be lost, so call fflush(file) after writing
         fflush(file);
         if (pkt.size == 0) {
             printf("file write complete\n");
         }
      }
  • Play

    ffplay plays PCM data: ffplay -ar 44100 -ac 2 -f f32le <file name>, where -ar is the sampling rate, -ac is the number of channels, and -f is the sample format

April 26, 2021

Audio coding principle:

  1. Lossy compression

    Remove audio signals outside the range of human hearing and audio signals that are masked. Signal masking is divided into frequency-domain masking and time-domain masking;

    • Frequency-domain masking effect: for sounds below 70 dB within 20 Hz to 20000 Hz, when two frequencies are close together, the lower-intensity sound is masked and removed

    • Time-domain masking effect: sounds that occur shortly before or after a loud sound are masked and removed. Masking before the loud sound lasts about 50 ms, and masking after it lasts more than 200 ms. Sounds that fall within these windows are masked.

  2. Lossless coding: includes entropy coding. Huffman coding replaces long strings with short binary codes: the more frequent a symbol, the shorter its code, and the less frequent, the longer its code. The other methods are arithmetic coding (which uses a fractional number) and Shannon coding

Audio coding process: the data first passes through a time-domain to frequency-domain transform and a psychoacoustic model. The former splits the data into multiple frequency bands so that unnecessary band data can be discarded. The latter removes sounds outside the range of human hearing and masked composite sounds. The results of the two are then combined through quantization and lossless coding to form the bitstream, with some auxiliary data placed in front of it. After this the data becomes very small;

Common audio encoders: Opus, AAC, Ogg, Speex, iLBC, AMR, G.711. The most commonly used are Opus and AAC. Opus is often used for live streaming and WebRTC uses Opus by default, while AAC is the most widely used codec. Ogg is said to charge a fee; Speex supports echo cancellation; G.711 is generally used for fixed-line telephony and loses a lot of sound quality;

AAC encoder: it is currently the most widely used and most popular encoder, so these notes mainly focus on learning it

AAC HE v1 has now been replaced by v2;

There are two header formats in AAC:

  • ADIF (Audio Data Interchange Format): it marks the beginning of the data, which amounts to adding one header in front of the AAC data. The header contains information about the AAC data to make encoding and decoding easier. Its limitation is that decoding can only start from the beginning, not from the middle of the audio data. This format is often used for files on disk
  • ADTS (Audio Data Transport Stream): every frame carries a sync word, so decoding can start at any position, which suits streaming data;

ADTS header structure: 7 or 9 bytes (9 when a CRC is present)

  • Bits 1-12: all 1, i.e. 0xFFF, the sync word;

  • Bit 13: encoding specification, 0 = MPEG-4, 1 = MPEG-2;

  • Bits 14-15: always 0;

  • Bit 16: protection flag, 1 means no CRC, 0 means CRC present;

  • Bits 17-18: the MPEG-4 audio type (profile), stored as the Audio Object Type minus 1, for example AAC LC;

  • Bits 19-22: the sampling rate index

  • The rest will be added later


Meaning of each decimal value:

Audio Object Type: 1 = AAC Main, 2 = AAC LC, 5 = SBR (HE v1), 29 = PS (HE v2)

Sampling frequency index: 0 = 96000 Hz, 1 = 88200 Hz, etc.

The website lists the meanings in more detail
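
The header fields covered so far can be read with a few shifts and masks. This is a minimal sketch that only decodes the first 22 bits described above; the struct and function names are my own, and the sampling rate table is the standard MPEG-4 index table.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int mpeg_version;       /* bit 13: 0 = MPEG-4, 1 = MPEG-2 */
    int layer;              /* bits 14-15: always 0 */
    int protection_absent;  /* bit 16: 1 = no CRC, 0 = CRC present */
    int audio_object_type;  /* bits 17-18 store this value minus 1 */
    int sample_rate;        /* looked up from bits 19-22; 0 if reserved */
} adts_fields;

static const int adts_sample_rates[13] = {
    96000, 88200, 64000, 48000, 44100, 32000, 24000,
    22050, 16000, 12000, 11025, 8000, 7350
};

/* Returns 0 on success, -1 if there are too few bytes or no sync word. */
static int parse_adts_start(const uint8_t *buf, size_t len, adts_fields *out) {
    if (len < 3) return -1;
    /* Sync word: 12 bits, all set to 1 (0xFFF). */
    if (buf[0] != 0xFF || (buf[1] & 0xF0) != 0xF0) return -1;
    out->mpeg_version = (buf[1] >> 3) & 0x1;
    out->layer = (buf[1] >> 1) & 0x3;
    out->protection_absent = buf[1] & 0x1;
    out->audio_object_type = ((buf[2] >> 6) & 0x3) + 1;
    int index = (buf[2] >> 2) & 0xF;
    out->sample_rate = index < 13 ? adts_sample_rates[index] : 0;
    return 0;
}
```

For example, a frame starting with the bytes FF F1 50 decodes to MPEG-4, no CRC, Audio Object Type 2 (AAC LC) and 44100 Hz.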

April 27, 2021

Audio resampling

Converts the value of the audio triplet (sample rate, sample size, number of channels) to another set of values

Application scenarios:

  • The audio data captured from the device does not match what the encoder requires;
  • The audio data to be played does not match what the speaker requires;
  • To simplify processing: for example, echo cancellation converts multi-channel audio to mono;

How to determine whether resampling is required

  • Understand the parameters of audio devices
  • View ffmpeg source code

Resampling steps

  • Create the resampling context
  • Set the parameters
  • Initialize resampling

API: the libswresample library is required

  1. swr_alloc_set_opts: get the context
  • out_ch_layout: output channel layout (speaker layout), e.g. AV_CH_LAYOUT_STEREO for stereo
  • out_sample_fmt: output sample format, 16-bit = AV_SAMPLE_FMT_S16 or 32-bit = AV_SAMPLE_FMT_FLT
  • out_sample_rate: output sampling rate
  • in_ch_layout: input channel layout
  • in_sample_fmt: input sample format
  • in_sample_rate: input sampling rate
  • The last two parameters are log-related: 0, NULL;
  2. swr_init: initialize the context
  3. swr_convert: perform the conversion. out: output buffer; out_count: number of output samples per channel; in: input buffer; in_count: number of input samples per channel
  • The resampled data has to be rebuilt, so input and output buffers are needed
  • Create the input and output buffers with av_samples_alloc_array_and_samples. audio_data: address of the buffer; linesize: buffer size; nb_samples: number of samples per channel, here nb_samples = pkt.size / (32 / 8) / 2; align: alignment, 0
  • You also need to memcpy the data of pkt byte by byte into the input buffer, which requires including string.h
  • Write the output data to the file
  4. swr_free: release the context, and release the input and output buffers
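
To show what the nb_samples arithmetic and the FLT to S16 format change mean, here is a hand-written sketch in plain C. It is not libswresample and does not change the sampling rate; the function names are made up, and the scaling used is a simplification of what a real resampler does.

```c
#include <stddef.h>
#include <stdint.h>

/* Samples per channel in a packet of interleaved audio:
 * packet bytes / bytes per sample / number of channels. */
static size_t samples_per_channel(size_t packet_bytes, size_t bytes_per_sample,
                                  size_t channels) {
    return packet_bytes / bytes_per_sample / channels;
}

/* Convert 32-bit float samples in [-1, 1] to signed 16-bit samples.
 * Values outside the range are clipped; the fraction is truncated. */
static void float_to_s16(const float *in, int16_t *out, size_t count) {
    for (size_t i = 0; i < count; i++) {
        float s = in[i];
        if (s > 1.0f) s = 1.0f;
        if (s < -1.0f) s = -1.0f;
        out[i] = (int16_t)(s * 32767.0f);
    }
}
```

With the capture settings used in these notes (32-bit float, 2 channels), a 4096-byte packet holds 4096 / 4 / 2 = 512 samples per channel, and the converted S16 data takes half as many bytes.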

April 29, 2021

Ffmpeg encoding:

Create the encoder (AVCodec)

  1. avcodec_find_encoder / avcodec_find_encoder_by_name: look up the encoder by ID or by name
  • AV_CODEC_ID_AAC, AV_CODEC_ID_OPUS, or other encoder IDs

  • "libfdk_aac"

Create the context (AVCodecContext)

  1. avcodec_alloc_context3: the 3 stands for the third version
  • sample_fmt = AV_SAMPLE_FMT_S16; the AAC encoder used here does not support 32-bit FLT

  • channel_layout = AV_CH_LAYOUT_STEREO (or channels = 2)

  • sample_rate = 44100

  • bit_rate = 64000; (a bit rate of 64 kbit/s) optional setting

  • profile = FF_PROFILE_AAC_HE_V2; (only takes effect when bit_rate = 0) optional setting

Open the encoder

  1. avcodec_open2

Send data to the encoder: the encoder has an internal buffer that holds part of the data before encoding

  1. avcodec_send_frame: send data (an AVFrame) to the encoder
  • av_frame_alloc: allocate the frame on the heap

    nb_samples = 512: number of samples per channel in each frame

    format = AV_SAMPLE_FMT_S16: format (size) of each sample

    channel_layout = AV_CH_LAYOUT_STEREO: channel layout

  • av_frame_get_buffer: allocate the buffer inside the frame

    Also check that the frame's buffer was actually allocated

    memcpy the resampled data into frame->data

    Then feed the frame to the encoder context with avcodec_send_frame. This function returns an int; a result >= 0 means the data is now in the encoder's buffer;

  1. avcodec_receive_packet: read encoded data (an AVPacket)
  • av_packet_alloc: allocate space for the encoded data

    Because the encoder context has a buffer that caches multiple frames, not every frame produces a packet. So use a while loop that runs as long as the encoder result is >= 0 and call avcodec_receive_packet to obtain a packet. This function also returns an int: a return value >= 0 means a packet was obtained. Some negative values have special meanings and need to be checked first: AVERROR(EAGAIN) means the encoder has no data, or has data but not enough to encode yet, and AVERROR_EOF means there is no data left at all. EAGAIN has to be wrapped in the AVERROR() macro (I don't know why). For any other failure, exit encoding directly;

  1. Finally, write the encoded data pkt->data to the file; the data format is AAC;

    When recording stops, there may still be data in the encoder's buffer, so fetch the encoded data once more and write it to the file before finally closing it;

  1. Release resources

    Release the frame (av_frame_free) and the packet (av_packet_free) at the end;
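
The send and receive loop described above is easier to see with a toy model. The sketch below uses a made-up mock encoder instead of FFmpeg, so every name in it is hypothetical. MOCK_AVERROR only mirrors the fact that on POSIX systems FFmpeg's AVERROR() negates the errno value so that all error codes are negative, which is likely why EAGAIN has to be wrapped.

```c
#include <errno.h>

/* Stand-in for FFmpeg's AVERROR(): negate the errno value so that
 * every error code is negative and >= 0 always means success. */
#define MOCK_AVERROR(e) (-(e))

/* Hypothetical mock encoder: it needs frames_per_packet buffered
 * frames before it can hand out one packet. */
typedef struct {
    int buffered_frames;
    int frames_per_packet;
} mock_encoder;

static int mock_send_frame(mock_encoder *enc) {
    enc->buffered_frames++;
    return 0;
}

static int mock_receive_packet(mock_encoder *enc) {
    if (enc->buffered_frames < enc->frames_per_packet)
        return MOCK_AVERROR(EAGAIN);   /* not enough data to encode yet */
    enc->buffered_frames -= enc->frames_per_packet;
    return 0;
}

/* Send one frame, then drain every packet that is ready.
 * Returns the number of packets obtained, or a negative error code. */
static int encode_one_frame(mock_encoder *enc) {
    int packets = 0;
    int ret = mock_send_frame(enc);
    while (ret >= 0) {
        ret = mock_receive_packet(enc);
        if (ret == MOCK_AVERROR(EAGAIN))
            break;                     /* needs more input, not a failure */
        if (ret < 0)
            return ret;                /* real error: stop encoding */
        packets++;                     /* the packet would be written here */
    }
    return packets;
}
```

With frames_per_packet set to 2, the first call yields no packet and the second call yields one, which is the "not every frame produces a packet" behavior noted above.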

————————Keep Fighting————————