Audio Fundamentals


Physical properties of sound – Vibration

Sound is a physical phenomenon caused by the vibration of an object, such as the strings of a violin. A vibrating object changes the pressure of the air around it, and these alternating compressions and rarefactions propagate outward as waves. When they reach the human ear, we hear sound.

Take a loudspeaker as an example: its diaphragm vibrates when it produces sound. In the following figure, a small piece of paper is placed on the diaphragm, and the diaphragm's vibration makes the paper dance around.



The vibration of the diaphragm causes the air next to it to vibrate, which in turn sets a wider and wider region of air vibrating, until finally the air next to the ear begins to vibrate.


Air vibration


If we focus on a single air molecule, we find that its back-and-forth motion over time traces a sine (or cosine) curve.


Single air molecule

Sound has amplitude, and the subjective perception of amplitude is loudness. The amplitude of a sound is determined by the maximum displacement of the air-pressure wave from its average value (also known as the equilibrium state).



The distance from the equilibrium position to the position of maximum displacement is called the amplitude.


The time it takes for an air molecule to complete one full back-and-forth vibration is called the period, measured in seconds (s).


One cycle


The number of times an object vibrates back and forth per second is called the frequency; it is the reciprocal of the period.

  • Its unit is per second (1/s), also known as hertz (Hz)
  • For example, 440 Hz means the object vibrates back and forth 440 times per second
  • Frequency therefore expresses how fast an object vibrates
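The reciprocal relationship between period and frequency can be checked with a short snippet, using 440 Hz (the standard tuning pitch) as an example:

```python
# Frequency is the reciprocal of the period: f = 1 / T.
period_s = 1 / 440           # one vibration of a 440 Hz tone takes ~2.27 ms
frequency_hz = 1 / period_s  # the reciprocal takes us back to 440 Hz

print(round(period_s * 1000, 2))  # period in milliseconds
```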

In theory, the human voice ranges from about 85 Hz to 1100 Hz, and humans can only hear sounds between 20 Hz and 20,000 Hz.

  • Sound below 20 Hz is called infrasound
  • Sound above 20,000 Hz is called ultrasound


PCM (Pulse Code Modulation)

The human ear hears analog signals; PCM is a technique for converting sound from an analog signal into a digital signal.

How do we record sound (the vibration of a sound source)? Sound is an analog signal, but digital signals (binary code) are more convenient for computers to process and store, so the analog signal must be converted into a digital signal. This process is called audio digitization.

The common technique for digitizing audio is pulse code modulation (PCM). Its main steps are: sampling → quantization → encoding.


Analog signal to digital signal


Sampling

The waveform of an analog signal is infinitely smooth and can be regarded as consisting of infinitely many points. Because storage space is limited, the waveform must be sampled during digitization. Sampling: collecting samples of the analog signal at regular intervals, i.e. discretizing the signal in time (converting a continuous signal into a discrete signal).

Sampling rate

The number of samples collected per second is called the sampling rate (sampling frequency). For example, a sampling rate of 44.1 kHz means 44,100 samples are collected each second.
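Sampling amounts to evaluating the signal at regular intervals. The sketch below (a 440 Hz sine tone is an assumed example) collects one second of samples at 44.1 kHz:

```python
import math

SAMPLE_RATE = 44100  # samples per second (44.1 kHz, CD quality)
FREQ = 440           # tone frequency in Hz (assumed for illustration)

# One second of a 440 Hz sine wave, sampled at regular intervals:
samples = [math.sin(2 * math.pi * FREQ * n / SAMPLE_RATE)
           for n in range(SAMPLE_RATE)]

print(len(samples))  # 44100 samples collected in one second
```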

Sampling theorem

According to the Nyquist–Shannon sampling theorem, the sampled signal can be uniquely reconstructed into the original sound only when the sampling rate is more than twice the highest frequency in the signal. The highest frequency the human ear can perceive is 20,000 Hz, so satisfying human hearing requires at least 40,000 samples per second (a 40 kHz sampling rate). This is why common CDs use a 44.1 kHz sampling rate. Telephones, walkie-talkies, wireless microphones, and the like use an 8 kHz sampling rate.


Quantization

Quantization: converting the sample value of each sampling point into a digital value.

Bit depth

Bit depth (sampling precision, sample size): how many binary bits are used to store the sample value of each sampling point. The greater the bit depth, the more accurately the amplitude is represented. Common CDs use a 16-bit depth, which can represent 65,536 (2^16) different values. DVDs use a 24-bit depth, and most telephone equipment uses an 8-bit depth.
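Quantization at a given bit depth can be sketched as mapping each sample in [-1.0, 1.0] onto a signed integer range; the helper below is hypothetical, for illustration only:

```python
def quantize(sample: float, bit_depth: int) -> int:
    """Map a sample in [-1.0, 1.0] to a signed integer of the given bit depth."""
    max_level = 2 ** (bit_depth - 1) - 1  # 32767 for 16-bit audio
    return round(sample * max_level)

print(2 ** 16)            # a 16-bit depth can represent 65536 distinct values
print(quantize(1.0, 16))  # full-scale sample at CD bit depth -> 32767
print(quantize(-0.5, 8))  # half-scale sample at telephone-grade 8-bit depth
```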


Comparison of different sampling rates and bit depths


Encoding

Encoding: converting the sampled and quantized data into a binary code stream.

Other concepts


A mono (single-channel) signal produces one stream of sample data; a dual-channel (stereo) signal produces two.
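Stereo PCM streams commonly store the two channels' samples interleaved (L0 R0 L1 R1 …); a minimal sketch with made-up sample values:

```python
# Two channels of (made-up) samples, interleaved the way stereo PCM is stored.
left = [0, 1, 2]
right = [10, 11, 12]
interleaved = [s for pair in zip(left, right) for s in pair]
print(interleaved)  # left and right samples alternate
```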

How large is one minute of stereo PCM data at a 44.1 kHz sampling rate and 16-bit depth?

  • Size = sampling rate × bit depth × number of channels × duration / 8
  • 44100 × 16 × 2 × 60 / 8 = 10,584,000 bytes ≈ 10.09 MB
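The calculation above can be sketched as a small helper function (the name is illustrative):

```python
def pcm_size_bytes(sample_rate: int, bit_depth: int, channels: int, seconds: int) -> int:
    """Uncompressed PCM size: rate x depth x channels x time gives bits; / 8 gives bytes."""
    return sample_rate * bit_depth * channels * seconds // 8

size = pcm_size_bytes(44100, 16, 2, 60)
print(size)                          # bytes in one minute of CD-quality stereo
print(round(size / 1024 / 1024, 2))  # the same figure in megabytes
```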

Roughly 10 MB per minute is unacceptable for most users. Without changing the audio duration, there are only two ways to reduce the size of audio data: lowering the sampling parameters or compressing the data. Lowering the sampling parameters is undesirable, because it degrades audio quality and the user experience, so experts have developed various compression schemes.

Bit rate

Bit rate refers to the number of bits transmitted or processed per unit of time. Units: bits per second (bit/s or bps), kilobits per second (kbit/s or kbps), megabits per second (Mbit/s or Mbps), gigabits per second (Gbit/s or Gbps), and terabits per second (Tbit/s or Tbps).

What is the bit rate of stereo PCM data at a 44.1 kHz sampling rate and 16-bit depth?

  • Bit rate = sampling rate × bit depth × number of channels
  • 44100 × 16 × 2 = 1,411,200 bit/s = 1411.2 kbps
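Likewise, the bit-rate formula can be expressed as a one-line helper (the name is illustrative):

```python
def pcm_bit_rate_kbps(sample_rate: int, bit_depth: int, channels: int) -> float:
    """Bit rate of uncompressed PCM, in kilobits per second."""
    return sample_rate * bit_depth * channels / 1000

print(pcm_bit_rate_kbps(44100, 16, 2))  # CD-quality stereo
```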

Generally, the higher the sampling rate and bit depth, the better the quality of the digital audio. It follows from the bit-rate formula that, for uncompressed audio, a higher bit rate likewise means better quality.

Common audio coding and file formats

It should be noted that an audio file format is not the same thing as an audio codec. For example:

  • WAV is only a file format, not a codec

  • FLAC is both a file format and a codec

Below is a brief overview of common audio codecs and file formats; they will be covered in more detail later as needed.

Name                                  Lossless  File extension
Monkey's Audio                        ✔️        .ape
FLAC (Free Lossless Audio Codec)      ✔️        .flac
ALAC (Apple Lossless Audio Codec)     ✔️        .m4a / .caf
MP3 (MPEG Audio Layer III)                      .mp3
WMA (Windows Media Audio)                       .wma
AAC (Advanced Audio Coding)                     .aac / .mp4 / .m4a
Vorbis                                          .ogg
Speex                                           .spx
Opus                                            .opus
Ogg (container format)                          .ogg
WAV (Waveform Audio File Format)                .wav
AIFF (Audio Interchange File Format)            .aiff / .aif

WAV file format


WAV (Waveform Audio File Format) is an audio file format developed by IBM and Microsoft. Its extension is .wav, it usually contains PCM-encoded data, and it is commonly used on Windows systems.

As shown in the figure below, a WAV file begins with a 44-byte header, followed by the audio data (such as PCM data).

WAV file format
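Using Python's standard-library `wave` module, which writes the RIFF/WAVE header for us, a WAV file containing one second of a 440 Hz test tone (an assumed example) can be sketched as:

```python
import math
import struct
import wave

SAMPLE_RATE = 44100
FREQ = 440       # assumed test tone frequency in Hz
DURATION_S = 1

# 16-bit signed little-endian mono samples of a 440 Hz sine wave.
frames = b"".join(
    struct.pack("<h", int(32767 * math.sin(2 * math.pi * FREQ * n / SAMPLE_RATE)))
    for n in range(SAMPLE_RATE * DURATION_S)
)

with wave.open("tone.wav", "wb") as f:
    f.setnchannels(1)           # mono
    f.setsampwidth(2)           # 16-bit = 2 bytes per sample
    f.setframerate(SAMPLE_RATE)
    f.writeframes(frames)       # the module writes the 44-byte header for us
```

The resulting file is the 44-byte header plus the raw PCM data, matching the layout described above.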

Lossy and lossless

Limited by sampling rate and bit depth, audio coding can at best only approximate the signals found in nature. In this sense every digital audio coding scheme is lossy, because the original can never be perfectly restored. PCM currently achieves the highest level of fidelity, so by convention PCM is called lossless coding. It is widely used for archiving material and for music listening, and appears on CDs, DVDs, and in common WAV files.

However, this does not mean PCM guarantees absolute fidelity of the signal; it can only approximate the original as closely as possible. By convention, MP3 and similar codecs are classed as lossy coding relative to PCM coding. True losslessness is as unattainable as expressing π in digits: no matter how high the precision, the value is only infinitely close, never exactly equal to π.