Physical properties of sound – Vibration
Sound is a physical phenomenon caused by the vibration of an object, such as the string of a violin. A vibrating object changes the pressure of the air around it; these alternating compressions and rarefactions propagate outward in the form of waves. When the waves reach the human ear, we hear sound.
Take a loudspeaker as an example: its diaphragm vibrates when the speaker produces sound. In the following figure, a small piece of paper is placed on the diaphragm, and the vibration of the diaphragm makes the paper bounce.
The vibration of the diaphragm causes the air next to it to vibrate, which in turn sets a wider range of air in motion, until finally the air next to the ear begins to vibrate.
If we focus on a single air molecule, we find that the trajectory of its back-and-forth vibration follows a sine (or cosine) curve.
Sound has amplitude; the subjective perception of amplitude is loudness. The amplitude of a sound depends on the maximum displacement of the air-pressure wave from the average air pressure (also known as the equilibrium state).
The distance from the equilibrium position to the position of maximum displacement is called the **amplitude**.
The time it takes for an air molecule to complete one full back-and-forth vibration is called the **period**, measured in seconds (s).
The number of back-and-forth vibrations an object completes per second is called the **frequency**, which is the reciprocal of the period.
- The unit is 1/s, also known as hertz (Hz)
- For example, 440 Hz means the object vibrates back and forth 440 times per second
- Frequency therefore expresses how fast an object vibrates
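The reciprocal relationship between period and frequency described above can be sketched in a few lines of Python (440 Hz, the standard A4 concert pitch, is used here only as an example value):

```python
# The period T and the frequency f of a vibration are reciprocals: f = 1 / T.
# 440 Hz (standard concert pitch A4) is used purely as an example value.
frequency_hz = 440
period_s = 1 / frequency_hz     # time for one full back-and-forth vibration

print(round(1 / period_s))      # 440 -- recovering the frequency from the period
```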
Theoretically, the human vocal range is roughly 85 Hz ~ 1100 Hz, while humans can only hear sounds between 20 Hz and 20,000 Hz.
- Sound below 20 Hz is called infrasound
- Sound above 20,000 Hz is called ultrasound
How do we record sound (the vibration of a sound source)? Sound is an analog signal, but a digital signal (binary code) is more convenient for a computer to process and store, so the analog signal must be converted into a digital signal. This process is called audio digitization.
The common technical scheme for digitizing audio is **pulse-code modulation** (PCM). The main process is: sampling → quantization → encoding.
The waveform of an analog signal is infinitely smooth and can be regarded as consisting of countless points. Because storage space is limited, the waveform must be sampled during digital encoding. **Sampling**: collecting samples of the analog signal at regular intervals. It discretizes the signal in time, converting a continuous signal into a discrete one.
The number of samples collected per second is called the **sampling rate** (or sampling frequency). For example, a sampling rate of 44.1 kHz means that 44,100 samples are collected per second.
According to the **Nyquist–Shannon sampling theorem**, the sampled signal can be uniquely restored to the original sound only when the sampling rate is more than twice the highest frequency of the sound signal. The highest frequency the human ear can perceive is 20,000 Hz, so to satisfy human hearing, at least 40,000 samples must be taken per second (a 40 kHz sampling rate). This is why the sampling rate of a common CD is 44.1 kHz. Telephones, wireless intercoms, wireless microphones, etc. use an 8 kHz sampling rate.
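The Nyquist criterion described above can be expressed as a one-line check; this is a minimal sketch, and the function name is my own:

```python
def satisfies_nyquist(sampling_rate_hz: float, max_signal_freq_hz: float) -> bool:
    """True if the sampling rate is more than twice the highest signal frequency."""
    return sampling_rate_hz > 2 * max_signal_freq_hz

# CD audio comfortably covers the 20 kHz upper limit of human hearing:
print(satisfies_nyquist(44_100, 20_000))  # True
# An 8 kHz telephone line cannot faithfully capture frequencies above 4 kHz:
print(satisfies_nyquist(8_000, 20_000))   # False
```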
**Quantization**: digitizing the sample value at each sampling point.
**Bit depth** (sampling precision, sample size): how many binary bits are used to store the sample value of one sampling point. The greater the bit depth, the more accurately the amplitude is represented. A common CD uses a bit depth of 16 bits, which can represent 65,536 (2^16) different values. DVDs use a bit depth of 24 bits, and most telephone devices use 8 bits.
**Encoding**: converting the sampled and quantized digital data into a binary code stream.
A single channel (mono) produces one set of sample data; dual channels (stereo) produce two sets.
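The three steps (sampling → quantization → encoding) can be sketched for a single-channel sine tone in Python; the function name and parameter values here are illustrative, not a real codec:

```python
import math
import struct

def pcm_encode_sine(freq_hz=440.0, sample_rate=44_100, duration_s=0.01, bit_depth=16):
    """Illustrative PCM pipeline: sample a sine tone, quantize it, encode to bytes."""
    max_level = 2 ** (bit_depth - 1) - 1           # 32767 for 16-bit signed samples
    frames = bytearray()
    for n in range(int(sample_rate * duration_s)):
        t = n / sample_rate                        # sampling: discrete instants in time
        sample = math.sin(2 * math.pi * freq_hz * t)
        quantized = round(sample * max_level)      # quantization: finite integer levels
        frames += struct.pack("<h", quantized)     # encoding: 16-bit little-endian stream
    return bytes(frames)

pcm = pcm_encode_sine()
print(len(pcm))  # 441 samples x 2 bytes = 882 bytes for 10 ms of mono audio
```

A stereo version would interleave two such sample streams, one per channel.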
How big is 1 minute of stereo PCM data with a 44.1 kHz sampling rate and a 16-bit depth?
- Size = sampling rate × bit depth × number of channels × duration / 8
- 44100 × 16 × 2 × 60 / 8 = 10,584,000 bytes ≈ 10.09 MB
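The arithmetic above can be checked in a short Python sketch (the helper name is my own; the MB figure uses the 1,048,576-bytes-per-MB convention, so other conventions give slightly different numbers):

```python
def pcm_size_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Uncompressed PCM size in bytes: rate x depth x channels x time, over 8."""
    return sample_rate_hz * bit_depth * channels * seconds // 8

size = pcm_size_bytes(44_100, 16, 2, 60)
print(size)                            # 10584000 bytes for one minute of stereo
print(round(size / (1024 * 1024), 2))  # ~10.09 using 1024*1024 bytes per MB
```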
Roughly 10 MB per minute is unacceptable to most users. There are only two ways to reduce the size of audio data without changing its duration: lowering the sampling parameters, or compression. Lowering the sampling parameters is not advisable, since it degrades audio quality and hurts the user experience. Therefore, experts have developed various compression schemes.
**Bit rate** refers to the number of bits transmitted or processed per unit time. The unit is bits per second (bit/s or bps); larger units are kilobits per second (kbit/s or kbps), megabits per second (Mbit/s or Mbps), gigabits per second (Gbit/s or Gbps), and terabits per second (Tbit/s or Tbps).
What is the bit rate of stereo PCM data with a 44.1 kHz sampling rate and a 16-bit depth?
- Bit rate = sampling rate × bit depth × number of channels
- 44100 × 16 × 2 = 1,411,200 bit/s = 1411.2 kbps
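The bit-rate formula can likewise be verified in Python (helper name illustrative):

```python
def pcm_bit_rate_bps(sample_rate_hz, bit_depth, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bit_depth * channels

bps = pcm_bit_rate_bps(44_100, 16, 2)
print(bps)          # 1411200 bit/s
print(bps / 1000)   # 1411.2 kbps -- the bit rate of CD-quality stereo PCM
```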
In general, the higher the sampling rate and bit depth, the better the quality of the digital audio. From the bit-rate formula it follows that the higher the bit rate, the better the quality of the digital audio.
Common audio coding and file formats
Note that an **audio file format is not the same as an audio encoding**. For example:
- WAV is only a file format, not an encoding
- FLAC is both a file format and an encoding
The following is a brief overview of common audio encodings and file formats; they will be introduced in more detail when needed later.
| Name | Lossless compression | File extension |
| --- | --- | --- |
| FLAC (Free Lossless Audio Codec) | ✔️ | .flac |
| ALAC (Apple Lossless Audio Codec) | ✔️ | .m4a / .caf |
| MP3 (MPEG Audio Layer III) | ❌ | .mp3 |
| WMA (Windows Media Audio) | ❌ | .wma |
| AAC (Advanced Audio Coding) | ❌ | .aac / .mp4 / .m4a |
| WAV (Waveform Audio File Format) | — | .wav |
| AIFF (Audio Interchange File Format) | — | .aiff / .aif |
WAV (Waveform Audio File Format) is an audio file format developed by IBM and Microsoft. Its extension is .wav, it usually carries PCM-encoded data, and it is commonly used on Windows systems.
The WAV file format is shown in the figure below: a 44-byte file header comes first, followed by the audio data (such as PCM data).
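As a sketch of that 44-byte layout, the canonical PCM WAV header can be packed with Python's `struct` module; the field order follows the RIFF/WAVE layout, and the helper name is my own:

```python
import struct

def wav_header(pcm_byte_len, sample_rate=44_100, bit_depth=16, channels=2):
    """Build the canonical 44-byte WAV header for a PCM payload (a sketch)."""
    byte_rate = sample_rate * channels * bit_depth // 8
    block_align = channels * bit_depth // 8
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + pcm_byte_len, b"WAVE",   # RIFF chunk with overall size
        b"fmt ", 16, 1, channels,              # fmt chunk: 16 bytes, 1 = PCM
        sample_rate, byte_rate, block_align, bit_depth,
        b"data", pcm_byte_len,                 # data chunk precedes the samples
    )

header = wav_header(10_584_000)  # one minute of CD-quality stereo PCM
print(len(header))               # 44 -- matches the header size in the figure
```

Writing this header followed by raw PCM frames yields a playable .wav file.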
Lossy and lossless
Constrained by the sampling rate and bit depth, digital audio can at best only approximate the signals found in nature; strictly speaking, every digital audio encoding scheme is lossy because it cannot restore the signal perfectly. At present, PCM encoding achieves the highest level of fidelity, so PCM is conventionally called **lossless** audio encoding. It is widely used for archiving material and for music listening, e.g. on CDs and DVDs and in common **WAV** files.
However, this does not mean that PCM guarantees absolute fidelity of the signal; PCM can only approximate it as closely as possible. By convention, we place MP3 and similar encodings in the **lossy** category relative to PCM encoding. True losslessness is as elusive as expressing π with digits: no matter how high the precision, the result only approaches π and never exactly equals it.