Video processing to make video live


Web developers have always wanted to use audio and video on the web, but early web technologies could not embed them, so proprietary technologies such as Flash and Silverlight became very popular for handling this content.

These technologies worked, but they had a series of problems, including poor integration with HTML/CSS features, security issues, and accessibility issues.

Fortunately, when the HTML5 standard was released, it included many new features, among them the <video> and <audio> tags and JavaScript APIs to control them. With the continuous development of communication and network technology, audio and video have become an indispensable part of everyone's life. In addition, with the gradual popularization of 5G technology, there will be even more room for imagination in the field of real-time audio and video.

Next, this article will explore the front-end video player and mainstream streaming media technologies from eight aspects. After reading this article, you will understand the following:

  • Why the video source address of the <video> element on some web pages is in the form of a blob URL;
  • The concepts related to HTTP range requests and streaming media technology;
  • The concepts of HLS and DASH, adaptive bitrate streaming, and streaming media encryption;
  • The FLV file structure and flv.js's features, usage restrictions and internal working principles;
  • The MSE (Media Source Extensions) API and how to use it;
  • How video players work, multimedia container formats, and the difference between the MP4 and Fragmented MP4 container formats;

Finally, it will introduce how to implement player screenshots, how to generate GIFs from screenshots, how to use canvas to play video, and how to implement chroma keying.

1、 Traditional playback mode

Most web developers are no strangers to <video>. In the following HTML fragment, we declare a <video> element, set the relevant attributes, and then use the <source> tag to set the video source and video format:

<video id="mse" autoplay playsinline controls="controls">
   <source src="" type="video/mp4">
   Your browser does not support the video tag
</video>

After the above code is rendered in the browser, a video player will be displayed in the page, as shown in the following figure:

(figure: the rendered video player)

(image source:…

Through the Chrome developer tools, we can see that three HTTP requests were sent while playing the 「xgplayer-demo-720p.mp4」 video file:

(figure: the three HTTP requests in the Network panel)

In addition, the figure clearly shows that the first two HTTP responses have status code 「206」. Let's analyze the request header and response header of the first HTTP request:

(figure: request and response headers of the first HTTP request)

In the request header above, there is a 「Range: bytes=0-」 header, which is used to detect whether the server supports range requests. If the response contains an 「Accept-Ranges」 header (and its value is not "none"), the server supports range requests.

In the response header above, 「Accept-Ranges: bytes」 indicates that the unit of a range is bytes. The 「Content-Length」 header is also useful information here, because it gives the full size of the video to be downloaded.

1.1 Requesting a specific range from the server

If the server supports range requests, you can use the range header to generate such requests. This header indicates which part or parts of the file the server should return.

1.1.1 Single range

We can request just part of a resource. Here, we use the REST Client extension in Visual Studio Code to test. In this example, we use the Range header to request the first 1024 bytes of the home page.

(figure: single-range request and response in REST Client)

For a "single range request" initiated with the REST Client, the server returns a 「206 Partial Content」 response. The 「Content-Length」 header in the response now represents the size of the requested range (not the size of the entire file), while the 「Content-Range」 response header indicates where this part of the content sits within the whole resource.

1.1.2 Multiple ranges

The Range header also supports requesting multiple parts of a document at once, with the ranges separated by commas. For example:

$ curl -i -H "Range: bytes=0-50, 100-150"

For this request, the following response information will be returned:

(figure: multipart response to the multi-range request)

Because we requested multiple parts of the document, each part has its own 「Content-Type」 and 「Content-Range」 information, and a boundary parameter is used to divide the response body.
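For reference, the body of such a multipart response has roughly the following shape (the status line, boundary string, byte counts and body excerpts here are illustrative):

```text
HTTP/1.1 206 Partial Content
Content-Type: multipart/byteranges; boundary=3d6b6a416f9b5
Content-Length: 282

--3d6b6a416f9b5
Content-Type: text/html
Content-Range: bytes 0-50/1270

<!doctype html>
<html lang="en">
--3d6b6a416f9b5
Content-Type: text/html
Content-Range: bytes 100-150/1270

...second fragment bytes...
--3d6b6a416f9b5--
```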

1.1.3 conditional range request

When requesting more resource fragments again, you must ensure that the resource has not been modified since the previous fragment was received.

The 「If-Range」 request header can be used to generate a conditional range request: if the condition is met, the range request takes effect and the server returns a 「206 Partial Content」 response with the corresponding message body. If the condition is not met, the server returns 「200 OK」 along with the entire resource. This header can be used with either a 「Last-Modified」 validator or an 「ETag」, but not both.

1.1.4 Responses to range requests

There are three response states related to range requests:

  • If the request succeeds, the server returns a 「206 Partial Content」 status code.
  • If the requested range is out of bounds (the range values exceed the size of the resource), the server returns a 「416 Requested Range Not Satisfiable」 status code.
  • If range requests are not supported, the server returns a 「200 OK」 status code.

The remaining two requests will not be analyzed in detail here; interested readers can use the Chrome developer tools to inspect the specific request messages.

Through the third request, we can see that the whole video is about 7.9 MB. If the video file is too large, or the network is unstable, playback will involve a long wait, which seriously degrades the user experience.

So how to solve this problem? To solve this problem, we can use streaming media technology. Next, let’s introduce streaming media.

2、 Streaming media

Streaming media refers to the technology and process of compressing a series of media data, sending the data in segments over the Internet, and transmitting video and audio for immediate viewing. This technology lets data packets flow like running water; without it, the entire media file must be downloaded before use.

Streaming media actually refers to a new way of transmitting media (audio streams, video streams, text streams, image streams, animation streams, and so on) rather than a new medium. Its main technical feature is streaming transmission, which enables data to be transmitted like flowing water. Streaming can be realized in two main ways: sequential streaming and real-time streaming.

Common streaming media protocols on the network at present:

(table: comparison of common streaming media protocols)

As the table shows, different protocols have different advantages and disadvantages. In practice, we usually choose the optimal streaming protocol the platform can support. For example, for live broadcasting in the browser, HTTP-FLV is a good choice: its performance is better than RTMP + Flash, and its latency can match or even beat RTMP + Flash.

Because of its large latency, HLS is generally only suitable for video-on-demand scenarios; however, thanks to its good compatibility on mobile devices, it can also be applied to live scenarios where high latency is acceptable.

At this point, I believe some readers will be curious about the visible difference, for video elements, between streaming media technology and the traditional playback mode. Next, Po Ge takes the common HLS streaming protocol as an example to briefly compare the two.

(figure: the video element's src under HLS streaming)

By observing the figure above, we can clearly see that when the HLS streaming network transport protocol is used, the src attribute of the <video> element uses the blob: protocol. Speaking of this protocol, we have to talk about Blob and Blob URLs.

2.1 Blob

A Blob (Binary Large Object) represents a large object of binary type. In database management systems, binary data is stored as a single collected entity. Blobs are usually video, sound, or multimedia files. "In JavaScript, Blob-type objects represent immutable raw data, similar to file objects."

A Blob consists of an optional string type (usually a MIME type) and blobParts:

(figure: a Blob is composed of type and blobParts)

MIME (Multipurpose Internet Mail Extensions) types are a way of declaring the format of a file so that it can be opened by the right application. When a file of a given type is accessed, the browser automatically opens it with the specified application. MIME types are mostly used to specify custom client file names and the opening methods of media files.

Common MIME types include: text/html for HTML documents, image/png for PNG images, and text/plain for plain text.

In order to get a more intuitive feel for Blob objects, let's use the Blob constructor to create a myBlob object, as shown in the following figure:

(figure: creating the myBlob object in the console)
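A minimal sketch of such a demo in code (the blob contents here are illustrative):

```javascript
// Create a Blob from a string and inspect its size and type.
const myBlob = new Blob(['hello'], { type: 'text/plain' });
console.log(myBlob.size); // → 5 (bytes)
console.log(myBlob.type); // → "text/plain"
```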

As you can see, the myBlob object contains two properties: size and type. The size property represents the size of the data in bytes, while type is a MIME-type string. Blobs do not necessarily represent data in a JavaScript-native format. For example, the File interface is based on Blob: it inherits Blob's functionality and extends it to support files on the user's system.

2.2 Blob URL/Object URL

A Blob URL / Object URL is a pseudo-protocol that allows Blob and File objects to be used as URL sources for images, links for downloading binary data, and so on. In the browser, we use the URL.createObjectURL method to create a blob URL: it receives a Blob object and creates a unique URL for it, of the form blob:<origin>/<uuid>. A corresponding example looks like this:
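A hedged sketch (the blob contents are illustrative):

```javascript
// Create a Blob and a short-lived blob: URL for it.
const blob = new Blob(['Hello, blob URL!'], { type: 'text/plain' });
const blobUrl = URL.createObjectURL(blob);
console.log(blobUrl); // e.g. "blob:"
```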


Inside the browser, a URL → Blob mapping is stored for each URL generated by URL.createObjectURL. Such URLs are therefore short, but they give access to the Blob. A generated URL is valid only while the current document is open. If you visit a blob URL that no longer exists, the browser returns a 404 error.

Blob URLs look convenient, but they have a side effect: although only the URL → Blob mapping is stored, the Blob itself resides in memory, and the browser cannot release it. The mapping is cleared automatically when the document is unloaded, at which point the Blob is released; but in a long-lived application that will not happen soon. So once we create a blob URL, the Blob stays in memory even after we no longer need it.

To solve this problem, we can call URL.revokeObjectURL(url) to remove the reference from the internal mapping, allowing the Blob to be deleted (if there are no other references) and the memory to be freed.

2.3 Blob vs ArrayBuffer

In fact, on the front end, besides the Blob object you may also encounter the ArrayBuffer object. It represents a generic, fixed-length buffer of raw binary data. You cannot manipulate the contents of an ArrayBuffer directly; instead, you create a TypedArray or DataView object, which represents the buffer in a specific format, and use that object to read and write the buffer's contents.

Blob object and arraybuffer object have their own characteristics. The differences between them are as follows:

  • Unless you need the write/edit capabilities provided by ArrayBuffer, the Blob format may be the best choice.
  • Blob objects are immutable, while an ArrayBuffer can be manipulated through TypedArrays or a DataView.
  • An ArrayBuffer exists in memory and can be operated on directly, while a Blob can reside on disk, in cache memory, and in other locations that are not directly accessible.
  • A Blob can be passed directly as an argument to other functions, such as window.URL.createObjectURL(); however, you may still need a File API such as FileReader to work with a Blob.
  • Blob and ArrayBuffer objects can be converted into each other:

    • Using FileReader's readAsArrayBuffer() method, a Blob object can be converted into an ArrayBuffer object;
    • Using the Blob constructor, e.g. new Blob([new Uint8Array(data)]), an ArrayBuffer object can be converted into a Blob object.
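A sketch of both conversions (the data bytes are illustrative). In the browser, FileReader.readAsArrayBuffer also works; the Promise-based blob.arrayBuffer() used here is available in modern browsers and Node:

```javascript
const bytes = new Uint8Array([72, 105]); // the bytes of "Hi"
const blob = new Blob([bytes], { type: 'text/plain' }); // ArrayBuffer → Blob
blob.arrayBuffer().then((buf) => { // Blob → ArrayBuffer
  console.log(new Uint8Array(buf)); // contains [72, 105]
});
```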

In the front-end Ajax scenario, in addition to the common JSON format, we may also use blob or arraybuffer objects:

function GET(url, callback) {
  let xhr = new XMLHttpRequest();'GET', url, true);
  xhr.responseType = 'arraybuffer'; // or xhr.responseType = "blob";
  xhr.onload = function (e) {
    if (xhr.status != 200) {
      alert("Unexpected status code " + xhr.status + " for " + url);
      return false;
    }
    callback(new Uint8Array(xhr.response)); // or new Blob([xhr.response]);
  };
  xhr.send();
}
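For comparison, a fetch-based version of such a helper might look like the following hedged sketch (not from the original article; fetch is available in browsers and Node 18+):

```javascript
// Fetch the resource and hand back its bytes as a Uint8Array.
async function getArrayBuffer(url) {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error('Unexpected status code ' + res.status + ' for ' + url);
  }
  return new Uint8Array(await res.arrayBuffer()); // or: await res.blob()
}
```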

In the XHR example above, by setting different xhr.responseType values, we can obtain data of the corresponding type according to actual needs. Having introduced the concepts above, let's now look at HLS, a widely used streaming media transport protocol.

3、 HLS

3.1 Introduction to HLS

HTTP Live Streaming (abbreviated HLS) is an HTTP-based streaming media network transport protocol proposed by Apple, and part of Apple's QuickTime X and iPhone software systems. It works by dividing the whole stream into small HTTP-based files to download, only a few at a time. While the media stream is playing, the client can choose to download the same resource at different rates from many different alternate sources, allowing the streaming session to adapt to different data rates.

In addition, the client can adjust the rate of the video stream as the user's network conditions change, maintaining an excellent playback experience.

(figure: HLS architecture)

(image source:…

Initially only iOS supported HLS, but now almost all devices support it. As its name implies, the HLS (HTTP Live Streaming) protocol delivers video content through standard HTTP web servers. This means you can distribute HLS content without integrating any special infrastructure.

HLS has the following features:

  • HLS plays video encoded with the H.264 or HEVC/H.265 codecs.
  • HLS plays audio encoded with the AAC or MP3 codecs.
  • HLS video streams are generally cut into 10-second segments.
  • The transport/packaging format of HLS is MPEG-2 TS.
  • HLS supports DRM (Digital Rights Management).
  • HLS supports various advertising standards, such as VAST and VPAID.

Why did Apple propose the HLS protocol? Its main purpose was to solve some problems of the RTMP protocol. For example, RTMP does not use the standard HTTP interface to transmit data, so it may be blocked by firewalls in some special network environments. HLS, by contrast, transmits data over HTTP, which is normally not blocked by firewalls; it is also easy to distribute media streams through a CDN (content delivery network).

3.2 HLS adaptive bitrate streaming

HLS is an adaptive bit rate streaming protocol. Therefore, HLS stream can dynamically adapt the video resolution to everyone’s network situation. If you are using high-speed WiFi, you can stream HD video on your phone. However, if you are on a bus or subway with limited data connection, you can watch the same video at a lower resolution.

When starting a streaming media session, the client will download an extended M3U (m3u8) playlist file containing metadata to find available media streams.

(figure: HLS adaptive bitrate switching)

(image source:…

To aid understanding, we use hls.js, an HLS client implemented in JavaScript, together with the online example it provides, to take a look at the actual m3u8 files.



By observing the master playlist's m3u8 file, we can see that this video offers the following five definitions:

  • 1920×1080(1080P)
  • 1280×720(720P)
  • 848×480(480P)
  • 512×288
  • 320×184
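As an aside, extracting the advertised definitions from a master playlist takes only a few lines of JavaScript. The playlist text below is an illustrative sketch patterned after the five definitions listed above, not the real demo file:

```javascript
// Collect every RESOLUTION attribute from #EXT-X-STREAM-INF lines.
function parseResolutions(m3u8Text) {
  const resolutions = [];
  for (const line of m3u8Text.split('\n')) {
    const match = /RESOLUTION=(\d+x\d+)/.exec(line);
    if (match) resolutions.push(match[1]);
  }
  return resolutions;
}

const master = [
  '#EXTM3U',
  '#EXT-X-STREAM-INF:BANDWIDTH=4800000,RESOLUTION=1920x1080',
  '1080p/index.m3u8',
  '#EXT-X-STREAM-INF:BANDWIDTH=2400000,RESOLUTION=1280x720',
  '720p/index.m3u8',
  '#EXT-X-STREAM-INF:BANDWIDTH=1200000,RESOLUTION=848x480',
  '480p/index.m3u8',
  '#EXT-X-STREAM-INF:BANDWIDTH=600000,RESOLUTION=512x288',
  '288p/index.m3u8',
  '#EXT-X-STREAM-INF:BANDWIDTH=300000,RESOLUTION=320x184',
  '184p/index.m3u8',
].join('\n');

console.log(parseResolutions(master));
// → ['1920x1080', '1280x720', '848x480', '512x288', '320x184']
```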

The media playlists for the different definitions are defined in their own m3u8 files. Taking the 720P video as an example, let's look at its corresponding m3u8 file:
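The original file listing is omitted here; a media playlist of this general shape looks like the following sketch (segment URIs and durations are illustrative):

```text
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:11
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.000,
#EXTINF:10.000,
#EXTINF:5.000,
#EXT-X-ENDLIST
```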


When the user selects a video of a certain definition, the media playlist (m3u8 file) for that definition is downloaded; the playlist enumerates the information of each segment. The transport/container format of HLS is MPEG-2 TS (MPEG-2 Transport Stream), a standard format for transmitting and storing various kinds of data, including video, audio and communication protocols; it is used in digital television broadcast systems such as DVB, ATSC and IPTV.

"It should be noted that with some ready-made tools, we can merge multiple TS files into an MP4 video file." If we want to protect video copyright, we can consider a symmetric encryption algorithm, such as AES-128, to encrypt the segments. When playing, the client first obtains the symmetric key from the key-server address configured in the m3u8 file, then downloads the segments; once a segment is downloaded, it is decrypted with the matching symmetric algorithm and played.

Readers interested in the above process can refer to the video-hls-encrypt project on GitHub, which introduces a video encryption solution based on the HLS streaming protocol in simple terms and provides complete example code.

(figure: HLS encryption workflow)

(image source:…

After introducing HLS (HTTP Live Streaming) from Apple, let's look at another dynamic adaptive streaming technology over HTTP: DASH.

4、 DASH

4.1 Introduction to DASH

"Dynamic Adaptive Streaming over HTTP (DASH, also known as MPEG-DASH) is an adaptive bitrate streaming technology that enables high-quality streaming media to be delivered over the Internet from conventional HTTP web servers." Similar to Apple's HTTP Live Streaming (HLS) scheme, MPEG-DASH breaks the content into a series of small HTTP-based file segments. Each segment contains a short interval of playable content, while the total length of the content may be up to several hours.

The content is made available as alternative segments at multiple bit rates, providing several bitrate versions to choose from. While an MPEG-DASH client plays the content back, it automatically selects which alternative to download and play based on current network conditions: it picks the highest-bitrate segment that can be downloaded in time, avoiding stalls and re-buffering events. Because of this, an MPEG-DASH client can seamlessly adapt to changing network conditions and provide a high-quality playback experience with fewer stalls and re-buffering incidents.

MPEG-DASH is the first HTTP-based adaptive bitrate streaming solution that is an international standard. MPEG-DASH should not be confused with a transport protocol: it uses TCP as its transport protocol. "Unlike HLS, HDS and Smooth Streaming, DASH is codec-agnostic, so it can use content encoded in any coding format, such as H.265, H.264 or VP9."

Although HTML5 does not directly support MPEG-DASH, several JavaScript implementations of MPEG-DASH allow it to be used in web browsers through HTML5 Media Source Extensions (MSE). Other JavaScript implementations, such as the bitdash player, support playing DRM-protected MPEG-DASH using HTML5 Encrypted Media Extensions. Combined with WebGL, HTML5-based adaptive bitrate streaming of MPEG-DASH also enables efficient real-time and on-demand streaming of 360° video.

4.2 Important concepts in DASH

  • MPD: the manifest of the media file, similar in function to HLS's m3u8 file.
  • Representation: corresponds to one alternative output. For example, 480p video, 720p video and 44100-sample-rate audio are each described by a Representation.
  • Segment: each Representation is divided into multiple Segments. Segments fall into four categories; the most important are the Initialization Segment (one per Representation) and the Media Segments (the media content of each Representation consists of several Media Segments).

(figure: MPD structure: Periods, AdaptationSets, Representations and Segments)

(image source:…

In China, Bilibili began to use DASH technology in 2018. Why did it choose DASH? Interested readers can read its article "Why we use DASH".

Having said so much, I believe some readers will be curious about what an MPD file looks like. Here, let's look at the MPD file from the DASH example of the xgplayer (watermelon video) player:

<?xml version="1.0"?>
<!-- MPD file Generated with GPAC version 0.7.2-DEV-rev559-g61a50f45-master  at 2018-06-11T11:40:23.972Z-->
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" minBufferTime="PT1.500S" type="static" mediaPresentationDuration="PT0H1M30.080S" maxSegmentDuration="PT0H0M1.000S" profiles="urn:mpeg:dash:profile:full:2011">
 <ProgramInformation moreInformationURL="">
  <Title>xgplayer-demo_dash.mpd generated by GPAC</Title>
 </ProgramInformation>
 <Period duration="PT0H1M30.080S">
  <AdaptationSet segmentAlignment="true" maxWidth="1280" maxHeight="720" maxFrameRate="25" par="16:9" lang="eng">
   <ContentComponent id="1" contentType="audio" />
   <ContentComponent id="2" contentType="video" />
   <Representation id="1" mimeType="video/mp4" codecs="mp4a.40.2,avc3.4D4020" width="1280" height="720" frameRate="25" sar="1:1" startWithSAP="0" bandwidth="6046495">
    <AudioChannelConfiguration schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011" value="2"/>
    <SegmentList timescale="1000" duration="1000">
     <Initialization range="0-1256"/>
      <SegmentURL mediaRange="1257-1006330" indexRange="1257-1300"/>
      <SegmentURL mediaRange="1006331-1909476" indexRange="1006331-1006374"/>
      <!-- ... intermediate segments omitted ... -->
      <SegmentURL mediaRange="68082016-68083543" indexRange="68082016-68082059"/>
     </SegmentList>
    </Representation>
   </AdaptationSet>
  </Period>
</MPD>


When playing the video, the xgplayer player automatically requests the corresponding segments for playback according to the MPD file.

(figure: DASH playback in xgplayer)

We mentioned Bilibili before, which brings us to its famous open source project, flv.js. Before introducing it, we need to understand the FLV streaming media format.

5、 FLV

5.1 FLV file structure

FLV is the abbreviation of Flash Video. The FLV streaming media format is a video format that grew up with the launch of Flash MX. Its files are very small and load very quickly, which made watching video files over the network practical. Its emergence effectively solved the problem that, after video files were imported into Flash, the exported SWF files were huge and could not be used well on the network.

An FLV file consists of an FLV header and an FLV body, and the FLV body consists of a series of tags:

(figure: FLV file structure)

5.1.1 FLV header

The FLV header (9 bytes):

  • 1-3: the first three bytes are the file format signature "FLV" (0x46 0x4C 0x56).
  • 4: the fourth byte is the version (0x01).
  • 5: the first 5 bits of the 5th byte are reserved and must be 0.

    • The 6th bit of the 5th byte is the audio type flag (TypeFlagsAudio).
    • The 7th bit of the 5th byte is also reserved and must be 0.
    • The 8th bit of the 5th byte is the video type flag (TypeFlagsVideo).
  • 6-9: the four bytes 6-9 hold the header size, whose value is 0x00000009.
  • The total length of the header is 9 (3 + 1 + 1 + 4).
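The header layout above can be sketched as a small parser; this helper is illustrative, not from the original article:

```javascript
// Parse the 9-byte FLV header from a Uint8Array.
function parseFlvHeader(bytes) {
  // Signature must be "FLV" (0x46 0x4C 0x56).
  if (bytes[0] !== 0x46 || bytes[1] !== 0x4c || bytes[2] !== 0x56) {
    throw new Error('Not an FLV file');
  }
  const flags = bytes[4];
  return {
    version: bytes[3],
    hasAudio: (flags & 0x04) !== 0, // TypeFlagsAudio (6th bit of byte 5)
    hasVideo: (flags & 0x01) !== 0, // TypeFlagsVideo (8th bit of byte 5)
    headerSize: (bytes[5] << 24) | (bytes[6] << 16) | (bytes[7] << 8) | bytes[8],
  };
}

console.log(parseFlvHeader(new Uint8Array([0x46, 0x4c, 0x56, 0x01, 0x05, 0, 0, 0, 0x09])));
// → { version: 1, hasAudio: true, hasVideo: true, headerSize: 9 }
```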

5.1.2 Tag basic format

Tag type information, with a fixed length of 15 bytes:

  • 1-4: the size of the previous tag (4 bytes); for the first tag this is 0.
  • 5: tag type (1 byte); 0x08 audio, 0x09 video, 0x12 script data.
  • 6-8: size of the tag data (3 bytes).
  • 9-11: timestamp (3 bytes, in milliseconds; always 0 for the first tag, and 0 for script tags).
  • 12: timestamp extension (1 byte), which extends the timestamp to 4 bytes (so FLV can store longer time information); this byte acts as the highest byte of the timestamp.
  • 13-15: StreamID (3 bytes), always 0.

During FLV playback, tags are played in order of their timestamps; any timing information added to the file outside the tag timestamps is ignored.
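As with the header, the 15-byte tag prefix can be sketched as a parser (an illustrative helper, following the field layout listed above):

```javascript
// Parse one "previous tag size + tag header" unit at a given byte offset.
function parseFlvTagHeader(bytes, offset) {
  const u8 = (i) => bytes[offset + i];
  return {
    prevTagSize: ((u8(0) << 24) >>> 0) + (u8(1) << 16) + (u8(2) << 8) + u8(3),
    tagType: u8(4), // 0x08 audio, 0x09 video, 0x12 script data
    dataSize: (u8(5) << 16) + (u8(6) << 8) + u8(7),
    // Bytes 9-11 are the low 24 bits; byte 12 is the extended high byte.
    timestamp: ((u8(11) << 24) >>> 0) + (u8(8) << 16) + (u8(9) << 8) + u8(10),
    streamId: (u8(12) << 16) + (u8(13) << 8) + u8(14), // always 0
  };
}
```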

The detailed structure diagram of FLV format is shown in the following figure:

(figure: detailed FLV format structure)

The HTML5 <video> element in the browser does not support playing the FLV format directly, so the open source library flv.js is needed to play FLV videos.

5.2 flv.js introduction

flv.js is an HTML5 Flash Video (FLV) player written in pure JavaScript. Under the hood it relies on Media Source Extensions. At run time it automatically parses FLV files and feeds the audio/video data to the native HTML5 video tag, making it possible for browsers to play FLV without Flash.

5.2.1 flv.js features

  • Supports playing FLV files encoded with H.264 + AAC/MP3;
  • Supports playing multi-segment videos;
  • Supports playing HTTP FLV low-latency real-time streams;
  • Supports playing FLV real-time streams transported over WebSocket;
  • Compatible with Chrome, Firefox, Safari 10, IE11 and Edge;
  • Very low overhead, with support for browser hardware acceleration.

5.2.2 flv.js limitations

  • The MP3 audio codec does not work on IE11/Edge;
  • HTTP FLV live streaming is not supported in all browsers.

5.2.3 Using flv.js

<script src="flv.min.js"></script>
<video id="videoElement"></video>
<script>
    if (flvjs.isSupported()) {
        var videoElement = document.getElementById('videoElement');
        var flvPlayer = flvjs.createPlayer({
            type: 'flv',
            url: ''
        });
        flvPlayer.attachMediaElement(videoElement);
        flvPlayer.load();;
    }
</script>

5.3 How flv.js works

flv.js works by transmuxing the FLV file stream into ISO BMFF (Fragmented MP4) segments and then feeding the MP4 segments to the HTML5 <video> element through the Media Source Extensions API. The design architecture of flv.js is shown in the following figure:

(figure: flv.js design architecture)

(image source:…

For a more detailed introduction to how flv.js works, interested readers can read the article on the real-time interactive streaming media player from the Pepper open source project. We have now introduced hls.js and flv.js, the two mainstream streaming media solutions; their success is inseparable from the silent support of the hero behind the scenes: Media Source Extensions. So next, Po Ge will take you to meet MSE (Media Source Extensions).

6、 MSE

6.1 Introduction to MSE


The Media Source Extensions API (MSE) provides functionality for plug-in-free, web-based streaming media. Using MSE, media streams can be created in JavaScript and played through the audio and video elements.

In recent years, we have been able to play video and audio in web applications without plug-ins. However, the existing architecture was too simple: it could only satisfy playing a whole track at once and could not split or merge several buffered files. Early streaming media mainly relied on Flash, with a Flash Media Server delivering video streams over the RTMP protocol.

With Media Source Extensions (MSE), the situation is different. MSE lets us replace the usual single-media-file src value with a reference to a MediaSource object (a container holding information such as the readiness of the media to be played) and references to multiple SourceBuffer objects, which represent the different media chunks that make up the whole stream.

To facilitate your understanding, let’s take a look at the basic MSE data flow:

(figure: basic MSE data flow)

MSE gives us finer-grained control based on the size and frequency of content fetches, as well as over memory usage details (such as when caches are evicted). Its extensible API provides the basis for building adaptive bitrate streaming clients, such as DASH or HLS clients.

Creating MSE-compatible media assets is time-consuming and laborious, and consumes a lot of computing resources and energy; external applications must usually be used to convert the content into a suitable format. Although browsers support various MSE-compatible media containers, H.264 video coding, AAC audio coding and the MP4 container format are very common, so MSE needs to be compatible with these mainstream formats. In addition, MSE provides an API for developers to detect at run time whether a container and codec are supported.

6.2 MediaSource interface

MediaSource is the interface of the Media Source Extensions API that represents a source of media data for an HTMLMediaElement object. A MediaSource object can be attached to an HTMLMediaElement and played on the client. Before introducing the MediaSource interface, let's look at its structure diagram:

(figure: MediaSource structure diagram)

(image source:…

To understand the structure diagram of MediaSource, we first need to look at the main workflow of a client-side audio/video player:

Acquire the stream → unwrap the protocol → demux the container → decode audio and video → play audio and render video (handling audio/video synchronization).

(figure: player processing pipeline)

Because the collected original audio and video data is relatively large, in order to facilitate network transmission, we usually use encoders, such as common H.264 or AAC, to compress the original media signal. The most common media signals are video, audio and subtitles. For example, movies in daily life are composed of different media signals. In addition to moving pictures, most movies also contain audio and subtitles.

Common video codecs include H.264, HEVC, VP9 and AV1; audio codecs include AAC, MP3 and Opus. Each kind of media signal has many different codecs. Taking the xgplayer demo as an example, we can intuitively see the audio track, video track and subtitle track:

(figure: audio, video and subtitle tracks in the xgplayer demo)

Now let’s introduce the mediasource interface.

6.2.1 status

enum ReadyState {
    "closed", // the source is not currently attached to a media element.
    "open",   // the source has been opened by a media element, and data can be appended to its SourceBuffer objects.
    "ended"   // the source is still attached to a media element, but endOfStream() has been called.
};

6.2.2 abnormal flow termination

enum EndOfStreamError {
    "network", // terminates playback and signals a network error.
    "decode"   // terminates playback and signals a decoding error.
};

6.2.3 constructor

interface MediaSource : EventTarget {
    readonly attribute SourceBufferList    sourceBuffers;
    readonly attribute SourceBufferList    activeSourceBuffers;
    readonly attribute ReadyState          readyState;
             attribute unrestricted double duration;
             attribute EventHandler        onsourceopen;
             attribute EventHandler        onsourceended;
             attribute EventHandler        onsourceclose;
    SourceBuffer addSourceBuffer(DOMString type);
    void         removeSourceBuffer(SourceBuffer sourceBuffer);
    void         endOfStream(optional EndOfStreamError error);
    void         setLiveSeekableRange(double start, double end);
    void         clearLiveSeekableRange();
    static boolean isTypeSupported(DOMString type);
};

6.2.4 properties

  • MediaSource.sourceBuffers (read only): returns a SourceBufferList object containing the list of SourceBuffer objects associated with this MediaSource.
  • MediaSource.activeSourceBuffers (read only): returns a SourceBufferList containing the subset of the SourceBuffer objects in sourceBuffers that provide the currently selected video track, the enabled audio tracks, and the shown/hidden text tracks.
  • MediaSource.readyState (read only): returns an enum indicating the state of the current MediaSource: "closed" (not attached to a media element), "open" (attached to a media element and ready to receive SourceBuffer objects), or "ended" (still attached to a media element, but endOfStream() has been called).
  • MediaSource.duration: gets and sets the duration of the media currently being streamed.
  • onsourceopen: sets the event handler for the sourceopen event.
  • onsourceended: sets the event handler for the sourceended event.
  • onsourceclose: sets the event handler for the sourceclose event.

6.2.5 method

  • MediaSource.addSourceBuffer(): creates a new SourceBuffer with the given MIME type and adds it to the sourceBuffers list of the MediaSource.
  • MediaSource.removeSourceBuffer(): removes the specified SourceBuffer from the sourceBuffers list of this MediaSource object.
  • MediaSource.endOfStream(): signals the end of the stream.

6.2.6 static method

  • MediaSource.isTypeSupported(): returns a Boolean indicating whether the given MIME type is supported by the current browser, i.e., whether a SourceBuffer object of this MIME type can be successfully created.
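Since addSourceBuffer() throws for unsupported types, a common pattern is to probe a list of candidate MIME strings first. The helper below is a sketch (pickSupportedMime is a hypothetical name, not part of the MSE API); the support check is an injectable predicate so the selection logic can run and be tested outside the browser, where in a real page you would pass a wrapper around MediaSource.isTypeSupported:

```javascript
// Hypothetical helper: return the first MIME string accepted by the
// given predicate, or null if none is supported. In a browser you
// would pass (m) => MediaSource.isTypeSupported(m) as the predicate.
function pickSupportedMime(candidates, isSupported) {
  for (const mime of candidates) {
    if (isSupported(mime)) return mime;
  }
  return null;
}

// Example with a stub predicate that only "supports" MP4:
const stub = (mime) => mime.startsWith("video/mp4");
const chosen = pickSupportedMime(
  [
    'video/webm; codecs="vp9"',
    'video/mp4; codecs="avc1.42E01E, mp4a.40.2"',
  ],
  stub
);
console.log(chosen); // the H.264 MP4 candidate
```

Keeping the predicate injectable also makes graceful degradation easy: a null result can trigger a fallback to plain progressive download.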

6.2.7 use examples

var vidElement = document.querySelector('video');

if (window.MediaSource) { // (1)
  var mediaSource = new MediaSource();
  vidElement.src = URL.createObjectURL(mediaSource);
  mediaSource.addEventListener('sourceopen', sourceOpen);
} else {
  console.log("The Media Source Extensions API is not supported.");
}

function sourceOpen(e) {
  var mime = 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"';
  var mediaSource =;
  var sourceBuffer = mediaSource.addSourceBuffer(mime); // (2)
  var videoUrl = 'hello-mse.mp4';
  fetch(videoUrl) // (3)
    .then(function (response) {
      return response.arrayBuffer();
    })
    .then(function (arrayBuffer) {
      sourceBuffer.addEventListener('updateend', function (e) { // (4)
        if (!sourceBuffer.updating && mediaSource.readyState === 'open') {
          mediaSource.endOfStream();
        }
      });
      sourceBuffer.appendBuffer(arrayBuffer); // (5)
    });
}

The above example describes how to use MSE API. Next, let’s analyze the main workflow:

  • (1) Check whether the current platform supports the Media Source Extensions API. If so, create a MediaSource object and bind the sourceopen event handler.
  • (2) Create a new SourceBuffer with the given MIME type and add it to the sourceBuffers list of the MediaSource.
  • (3) Download the video stream from the remote server and convert it into an ArrayBuffer object.
  • (4) Add an updateend event handler to the SourceBuffer object, which calls endOfStream() to close the stream once the append has completed.
  • (5) Append the converted ArrayBuffer video data to the SourceBuffer object.

The above is just a brief introduction to the MSE API. If you want to see it in real-world use, you can study the hls.js or flv.js projects. Next, let's introduce the basic multimedia container formats for audio and video.

7、 Multimedia packaging format

Generally, a complete video file contains both audio and video. Common files such as AVI, RMVB, MKV, ASF, WMV, MP4, 3GP and FLV are merely packaging (container) formats. H.264, HEVC, VP9 and AV1 are video coding formats, while MP3, AAC and Opus are audio coding formats. For example, after an H.264-encoded video file and an AAC-encoded audio file are encapsulated according to the MP4 packaging standard, we get a video file with the .mp4 suffix, i.e., our common MP4 video file.

The main purpose of audio and video encoding is to compress the volume of the raw data, while the packaging format (also called a multimedia container), such as MP4 or MKV, is used to store/transmit the encoded data and to organize the audio, video, subtitle and other data according to certain rules. It also contains meta information, such as which coding types and timestamps the current stream contains; the player relies on this information to match decoders and to synchronize audio and video.

In order to better understand the multimedia packaging format, let’s review the principle of video player.

7.1 principle of video player

A video player is software that plays video stored in the form of digital signals; the term also refers to electronic devices with video playback capability. Most video players (with the exception of a few for raw waveform files) carry decoders to restore compressed media files, and have a set of built-in frequency conversion and buffering algorithms. Most video players can also play audio files.

The basic processing flow of video playback roughly includes the following stages:

(1) Protocol parsing

Strip the signaling data from the raw streaming-protocol data and keep only the audio and video data. For example, for data transmitted over the RTMP protocol, protocol parsing outputs data in FLV format.

(2) Demultiplexing

Separate the compressed, encoded audio and video data that the container format stores together. Common packaging formats include MP4, MKV, RMVB, FLV and AVI. For example, demultiplexing data in FLV format outputs an H.264-encoded video stream and an AAC-encoded audio stream.

(3) Decoding

Restore the compressed, encoded video and audio data to uncompressed raw video and audio data. Audio compression standards include AAC, MP3 and AC-3; video compression standards include H.264, MPEG-2 and VC-1. Decoding yields uncompressed video color data such as YUV420P or RGB, and uncompressed audio data such as PCM.
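To make the "uncompressed color data" part of this stage concrete, here is a sketch of converting a single YUV sample to RGB. The coefficients are the common full-range BT.601 ones (an assumption; real decoders pick the conversion matrix from stream metadata and usually do this per pixel on the GPU):

```javascript
// Convert one YUV sample (full-range BT.601 assumed) to RGB.
// Chroma components u and v are centered around 128.
function yuvToRgb(y, u, v) {
  const clamp = (x) => Math.max(0, Math.min(255, Math.round(x)));
  return {
    r: clamp(y + 1.402 * (v - 128)),
    g: clamp(y - 0.344136 * (u - 128) - 0.714136 * (v - 128)),
    b: clamp(y + 1.772 * (u - 128)),
  };
}

// With neutral chroma (u = v = 128) the result is a pure gray level:
console.log(yuvToRgb(128, 128, 128)); // { r: 128, g: 128, b: 128 }
console.log(yuvToRgb(255, 128, 128)); // { r: 255, g: 255, b: 255 }
```

The reverse direction (RGB to YUV) is what encoders apply before compression, since chroma can then be subsampled (the "420" in YUV420P) with little perceived quality loss.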

(4) Audio and video synchronization

Send the decoded audio and video data, kept in sync with each other, to the system's sound card and graphics card respectively for playback.
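Synchronization is typically done by electing a master clock, most often the audio clock, and deciding for each decoded video frame whether to render it, delay it, or drop it. The sketch below is illustrative only (the function name syncDecision and the 40 ms threshold are assumptions, not taken from any particular player):

```javascript
// "Audio as master clock" sketch: compare a video frame's presentation
// timestamp (seconds) with the current audio clock (seconds) and decide
// what to do with the frame. 40 ms is an illustrative tolerance.
function syncDecision(framePts, audioClock, thresholdSec = 0.04) {
  const diff = framePts - audioClock;
  if (diff < -thresholdSec) return "drop"; // frame is late: discard it
  if (diff > thresholdSec) return "wait";  // frame is early: delay display
  return "show";                           // close enough: render now
}

console.log(syncDecision(10.0, 10.01)); // "show"
console.log(syncDecision(9.9, 10.01));  // "drop"
console.log(syncDecision(10.2, 10.01)); // "wait"
```

Dropping late frames rather than waiting for them is what keeps lips and speech aligned when decoding falls behind.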

(Figure: the basic processing flow of video playback)

After understanding the principle of video player, the next step is to introduce the multimedia packaging format.

7.2 multimedia packaging format

For digital media data, a container is something that stores multimedia data together, just like a packing box. It can pack audio and video data and combine two originally independent media streams, or, of course, store only a single type of media data.

Sometimes a multimedia container is also called an encapsulation format: it merely provides a "shell" for the encoded multimedia data, i.e., all the processed audio, video and subtitles are packed into one file container and presented to the audience. This packing process is called encapsulation. Common packaging formats include MP4, MOV, TS, FLV and MKV. Here we introduce the familiar MP4 packaging format.

7.2.1 MP4 packaging format

MPEG-4 Part 14 (MP4) is one of the most commonly used container formats and usually carries the .mp4 file extension. It can be used for Dynamic Adaptive Streaming over HTTP (DASH) as well as for Apple's HLS streaming. MP4 is based on the ISO Base Media File Format (MPEG-4 Part 12), which in turn is based on the QuickTime file format. MPEG stands for Moving Picture Experts Group, a cooperation between the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), established to set standards for audio and video compression and transmission.

MP4 supports a variety of codecs. The commonly used video codecs are H.264 and HEVC, while the commonly used audio codec is AAC, the successor to the famous MP3 audio codec.

MP4 is composed of a series of boxes; the box is its smallest structural unit. All data in an MP4 file lives inside boxes, i.e., an MP4 file is made up of boxes. Each box has a type and a length and can be understood as a block of data objects. A box can contain other boxes, in which case it is called a container box.

An MP4 file has one and only one ftyp box, which appears first and serves as the signature of the MP4 format, carrying some information about the file. It is followed by one and only one moov box (Movie Box), a container box whose sub-boxes contain the metadata describing the structure of the media data.

I believe some readers will have a question: what does the actual structure of an MP4 file look like? Using mp4box.js, we can easily view the internal structure of a local or online MP4 file:

(Figure: the internal structure of an MP4 file as shown by mp4box.js)

mp4box.js online address: …

Since the structure of an MP4 file is rather complex (see the figure below if you don't believe it), we won't go further here; interested readers can explore related articles on their own.
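That said, the top level of the structure is easy to explore yourself: every box begins with a 4-byte big-endian size (covering the entire box) followed by a 4-byte ASCII type such as ftyp or moov. The walker below is a minimal illustrative sketch, not mp4box.js; it assumes the plain 32-bit size form and simply stops on largesize (size == 1) or to-end-of-file (size == 0) boxes:

```javascript
// Walk the top-level boxes of an MP4 buffer and list their types/sizes.
function listTopLevelBoxes(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const boxes = [];
  let offset = 0;
  while (offset + 8 <= bytes.byteLength) {
    const size = view.getUint32(offset); // big-endian by default
    const type = String.fromCharCode(
      bytes[offset + 4], bytes[offset + 5], bytes[offset + 6], bytes[offset + 7]
    );
    boxes.push({ type, size });
    if (size < 8) break; // largesize/zero-size/malformed: stop the sketch here
    offset += size;
  }
  return boxes;
}

// Synthetic buffer: an empty "ftyp" box followed by an empty "moov" box.
const sample = new Uint8Array([
  0, 0, 0, 8, 0x66, 0x74, 0x79, 0x70, // size=8, "ftyp"
  0, 0, 0, 8, 0x6d, 0x6f, 0x6f, 0x76, // size=8, "moov"
]);
console.log(listTopLevelBoxes(sample));
// [ { type: 'ftyp', size: 8 }, { type: 'moov', size: 8 } ]
```

Real files of course carry payloads inside each box; a full parser would also recurse into container boxes such as moov.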

(Figure: the box hierarchy of an MP4 file)

Next, let’s introduce the fragmented MP4 container format.

7.2.2 fragmented MP4 packaging format

The ISO Base Media File Format standard underlying MP4 allows boxes to be organized in a fragmented manner, which means an MP4 file can be organized as a series of short metadata/data box pairs rather than one long metadata block followed by one long data block. The structure of a fragmented MP4 file, here containing only two fragments, is shown in the following figure:

(Figure: the structure of a fragmented MP4 file containing two fragments)

(Picture source: …)

A fragmented MP4 file contains three key types of box: moov, moof and mdat.

  • moov (Movie Metadata Box): stores file-level metadata of the multimedia file.
  • mdat (Media Data Box): just as in an ordinary MP4 file, the mdat box stores the media data. The difference is that an ordinary MP4 file has only one mdat box, while a fragmented MP4 file has one mdat box per fragment.
  • moof (Movie Fragment Box): stores fragment-level metadata. This box type does not exist in ordinary MP4 files; in a fragmented MP4 file, each fragment has its own moof box.

In a fragmented MP4 file, each fragment consists of a moof box and an mdat box. A fragment can contain an audio track or a video track, along with enough metadata to ensure that this portion of the data can be decoded independently. The structure of a fragment is shown in the following figure:

(Figure: the structure of a fragment)

(Picture source: …)

Similarly, using mp4box.js, we can clearly view the internal structure of a fragmented MP4 file:

(Figure: the internal structure of a fragmented MP4 file as shown by mp4box.js)

We have introduced the two container formats MP4 and fragmented MP4. Let’s summarize the main differences between them with a figure:

(Figure: the main differences between MP4 and fragmented MP4)
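One practical consequence of this difference: code that feeds media to MSE often needs to know whether a buffer is fragmented before appending it. A simple heuristic (not a full parser) is to scan the top-level boxes (4-byte big-endian size followed by a 4-byte ASCII type) and look for a moof box:

```javascript
// Heuristic sketch: a top-level "moof" box implies fragmented MP4.
// Assumes plain 32-bit box sizes; stops on anything it can't handle.
function looksFragmented(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let offset = 0;
  while (offset + 8 <= bytes.byteLength) {
    const size = view.getUint32(offset);
    const type = String.fromCharCode(
      bytes[offset + 4], bytes[offset + 5], bytes[offset + 6], bytes[offset + 7]
    );
    if (type === "moof") return true;
    if (size < 8) break;
    offset += size;
  }
  return false;
}

// Two synthetic 8-byte boxes: "moov" (ordinary) vs "moof" (fragmented).
const plain = new Uint8Array([0, 0, 0, 8, 0x6d, 0x6f, 0x6f, 0x76]); // "moov"
const fmp4 = new Uint8Array([0, 0, 0, 8, 0x6d, 0x6f, 0x6f, 0x66]);  // "moof"
console.log(looksFragmented(plain), looksFragmented(fmp4)); // false true
```

This matches why libraries such as flv.js remux incoming streams into fragmented MP4: only the fragmented layout can be appended to a SourceBuffer piece by piece.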

8、 Practical examples

8.1 how to realize local video preview

The local video preview function mainly relies on the URL.createObjectURL() method. The URL.createObjectURL() static method creates a DOMString containing a URL that represents the object given as parameter. The lifetime of this URL is bound to the document of the window in which it was created. The new URL object represents the specified File or Blob object.

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Video local preview example</title>
  </head>
  <body>
    <h3>Po Ge: Video local preview example</h3>
    <input type="file" accept="video/*" onchange="loadFile(event)" />
    <video id="previewContainer" controls style="display: none;"></video>
    <script>
      const loadFile = function (event) {
        const reader = new FileReader();
        reader.onload = function () {
          const output = document.querySelector("#previewContainer");
 = "block";
          output.src = URL.createObjectURL(new Blob([reader.result]));
        };
        reader.readAsArrayBuffer([0]);
      };
    </script>
  </body>
</html>

8.2 how to realize player screenshot

The player screenshot function mainly uses the CanvasRenderingContext2D.drawImage() API. The drawImage() method of the Canvas 2D API provides several ways to draw an image onto a canvas.

The syntax of the drawImage API is as follows:

void ctx.drawImage(image, dx, dy); 

void ctx.drawImage(image, dx, dy, dWidth, dHeight); 

void ctx.drawImage(image, sx, sy, sWidth, sHeight, dx, dy, dWidth, dHeight);

Where the image parameter represents the element to draw into the context. Any canvas image source is allowed, such as CSSImageValue, HTMLImageElement, SVGImageElement, HTMLVideoElement, HTMLCanvasElement, ImageBitmap or OffscreenCanvas.
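The 9-argument form is handy for screenshots that must not distort the frame: compute a source rectangle with the destination's aspect ratio and let drawImage do the cropping and scaling. The helper below (coverCrop is an illustrative name, not a canvas API) computes a centered "cover" crop as a pure calculation:

```javascript
// Compute the source rectangle (sx, sy, sWidth, sHeight) needed to
// center-crop a frame of srcW x srcH so it fills a dstW x dstH
// destination without stretching.
function coverCrop(srcW, srcH, dstW, dstH) {
  const srcRatio = srcW / srcH;
  const dstRatio = dstW / dstH;
  if (srcRatio > dstRatio) {
    // Source is wider than the destination: trim the left/right edges.
    const sWidth = srcH * dstRatio;
    return { sx: (srcW - sWidth) / 2, sy: 0, sWidth, sHeight: srcH };
  }
  // Source is taller (or equal): trim the top/bottom edges.
  const sHeight = srcW / dstRatio;
  return { sx: 0, sy: (srcH - sHeight) / 2, sWidth: srcW, sHeight };
}

// A 1920x1080 frame drawn into a square 300x300 canvas:
const crop = coverCrop(1920, 1080, 300, 300);
console.log(crop); // { sx: 420, sy: 0, sWidth: 1080, sHeight: 1080 }
// In the browser you would then call:
// ctx.drawImage(video, crop.sx, crop.sy, crop.sWidth, crop.sHeight, 0, 0, 300, 300);
```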

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Player screenshot example</title>
  </head>
  <body>
    <h3>Po: player screenshot example</h3>
    <video id="video" controls="controls" width="460" height="270" crossorigin="anonymous">
      <!-- Please replace with the actual video address -->
      <source src="" />
    </video>
    <button onclick="captureVideo()">Screenshot</button>
    <script>
      let video = document.querySelector("#video");
      let canvas = document.createElement("canvas");
      let img = document.createElement("img");
      img.crossOrigin = "";
      let ctx = canvas.getContext("2d");

      function captureVideo() {
        canvas.width = video.videoWidth;
        canvas.height = video.videoHeight;
        ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
        img.src = canvas.toDataURL();
        document.body.appendChild(img); // show the captured frame on the page
      }
    </script>
  </body>
</html>

Now that we know how to capture a frame of the video, combining this with the GIF encoding capability provided by the gif.js library lets us quickly implement capturing video frames and generating GIF animations. We won't go further here; interested readers can read the article "Using JS to directly capture video clips and generate GIF animations".

8.3 how to realize canvas playing video

Playing video with canvas mainly uses ctx.drawImage(video, x, y, width, height) to draw the image of the current video frame, where the video parameter is the video element in the page. If we keep fetching the current picture of the video at a certain frequency and rendering it onto the canvas, we achieve the effect of playing the video with canvas.

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Play video with canvas</title>
  </head>
  <body>
    <h3>Po: play video with canvas</h3>
    <video id="video" controls="controls" style="display: none;">
      <!-- Please replace with the actual video address -->
      <source src="" />
    </video>
    <canvas id="myCanvas" width="460" height="270" style="border: 1px solid blue;"></canvas>
    <div>
      <button id="playBtn">Play</button>
      <button id="pauseBtn">Pause</button>
    </div>
    <script>
      const video = document.querySelector("#video");
      const canvas = document.querySelector("#myCanvas");
      const playBtn = document.querySelector("#playBtn");
      const pauseBtn = document.querySelector("#pauseBtn");
      const context = canvas.getContext("2d");
      let timerId = null;

      function draw() {
        if (video.paused || video.ended) return;
        context.clearRect(0, 0, canvas.width, canvas.height);
        context.drawImage(video, 0, 0, canvas.width, canvas.height);
        timerId = setTimeout(draw, 0);
      }

      playBtn.addEventListener("click", () => {
        if (!video.paused) return;;
        draw();
      });

      pauseBtn.addEventListener("click", () => {
        if (video.paused) return;
        clearTimeout(timerId);
      });
    </script>
  </body>
</html>

8.4 how to realize chroma keying (green screen effect)

In the previous example we introduced playing video with canvas. Some readers may wonder why we would draw video through canvas at all; isn't the video tag good enough? The reason is that canvas provides the getImageData and putImageData methods, which let developers dynamically change the displayed content of every frame. This way, we can manipulate the video data in real time and composite various visual effects into the picture being presented.

For example, the tutorial “using canvas to process video” on MDN demonstrates how to use JavaScript code to perform chroma keying (green screen or blue screen effect).

The so-called chroma keying, also known as color keying, is a background-removal compositing technique. Chroma refers to the solid color, and key means to key out (remove) that color. The filmed person or object is placed in front of a green screen; the background is then removed and replaced with another one. This technique is widely used in film, TV and game production, and chroma keying is also an important step in virtual studios and visual effects.

Let’s take a look at the key codes:

processor.computeFrame = function computeFrame() {
    this.ctx1.drawImage(, 0, 0, this.width, this.height);
    let frame = this.ctx1.getImageData(0, 0, this.width, this.height);
    let l = / 4;

    for (let i = 0; i < l; i++) {
      let r =[i * 4 + 0];
      let g =[i * 4 + 1];
      let b =[i * 4 + 2];
      // Make green-screen pixels transparent
      if (g > 100 && r > 100 && b < 43)[i * 4 + 3] = 0;
    }

    this.ctx2.putImageData(frame, 0, 0);
};

The computeFrame() method above fetches one frame of data and applies the chroma keying effect to it. Using chroma keying technology, we can also implement purely client-side real-time masked danmaku (bullet comments).
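The keying rule itself is independent of canvas, so it can be factored into a pure function over raw RGBA bytes, using the same thresholds as the computeFrame() example (g > 100, r > 100, b < 43). This is just a testable restatement of the loop above, not a new algorithm:

```javascript
// Apply the chroma-key rule to RGBA pixel data in place: pixels
// matching the key color get alpha 0 (fully transparent).
function chromaKey(data) {
  for (let i = 0; i < data.length; i += 4) {
    const r = data[i];
    const g = data[i + 1];
    const b = data[i + 2];
    if (g > 100 && r > 100 && b < 43) {
      data[i + 3] = 0; // key this pixel out
    }
  }
  return data;
}

// Two pixels: one "green screen" pixel and one ordinary pixel.
const pixels = new Uint8ClampedArray([
  150, 200, 10, 255, // matches the key, alpha becomes 0
  20, 30, 40, 255,   // kept opaque
]);
chromaKey(pixels);
console.log(pixels[3], pixels[7]); // 0 255
```

In the browser you would call it on the `data` field of the ImageData returned by getImageData(), then write the result back with putImageData().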