⚡ Front-end multi-threaded large file download in practice: up to 10x faster, taking on Baidu cloud disk

Time:2021-9-16

background

Yes, you read that right: this is front-end multithreading, not Node. This exploration grew out of recent work. While developing a video-streaming feature, I came across a special status code: 206.

To keep this article from being dry, I will start with the results (using a 3.7 MB image as an example).

Animation effect comparison (single thread – left vs 10 threads – right)


Time comparison (single thread vs 10 threads)


Exciting, isn't it? Then read on. Let's capture the traffic first to see how the whole process works.

`GET /360_0388.jpg HTTP/1.1
Host: limit.qiufeng.com
Connection: keep-alive
...
Range: bytes=0-102399

HTTP/1.1 206 Partial Content
Server: openresty/1.13.6.2
Date: Sat, 19 Sep 2020 06:31:11 GMT
Content-Type: image/jpeg
Content-Length: 102400
....
Content-Range: bytes 0-102399/3670627

... (file stream here)`

You can see that the request carries an extra field, Range: bytes=0-102399, the response carries an extra field, Content-Range: bytes 0-102399/3670627, and the returned status code is 206.

So what is Range? A few days ago I wrote an article about file downloads that mentioned Range as a way to download large files, but because that article was a systematic overview of file download, it did not introduce Range in detail.

All of the code below is at https://github.com/hua1995116/node-demo/tree/master/file-download/example/download-multiple

Basic introduction to range

Origin of range

Range is a header field added in HTTP/1.1. It is also the core mechanism behind multi-threaded and resumable (breakpoint) downloads in tools such as Xunlei. (Introductory text, excerpted.)

First, the client sends a request with Range: bytes=0-xxx. If the server supports ranges, it adds Accept-Ranges: bytes to the response headers to indicate that it supports range requests, and the client can then send further requests with Range.

The server checks the Range: bytes=0-xxx request header to decide whether to do range processing. If the value exists and is valid, only the requested part of the file is sent back, the response status code becomes 206 (Partial Content), and Content-Range is set. If it is invalid, a 416 status code (Range Not Satisfiable) is returned. If the request has no Range header, the server responds normally and does not set Content-Range.

The format of Range is:

Range: (unit)=(first byte pos)-[last byte pos]

That is, Range: unit (e.g. bytes)=start byte position-end byte position. Byte positions are zero-based and both ends are inclusive, so the last byte of a 5000-byte file is at position 4999.

Let's take an example. Suppose we enable multi-threaded downloading and split a 5000-byte file across four threads.

  • Range: bytes=0-1199 first 1200 bytes
  • Range: bytes=1200-2399 second 1200 bytes
  • Range: bytes=2400-3599 third 1200 bytes
  • Range: bytes=3600-4999 last 1400 bytes

The server responds:

First response

  • Content-Length:1200
  • Content-Range:bytes 0-1199/5000

2nd response

  • Content-Length:1200
  • Content-Range:bytes 1200-2399/5000

3rd response

  • Content-Length:1200
  • Content-Range:bytes 2400-3599/5000

4th response

  • Content-Length:1400
  • Content-Range:bytes 3600-4999/5000
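The splitting in this example can be sketched as a small helper. This is only an illustration: the name chunkRanges and its return shape are my own, not part of the demo repository. It follows the same rule as the example, where every chunk has a fixed size and the last chunk absorbs the remainder. Note that the last byte of a 5000-byte file is at offset 4999, because offsets are zero-based.

```javascript
// Hypothetical helper (not from the demo repo): split a file of `size` bytes
// into inclusive byte ranges of `chunkSize` bytes each.
// The last range absorbs the remainder, as in the four-thread example.
function chunkRanges(size, chunkSize) {
    if (size <= 0 || chunkSize <= 0) {
        return [];
    }
    // At least one chunk, even when the file is smaller than chunkSize.
    const count = Math.max(1, Math.floor(size / chunkSize));
    const ranges = [];
    for (let i = 0; i < count; i++) {
        const start = i * chunkSize;
        // Offsets are zero-based and inclusive, so the final byte is size - 1.
        const end = i === count - 1 ? size - 1 : start + chunkSize - 1;
        ranges.push({ start, end, header: `bytes=${start}-${end}` });
    }
    return ranges;
}

// chunkRanges(5000, 1200).map((r) => r.header)
// → ['bytes=0-1199', 'bytes=1200-2399', 'bytes=2400-3599', 'bytes=3600-4999']
```

Each entry's length is end - start + 1, which is what the server should report in Content-Length for that response.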

If each request succeeds, the response headers returned by the server include a Content-Range field. Content-Range tells the client which part of the entity this response covers and how long the whole entity is. General format:

Content-Range: (unit) (first byte pos)-[last byte pos]/[entity length]

That is, Content-Range: bytes start byte position-end byte position/file size
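To make the format concrete, here is a minimal sketch of a parser for a Content-Range value. The function name parseContentRange is hypothetical, and it only handles the single-range bytes form shown above (the total may be "*" when the server does not know the length); anything else returns null.

```javascript
// Hypothetical helper: parse "bytes <start>-<end>/<size>" into numbers.
// Returns null for values it does not understand (e.g. "bytes */5000").
function parseContentRange(value) {
    const match = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value.trim());
    if (!match) {
        return null;
    }
    return {
        start: Number(match[1]),
        end: Number(match[2]),
        // "*" means the total length is unknown to the server.
        size: match[3] === '*' ? null : Number(match[3]),
    };
}

// parseContentRange('bytes 0-102399/3670627')
// → { start: 0, end: 102399, size: 3670627 }
```

One practical use: the size part gives the total file length from the first partial response, so a client that has already requested the first chunk knows how many more chunks to request.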

Browser support

Mainstream browsers currently support this feature.


Server support

Nginx

nginx supports Range requests for static files by default (version 1.9.8 also added ngx_http_slice_module). You can set max_ranges to 0 to turn range support off.

Node

Node does not handle Range for you by default; you need to write the handling code yourself.

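// Assumes fs and path are required, and router is a koa-router instance.
router.get('/api/rangeFile', async(ctx) => {
    const { filename } = ctx.query;
    const filePath = path.join(__dirname, './static/', filename);
    const { size } = fs.statSync(filePath);
    const range = ctx.headers['range'];
    ctx.set('Accept-Ranges', 'bytes');
    // No Range header: respond normally with the whole file.
    if (!range) {
        ctx.body = fs.readFileSync(filePath);
        return;
    }
    // getRange (not shown here) parses "bytes=start-end" into numbers;
    // end is empty for an open-ended range such as "bytes=500-".
    const { start, end: rawEnd } = getRange(range);
    const end = rawEnd ? rawEnd : size - 1;
    // Range outside the file: 416 Range Not Satisfiable.
    if (start >= size || end >= size) {
        ctx.response.status = 416;
        ctx.set('Content-Range', `bytes */${size}`);
        ctx.body = '';
        return;
    }
    // Valid range: 206 Partial Content with only the requested bytes.
    ctx.response.status = 206;
    ctx.set('Content-Range', `bytes ${start}-${end}/${size}`);
    ctx.body = fs.createReadStream(filePath, { start, end });
})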
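The route above calls getRange, which is not shown in this article. Below is one possible sketch of such a helper. It is my own guess, not necessarily what the repository uses, and it only handles a single range of the form bytes=start-end or bytes=start-. Suffix ranges such as bytes=-500 and multi-range requests are rejected.

```javascript
// Hypothetical implementation of getRange (an assumption, not the repo's code).
// Parses a single-range header such as "bytes=0-102399".
// Returns { start, end } as numbers; end is undefined for "bytes=500-".
function getRange(rangeHeader) {
    const match = /^bytes=(\d+)-(\d*)$/.exec(rangeHeader.trim());
    if (!match) {
        throw new Error(`unsupported Range header: ${rangeHeader}`);
    }
    const start = Number(match[1]);
    const end = match[2] === '' ? undefined : Number(match[2]);
    return { start, end };
}

// getRange('bytes=0-102399') → { start: 0, end: 102399 }
// getRange('bytes=500-')     → { start: 500, end: undefined }
```

In a real route you would probably catch the error and answer 416 (or ignore the header and send the full file) instead of letting the exception propagate.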

Or you can use the koa-send library.

https://github.com/pillarjs/send/blob/0.17.1/index.js#L680

Range practice

Architectural Overview

Let's look at an overview of the process. Single-threaded download is simple: just download as usual; if that is unclear, see my previous article. Multi-threaded download is more involved: you download the file in chunks, merge the chunks once they have all arrived, and then save the result. (For download methods such as blob, see the previous article.)

Server code

It's simple: the endpoint just needs to handle Range.

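// Assumes fs and path are required, and router is a koa-router instance.
router.get('/api/rangeFile', async(ctx) => {
    const { filename } = ctx.query;
    const filePath = path.join(__dirname, './static/', filename);
    const { size } = fs.statSync(filePath);
    const range = ctx.headers['range'];
    ctx.set('Accept-Ranges', 'bytes');
    // No Range header: respond normally with the whole file.
    if (!range) {
        ctx.body = fs.readFileSync(filePath);
        return;
    }
    // getRange (not shown here) parses "bytes=start-end" into numbers;
    // end is empty for an open-ended range such as "bytes=500-".
    const { start, end: rawEnd } = getRange(range);
    const end = rawEnd ? rawEnd : size - 1;
    // Range outside the file: 416 Range Not Satisfiable.
    if (start >= size || end >= size) {
        ctx.response.status = 416;
        ctx.set('Content-Range', `bytes */${size}`);
        ctx.body = '';
        return;
    }
    // Valid range: 206 Partial Content with only the requested bytes.
    ctx.response.status = 206;
    ctx.set('Content-Range', `bytes ${start}-${end}/${size}`);
    ctx.body = fs.createReadStream(filePath, { start, end });
})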

html

Then write the HTML. There is not much to it: two buttons to trigger the downloads.

<!-- html -->
<button id="download1">serial download</button>
<button id="download2">multithreaded download</button>
<script></script>

JS public parameters

const m = 1024 * 520;  // Chunk size
const url = 'http://localhost:8888/api/rangeFile?filename=360_0388.jpg'; // Address to download

Single threaded part

The single-threaded download requests the file directly as a blob and then downloads it through a blob URL.

download1.onclick = () => {
    console.time("direct download");
    function download(url) {
        const req = new XMLHttpRequest();
        req.open("GET", url, true);
        req.responseType = "blob";
        req.onload = function (oEvent) {
            const content = req.response;
            const aTag = document.createElement('a');
            aTag.download = '360_0388.jpg';
            const blob = new Blob([content])
            const blobUrl = URL.createObjectURL(blob);
            aTag.href = blobUrl;
            aTag.click();
            // Release the object URL (pass the URL, not the blob)
            URL.revokeObjectURL(blobUrl);
            console.timeEnd("direct download");
        };
        req.send();
    }
    download(url);
}

Multithreaded part

First, send a HEAD request to obtain the file size, then compute the byte range of each chunk from that length and the configured chunk size. In the Promise.all callback, use the concatenate function to merge the chunk buffers into one blob, and then download it through a blob URL.

// script
// Download one byte range; resolves with the chunk index and its ArrayBuffer
function downloadRange(url, start, end, i) {
    return new Promise((resolve, reject) => {
        const req = new XMLHttpRequest();
        req.open("GET", url, true);
        req.setRequestHeader('range', `bytes=${start}-${end}`)
        req.responseType = "blob";
        req.onload = function (oEvent) {
            req.response.arrayBuffer().then(res => {
                resolve({
                    i,
                    buffer: res
                });
            }, reject)
        };
        req.onerror = reject;
        req.send();
    })
}
// Merge buffers
function concatenate(resultConstructor, arrays) {
    let totalLength = 0;
    for (let arr of arrays) {
        totalLength += arr.length;
    }
    let result = new resultConstructor(totalLength);
    let offset = 0;
    for (let arr of arrays) {
        result.set(arr, offset);
        offset += arr.length;
    }
    return result;
}
download2.onclick = () => {
    axios({
        url,
        method: 'head',
    }).then((res) => {
        // Get the length to split into chunks
        console.time("concurrent download");
        const size = Number(res.headers['content-length']);
        // Number of chunks; the last chunk also carries the remainder
        const length = Math.max(1, Math.floor(size / m));
        const arr = []
        for (let i = 0; i < length; i++) {
            let start = i * m;
            let end = (i == length - 1) ?  size - 1  : (i + 1) * m - 1;
            arr.push(downloadRange(url, start, end, i))
        }
        Promise.all(arr).then(res => {
            // Sort by chunk index before merging
            const arrBufferList = res.sort((a, b) => a.i - b.i).map(item => new Uint8Array(item.buffer));
            const allBuffer = concatenate(Uint8Array, arrBufferList);
            const blob = new Blob([allBuffer], {type: 'image/jpeg'});
            const blobUrl = URL.createObjectURL(blob);
            const aTag = document.createElement('a');
            aTag.download = '360_0388.jpg';
            aTag.href = blobUrl;
            aTag.click();
            // Release the object URL (pass the URL, not the blob)
            URL.revokeObjectURL(blobUrl);
            console.timeEnd("concurrent download");
        })
    })
}
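For comparison, the same per-chunk request can also be written with fetch instead of XMLHttpRequest. The sketch below is my own variant, not code from the repository: the name fetchRange and the injectable fetchImpl parameter are assumptions (the latter only exists so the function can be exercised without a network). It also checks for status 206, since a server that ignores Range answers 200 with the whole file, which would corrupt the merged result.

```javascript
// Hypothetical fetch-based variant of downloadRange (not from the demo repo).
// `fetchImpl` defaults to the global fetch and can be swapped for a stub.
async function fetchRange(url, start, end, fetchImpl = fetch) {
    const res = await fetchImpl(url, {
        headers: { Range: `bytes=${start}-${end}` },
    });
    // A server that ignores Range replies 200 with the full body.
    if (res.status !== 206) {
        throw new Error(`expected 206 Partial Content, got ${res.status}`);
    }
    // Return the chunk as bytes, ready to be merged with the other chunks.
    return new Uint8Array(await res.arrayBuffer());
}
```

The returned Uint8Array values can be passed to concatenate in the same way as the buffers produced by the XMLHttpRequest version.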

Complete example

https://github.com/hua1995116/node-demo

`//Enter directory
cd file-download
//Start
node server.js
//Open 
http://localhost:8888/example/download-multiple/index.html`

Because Chrome limits the number of HTTP/1.1 connections per domain, the maximum concurrency for a single domain is 6.

This can be seen both in the Chromium source code and in the discussion among its maintainers.

Discussion address

https://bugs.chromium.org/p/chromium/issues/detail?id=12066

Chromium source code

// https://source.chromium.org/chromium/chromium/src/+/refs/tags/87.0.4268.1:net/socket/client_socket_pool_manager.cc;l=47
// Default to allow up to 6 connections per host. Experiment and tuning may
// try other values (greater than 0).  Too large may cause many problems, such
// as home routers blocking the connections!?!?  See http://crbug.com/12066.
//
// WebSocket connections are long-lived, and should be treated differently
// than normal other connections. Use a limit of 255, so the limit for wss will
// be the same as the limit for ws. Also note that Firefox uses a limit of 200.
// See http://crbug.com/486800
int g_max_sockets_per_group[] = {
    6,   // NORMAL_SOCKET_POOL
    255  // WEBSOCKET_SOCKET_POOL
}; 

Therefore, to match this limit, I split the file into six segments of 520 KB each (yes, a number that says "I love you", worked into the code), that is, 6 threads downloading at once.

I downloaded the file six times each with a single thread and with multiple threads, and the speeds looked about the same. So why does this differ from what we expected?

Explore the causes of failure

I began to compare the two requests carefully and observe the speed of the two requests.

6 threads concurrent

Single thread

Calculating from 3.7 MB in 82 ms gives about 46 KB per ms for the single-threaded download, but in the concurrent case each 533 KB chunk takes about 20 ms on average (connection time has been stripped out; this is pure content download time).

I looked for some information and learned that there are such things as downlink speed and uplink speed.

The actual transmission speed of a network is divided into uplink speed and downlink speed. The uplink rate is the speed of sending data, and the downlink rate is the speed of receiving data. ADSL is a transmission mode designed around our browsing habits, in which we send relatively little data compared with what we download. For a 4M broadband line, the theoretical maximum download speed is 512 KB/s, which is the downlink speed. (Baidu Encyclopedia)

What is our current situation?

Think of the server as a big water pipe. Let me use diagrams to simulate single-threaded and multi-threaded downloading, with the server on the left and the client on the right. (All of the following cases assume ideal conditions, purely to illustrate the process, and ignore contention from other programs.)

Single thread

Multithreading

Because our server is one big water pipe with a fixed flow rate and no limit per client, a single thread already runs at the user's maximum speed. With multiple threads, say three, each thread runs at one third of the original speed, so the combined speed is no different from that of a single thread.

Now let me go through the cases one by one. When does multithreading actually help?

The server bandwidth is larger than the user bandwidth, and there is no restriction

This is essentially the situation we just ran into.

The server bandwidth is much larger than the user bandwidth, limiting the single connection speed

If the server limits the download speed of a single connection, which is the common case, things are different. Take Baidu cloud: you clearly have 10M broadband, but the actual download speed is only 100 KB/s. Here multi-threaded downloading helps, because the limit is usually applied per TCP connection. Of course, a production environment does not let users open unlimited threads: there is usually also a cap on the number of TCP connections from your IP. In this case the upper limit is usually the user's own maximum bandwidth. Following the example above, if 10 threads already saturate your bandwidth, your ingress is the bottleneck, the threads simply compete with each other, and opening more of them is useless.
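The cases above can be summed up in a deliberately simplified model: the total speed is the smallest of the user's bandwidth, the server's bandwidth, and the per-connection limit multiplied by the number of connections. The sketch below is only that model. The function name estimateSpeed and all the numbers are made up for illustration, and it ignores latency, TCP slow start, and per-IP connection caps.

```javascript
// Simplified "water pipe" model; all speeds use the same unit (e.g. KB/s).
// perConnectionLimit is optional: leave it out when the server has no cap.
function estimateSpeed({ userBandwidth, serverBandwidth, perConnectionLimit, connections }) {
    const perConnection = perConnectionLimit === undefined ? Infinity : perConnectionLimit;
    return Math.min(userBandwidth, serverBandwidth, perConnection * connections);
}

// Illustrative numbers only: a 1280 KB/s line, a 100 KB/s per-connection cap.
const base = { userBandwidth: 1280, serverBandwidth: 100000, perConnectionLimit: 100 };
console.log(estimateSpeed({ ...base, connections: 1 }));  // 100
console.log(estimateSpeed({ ...base, connections: 6 }));  // 600
console.log(estimateSpeed({ ...base, connections: 20 })); // 1280, capped by the user's line
```

Without a per-connection limit the result is the same for any number of connections, which matches the first case: extra threads only divide the same pipe.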

Improvement scheme

Because I have not found a reasonably simple way to throttle download speed in Node, I introduced nginx.

We cap each TCP connection at 1 MB/s.

Add the configuration limit_rate 1M;

preparation

1.nginx_conf

server {
    listen 80;
    server_name limit.qiufeng.com;
    access_log  /opt/logs/wwwlogs/limitqiufeng.access.log;
    error_log  /opt/logs/wwwlogs/limitqiufeng.error.log;

    add_header Cache-Control max-age=60;
    add_header Access-Control-Allow-Origin *;
    add_header Access-Control-Allow-Methods 'GET, OPTIONS';
    add_header Access-Control-Allow-Headers 'DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization,range,If-Range';
    if ($request_method = 'OPTIONS') {
        return 204;
    }
    limit_rate 1M;
    location / {
        root <your static directory>;
        index index.html;
    }
}

2. Configure local host

`127.0.0.1 limit.qiufeng.com` 

Check the effect: the speeds are now as expected, and the multi-threaded download is faster than the single-threaded one, with a speed ratio of roughly 5-6:1. However, I noticed that if you click several times in quick succession during the download, the Range downloads get faster and faster (I suspect nginx does some caching here; I have not looked into it yet).

Change the download address in the code from
const url = 'http://localhost:8888/api/rangeFile?filename=360_0388.jpg';
to
const url = 'http://limit.qiufeng.com/360_0388.jpg';

Test download speed


Remember what was said above: under HTTP/1.1 only 6 requests to the same site can be in flight at once, and the rest wait for the next batch. HTTP/2.0 does not have this limitation: multiplexing replaces the ordering and blocking mechanism of HTTP/1.x. Let's upgrade to HTTP/2.0 and test again.

A certificate needs to be generated locally. (How to generate a certificate: https://juejin.im/post/6844903556722475021)

server {
    listen 443 ssl http2;
    ssl on;
    ssl_certificate /usr/local/openresty/nginx/conf/ssl/server.crt;
    ssl_certificate_key /usr/local/openresty/nginx/conf/ssl/server.key;
    ssl_session_cache shared:le_nginx_SSL:1m;
    ssl_session_timeout 1440m;

    ssl_protocols SSLv3 TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers RC4:HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;
    server_name limit.qiufeng.com;
 
    access_log  /opt/logs/wwwlogs/limitqiufeng2.access.log;
    error_log  /opt/logs/wwwlogs/limitqiufeng2.error.log;

    add_header Cache-Control max-age=60;
    add_header Access-Control-Allow-Origin *;
    add_header Access-Control-Allow-Methods 'GET, OPTIONS';
    add_header Access-Control-Allow-Headers 'DNT,X-Mx-ReqToken,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization,range,If-Range';
    if ($request_method = 'OPTIONS') {
        return 204;
    }
    limit_rate 1M;
    location / {
        root <prefix path where you store the project>/node-demo/file-download/;
        index index.html;
    }
}

10 threads

`// Change the size of a single chunk
const m = 1024 * 400;`

12 threads


24 threads


Of course, more threads is not always better. Testing shows that once the number of threads passes a certain point, the download actually gets slower. The following is the result for 36 concurrent requests.
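Since too many simultaneous requests can hurt, one option is to cap the concurrency in script instead of firing every chunk at once. Below is a minimal sketch of such a limiter. The name runWithLimit is hypothetical and this is not part of the original script: it takes an array of functions that each return a Promise, keeps at most limit of them in flight, and resolves with the results in the original order.

```javascript
// Hypothetical helper: run promise-returning tasks with at most `limit`
// of them in flight at the same time. Results keep the order of `tasks`.
function runWithLimit(tasks, limit) {
    return new Promise((resolve, reject) => {
        const results = new Array(tasks.length);
        let next = 0;      // index of the next task to start
        let finished = 0;  // number of tasks completed so far
        if (tasks.length === 0) {
            resolve(results);
            return;
        }
        function startNext() {
            if (next >= tasks.length) {
                return;
            }
            const i = next++;
            tasks[i]().then((value) => {
                results[i] = value;
                finished++;
                if (finished === tasks.length) {
                    resolve(results);
                } else {
                    // A slot is free again: start the next waiting task.
                    startNext();
                }
            }, reject);
        }
        // Fill the initial slots.
        for (let k = 0; k < Math.min(limit, tasks.length); k++) {
            startNext();
        }
    });
}
```

With the download code from this article, the tasks would be wrappers such as () => downloadRange(url, start, end, i), collected in an array instead of being started immediately, and the limit would be whatever number tested best for the target server.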


Practical application exploration

So what is multi-threaded downloading good for? As mentioned at the beginning, this chunking mechanism is the core of Xunlei and other download software.

Netease cloud classroom

https://study.163.com/course/courseLearn.htm?courseId=1004500008

When we open the console, we can easily find the download URL: a bare, unprotected MP4 download address.

Paste our test script into the console.

//The test script is too long to paste here. If you have read the article above carefully, you should be able to write it yourself; if you really can't, use the code below.
https://github.com/hua1995116/node-demo/blob/master/file-download/example/download-multiple/script.js

Direct download


Multithreaded Download

As you can see, Netease cloud classroom does not limit the download speed of a single TCP connection, so the improvement is not that noticeable.

Baidu cloud

Let's test the web version of Baidu cloud.

Take a 16.6 MB file as an example.

Open the Baidu cloud disk web interface and click download.

Then click pause and open Chrome -> More -> Downloads -> right-click -> Copy download link.

Reuse the script from the Netease cloud classroom test above; only the parameters need to change.

`Change url to the corresponding Baidu cloud download link
Change m to 1024 * 1024 * 2, a suitable chunk size`

Direct download

The speed limit on a single Baidu cloud TCP connection is brutal: it took 217 seconds for a 17 MB file! We usually suffer a lot from this (VIP users excepted).


Multithreaded Download


Because it is HTTP/1.1, we only need to open 6 or more threads. Below is the multi-threaded download, which takes about 46 seconds.

Let's get a feel for the speed difference from this figure.

It really is sweet: it's free, and it relies only on our front-end code. Why not give it a try?

Scheme defect

1. There is an upper limit on file size

Because blob has a maximum size limit in the major browsers, this approach still has some shortcomings.

2. The server limits the speed of a single TCP connection

Generally there will be such limits. In that case it comes down to the user's own bandwidth.

ending

This article was written in a hurry, and some of the wording may not be entirely accurate. If you spot mistakes, you are welcome to point them out.

One last question: do you already have a web-based Baidu cloud acceleration plug-in? If not, go and build a web-based Baidu cloud download plug-in ~


