Upload optimization of large size files


Author: TJ

During development we received feedback that uploading files larger than 100 MB on the site often failed, and retrying meant starting the whole upload over, which was painful for users who regularly upload large files. So how can we make uploads fast, and make a failed upload resume from where it was interrupted instead of starting again? Here's the answer~

Tip: this article is best read alongside the demo source code.

Overall approach

The first step is to evaluate optimization options in the context of the project.

Upload failures for large files are a perennial problem. The common solution is to split the large file into many small chunks and upload them through parallel requests; once all requests have succeeded, the server merges the chunks back into the original file. When a chunk fails to upload, the retry can detect which chunks the server already has and upload only the ones that failed last time, reducing the user's waiting time and the load on the server. This is chunked upload.
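The steps above can be sketched as a single async flow. This is only an illustrative outline: the function names (`md5File`, `checkFileMD5`, `uploadMissingChunks`, `notifyMerge`) are placeholders for the methods implemented later in the article, wired through a hypothetical `api` object so the control flow can run standalone.

```javascript
// Illustrative sketch of the overall flow; the api methods are stubs,
// not a real backend.
async function uploadLargeFile (file, api) {
  const md5 = await api.md5File(file)                                 // 1. hash the file
  const uploadedChunks = await api.checkFileMD5(md5)                  // 2. ask which chunks exist
  const ok = await api.uploadMissingChunks(file, md5, uploadedChunks) // 3. upload the rest
  if (ok) await api.notifyMerge(md5)                                  // 4. server merges chunks
  return ok
}

// Stubbed usage, recording the call order:
const calls = []
const api = {
  md5File: async () => { calls.push('md5'); return 'abc123' },
  checkFileMD5: async () => { calls.push('check'); return ['0', '1'] },
  uploadMissingChunks: async () => { calls.push('upload'); return true },
  notifyMerge: async () => { calls.push('merge') },
}
uploadLargeFile({}, api).then(ok => console.log(ok, calls.join(' > ')))
// prints: true md5 > check > upload > merge
```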

Large file upload

So how is chunked upload of a large file implemented?

The flow chart is as follows:

[Flow chart: chunked upload of a large file]

It is divided into the following steps:

1. File MD5 hashing

The MD5 is a file's unique fingerprint, and can be used to query the file's upload status.

The file's MD5 is generated with spark-md5. Note that a large file must be read in chunks: each chunk read is appended to the spark-md5 hash computation until the whole file has been consumed, and the final hash is then passed to the callback. A progress bar for the file read can be added here if needed.


The implementation method is as follows:

//Read the file in 100 chunks and feed each chunk to spark-md5 --> MD5
md5File (file) {
  return new Promise((resolve, reject) => {
    let blobSlice =
      File.prototype.slice ||
      File.prototype.mozSlice ||
      File.prototype.webkitSlice
    let chunkSize = file.size / 100
    let chunks = 100
    let currentChunk = 0
    let spark = new SparkMD5.ArrayBuffer()
    let fileReader = new FileReader()
    fileReader.onload = function (e) {
      console.log('read chunk nr', currentChunk + 1, 'of', chunks)
      spark.append(e.target.result) // Append the chunk's ArrayBuffer to the hash
      currentChunk++
      if (currentChunk < chunks) {
        loadNext() // Keep reading until the whole file has been consumed
      } else {
        console.log('finished loading')
        let result = spark.end() // Final hash of the whole file
        resolve(result)
      }
    }
    fileReader.onerror = function (err) {
      console.warn('oops, something went wrong.')
      reject(err)
    }
    function loadNext () {
      let start = currentChunk * chunkSize
      let end =
        start + chunkSize >= file.size ? file.size : start + chunkSize
      fileReader.readAsArrayBuffer(blobSlice.call(file, start, end))
    }
    loadNext()
  })
}

2. Query file status

After the front end obtains the file's MD5, it asks the backend whether a folder named after that MD5 exists. If it does, every file in the folder is listed to produce the list of already-uploaded chunks; if it does not, the uploaded-chunk list is empty.

//Check the file's MD5 against the server
checkFileMD5 (file, fileName, fileMd5Value, onError) {
  const fileSize = file.size
  const { chunkSize } = this
  this.chunks = Math.ceil(fileSize / chunkSize)
  return new Promise(async (resolve, reject) => {
    const params = {
      fileName: fileName,
      fileMd5Value: fileMd5Value,
    }
    const { ok, data } = await services.checkFile(params)
    if (ok) {
      this.hasUploaded = data.chunkList.length // Number of chunks already on the server
      resolve(data)
    } else {
      reject(data)
      onError && onError()
    }
  })
}

3. File chunking

The core of optimizing large-file uploads is chunking. The Blob object provides a slice method that cuts out part of a file; since File inherits from Blob, File objects have slice as well.

Each chunk's size is defined by the variable chunkSize, and the number of chunks is the file size divided by chunkSize. A for loop with file.slice() cuts the file into chunks numbered 0 to N-1; comparing these against the list of already-uploaded chunks yields every chunk not yet uploaded, and the corresponding upload requests are pushed into requestList.
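As a quick standalone illustration of slice (runnable in Node 18+, where Blob is also available as a global), here is a 10 KB blob cut into 4 KB chunks; the sizes are arbitrary, chosen only to show the clamped final chunk.

```javascript
// Minimal sketch: cutting a Blob into fixed-size chunks with Blob.slice.
// In the article's code the File object is sliced the same way.
function sliceIntoChunks (blob, chunkSize) {
  const chunks = []
  for (let start = 0; start < blob.size; start += chunkSize) {
    chunks.push(blob.slice(start, start + chunkSize)) // an end past EOF is clamped
  }
  return chunks
}

const blob = new Blob(['a'.repeat(10 * 1024)]) // 10 KB of data
const chunks = sliceIntoChunks(blob, 4 * 1024) // 4 KB chunks
console.log(chunks.length) // 3
console.log(chunks.map(c => c.size)) // [ 4096, 4096, 2048 ]
```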


async checkAndUploadChunk (file, fileMd5Value, chunkList) {
  let { chunks, upload } = this
  const requestList = []
  for (let i = 0; i < chunks; i++) {
    let exit = chunkList.indexOf(i + '') > -1
    // If the chunk already exists on the server, skip uploading it
    if (!exit) {
      requestList.push(upload(i, fileMd5Value, file))
    }
  }
  console.log({ requestList })
  const result =
    requestList.length > 0
      ? await Promise.all(requestList)
        .then(result => {
          console.log({ result })
          return result.every(i => i.ok)
        })
        .catch(err => {
          return err
        })
      : true
  console.log({ result })
  return result === true
}

4. Upload the chunks

Promise.all is called to upload all chunks concurrently, passing the chunk index, the chunk data and the file's MD5 to the backend.

When the backend receives an upload request, it first checks whether the folder named after the file's MD5 exists; if not, it creates the folder, then uses fs-extra to move the chunk from its temporary path into the chunk folder. The result looks like this:

[Screenshot: chunk files stored in the MD5-named folder]

When all chunks have been uploaded successfully, the server is told to merge them; if any chunk fails, the user sees an "upload failed" prompt. On a retry, the file's MD5 is used to look up the file's upload status: chunks the server already holds for that MD5 have been uploaded and need not be sent again, while chunks the server cannot find still need uploading. The user only has to upload this missing portion to complete the whole file. This is resumable (breakpoint) upload.
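The resume logic boils down to a set difference between all chunk indices and the indices the server already has. A small sketch (the uploaded list is string file names, as returned by the server's folder listing; the helper name is illustrative):

```javascript
// Given the total chunk count and the server's list of uploaded chunk
// names, return the indices that still need uploading.
function missingChunks (totalChunks, uploadedList) {
  const missing = []
  for (let i = 0; i < totalChunks; i++) {
    if (uploadedList.indexOf(String(i)) === -1) {
      missing.push(i)
    }
  }
  return missing
}

// On retry, only chunks 1 and 4 would be re-uploaded:
console.log(missingChunks(5, ['0', '2', '3'])) // [ 1, 4 ]
```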


//Upload one chunk
upload (i, fileMd5Value, file) {
  const { uploadProgress, chunks } = this
  return new Promise((resolve, reject) => {
    let { chunkSize } = this
    // Build a form; FormData is new in HTML5
    let end =
      (i + 1) * chunkSize >= file.size ? file.size : (i + 1) * chunkSize
    let form = new FormData()
    form.append('data', file.slice(i * chunkSize, end)) // file.slice() cuts out this chunk
    form.append('total', chunks) // Total number of chunks
    form.append('index', i) // Index of the current chunk
    form.append('fileMd5Value', fileMd5Value)
    services
      .uploadChunk(form)
      .then(data => {
        if (data.ok) {
          uploadProgress(file)
          resolve(data)
        }
        console.log({ data })
      })
      .catch(err => {
        reject(err)
      })
  })
}

5. Upload progress

Although uploading a large file in chunks is much faster than uploading it in one piece, there is still a noticeable wait, so an upload-progress indicator should be added to show the file's progress in real time.

Native JavaScript's XMLHttpRequest provides a progress event that reports how many bytes have been uploaded and the total size. The project wraps Ajax with axios, so an onUploadProgress handler can be added to the request config to monitor upload progress.


const config = {
  onUploadProgress: progressEvent => {
    // Percentage uploaded, truncated to an integer
    var complete = (progressEvent.loaded / progressEvent.total * 100 | 0) + '%'
    this.progress = complete // e.g. drive the progress bar
  },
}
services.uploadChunk(form, config)

6. Merge the chunks

After all chunks have been uploaded, the front end explicitly asks the server to merge them. On receiving the request, the server locates the folder named after the file's MD5 in its upload path. As noted above, chunk files are named by chunk index, but because the chunk-upload requests are asynchronous, there is no guarantee the server received them in request order. So before merging, the chunk files in the folder are sorted by file name, and then concatenated with concat-files to reconstruct the file the user uploaded. At this point the large-file upload is complete.
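One subtlety worth showing: the chunk file names must be sorted numerically, because a plain lexicographic sort puts '10' before '2'. A minimal sketch with made-up chunk names:

```javascript
// Chunk files named by index arrive in arbitrary order; sort them
// numerically before concatenation.
const chunkNames = ['10', '2', '0', '1', '11', '3']

const lexicographic = chunkNames.slice().sort()
console.log(lexicographic) // [ '0', '1', '10', '11', '2', '3' ]  -- wrong merge order

const numeric = chunkNames.slice().sort((a, b) => Number(a) - Number(b))
console.log(numeric) // [ '0', '1', '2', '3', '10', '11' ]
```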


Node code:

//Merge the chunk files
exports.merge = {
  validate: {
    query: {
      fileName: Joi.string().description('file name'),
      md5: Joi.string().description('file MD5'),
      size: Joi.string().description('file size'),
    },
  },
  permission: {
    roles: ['user'],
  },
  async handler (ctx) {
    const { fileName, md5, size } = ctx.request.query
    let { name, base: filename, ext } = path.parse(fileName)
    const newFileName = randomFilename(name, ext)
    await mergeFiles(path.join(uploadDir, md5), uploadDir, newFileName, size)
      .then(async () => {
        const file = {
          key: newFileName,
          name: filename,
          mime_type: mime.getType(`${uploadDir}/${newFileName}`),
          path: `${uploadDir}/${newFileName}`,
          provider: 'oss',
          owner: ctx.state.user.id,
        }
        const key = encodeURIComponent(file.key).replace(/%/g, '')
        file.url = await uploadLocalFileToOss(file.path, key)
        file.url = getFileUrl(file)
        const f = await File.create(omit(file, 'path'))
        const files = [f]
        ctx.body = invokeMap(files, 'toJSON')
      })
      .catch(() => {
        throw Boom.badData('large file chunk merge failed, please try again later ~')
      })
  },
}


This article has covered several ways to optimize the upload of large files:

  1. Blob.slice cuts the file into chunks, which are uploaded concurrently; once all chunks are up, the server is told to merge them, implementing chunked upload of large files;
  2. The progress event of the native XMLHttpRequest (onUploadProgress in the axios config) monitors the upload and reports progress in real time;
  3. spark-md5 computes the file's MD5 from its content, giving a unique file identifier that is bound to the upload status;
  4. Before uploading chunks, the list of already-uploaded chunks is fetched by file MD5, and only the missing chunks are uploaded, implementing resumable upload.

The demo source code is a quick way to get the features above running. I hope this article helps you. Thanks for reading!

Welcome to the Aotu Lab blog: aotu.io

Or follow the Aotu Lab WeChat official account (AOTULabs).
