One day, while browsing Juejin, I came across an article called "Front end large file upload". I had studied the underlying principles before, but had never implemented the whole thing myself, so it always felt a little hollow. Recently I spent some time putting together an example to share with you.
Problem
Knowing the time available to provide a response can avoid problems with timeouts. Current implementations select times between 30 and 120 seconds
https://tools.ietf.org/id/draft-thomson-hybi-http-timeout-00.html
If a file is large (audio or video data, exported Excel sheets, and so on) and the server has not responded within that 30~120 s window, the request may be treated as timed out and the upload is interrupted.
Another problem: if a large upload is interrupted partway through by a server or network issue, none of the data that has already been transferred is kept, so the whole upload is wasted.
Principle
Large file upload works by splitting the big file into several small chunks and uploading them separately. Once all chunks are uploaded, the server is notified to merge them back into the original file.
This method of uploading solves several problems:
- Request timeout due to too large file
- A single request is split into multiple requests (mainstream browsers typically allow 6 concurrent requests per origin by default), which increases concurrency and speeds up the transfer
- Small chunks are easy for the server to persist; if the network drops, the chunks that were already uploaded do not need to be sent again on the next attempt
Implementation
File slicing
The `File` interface is based on `Blob`, so we can call the `slice` method on the uploaded file object. The implementation looks like this:
export const slice = (file, piece = CHUNK_SIZE) => {
  return new Promise((resolve, reject) => {
    const totalSize = file.size;
    const chunks = [];
    const blobSlice = File.prototype.slice || File.prototype.mozSlice || File.prototype.webkitSlice;
    let start = 0;
    while (start < totalSize) {
      // compute the end of the current chunk, clamped to the file size
      const end = start + piece >= totalSize ? totalSize : start + piece;
      const chunk = blobSlice.call(file, start, end);
      chunks.push(chunk);
      start = end;
    }
    resolve(chunks);
  });
};
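A minimal usage sketch, assuming a 1 MB `CHUNK_SIZE` and a plain file input; both are illustrative, not values taken from the project:

```js
// Minimal usage sketch; CHUNK_SIZE and the <input id="file"> element are assumptions.
const CHUNK_SIZE = 1024 * 1024; // 1 MB per chunk

document.querySelector('#file').addEventListener('change', async (event) => {
  const file = event.target.files[0];
  const chunks = await slice(file, CHUNK_SIZE);
  console.log(`split ${file.name} (${file.size} bytes) into ${chunks.length} chunks`);
});
```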
Then each chunk is uploaded as form data:
_chunkUploadTask(chunks) {
  const tasks = [];
  for (const chunk of chunks) {
    const fd = new FormData();
    fd.append('chunk', chunk);
    // collect one upload promise per chunk instead of returning on the first one
    const task = axios({
      url: '/upload',
      method: 'post',
      data: fd,
    })
      .then((res) => res.data)
      .catch((err) => {});
    tasks.push(task);
  }
  return tasks;
}
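The tasks returned above can then be awaited together; a hedged sketch of how they might be consumed (the project's real `uploadFile` does more, such as the progress handling and chunk-existence checks shown later):

```js
// Sketch: inside the uploader class, start all chunk uploads and wait for them to settle.
async uploadFile() {
  const tasks = this._chunkUploadTask(this.chunks);
  await Promise.all(tasks);
}
```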
The back end uses `express`, and file receiving is handled by the [multer](https://github.com/expressjs/multer) library.

`multer` offers several upload methods: `single`, `array`, `fields`, `none`, and `any`. For uploading a single file, either `single` or `array` works, and it is easy to use: the uploaded file information is available through `req.file` or `req.files`.

In addition, `diskStorage` is used to customize the file name of each uploaded chunk and to make sure every chunk name is unique.
const storage = multer.diskStorage({
  destination: uploadTmp,
  filename: (req, file, cb) => {
    // specify the stored file name; if omitted, a random name is generated by default
    cb(null, file.fieldname);
  },
});
const multerUpload = multer({ storage });
// router
router.post('/upload', multerUpload.any(), uploadService.uploadChunk);
// service
uploadChunk: async (req, res) => {
  const file = req.files[0];
  const chunkName = file.filename;
  try {
    const checksum = req.body.checksum;
    const chunkId = req.body.chunkId;
    const message = Messages.success(modules.UPLOAD, actions.UPLOAD, chunkName);
    logger.info(message);
    res.json({ code: 200, message });
  } catch (err) {
    const errMessage = Messages.fail(modules.UPLOAD, actions.UPLOAD, err);
    logger.error(errMessage);
    // set the status before sending the response body
    res.status(500).json({ code: 500, message: errMessage });
  }
}
The uploaded chunks are saved under `uploads/tmp`; `multer` handles this for us automatically. After a successful upload, the file information, including the chunk's name and path, can be read from `req.files`, which makes the subsequent database bookkeeping easier.
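For reference, with `diskStorage` each element of `req.files` is a plain object that looks roughly like this; the values are illustrative only, and the chunk name anticipates the naming scheme introduced just below:

```js
// Illustrative only: the shape of one element of req.files under diskStorage.
const exampleMulterFile = {
  fieldname: '3.9a0364b9e99bb480dd25e1f0284c8555.chunk', // the key passed to fd.append()
  originalname: 'blob',                                   // a Blob appended without a name is sent as 'blob'
  mimetype: 'application/octet-stream',
  destination: 'uploads/tmp',                             // where multer wrote the chunk
  filename: '3.9a0364b9e99bb480dd25e1f0284c8555.chunk',   // set by the filename() callback above
  path: 'uploads/tmp/3.9a0364b9e99bb480dd25e1f0284c8555.chunk',
  size: 1048576,
};
```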
Why do we need to make sure the chunk file name is unique?
- If the file name were random, then after a network interruption any chunk whose upload did not complete would have no record in the database, so it could not be matched on the next upload. The result is that the `tmp` directory accumulates orphaned chunks that can never be cleaned up.
- At the same time, when an upload is cancelled, the corresponding temporary chunks can be deleted by name (this step is optional, since `multer` simply overwrites a chunk that already exists).
There are two ways to make a chunk name unique:
- Generate a file fingerprint for each chunk when slicing (`chunkmd5`)
- Use the fingerprint of the whole file plus the chunk's sequence number (`filemd5` + `chunkIndex`)
//Modify the above code
const chunkName = `${chunkIndex}.${filemd5}.chunk`;
const fd = new FormData();
fd.append(chunkName, chunk);
At this point, chunked uploading is roughly complete.
File merge
File merging means reading the uploaded chunks one by one and combining them into a new file. It consumes quite a lot of IO, so the merge can be done in a separate thread.
for (let chunkId = 0; chunkId < chunks; chunkId++) {
  const file = `${uploadTmp}/${chunkId}.${checksum}.chunk`;
  const content = await fsPromises.readFile(file);
  logger.info(Messages.success(modules.UPLOAD, actions.GET, file));
  try {
    await fsPromises.access(path, fs.constants.F_OK);
    await appendFile({ path, content, file, checksum, chunkId });
    if (chunkId === chunks - 1) {
      res.json({ code: 200, message });
    }
  } catch (err) {
    await createFile({ path, content, file, checksum, chunkId });
  }
}
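For context, the loop above lives inside the merge handler; here is a hedged sketch of how that handler might derive its inputs from the request body the client sends below. The handler name and the `uploadPath` constant are assumptions:

```js
// Hedged sketch of the surrounding /makefile handler; the article only shows the loop above.
const uploadPath = 'uploads';     // assumed final directory for merged files
const uploadTmp = 'uploads/tmp';  // temporary chunk directory, as used above

const makeFile = async (req, res) => {
  const { chunks, filename, checksum } = req.body; // matches the payload the client sends below
  const path = `${uploadPath}/${filename}`;        // target path of the merged file
  // ...the for-loop above runs here, appending each
  // `${uploadTmp}/${chunkId}.${checksum}.chunk` onto `path` in order
};
```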
Promise.all(tasks).then(() => {
  // only send the /makefile request while the status is UPLOADING;
  // otherwise (status CANCELED) the merge would delete the chunks that were already uploaded
  if (this.status === fileStatus.UPLOADING) {
    const data = { chunks: this.chunks.length, filename, checksum: this.checksum };
    axios({
      url: '/makefile',
      method: 'post',
      data,
    })
      .then((res) => {
        if (res.data.code === 200) {
          this._setDoneProgress(this.checksum, fileStatus.DONE);
          toastr.success(`file ${filename} upload successfully!`);
        }
      })
      .catch((err) => {
        console.error(err);
        toastr.error(`file ${filename} upload failed!`);
      });
  }
});
- First, `access` is used to check whether the merged file already exists; if it does not, a new file is created from the chunk content (`createFile`)
- If the merged file already exists, the chunk content is appended to it (`appendFile`)
- After each chunk has been read and merged successfully, the chunk file is deleted (a sketch of these helpers follows this list)
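The bodies of `createFile` and `appendFile` are not shown in the article; here is a hedged sketch of what they might look like. The real helpers also receive `checksum` and `chunkId`, presumably for database bookkeeping, which this sketch ignores:

```js
const fs = require('fs');
const fsPromises = fs.promises;

// create the merged file from the first chunk, then remove that chunk
const createFile = async ({ path, content, file }) => {
  await fsPromises.writeFile(path, content);
  await fsPromises.unlink(file);
};

// append subsequent chunks to the merged file, then remove each chunk
const appendFile = async ({ path, content, file }) => {
  await fsPromises.appendFile(path, content);
  await fsPromises.unlink(file);
};
```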
Here are a few points to note:
- If a file has only one chunk, the response must be sent right after `createFile`; otherwise the request stays in the `pending` state forever.

      await createFile({ path, content, file, checksum, chunkId });
      if (chunks.length === 1) {
        res.json({ code: 200, message });
      }
- Before sending `makefile`, it is necessary to check that the file is still in the uploading status; otherwise, when the status is already `cancel`, the merge still runs: the chunks that were uploaded get merged and their files deleted while stale records remain in the database, and the resulting merged file is broken.
Instant file upload
How can a file be uploaded "instantly"? Think about it for three seconds, and the answer will be revealed: 3.. 2.. 1..... it is just an illusion.
Why an illusion? Because nothing is actually transferred; the file already lives on the server. There are a few questions to clarify first:
- How can I confirm that a file already exists in the server?
- Is the uploaded file information stored in the database or in the client?
- What should I do if the file names are different and the contents are the same?
Question 1: how to judge that the file already exists?
A fingerprint can be generated for each uploaded file, but if the file is very large, the time the client needs to compute that fingerprint grows a lot. How do we solve this?
Remember `slice`, the file slicing from earlier? A large file is hard to hash in one go, so in the same way it is cut into small chunks and the MD5 is computed incrementally. The `spark-md5` library is used here to generate the file hash. Modify the slice method above:
export const checkSum = (file, piece = CHUNK_SIZE) => {
  return new Promise((resolve, reject) => {
    const totalSize = file.size;
    let start = 0;
    const blobSlice = File.prototype.slice || File.prototype.mozSlice || File.prototype.webkitSlice;
    const chunks = [];
    const spark = new SparkMD5.ArrayBuffer();
    const fileReader = new FileReader();

    const loadNext = () => {
      const end = start + piece >= totalSize ? totalSize : start + piece;
      const chunk = blobSlice.call(file, start, end);
      start = end;
      chunks.push(chunk);
      fileReader.readAsArrayBuffer(chunk);
    };

    fileReader.onload = (event) => {
      spark.append(event.target.result);
      if (start < totalSize) {
        loadNext();
      } else {
        const checksum = spark.end();
        resolve({ chunks, checksum });
      }
    };

    fileReader.onerror = () => {
      console.warn('oops, something went wrong.');
      reject();
    };

    loadNext();
  });
};
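A usage sketch: hash first, then ask the server whether that checksum is already known. The `/fileexist` endpoint and its response shape are assumptions for illustration only:

```js
// Hedged sketch; '/fileexist' and its response shape are made up for this example.
const prepareUpload = async (file) => {
  const { chunks, checksum } = await checkSum(file);
  const res = await axios({ url: '/fileexist', method: 'get', params: { checksum } });
  const existing = (res.data && res.data.data) || [];
  // if any record shares the checksum, the "instant upload" path below can be taken
  return { chunks, checksum, existing };
};
```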
Question 2: is the uploaded information stored in the database or in the client?
The upload information should be stored in a database on the **server side** (on the client you could use IndexedDB instead). This has a couple of advantages:

- The database service provides a complete set of `CRUD` operations, which makes working with the data easy
- The upload information is not lost when the user refreshes the browser or switches to a different browser

The second point is the important one, because the first can also be handled on the client.
const saveFileRecordToDB = async (params) => {
  const { filename, checksum, chunks, isCopy, res } = params;
  await uploadRepository.create({ name: filename, checksum, chunks, isCopy });
  const message = Messages.success(modules.UPLOAD, actions.UPLOAD, filename);
  logger.info(message);
  res.json({ code: 200, message });
};
Question 3: what should I do if the file names are different and the contents are the same?
There are also two solutions:
- File copy: simply copy the existing file, then add a database record for it with an `isCopy` flag
- File reference: only add a database record, with both `isCopy` and `linkTo` flags
What is the difference between the two methods:
With the file-copy approach, files can be deleted freely, because the original file and its copies exist independently and deleting one does not affect the others. The downside is that many files with identical content pile up on disk.

With the file-reference approach, deletion is more troublesome. Deleting a referencing copy is fine, but if the original file is deleted, its physical content must first be handed over to one of the copies: copy the source file onto one of the referencing records, set that record's `isCopy` to `false`, and only then delete the original file's database record.
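To make that deletion procedure concrete, here is a hedged sketch; none of this is implemented in the article, and the repository helpers and `uploadPath` are assumptions:

```js
const fsPromises = require('fs').promises;

// Hypothetical delete path for the file-reference approach.
// uploadPath and all uploadRepository methods besides create() are assumptions.
const deleteReferencedFile = async (original) => {
  const copies = await uploadRepository.findAllBy({ linkTo: original.name });
  if (copies.length > 0) {
    const heir = copies[0];
    // hand the physical file over to one of the referencing records
    await fsPromises.copyFile(`${uploadPath}/${original.name}`, `${uploadPath}/${heir.name}`);
    await uploadRepository.update({ name: heir.name, isCopy: false, linkTo: null });
  }
  await fsPromises.unlink(`${uploadPath}/${original.name}`);
  await uploadRepository.remove({ name: original.name });
};
```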
In theory, the file-reference approach is better, but to keep things simple the file-copy approach is used here.
//Client
uploadFileInSecond() {
  const id = ID();
  const filename = this.file.name;
  this._renderProgressBar(id);
  const names = this.serverFiles.map((file) => file.name);
  if (names.indexOf(filename) === -1) {
    const sourceFilename = names[0];
    const targetFilename = filename;
    this._setDoneProgress(id, fileStatus.DONE_IN_SECOND);
    axios({
      url: '/copyfile',
      method: 'get',
      params: { targetFilename, sourceFilename, checksum: this.checksum },
    })
      .then((res) => {
        if (res.data.code === 200) {
          toastr.success(`file ${filename} upload successfully!`);
        }
      })
      .catch((err) => {
        console.error(err);
        toastr.error(`file ${filename} upload failed!`);
      });
  } else {
    this._setDoneProgress(id, fileStatus.EXISTED);
    toastr.success(`file ${filename} has existed`);
  }
}
//Server side
copyFile: async (req, res) => {
  const sourceFilename = req.query.sourceFilename;
  const targetFilename = req.query.targetFilename;
  const checksum = req.query.checksum;
  const sourceFile = `${uploadPath}/${sourceFilename}`;
  const targetFile = `${uploadPath}/${targetFilename}`;
  try {
    await fsPromises.copyFile(sourceFile, targetFile);
    await saveFileRecordToDB({ filename: targetFilename, checksum, chunks: 0, isCopy: true, res });
  } catch (err) {
    const message = Messages.fail(modules.UPLOAD, actions.UPLOAD, err.message);
    logger.info(message);
    // set the status before sending the response body
    res.status(500).json({ code: 500, message });
  }
}
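For comparison, here is a hedged sketch of what the file-reference variant might store instead of physically copying the file; the columns are assumptions built on the `isCopy` and `linkTo` flags described above:

```js
// Hypothetical: record a reference instead of copying bytes on disk.
const referenceFile = async ({ targetFilename, sourceFilename, checksum }) => {
  await uploadRepository.create({
    name: targetFilename,
    checksum,
    chunks: 0,
    isCopy: true,           // this record does not own the bytes on disk
    linkTo: sourceFilename, // the record whose physical file is shared
  });
};
```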
Pausing and resuming uploads
Pausing an upload actually relies on the `abort` method of `xhr`. Since `axios` is used here, and `axios` is built on top of `ajax` (XMLHttpRequest), we use the cancellation mechanism it encapsulates.
Here is the pause code:
const CancelToken = axios.CancelToken;
axios({
  url: '/upload',
  method: 'post',
  data: fd,
  cancelToken: new CancelToken((c) => {
    // the executor function receives a cancel function as a parameter
    canceler = c;
    this.cancelers.push(canceler);
  }),
})
Each `axios` request takes a `cancelToken` parameter; the `CancelToken` executor receives a cancel function, which we save as a handle so the request can be cancelled later.
Then, when Cancel is clicked, the upload of every chunk is cancelled, as follows:
// jQuery is used here to handle the HTML. Yes, really.
$(`#cancel${id}`).on('click', (event) => {
  const $this = $(event.target);
  $this.addClass('hidden');
  $this.next('.resume').removeClass('hidden');
  this.status = fileStatus.CANCELED;
  if (this.cancelers.length > 0) {
    for (const canceler of this.cancelers) {
      canceler();
    }
  }
});
While uploading, we also need to check whether each chunk already exists. Why?
Because in case of an unexpected network interruption, the chunks that did upload have their information saved in the database. On the next attempt, the chunks that already exist do not need to be sent again, which saves time.
So the question is: should each chunk be checked individually, or should the chunks that already exist on the server be fetched in advance?
You can think about this one for three seconds too; after all, it took quite a while to debug.
3.. 2.. 1……
It depends on how you structure your code, since everyone writes it differently. The principle is: you must not block the loop that generates each `cancelToken`. If every chunk had to fetch data from the server inside that loop, the later chunks would never get their `cancelToken` generated, and when you click Cancel those chunks would keep uploading anyway.
//Client
const chunksExisted = await this._isChunksExists();
for (let chunkId = 0; chunkId < this.chunks.length; chunkId++) {
  const chunk = this.chunks[chunkId];
  // the code used to look like this:
  // const chunkExists = await isChunkExisted(this.checksum, chunkId);
  // but awaiting here blocks the generation of the cancelTokens
  const chunkExists = chunksExisted[chunkId];
  if (!chunkExists) {
    const task = this._chunkUploadTask({ chunk, chunkId });
    tasks.push(task);
  } else {
    // if the chunk already exists, set the width of its progress bar to 100%
    this._setUploadingChunkProgress(this.checksum, chunkId, 100);
    this.progresses[chunkId] = chunk.size;
  }
}
//Server side
chunksExist: async (req, res) => {
  const checksum = req.query.checksum;
  try {
    const chunks = await chunkRepository.findAllBy({ checksum });
    const exists = chunks.reduce((cur, chunk) => {
      cur[chunk.chunkId] = true;
      return cur;
    }, {});
    const message = Messages.success(modules.UPLOAD, actions.CHECK, `chunk ${JSON.stringify(exists)} exists`);
    logger.info(message);
    res.json({ code: 200, message: message, data: exists });
  } catch (err) {
    const errMessage = Messages.fail(modules.UPLOAD, actions.CHECK, err);
    logger.error(errMessage);
    // set the status before sending the response body
    res.status(500).json({ code: 500, message: errMessage });
  }
}
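The client-side `_isChunksExists` used above is not shown in the snippet; here is a hedged sketch of how it could call this handler (the `/chunksexist` route name is an assumption):

```js
// Hedged sketch: fetch the map of already-uploaded chunk ids before the upload loop runs.
async _isChunksExists() {
  const res = await axios({
    url: '/chunksexist', // assumed route for the chunksExist handler above
    method: 'get',
    params: { checksum: this.checksum },
  });
  // the handler responds with { code, message, data: { [chunkId]: true, ... } }
  return (res.data && res.data.data) || {};
}
```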
Resuming an upload just means uploading the file again; there is not much to say about it, as long as the problem above is handled.
$(`#resume${id}`).on('click', async (event) => {
  const $this = $(event.target);
  $this.addClass('hidden');
  $this.prev('.cancel').removeClass('hidden');
  this.status = fileStatus.UPLOADING;
  await this.uploadFile();
});
Progress feedback
Progress feedback uses `XMLHttpRequest.upload`; `axios` wraps the corresponding hook as well. Two progress indicators need to be displayed here:
- Progress per chunk
- Total progress of all chunks
The progress of each chunk is calculated from the `loaded` and `total` fields of the upload progress event; nothing special here.
axios({
  url: '/upload',
  method: 'post',
  data: fd,
  onUploadProgress: (progressEvent) => {
    const loaded = progressEvent.loaded;
    const chunkPercent = ((loaded / progressEvent.total) * 100).toFixed(0);
    this._setUploadingChunkProgress(this.checksum, chunkId, chunkPercent);
  },
})
The total progress is obtained by accumulating the loaded bytes of every chunk and dividing by `file.size`.
constructor(checksum, chunks, file) {
  // ...other fields omitted
  this.chunks = chunks;
  this.file = file;
  this.progresses = Array(this.chunks.length).fill(0);
}

axios({
  url: '/upload',
  method: 'post',
  data: fd,
  onUploadProgress: (progressEvent) => {
    const chunkProgress = this.progresses[chunkId];
    const loaded = progressEvent.loaded;
    this.progresses[chunkId] = loaded >= chunkProgress ? loaded : chunkProgress;
    const percent = ((this._getCurrentLoaded(this.progresses) / this.file.size) * 100).toFixed(0);
    this._setUploadingProgress(this.checksum, percent);
  },
})
_setUploadingProgress(id, percent) {
  // ...
  // for some reason, progressEvent.loaded bytes can be greater than the file size
  const isUploadChunkDone = Number(percent) >= 100;
  // keep 1% in reserve for the merge (/makefile) step
  const ratio = isUploadChunkDone ? 99 : percent;
}
One thing to note: the purpose of the `loaded >= chunkProgress ? loaded : chunkProgress` check is that during a resume, some chunks may be uploaded again from **0**; without this check the progress bar would jump backwards.
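`_getCurrentLoaded` presumably just sums the per-chunk byte counts; a minimal sketch:

```js
// Sum the bytes uploaded so far across all chunks.
_getCurrentLoaded(progresses) {
  return progresses.reduce((acc, loaded) => acc + loaded, 0);
}
```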
Database configuration
The database uses `sequelize` + `mysql`; the initialization code is as follows:
const initialize = async () => {
  // create db if it doesn't already exist
  const { DATABASE, USER, PASSWORD, HOST } = config;
  const connection = await mysql.createConnection({ host: HOST, user: USER, password: PASSWORD });
  try {
    await connection.query(`CREATE DATABASE IF NOT EXISTS ${DATABASE};`);
  } catch (err) {
    logger.error(Messages.fail(modules.DB, actions.CONNECT, `create database ${DATABASE}`));
    throw err;
  }

  // connect to db
  const sequelize = new Sequelize(DATABASE, USER, PASSWORD, {
    host: HOST,
    dialect: 'mysql',
    logging: (msg) => logger.info(Messages.info(modules.DB, actions.CONNECT, msg)),
  });

  // init models and add them to the exported db object
  db.Upload = require('./models/upload')(sequelize);
  db.Chunk = require('./models/chunk')(sequelize);

  // sync all models with database
  await sequelize.sync({ alter: true });
};
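The model files themselves are not shown in the article; here is a hedged sketch of what `models/chunk.js` could look like, with field names inferred from the repository calls above and types assumed:

```js
// Hedged sketch of models/chunk.js; types and options are assumptions.
const { DataTypes } = require('sequelize');

module.exports = (sequelize) => {
  return sequelize.define('Chunk', {
    checksum: { type: DataTypes.STRING, allowNull: false }, // md5 of the whole file
    chunkId: { type: DataTypes.INTEGER, allowNull: false }, // index of the chunk
    name: { type: DataTypes.STRING, allowNull: false },     // `${chunkId}.${checksum}.chunk`
  });
};
```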
Deploy
The production environment is deployed with `docker-compose`; the configuration is as follows:
Dockerfile
FROM node:16-alpine3.11
# Create app directory
WORKDIR /usr/src/app
# A wildcard is used to ensure both package.json AND package-lock.json are copied
# where available ([email protected]+)
COPY package*.json ./
# If you are building your code for production
# RUN npm ci --only=production
# Bundle app source
COPY . .
# Install app dependencies
RUN npm install
RUN npm run build:prod
docker-compose.yml
version: "3.9"
services:
web:
build: .
# sleep for 20 sec, wait for database server start
command: sh -c "sleep 20 && npm start"
ports:
- "3000:3000"
environment:
NODE_ENV: prod
depends_on:
- db
db:
image: mysql:8
command: --default-authentication-plugin=mysql_native_password
restart: always
ports:
- "3306:3306"
environment:
MYSQL_ROOT_PASSWORD: pwd123
One thing to note is that the `web` service must not start until the database service is up, otherwise it errors out, which is why a 20-second delay is added to the startup command.
Deploy to heroku
- create `heroku.yml`

      build:
        docker:
          web: Dockerfile
      run:
        web: npm run start:heroku
- modify `package.json`

      {
        "scripts": {
          "start:heroku": "NODE_ENV=heroku node ./bin/www"
        }
      }
- deploy to heroku

      # create heroku repos
      heroku create upload-demos
      heroku stack:set container

      # when adding addons, remember to configure your billing card in heroku [important]
      # add the mysql addon
      heroku addons:create cleardb:ignite

      # get the mysql connection url
      heroku config | grep CLEARDB_DATABASE_URL
      # will echo => DATABASE_URL: mysql://xxxxxxx:[email protected]/heroku_9ab10c66a98486e?reconnect=true

      # set the mysql database url
      heroku config:set DATABASE_URL='mysql://xxxxxxx:[email protected]/heroku_9ab10c66a98486e?reconnect=true'

      # add heroku.js to the src/db/config folder
      # use the DATABASE_URL from the previous step to configure the js file
      module.exports = {
        HOST: 'xx-xxxx-east-xx.cleardb.com',
        USER: 'xxxxxxx',
        PASSWORD: 'xxxxxx',
        DATABASE: 'heroku_9ab10c66a98486e',
      };

      # push source code to remote
      git push heroku master
Summary
So far, all the problems have been solved. The overall feeling is that there are a great many details to deal with; some things cannot be learned just by reading about them. Actually doing them is what builds a better understanding of the principles and the motivation to keep learning.
You never really know until you do it yourself.
There are many more details in the code repository on GitHub, including local development server configuration, log storage, and so on. Anyone interested can fork it and take a look. It was not easy to put together, so a ⭐️ would be appreciated.