HTTP: what's going on behind file uploads?
wuhuaji asked 4 days ago

I know the general process of a file upload: add the attribute enctype="multipart/form-data" to the page's form, select the file through <input type="file">, and submit it to the specified server. In this process, the submitted form differs from an ordinary POST. The HTTP request looks roughly like this:

Content-Type: multipart/form-data; boundary=---------------------------14579331036932498511351460782
Content-Length: 418

-----------------------------14579331036932498511351460782
Content-Disposition: form-data; name="userfile1"; filename="备注说明.txt"
Content-Type: text/plain

1. Annotations use the iPhone 6s screen size as the standard;
2. If you need icons in other sizes, just tell me.
-----------------------------14579331036932498511351460782
Content-Disposition: form-data; name="hehe"

tewtw
-----------------------------14579331036932498511351460782--

What I want to explore is how the server side accepts this data stream and saves it as a file. In other words, how does all of this work at the HTTP level? The materials and blog posts I found mostly explain it at the language level, where it is already wrapped up for you, such as $_FILES in PHP, or formidable, which is often cited for Node. The underlying principle is never explained.
Could someone who knows this well give me some pointers?

jokester replied 4 days ago

There's an RFC for this, just over ten pages long: RFC 7578

jokester replied 4 days ago

OK, thank you for your reply

3 Answers
Mu Yi answered 4 days ago

PHP really does handle this at the language level, so you can't see how it works. With Node.js, though, you can read the source code directly,
for example busboy, in lib/types/multipart.js,
or formidable, in lib/multipart_parser.js.
The protocol standard for file uploads was already mentioned above: see RFC 7578.

wuhuaji replied 4 days ago

Thanks

Baby temper answered 4 days ago

I tried to write it myself. The general process is as follows:
1. Receive the headers, up to the terminating blank line (\r\n\r\n).
It looks like this:

POST / HTTP/1.1
Host: xxx.com
Connection: keep-alive
Content-Length: 34360
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary9JYnHmp5Rtqoc1iQ
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.8

2. Check the HTTP method. If it is POST, read the value of the Content-Length header. If the header is present and nonzero, get ready to receive the body. Once Content-Length bytes of body have been received, the request is considered complete.
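Steps 1 and 2 can be sketched in Python roughly as follows. This is only a minimal illustration, not how any particular server does it: read_request is a made-up helper, it reads from any binary file-like object (a socket's makefile("rb") would do), and it assumes the body length is given by Content-Length, so chunked transfer encoding is not handled.

```python
import io

def read_request(stream):
    """Read one HTTP request from a binary stream (hypothetical helper).

    Returns (method, headers, body). Header names are lowercased and each
    maps to a list of values, since a header may appear more than once.
    """
    # Step 1: read the head up to the blank line (\r\n\r\n).
    # One byte at a time is slow, but keeps the sketch simple.
    head = b""
    while not head.endswith(b"\r\n\r\n"):
        chunk = stream.read(1)
        if not chunk:
            raise ValueError("connection closed before end of headers")
        head += chunk

    request_line, *header_lines = head[:-4].decode("iso-8859-1").split("\r\n")
    method = request_line.split(" ")[0]

    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers.setdefault(name.strip().lower(), []).append(value.strip())

    # Step 2: for POST, read exactly Content-Length bytes of body.
    length = 0
    if method == "POST":
        length = int(headers.get("content-length", ["0"])[0])
    body = b""
    while len(body) < length:
        chunk = stream.read(length - len(body))
        if not chunk:
            raise ValueError("connection closed before end of body")
        body += chunk
    return method, headers, body

# Usage with an in-memory stream standing in for the socket.
raw = b"POST / HTTP/1.1\r\nHost: xxx.com\r\nContent-Length: 5\r\n\r\nhello"
method, headers, body = read_request(io.BytesIO(raw))
```

Note that the body is kept as bytes the whole way; decoding it as text at this point would corrupt binary uploads.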
3. The previous step gave us the body. Now we need to determine its type, so we read the Content-Type header. If it is Content-Type: application/x-www-form-urlencoded, split the body on & and then split each pair on = to get the form data (names and values are percent-encoded). Like this:

_token=JHjUoYi5R9jgU1ZFgsQYOvEpTx02dSGhqW4pWbNrKaT&email=nnn%40xxx.com&password=xxx&login=1
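As a rough Python sketch of that splitting (parse_urlencoded is a name made up for this example; in real code the standard library's urllib.parse.parse_qs does the same job):

```python
from urllib.parse import parse_qs, unquote_plus

def parse_urlencoded(body: str) -> dict:
    """Split an application/x-www-form-urlencoded body into fields.

    Hypothetical helper: each field name maps to a list of values,
    because the same name may be submitted more than once.
    """
    fields = {}
    for pair in body.split("&"):
        if not pair:
            continue  # skip empty segments such as a trailing "&"
        name, _, value = pair.partition("=")
        # unquote_plus undoes percent-encoding and turns "+" into a space.
        fields.setdefault(unquote_plus(name), []).append(unquote_plus(value))
    return fields

body = "_token=JHjUoYi5R9jgU1ZFgsQYOvEpTx02dSGhqW4pWbNrKaT&email=nnn%40xxx.com&password=xxx&login=1"
fields = parse_urlencoded(body)
```

Here %40 in the email field decodes back to @.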

If it is Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryRazVhL46T6okNz7W, the body is sent in parts and may carry binary data. The boundary value is the delimiter string used to split the body into its parts; in the body, each delimiter line is the boundary prefixed with two extra hyphens.
(A note here: headers are not a key-to-single-value map but a key-to-array map, since a header can appear more than once. In many HTTP implementations, the getHeader method returns an array rather than a string.)
The multipart body looks like this:

------WebKitFormBoundary9JYnHmp5Rtqoc1iQ
Content-Disposition: form-data; name="file"; filename="A.jpg"
Content-Type: image/jpeg

{binary}
------WebKitFormBoundary9JYnHmp5Rtqoc1iQ--

{binary} is where the raw bytes of the image file sit. At this point, we can extract the uploaded file from the request.
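Splitting such a body could look something like the Python sketch below. It is a simplified illustration rather than a full RFC 7578 parser: parse_multipart is a made-up helper, it holds the whole body in memory, it assumes boundary is the last parameter of the Content-Type value, and the image bytes in the example are fake.

```python
import re

def parse_multipart(body: bytes, content_type: str) -> list:
    """Split a multipart/form-data body into parts (hypothetical helper)."""
    # The boundary comes from the request's Content-Type header value.
    boundary = content_type.partition("boundary=")[2].strip().strip('"')
    # In the body, every delimiter is CRLF + "--" + boundary. Prefixing the
    # body with CRLF lets the first delimiter match the same pattern.
    delimiter = b"\r\n--" + boundary.encode("ascii")
    sections = (b"\r\n" + body).split(delimiter)

    parts = []
    for section in sections[1:]:          # sections[0] is the preamble
        if section.startswith(b"--"):     # closing delimiter: boundary + "--"
            break
        if section.startswith(b"\r\n"):   # CRLF that ends the delimiter line
            section = section[2:]
        # Part headers and content are separated by a blank line.
        raw_headers, _, content = section.partition(b"\r\n\r\n")
        headers = {}
        for line in raw_headers.decode("utf-8").split("\r\n"):
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        # Pull name="..." and filename="..." out of Content-Disposition.
        params = dict(re.findall(r'(\w+)="([^"]*)"',
                                 headers.get("content-disposition", "")))
        parts.append({
            "name": params.get("name"),
            "filename": params.get("filename"),
            "content_type": headers.get("content-type"),
            "content": content,           # raw bytes, never decoded
        })
    return parts

content_type = "multipart/form-data; boundary=----WebKitFormBoundary9JYnHmp5Rtqoc1iQ"
body = (
    b"------WebKitFormBoundary9JYnHmp5Rtqoc1iQ\r\n"
    b'Content-Disposition: form-data; name="file"; filename="A.jpg"\r\n'
    b"Content-Type: image/jpeg\r\n"
    b"\r\n"
    b"\xff\xd8\xff\xe0 fake image bytes\r\n"
    b"------WebKitFormBoundary9JYnHmp5Rtqoc1iQ--\r\n"
)
parts = parse_multipart(body, content_type)
```

Saving the upload is then just writing parts[0]["content"] to a file opened in "wb" mode. Real parsers such as busboy or formidable work on the stream incrementally instead of buffering the whole body, which matters for large files.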
In practice there are a few small pitfalls. For example, when the client posts more than about 1 KB of data, some clients add an Expect: 100-continue header. When the server sees this header, it should reply with HTTP/1.1 100 Continue\r\n\r\n, after which the client goes on to send the body.
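A tiny Python sketch of that check, assuming the headers have already been parsed into a plain dict with lowercased names (interim_response is a made-up helper name):

```python
def interim_response(headers: dict) -> bytes:
    """Return the bytes to send before reading the body, or b"" if none.

    Hypothetical helper: `headers` maps lowercased header names to values.
    """
    if headers.get("expect", "").strip().lower() == "100-continue":
        # Tell the client to go ahead and send the body.
        return b"HTTP/1.1 100 Continue\r\n\r\n"
    return b""

reply = interim_response({"expect": "100-continue", "content-length": "34360"})
no_reply = interim_response({"content-length": "12"})
```

The server would write reply to the connection right after the headers arrive and only then start reading Content-Length bytes of body.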

Bwish replied 4 days ago

That's a bit clearer, but I'm still confused

Brother Tan replied 4 days ago

Here is a question I want to ask. If you upload an image, the body is binary data, so why is the length of the body I actually read different from Content-Length?

StormerZ answered 4 days ago

Set HTTP aside for a moment: it is only the courier. In the form, enctype="multipart/form-data" means the form carries a file stream. The stream is packed, sealed, and sent to the server, and the server unpacks it and restores it to a file.
You can ignore the network entirely. Use any language to read a file locally and the working process is the same: open the file, read the file stream, write the stream to another file, and close the streams.
Look at the file stream operations in the language you use. It is nothing deep, and it is usually wrapped for you. Underneath, the file is read into a byte array in a specific format.
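For instance, the read-then-write loop can be sketched in Python like this. copy_stream is a made-up helper, and in-memory streams stand in for the files; the standard library's shutil.copyfileobj does essentially the same thing.

```python
import io

def copy_stream(src, dst, chunk_size=64 * 1024):
    """Copy a binary stream to another in fixed-size chunks.

    Hypothetical helper; returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:          # empty read means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total

# In-memory streams stand in for the source and destination files.
src = io.BytesIO(b"x" * 150_000)
dst = io.BytesIO()
copied = copy_stream(src, dst)
```

With real files, src and dst would come from open(path, "rb") and open(path, "wb") inside a with block, which also takes care of closing both streams.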
Maybe this doesn't answer the question directly; consider it a bump :D

wuhuaji replied 4 days ago

Thanks