HTTP streaming

Time: 2021-10-18

This work relates to distributed storage. In this scenario, large files are uploaded to a server over HTTP, but a file may be too big to load into memory all at once and assemble into a request body. How do we solve this? The simplest idea is to split the large file into smaller pieces, and HTTP streaming provides exactly that solution.

Let's first review a few relevant concepts:

Keepalive mode

From my university networking courses I only remember a few keywords about HTTP: HTTP is stateless and connectionless.

This means the HTTP protocol follows a request-response model. In the normal (non-keep-alive) mode, the client and server must establish a new connection for every request/response pair and close it immediately afterwards.

Later, the web grew richer: a single page may embed many kinds of resources, such as images and videos. To avoid the performance cost of repeatedly setting up TCP connections, keep-alive mode (also called persistent connection or connection reuse) was introduced. With keep-alive, the connection from client to server stays open, so subsequent requests to the same server avoid establishing (or re-establishing) a connection.

TCP itself also has a keepalive mechanism, implemented with a keepalive timer. When no data has passed over a connection for some time, one end sends an ACK probe carrying no data to the other end. If the peer replies, the connection is still alive; if there is no reply, the probe is retried several times, and after a configured number of failures the connection is closed.

HTTP streaming

Keep-alive is off by default in HTTP/1.0; you enable it by adding "Connection: keep-alive" to the HTTP headers. In HTTP/1.1 keep-alive is on by default and the connection is only closed when "Connection: close" is sent. Even so, a keep-alive connection needs cooperation from both sides: if either end simply closes the socket after handling a request, nothing can keep the connection alive.

Keep-alive solves the problem of repeatedly and frequently establishing connections, but a second problem follows: how do we judge where the data of one message ends?

In the plain request-response mode, the client actively closes the connection after each HTTP request; once the server has read all the body data, it considers the request complete and starts processing. In keep-alive mode, this is not so simple. For example, suppose a keep-alive HTTP connection sends two pictures in a row over the same underlying TCP channel: how does the server know these are two pictures, rather than one file's worth of data? Likewise, in the normal mode the server closes the connection after sending its response and the client reads EOF (-1); in keep-alive mode the server does not actively close the connection, so the client never sees EOF.

Methods for judging the end of the data stream

HTTP provides us with two ways:

  1. Content-Length

    This is the most intuitive way: prepend a header to the data telling the peer how much data will be transmitted, so that once the other side has read that many bytes it can consider reception complete.

    If the content length cannot be predicted in advance, for example because the data source is still being generated and we do not know when it will end, there is a second way.

  2. Use the message header field Transfer-Encoding: chunked

    If the sender wants to transmit data while it is still being generated, it uses "Transfer-Encoding: chunked" instead of Content-Length.

    In chunked encoding, the data is split into pieces. The encoded stream is a sequence of chunks, terminated by a chunk whose length is 0. Each chunk has two parts, a header and a body: the header gives the total size of the body as a hexadecimal number (a size unit is allowed but generally omitted), and the body is exactly that many bytes of actual content. The two parts are separated by a carriage-return line-feed (CRLF). The content after the final zero-length chunk is called the trailer (or footer): some additional header fields, which are usually ignored.

Packet capture verification

Seeing is believing, so let's verify this by capturing packets with Wireshark.
First, capture the chunked-encoding case:
Server code:

package main

import (
	"fmt"
	"net/http"
)

func indexHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, "hello world")
}

func main() {
	http.HandleFunc("/report", indexHandler)
	http.ListenAndServe(":8000", nil)
}

Client code

package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	pr, rw := io.Pipe()
	go func() {
		for i := 0; i < 100; i++ {
			rw.Write([]byte(fmt.Sprintf("line:%d\r\n", i)))
		}
		rw.Close()
	}()
	http.Post("http://localhost:8000/report", "text/plain", pr)
}

(Wireshark capture screenshots omitted)

The capture shows that at the bottom layer, HTTP sends the data over a single TCP connection: the payload of each TCP segment is one piece of the data we wrote, and all of them belong to the same HTTP request.

Method 2: Content-Length
Client code

package main

import (
	"io"
	"log"
	"net/http"
	"time"
)

func main() {
	count := 10
	line := []byte("line\r\n")
	pr, rw := io.Pipe()
	go func() {
		for i := 0; i < count; i++ {
			rw.Write(line)
			time.Sleep(500 * time.Millisecond)
		}
		rw.Close()
	}()
	// Construct the request object
	request, err := http.NewRequest("POST", "http://localhost:8000/report", pr)
	if err != nil {
		log.Fatal(err)
	}
	// Compute Content-Length in advance
	request.ContentLength = int64(len(line) * count)
	// Send the request
	http.DefaultClient.Do(request)
}

(Wireshark capture screenshots omitted)
The data is still transmitted in multiple TCP segments over a single connection.

Summary

To solve the problem that a large payload cannot be assembled into the request body all at once, HTTP streaming is used: the data is transmitted over HTTP while it is being read into memory.

Keep-alive mode avoids re-establishing a TCP connection for each of many messages, which lays the foundation for streaming large amounts of data.

Content-Length and Transfer-Encoding provide two ways to determine the length of the message content.

With Content-Length set, the client streams data continuously to the server over HTTP.

In chunked mode, the client sends the data to the server piece by piece.