Breaking the barriers between the application layer and the transport layer: the TCP packet sticking problem


Transport layer Perspective

The transport layer has two major protocols: TCP and UDP. TCP is a reliable stream protocol and UDP is an unreliable datagram protocol. The reliability here is limited to end-to-end communication: with TCP, the bytes that the sending application passes to the write function arrive at the receiver complete and in order, and that is how the receiving application sees them when it calls the read function.

This raises a question. TCP is stream-based, so what does “packet” mean in the TCP packet sticking problem? Here a packet is simply a block of bytes. If TCP sends 50 bytes to the peer in one go, that is one packet of 50 bytes; if the same data is split into 20 bytes and 30 bytes before being sent, that is two packets.

When the sender sends two messages, hello and world, the behavior of the write function may turn them into a single message with no boundary. Seen from outside, the sender sends helloworld and the receiver receives helloworld complete and in order, but the receiver cannot tell hello from world: at the transport layer it has no way to find the message boundary.


The behavior of the write function

When the sender calls the write function and it returns, the data appears to have been sent, but where did it actually go? The answer is the send buffer.

TCP provides each connection with a send buffer and a receive buffer. The write function only places the data in the send buffer, and the network protocol stack decides when to actually transmit it. Similarly, the protocol stack places incoming data in the receive buffer, and the read function takes data out of that buffer.

When the sender calls the write function twice in a row to send hello and world, both end up in the send buffer with nothing marking the boundary between them. The operating system may send helloworld in one piece or split it into several pieces, and the receiving end faces the same problem: the bytes in its receive buffer carry no boundary either.
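The effect of two consecutive writes can be reproduced with a small sketch. It assumes that a connected pair of stream sockets from `socket.socketpair()` is an acceptable stand-in for a TCP connection, since both behave as byte streams; the variable names are illustrative only.

```python
import socket

# A connected pair of stream sockets stands in for a TCP connection
# (assumption: stream semantics are what matters for this demonstration).
sender, receiver = socket.socketpair()

sender.sendall(b"hello")  # first write
sender.sendall(b"world")  # second write
sender.close()            # lets the reader see end-of-stream

chunks = []
while True:
    chunk = receiver.recv(1024)
    if not chunk:         # empty bytes means the peer closed the stream
        break
    chunks.append(chunk)
receiver.close()

received = b"".join(chunks)
print(received)     # the bytes of both writes, with no boundary between them
print(len(chunks))  # how many reads it took; not guaranteed to be 2
```

The number of reads says nothing about the number of writes, which is exactly why the receiver cannot recover the two messages from the stream alone.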

How to divide the boundary

As stressed above, messages need boundaries. But TCP treats the data as a byte stream, so the problem cannot be solved inside TCP. It falls to the application layer to do the work.

Here are the conclusions first. The application layer can do one of three things:
1. Send a fixed-size block of bytes each time
2. Mark the end of each message with an explicit boundary character
3. Add a length field that tells the peer how many bytes to read
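The first two strategies can be sketched as pure functions over a receive buffer. This is a minimal illustration, not a complete protocol: `split_fixed` and `split_delimited` are hypothetical helper names, and the 5-byte block size and the newline delimiter are arbitrary choices made for the example.

```python
def split_fixed(buffer: bytes, size: int):
    """Return (complete fixed-size messages, leftover bytes)."""
    count = len(buffer) // size
    messages = [buffer[i * size:(i + 1) * size] for i in range(count)]
    return messages, buffer[count * size:]


def split_delimited(buffer: bytes, delimiter: bytes = b"\n"):
    """Return (complete delimiter-terminated messages, leftover bytes)."""
    # The last element is whatever follows the final delimiter:
    # an incomplete message, or empty if the buffer ended on a delimiter.
    *messages, rest = buffer.split(delimiter)
    return messages, rest


# Two fixed 5-byte messages arrive glued together, plus 2 bytes of a third
fixed_msgs, fixed_rest = split_fixed(b"helloworldab", 5)

# Two newline-terminated messages arrive, plus the start of a third
line_msgs, line_rest = split_delimited(b"hello\nworld\nab")
```

In both cases the leftover bytes have to be kept and prepended to the next chunk read from the socket, because a message may arrive in several pieces.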

Let's change perspective and look at how the HTTP protocol is designed.

HTTP protocol

An HTTP request message is divided into three parts: the request line, the header fields and the content.
The request line and every header field end with a line break (CRLF), which is the explicit boundary character in the HTTP protocol.
Among the header fields is Content-Length, which gives the number of bytes of the content; this is the length field. The commonly used iovec structure follows a similar design, pairing a buffer pointer with its length.
As for fixed-size byte blocks, the fixed 20-byte part of the TCP header is an example of that design.
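To see how the boundary characters and Content-Length work together, here is a rough sketch that cuts one HTTP request out of a byte buffer. It is deliberately simplified: `split_http_message` is a hypothetical helper, the request to `/echo` on `example.com` is made up, and cases such as chunked transfer encoding are ignored.

```python
def split_http_message(buffer: bytes):
    """Return (head, body, leftover), or None if the message is incomplete."""
    head_end = buffer.find(b"\r\n\r\n")   # an empty line ends the header section
    if head_end == -1:
        return None                       # header section not complete yet
    head = buffer[:head_end]
    body_start = head_end + 4
    content_length = 0                    # assume no body if the field is absent
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            content_length = int(value.strip())
    body_end = body_start + content_length
    if len(buffer) < body_end:
        return None                       # body not fully received yet
    return head, buffer[body_start:body_end], buffer[body_end:]


# One complete request followed by the beginning of the next one
raw = (b"POST /echo HTTP/1.1\r\nHost: example.com\r\nContent-Length: 5\r\n\r\n"
       b"helloGET / HTTP/1.1\r\n")
result = split_http_message(raw)
```

The delimiter finds the end of the header section, and the length field finds the end of the content, so the bytes of the next request are left untouched in the leftover part.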

Application layer Perspective

The role of the application layer is to determine the message boundaries that TCP cannot, and the means of doing so is a protocol.
Once this is clear, we can design our own protocol at the application layer and implement it in code, which solves the TCP packet sticking problem at the application layer.
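One possible shape for such a protocol is a length-prefixed frame. The sketch below assumes a 4-byte unsigned length in network byte order in front of every payload; that header format, the `encode` function and the `Decoder` class are all choices made for this example, not part of any standard.

```python
import struct

# 4-byte unsigned length, network byte order (an assumed header format)
HEADER = struct.Struct("!I")


def encode(payload: bytes) -> bytes:
    """Prefix the payload with its length."""
    return HEADER.pack(len(payload)) + payload


class Decoder:
    """Accumulates stream chunks and returns the complete messages found so far."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk: bytes):
        self._buffer += chunk
        messages = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break                  # wait for the rest of this message
            messages.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]     # drop the consumed frame
        return messages


stream = encode(b"hello") + encode(b"world")
decoder = Decoder()
first = decoder.feed(stream[:7])   # header plus only part of "hello"
second = decoder.feed(stream[7:])  # the rest, including all of "world"
```

The decoder does not care how the stream was cut into chunks: a message is returned only once its header and all of its payload bytes have arrived.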

How many ways can TCP split the data into packets?

A classic interview question: when TCP sends n bytes of data, in how many ways can they be split into packets?
A brute-force enumeration:
The sender sends 1 byte first, and then the remaining n-1 bytes
The sender sends 2 bytes first, and then the remaining n-2 bytes
...
The sender sends n-1 bytes first, and then the remaining 1 byte
The sender sends all n bytes at once
So we let the sender send x bytes first and then treat the remaining n-x bytes as the same problem again, which gives a simple dynamic programming recurrence. The answer is 2^(n-1).
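The recurrence can be checked with a few lines of code. This sketch counts ordered splits of n bytes into non-empty packets; `count_splits` is a hypothetical name, and the range of n values tested is arbitrary.

```python
def count_splits(n: int) -> int:
    """Number of ways to send n bytes as an ordered sequence of non-empty packets."""
    ways = [0] * (n + 1)
    ways[0] = 1  # nothing left to send: exactly one way (stop)
    for total in range(1, n + 1):
        # choose the size x of the first packet, then split the remaining total - x bytes
        ways[total] = sum(ways[total - x] for x in range(1, total + 1))
    return ways[n]


results = {n: count_splits(n) for n in range(1, 11)}
print(results)
```

Another way to see the result: between n bytes there are n-1 gaps, and each gap either is a packet boundary or is not, which gives 2^(n-1) combinations.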