A distributed system can be summarized as a whole composed of multiple processes running in different physical locations. For the whole to provide services effectively and efficiently, its nodes need to communicate and exchange information, and most of that exchange happens over TCP. TCP is a transport layer protocol that runs on top of the IP layer. The transport layer has two important protocols, TCP and UDP. These are the two protocols application layer developers use most, and they are also a topic that many interviewers are sure to ask about.
Both TCP and UDP address their endpoints by IP + port. What does that mean? In other words, a process that uses TCP or UDP needs a port through which it reads and writes data.
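To make the IP + port idea concrete, here is a minimal sketch using Python's standard socket module. The function name open_endpoints is made up for illustration, and the loopback address 127.0.0.1 and port 0 (meaning "let the operating system pick a free port") are assumptions chosen so the example runs anywhere.

```python
import socket


def open_endpoints():
    """Bind one TCP and one UDP socket to loopback and return their (ip, port) pairs."""
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the operating system to choose any free port.
        tcp.bind(("127.0.0.1", 0))
        udp.bind(("127.0.0.1", 0))
        # getsockname() reports the endpoint the socket is now attached to.
        return tcp.getsockname(), udp.getsockname()
    finally:
        tcp.close()
        udp.close()


if __name__ == "__main__":
    tcp_addr, udp_addr = open_endpoints()
    print("TCP endpoint:", tcp_addr)
    print("UDP endpoint:", udp_addr)
```

TCP and UDP keep separate port spaces, so the two sockets may even end up with the same port number without conflict.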
TCP is reliable and connection-oriented. Establishing a connection goes through a three-way handshake. Why three steps instead of two or four?
To answer this, we can abstract the scenario. How can one end determine that it is really communicating with the other? It is actually simple: if a message sent from one end gets a reply from the other, that direction works, and the connection is usable once both ends have seen this. The three-way handshake does exactly that. After A and B exchange three messages (SYN, SYN+ACK, ACK), A has received B’s reply and B has received A’s reply. Two messages would leave B without confirmation that A can hear it, and a fourth would add nothing, because B’s acknowledgment and B’s own SYN travel in the same segment.
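The exchange can be written down as a small simulation. This is only a sketch of the bookkeeping, not real TCP: the Segment tuple, the function name three_way_handshake, and the sample initial sequence numbers are all invented for illustration, and no packets are actually sent.

```python
from collections import namedtuple

# flags: "SYN", "SYN+ACK" or "ACK"
# seq:   the sender's own sequence number
# ack:   the next sequence number expected from the peer (0 when unused)
Segment = namedtuple("Segment", ["flags", "seq", "ack"])


def three_way_handshake(client_isn, server_isn):
    """Return the three segments of a simplified handshake between A and B."""
    # 1. A -> B: A proposes its initial sequence number.
    syn = Segment("SYN", seq=client_isn, ack=0)
    # 2. B -> A: B acknowledges A's SYN and proposes its own number.
    syn_ack = Segment("SYN+ACK", seq=server_isn, ack=syn.seq + 1)
    # 3. A -> B: A acknowledges B's SYN, so both directions are confirmed.
    ack = Segment("ACK", seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


if __name__ == "__main__":
    for segment in three_way_handshake(client_isn=100, server_isn=300):
        print(segment)
```

Each side’s acknowledgment number is the other side’s sequence number plus one, which is how each end proves it really received the other’s SYN.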
A message sent over the IP layer alone gives the sender no way of knowing whether it arrived correctly at the other side. TCP adds a series of data structures and algorithms on top of IP to ensure that TCP data arrives correctly:
- TCP packets are numbered. This mainly solves the ordering problem: without numbers, how could the other side determine the order? The numbers also let the sender confirm which packets have arrived correctly. (Strictly speaking, TCP numbers the bytes of the stream, and each segment carries the number of its first byte.)
- When the receiver gets a TCP packet, it sends an acknowledgment back to the sender. On receiving the acknowledgment, the sender updates the status of the corresponding packet. Each packet has a timeout, so if no acknowledgment arrives in time, the sender retransmits it.
- TCP is byte-stream oriented. What it sends is a stream of bytes with no message boundaries, and TCP itself maintains the state of that stream.
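The numbering, acknowledgment, and timeout rules above can be sketched as a toy simulation. It uses the simplest possible scheme, stop-and-wait, where the sender waits for each acknowledgment before moving on; real TCP uses a sliding window and byte offsets instead, so treat this only as an illustration of the idea. The function name send_reliably and the lost_attempts parameter are hypothetical.

```python
def send_reliably(chunks, lost_attempts, max_tries=5):
    """Simulate stop-and-wait delivery over a channel that drops some packets.

    lost_attempts is a set of (sequence_number, attempt) pairs that the
    simulated channel drops. Returns (received_chunks, transmissions).
    """
    received = {}
    transmissions = 0
    for seq, chunk in enumerate(chunks):
        for attempt in range(1, max_tries + 1):
            transmissions += 1
            if (seq, attempt) in lost_attempts:
                # No acknowledgment arrives, so the sender's timer expires
                # and the same numbered packet is sent again.
                continue
            # The receiver stores the packet under its number and acknowledges it.
            received[seq] = chunk
            break
        else:
            raise TimeoutError(f"packet {seq} was never acknowledged")
    # The numbers let the receiver rebuild the original order.
    return [received[seq] for seq in sorted(received)], transmissions


if __name__ == "__main__":
    # Packet 1 is lost twice before it finally gets through.
    data, sent = send_reliably([b"he", b"ll", b"o"], {(1, 1), (1, 2)})
    print(b"".join(data), "in", sent, "transmissions")
```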
Taken together, TCP can be regarded as a stateful protocol that adjusts its sending behavior according to many factors, such as network conditions and how fast the other side is receiving.
Compared with TCP, UDP is much simpler:
- UDP does not need to establish a connection, so the sender can send as soon as it knows the IP and port of the other party. For the same reason, it can broadcast.
- UDP is not responsible for reliable delivery, because it has none of the algorithms and data structures that TCP uses to guarantee it.
- UDP is datagram based: data is sent and received one datagram at a time. Moreover, UDP does not change how it sends when the network is congested.
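The connectionless, one-datagram-at-a-time behavior in the list above can be seen with two UDP sockets on the loopback interface. This is a minimal sketch: the function name udp_roundtrip, the 2-second timeout, and the 2048-byte receive size are arbitrary choices for illustration. On loopback, loss and reordering are very unlikely, but UDP itself promises neither delivery nor order.

```python
import socket


def udp_roundtrip(messages):
    """Send each message as its own datagram over loopback and collect them."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS choose a port
        receiver.settimeout(2.0)          # do not wait forever if one is lost
        target = receiver.getsockname()
        for msg in messages:
            # No connect() and no handshake: each datagram is simply addressed.
            sender.sendto(msg, target)
        # Each recvfrom() returns exactly one datagram, never a merged stream.
        return [receiver.recvfrom(2048)[0] for _ in messages]
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(udp_roundtrip([b"ping", b"pong"]))
```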
Given these characteristics, UDP suits applications that run on a good network or that are insensitive to packet loss. Having dropped features such as retransmission and ordering, UDP processes data very fast and is widely used in scenarios that tolerate loss but have strict real-time requirements.
With TCP and UDP, the cornerstone of interprocess communication is in place. But you cannot write out TCP’s three-way handshake by hand every time you communicate. To hide these complex processes and keep communication programs simple, the concept of the socket was abstracted from TCP and UDP.
A socket is an abstraction of a two-way communication endpoint between application processes on different hosts in a network. It is one end of a process-to-process conversation on the network and provides the mechanism by which application layer processes exchange data using network protocols. In terms of position, the socket sits between the application process and the network protocol stack: it is the interface through which an application communicates over network protocols and interacts with the protocol stack.
Sockets distinguish between server and client. Establishing a connection between a local socket and a remote socket is in fact TCP’s three-way handshake. Once the socket connection is established, the read and write methods that the socket exposes can be used to communicate.
A socket has to specify which IP protocol it uses, as well as whether it uses TCP or UDP. A TCP server socket needs to bind a port, listen on it, and accept connections from client sockets. This is one difference between a server socket and a client socket.
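The server and client roles can be sketched with a tiny echo exchange on the loopback interface. This is one possible arrangement, not a production server: the function names echo_once and tcp_echo, the background thread, and the timeouts are assumptions made to keep the example self-contained.

```python
import socket
import threading


def echo_once(server_sock):
    """Accept one client and echo back whatever it sends until it closes."""
    conn, _peer = server_sock.accept()    # returns a new socket for this client
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:                  # empty bytes: the client closed its side
                break
            conn.sendall(data)


def tcp_echo(message):
    """Start a one-shot echo server, send message to it, and return the reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # IPv4 + TCP
    server.bind(("127.0.0.1", 0))         # server side: bind a port...
    server.listen(1)                      # ...and listen on it
    port = server.getsockname()[1]
    worker = threading.Thread(target=echo_once, args=(server,), daemon=True)
    worker.start()

    # Client side: connect() makes the kernel perform the three-way handshake.
    with socket.create_connection(("127.0.0.1", port), timeout=2.0) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)   # signal "no more data from me"
        reply = b""
        while True:
            chunk = client.recv(1024)
            if not chunk:
                break
            reply += chunk
    worker.join(timeout=2.0)
    server.close()
    return reply


if __name__ == "__main__":
    print(tcp_echo(b"hello"))
```

Note that the client never calls bind or listen; the operating system assigns it a temporary port when it connects.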
For UDP, the process is a bit different. UDP is connectionless. First, there is no three-way handshake. Second, there is no need for listen and connect, but the receiving side still needs to bind an IP and port; otherwise, when remote data arrives, the system cannot find the program that should receive it. Because UDP has no connection state, it does not need a separate socket for each connection and can communicate with multiple clients through a single socket. For the same reason, every sendto call must be given the destination IP address and port, and every recvfrom call reports the address the data came from.
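A minimal sketch of one UDP socket serving two clients follows. The function name udp_server_demo, the "ack: " reply prefix, and the timeouts are invented for illustration; everything runs in a single thread on loopback so the flow is easy to follow.

```python
import socket


def udp_server_demo():
    """Serve two clients through a single bound UDP socket and return their replies."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    socks = (server, client_a, client_b)
    try:
        for s in socks:
            s.settimeout(2.0)             # avoid hanging if a datagram is lost
        server.bind(("127.0.0.1", 0))     # only the receiving side must bind
        server_addr = server.getsockname()

        # Every sendto() names its destination explicitly.
        client_a.sendto(b"from A", server_addr)
        client_b.sendto(b"from B", server_addr)

        # One socket handles both clients; recvfrom() reports who sent each datagram.
        for _ in range(2):
            data, peer = server.recvfrom(1024)
            server.sendto(b"ack: " + data, peer)

        return client_a.recvfrom(1024)[0], client_b.recvfrom(1024)[0]
    finally:
        for s in socks:
            s.close()


if __name__ == "__main__":
    print(udp_server_demo())
```

Compare this with a TCP server, where accept hands back a new socket for each client.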
A TCP socket has a send buffer and a receive buffer in the kernel. TCP’s full-duplex mode and its sliding window both rely on these two independent buffers and on how full they are. Incoming data is held in the receive buffer in the kernel until the application calls the socket’s read method. As the receive buffer fills, the receiver advertises a smaller window to the other side, which shrinks its send window accordingly; when the buffer is full, the advertised window drops to zero and the sender pauses. This is the sliding window at work, and it is TCP’s flow control. Only data that a sender pushes beyond the advertised window is discarded by the receiver, and the sender has to retransmit it later.
UDP has no real send buffer in this sense: it keeps no copy for retransmission, and data is sent as soon as it is available, whether or not the other side can receive it, which is one of the reasons for UDP packet loss. A UDP socket does have a receive buffer, like a TCP socket, but since there is no flow control, datagrams that arrive while it is full are simply dropped.
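The TCP half of this can be observed from user space. In the sketch below, one side writes without blocking while the other side does not read, until the kernel refuses to queue more; the receiver then drains everything. The function name fill_until_blocked and the 64 KiB chunk size are arbitrary, and the number of bytes queued depends on the operating system’s buffer sizes, so no specific figure should be expected.

```python
import socket


def fill_until_blocked():
    """Queue data on a TCP connection nobody reads from, then drain it.

    Returns (queued_bytes, drained_bytes).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    sender = socket.create_connection(listener.getsockname(), timeout=2.0)
    receiver, _peer = listener.accept()
    try:
        sender.setblocking(False)
        chunk = b"x" * 65536
        queued = 0
        while True:
            try:
                # send() returns how many bytes the kernel accepted.
                queued += sender.send(chunk)
            except BlockingIOError:
                # The kernel buffers are full: the sender has to wait.
                break

        # Now the application finally reads. Nothing was thrown away.
        receiver.settimeout(2.0)
        drained = 0
        while drained < queued:
            data = receiver.recv(65536)
            if not data:
                break
            drained += len(data)
        return queued, drained
    finally:
        sender.close()
        receiver.close()
        listener.close()


if __name__ == "__main__":
    queued, drained = fill_until_blocked()
    print("queued:", queued, "drained:", drained)
```

The two numbers match: the data waited in the kernel buffers instead of being discarded, which is the practical effect of flow control.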
Closing notes
Some interviewers talk loosely about an "HTTP long connection", which is actually inaccurate. Long and short connections are properties of TCP; HTTP is just an application layer protocol built on TCP/IP. TCP and UDP involve many more data structures and algorithms than are covered here; this is only a rough description. Interested readers can study a book on the TCP protocol.