I compiled these notes during my sophomore year while studying computer networks. Most of the content comes from Xie Xiren’s textbook Computer Networks.
To make the content easier to follow, I reorganized my earlier notes and added some diagrams.
- 1. Overview of computer network
- 2. Physical layer
- 3. Data link layer
- 4. Network layer
- 5. Transport layer
- 6. Application layer
- My open source project recommendation
1. Overview of computer network
1.1. Basic terms
- Node: a node in the network can be a computer, hub, switch, router, and so on.
- Link: a physical line from one node to another, with no other switching node in between.
- Host: a computer connected to the Internet.
- ISP (Internet Service Provider): Internet service provider.
- IXP (Internet eXchange Point): an Internet exchange point. Its main function is to let two networks connect directly and exchange packets without forwarding them through a third network.
- RFC (Request For Comments): literally “request for comments”. The RFC documents contain almost all the important written material about the Internet.
- WAN (wide area network): its task is to carry the data sent by hosts over long distances.
- MAN (metropolitan area network): used to interconnect multiple LANs.
- LAN (local area network): most schools and enterprises have multiple interconnected LANs.
- PAN (personal area network): a network that uses wireless technology to connect personal electronic devices in the place where a person works.
- Packet: the unit of data transmitted on the Internet. It consists of a header and a data section. The header is also called the packet header.
- Store and forward: the router receives a whole packet and first checks whether it is correct, discarding packets damaged by collisions or errors. Once the packet is confirmed correct, the router extracts the destination address, looks it up in its forwarding table to find the output port, and then sends the packet out.
- Bandwidth: in computer networks, the “highest data rate” that can pass from one point of the network to another per unit time. It is often used to indicate the capacity of a communication line to carry data. The unit is bits per second, written bit/s (b/s).
- Throughput: refers to the amount of data passing through a certain network (or channel or interface) in unit time. Throughput is more often used as a measure of real-world networks in order to know how much data can actually pass through the network. Throughput is limited by the bandwidth of the network or the rated rate of the network.
1.2. Summary of important knowledge points
- A computer network (network for short) connects many computers together, while an internet connects many networks together; it is a “network of networks”.
- internet, with a lowercase i, is a general term for any network formed by interconnecting multiple computer networks. The communication protocols (that is, the communication rules) between these networks can be arbitrary. Internet, with a capital I, is a proper noun. It refers to the world’s largest open, specific internet, formed by interconnecting a great many networks. It uses the TCP/IP protocol suite as its communication rules, and its predecessor is ARPANET. In Chinese, the recommended translation of Internet is 因特网, although 互联网 is now the more common name.
- The router is the key component for packet switching. Its task is to forward the packets it receives, which is the most important function of the network core. Packet switching uses store-and-forward: a message (the whole block of data to be sent) is divided into several packets before being transmitted. Before sending, a long message is divided into smaller, equal-length data segments, and a header made up of the necessary control information is added to the front of each data segment to form a packet. The packet is the unit of data transmitted on the Internet. Because the header carries important control information such as the destination and source addresses, each packet can independently choose its path through the Internet and be delivered correctly to its destination.
- By working mode, the Internet can be divided into an edge part and a core part. Hosts are at the edge of the network, and their role is to process information. The core part consists of a large number of networks and the routers connecting them, and its role is to provide connectivity and switching.
- Communication between computers is really communication between processes (running programs) on those computers. The two communication modes are the client-server mode (C/S) and the peer-to-peer mode (P2P).
- Both client and server refer to the application processes involved in the communication. The client is the service requester and the server is the service provider.
- By scope, computer networks are divided into WANs, MANs, LANs, and PANs.
- The most commonly used performance metrics of a computer network are: rate, bandwidth, throughput, delay (transmission delay, propagation delay, processing delay, queuing delay), delay-bandwidth product, round-trip time, and channel utilization.
- A network protocol, or protocol for short, is a set of rules established for exchanging data in the network. The layers of a computer network together with their protocols are called the network architecture.
- The five-layer architecture consists of the application layer, transport layer, network layer, data link layer, and physical layer. TCP and UDP are the most important transport layer protocols, and IP is the most important network layer protocol.
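A few of the performance metrics listed above can be worked through numerically. The sketch below uses the usual textbook definitions (transmission delay = packet length / sending rate, propagation delay = distance / signal speed), which these notes do not spell out; the packet size, link rate, and distance are made-up values for illustration only.

```python
def transmission_delay(packet_bits: float, rate_bps: float) -> float:
    """Seconds needed to push every bit of the packet onto the link."""
    return packet_bits / rate_bps


def propagation_delay(distance_m: float, speed_mps: float) -> float:
    """Seconds for the signal to travel the length of the link."""
    return distance_m / speed_mps


def delay_bandwidth_product(rate_bps: float, prop_delay_s: float) -> float:
    """Number of bits that can be "in flight" on the link at one moment."""
    return rate_bps * prop_delay_s


# Hypothetical link: 1000-byte packet, 100 Mbit/s, 1000 km, signal speed 2e8 m/s.
PACKET_BITS = 1000 * 8
RATE_BPS = 100e6
t_send = transmission_delay(PACKET_BITS, RATE_BPS)
t_prop = propagation_delay(1_000_000, 2e8)

print(f"transmission delay: {t_send * 1e6:.1f} us")
print(f"propagation delay:  {t_prop * 1e3:.1f} ms")
print(f"delay-bandwidth product: {delay_bandwidth_product(RATE_BPS, t_prop):.0f} bits")
print(f"round-trip time (propagation only): {2 * t_prop * 1e3:.1f} ms")
```

With these numbers the propagation delay dominates, which is why a faster link alone does not shorten the total delay of a short packet over a long distance.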
The following sections introduce the five-layer architecture of computer networks: physical layer + data link layer + network layer (internet layer) + transport layer + application layer.
2. Physical layer
2.1. Basic terms
- Data: the entity that carries the message.
- Signal: the electrical or electromagnetic representation of data. In other words, the signal is the form of the data suited to travel over the transmission medium.
- Symbol (code element): when a time-domain waveform is used to represent a digital signal, the basic waveform that represents one of the discrete values.
- Simplex: there can only be communication in one direction without interaction in the opposite direction.
- Half duplex: both sides of the communication can send information, but they can’t send it at the same time (of course, they can’t receive it at the same time).
- Full duplex: both sides of communication can send and receive information at the same time.
- Distortion: loss of fidelity, meaning the received signal differs from the transmitted one because of attenuation and degradation. The factors that affect the degree of distortion are: 1. symbol transmission rate; 2. signal transmission distance; 3. noise interference; 4. quality of the transmission medium.
- Nyquist criterion: in any channel, the symbol transmission rate has an upper limit. If the rate exceeds that limit, severe intersymbol interference occurs and the receiver can no longer identify the symbols.
- Shannon’s theorem: in a channel with limited bandwidth and noise, the data rate has an upper limit if transmission is to be error-free.
- Baseband signal: signal from the source. A digital or analog signal that is not modulated.
- Bandpass signal: after the baseband signal is modulated by carrier, the frequency range of the signal is moved to a higher frequency band for transmission in the channel (that is, only in a certain frequency range can pass through the channel), where the modulated signal is the band-pass signal.
- Modulation: the process of processing the information of the signal source and adding it to the carrier signal to make it suitable for transmission in the channel.
- Signal-to-noise ratio: the ratio of the average signal power to the average noise power, written S/N. Signal-to-noise ratio (dB) = 10 * log10(S/N).
- Channel multiplexing: multiple users share the same channel (not necessarily at the same time).
- Bit rate: the number of bits transmitted per second.
- Baud rate: the number of times the carrier modulation state changes per unit time. The modulation rate of carrier for data signal.
- Multiplexing: method of sharing channel.
- ADSL (Asymmetric Digital Subscriber Line): asymmetric digital subscriber line.
- Hybrid fiber-coaxial network (HFC network): a residential broadband access network developed on the basis of the widely deployed cable TV (CATV) network.
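The limits named in the terms above can be put into numbers. The sketch below assumes the standard textbook forms, which the notes themselves do not state: the Nyquist limit for an ideal low-pass channel is 2W symbols per second, and the Shannon capacity is C = W * log2(1 + S/N). The 3100 Hz bandwidth and 30 dB signal-to-noise ratio are example values only.

```python
import math


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(S/N)."""
    return 10 * math.log10(signal_power / noise_power)


def db_to_ratio(db: float) -> float:
    """Convert a decibel value back to a plain power ratio."""
    return 10 ** (db / 10)


def nyquist_max_symbol_rate(bandwidth_hz: float) -> float:
    """Upper limit on the symbol rate (baud) of an ideal low-pass channel."""
    return 2 * bandwidth_hz


def bit_rate(symbol_rate_baud: float, levels: int) -> float:
    """Bit rate when every symbol can take one of `levels` discrete values."""
    return symbol_rate_baud * math.log2(levels)


def shannon_capacity(bandwidth_hz: float, snr_ratio: float) -> float:
    """Upper limit on the error-free data rate, in bit/s."""
    return bandwidth_hz * math.log2(1 + snr_ratio)


# Example values: a 3100 Hz channel with a 30 dB signal-to-noise ratio.
capacity = shannon_capacity(3100, db_to_ratio(30))
print(f"Shannon limit: {capacity:.0f} bit/s")
print(f"2400 baud with 16 levels: {bit_rate(2400, 16):.0f} bit/s")
```

The second line shows why bit rate and baud rate differ: carrying 4 bits per symbol multiplies the bit rate by 4 without changing the symbol rate.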
2.2. Summary of important knowledge points
- The main task of the physical layer is to determine certain characteristics of the interface to the transmission medium: mechanical, electrical, functional, and procedural characteristics.
- A data communication system can be divided into three parts: source system, transmission system and destination system. The source system includes source point (or source station, source) and transmitter, and the destination system includes receiver and destination.
- The purpose of communication is to deliver messages. For example, voice, text and image are all messages, and data is the entity that carries messages. Signal is the electrical or electromagnetic representation of data.
- According to the different values of the parameters representing the message in the signal, the signal can be divided into analog signal (or continuous signal) and digital signal (or discrete signal). When the time domain waveform is used to represent the digital signal, the basic waveforms representing different discrete values are called symbols.
- According to the way of information interaction, communication can be divided into one-way communication (or simplex communication), two-way alternate communication (or half duplex communication), two-way simultaneous communication (full duplex communication).
- The signal from the source is called baseband signal. The signal must be modulated to transmit on the channel. Modulation can be divided into baseband modulation and bandpass modulation. The most basic methods of band-pass modulation are amplitude modulation, frequency modulation and phase modulation. There are more complex modulation methods, such as quadrature amplitude modulation.
- To improve the transmission rate of data on the channel, better transmission media or advanced modulation technology can be used. But the data transmission rate can not be arbitrarily increased.
- Transmission media fall into two categories: guided media (twisted pair, coaxial cable, optical fiber) and unguided media (radio, infrared, atmospheric laser).
- Passive optical networks (PON) are widely used between the optical trunk line and users. A passive optical network needs no power supply, and its long-term operating and management costs are very low. The most popular passive optical networks are EPON and GPON.
2.3.1. What does the physical layer do?
The main thing the physical layer does is transmit bit streams transparently. Its main task can also be described as determining certain characteristics of the interface to the transmission medium: mechanical characteristics (physical properties of the connector used at the interface, such as shape and size), electrical characteristics (the range of voltages that appear on each line of the interface cable), functional characteristics (the meaning of a given voltage level on a given line), and procedural characteristics (the order in which the possible events occur for each function).
The physical layer is concerned with how to transmit a bit stream over the transmission media connecting computers, not with the specific transmission medium itself. Existing computer networks contain many kinds of hardware and transmission media, and many different means of communication. The role of the physical layer is to hide these differences as far as possible, so that the data link layer above it does not notice them. The data link layer then only needs to implement its own protocols and services, without considering which transmission medium and means of communication the network actually uses.
2.3.2. Several common channel multiplexing technologies
- Frequency division multiplexing (FDM): all users occupy different bandwidth resources at the same time.
- Time division multiplexing (TDM): all users occupy the same bandwidth at different times.
- Statistical time division multiplexing (STDM): an improved form of time division multiplexing that can significantly improve channel utilization.
- Code division multiplexing (CDM): each user uses a specially chosen, distinct code pattern, so users do not interfere with one another. Signals sent this way are highly resistant to interference, and their spectrum resembles white noise, so they are hard for an adversary to detect.
- Wavelength division multiplexing (WDM): frequency division multiplexing of light.
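To see why specially chosen codes keep code division users from interfering, here is a minimal sketch. It assumes the usual scheme in which each station owns a chip sequence that is orthogonal to all the others, sends the sequence for bit 1 and its negation for bit 0, and the channel simply adds the signals. The three 4-chip sequences and station names are hypothetical examples.

```python
def encode(bit, chips):
    """Bit 1 sends the chip sequence, bit 0 its negation, None stays silent."""
    if bit is None:
        return [0] * len(chips)
    sign = 1 if bit == 1 else -1
    return [sign * c for c in chips]


def combine(*signals):
    """The shared channel adds up whatever the stations send at the same time."""
    return [sum(values) for values in zip(*signals)]


def decode(channel, chips):
    """Normalized inner product with one station's chip sequence."""
    score = sum(x * c for x, c in zip(channel, chips)) / len(chips)
    if score > 0:
        return 1
    if score < 0:
        return 0
    return None  # this station sent nothing


# Hypothetical, mutually orthogonal chip sequences for three stations.
STATION_A = [1, 1, 1, 1]
STATION_B = [1, -1, 1, -1]
STATION_C = [1, 1, -1, -1]

# A sends 1, B sends 0, C is silent, all at the same time.
channel = combine(encode(1, STATION_A), encode(0, STATION_B), encode(None, STATION_C))
print(decode(channel, STATION_A), decode(channel, STATION_B), decode(channel, STATION_C))
```

Because the inner product of two different chip sequences is zero, each receiver recovers only the bit of the station whose sequence it uses.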
2.3.3. Several commonly used broadband access technologies, mainly ADSL and FTTx
Asymmetric digital subscriber line (ADSL) is one of the broadband access methods that connect users to the Internet. Very high speed digital subscriber line (VDSL) is a faster version of ADSL. Other options are HFC (a residential broadband access network developed on the basis of the widely deployed CATV network) and FTTx (that is, fiber to the x).
3. Data link layer
3.1. Basic terms
- Link: a physical line from a node to an adjacent node.
- Data link: a link plus the hardware and software that implement the protocols controlling data transfer over it.
- Cyclic redundancy check (CRC): an error detection technique widely used at the data link layer to ensure reliable data transmission.
- Frame: the transmission unit of the data link layer, consisting of a data link layer header and the packet it carries.
- MTU (Maximum Transmission Unit): the maximum length of the data portion of a frame.
- BER (bit error rate): the ratio of transmitted error bits to the total number of transmitted bits over a period of time.
- PPP (Point-to-Point Protocol): the data link layer protocol used between a user’s computer and the ISP. The following is a schematic diagram of the PPP frame:
- MAC address (media access control address, also medium access control address): also called the physical address or hardware address; it identifies a network device. In the OSI model, layer 3 (the network layer) is responsible for IP addresses and layer 2 (the data link layer) is responsible for MAC addresses. So a host has a MAC address, and each network location has its own IP address. An address is an important identifier of a system: “The name indicates the resource we are looking for, the address indicates where the resource is, and the route tells us how to get there.”
- Bridge: a network interconnection device that relays at the data link layer and connects two or more LANs.
- Switch: in a broad sense, a switch is any device that performs information exchange in a communication system. Here it means the switching hub that works at the data link layer, which is essentially a multi-port bridge.
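The CRC entry above can be made concrete with a small sketch of the modulo-2 division behind it. The function names are my own, and the data 101001 with divisor 1101 is just a small worked example; real links use much longer generator polynomials.

```python
def _mod2_remainder(bits: str, divisor: str) -> str:
    """Remainder of dividing a bit string by the divisor using XOR arithmetic."""
    n = len(divisor) - 1  # length of the remainder (divisor needs >= 2 bits)
    work = [int(b) for b in bits]
    div = [int(b) for b in divisor]
    for i in range(len(work) - n):
        if work[i] == 1:  # the leading bit is 1, so "subtract" (XOR) the divisor
            for j, d in enumerate(div):
                work[i + j] ^= d
    return "".join(str(b) for b in work[-n:])


def crc_remainder(data: str, divisor: str) -> str:
    """Redundant bits: remainder of (data followed by n zeros) / divisor."""
    n = len(divisor) - 1
    return _mod2_remainder(data + "0" * n, divisor)


def make_frame(data: str, divisor: str) -> str:
    """Sender side: append the redundant bits after the data."""
    return data + crc_remainder(data, divisor)


def frame_is_ok(frame: str, divisor: str) -> bool:
    """Receiver side: the frame is accepted only if the remainder is all zeros."""
    return set(_mod2_remainder(frame, divisor)) == {"0"}


frame = make_frame("101001", "1101")
print(frame, frame_is_ok(frame, "1101"))
corrupted = "101011001"  # same frame with one bit flipped in transit
print(corrupted, frame_is_ok(corrupted, "1101"))
```

Note that the receiver only detects the error and drops the frame; nothing here corrects it or asks for retransmission.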
3.2. Summary of important knowledge points
- A link is a physical line from a node to an adjacent node. A data link adds the necessary hardware (such as a network adapter) and software (such as the protocol implementation) to the link.
- The data link layer mainly uses two kinds of channels: point-to-point channels and broadcast channels.
- The protocol data unit of the data link layer is the frame. The data link layer has three basic problems: framing (encapsulation into frames), transparent transmission, and error detection.
- Cyclic redundancy check (CRC) is an error detection method, and the frame check sequence (FCS) is the redundant code appended after the data.
- The Point-to-Point Protocol (PPP) is the most widely used data link layer protocol. Its characteristics: it is simple, it only detects errors and does not correct them, it uses no sequence numbers and no flow control, and it can support several network layer protocols at the same time.
- PPPoE is the link layer protocol that hosts use for broadband Internet access.
- The advantages of LAN are: it has broadcasting function and can easily access the whole network from one station; It is convenient for the expansion and gradual evolution of the system; It improves the reliability, availability and survivability of the system.
- A computer communicates with the LAN outside it through a communication adapter (network adapter), also called a network interface card (NIC) or network card. The computer’s hardware address is stored in the adapter’s ROM.
- Ethernet works in connectionless mode: it does not number the frames it sends and does not require the receiver to send back acknowledgements. When the destination station receives a frame with errors, it discards the frame and does nothing else.
- The protocol used by Ethernet is carrier sense multiple access with collision detection (CSMA/CD). Its key points are: listen before sending, and keep listening while sending. As soon as a collision is detected on the bus, stop sending immediately, then wait a random period chosen by the backoff algorithm before sending again. Consequently, for a short time after a station starts sending its data, a collision is still possible. All stations on an Ethernet contend for the channel on an equal basis.
- The Ethernet adapter has a filtering function: it accepts only unicast frames addressed to it, broadcast frames, and multicast frames.
- Ethernet can be expanded in physical layer by using hub (the expanded Ethernet is still a network)
- The characteristics of the point-to-point channel and the broadcast channel at the data link layer, and of the protocols used on them (PPP and CSMA/CD).
- The three basic problems of the data link layer: framing, transparent transmission, and error detection.
- The MAC layer hardware address of Ethernet.
- The functions and uses of adapters, repeaters, hubs, bridges, and Ethernet switches.
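The notes mention that CSMA/CD waits a random time chosen by the backoff algorithm but give no details. A minimal sketch, assuming the truncated binary exponential backoff of classic Ethernet: after the k-th collision the station picks a random number of slot times from 0 to 2^min(k, 10) - 1 and gives up after 16 attempts. The 51.2 microsecond slot time is the 10 Mbit/s Ethernet value; these constants are not from the notes.

```python
import random

SLOT_TIME_US = 51.2   # contention period of 10 Mbit/s Ethernet (512 bit times)
BACKOFF_LIMIT = 10    # the random range stops growing after 10 collisions
MAX_ATTEMPTS = 16     # the frame is dropped after 16 failed attempts


def backoff_slots(collision_count: int, rng=random) -> int:
    """Number of slot times to wait after the given collision (1-based)."""
    if collision_count >= MAX_ATTEMPTS:
        raise RuntimeError("too many collisions, frame is discarded")
    k = min(collision_count, BACKOFF_LIMIT)
    return rng.randrange(2 ** k)  # uniform over 0 .. 2**k - 1


def backoff_delay_us(collision_count: int, rng=random) -> float:
    """Waiting time in microseconds before the next retransmission attempt."""
    return backoff_slots(collision_count, rng) * SLOT_TIME_US


for attempt in (1, 2, 3, 10, 12):
    print(f"after collision {attempt}: wait {backoff_delay_us(attempt):.1f} us")
```

Doubling the range after each collision spreads retransmissions out as contention grows, so repeated collisions between the same stations become less and less likely.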
4. Network layer
4.1. Basic terms
- Virtual circuit: a two-way transparent transmission channel established between the logical or physical ports of two terminal devices. The virtual circuit indicates that this is only a logical connection, and the packets are transmitted along this logical connection according to the store and forward mode, rather than establishing a physical connection.
- IP (Internet Protocol): one of the two most important protocols in the TCP/IP suite and the core of the internet layer in the TCP/IP architecture. The protocols used together with it are ARP, RARP, ICMP, and IGMP.
- ARP (Address Resolution Protocol): resolves an IP address to a hardware address.
- ICMP (Internet Control Message Protocol): allows hosts or routers to report errors and abnormal conditions.
- Subnet mask: a bit mask that indicates which bits of an IP address identify the subnet the host is on and which bits identify the host. A subnet mask cannot exist on its own; it must be used together with an IP address.
- CIDR (Classless Inter-Domain Routing): does away with the traditional class A, B, and C addresses and with subnetting, and uses a variable-length “network prefix” in place of the network number and subnet number of classful addresses.
- Default route: the route a router chooses when the routing table has no entry that reaches the destination address. Using a default route also reduces the space taken by the routing table and the time spent searching it.
- Routing algorithm: the core of a routing protocol. The Internet uses adaptive, hierarchical routing protocols.
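The subnet mask, CIDR, and default route entries above can be tried out with Python's standard ipaddress module. This is only a sketch: the addresses and router names are made-up examples, and picking the longest matching prefix is the usual CIDR lookup rule, which these notes do not describe.

```python
import ipaddress


def network_address(ip: str, mask: str) -> str:
    """Bitwise AND of address and subnet mask gives the subnet's address."""
    ip_int = int(ipaddress.IPv4Address(ip))
    mask_int = int(ipaddress.IPv4Address(mask))
    return str(ipaddress.IPv4Address(ip_int & mask_int))


def lookup(routes, destination: str) -> str:
    """Pick the matching route with the longest prefix (assumed rule)."""
    dst = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes if dst in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Subnet mask: which subnet is the host 192.168.10.77 on?
print(network_address("192.168.10.77", "255.255.255.192"))

# CIDR: the block that contains 128.14.35.7/20.
block = ipaddress.ip_network("128.14.35.7/20", strict=False)
print(block, block.netmask, block.num_addresses, block[0], block[-1])

# Default route: 0.0.0.0/0 matches everything, so it is used as a last resort.
ROUTES = [
    (ipaddress.ip_network("128.14.32.0/20"), "router A"),    # hypothetical
    (ipaddress.ip_network("0.0.0.0/0"), "default router"),   # hypothetical
]
print(lookup(ROUTES, "128.14.35.7"), "|", lookup(ROUTES, "8.8.8.8"))
```

A single default entry stands in for every destination not listed explicitly, which is how it keeps the routing table small.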
4.2. Summary of important knowledge points
- The network layer in TCP/IP provides only a simple, flexible, connectionless datagram service. It makes no quality-of-service promises and does not guarantee delivery deadlines; packets may arrive corrupted, be lost, be duplicated, or arrive out of order. Reliable communication between processes is the responsibility of the transport layer.
- There are two kinds of delivery on the Internet. Direct delivery happens within one network and passes through no router. Indirect delivery goes to another network and passes through at least one router, but the final hop is always a direct delivery.
- A classful IP address consists of a network number field (identifying the network) and a host number field (identifying the host). The first few bits of the network number field are the class bits, which indicate the class of the address. IP addresses are hierarchical: the address management organization assigns only the network number, and the organization that receives it assigns the host numbers. Routers forward packets according to the network number of the destination host. A router connects to at least two networks, so it should have at least two different IP addresses.
- An IP datagram has two parts: header and data. The first part of the header is fixed length, 20 bytes in total, and every IP datagram must have it (the source address, destination address, total length, and other important fields are in this fixed part). Optional variable-length fields follow the fixed part. The time-to-live (TTL) field in the header gives the maximum number of routers the datagram may pass through, which prevents datagrams from circling the Internet endlessly.
- ARP resolves an IP address to a hardware address. The ARP cache greatly reduces network traffic, because the next time the host communicates with a host at the same address it can find the hardware address in the cache instead of broadcasting an ARP request.
- CIDR is a good way to relieve the shortage of IP addresses. CIDR notation appends a slash to the IP address followed by the number of bits in the prefix. The prefix (or network prefix) identifies the network, and the part after it, the suffix, identifies the host. Contiguous IP addresses with the same prefix form a “CIDR address block”, and IP addresses are allocated in units of CIDR address blocks.
- The Internet Control Message Protocol (ICMP) is an IP layer protocol. An ICMP message is carried as the data of an IP datagram: a header is added and it is sent as an IP datagram. ICMP is not meant to make transmission reliable. It allows hosts or routers to report errors and abnormal conditions. There are two kinds of ICMP messages: error report messages and query messages.
- The most fundamental way to solve IP address exhaustion is to adopt a new version of IP with a larger address space: IPv6. The changes IPv6 brings include: 1) a larger address space (128-bit addresses); 2) a flexible header format; 3) improved options; 4) support for plug and play; 5) support for resource preallocation; 6) a header aligned on 8-byte boundaries.
- A virtual private network (VPN) uses the public Internet as the communication carrier between an organization’s private networks. Inside the VPN, Internet private addresses are used. A VPN must have at least one router with a valid global IP address so that it can communicate through the Internet with another VPN of the same organization. All data sent over the Internet must be encrypted.
- The characteristics of MPLS are: 1) support for connection-oriented quality of service; 2) support for traffic engineering, which balances network load; 3) effective support for VPNs. At the ingress node, MPLS attaches a fixed-length label to each IP datagram; label switching routers then forward it in hardware at layer 2 (the link layer) according to the label, which greatly speeds up forwarding.
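The fixed 20-byte part of the IP header and the role of the TTL field, described in the list above, can be sketched with the standard struct module. The field layout is the standard IPv4 one; the checksum is simply left at zero here, and the helper names and addresses are illustrative assumptions.

```python
import socket
import struct

# version/IHL, type of service, total length, identification, flags/offset,
# TTL, protocol, header checksum, source address, destination address
FIXED_HEADER = struct.Struct("!BBHHHBBH4s4s")  # 20 bytes in network byte order


def build_header(src: str, dst: str, ttl: int, payload_len: int, protocol: int = 17) -> bytes:
    """Pack a fixed header without options (checksum left at 0 for simplicity)."""
    version_ihl = (4 << 4) | 5  # IPv4, header length of 5 * 4 = 20 bytes
    return FIXED_HEADER.pack(version_ihl, 0, 20 + payload_len, 0, 0, ttl,
                             protocol, 0, socket.inet_aton(src), socket.inet_aton(dst))


def parse_header(raw: bytes) -> dict:
    """Unpack the fixed part of the header into named fields."""
    (version_ihl, _tos, total_length, _ident, _frag, ttl, protocol,
     _checksum, src, dst) = FIXED_HEADER.unpack(raw[:20])
    return {
        "version": version_ihl >> 4,
        "header_len": (version_ihl & 0x0F) * 4,
        "total_length": total_length,
        "ttl": ttl,
        "protocol": protocol,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


def forward_ttl(ttl: int):
    """TTL after one router hop, or None if the datagram must be discarded."""
    ttl -= 1
    return ttl if ttl > 0 else None


header = build_header("10.0.0.1", "10.0.0.2", ttl=3, payload_len=100)
print(len(header), parse_header(header))
```

Since every router lowers the TTL by one, a datagram caught in a routing loop is discarded after a bounded number of hops instead of circulating forever.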
5. Transport layer
5.1. Basic terms
- Process: refers to the running program entity in a computer.
- Application processes communicate with each other: the process of exchanging data between a process of a host and a process of another host (in addition, note that the real endpoint of communication is not the host but the process of the host, that is to say, the end-to-end communication is the communication between application processes).
- Multiplexing and demultiplexing at the transport layer: multiplexing means that different processes on the sender can all transfer data through the same transport layer protocol. Demultiplexing means that the receiver’s transport layer strips the header from a message and delivers the data to the correct destination application process.
- TCP (Transmission Control Protocol): transmission control protocol.
- UDP (User Datagram Protocol): user datagram protocol.
- Port: a port identifies which process on the other machine is being addressed. For example, MSN and QQ use different ports; without ports, data for the QQ process and the MSN process could get mixed up. A port is also called a protocol port number.
- Stop-and-wait protocol: the sender stops after sending each packet and waits for the receiver’s acknowledgement; it sends the next packet only after the acknowledgement arrives.
- Flow control: keeping the sender’s rate from being too fast, so that the receiver can keep up.
- Congestion control: preventing too much data from being injected into the network, so that routers and links are not overloaded. Congestion control presupposes that the network can carry the existing load.
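The stop-and-wait protocol defined above is easy to simulate. The sketch below is a toy model rather than a real implementation: losses are scripted by attempt number instead of being random, a timeout is modeled as simply trying again, and the alternating 1-bit sequence number is my own assumption about how duplicates are recognized.

```python
def stop_and_wait(packets, lost_data=(), lost_acks=()):
    """Deliver packets one at a time, retransmitting until each is acknowledged.

    lost_data / lost_acks hold the numbers (1-based) of the transmission
    attempts whose data packet or acknowledgement is lost on the way.
    Returns the payloads handed to the receiver and the attempts used.
    """
    delivered = []
    expected_seq = 0  # sequence number the receiver is waiting for
    attempt = 0
    for index, payload in enumerate(packets):
        seq = index % 2  # alternating 1-bit sequence number (assumption)
        acknowledged = False
        while not acknowledged:
            attempt += 1
            if attempt in lost_data:
                continue  # data lost: the sender times out and retransmits
            if seq == expected_seq:
                delivered.append(payload)  # new packet: accept it
                expected_seq ^= 1
            # a duplicate is discarded, but it is acknowledged all the same
            if attempt in lost_acks:
                continue  # ACK lost: the sender times out and retransmits
            acknowledged = True
    return delivered, attempt


result, attempts = stop_and_wait(["a", "b", "c"], lost_data={2}, lost_acks={4})
print(result, attempts)
```

In this run the second transmission loses its data packet and the fourth loses its acknowledgement, so five transmissions are needed for three packets, yet the receiver still gets each packet exactly once.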
5.2. Summary of important knowledge points
- The transport layer provides logical communication between application processes; that is, data is not actually transferred directly between the two transport layers. The transport layer hides the details of the network below it (network topology, routing protocols, and so on) from the application layer, so that to the application processes it looks as if an end-to-end logical channel exists between the two transport entities.
- The network layer provides logical communication between hosts, while the transport layer provides end-to-end logical communication between application processes.
- The two important transport layer protocols are UDP and TCP. In OSI terminology, the data unit exchanged by two peer transport entities is called a transport protocol data unit (TPDU). In the TCP/IP suite, depending on whether TCP or UDP is used, it is called a TCP segment or a UDP user datagram.
- UDP does not need to establish a connection before sending data, and the remote host does not need to send any acknowledgement after receiving a UDP message. Although UDP does not provide reliable delivery, in some situations it is the most efficient way to work. TCP provides a connection-oriented service: a connection must be established before data is sent and released afterwards. TCP does not provide broadcast or multicast services. Because TCP has to provide a reliable, connection-oriented transport service, it inevitably adds a lot of overhead, such as acknowledgements, flow control, timers, and connection management. This not only makes the header of the protocol data unit much larger, but also consumes a lot of processor resources.
- A hardware port is the interface through which hardware devices interact, while a software port is an address through which application layer protocol processes interact with the transport entity. Both UDP and TCP headers have two important fields, the source port and the destination port. When the transport layer receives a transport layer message handed up by the IP layer, it uses the destination port number in the header to deliver the data to the destination application process. (For two processes to communicate, each must know not only the other’s IP address but also its port number, in order to find the application process on the other computer.)
- The transport layer identifies a port with a 16-bit port number. A port number has only local significance: it labels the interface between an application layer process and the transport layer on that computer. The same port number on different computers on the Internet is unrelated. “Protocol port number” is usually shortened to “port”. Although the endpoint of communication is the application process, once a message has been sent to the right port on the destination host, TCP or UDP completes the remaining work of delivering it to the destination process.
- Transport layer port numbers are divided into those used by servers (0 to 1023 are the well-known ports and 1024 to 49151 are registered ports) and those used temporarily by clients (49152 to 65535).
- The main characteristics of UDP are: 1) no connection; 2) best effort delivery; 3) message oriented; 4) no congestion control; 5) support of one-to-one, one to many, many to one and many to many interactive communication; 6) low overhead of header (only four fields: source port, destination port, length and check sum)
- The main characteristics of TCP are: (1) connection oriented; (2) each TCP connection can only be one-to-one; (3) reliable delivery; (4) full duplex communication; (5) byte stream oriented
- TCP uses the IP address of the host plus the port number of the host as the endpoint of the TCP connection. Such an endpoint is called a socket or socket. Socket is represented by (IP address: port number). Each TCP connection is uniquely determined by two endpoints at both ends of the communication.
- Stop wait protocol is to achieve reliable transmission. Its basic principle is to stop sending every packet and wait for the other party’s confirmation. Send the next group after receiving the confirmation.
- In order to improve the transmission efficiency, the sender can use pipeline transmission instead of the inefficient stop wait protocol. Pipeline transmission means that the sender can send multiple packets continuously without stopping to wait for the other party’s confirmation after each packet is sent. In this way, data can be transmitted continuously on the channel. This transmission mode can significantly improve the channel utilization.
- In the stop wait protocol, time-out retransmission refers to retransmission of previously sent packets as long as the acknowledgement is not received after a period of time. Therefore, it is necessary to set a time-out timer after each packet is sent, and its retransmission time should be longer than the average round-trip time of data in packet transmission. This automatic retransmission mode is often called automatic retransmission request ARQ. In addition, in the stop wait protocol, if a duplicate packet is received, it is discarded, but at the same time, an acknowledgement is sent. Continuous ARQ protocol can improve channel utilization. Sending maintains a sending window, where packets located in the sending window can be sent out continuously without waiting for the other party’s confirmation. Generally, the receiver uses cumulative acknowledgement to send acknowledgement to the last packet arriving in sequence, indicating that all packets to this packet location have been received correctly.
- The first 20 bytes of the TCP segment header are fixed, followed by 4N bytes of options that are added as needed (N is an integer). The minimum length of the TCP header is therefore 20 bytes.
- TCP uses a sliding window mechanism. The sequence numbers inside the sending window are the ones that are allowed to be sent. Everything behind the trailing edge of the sending window has been sent and acknowledged, while everything ahead of the leading edge is not yet allowed to be sent. The trailing edge can change in two ways: it stays where it is (no new acknowledgement received) or it moves forward (new acknowledgement received). The leading edge usually keeps moving forward.
- Generally we want data to be transmitted as fast as possible. But if the sender sends too fast, the receiver may not be able to keep up, and data will be lost. Flow control means keeping the sender's rate low enough that the receiver can keep up.
- If, over some period of time, the demand for a network resource exceeds what that resource can supply, network performance deteriorates. This situation is called congestion. Congestion control prevents too much data from being injected into the network, so that routers and links are not overloaded. Congestion control has a premise: the network must be able to bear the existing load. It is a global process that involves all hosts, all routers, and every factor that degrades network transmission performance. Flow control, by contrast, usually concerns the traffic between one sender and one receiver and is an end-to-end problem: it throttles the sender's rate so that the receiver can keep up.
- To control congestion, the TCP sender maintains a state variable called the congestion window (cwnd). The size of the congestion window depends on how congested the network is and changes dynamically. The sender sets its sending window to the smaller of the congestion window and the receiver's receive window.
- TCP congestion control uses four algorithms: slow start, congestion avoidance, fast retransmit and fast recovery. At the network layer, routers can also adopt suitable packet-dropping strategies (such as active queue management, AQM) to reduce the occurrence of network congestion.
- There are three stages of transport connection: connection establishment, data transmission and connection release.
- The application process that actively initiates TCP connection establishment is called the client, while the application process that passively waits for a connection is called the server. TCP connection establishment uses a three-message handshake (the three-way handshake): the client sends a connection request, the server acknowledges that request, and the client then acknowledges the server's acknowledgement.
- TCP connection release uses a four-message handshake. After it has finished sending data, either party can send a connection release notice; once the other party acknowledges it, the connection enters the half-closed state. When the other party also has no more data to send, it sends its own connection release notice, and once that is acknowledged the TCP connection is fully closed.
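The points above about sockets, connection establishment and half-closed release can be tried out on a single machine. The following is a minimal sketch, not a production server: it assumes the loopback address 127.0.0.1 is available, and the names `serve_once` and `demo` are made up for this illustration. The operating system performs the actual handshake messages; the code only shows where they happen from the application's point of view.

```python
import socket
import threading


def serve_once(listener, log):
    """Accept one client, read until it half-closes, then answer in upper case."""
    conn, peer = listener.accept()       # returns once the three-way handshake is done
    with conn:
        conn.settimeout(5)
        chunks = []
        while True:
            data = conn.recv(1024)       # byte stream: no message boundaries
            if not data:                 # b"" means the client has sent its FIN
                break
            chunks.append(data)
        message = b"".join(chunks)
        log["peer_seen_by_server"] = peer
        log["server_saw"] = message
        conn.sendall(message.upper())    # the server-to-client direction is still open
    # leaving the with-block closes the server side as well


def demo():
    log = {}
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(1)
    server_endpoint = listener.getsockname()

    worker = threading.Thread(target=serve_once, args=(listener, log))
    worker.start()

    with socket.create_connection(server_endpoint, timeout=5) as client:
        # A connection is identified by its two endpoints (IP address, port).
        log["client_endpoint"] = client.getsockname()
        log["server_endpoint"] = client.getpeername()
        client.sendall(b"hello ")        # two separate writes ...
        client.sendall(b"tcp")           # ... form one continuous byte stream
        client.shutdown(socket.SHUT_WR)  # half-close: "I have no more data to send"
        reply = b""
        while True:
            data = client.recv(1024)     # receiving still works after the half-close
            if not data:
                break
            reply += data
    worker.join()
    listener.close()
    log["reply"] = reply
    return log


if __name__ == "__main__":
    for key, value in demo().items():
        print(key, value)
```

The server sees the two client writes as one byte sequence, and the client can still read the reply after `shutdown(SHUT_WR)`, which is the half-closed state described above.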
5.3. Supplement (important)
The following knowledge points need to be focused on:
- The meaning of port and socket
- Differences between UDP and TCP and their application scenarios
- The principle of achieving reliable transmission over an unreliable network: the stop-and-wait protocol and the ARQ protocol
- TCP’s sliding window, flow control, congestion control and connection management
- TCP's three-way handshake (connection establishment) and four-way handshake (connection release)
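To make the congestion control item more concrete, here is a small simulation of how the congestion window might evolve round by round. It is a simplified sketch under stated assumptions, not real TCP: the window is counted in segments rather than bytes, one loop iteration stands for one round-trip time, and the function name `simulate_cwnd` and its parameters are invented for this example.

```python
def simulate_cwnd(rounds, ssthresh, timeout_rounds=(), dupack_rounds=(), rwnd=None):
    """Return the sending window (in segments) used in each transmission round.

    timeout_rounds: rounds in which a retransmission timeout occurs.
    dupack_rounds:  rounds in which three duplicate ACKs are received.
    rwnd:           receiver window; None means "never the limiting factor".
    """
    cwnd = 1                                  # slow start begins with one segment
    history = []
    for rnd in range(1, rounds + 1):
        # The sender may send no more than min(cwnd, rwnd).
        window = cwnd if rwnd is None else min(cwnd, rwnd)
        history.append(window)

        if rnd in timeout_rounds:
            # Timeout: halve the threshold and restart slow start.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif rnd in dupack_rounds:
            # Fast retransmit + fast recovery: halve the threshold and
            # continue from there in congestion avoidance.
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: the window doubles every round.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: the window grows by one segment per round.
            cwnd += 1
    return history


if __name__ == "__main__":
    print(simulate_cwnd(11, ssthresh=16, timeout_rounds={6}))
    print(simulate_cwnd(8, ssthresh=16, dupack_rounds={6}))
    print(simulate_cwnd(6, ssthresh=16, rwnd=10))
```

The `rwnd` parameter shows where flow control and congestion control meet: the receiver's window and the congestion window each cap the sender, and the smaller one wins.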
6. Application layer
6.1. Basic terms
- Domain Name System (DNS): DNS translates human-readable domain names (e.g., www.baidu.com) into machine-readable IP addresses (for example, 188.8.131.52). We can think of it as a phone book designed for the Internet.
- File Transfer Protocol (FTP): FTP is the abbreviation of File Transfer Protocol. It controls the bidirectional transfer of files on the Internet. It is also the name of an application: different operating systems have different FTP applications, and all of them follow the same protocol to transfer files. FTP users often encounter two concepts, "download" and "upload". To download a file is to copy it from a remote host to your own computer; to upload a file is to copy it from your own computer to the remote host. In Internet terms, users upload files to (or download files from) a remote host through a client program.
- Trivial File Transfer Protocol (TFTP): TFTP is a protocol in the TCP/IP suite for simple file transfer between a client and a server. It provides an uncomplicated, low-overhead file transfer service. Its port number is 69.
- Remote terminal protocol (Telnet): Telnet is a member of the TCP/IP protocol suite and is the standard protocol and main method for remote login on the Internet. It lets users do work on a remote host from their local computer. The user runs a telnet program locally and connects to the server; commands typed into the telnet program run on the server as if they had been typed directly at the server console, so the server can be controlled locally. To start a telnet session, the user must log in to the server with a user name and password. Telnet is a common way to control a web server remotely.
- World Wide Web (WWW): the World Wide Web (also written "Web", "WWW" or "W3") is often simply called the web. It is divided into web clients and web servers: WWW lets web clients (typically browsers) access and browse pages on web servers. It is a system of many interlinked hypertext documents that is accessed through the Internet. In this system, every useful thing is called a "resource" and is identified by a global Uniform Resource Identifier (URI). Resources are delivered to users through the Hypertext Transfer Protocol (HTTP), and users obtain them by clicking links. The World Wide Web Consortium (W3C) was established in October 1994 at the MIT Computer Science Laboratory; its founder is Tim Berners-Lee, the inventor of the World Wide Web. The World Wide Web is not the same as the Internet. It is only one of the services that run on the Internet.
- The general workflow of the World Wide Web is as follows:
- Uniform Resource Locator (URL): a URL is a concise representation of the location of a resource available on the Internet and of the method used to access it. It is the address of a standard resource on the Internet. Every file on the Internet has a unique URL, which contains information about where the file is located and what the browser should do with it.
- Hypertext Transfer Protocol (HTTP): HTTP is the most widely used network protocol on the Internet, and all WWW files must comply with this standard. HTTP was originally designed to provide a way to publish and receive HTML pages. In 1960, the American Ted Nelson conceived a method of processing text information by computer and called it hypertext, which became the foundation on which the HTTP standard was developed.
In essence, the HTTP protocol is a communication format agreed between the browser and the server. The principle of HTTP is shown in the following figure:
- Proxy server: a proxy server is a network entity, also called a web cache. It temporarily stores recent requests and responses on its local disk. When a new request arrives and matches a stored request, the proxy server returns the stored response instead of fetching the resource from the Internet again by its URL. A proxy server can run on the client side, on the server side, or on an intermediate system.
- Simple Mail Transfer Protocol (SMTP): SMTP is a set of rules for transferring mail from a source address to a destination address, and it controls how mail is relayed. SMTP belongs to the TCP/IP protocol suite and helps each computer find the next destination when sending or relaying a message. Through the server specified by SMTP, e-mail can be delivered to the recipient's server, and the whole process takes only a few minutes. An SMTP server is an outgoing mail server that follows the SMTP protocol and is used to send or relay e-mail.
- Search engine: a search engine is a system that uses specific computer programs to collect information from the Internet according to certain strategies, organizes and processes that information, provides retrieval services to users, and displays the information relevant to each query. Types of search engine include full-text index, directory index, meta search engine, vertical search engine, collective search engine, portal search engine and free link list.
- Vertical search engine: a vertical search engine is a specialized search engine for a particular industry. It is a subdivision and extension of the general search engine: it integrates one kind of specialized information from the web page database, extracts the required data field by field, processes it, and returns it to the user in some form. Vertical search is a newer service model that responds to the weaknesses of general search engines, namely too much information, imprecise queries and insufficient depth. It provides valuable information and related services for a specific field, a specific group of people or a specific need. Its characteristics are "specialized, refined and deep", with a clear industry flavor. Compared with the massive, unordered information of general search engines, vertical search engines are more focused, specific and in-depth.
- Full-text index: full-text indexing is the key technology of search engines. Imagine that searching for a word in a 1 MB file takes a few seconds; searching a 100 MB file may then take dozens of seconds, and searching even larger files costs still more, which is unrealistic. Full-text indexing, sometimes called inverted-file (inverted index) technology, emerged to resolve this contradiction.
- Directory index: a directory index, as the name suggests, stores websites in corresponding directories by category. When users look for information, they can therefore either search by keyword or browse by category.
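The inverted-file idea behind the full-text index can be sketched in a few lines. Instead of scanning every document for each query, the index maps each word to the set of documents that contain it, so a lookup no longer depends on total text size. This is a toy illustration only: the function names and the sample documents are made up, and a real search engine would add ranking, stemming and on-disk storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())   # intersect the posting sets
    return result


if __name__ == "__main__":
    docs = {
        "a": "TCP provides reliable delivery",
        "b": "UDP is connectionless",
        "c": "TCP and UDP are transport protocols",
    }
    index = build_index(docs)
    print(search(index, "tcp udp"))
```

Building the index costs one pass over all documents, but it is paid once; each query afterwards only touches the sets for the words it contains.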
6.2. Summary of important knowledge points
- File Transfer Protocol (FTP) uses the reliable transport service of TCP. FTP works in client-server mode, and one FTP server process can serve multiple users at the same time. During a file transfer, two parallel TCP connections are established between the FTP client and server: a control connection and a data connection. The data connection is the one that actually transfers files.
- HTTP is the protocol used for the interaction between client and server on the World Wide Web. HTTP uses TCP connections for reliable transmission, but HTTP itself is connectionless and stateless. HTTP/1.1 uses persistent connections (in non-pipelined and pipelined modes).
- E-mail delivers a message to the mail server used by the recipient and places it in the recipient's mailbox. The recipient can go online at any time and read it from that mail server, which acts as an "electronic mailbox".
- An e-mail system has three important components: user agent, e-mail server and e-mail protocol (including e-mail sending protocol, such as SMTP, and e-mail reading protocol, such as POP3 and IMAP). Both the user agent and the mail server run these protocols.
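The split between the user agent and the sending protocol can be illustrated with Python's standard library. In this sketch, `compose` plays the role of the user agent building a message, and `send` hands it to a mail server over SMTP. The addresses, the host name and both function names are hypothetical; `send` is shown for completeness and needs a reachable SMTP server that accepts the message.

```python
import smtplib
from email.message import EmailMessage


def compose(sender, recipient, subject, body):
    """User agent side: build a message with headers and a text body."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send(msg, host, port=25):
    """Sending side: deliver the message to a mail server over SMTP.

    `host` is an assumption of this sketch; a real server usually also
    requires authentication and encryption, which are omitted here.
    """
    with smtplib.SMTP(host, port, timeout=10) as server:
        server.send_message(msg)


if __name__ == "__main__":
    message = compose("alice@example.com", "bob@example.org",
                      "Greetings", "Hello Bob")
    print(message.as_string())
    # send(message, "mail.example.com")   # hypothetical server, not executed
```

Reading the message later from the recipient's mailbox is the job of a separate protocol such as POP3 or IMAP, which this sketch does not cover.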
6.3. Supplement (important)
The following knowledge points need to be focused on:
- Common protocols of application layer (focusing on HTTP protocol)
- DNS – resolve IP address from domain name
- The general process of visiting a website
- System call and API concept
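The general process of visiting a website ties several of these points together: parse the URL, resolve the host name to an address, open a TCP connection, send an HTTP request, and read the response. The sketch below walks through those steps with plain sockets. To stay self-contained it starts a tiny local web server on 127.0.0.1 instead of contacting a real site; `HelloHandler`, `fetch` and `demo` are names invented for this example, and a real browser does far more (HTTPS, caching, redirects, rendering).

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


class HelloHandler(BaseHTTPRequestHandler):
    """Stand-in web server that answers every GET with a fixed page."""

    def do_GET(self):
        body = b"<h1>hello</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass                                   # keep the demo output quiet


def fetch(url):
    # 1. Parse the URL into host, port and path.
    parts = urlsplit(url)
    host, port, path = parts.hostname, parts.port or 80, parts.path or "/"
    # 2. Resolve the host name (a DNS lookup for real domain names).
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    # 3. Open a TCP connection and 4. send the HTTP request as plain text.
    request = (f"GET {path} HTTP/1.1\r\n"
               f"Host: {host}:{port}\r\n"
               "Connection: close\r\n\r\n").encode("ascii")
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(5)
        sock.connect(address)
        sock.sendall(request)
        # 5. Read the response until the server closes the connection.
        raw = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            raw += chunk
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii")
    return status_line, body


def demo():
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)   # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch(f"http://127.0.0.1:{server.server_address[1]}/")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, page = demo()
    print(status)
    print(page.decode())
```

Calls such as `getaddrinfo`, `connect`, `sendall` and `recv` are the socket API through which an application asks the operating system to do the network work, which is where the system call and API concepts in the list above come in.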
My open source project recommendation
- JavaGuide: a Java learning and interview guide that covers the core knowledge most Java programmers need to master. Preparing for a Java interview? Choose JavaGuide!
- guide-rpc-framework: a custom RPC framework implemented by netty + kyro + zookeeper
- jsoncat: a lightweight HTTP framework imitating spring boot but different from spring boot
- programmer-advancement: good habits programmers should have, plus must-know interview material
- springboot-guide: not only spring boot but also important knowledge of spring
- awesome-java: collection of awesome java project on GitHub
Original pdf download address of some computers: https://pan.baidu.com/s/1dDoGv-Qlz2pJOcLJHHpxpw (password: LLST)