Illustrated HTTP authoritative guide (3) | web server processing and response to HTTP requests

Time:2021-8-23

Introduction to the author

Mr. Lemon is a senior operations and maintenance engineer (self-proclaimed) and an aspiring SRE expert who dreams of buying a Porsche by the age of 35. I like to study underlying technology and believe that fundamentals are king: every new technology builds on the operating system (CPU, memory, disk), the network, and so on. Keep taking in and putting out, record what you learn, and keep moving forward through ordinary days; one day you will meet a different version of yourself. Official account: operation and maintenance Wang (ID:Leeeee_) Li).


1、 Question

Explain how the web server handles HTTP transactions

2、 Web server

A web server processes HTTP requests and serves responses. Handling a request involves seven steps:

1) Accept the client connection

2) Receive the request message

3) Process the request

4) Map and access the resource

5) Build the response

6) Send the response

7) Log the transaction


1. Accept client connections

1) Process new connections

When a client requests a TCP connection to the web server, the server establishes the connection and determines which client is at the other end by extracting the IP address from the TCP connection. Once the new connection is established and accepted, the server adds it to its list of existing connections and prepares to watch for data on it. The web server is free to reject or immediately close any connection. Some web servers close connections because the client IP address or hostname is unauthorized or belongs to a known malicious client. Web servers can also use other techniques to identify clients.
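As a rough illustration of the accept step, the Python sketch below accepts one pending connection, reads the client IP address from it, and either tracks the connection or closes it. The deny list BLOCKED_IPS and the function name are assumptions made for this example; they do not come from any real server.

```python
import socket

# Hypothetical deny list; 203.0.113.0/24 is a documentation-only address range.
BLOCKED_IPS = {"203.0.113.7"}


def accept_client(listener, connections, blocked=BLOCKED_IPS):
    """Accept one pending connection; return the client IP, or None if rejected."""
    # For an IPv4 listener, accept() returns (socket, (host, port)).
    conn, (client_ip, _client_port) = listener.accept()
    if client_ip in blocked:
        conn.close()  # the server may close any connection at will
        return None
    connections.append(conn)  # track it so later data can be watched for
    return client_ip
```

A real server would call something like this in a loop and hand each tracked connection to whatever I/O architecture it uses.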

2) Client hostname identification

Most web servers can be configured to use “reverse DNS” to convert client IP addresses into client hostnames. The web server can use the client hostname for detailed access control and logging. Note that hostname lookups can take a long time and slow down web transactions. Many high-capacity web servers disable hostname resolution or enable it only for specific content.

In Apache, the HostnameLookups configuration directive enables hostname lookups. For example, the following configuration turns on hostname resolution only for HTML and CGI resources:

HostnameLookups off
<Files ~ "\.(html|htm|cgi)$">
   HostnameLookups on 
</Files>
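The same idea can be sketched in Python: resolve the hostname only for selected resource types and fall back to the IP address otherwise. The function name and the injectable resolver parameter are assumptions for illustration; socket.gethostbyaddr is the standard-library reverse lookup.

```python
import socket

# Extensions that get a lookup, mirroring the <Files> pattern above.
LOOKUP_EXTENSIONS = (".html", ".htm", ".cgi")


def client_hostname(client_ip, request_path, resolver=socket.gethostbyaddr):
    """Return the client's hostname, or its IP when the lookup is skipped or fails."""
    if not request_path.endswith(LOOKUP_EXTENSIONS):
        return client_ip  # lookups are off for this kind of resource
    try:
        hostname, _aliases, _addresses = resolver(client_ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return client_ip
```

Passing the resolver as a parameter keeps the slow network call out of tests and makes it easy to swap in a cached lookup.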

3) Identify client users through ident

The server can use the ident protocol to find out which user name initiated the HTTP connection. This information is particularly useful for web server logging: the second field of the popular Common Log Format contains the ident user name of each HTTP request.

1) The client opens an HTTP connection

2) The server opens its own connection back to the client’s ident service on port 113

3) The server sends a simple request asking for the user name that corresponds to the new connection, and parses the response containing the user name from the client

Apache’s IdentityCheck on directive tells the Apache web server to use ident lookups. If no ident information is available, Apache fills the ident log field, the second field of a Common Log Format log file, with a hyphen (-).
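The exchange in step 3 can be sketched as two small Python helpers. The message layout follows the ident protocol (RFC 1413): the query is a pair of port numbers, and the reply carries either a USERID or an ERROR field. The function names are hypothetical, and stripping whitespace around the user name is a simplification.

```python
def build_ident_query(client_port, server_port):
    """Build the query line sent to the client's ident service on port 113.

    The port on the queried machine (the HTTP client) comes first, then the
    port on the querying machine (the web server).
    """
    return f"{client_port}, {server_port}\r\n".encode("ascii")


def parse_ident_reply(line):
    """Return the user name from an ident reply, or None for an ERROR reply."""
    fields = line.decode("ascii", errors="replace").strip().split(":", 3)
    if len(fields) == 4 and fields[1].strip() == "USERID":
        return fields[3].strip()
    return None


def ident_log_field(reply):
    """Value for the ident field of a log entry: the user name or a hyphen."""
    user = parse_ident_reply(reply) if reply else None
    return user or "-"
```

For example, a reply of "6193, 80 : USERID : UNIX : stjohns" yields the user name stjohns, while an ERROR reply or no reply at all yields the hyphen that ends up in the log.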

2. Receive request message

When data arrives on a connection, the web server reads it from the network connection and parses out the pieces of the request message.

1) Work of the web server when parsing the request message

1. Parse the request line, looking for the request method, the specified resource identifier (URI), and the version number. The items are separated by single spaces, and the line ends with a carriage-return line-feed (CRLF) sequence;

2. Read the message headers, each ending in CRLF;

3. Detect the blank line (a bare CRLF) that marks the end of the headers;

4. Read the request body, if any (its length is given by the Content-Length header)
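The four parsing steps above can be sketched in Python as follows. This is a minimal illustration that assumes the whole request is already in memory; the ParsedRequest class and parse_request function are names invented for the example, and real servers read incrementally and validate far more.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedRequest:
    method: str
    uri: str
    version: str
    headers: dict = field(default_factory=dict)  # keyed by lower-case header name
    body: bytes = b""


def parse_request(raw: bytes) -> ParsedRequest:
    # Step 3: the blank line (CRLF CRLF) separates the headers from the body.
    head, separator, rest = raw.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("incomplete request: no blank line after the headers")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Step 1: the request line holds method, URI, and version, space-separated.
    method, uri, version = lines[0].split(" ")
    # Step 2: each remaining line is one "Name: value" header.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Step 4: read as many body bytes as Content-Length announces.
    length = int(headers.get("content-length", "0"))
    return ParsedRequest(method, uri, version, headers, rest[:length])
```

Keeping the headers in a dictionary gives the rest of the server quick access to any individual header value.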

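2) Internal representation of the message

Some web servers also store the request message in internal data structures that make it easy to manipulate, for example with the headers kept in a fast lookup table.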


3) Connection I/O processing architectures

High-performance web servers can support thousands of simultaneous connections, and each client may have one or more connections open to the server. Different web servers service requests in different ways.

Single-threaded web servers: A single-threaded web server processes one request at a time until completion. When the transaction is complete, the next connection is processed.

Multiprocess and multithreaded web servers: These servers dedicate multiple processes, or more efficient threads, to processing requests simultaneously.

Multiplexed I/O servers: In a multiplexed architecture, all connections are watched for activity at the same time. When a connection changes state, a small amount of processing is performed on it; when that processing is complete, the connection is returned to the open connection list to wait for the next state change. Work is done on a connection only when there is something to do, so threads and processes are not tied up waiting on idle connections.

Multiplexed multithreaded web servers: These servers combine multithreading and multiplexing to take advantage of multiple CPUs in the computer. Multiple threads (often one per physical processor) each watch the open connections (or a subset of them) and perform a small amount of work on each connection.
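To make the multiplexed architecture more concrete, here is a minimal Python sketch built on the standard selectors module. It performs one pass over the connections whose state has changed and leaves idle ones alone. The function name is hypothetical, and upper-casing the data merely stands in for real request processing.

```python
import selectors
import socket


def service_ready_connections(selector, timeout=1.0):
    """Handle each connection whose state changed; return how many were handled."""
    handled = 0
    for key, _events in selector.select(timeout):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            conn.sendall(data.upper())  # stand-in for real request processing
        else:
            selector.unregister(conn)  # peer closed: drop it from the open list
            conn.close()
        handled += 1
    return handled
```

A server would register every accepted socket with selectors.EVENT_READ and call this function in a loop; connections with nothing to read cost no work in any given pass.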


3. Processing requests

Once the web server has received a request, it processes the request using the method, resource, headers, and optional body.

4. Mapping and access to resources

1) Docroot (document root)

The web server’s file system has a directory reserved for web content, called the document root (docroot). When the document root of a web server is /usr/local/httpd/files and a request for /special/s.gif arrives, the server appends the URI to the document root and returns the file /usr/local/httpd/files/special/s.gif.

In Apache, a DocumentRoot line in the httpd.conf configuration file sets the document root:

DocumentRoot /usr/local/httpd/files
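The mapping from request URI to file path can be sketched in Python as below. The function name is an assumption for this example. Collapsing ".." segments so that a request cannot climb out of the document root is an extra precaution added here, not something the configuration above spells out.

```python
import posixpath

DOCROOT = "/usr/local/httpd/files"


def map_uri_to_path(uri_path, docroot=DOCROOT):
    """Map the path part of a request URI to a file path under the docroot."""
    # normpath collapses "." and ".." segments; anchoring the path at "/"
    # first means ".." can never climb above the document root.
    normalized = posixpath.normpath("/" + uri_path.lstrip("/"))
    return docroot + normalized
```

With the document root above, a request for /special/s.gif maps to /usr/local/httpd/files/special/s.gif, and an attempt such as /../../etc/passwd still resolves to a path inside the document root.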

2) Virtually hosted docroots

A virtually hosted web server hosts multiple web sites on the same server, and each site has its own document root on that server. The server identifies the correct document root from the IP address or hostname in the URI or the Host header. This way, even if the request URIs are identical, two web sites hosted on the same web server can have completely different content.

When request A comes in, the server fetches /doc/aaa/index.html

When request B comes in, the server fetches /doc/bbb/index.html
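A minimal Python sketch of choosing the document root from the Host header might look like this. The host names and paths are the ones used in this article; the function name, the fallback document root, and the assumption that headers arrive in a dictionary keyed by lower-case names are illustrative only.

```python
# Site table using the host names and document roots from this article.
VIRTUAL_HOSTS = {
    "www.aaa.com": "/doc/aaa",
    "www.bbb.com": "/doc/bbb",
}


def docroot_for(headers, default_docroot="/usr/local/httpd/files"):
    """Pick the document root for a request from its Host header."""
    host = headers.get("host", "")
    # Drop an optional ":port" suffix (this simple split ignores IPv6 literals).
    hostname = host.split(":", 1)[0].strip().lower()
    return VIRTUAL_HOSTS.get(hostname, default_docroot)
```

Two requests for the same URI /index.html therefore resolve to different files depending only on the Host header they carry.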

3) User home directory docroots

Another common use of docroots gives users private web sites on the server. URIs for these sites usually begin with a slash and a tilde (/~) followed by the user name, and the server maps each one to that user’s private docroot.

When request A comes in, the server fetches /home/Mary/index.html

When request B comes in, the server fetches /home/Ken/index.html

In Apache, the two virtual hosts from the previous section, www.aaa.com and www.bbb.com, can be configured as follows:

<VirtualHost www.aaa.com>
    ServerName www.aaa.com
    DocumentRoot /doc/aaa
    TransferLog /logs/aaa.access_log
    ErrorLog /logs/aaa.error_log
</VirtualHost>

<VirtualHost www.bbb.com>
    ServerName www.bbb.com
    DocumentRoot /doc/bbb
    TransferLog /logs/bbb.access_log
    ErrorLog /logs/bbb.error_log
</VirtualHost>

5. Build response

Once the web server has identified the resource, it performs the action described by the request method and returns the response message. The response message contains a status code, response headers, and a response body if one was generated.

1) Response body

If the transaction generated a response body, the content is sent back with the response message. If there is a body, the response message usually contains:

a) A Content-Type header describing the MIME type of the response body;

b) A Content-Length header describing the size of the response body;

c) The actual message body content
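The three parts listed above can be assembled in a few lines of Python. This is only a sketch: the function name and the default content type are assumptions, and a production server would add further headers such as Date and Server.

```python
def build_response(status, reason, body=b"", content_type="text/html"):
    """Assemble a status line, entity headers, and an optional body."""
    lines = [f"HTTP/1.1 {status} {reason}"]
    if body:
        lines.append(f"Content-Type: {content_type}")  # MIME type of the body
        lines.append(f"Content-Length: {len(body)}")   # body size in bytes
    head = "\r\n".join(lines) + "\r\n\r\n"  # a blank line ends the headers
    return head.encode("ascii") + body
```

Content-Length is computed from the encoded bytes, not from a character count, which matters as soon as the body contains non-ASCII text.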

2) MIME type

The web server is responsible for determining the MIME type of the response body.

a) Determining the MIME type from the extension

The web server can use a file’s extension to determine its MIME type. For each resource, the server looks up the extension in a file that lists the MIME type for every extension, and uses the result to set the Content-Type header of the response.

b) Magic typing

The Apache web server can scan the contents of each resource and match them against a table of known patterns (the magic file) to determine the MIME type of each file.

c) Explicit typing

The web server can be configured to force particular files or directories to have a given MIME type, regardless of file extension or contents.

d) Type negotiation

The web server can be configured to decide which format to use, and the associated MIME type, by negotiating with the user.
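The first three techniques can be combined in one small Python sketch: an explicit override wins, then the extension table, then a check of the leading bytes. The tiny extension table (in the "type followed by extensions" layout of a mime.types file), the three byte signatures, the lookup order, and the function names are all assumptions chosen for illustration; they are not Apache's actual tables or algorithm.

```python
# A tiny stand-in for a mime.types file: one MIME type per line, then extensions.
MIME_TYPES_TEXT = """\
# type            extensions
text/html         html htm
image/gif         gif
image/jpeg        jpeg jpg
"""

# A tiny stand-in for a magic file: (leading bytes, MIME type).
MAGIC_PATTERNS = [
    (b"GIF8", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
]


def load_mime_types(text):
    """Parse mime.types-style text into an {extension: type} table."""
    table = {}
    for line in text.splitlines():
        parts = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        for ext in parts[1:]:
            table[ext.lower()] = parts[0]
    return table


def content_type(path, data=b"", forced=None, table=None,
                 default="application/octet-stream"):
    """Pick a MIME type: explicit override, then extension, then magic bytes."""
    if forced:
        return forced  # explicit typing
    table = load_mime_types(MIME_TYPES_TEXT) if table is None else table
    _, dot, ext = path.rpartition(".")
    if dot and ext.lower() in table:
        return table[ext.lower()]  # extension-based typing
    for prefix, mime in MAGIC_PATTERNS:
        if data.startswith(prefix):
            return mime  # magic typing
    return default
```

Extension lookup is cheap because it never opens the file, whereas magic typing has to read the content, which is one reason a server might prefer the extension when both are available.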

3) Redirect

The web server sometimes returns a redirect response instead of a success message. A redirect tells the browser to go elsewhere to perform the request. Redirect responses use 3XX status codes, and the Location response header contains the URL of the new or preferred location of the content.

a) Permanently moved resources

The resource may have been moved to a new location or renamed, giving it a new URL. The web server can tell the client that the resource has been renamed, so that the client can update bookmarks and similar records before fetching the resource from its new location. The status code 301 Moved Permanently is used for this kind of redirect.

b) Temporarily moved resources

The resource was temporarily moved or renamed. The server may want to redirect the client to the new location, but because the renaming is temporary, it wants the client to come back to the old URL in the future and not update any bookmarks. The status codes 303 See Other and 307 Temporary Redirect are used for this kind of redirect.

c) URL augmentation

Servers often use redirects to rewrite URLs, typically to embed context. When the request arrives, the server generates a new URL containing embedded state information and redirects the user to it. The client follows the redirect and reissues the request, this time with the full, state-augmented URL. The status codes 303 See Other and 307 Temporary Redirect are used for this kind of redirect.

d) Load balancing

If an overloaded server receives a request, it can redirect the client to a less heavily loaded server. The status codes 303 See Other and 307 Temporary Redirect are used for this kind of redirect.

e) Server affinity

Web servers may hold local information for certain users. A server can redirect the client to the server that holds information about that client. The status codes 303 See Other and 307 Temporary Redirect are used for this kind of redirect.

f) Canonicalizing directory names

When a client requests a URI for a directory name without a trailing slash, most web servers redirect the client to the URI with the slash added, so that relative links work correctly.
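A redirect is just a short response with a 3XX status line and a Location header, as the following Python sketch shows. The function names are hypothetical, and using 301 for the trailing-slash case is an assumption made for this example, since the text above does not name a status code for it.

```python
REASONS = {
    301: "Moved Permanently",
    303: "See Other",
    307: "Temporary Redirect",
}


def redirect_response(status, location):
    """Build a bodiless redirect response pointing the client at `location`."""
    head = (
        f"HTTP/1.1 {status} {REASONS[status]}\r\n"
        f"Location: {location}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )
    return head.encode("ascii")


def canonical_directory_redirect(uri_path, is_directory):
    """Redirect /dir to /dir/ so relative links resolve; None if not needed."""
    if is_directory and not uri_path.endswith("/"):
        return redirect_response(301, uri_path + "/")  # 301 is an assumption here
    return None
```

The caller is expected to know whether the mapped resource is a directory, for instance by checking the file system after the docroot mapping step.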

6. Send response

The web server sends data over connections in much the same way it receives it. The server may have many connections to many clients: some idle, some sending data to the server, and some carrying response data back to the clients. The server needs to keep track of connection state and handle persistent connections with special care. For non-persistent connections, the server is expected to close its side of the connection once the entire message has been sent.
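Whether a connection stays open after the response can be decided from the HTTP version and the Connection header. The Python sketch below encodes the usual defaults (HTTP/1.1 connections persist unless "close" is sent; HTTP/1.0 connections close unless "keep-alive" is sent). The function name and the lower-case header dictionary are assumptions for illustration.

```python
def should_close_after_response(version, headers):
    """Decide whether the server should close the connection after replying."""
    connection = headers.get("connection", "")
    tokens = {token.strip().lower() for token in connection.split(",")}
    if "close" in tokens:
        return True  # either version: an explicit close wins
    if version == "HTTP/1.1":
        return False  # HTTP/1.1 connections are persistent by default
    return "keep-alive" not in tokens  # HTTP/1.0 must opt in to persistence
```

When this returns False, the response needs a correct Content-Length (or another framing mechanism) so the client can tell where the message ends without waiting for the connection to close.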

7. Log

When the transaction ends, the web server adds an entry in the log file to describe the executed transaction.
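As an illustration, a log entry in the Common Log Format mentioned earlier can be produced with a short Python function. The function name and argument order are assumptions; the field layout (host, ident, user, timestamp, request line, status, size) is that of the Common Log Format, with a hyphen for any missing value.

```python
from datetime import datetime, timezone


def common_log_line(host, request_line, status, size, when, ident="-", user="-"):
    """Format one transaction as a Common Log Format entry."""
    # `when` should be timezone-aware so that %z yields an offset such as +0000.
    timestamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    return f'{host} {ident} {user} [{timestamp}] "{request_line}" {status} {size}'
```

The second field is the ident user name discussed earlier, which is why it is usually a hyphen in real log files.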