What is the “state” in stateless HTTP?



Recently, while learning about HTTP, I found I couldn’t get past the very first sentence people say about it, [the HTTP protocol is stateless and connectionless]: what does the [state] in stateless actually mean?!

I searched through a lot of material. Not only did I fail to find anyone who answered this question directly, but some of the explanations were full of mistakes. Reading them left me with a frustration I couldn’t get off my chest

So, after all that reading, I posed the question myself: what exactly does the [state] in “the HTTP protocol is stateless” mean?!

And then I set out to explore and solve it…

In the end, I’m very happy to say that I found a satisfactory answer. I’ll keep you in suspense for now; if you are in a hurry, you can skip straight to the bottom

What exactly does the [state] in “the HTTP protocol is stateless” mean?!

Let’s first look at the other two concepts in this sentence: (the standard HTTP protocol is stateless and connectionless)

1. The standard HTTP protocol means HTTP on its own, without cookies, sessions or application-scope objects. None of these is part of HTTP itself (cookies, for instance, are defined in a separate specification layered on top of HTTP), although network application providers, implementation languages and web containers all support them by default

2. What connectionless means

(1) Each visit is handled without a lasting connection. The server works through the queue of incoming visits one by one. Once it has handled one, it closes the connection and that visit is over; then it moves on to the next new visit

(2) In other words, connectionless means that each connection handles only one request. Once the server has processed the client’s request and the client has received the server’s response, the connection is closed
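
To make the "one request per connection" idea concrete, here is a minimal Python sketch. It is an illustration under assumptions, not something from the original discussion: the names CountingServer, OneRequestHandler and run_demo are made up for this example. It relies on the fact that Python's built-in BaseHTTPRequestHandler speaks HTTP/1.0 by default, so the server closes the connection after each response and the client has to open a new one for the next request.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingServer(HTTPServer):
    """Hypothetical helper: an HTTP server that counts accepted connections."""
    connections = 0


class OneRequestHandler(BaseHTTPRequestHandler):
    # protocol_version defaults to "HTTP/1.0": one request, then close.

    def setup(self):
        # A handler instance is created once per TCP connection.
        self.server.connections += 1
        super().setup()

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def run_demo(request_count=3):
    """Send several requests and report how the connection was treated."""
    server = CountingServer(("127.0.0.1", 0), OneRequestHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    will_close_flags = []
    try:
        port = server.server_address[1]
        # One client object is reused; it reconnects after the server hangs up.
        client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        for _ in range(request_count):
            client.request("GET", "/")
            response = client.getresponse()
            response.read()
            will_close_flags.append(response.will_close)
        client.close()
    finally:
        server.shutdown()
        server.server_close()
    return will_close_flags, server.connections
```

Under these assumptions, run_demo(3) should report that every response was marked for closing and that the server accepted three separate connections, one per request.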

For [stateless], I came across many vague statements (official ones and tutorial ones) that felt like looking through a layer of frosted glass. They made me very uncomfortable, even though they are actually correct. (Later I worked out why: they introduce a lot of new, obviously broad terms that could be used in many contexts, and such terms mostly serve to blur the concept. I list the statements below.)

1. The protocol has no memory of previous transactions

2. There is no context linking requests for the same URL

3. Each request is independent: its execution and result have no direct relation to the requests before or after it, and it is not directly affected by their responses

4. The server does not save the client’s state; the client must bring its own state along with every request to the server

I wanted an exact, concrete explanation!

These statements gave me directions to think in:

1. [The server does not save the client’s state; the client must bring its own state along with every request to the server] Does the client’s state here simply mean that the server does not store the client’s information? That is obviously not the case

2. [The stateless nature of HTTP seriously hinders the implementation of these applications. After all, interaction needs continuity between what came before and what comes after; even a simple shopping cart program needs to know which products the user has already chosen] My doubt: why can’t a shopping cart be implemented over stateless HTTP? Can’t the server store things?

3. [Each request is independent: its execution and result have no direct relation to the requests before or after it] I find this statement more reliable, but does “no relation between requests” mean that the contents of the requests are unrelated, or only that the requests themselves are unrelated?

(1) If the contents are unrelated, that could only be because the server holds no user data, but it obviously does

(2) If the requests themselves are unrelated, what does that even mean? What value does each request have?

Following this direction, I ran a thought experiment: with no cookies and no sessions, only HTTP, here is what happens when a registered user visits a shopping website:

1. Premises:

(1) The server has certainly set up a record in its data tables for each registered user, to store that user’s data

(2) HTTP is connectionless

2. First, the user logs in

The user sends the user name and password to the server over HTTP, and the server compares them with the user data it has stored. If they match, it returns a response saying that the login succeeded

3. Then the user clicks through to a product page

(1) This action is equivalent to entering the URL of that product page

(2) Suppose the product page is restricted and not open to the public, so you must be a logged-in user to access it

(3) HTTP can carry the user name and password, and you entered them just a moment ago and were verified successfully. But the server does not remember that you are logged in, and your client has not stored the user name and password you just entered

(4) So this visit can only fail, because your identity cannot be determined

At this point, if you want to solve the problem without cookies or sessions, the only option is to send your user name and password along with every visit (typing them in again each time), much like one app I use at the moment

4. Suppose the problem in the previous step is solved, that is, you manually enter your user name and password on every visit. Now the situation is this: you already have several items in your shopping cart and want to add one more, so you click the plus sign next to some item

(1) This action is also equivalent to entering a URL, one whose content is a request to add that product to your shopping cart

(2) The system first uses your user name and password to verify your identity, then accesses your record in the database and adds a row under the shopping cart attribute: the data for this product

(3) When the operation is complete, it returns a success message and the visit ends

5. OK, that ends the experiment. It seems we can get by without cookies and sessions, but in fact this approach has three big problems

(1) Every time you access content that requires permission, you have to enter your user name and password in the client. There is no need to dwell on how tedious that is

(2) Every single operation has to interact with the database at the bottom of the system

Making many small visits like this wastes a great deal of performance. It is natural to think that handling operations in bulk would be more efficient, which brings a cache to mind

(3) Your unimportant, trivial data gets written into the database alongside your main data

Adding and removing cart items over and over relates only to this browsing session. It is temporary data that has nothing to do with the user’s main information; it has little value and is essentially redundant (though some companies consider this kind of data very valuable and put it to clever use). How should such temporary data be stored? Again, a buffer naturally comes to mind
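
The experiment above can be sketched in a few lines of Python. This is only an illustration under assumptions: the Database class, the handle function, the paths and the user alice are hypothetical names invented here, and passwords are compared in plain text purely for brevity. The point is that the handler learns everything from the single request in front of it, so credentials travel with every request and every operation touches the database.

```python
class Database:
    """Stand-in for the server's user table; counts every access."""

    def __init__(self):
        self.passwords = {"alice": "s3cret"}  # hypothetical registered user
        self.carts = {"alice": []}
        self.hits = 0

    def password_matches(self, username, password):
        self.hits += 1
        return username in self.passwords and self.passwords[username] == password

    def add_to_cart(self, username, item):
        self.hits += 1
        self.carts[username].append(item)


def handle(db, request):
    """Stateless handler: everything it knows comes from this one request."""
    username = request.get("username")
    password = request.get("password")
    if not db.password_matches(username, password):
        return 401  # identity unknown, so the protected page is refused
    if request["path"] == "/cart/add":
        db.add_to_cart(username, request["item"])
    return 200


db = Database()
creds = {"username": "alice", "password": "s3cret"}

login = handle(db, {"path": "/login", **creds})
# The next click carries no credentials, and the server remembers nothing.
product_without_creds = handle(db, {"path": "/product"})
# Workaround from step 3: resend the credentials with every request.
product_with_creds = handle(db, {"path": "/product", **creds})
for item in ("book", "pen"):
    handle(db, {"path": "/cart/add", "item": item, **creds})
```

In this sketch the product request without credentials is refused with 401, and the five requests together make seven database accesses: one credential check per request plus one write per cart addition.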

After this thought experiment, combined with the earlier lines of thinking, we know three things:

1. The server must hold the user’s data and can handle the create, delete, update and query operations you submit. So the state in [the server does not save the client’s state] does not mean the user’s data; that guess was wrong

2. Our doubt was justified: a shopping cart over stateless HTTP can be implemented using the user data stored on the server

3. However, implementing the shopping cart this way has the three major problems above. So we can’t help wondering: are the solutions to these three problems related to the word “state”, whose exact meaning we still don’t know? Next, then, we explore the meaning of state by solving these three problems

As noted above, we can add some mechanisms on top of HTTP to solve the three problems

1. The client clearly needs a notebook of its own. This is exactly what the official cookie mechanism provides and, as discussed above, its job is to identify the visitor

2. Adding a cache on the server solves the latter two problems at once

(1) With the cache acting as a data buffer, you no longer need to hit the database again and again, wasting a lot of computing resources. Instead, you write to the database once at the end

(2) With the cache, temporary data does not need to go into the database at all. Once the communication is over, you sort through the data and store only the useful part in the database

3. From here an important concept follows naturally: the session, a buffer kept separate from the database. The reason for keeping it separate is not hard to see: it has a unique, important and irreplaceable role. This is exactly what the official session mechanism is

(1) Incidentally, there is a very misleading view of the session’s main role: that the value of a session lies in assigning visitors a session ID to use instead of a user name and password

(2) Why is it so misleading? Because a session really does do this, and it is very useful, so the view is right, but only half right, and it misses the essence of the problem. This is the most dangerous situation: the explanation seems convincing, so you lose the motivation to keep looking, yet the truth deviates from it, and because the deviation is small it is hard to be pulled back. Something is only slightly off, and at that moment you are at once closest to the truth and farthest from it

(3) While we are here, let’s see why it is right, that is, the other useful thing a session does

(a) Each session is given an ID. On the one hand, this makes lookup easier for the server itself; on the other hand, the ID is handed to the user, who can then present it on the next visit to prove their identity without a user name and password

(b) First of all, is this ID secure? Is it more secure than sending the user name and password directly?

You might easily think: a user name and password together are quite complex, so isn’t replacing them with a single new string of digits too unsafe?

We know that plain HTTP is completely unencrypted. If a user name and password are used, they are put in the HTTP header on the first visit, and the password may then be saved in a cookie. All of this is unencrypted, so the security is basically zero. It is running naked: once it is intercepted, it is lost

So in this sense, the security of a session ID is no different from that of a user name and password

In practice, although HTTP itself does not encrypt anything, software can add protection at the application level. QQ, for example, hashes the account name and password together with a temporary verification code. Hashing the session ID together with a timestamp is a similarly simple and very common method

Moreover, a session ID has a validity period, so even if it is stolen it may soon expire and the damage may be limited. If the user name and password are stolen, the damage is far greater

So the conclusion is:

A session ID without strict encryption is, like a user name and password, not very secure

But by comparison, the session ID is somewhat safer

And with HTTPS, both are encrypted in transit, which is the secure option

(c) So what are the benefits of using a session ID?

The user’s session can be looked up directly by ID, which is convenient

When encrypting, the amount of computation is small

Security is not reduced, and may even be higher
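
The cookie-plus-session idea from the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how any particular web container does it: SessionStore, its methods and the plain dict standing in for the database are hypothetical names. The session ID is a random token that the client would carry in a cookie, the cart lives in a server-side buffer, the ID expires after a while, and the database is written only once, when the session ends.

```python
import secrets
import time


class SessionStore:
    """In-memory buffer of per-session temporary data, keyed by session ID."""

    def __init__(self, ttl_seconds=1800):
        self.ttl_seconds = ttl_seconds
        self.sessions = {}

    def create(self, username, now=None):
        """Called after a successful login; returns the ID for the cookie."""
        now = time.time() if now is None else now
        session_id = secrets.token_urlsafe(32)  # random, hard-to-guess ID
        self.sessions[session_id] = {
            "username": username,
            "cart": [],
            "expires_at": now + self.ttl_seconds,
        }
        return session_id

    def get(self, session_id, now=None):
        """Look up a session by ID; expired or unknown IDs give None."""
        now = time.time() if now is None else now
        session = self.sessions.get(session_id)
        if session is None or session["expires_at"] <= now:
            self.sessions.pop(session_id, None)  # drop expired entries
            return None
        return session

    def close(self, session_id, database):
        """End the session and write only the final cart to the database."""
        session = self.sessions.pop(session_id, None)
        if session is not None:
            database[session["username"]] = list(session["cart"])


store = SessionStore(ttl_seconds=60)
database = {}  # stand-in for the real database

sid = store.create("alice", now=1000.0)  # would be sent to the client in a cookie
store.get(sid, now=1010.0)["cart"].append("book")
store.get(sid, now=1020.0)["cart"].append("pen")
store.close(sid, database)  # one database write for the whole session

stale = store.create("alice", now=1000.0)
stale_lookup = store.get(stale, now=2000.0)  # None: this ID has expired
```

The timestamps are passed in explicitly only to keep the example deterministic; a real server would use the current time and would also send the ID over HTTPS.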

OK. By working through the problems of the pure HTTP mechanism on our own, we have arrived at the essence of the cookie and session mechanisms. The problems caused by [with the HTTP protocol, the server does not save the client’s state] were solved by adding cookies and sessions. Does that mean this [state] is closely tied to cookies and sessions? If so, stateless would mean [no cache is set up, on either the server or the client, to record the state of the current session]. But this still doesn’t seem to hit the crux, mainly because it doesn’t match the earlier official statements about state, or even correspond to them

Suddenly a question occurred to me: what would a stateful HTTP look like?

1. It’s hard to imagine stateful HTTP directly, because HTTP’s mechanism is inherently stateless

2. So let’s use an analogy. A mechanism that is inherently stateful is TCP

If stateful means that the requests are related to one another, then stateful TCP looks like this: when a piece of data is sent as three TCP packets, each packet is marked with which packet it is out of how many, which packets it is related to, and what that relation is

3. But this kind of statefulness in TCP seems to have nothing to do with the stateful HTTP we want, because even if HTTP requests were related to one another in this way, it would not solve the problems of stateless HTTP described above

4. But wait a moment. The analogy does seem to lead somewhere:

(1) If every HTTP connection carried a signature, then after the first successful login the server would know that this signature is allowed in, and every later HTTP connection with the same signature could log in too. This uses the fact that HTTP connections issued by the same user share an owner, and it solves the problem of keeping the login state

(2) In the same way, we try to use [every HTTP request is related to the others] to solve the problem that [every operation must interact with the system’s underlying database]. But after thinking for a long time, I really couldn’t make it work

(3) Then I had an idea. Seen from another angle, the problem seems to be solved

(a) The condition that HTTP requests are related to one another cannot, by itself, solve the problem that every operation must interact with the system’s underlying database

(b) Because, obviously, solving that problem requires setting up a cache on the server side

(c) But if you think about how [every HTTP request is related to the others] would be implemented, you find that it also requires a cache on the server side

(d) So [setting up a cache on the server side] is the real condition; that is, it is what [stateful] truly amounts to

(e) Moreover, I also found how the condition [setting up a cache on the server side] corresponds to the earlier official statements about state:

By setting up a cache on the server side to store, remember and share some temporary data, you get:

A protocol with memory of previous transactions

Context linking requests for the same URL

Requests that are not independent: their execution and results are directly related to the requests before and after them

The client’s state saved on the server [state]

(f) So this state, once you add the cookies on the client side, means the data produced by the client and the server during a temporary session! As I said earlier, using a cache to hold the data of a temporary session is extremely important

So state includes not only the relation between different URL visits, but also the data recorded from other URL visits, and some other things besides. To be more precise, then, state is the client’s temporary data held in [the cache space behind the implementation of all these things]

Cookies and sessions together are what fully implement statefulness
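
The conclusion that [a cache on the server side] is what stateful amounts to can be shown with a small contrast, sketched here in Python. The names stateless_view and StatefulServer, the session ID abc123 and the path are all invented for this illustration. Without a cache, two identical requests get identical answers; with a per-session cache, the same URL is answered in the context of what came before.

```python
def stateless_view(request):
    """No server-side memory: identical requests get identical responses."""
    return f"viewing {request['path']}"


class StatefulServer:
    """Keeps a per-session cache, so earlier requests shape later responses."""

    def __init__(self):
        self.cache = {}  # session ID -> list of paths seen in this session

    def view(self, request):
        history = self.cache.setdefault(request["session_id"], [])
        history.append(request["path"])
        return f"viewing {request['path']} (request #{len(history)} in this session)"


# The session ID would normally arrive in a cookie; it is hard-coded here.
request = {"session_id": "abc123", "path": "/product/42"}

server = StatefulServer()
stateless_replies = [stateless_view(request), stateless_view(request)]
stateful_replies = [server.view(request), server.view(request)]
```

Here the two stateless replies are the same string, while the stateful replies differ only because the server remembered the first request.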

Common misunderstandings about state:

1. Some people, when explaining that HTTP is stateless, set it in opposition to having a connection. That is, they say that to be stateful you must have a connection. That is not so

2. Connection-oriented, connectionless and Keep-Alive all describe the TCP connection

3. Stateful and stateless can describe either TCP or HTTP

4. TCP has always been stateful and HTTP has always been stateless. To get statefulness, applications add the cookie and session mechanisms on top of HTTP, so that applications using HTTP can be stateful while HTTP itself remains stateless

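5. As for the connection: early HTTP closed the TCP connection after every request, which is the connectionless behavior described earlier. With Keep-Alive, the TCP connection is now kept open across requests, which is a bit like having a connection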
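
Point 4 says that TCP has always been stateful. As a rough picture of what that means, here is a toy Python receiver that, loosely in the way TCP uses sequence numbers, remembers how far the byte stream has progressed and holds back segments that arrive early. The class InOrderReceiver and its fields are invented for this sketch; real TCP is far more involved (acknowledgements, windows, retransmission timers), so treat it as an analogy only.

```python
class InOrderReceiver:
    """Toy receiver that, like TCP, remembers where the byte stream stands."""

    def __init__(self):
        self.next_seq = 0    # state: offset of the next byte we expect
        self.pending = {}    # state: early segments, keyed by their offset
        self.delivered = b""

    def receive(self, seq, payload):
        if seq < self.next_seq:
            return  # duplicate of data that was already delivered
        self.pending[seq] = payload
        # Deliver every segment that is now contiguous with the stream.
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            self.delivered += chunk
            self.next_seq += len(chunk)


receiver = InOrderReceiver()
receiver.receive(5, b" worl")  # arrives early, so it has to wait
receiver.receive(0, b"hello")  # fills the gap; both segments are delivered
receiver.receive(10, b"d")
receiver.receive(0, b"hello")  # retransmission, ignored
```

How each segment is handled depends on what arrived before it, which is exactly the kind of memory that a plain HTTP request handler does not have.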

Author: rowing Captain