Solving the distributed session problem

Time:2021-9-17

Session

Every programmer is familiar with sessions and has used them in one project or another. The word “session” is actually an abstract concept; unlike a cookie, it has no precise definition. When most programmers talk about a session, they mean the session object in which the server stores data. For example, after a user logs in successfully, the user information is stored in the session, with code similar to this:

 // After a successful login, store the user object in the server-side session.
 Session["User"] = new User();

 public class User
 {
     public int UserId { get; set; }
     public string UserName { get; set; }
 }

In computing, and especially in network applications, a session is a conversation: it can be regarded as a logical channel between the client and the server, and all requests from the same user share the same session. In most applications it is mainly used to identify users. Put simply, the server uses sessions to record state for each user. The rest of this article discusses the most commonly used form, the server-side session object.

Standalone session

A session is stored on the server, which is a very important point. It means that sessions occupy server memory, so a release mechanism (such as expiration or LRU eviction) is needed to keep the server from running out of memory.
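
As a rough illustration of such a release mechanism, here is a minimal Python sketch of a single-server session store. The class name `InMemorySessionStore`, its parameters, and the sliding-expiration policy are assumptions made for this example, not something a particular framework prescribes.

```python
import secrets
import time
from collections import OrderedDict


class InMemorySessionStore:
    """Hypothetical single-server session store with TTL expiry and LRU eviction."""

    def __init__(self, max_sessions=1000, ttl_seconds=1800):
        self.max_sessions = max_sessions
        self.ttl_seconds = ttl_seconds
        # session_id -> (expires_at, data); the order tracks how recently each was used
        self._sessions = OrderedDict()

    def create(self, data, now=None):
        now = time.time() if now is None else now
        session_id = secrets.token_hex(16)
        self._sessions[session_id] = (now + self.ttl_seconds, data)
        # Evict the least recently used sessions once the cap is exceeded.
        while len(self._sessions) > self.max_sessions:
            self._sessions.popitem(last=False)
        return session_id

    def get(self, session_id, now=None):
        now = time.time() if now is None else now
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        expires_at, data = entry
        if expires_at <= now:
            # Expired: release the memory and treat the session as gone.
            del self._sessions[session_id]
            return None
        # Sliding expiration: each access renews the TTL and marks the session as recent.
        self._sessions[session_id] = (now + self.ttl_seconds, data)
        self._sessions.move_to_end(session_id)
        return data


store = InMemorySessionStore(max_sessions=2, ttl_seconds=60)
first = store.create({"UserId": 1, "UserName": "alice"}, now=0)
second = store.create({"UserId": 2, "UserName": "bob"}, now=1)
store.get(first, now=2)  # touch "first" so that "second" becomes the LRU entry
third = store.create({"UserId": 3, "UserName": "carol"}, now=3)
```

With a cap of two sessions, creating the third one evicts the session that was used least recently, and any session that is not accessed within its TTL disappears on the next lookup.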

In the early stage of a project, the goal is to go online quickly, so the application is often deployed on a single server, and the session mechanism is widely used to record users’ login status. This is not unreasonable: at least early on, it is the simplest and fastest solution. As the project iterates and the number of users grows, the single machine becomes the biggest performance bottleneck. At that point, most architects choose to scale horizontally.

In the final analysis, improving system performance revolves around one word: “split”. Whether it is splitting a database into multiple databases and tables (sharding) or the more recent microservices, the work is always partitioned along some dimension.

When a single-machine session mechanism is scaled horizontally, one problem must be solved: how to preserve session affinity (stickiness)?

Distributed session

When a stand-alone system is expanded into a distributed system, it faces the choice between AP and CP described by the CAP theorem. For details, see the previous article:

Is the obscure CAP completely correct?

For distributed sessions, consistency mainly means solving session affinity: how do we ensure that requests from the same user reach a server that holds that user’s session data?

Session replication

The earliest approach was session replication. The process is simple: suppose there are three servers. When a session is created on one of them, it is copied to the other two at the same time. That way, whichever server a user’s request reaches, the corresponding session data is there.

The advantage of this scheme is that servers can be added freely, since each server holds all session information. When a server joins, it only needs to copy all existing sessions. But the disadvantages are more significant:

  • Every server stores every session, which greatly increases the memory each server consumes.
  • Session synchronization consumes network bandwidth. More importantly, if replication is asynchronous, the data is temporarily inconsistent, which may cause user requests to fail.
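
The replication flow and its memory cost can be sketched in a few lines of Python. The `ReplicatedNode` class and the helper functions are hypothetical names for this example, and replication is modeled as a synchronous in-process copy rather than real network traffic.

```python
import secrets


class ReplicatedNode:
    """Hypothetical application server that keeps a full copy of every session."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}
        self.peers = []

    def create_session(self, data):
        session_id = secrets.token_hex(16)
        self.sessions[session_id] = data
        # Synchronous replication: copy the new session to every peer before returning.
        for peer in self.peers:
            peer.sessions[session_id] = dict(data)
        return session_id

    def get_session(self, session_id):
        return self.sessions.get(session_id)


def connect(nodes):
    """Make every node aware of all the other nodes."""
    for node in nodes:
        node.peers = [other for other in nodes if other is not node]


def join_cluster(nodes, name):
    """Add a server: it first copies all existing sessions from a peer."""
    new_node = ReplicatedNode(name)
    new_node.sessions = {sid: dict(data) for sid, data in nodes[0].sessions.items()}
    nodes.append(new_node)
    connect(nodes)
    return new_node


nodes = [ReplicatedNode(name) for name in ("server-1", "server-2", "server-3")]
connect(nodes)
session_id = nodes[0].create_session({"UserId": 1, "UserName": "alice"})

# One logical session now occupies memory on all three servers.
total_copies = sum(len(node.sessions) for node in nodes)
```

Any node can answer a lookup for `session_id`, but a single session is stored three times, and the number of copies grows with every server that joins.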

Few people use session replication today.

Load balancing scheme

When one server is expanded to several, the most common solution is to add a load balancer at the traffic entrance. The general deployment looks like this:

[Figure: a load balancer at the traffic entrance distributing requests to multiple application servers]

If the load balancer can make sessions sticky by some means, distributed sessions work. The widely used nginx, for example, can pin requests from the same client IP to one server with its “ip_hash” load-balancing method, so that requests from the same IP always reach the same server.
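
The idea behind IP-based stickiness can be sketched in Python as follows. This is a simplification under stated assumptions: the `pick_server` function, the MD5-plus-modulo mapping, and the server names are made up for illustration and are not nginx’s actual algorithm. The sketch also counts how many clients would be routed elsewhere if a fourth server were added.

```python
import hashlib


def pick_server(client_ip, servers):
    """Map a client IP to one server by hashing it (a simplification, not nginx's exact algorithm)."""
    digest = hashlib.md5(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


servers = ["app-1", "app-2", "app-3"]
client_ips = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]

# Stickiness: the same IP is routed to the same server on every request.
before = {ip: pick_server(ip, servers) for ip in client_ips}

# Scale out to four servers and count the clients whose target changed.
after = {ip: pick_server(ip, servers + ["app-4"]) for ip in client_ips}
moved = sum(1 for ip in client_ips if before[ip] != after[ip])
print(f"{moved} of {len(client_ips)} clients now reach a different server")
```

With plain modulo hashing and a uniform hash, about three quarters of clients would be expected to change servers when going from three to four, which is why sessions held only in the memory of the original server become unreachable for those users.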

This method is much better than session replication. Each server stores only its own sessions, which saves a great deal of memory, and no data is synchronized between servers. When a new server is added, only the load balancer configuration needs to change, which makes horizontal scaling convenient. However, it also has some shortcomings:

  • Restarting a server loses the sessions it holds, which is unacceptable in some important business scenarios.
  • Scaling out requires changing the load balancer configuration. After the change, existing sessions may be redistributed, so some users’ requests are no longer routed to the server that holds their session.

Session stripping

The most widely used distributed session technique today is to separate session data completely from the business servers and store it in a dedicated external store, which can run in master-slave or even cluster mode for high availability. The most common choice is to store session data in Redis. Reading and writing session data in Redis costs some network time, but this is acceptable for most applications.

The advantage of this scheme is that the overall architecture is clearer and more flexible. The application servers can scale without any concern for sessions, because the session problem is moved to the external store. An in-memory NoSQL store generally solves the performance problem, and such stores usually come with their own distributed clustering options. Redis, for example, can run in master-slave, sentinel, or cluster mode to support larger data volumes.
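
A minimal Python sketch of this arrangement follows. To stay runnable without a real Redis server, it uses a hypothetical in-memory stand-in, `FakeRedis`, whose `setex` and `get` methods mirror the Redis SETEX and GET commands; the `AppServer` class and the `session:` key prefix are likewise assumptions for this example.

```python
import json
import secrets
import time


class FakeRedis:
    """In-memory stand-in for a Redis server, used only to keep the sketch runnable."""

    def __init__(self):
        self._data = {}

    def setex(self, name, time_seconds, value):
        # Store the value together with its expiry time, like SETEX.
        self._data[name] = (time.monotonic() + time_seconds, value)

    def get(self, name):
        entry = self._data.get(name)
        if entry is None or entry[0] <= time.monotonic():
            self._data.pop(name, None)
            return None
        return entry[1]


class AppServer:
    """Hypothetical stateless application server; session data lives in the shared store."""

    SESSION_TTL_SECONDS = 1800

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        session_id = secrets.token_hex(16)
        key = f"session:{session_id}"
        self.store.setex(key, self.SESSION_TTL_SECONDS, json.dumps(user))
        return session_id

    def current_user(self, session_id):
        raw = self.store.get(f"session:{session_id}")
        return json.loads(raw) if raw is not None else None


shared_store = FakeRedis()
server_a = AppServer("app-1", shared_store)
server_b = AppServer("app-2", shared_store)

session_id = server_a.login({"UserId": 1, "UserName": "alice"})
# A different server reads the same session from the shared store.
user_seen_by_b = server_b.current_user(session_id)
```

Because the application servers hold no session state of their own, any of them can serve any request, and servers can be added or restarted without losing sessions. In a real deployment the stand-in would be replaced by a client connected to an actual Redis instance.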

Actor model

The actor model is rarely mentioned in this context. I introduced it in a previous article, which you can read here:

The distributed high parallel distribution actor model is so excellent

The actor model solves the user stickiness problem more elegantly. It comes with built-in identity: in short, requests for the same key always reach the correct actor instance. Isn’t that exactly the result we want? Moreover, an actor processes its messages one at a time, so concurrency can be handled without locks. So why does hardly anyone use it? In addition, the actor model can keep state in an in-process cache, whose access latency is much lower than a network round trip to Redis on the LAN.
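
The two properties described above, key-based routing and lock-free sequential processing, can be illustrated with a small asyncio sketch in Python. This is not a real actor framework: `SessionActor`, `ActorSystem`, and the message format are hypothetical, and everything runs inside a single process.

```python
import asyncio


class SessionActor:
    """Hypothetical actor that owns one user's session state and handles one message at a time."""

    def __init__(self, key):
        self.key = key
        self.state = {}  # in-process state, touched only by this actor
        self.mailbox = asyncio.Queue()
        self._task = asyncio.create_task(self._run())

    async def _run(self):
        while True:
            field, value, reply = await self.mailbox.get()
            # Messages are processed sequentially, so no lock is needed around the state.
            self.state[field] = value
            reply.set_result(dict(self.state))

    async def ask(self, field, value):
        reply = asyncio.get_running_loop().create_future()
        await self.mailbox.put((field, value, reply))
        return await reply

    def stop(self):
        self._task.cancel()


class ActorSystem:
    """Routes every message for a key to the single actor instance that owns that key."""

    def __init__(self):
        self._actors = {}

    def actor_for(self, key):
        if key not in self._actors:
            self._actors[key] = SessionActor(key)
        return self._actors[key]

    def shutdown(self):
        for actor in self._actors.values():
            actor.stop()


async def main():
    system = ActorSystem()
    # Concurrent requests for the same user all land on the same actor.
    await asyncio.gather(
        system.actor_for("user-1").ask("UserName", "alice"),
        system.actor_for("user-1").ask("LastPage", "/cart"),
        system.actor_for("user-2").ask("UserName", "bob"),
    )
    same_instance = system.actor_for("user-1") is system.actor_for("user-1")
    states = {key: dict(actor.state) for key, actor in system._actors.items()}
    system.shutdown()
    return same_instance, states


same_instance, states = asyncio.run(main())
```

Each user’s state lives inside the one actor that owns that key, so reads and writes are plain in-memory operations. A distributed actor framework would additionally decide which machine hosts each actor, which this sketch leaves out.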

Final thoughts

Of course, if your only use case is user login, sessions are not the only solution. You can refer to an earlier article:

Programmers pass the test — a more elegant token authentication method JWT

Every problem has many solutions. None is perfect; there is only the one that best fits the business scenario. Understanding the essence of a technology is the shortcut to improving our skills. My ability is limited while technology is boundless, so criticism and corrections are welcome!
