“And the interviewer” series: does a single machine mean no high availability?


What is high availability

High availability describes how much of the time a system is available. 100% availability is not achievable in practice: for example, if every data center where your web services are deployed loses power, the system can no longer provide service. In general, we only need to reach four nines (99.99%), as shown in the following figure:
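To make "four nines" concrete, the yearly downtime budget can be worked out with simple arithmetic. The sketch below assumes a 365-day year; the function name is only illustrative.

```python
# Yearly downtime budget for an availability target of N nines.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(nines: int) -> float:
    """Return the allowed downtime per year, in minutes, for N nines."""
    unavailability = 10 ** (-nines)  # e.g. 4 nines -> 0.0001
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    availability = 100 * (1 - 10 ** (-n))
    budget = downtime_minutes_per_year(n)
    print(f"{n} nines ({availability:.3f}%): {budget:.2f} minutes per year")
```

Four nines leaves roughly 52.6 minutes of downtime per year, which is why every additional nine gets much more expensive to reach.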

Common ways to implement high availability

The nodes in a high availability cluster are usually one primary and one standby, or one primary and multiple standbys. This redundancy improves the availability of the whole system.
A load-balancing cluster is generally multi-master: every node takes a share of the traffic.

Load balancing

With load balancing, what used to be a single machine becomes multiple machines.

Multiple machines share the traffic coming in to the service. The problem this solves is preventing the failure of one machine from taking the web service down or making it unavailable. The machines usually form a cluster that handles the whole load together.
Related software: HAProxy, LVS, and nginx. They provide cluster management and act as the gateway to the cluster.
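As a rough illustration of how such a gateway spreads traffic and skips a failed machine, here is a minimal round-robin sketch. It is not how HAProxy, LVS, or nginx are implemented; the class name and the backend names are hypothetical.

```python
from itertools import count


class RoundRobinBalancer:
    """Toy round-robin balancer that skips backends marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._counter = count()

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        """Return the next healthy backend in rotation."""
        if not self.healthy:
            raise RuntimeError("no healthy backend available")
        # At least one backend is healthy, so one full rotation finds it.
        for _ in range(len(self.backends)):
            index = next(self._counter) % len(self.backends)
            candidate = self.backends[index]
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backend available")


balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
print([balancer.pick() for _ in range(3)])
balancer.mark_down("web-2")
print([balancer.pick() for _ in range(4)])  # web-2 no longer appears
```

A real load balancer would detect failures itself through health checks; here `mark_down` stands in for that step.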

High availability

High availability means keeping the whole system available, and in particular having a redundant host take over when one fails. Measures should be taken to shorten service interruptions as far as possible, so that the business can keep providing service continuously.

Relationship between load balancing and high availability

A single load balancer sits at the front of the network. It distributes client requests and is effectively the entrance to the whole website or system. If it fails, the website fails with it. So there needs to be a scheme that can take over from a failed load balancer within a short time, and this too is called high availability. As for the web cluster and database cluster behind the load balancer, the load balancer's own mechanisms mean that even if one or two of those machines have problems, the real system remains usable.

Hardware load balancers include F5 BIG-IP; software load balancers include LVS, nginx, and HAProxy. High availability software includes Heartbeat and keepalived. Mature Linux cluster architectures include LVS + keepalived, nginx + keepalived, and DRBD + Heartbeat.
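The takeover idea can be sketched as a heartbeat with a timeout: the standby becomes active once the primary has been silent for too long. This is only a simplified model, not how Heartbeat or keepalived work internally; the class, the timeout value, and the explicit `now` argument (used to keep the example deterministic) are assumptions.

```python
class FailoverMonitor:
    """Toy primary/standby monitor driven by heartbeat timestamps."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_heartbeat = None  # time of the last primary heartbeat
        self.active = "primary"

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat received from the primary."""
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        """Promote the standby if the primary has been silent too long."""
        if (
            self.active == "primary"
            and self.last_heartbeat is not None
            and now - self.last_heartbeat > self.timeout_s
        ):
            self.active = "standby"  # no automatic failback in this sketch
        return self.active


monitor = FailoverMonitor(timeout_s=3.0)
monitor.heartbeat(now=0.0)
print(monitor.check(now=2.0))  # still within the timeout
print(monitor.check(now=3.5))  # primary silent for longer than 3 seconds
```

Staying on the standby after the primary returns is a deliberate simplification; it avoids flapping between the two nodes.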

Common high availability architectures

Most people understand high availability to mean that the whole service keeps running when a single machine goes down. So they write code with a cluster in mind, for example by making services stateless, so that no state is tied to any one node and a single-machine failure does not affect the service.

To be clear, there is nothing wrong with this architecture pattern itself; it is genuinely good. It offers service discovery and clustering, and when one machine goes down the others stay available. It is widely used in search systems, recommendation systems, advertising systems, and website back-end systems.

Many people have learned about high availability from architectures like the one in the figure above, and conclude that adopting such an architecture makes a system highly available.

So does a single machine mean no high availability?

Blindly pursuing a cluster architecture, divorced from the actual business scenario and the team's capacity, is over-engineering.

In fact, the architecture above mainly solves the following problems.

  • Data synchronization: keeping small amounts of shared data, such as common configuration, in sync across the machines.
  • Service discovery: after machines are added or removed, the other machines can detect that new nodes have joined or old nodes have gone offline.

High availability is not something an architecture pattern can deliver by itself. A highly available system is achieved at the code level, not by assembling a few open-source modules.

Some people believe there is a silver bullet for high availability. They see architectures presented at forums and conferences, most of them built on mature open-source software, and conclude that using the same software will make their own system highly available: "I have ZooKeeper, so when a single machine goes down the service keeps running." In practice this does not hold. What ZooKeeper addresses is external, uncontrollable factors, such as a failed hard disk or a broken network link. Outages from those causes can be handled with ZooKeeper. But if a bug in your own code brings a machine down, 1,000 machines under ZooKeeper will not help; in general they all go down together.

For example, in a distributed search system the index is partitioned, so there is a cluster of 50 machines with about 10 machines per partition, and machines can be added or removed dynamically. The cluster is managed by ZooKeeper. Is this a highly available system? It is the standard high availability architecture for a search system, but the most that can be said is that it is highly available provided the code is excellent. Network problems and hardware failures are unlikely to take down the whole cluster. However, a single small bug in the code, or a problem in index data generation, will generally take down the entire cluster. How is that highly available?
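The difference between the two failure types can be shown with a toy model: retrying on another replica masks a hardware failure, but a bug shared by every replica fails on all of them. The replica class, the cluster helper, and the empty-query bug are all hypothetical.

```python
class Replica:
    """Toy search replica; every replica runs the same (buggy) code."""

    def __init__(self, name: str, hardware_ok: bool = True):
        self.name = name
        self.hardware_ok = hardware_ok

    def handle(self, query: str) -> str:
        if not self.hardware_ok:
            raise ConnectionError(f"{self.name} is unreachable")
        # Hypothetical bug: an empty query is not handled (IndexError).
        return query[0].upper() + query[1:]


def query_cluster(replicas, query: str) -> str:
    """Try each replica in turn; fail only if all of them fail."""
    errors = []
    for replica in replicas:
        try:
            return replica.handle(query)
        except Exception as exc:
            errors.append(f"{replica.name}: {type(exc).__name__}")
    raise RuntimeError("all replicas failed: " + ", ".join(errors))


cluster = [Replica("r1", hardware_ok=False), Replica("r2"), Replica("r3")]
print(query_cluster(cluster, "phone"))  # r1 is down, r2 answers

try:
    query_cluster(cluster, "")  # the same bug exists on every replica
except RuntimeError as exc:
    print(exc)
```

Adding more replicas to `cluster` changes nothing for the second query, which is the point the paragraph above makes.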

There is no silver bullet for high availability. The high availability architectures you see, hear, and read about everywhere only tell you how powerful the system architecture is. They draw a few boxes around some modules and tell you that the services inside those boxes can cope with all kinds of emergencies, and that when a traffic peak arrives, adding machines scales capacity linearly. What they do not tell you is how strong their code is, and only on that premise are their systems highly available. Anyone who tries to build a highly available system purely out of a few boxes is a PPT architect.

Real high availability does not require agonizing over architecture design. It requires robust code. Robust code plus a primary/standby design, with nothing else, is basically a highly available system. A bank's core data processing center with remote disaster recovery works this way; would you dare say it is not highly available?

Therefore, only by writing good code can we achieve high availability. Studying architecture mostly serves to improve your overall understanding of the system. There is no highly available architecture, only highly available code.

Where is the root of high availability

Excellent code is the cornerstone of every high availability architecture. Excellent code plus a reasonable architecture is a high availability architecture. A high availability architecture is not achieved by stacking open-source software like building blocks. What mature open-source software does is implement, better than you would, part of the code you would otherwise have had to write.

At the financial level, rather than paying for a so-called high availability architecture of many servers, it is better to spend the money on disks with faster reads and writes, more memory, more CPU cores, server bandwidth, and a CDN acceleration service, supplemented by some operations tooling, to keep the service highly available.

At the code level, there should be no elementary null pointer errors, no out-of-memory errors caused by misusing the framework, and no interface response timeouts. Before code is delivered, targeted performance tests (concurrency, stress, and execution efficiency tests) should be run. Excellent code is the foundation of high availability.
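One way to guard against the null and timeout problems named above is to wrap calls to a dependency with a timeout and a fallback value. This is a minimal sketch using the standard library; `call_with_timeout` and `slow_lookup` are hypothetical names, and the thread pool size is an arbitrary assumption.

```python
import concurrent.futures
import time

# Shared pool so a slow call does not block the caller.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout_s: float, fallback):
    """Run func(); return fallback on timeout, on error, or on None."""
    future = _executor.submit(func)
    try:
        result = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The worker thread is not killed; it finishes in the background.
        return fallback
    except Exception:
        return fallback
    return fallback if result is None else result


def slow_lookup():
    """Hypothetical dependency that responds too slowly."""
    time.sleep(0.3)
    return "late answer"


print(call_with_timeout(lambda: "fast answer", 1.0, "default"))
print(call_with_timeout(lambda: None, 1.0, "default"))
print(call_with_timeout(slow_lookup, 0.05, "default"))
```

Swallowing every exception is acceptable only for a sketch; real code would log the error and catch narrower exception types.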

At the architecture level, separate dynamic content from static content. Use a message queue to absorb time-consuming and memory-intensive tasks asynchronously. Use a Redis cache so that requests do not hit the DB directly, and put defenses in place against cache penetration, cache breakdown, and cache avalanche. If the memory of a single Redis machine is not large enough, Redis 2.0 offered a virtual memory (VM) feature that went beyond the limit of physical memory by keeping hot data in memory and swapping colder values to disk. Note that this feature was deprecated in Redis 2.4 and removed in 2.6, so it is not available in current versions.
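The three cache defenses can be sketched in a few lines. The example uses an in-memory dict as a stand-in for Redis, so none of it is Redis API; the class name, TTL values, and `loader` callback are assumptions. Missing keys are cached briefly (penetration), a per-key lock lets only one caller rebuild an expired entry (breakdown), and random TTL jitter keeps entries from expiring together (avalanche).

```python
import random
import threading
import time

_MISSING = object()  # sentinel cached for keys the DB does not have


class ProtectedCache:
    """Toy cache-aside wrapper; a dict stands in for Redis."""

    def __init__(self, loader, ttl_s=60.0, missing_ttl_s=5.0, jitter_s=10.0):
        self._loader = loader  # function that reads from the DB
        self._ttl_s = ttl_s
        self._missing_ttl_s = missing_ttl_s
        self._jitter_s = jitter_s
        self._store = {}  # key -> (value, expires_at)
        self._locks = {}  # key -> lock guarding rebuilds of that key
        self._locks_guard = threading.Lock()

    def _lock_for(self, key):
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def _fresh(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry
        return None

    def get(self, key):
        entry = self._fresh(key)
        if entry is None:
            # Breakdown: only one caller per key reloads from the DB.
            with self._lock_for(key):
                entry = self._fresh(key)  # another caller may have filled it
                if entry is None:
                    value = self._loader(key)
                    if value is None:
                        # Penetration: remember the miss for a short time.
                        ttl = self._missing_ttl_s
                        value = _MISSING
                    else:
                        # Avalanche: spread expiry times with random jitter.
                        ttl = self._ttl_s + random.uniform(0, self._jitter_s)
                    entry = (value, time.monotonic() + ttl)
                    self._store[key] = entry
        return None if entry[0] is _MISSING else entry[0]


database = {"user:1": "alice"}
cache = ProtectedCache(database.get)
print(cache.get("user:1"))   # loaded from the DB, then cached
print(cache.get("user:99"))  # miss is cached, so the DB is not hit again
```

With a real Redis deployment the same ideas apply, but the lock and the TTLs would live in Redis rather than in process memory.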

On the operations side, use Supervisor to keep PHP-FPM, MySQL, and other processes running, and set up early-warning alerts on CPU, disk space, and other indicators for the server and database services. A single machine is also relatively easy to defend from a security standpoint.
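An early-warning check of this kind boils down to comparing measured indicators against thresholds. The sketch below reads disk usage with the standard library and leaves other metrics to the caller; the function names and threshold values are hypothetical, and a real setup would send the alerts to a notification channel.

```python
import shutil


def disk_used_percent(path: str = ".") -> float:
    """Return the percentage of disk space used on the volume of path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def alerts(metrics: dict, thresholds: dict) -> list:
    """Return one warning per metric that exceeds its threshold."""
    return [
        f"{name} at {value:.1f}% exceeds {thresholds[name]:.1f}%"
        for name, value in sorted(metrics.items())
        if name in thresholds and value > thresholds[name]
    ]


# Hypothetical thresholds; the cpu value here is a made-up sample reading.
current = {"disk": disk_used_percent("."), "cpu": 40.0}
for warning in alerts(current, {"disk": 85.0, "cpu": 90.0}):
    print(warning)
```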

