1. Why do you need high availability?
High availability (HA) exists to solve the problem of single-machine failure.
No service or machine in the world is absolutely safe and reliable, but through certain schemes we can make a system highly available, reducing to an acceptable level the risk that one machine going down destroys the whole system. This is the pain point that high availability addresses, and it is also a problem that distributed systems exist to solve.
Of course, a higher level of high availability must also consider concurrent access, so load balancing is needed as well.
Databases are no exception; they too need to be highly available. If we assume the system can always provide service, we say its availability is 100%. If out of every 100 time units of operation there is one time unit in which the system cannot provide service, we say its availability is 99%. Many companies set a high availability target of four nines, i.e. 99.99%, which means the system's annual downtime must stay within about 52.6 minutes (by comparison, three nines, 99.9%, allows 8.76 hours per year).
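These downtime figures follow from a quick calculation; a minimal sketch (the availability targets are just examples):

```python
# Annual downtime allowed for a given availability percentage.
HOURS_PER_YEAR = 365 * 24  # 8760

def annual_downtime_hours(availability_pct: float) -> float:
    """Hours per year the system may be unavailable at this availability level."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

print(annual_downtime_hours(99.9))   # three nines: 8.76 hours/year
print(annual_downtime_hours(99.99))  # four nines: ~0.88 hours, about 52.6 minutes/year
```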
2. Database middleware
Database middleware is software that sits between applications and databases to facilitate communication between software components. It is similar to web middleware such as Tomcat, except that the two sides it connects are different.
A Java application that connects directly to the database has the following disadvantages:
1. The Java program and the database are tightly coupled
2. High access volume and concurrency put great pressure on the database
3. Reads and writes are not separated
With middleware in between, the application sees what looks like a single logical database, while the real physical databases are hidden behind it. This is the same idea as Nginx.
2.2 classification and comparison, with a focus on MYCAT
Cobar: from Alibaba's B2B business group. Started in 2008, it served inside Alibaba for many years, managing the schemas of more than 3,000 MySQL databases, with the cluster handling over 5 billion online SQL requests per day. Maintenance stopped after Cobar's initiator left the company.
MYCAT: a secondary development by the open-source community based on Alibaba's Cobar, with new features added.
OneProxy: based on the official MySQL Proxy idea. Performance is very good, but it is a paid product.
Kingshard: based on the Go language; still under active development; open source.
Besides these, there are many other middlewares, such as Vitess, Atlas, MaxScale and so on.
From this comparison, MYCAT appears to be the most suitable middleware at present: its community is active and it is open source.
MYCAT's functions include: read-write separation, load balancing, sharding across databases and tables (data fragmentation), and multi-data-source integration.
Principle of MYCAT: its most important concept is "interception". MYCAT intercepts the SQL statement sent by the user, first performs specific analyses on it, such as shard analysis, routing analysis, read-write separation analysis and cache analysis, then forwards the SQL to the real backend database, post-processes the returned results appropriately, and finally returns them to the user.
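The "interception" idea can be illustrated with a toy read/write-split router; this is an illustrative sketch only, not MYCAT's actual routing logic:

```python
def route(sql: str) -> str:
    """Pick a backend by inspecting the statement type,
    in the spirit of MYCAT's read-write separation analysis."""
    keyword = sql.lstrip().split(None, 1)[0].upper()
    # Read statements go to a slave, everything else to the master.
    return "read-slave" if keyword in ("SELECT", "SHOW") else "write-master"

print(route("SELECT * FROM orders"))   # read-slave
print(route("UPDATE orders SET a=1"))  # write-master
```

A real middleware additionally parses the SQL for shard keys and routes each fragment to the right physical database before merging results.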
3. MySQL high availability Mainstream Architecture
3.1 dual-machine high availability
Machine A serves as the read-write database and machine B as the standby. If A fails, B takes over as the read-write database; after A is recovered, A becomes the standby.
In this scheme, the database IP in the data source configuration can be a virtual IP. The virtual IP is configured with Keepalived on the two database machines, which monitor each other via heartbeat; when one fails, the virtual IP automatically drifts to the other, healthy machine. DBAs and operations staff must maintain the active/standby configuration, troubleshoot failures and backfill data, but the program code and configuration need no changes.
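A minimal Keepalived sketch for the virtual IP described above; the interface name, router ID, priorities and VIP are assumptions and must match your environment:

```conf
vrrp_instance VI_MYSQL {
    state MASTER              # BACKUP on the standby machine
    interface eth0            # assumed NIC name
    virtual_router_id 51
    priority 100              # lower value (e.g. 90) on the standby
    advert_int 1              # heartbeat interval in seconds
    virtual_ipaddress {
        192.168.1.100         # the VIP the application connects to
    }
}
```

When the MASTER stops advertising, the BACKUP wins the VRRP election and takes over the VIP, so the application's connection string never changes.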
Applicable scenarios, advantages and disadvantages:
Dual-machine high availability suits scenarios where neither reads nor writes are heavy (single-table data below 5 million rows). The advantage is automatic switchover when a machine fails; the disadvantages are that only one database is working at a time, reads and writes are not separated, and concurrency is limited.
3.2. Master slave read-write separation
Machine A serves as the write database and machine B as the read database. If A fails, B becomes the read-write database; after A is repaired, B is the write database and A the read database.
This scheme is implemented with the help of the database middleware MYCAT: the project configures a MYCAT data source and performs all data operations through it. Databases A and B must be masters of each other (master-master replication). The database configuration, troubleshooting and data backfill still need to be maintained by DBAs and operations staff.
Usage scenarios, advantages and disadvantages:
This suits scenarios where reads and writes are not very heavy (single-table data below 10 million rows), with concurrency much higher than in scheme 1. The advantage is automatic switchover on machine failure, and read-write separation greatly improves concurrency. The disadvantage is that a MYCAT node is introduced; to make it highly available, at least two MYCAT nodes are needed, and the conventional solution is to introduce HAProxy and Keepalived to cluster MYCAT.
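As a sketch, the read-write separation in this scheme maps to a dataHost entry in MYCAT's schema.xml like the following; the host addresses, user and password are placeholders, and balance="1" lets the readHost serve SELECT traffic:

```xml
<!-- Hypothetical hosts and credentials; adjust to your environment -->
<dataHost name="dh1" maxCon="1000" minCon="10" balance="1"
          writeType="0" dbType="mysql" dbDriver="native" switchType="1">
    <heartbeat>select user()</heartbeat>
    <writeHost host="hostM1" url="192.168.1.11:3306"
               user="mycat" password="secret">
        <readHost host="hostS1" url="192.168.1.12:3306"
                  user="mycat" password="secret"/>
    </writeHost>
</dataHost>
```

Writes always go to the writeHost; if it fails, MYCAT switches according to switchType, which is why A and B must replicate from each other.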
3.3. One master and multiple slaves + read-write separation
One master write database A with multiple slaves. When master A fails, promote slave B to be the master write database and repoint slaves C and D to replicate from B. After A is repaired, it becomes a slave of B.
MYCAT is used as the middleware in the project. After master A fails, MYCAT automatically promotes B to the write database; slaves C and D can have their master automatically switched to B with tools such as MHA, thus achieving automatic failover.
The MHA manager can be deployed on a separate machine to manage multiple master-slave clusters, or on one of the slave nodes; an MHA node runs on every MySQL server. The MHA manager periodically probes the master in the cluster; when the master fails, it automatically promotes the slave with the most recent data to be the new master and repoints all other slaves at it. The entire failover process is completely transparent to the application.
Usage scenarios, advantages and disadvantages:
This architecture suits scenarios with small write concurrency but large read concurrency. Because multiple read nodes are configured, read concurrency improves qualitatively; in theory there can be many read nodes, supporting very high read concurrency. Of course, a high-availability design is still needed for MYCAT itself.
3.4. MariaDB Galera Cluster
Multiple databases can be written to and read from simultaneously behind a load balancer; Galera replication synchronizes data among the databases, so in theory the data of every database is completely consistent.
For reads and writes, the application only needs to point the database read/write IP at the Keepalived virtual IP. The database configuration is relatively complex: HAProxy, Keepalived, Galera and various other plug-ins and settings must be introduced.
Usage scenarios and advantages and disadvantages:
This scheme suits scenarios with large read-write concurrency but not a very large amount of data.
Advantages:
1) Reads can be served on any node
2) Failed nodes are removed automatically
3) New nodes are added automatically
4) True parallel replication, at the row level
5) The client connection experience is the same as operating a single database
6) Synchronous replication, so it offers high performance and reliability
Disadvantages:
1) DELETE is not supported on tables without a primary key, and rows in such tables may appear in different orders on different nodes
2) When processing a transaction, a coordination/certification step runs to ensure global transaction consistency; a long-running transaction locks all related tables on the node and stalls inserts (the same situation as a stalled single-table insert)
3) The write throughput of the whole cluster is limited by the weakest node; if one node becomes slow, the whole cluster slows down, so for stable high performance all nodes should use uniform hardware
4) A problematic DDL statement can break the cluster, so it is recommended to disable DDL
5) MySQL 5.7.6 and later also supports this kind of scheme
3.5 database middleware
Using MYCAT for sharded storage solves the problems of write load balancing and excessive data volume; each shard is configured with multiple read slaves, which reduces the read pressure on any single database.
This scheme requires configuring HAProxy, Keepalived and a MYCAT cluster, plus a master-slave cluster on each shard. For the complete configuration of a single shard, refer to scheme 3: this scheme can be loosely understood as scheme 3 repeated per shard. The amount of configuration and maintenance is therefore relatively large.
Usage scenarios, advantages and disadvantages:
Scenarios with large read-write concurrency and a very large amount of data.
Advantages: the ultimate solution to high concurrency and high data volume.
Disadvantages: configuration and maintenance are troublesome and require large software and hardware equipment resources.
Most high-availability architectures require master-slave replication, because it is the best way to keep data consistent across multiple nodes.
4. Master-slave replication
Looking back at the architectures above, we can see that in practice all of them need master-slave replication, for the following scenarios:
1. In systems with complex business logic, some SQL statements need to lock a table, making the read service temporarily unavailable and badly affecting running business. With master-slave replication, the master handles writes and the slaves handle reads, so even when the master locks a table, reads from the slaves keep the business running normally.
2. Hot standby of data: when the master goes down, a slave can replace it in time, ensuring business availability.
3. Architectural scaling: as business volume grows, the I/O access frequency becomes too high for a single machine. Storing data across multiple databases reduces the disk I/O frequency on each machine and improves its I/O performance.
MySQL master-slave replication is an asynchronous process. The master sends its update events to the slave; the slave reads the update records and replays them so that its contents stay consistent with the master's. The process is roughly as follows:
1. The master's update events (UPDATE, INSERT, DELETE) are written to the binlog
2. The master creates a binlog dump thread that sends the binlog contents to the slave
3. The slave starts and initiates a connection to the master
4. After starting, the slave creates an I/O thread that reads the master's binlog and writes it to the relay log
5. After starting, the slave creates a SQL thread that reads the relay log, executes the update events starting from the Exec_Master_Log_Pos position, and applies the updates to the slave
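The slave side of these steps is configured with CHANGE MASTER TO; a sketch with placeholder host, credentials and binlog coordinates (the file name and position come from SHOW MASTER STATUS on the master):

```sql
-- On the slave: point the I/O thread at the master's binlog
CHANGE MASTER TO
    MASTER_HOST = '192.168.1.11',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS = 154;

-- Start the I/O and SQL threads, then verify
START SLAVE;
SHOW SLAVE STATUS\G
```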
4.2.1 Reasons for the delay
1. The principle behind MySQL master-slave synchronization delay
For writes, the master writes the binlog sequentially; the slave's single thread fetches that binlog sequentially from the master, then replays it locally as-is (which involves random writes), thereby keeping master and slave logically consistent. MySQL master-slave replication is a single-threaded operation. The master generates binlog for all DDL and DML; the binlog is written sequentially, so it is very efficient, and the slave's Slave_IO_Running thread that fetches logs from the master is also quite efficient. The problem comes next: the slave's Slave_SQL_Running thread replays the master's DDL and DML operations on the slave, and those I/O operations are random, not sequential, so they cost much more. They may also contend for locks with other queries running on the slave. Since Slave_SQL_Running is itself single-threaded, one DDL statement that gets stuck for 10 minutes blocks all subsequent DDL until it finishes, which produces delay. You might ask: "the same DDL also takes 10 minutes on the master, so why does the slave lag?" The answer is that the master can execute concurrently, while the Slave_SQL_Running thread cannot.
2. How does MySQL database master-slave synchronization delay occur?
Delay occurs when the master's TPS concurrency is high and the number of statements it generates exceeds what a single slave SQL thread can keep up with; lock waits against large queries on the slave can also contribute. Primary reasons: the database is under too much read-write pressure from the business, CPU load is high, NIC load is high, and random disk I/O is too high. Secondary reasons: the performance cost of reading and writing the binlog, and network transmission delay.
3. How do I check whether there is a delay?
First, execute SHOW SLAVE STATUS on the slave server; you will see many synchronization parameters:
Master_Log_File: the name of the master's binary log file currently being read by the slave's I/O thread
Read_Master_Log_Pos: the position in the master's binary log up to which the slave's I/O thread has read
Relay_Log_File: the name of the relay log file the SQL thread is currently reading and executing
Relay_Log_Pos: the position in the current relay log up to which the SQL thread has read and executed
Relay_Master_Log_File: the name of the master's binary log file containing the most recent event executed by the SQL thread
Slave_IO_Running: whether the I/O thread is started and successfully connected to the master
Slave_SQL_Running: whether the SQL thread is started
Seconds_Behind_Master: the time gap, in seconds, between the slave's SQL thread and the slave's I/O thread
4. Symptoms of slave synchronization delay
● SHOW SLAVE STATUS shows that Seconds_Behind_Master is not 0; the value may be very large
● SHOW SLAVE STATUS shows that Relay_Master_Log_File and Master_Log_File refer to binlog files whose numbers differ greatly, indicating that the slave is not applying the binlog in time, so the most recently executed binlog lags far behind the binlog the I/O thread is currently reading
● There are a large number of relay log files in the slave's data directory. They are normally deleted automatically after being applied, so a large backlog of them indicates that the master-slave delay is severe
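The symptoms above can be checked programmatically. A minimal sketch that inspects one parsed SHOW SLAVE STATUS row (field names as listed earlier; the lag threshold is an assumption, tune it to your tolerance):

```python
def replication_healthy(status: dict, max_lag_seconds: int = 30) -> bool:
    """Rough health check on a parsed SHOW SLAVE STATUS row."""
    if status.get("Slave_IO_Running") != "Yes":
        return False                       # I/O thread down or disconnected
    if status.get("Slave_SQL_Running") != "Yes":
        return False                       # SQL thread down
    lag = status.get("Seconds_Behind_Master")
    if lag is None or lag > max_lag_seconds:
        return False                       # NULL means replication is broken
    # A large gap between the binlog being read and the one being
    # executed also signals that the SQL thread is falling behind.
    return status["Relay_Master_Log_File"] == status["Master_Log_File"]

ok = replication_healthy({
    "Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes",
    "Seconds_Behind_Master": 0,
    "Relay_Master_Log_File": "mysql-bin.000042",
    "Master_Log_File": "mysql-bin.000042",
})
print(ok)  # True
```

In practice the dict would come from a client library query against the slave; that plumbing is omitted here.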
1. Semi-synchronous replication
Since MySQL 5.5, MySQL has supported semi-synchronous replication, which sits between asynchronous and synchronous replication: after executing a transaction, the master does not return the result to the client immediately; it waits for at least one slave to receive the event and write it to its relay log before returning. Compared with asynchronous replication, semi-synchronous replication improves data safety, at the cost of one extra TCP/IP round-trip of latency.
2. Configure sync_binlog=1 and innodb_flush_log_at_trx_commit=1 on the master
The default value of sync_binlog is 0, in which case MySQL does not itself sync the binlog to disk; a value of N means the binlog is synced to disk after every N binlog writes.
innodb_flush_log_at_trx_commit=1 means the log is flushed to disk on every transaction commit, and on every statement outside a transaction.
Note: setting both values to 1 limits write performance to some extent. It is recommended only for scenarios with high data-safety requirements, such as order-payment business involving money, and the system's I/O capacity must be able to support it!
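Points 1 and 2 can be sketched concretely. The my.cnf fragment below shows the two durability settings, and the SQL shows enabling semi-synchronous replication via the semisync plugins that ship with MySQL (plugin file names are for Linux builds):

```ini
# my.cnf on the master -- durability-first settings (trade write performance)
[mysqld]
sync_binlog = 1                      # fsync the binlog on every commit
innodb_flush_log_at_trx_commit = 1   # flush the redo log on every commit
```

```sql
-- On the master:
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;

-- On each slave:
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
```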
1. Optimize the network
2. Upgrade slave hardware configuration
3. Tune parameters on the slave: disable the binlog on the slave and adjust the innodb_flush_log_at_trx_commit parameter value
4. Upgrade MySQL version to 5.7 and use parallel replication
Youku’s solution: database sharding, which sidesteps the replication delay caused by ever-growing data volume. Data is sharded by user_id, so there must be a global table managing the relationship between users and shards: shard_id is obtained from user_id, and the query then goes to the specified shard for the specified data.
Taobao’s solution: modify the source code. The corresponding mechanism is the "transfer" mechanism, in which the binlog redo is applied by multiple threads, improving the slave's QPS.
How to ensure data consistency between MySQL and Redis?
In the single-threaded case, deleting the cache first and then updating the database keeps Redis and MySQL consistent, but under high concurrency the following situations arise:
Inconsistent scenario 1: thread A needs to update the row with id = 1 so that column1 = 3; the current database value is 2. After A deletes the cache but before it has updated the database, thread B queries column1 for id = 1, reads 2 from the database, returns successfully, and repopulates the cache (the cache was deleted by A, so it must be refilled). A then completes its update and the database value becomes 3, while the cache still holds 2. When thread C queries next, it finds the dirty value 2 in the cache instead of the real value 3. This is a scenario of inconsistency between Redis and the database.
Inconsistent scenario 2: double-write inconsistency
There may be more complex scenarios, but the causes are the same. While the cache absorbs pressure that would otherwise hit the database, this lag is unavoidable, since a traditional single-database operation carries no such risk. The root cause is that what used to be a simple operation against one system has become a split operation across two systems.
The ideas to solve this problem are as follows:
1. Block threads with a special sentinel value: before A updates the database, it first sets the cached value to -9999; when B queries and finds the agreed special value, it sleeps until A finishes and updates the cache. The essence of this idea is serialization.
2. Alternatively, serialize all update operations in the Java code.
3. Classic delayed double delete: delete the cache, update the database, then delete the cache again after a short delay. If data updates are frequent, dirty data problems can still occur.
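Delayed double delete can be sketched as follows; the in-memory dicts stand in for Redis and MySQL, and the delay value is an assumption that must exceed the longest read-plus-write-back window:

```python
import threading
import time

cache = {}       # stands in for Redis
db = {"1": 2}    # stands in for MySQL: id "1" currently has column1 = 2

def update_with_delayed_double_delete(key, value, delay=0.5):
    """Delete cache -> update database -> delete cache again after a delay.
    The second delete evicts any stale value that a concurrent reader
    wrote back to the cache during the update window."""
    cache.pop(key, None)              # first delete
    db[key] = value                   # update the database

    def second_delete():
        time.sleep(delay)             # wait out in-flight stale reads
        cache.pop(key, None)          # second delete

    threading.Thread(target=second_delete, daemon=True).start()

def read(key):
    """Read-through: fill the cache from the database on a miss."""
    if key in cache:
        return cache[key]
    value = db[key]
    cache[key] = value
    return value
```

A reader that fetched the old value just before the database update and wrote it back to the cache has its stale entry removed by the second delete, so the next read repopulates the cache from the updated database.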
Serializing read and write requests into a memory queue guarantees no inconsistency, but unless your system strictly requires that cache and database be consistent, it is best not to use this scheme: the cache can occasionally be slightly inconsistent with the database, and after serialization the system's throughput drops sharply, requiring several times more machines to support the same request load.