1、 Split by business dimension
For example, a system may include user, commodity and order business. Because these three dimensions are unbalanced in access and data reading and writing, in order to avoid mutual influence and improve performance, it can be divided into user system, commodity system and order system according to business dimension.
2、 Data archiving
For some data that only needs to be retained and does not involve reading and writing, it can be considered to archive them. Like the access log of a certain time ago
3、 Read write separation
For a system, when it develops to a certain stage, the database will inevitably become a bottleneck. You can first consider realizing the separation of reading and writing to reduce the pressure on the main database. There are roughly two ways to realize it:
- Middleware routing. Between the application and the database, the middleware forwards the request to the read library or write library. Related middleware include amoeba, MySQL proxy, maxscale, MYCAT, etc.
- Internal application routing. The application is directly connected to the database, and the application determines whether to use the read library or write library. Related technologies include spring dynamic data source, sharding JDBC, etc.
The advantage of using middleware is that it is transparent to the business and does not need to modify the code. The disadvantage is that it lengthens the call link, increases the fault point and reduces the performance; Routing within an application is the opposite.
In addition, the following problems need to be considered to realize the separation of reading and writing:
- Failover issues. Related technologies include MHA, keepalive, MYCAT, etc.
- If the application is used for routing, the read-write data source needs to be configured in the application.
- The problem of master-slave delay. To solve this problem, on the one hand, we can see whether we can avoid it from the business. If we can’t avoid it, we will connect the relevant read services to the main database. On the other hand, if the read performance bottleneck is large, you can directly consider using cache instead and using cache fragmentation to deal with the problem of large amount of data.
4、 Sub table and sub database
When the amount of data reaches a certain level, the reading and writing performance of the database will decline, not to mention those more complex queries. Generally speaking, even if the amount of single table data reaches tens of millions, it is a large amount, so split storage needs to be considered. As far as a single database is concerned, it has developed to 2000, which has a very good performance. If it is larger, it will not be able to avoid sub database. To put it simply:If there is a large amount of data, it is divided into tables; If the concurrency is high, the database will be divided。
4.1 slicing strategy
- Range slicing
Excellent: when adding or reducing fragmentation, it basically does not involve data migration and has strong scalability
Bad: prone to hot data
- Hash slice
Excellent: uniform data distribution
Bad: adding or reducing fragmentation involves data migration, which is unfavorable for expansion
- Look up table method
Excellent: the mapping algorithm can be formulated flexibly with strong expansibility
Bad: the mapping table may become a hotspot table, and caching can be considered
4.2 issues to be considered in slicing
- Selection of sharding key
How to determine the partition key needs to be determined according to the business. The partition key can be stored redundantly with other common fields in multiple dimensions.
- Distributed transaction
- Distributed primary key ID
You can consider using global generation of primary key table, snowflake algorithm, etc.
- Cross database join, page turning, etc
The common practice is to store the full amount of data into es or use hive; Data redundancy; With the help of middleware; Whether it can be restricted from business.
5、 Smooth switch to new library
- Double write, add a switch in the process of writing and reading at the same time, and carry out gray operation within a certain range.
- The old data can be synchronized through the program, and the binlog can also be monitored.
- Data comparison is to confirm whether the data of new and old databases are consistent.
This article is well written. It deeply analyzes the advantages and disadvantages of various database and table algorithms, and especially puts forward common wrong practices:Sub database and sub table