[MySQL] Considerations for sharding databases and tables

Time: 2022-07-31

As a business grows, most companies eventually move from a single database and a single table to sharded databases and tables. The following factors help determine when to start sharding.

1. If the QPS of a single MySQL database exceeds 1,000, consider splitting it into multiple databases, generally along business boundaries.

At present, SinaNet has a main database plus several auxiliary databases: userservice (customer service system), sinastore (file storage), and entsales (sales system).

 

2. When a single table grows very large, consider splitting the table. Beyond roughly 10 million rows, the height of the B+ tree index reaches about 3-5 levels, which means extra page reads on every lookup.
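The 10-million-row rule of thumb can be sanity-checked with a back-of-envelope capacity estimate. This sketch assumes a default 16 KB InnoDB page, a `BIGINT` key (8 bytes) plus a 6-byte page pointer per internal entry, and roughly 1 KB per row; these are common textbook assumptions, not measurements of any particular table:

```python
# Rough capacity estimate for a 3-level InnoDB B+ tree.
# All sizes below are assumptions, not exact on-disk values.
PAGE_SIZE = 16 * 1024   # default InnoDB page size
KEY_SIZE = 8            # BIGINT primary key
POINTER_SIZE = 6        # child-page pointer in an internal node
ROW_SIZE = 1024         # assumed average row size (~1 KB)

fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)   # entries per internal page
rows_per_leaf = PAGE_SIZE // ROW_SIZE             # rows per leaf page

# Height 3 = two internal levels of pointers, then leaf pages of rows.
height_3_capacity = fanout * fanout * rows_per_leaf
print(f"fanout ≈ {fanout}, height-3 capacity ≈ {height_3_capacity:,} rows")
```

Under these assumptions a height-3 tree holds on the order of 20 million rows, which is why tables past the 10-million mark start flirting with a fourth level.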

If a single column is particularly large, move it out into its own table. This is a vertical split, following the principles of hot/cold separation and splitting by size.

Vertical splits are usually handled at design time, so this situation rarely arises later.

 

When the data volume is large, choose a splitting algorithm that fits the business needs and product characteristics.

How to split?

a: Hash modulo: hash(id) % num
Here id is the table's auto_increment primary key. Hash splitting distributes data very evenly, but there is no empirical rule for choosing num; it has to be worked out by hand, e.g. max_rows / daily_increment ≈ years of headroom, sized to carry roughly the next four years of growth.
The downside: when the table set later needs to expand, data migration becomes painful.

The Sina Mail user table is split by hashing the default-domain mailbox name modulo the table count.
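A minimal sketch of hash-modulo routing, assuming a hypothetical shard count of 64 and using `zlib.crc32` as a stable string hash (Python's built-in `hash()` is salted per process, so it cannot be used for routing):

```python
import zlib

NUM_TABLES = 64  # hypothetical; must be fixed before any data is written

def shard_for(mailbox: str) -> str:
    """Map a mailbox name to one of NUM_TABLES physical tables.

    crc32 is stable across processes and languages, so every writer
    and reader computes the same table for the same mailbox.
    """
    bucket = zlib.crc32(mailbox.encode("utf-8")) % NUM_TABLES
    return f"user_{bucket:03d}"

print(shard_for("alice@sina.com"))
```

The fatal flaw the text mentions shows up here: changing `NUM_TABLES` remaps almost every key, which is exactly why expansion forces a full data migration.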

b: Consistent hashing
To keep the impact of later data migration small, consistent hashing is recommended: adding or removing a node only remaps a small fraction of the keys.

The Sina Mail order table picks its storage table node via consistent hashing over the value ranges of the key.

c: Range (timestamp)
A natural time field makes splitting easy and scales well.
Since queries are mostly by timestamp, access across the tables is uneven (the most recent tables are hot), but this scheme avoids cross-node joins and similar problems.

The Sina Mail user log table is split by month and then hashed into 1,024 tables.
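A sketch of this two-level routing, assuming (as the text suggests but does not spell out) that the month supplies the range dimension and a crc32 of the user name picks one of 1,024 hash buckets within that month; the table-name format is hypothetical:

```python
import zlib
from datetime import datetime

NUM_HASH_BUCKETS = 1024  # per the source: 1,024 tables

def log_table(user: str, ts: datetime) -> str:
    """Route a log row by month (range) plus user hash (bucket).

    Example result shape: log_202207_0513 (month segment, then bucket).
    """
    bucket = zlib.crc32(user.encode("utf-8")) % NUM_HASH_BUCKETS
    return f"log_{ts:%Y%m}_{bucket:04d}"

print(log_table("alice", datetime(2022, 7, 31)))
```

Time-range queries touch only the month segments in range, while the hash layer spreads each month's write load across buckets.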

 

How to migrate data?

Migration is unavoidable. Double-write real-time data to both the old and new stores, and import historical data with scripts. Once the online data is verified to be aligned, gradually shift traffic to the new database.
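The double-write step can be sketched as a thin repository wrapper. This is a simplified illustration with dict-backed stores standing in for the two databases; the class and flag names are hypothetical:

```python
class DualWriteRepo:
    """Migration phase: write to both stores, read from the old one until
    the new store is verified to be aligned, then flip read_from_new."""

    def __init__(self, old_db, new_db, read_from_new=False):
        self.old_db = old_db
        self.new_db = new_db
        self.read_from_new = read_from_new

    def write(self, key, value):
        self.old_db[key] = value        # old store stays the source of truth
        try:
            self.new_db[key] = value    # best effort; backfill fixes gaps
        except Exception:
            pass  # the new store must never break the live write path

    def read(self, key):
        store = self.new_db if self.read_from_new else self.old_db
        return store.get(key)

old, new = {}, {}
repo = DualWriteRepo(old, new)
repo.write("u:1", {"name": "alice"})
```

In production the "flip" is typically a gradual percentage rollout rather than a single boolean, but the shape is the same: dual writes first, backfill, verify, then move reads.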
