History of database development
Representatives: Oracle (commercial), MySQL (open source).
Relational database (RDBMS) is the most widely used database management system. It can not only store data, but also perform complex data operations (left join, subquery, etc.).
He has the following shortcomings:
- Cannot process unstructured data.
- In essence, it is a stand-alone system, which is difficult to meet the needs of massive data, so it is necessary to do sub database and sub table.
- The scalability is relatively low.
NoSQL = not only SQL, distributed and non relational database. Compared with RDBMS, its main advantages are as follows:
- Can handle unstructured data
- Low latency read / write speed
- High performance and scalability (at the expense of acid transactions)
NoSQL data can be roughly divided into four types:
- Key value database, such as redis, memcached, etc
- Document database, such as mongodb, etc
- Column storage database, such as HBase, Cassandra, etc
- Figure database: neo4j et al.
NoSQL can solve some problems of RDBMS, but it can’t completely replace RDBMS. For example, it can’t query data with SQL, so RDBMS still can’t meet the performance problem. Although it can be divided into databases and tables, it still has certain complexity.
Newsql = RDBMS + NoSQL, distributed relational database, which not only meets linear scalability, but also can handle distributed transactions, such as tidb.
RDBMS architecture evolution
All services are stored in a database, and they are master-slave (synchronized through binlog). Third party applications, such as MYCAT and shardingsphere, can be used for the separation of read and write in the application layer, which also support sub database and sub table.
Sub database, to separate goods, users, orders, reduce IO competition, and support more connections.
For example, split four tables and take the module for four. The partition key of the user table is userid, which is userid% 4. For example, if Id is 1, read and write in user1; if Id is 6, read and write in user2.
How to select the partition key?
- According to the user’s city selection, this will cause more data in a certain database
- According to the registration time, it will be uneven
- To select random, so that hash, each library is more uniform.
In the above example, only one field is partition key. If there are multiple fields, for example, you can query by ID or uname. What can I do? If there is only one partition key with ID, then when querying name, only the whole database can be searched, so the performance is very low.
Genetic method (hypothetical model 16)
- Last 4 digits of username
- ID generation 60 bits
- Add the last four digits of username after ID
When searching in this way, whether it is searching for ID or username, you can directly search according to the last four digits.
The ID above is generated according to the user name. If the ID is fixed and does not depend on the user name, what should I do?
- Set up a mapping table of username and ID (this must be 1-to-1. If it is 1-to-many, you still have to go to multiple databases to find it, which is still very costly)
- When querying usename, first go to the mapping table to find the corresponding relationship and get the ID
- Through the way of ID module, go to the corresponding library to find
The problem of sub database and sub table
- After sub database and sub table, SQL aggregation, join and sub query are limited.
- The cost of dividing tables again is high. It has been divided into four tables, and eight tables?
- The cost of recovery is high, so it is necessary to recover multiple split libraries and tables
- The operation and maintenance cost is high, which requires multiple split libraries and tables to maintain
- How to solve data consistency?