This is a little strong, MySQL commonly used optimization guide and large table optimization ideas (worth collecting)

Time:2020-11-16

When the number of records in MySQL single table is too large, the performance of adding, deleting, modifying and querying will decrease sharply

Single table optimization

Unless the data of a single table will continue to rise in the future, do not consider splitting at the beginning. Splitting will bring all kinds of complexity in logic, deployment, and operation and maintenance. Generally, tables with integer values are less than 10 million levels, and tables with string based values below 5 million are not too problematic.

In fact, there is still a lot of optimization space for the performance of MySQL single table, and even can support more than ten million levels of data.

field

  • Try to use tinyint, smallint and medium_ Int is an integer type, not int, and if it is nonnegative, then it is signed
  • The length of the varchar allocates only the space that is really needed
  • Use enumerations or integers instead of string types
  • Try to use timestamp instead of datetime
  • Do not have too many fields in a single table. It is recommended to be within 20
  • Avoid the use of null fields, which is difficult to query optimization and takes up extra index space
  • Use integers to store IP

Indexes

  • The more indexes are, the better. To create them according to the query, consider where and order by
  • The columns involved in the command are indexed. You can check whether index or full table scanning is used according to explain
  • We should try our best to avoid judging the null value of the field in the where clause, otherwise the engine will give up using the index and scan the whole table
  • Fields with very few values are not suitable for indexing, such as “gender”, which has only two or three values
  • Character fields are indexed by prefixes only
  • It is better not to use character fields as primary keys
  • No foreign key, program guarantee constraint
  • Try not to use unique, by the program guarantee constraints
  • When using multi column index, the order and query condition should be consistent, and unnecessary single column index should be deleted

Query SQL

  • Slow SQL can be found by opening the slow query log
  • Do not perform column operation: select id where age + 1 = 10, any operation on the column will lead to table scanning, including database tutorial functions, calculation expressions, etc. when querying, you should move the operation to the right of the equal sign as much as possible
  • SQL statements should be as simple as possible: one SQL can only be operated in one CPU; large statements can be disassembled into small statements to reduce lock time; a large SQL statement can block the entire database
  • Do not select*
  • Rewrite or into in: the efficiency of or is n level, the efficiency of in is log (n), and the number of in is recommended to be controlled within 200
  • Instead of functions and triggers, it is implemented in the application program
  • Avoid queries like% XXX
  • Use less join
  • Use the same type for comparison, such as’ 123 ‘and’ 123 ‘, 123 and 123
  • Try to avoid using the! = or < > operator in the where clause, otherwise the engine will abandon the use of index and scan the whole table
  • For consecutive values, use between instead of in: select id from t where num between 1 and 5
  • Do not take the whole table for list data, use limit to paginate, and the number of each page should not be too large

engine

At present, MyISAM and InnoDB are widely used

MyISAM

MyISAM engine is the default engine of MySQL 5.1 and before. Its features are as follows:

  • Row locking is not supported. All tables that need to be read are locked when reading, and exclusive locks are added to tables when writing
  • Transaction is not supported
  • Foreign keys are not supported
  • Security recovery after crash is not supported
  • When the table has read queries, it supports inserting new records into the table
  • Support blob and text first 500 character index, support full-text index
  • Supports delayed index update, greatly improving write performance
  • For tables that will not be modified, it supports compressing tables, which greatly reduces disk space consumption

InnoDB

InnoDB becomes the default index after MySQL 5.5. Its features are as follows:

  • Support row lock and mvcc to support high concurrency
  • Support transaction
  • Foreign key support
  • Support security recovery after crash
  • Full text indexing is not supported (version 5.6.4 is gradually supported)

In general, MyISAM is suitable for select intensive tables, while InnoDB is suitable for insert and update intensive tables. Summary of 100 interview questions

System tuning parameters

You can use the following tools for benchmarking:

  • Sysbench: a modular, cross platform and multi thread performance testing tool
  • Iibench MySQL: performance testing tool for inserting MySQL / percona / MariaDB index based on Java
  • TPCC MySQL: a TPC-C testing tool developed by percona

There are many specific tuning parameters. For details, please refer to the official documents. Here are some important parameters:

  • Backlog: the backlog value indicates how many requests can be stored in the stack for a short period of time before MySQL temporarily stops answering new requests. That is to say, if the connection data of MySQL reaches maxconnections, the new requests will be stored in the stack to wait for a connection to release resources. The number of the stack is known as the backlog. If the number of waiting connections exceeds back_ Log, will not be granted connection resources. You can go from the default 50 to 500
  • wait_ Timeout: idle time of database connection. Idle connection will occupy memory resources. It can be reduced from the default 8 hours to half an hour
  • Maxuserconnection: the maximum number of connections. The default value is 0. There is no upper limit. It is better to set a reasonable upper limit
  • thread_ Concurrency: the number of concurrent threads, set to twice the number of CPU cores
  • Skipnameresolve: prohibit DNS resolution for external connections, eliminate DNS resolution time, but require all remote hosts to use IP access
  • Keybuffer size: the cache size of the index block. Increasing the cache size will improve the index processing speed, and has the greatest impact on the performance of the MyISAM table. For memory of 4G or so, it can be set to 256M or 384m, and query show status like’key_ Read% ‘, guarantee key_ reads / key_ read_ Requests below 0.1% is the best
  • innodbbufferpool_ Size: cache data blocks and index blocks, which have the greatest impact on InnoDB table performance. Query show status like’innodb_ buffer_ pool_ Read% ‘, guarantee (InnoDB_ buffer_ pool_ read_ requests –<br />Innodb_ buffer_ pool_ reads) / Innodb_ buffer_ pool_ read_ The higher the requests, the better
  • InnoDB additional tool size: the memory space size used by InnoDB storage engine to store data dictionary information and some internal data structures. When there are too many database objects, adjust the size of this parameter appropriately to ensure that all data can be stored in memory and improve access efficiency. When it is too small, MySQL will record warning Information to the database error log, this parameter size needs to be adjusted
  • innodblogbuffer_ Size: the buffer used by InnoDB storage engine’s transaction log is generally not recommended to exceed 32MB
  • Querycachesize: caches the resultset in mysql, that is, the result set of an SQL statement execution, so it can only be used for select statements. Any change in the data of a table will cause the cache data of all select statements that reference the table to be invalid in the query cache. Therefore, when our data changes very frequently, the use of query cache may not pay off. According to the hit rate (qcache)_ hits / (Qcache_ hits + Qcache_ In general, it is not recommended that the size should be too large. 256MB may be almost the same. Large configuration static data can be appropriately increased. You can use the command show status like’qcache_ %’view the current system query catch usage size
  • Read buffer size: MySQL read in buffer size. A read in buffer is allocated to a request to scan the table sequentially, and a memory buffer is allocated to it by mysql. If the sequential scan requests for a table are very frequent, you can improve its performance by increasing the value of the variable and the size of the memory buffer
  • Sortbuffer size: the buffer size used by MySQL for sorting. If you want to increase the speed of order by, first see if you can let MySQL use indexes instead of extra sort stages. If the sortbuffer size cannot be increased, the size of the variable can be increased
  • readrndbuffer_ Size: the random read buffer size of MySQL. When rows are read in any order (for example, in sort order), a random read buffer is allocated. When sorting query, MySQL will scan the buffer first to avoid disk search and improve the query speed. If you need to sort a large amount of data, you can adjust the value appropriately. However, MySQL will allocate this buffer space for each client connection, so it should be set appropriately as far as possible to avoid excessive memory overhead
  • record_ Buffer: each thread that performs a sequential scan allocates a buffer of this size for each table it scans. If you do a lot of sequential scans, you may want to increase this value
  • Thread cache size: saves threads that are not currently associated with a connection but are ready to serve new connections in the future. It can quickly respond to connected thread requests without creating a new one
  • Tablecache: similar to threadcache_ However, it is used to cache table files, which has little effect on InnoDB, and is mainly used for MyISAM

Upgrade hardware

According to whether MySQL is CPU intensive or I / O intensive, the performance of MySQL can be significantly improved by improving CPU and memory and using SSD.

Separation of reading and writing

It is also commonly used optimization, read from the main database write, generally do not use double master or multi master, introduce a lot of complexity, try to use other solutions in this paper to improve performance.

At the same time, many current splitting solutions also consider the separation of read and write. Read write separation: hand in hand to teach you to achieve MySQL read-write separation

cache

Caching can occur at these levels:

  • MySQL internal: describes the relevant settings in the system tuning parameters
  • Data access layer: for example, mybatis caches SQL statements, while hibernate can be accurate to a single record. The objects cached here are mainly persistent objects
  • Application service layer: more precise control and more implementation strategies can be achieved for caching by programming means. Here, the cache object is data transfer object (dto)
  • Web layer: cache web pages
  • Browser client: client caching

According to the actual situation, the cache can be added in one or more levels. Cache expansion: common cache problems and Solutions

This paper focuses on the cache implementation of the next service layer. At present, there are two main ways:

  • Write through: after the data is written to the database, the cache is updated to maintain the consistency between the database and the cache. This is also how most application caching frameworks, such as spring cache, work. This implementation is very simple, good synchronization, but the efficiency of the general.
  • Write back: when there is data to be written to the database, only the cache will be updated, and then the cached data will be synchronized to the database asynchronously in batches. This implementation is more complex and requires more application logic. At the same time, it may cause the database and cache to be out of sync, but the efficiency is very high.

Table partition

The partition introduced by MySQL in version 5.1 is a simple horizontal split. Users need to add partition parameters when creating tables. It is transparent to the application and does not need to modify the code

For users, the partition table is an independent logical table, but the underlying layer is composed of multiple physical sub tables. The code for partitioning is actually through the object encapsulation of a group of underlying tables, but for the SQL layer, it is a black box that completely encapsulates the underlying layer. The way MySQL implements partitioning also means that the index is defined according to the sub table of the partition, and there is no global index.

这个有点强,MySQL常用优化指南及大表优化思路(值得收藏)

 

The user’s SQL statement needs to be optimized for the partition table. The column of partition condition should be included in the SQL condition, so that the query can be located on a small number of partitions. Otherwise, all partitions will be scanned. You can use explain partitions to see which partitions a SQL statement will fall on, so as to perform SQL optimization. As shown in the following figure, five records fall on two partitions:

这个有点强,MySQL常用优化指南及大表优化思路(值得收藏)

 

The benefits of partitioning are:

  • You can make a single table store more data
  • The data of partition table is easier to maintain. You can delete a large amount of data by clearing the whole partition, or you can add new partitions to support the newly inserted data. In addition, an independent partition can be optimized, checked and repaired
  • Part of the query can be determined from the query conditions, only a few partitions, the speed will be very fast
  • The data of partition table can also be distributed on different physical devices, so as to efficiently utilize multiple hardware devices
  • Partition table can be used to avoid some special bottlenecks, such as exclusive access of InnoDB single index and inode lock competition of ext3 file system
  • You can back up and restore a single partition

Limitations and disadvantages of Zoning:

  • A table can have up to 1024 partitions
  • If the partition field has primary key or unique index columns, all primary key columns and unique index columns must be included
  • Partition tables cannot use foreign key constraints
  • Null values invalidate partition filtering
  • All partitions must use the same storage engine

Type of partition:

  • Range partition: assigns multiple rows to a partition based on column values that belong to a given continuous interval
  • List partition: similar to partition by range, the difference is that list partition is selected based on column value matching a certain value in a discrete value set
  • Hash partition: a partition selected based on the return value of a user-defined expression that uses the column values of these rows to be inserted into the table. This function can contain any expression valid in MySQL that produces non negative integer values
  • Key partition: similar to hash partition, the difference is that key partition only supports calculation of one or more columns, and MySQL server provides its own hash function. One or more columns must contain integer values

The suitable scenes for the partition are as follows:

If the most suitable scene data has strong time series, it can be partitioned according to time, as follows:

这个有点强,MySQL常用优化指南及大表优化思路(值得收藏)

 

The efficiency of query is very high when time range condition is added. At the same time, the unnecessary historical data can be deleted in batches.

If there are obvious hot spots in the data, and other data are rarely accessed except this part of data, then the hot data can be placed in a separate partition, so that the data of this partition can be cached in memory. Only a small partition table is accessed during query, which can effectively use the index and cache. Index related: take you through the MySQL index from the beginning to the end

In addition, MySQL has an early simple partition implementation merge table, which has many restrictions and lacks optimization. Therefore, it is not recommended to use it and should be replaced by a new partition mechanism

Split Vertically

The vertical database is split according to the correlation of the data tables in the database. Extension: summary of common sub database and sub table schemes of Internet companies

For example, if a database contains both user data and order data, then vertical splitting can put user data into user database and order data into order database.

Vertical table splitting is a way to vertically split a data table. It is common to split a large multi field table into common fields and non used fields. Generally, the number of data records in each table is the same, but the fields are different, so the primary key association is used

For example, the original user table is:

这个有点强,MySQL常用优化指南及大表优化思路(值得收藏)

 

After vertical split:

这个有点强,MySQL常用优化指南及大表优化思路(值得收藏)

 

The advantages of vertical splitting are:

  • It can make the row data smaller, a block can store more data, and reduce the number of I / O during query (less blocks are read each time)
  • It can achieve the purpose of maximizing the use of cache. When vertically splitting, you can put together the fields that do not change frequently, and those that change frequently
  • Data maintenance is simple

The disadvantages are:

  • The primary key is redundant, and redundant columns need to be managed
  • It will cause table join operations (increase CPU overhead) and reduce database pressure by joining on the business server
  • There is still a problem of too much data in a single table (horizontal splitting is required)
  • Complex transaction processing

split horizon

summary

Horizontal splitting is to store the data into pieces by some strategy. The database is divided into two parts: table and database. Each piece of data will be distributed to different MySQL tables or libraries, achieving the distributed effect and supporting a very large amount of data. The previous table partition is also a special kind of in library table partition.

Sub table in the database simply solves the problem of too large data in a single table. Because the data of the table is not distributed to different machines, it does not play a significant role in reducing the pressure on the MySQL server. We still compete for IO, CPU and network on the same physical machine, which should be solved by sub database

If you split the user table vertically in front of you, the result is as follows:

这个有点强,MySQL常用优化指南及大表优化思路(值得收藏)

 

In fact, it is often a combination of vertical split and horizontal split, namely users_ A_ M and users_ N_ Z is divided into users and user extras, so there are four tables in total

The advantages of horizontal splitting are:

  • There is no performance bottleneck of single database big data and high concurrency
  • The application end transformation is less
  • The stability and load capacity of the system are improved

The disadvantages are:

  • The consistency of partitioned transactions is difficult to solve
  • Cross node join has poor performance and complex logic
  • It is very difficult to expand the data for many times and maintain it

Fragmentation principle

  • Can not be divided, refer to single table optimization
  • The number of partitions should be as small as possible, and the partitions should be evenly distributed on multiple data nodes as much as possible. Because the more cross partitions of a query SQL are, the worse the overall performance will be. Although it is better than the results of one partition of all data, only expand the capacity and increase the number of partitions when necessary
  • Partition rules need to be carefully selected and planned well in advance. In the selection of fragmentation rules, we need to consider the data growth mode, data access mode, fragmentation correlation, and fragmentation expansion. The latest fragmentation strategies are range slicing, enumeration fragmentation, and consistent hash fragmentation. All of these are conducive to the expansion
  • Try not to span multiple shards in one transaction. Distributed transaction is always a difficult problem
  • The query conditions should be optimized as much as possible, and the mode of select * should be avoided as much as possible. Under a large number of data result sets, a lot of bandwidth and CPU resources will be consumed. The query should avoid returning a large number of result sets, and try to index the frequently used query statements.
  • Reducing the possibility of cross database join through data redundancy and table partition dependency

This paper emphasizes the selection of fragmentation rules. If the data of a table has obvious time characteristics, such as orders, transaction records, etc., they are usually more appropriate to use time range segmentation. Because of the timeliness of data, we tend to focus on its recent data, and the query conditions often contain time fields for filtering. The better solution is to use the current live data The data of jump is divided into pieces in a short time period, while the historical data is stored in a longer span.

Generally speaking, the selection of fragmentation depends on the most frequent SQL query conditions. Because the query SQL without any where statement will traverse all the partitions, and its performance is relatively poor. Therefore, the more such SQL is, the greater the impact on the system is. Therefore, we should try our best to avoid this kind of SQL.

Solution

Due to the complicated logic involved in horizontal splitting, there are many mature solutions at present. These options fall into two broad categories:

  • Client architecture
  • Agent architecture

Client architecture

By modifying the data access layer, such as JDBC, data source and mybatis, multiple data sources are managed through configuration, directly connected to the database, and the data fragmentation integration is completed in the module, which is generally presented in the form of jar package

This is an example of a client architecture:

这个有点强,MySQL常用优化指南及大表优化思路(值得收藏)

 

As you can see, the implementation of fragmentation is implemented with the application server by modifying the spring JDBC layer

The advantages of client architecture are:

  • Application of directly connected database to reduce the risk of downtime caused by peripheral system dependence
  • Components with low integration cost and no additional operation and maintenance

The disadvantages are:

  • Limited to the database access layer can only do articles, scalability is general, for more complex systems may not be able to do
  • Putting the pressure of sharding logic on the application server creates additional risk

Agent architecture

Through independent middleware to manage all data sources and data fragmentation integration, the back-end database cluster is transparent to the front-end application, and needs independent deployment and operation and maintenance agent components

This is an example of an agent architecture:

这个有点强,MySQL常用优化指南及大表优化思路(值得收藏)

 

In order to separate and prevent single points, proxy components generally exist in the form of clusters, and may need to be managed by service components such as zookeeper

The advantages of the agent architecture are:

  • It can handle very complex requirements, and is not limited by the original implementation of database access layer, and has strong scalability
  • Transparent to the application server without any additional load

The disadvantages are:

  • Independent agent middleware needs to be deployed and operated, which has high cost
  • The application needs to connect to the database through the agent, so there is one more hop on the network, and there is loss of performance and additional risk

Comparison of schemes

这个有点强,MySQL常用优化指南及大表优化思路(值得收藏)

 

Open your eyes

With so many options, how to choose? It can be considered as follows:

  • Determine whether to use the proxy or client architecture. Small and medium-sized or relatively simple scenarios tend to choose client architecture, while complex scenarios or large-scale systems tend to choose proxy architecture
  • Whether the specific functions are satisfied, such as the need for cross node order by, the priority to support this function
  • Regardless of the products that have not been updated in a year, it shows that the development is stagnant, and even there is no maintenance and technical support
  • It’s better to choose the producer in the order of large company, community, small company and individual
  • Choose the ones with good word-of-mouth, such as GitHub star count, user quantity and quality, and user feedback
  • Open source is the priority. Often projects have special requirements and may need to change the source code

According to the above ideas, the following options are recommended:

  • Client Architecture: shardingjdbc
  • Agent architecture: MYCAT or atlas

MySQL compatible and horizontally scalable database

At present, some open source databases are compatible with MySQL protocol, such as:

https://github.com/pingcap/tidb

http://www.cubrid.org/

However, there is still a gap between its industrial quality and MYSQL, and a large amount of operation and maintenance investment is required. If you want to migrate the original Mysql to a new horizontally scalable database, you can consider some cloud databases:

https://cn.aliyun.com/product/petadata/?spm=5176.7960203.237031.38.cAzx5r

https://cn.aliyun.com/product/oceanbase?spm=5176.7960203.237031.40.cAzx5r

https://www.qcloud.com/product/dcdbfortdsql.html

NoSQL

Sharding on MySQL is a dance with shackles on it. In fact, many large tables do not need MySQL RDBMS and do not require acid. You can consider migrating these tables to NoSQL to completely solve the problem of horizontal expansion. For example:

  • Log data, monitoring data and statistical data
  • Unstructured or weakly structured data
  • The transaction requirements are not strong, and there are not too many associated operation data

Author: as low as one fold up
juejin.im/post/5b7d52afe51d453885030b91

 

Recommended Today

PHP 12th week function learning record

sha1() effect sha1()Function to evaluate the value of a stringSHA-1Hash. usage sha1(string,raw) case <?php $str = “Hello”; echo sha1($str); ?> result f7ff9e8b7bb2e09b70935a5d785e0cc5d9d0abf0 sha1_file() effect sha1_file()Function calculation fileSHA-1Hash. usage sha1_file(file,raw) case <?php $filename = “test.txt”; $sha1file = sha1_file($filename); echo $sha1file; ?> result aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d similar_text() effect similar_text()Function to calculate the similarity between two strings. usage similar_text(string1,string2,percent) case […]