How to optimize query performance in multi table query


In the actual development process, such as DDD Domain model congestion scheme or in order to make the data model more convenient for later expansion and interpretation, it is not convenient and not recommended to add status fields to solve the problem. However, the upper layer business is relatively complex, so there will be the problem of adaptation between the data model and the business requirements. Complex business may require the joint table query of multiple tables in the data model How to solve these problems?

Split mode

The original SQL method is divided into multiple steps. Multi step can be at the SQL level or at the program level.

Some business situations are allowedExecution through multiple SQLThe results are assembled in the program to get the final result set that meets the requirements. Here, the author recommends that it should be processed in the program as much as possible. The advantage of this is to reduce the pressure on the database.

it’s fine tooWith the help of other caching MiddlewareFor example, redis processes part of the data in advance, and the query performance of redis is certainly better than that of the database.

Merge mode (read write separation)

The required result set is stored in redundancy mode in advance, and the program only needs to query the redundant data set.

Some people will first think of using views to do this, but those who know the views know that every time you query the view data, you will still execute the linked table SQL, which will not improve the performance.

useCache table modeTaking MySQL as an example, MySQL provides the implementation of cache table, which caches the target data into the cache table first, and then looks up the data in the cache table.

Synchronize data to elasticsearch and query the redundant data in elasticsearchAlibaba canal products provide the implementation of MySQL synchronization to elasticsearch, which can be referred toSync ES · alibaba/canal Wiki · GitHub。 However, this scheme often has the problem of delay, so it is only suitable for real-time scenarios.

Big data spark / FlinkIn real-time or offline mode (low real-time requirements) for multiple target tables business processing, persistent result set, the program only needs to read the data in the result set. Flink provides the joining scheme, which can be referred toApache Flink 1.11 Documentation: Joining

Sub database and sub table + master-slave modeFor example, in the multi tenant scenario, the data is isolated between tenants. We can use one master and many slaves, and one master is used to write data. The multiple slaves are split according to the tenants, and each tenant queries its own slave database. However, the program implementation of this scheme will be more complex. At the same time, the data volume of a certain tenant is very large, and there will be performance problems.

Summary: this kind of mode is often implemented by event driven mode, which will lead to real-time and sequence problems, which need to be considered when selecting the type. The schemes mentioned above are only suitable for the scenario of relatively simple join table query. If there are complex requirements such as sub query, they can not meet the requirements.

How to optimize query performance in multi table query