Author: Xiao Yaya
As a member of the aikesheng DBA team, he is responsible for troubleshooting database faults and platform issues on customer projects, with a particular fondness for database high availability and distributed technology.
Source: original contribution
Produced by the aikesheng open source community. Original content may not be used without authorization; for reprints, please contact the editor and cite the source.
While verifying an application integration feature in the customer's dble test environment, we found that one function ran for a long time without producing results and finally reported a timeout error. According to the customer, this error appears whenever the function fails to receive the back-end data within the allotted time.
The dble version in use was 2.19.07.3 and the database was MySQL 5.7.25. The two tables involved in the business SQL, cusvaa and cusm, are both sharded tables, with row counts of roughly 67W and 7600W (W = 10,000, i.e. about 670,000 and 76 million rows). The join condition is the sharding key, and the sharding rule is stringhash, split into 8 shards, each with a partition length of 10. After describing the situation, the customer's engineer asked me: "The join condition is the sharding key and the sharding rules are the same, so why is execution so slow?"
With the customer's question in mind, I began investigating. The SQL executed by this function at the time was as follows:
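The original SQL appears only as a screenshot in the source; a hypothetical sketch of its shape (the join column cust_no and the filter are stand-ins, not the customer's real columns) would be:

```sql
-- Hypothetical sketch of the problem query's shape; cust_no stands in
-- for the real sharding key and the filter is invented for illustration.
SELECT a.cust_no, a.acct_no, m.cust_name
FROM cusvaa a
JOIN cusm m ON a.cust_no = m.cust_no   -- join condition is the sharding key
WHERE m.status = '1';
```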
Seeing the SQL in this screenshot, my first hunch about the long execution time was that the sharding-key join condition might not be taking effect. I then asked the customer for the execution plan.
The SQL execution plan at the dble layer was as follows:
With this information we could basically confirm that the sharding key was not taking effect. The execution plan clearly shows that dble splits the SQL and distributes it to every shard, fetches and sorts all the data from the shards, and then performs the merge and join in the dble middle layer.
Normally, the SQL should be distributed to each shard, and each shard's result set should be returned to dble for a direct merge.
Recalling that the customer had said both tables used the same stringhash splitting rule, I couldn't help muttering: had the customer configured hashSlice differently? I then asked the customer to provide the dble configuration files.
The rule.xml file the customer provided confirmed my conjecture. Although both tables use a sharding rule called stringhash, the concrete sharding function configurations differ, making them in effect two different sharding rules.
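The customer's rule.xml is likewise shown only as a screenshot. A hedged reconstruction of the kind of mismatch involved, in dble's rule.xml format (the function names, property values, and hashSlice ranges are illustrative, not the customer's real values), might look like:

```xml
<!-- Illustrative only: both functions are of type StringHash ("stringhash"),
     but their hashSlice values differ, so the same key value can be
     routed to different shards by the two rules. -->
<function name="hash-string-a" class="StringHash">
    <property name="partitionLength">10</property>
    <property name="partitionCount">8</property>
    <property name="hashSlice">0:8</property>   <!-- hash the first 8 chars -->
</function>
<function name="hash-string-b" class="StringHash">
    <property name="partitionLength">10</property>
    <property name="partitionCount">8</property>
    <property name="hashSlice">-8:0</property>  <!-- hash the last 8 chars -->
</function>
```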
The customer then confirmed the value range of the sharding key with the business developers and adjusted the sharding function accordingly, as follows:
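The adjusted configuration is also a screenshot in the source; the essence of the fix can be sketched as both tables referencing one and the same function (the table rule names, column name, and property values below are illustrative):

```xml
<!-- Illustrative: both table rules reference the same sharding function,
     so their sharding configurations are identical by construction. -->
<tableRule name="rule_cusvaa">
    <rule>
        <columns>cust_no</columns>
        <algorithm>hash-string</algorithm>
    </rule>
</tableRule>
<tableRule name="rule_cusm">
    <rule>
        <columns>cust_no</columns>
        <algorithm>hash-string</algorithm>
    </rule>
</tableRule>
<function name="hash-string" class="StringHash">
    <property name="partitionLength">10</property>
    <property name="partitionCount">8</property>
    <property name="hashSlice">0:8</property>
</function>
```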
After the adjustment, the configuration was reloaded dynamically to take effect, and the SQL execution plan was checked again:
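For reference, the dynamic reload mentioned above is normally issued from dble's management port (9066 by default; the host and user name below are placeholders):

```sql
-- Connect to the dble management port (default 9066), e.g.:
--   mysql -h127.0.0.1 -P9066 -uman1 -p
-- then reload the full configuration, including rule.xml:
reload @@config_all;
```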
The new execution plan confirmed that the sharding key was now taking effect, so I asked the customer to verify the function again; this time it responded in seconds.
For a stringhash sharding function, not only must the partitionLength and partitionCount properties be defined, but hashSlice must be configured as well. During SQL parsing and routing, dble compares the full content of the sharding rules: only when the rules are truly identical is the SQL pushed down to each shard, with the per-shard results returned to the dble layer for a final merge. Otherwise, all the data has to be pulled from the shards up to the dble layer for the merge and join.

When handling slow SQL in a dble environment, we should not only look at how the SQL is actually executed, but also confirm that the customer's description matches the real configuration.
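To illustrate why an identical-looking rule name is not enough, here is a simplified Python sketch of stringhash-style routing (this is not dble's real hash implementation; the BKDR-style hash, the key format, and the hashSlice values are all illustrative). Two rules that agree on partitionCount and partitionLength but differ in hashSlice hash different substrings of the key, and can therefore send the same value to different shards:

```python
def string_hash(key: str, hash_slice: tuple, partition_count: int,
                partition_length: int) -> int:
    """Route `key` to a shard index (simplified stringhash-style sketch).

    hash_slice = (start, end): the substring of the key to hash; end == 0
    means "to the end of the string", mimicking slice-style configuration.
    """
    start, end = hash_slice
    sliced = key[start:] if end == 0 else key[start:end]
    # BKDR-style 32-bit string hash (illustrative; dble's real hash differs)
    h = 0
    for ch in sliced:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    total = partition_count * partition_length
    return (h % total) // partition_length

key = "CUST0012345678"  # hypothetical sharding-key value
# Same partitionCount/partitionLength, different hashSlice:
shard_a = string_hash(key, (0, 8), 8, 10)   # hashSlice 0:8  -> first 8 chars
shard_b = string_hash(key, (-8, 0), 8, 10)  # hashSlice -8:0 -> last 8 chars
print(shard_a, shard_b)  # the two rules route the same key differently
```

Because the two rules disagree about where a given key lives, dble cannot assume matching rows are colocated and must pull everything up to join it itself; once the function configurations are identical, the join can be pushed down to each shard.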