Become a contributor Series in ten minutes | add optimization rules for cascades planner


Author: Cui Yiding

Up to now, “become a contributor series” has launched a series of activities such as “support ast restore to SQL”, “add build in function for tikv”, “vectorization expression”, etc.Taking advantage of the opportunity of tidb to optimize the database, we can take advantage of this opportunity to optimize the database.

As we all know, the optimizer is the core component of the database. It needs to find a reasonable execution plan in a reasonable time to ensure that the query can return the correct results stably and quickly. The original optimizer only had some heuristic optimization rules. With the change of data volume and business, the industry has designed the system R optimizer framework to deal with more and more complex SQL queries. It divides query optimization into two stages: logical optimization and physical optimization. Logical optimization transforms execution plan into physical plan which can execute faster according to statistical information and cost calculation. At present, tidb optimizer also adopts the optimizer model.

Although the system R optimizer framework greatly improves the ability of the database to process complex SQL, it also has some defects, such as:

  1. Poor scalability. Every time an optimization rule is added, the relationship between the new rule and the old rule needs to be considered. Students who are very familiar with the optimizer can accurately determine where the new optimization rule should be. In addition, each optimization rule needs to traverse the whole logic execution plan completely, and the mental burden and knowledge threshold of adding optimization rules are very high.
  2. The search space is limited. On the one hand, the search space is narrow due to the difficulty of adding optimization rules. On the other hand, logical optimization requires that the optimization rules must have benefits in all scenarios. However, in various scenarios faced by the database, there are always some optimization rules that have benefits in one data distribution, but no benefits in another data distribution. Therefore, the cost should be estimated according to the data distribution Because of this reason, some optimization rules can not be added to the search framework, or the optimization rules need to be turned on or off manually after adding.

In order to solve the above problems, it is more convenient to add optimization rules, expand the optimizer search space, and find a better execution plan. Based on the cascade optimizer model, we re write a new optimizer named cascades planner. In this optimizer framework, adding optimization rules becomes extremely simple

  1. It is not necessary to consider the order relationship between the optimization rules, and the rules are completely decoupled.
  2. Only add optimization rules for specific patterns, no longer need to traverse the entire logical execution plan, do not need to be familiar with the functions of all logic operators, greatly reducing the development threshold of the optimizer.

In this article, we mainly focus on how an optimization rule works and how to add rules to the new optimizer. First, let’s have an intuitive feeling of the optimizer: “optimizer is not difficult, but it is.”. In the next article, we will introduce the principle and framework of tidb cascades planner in more detail for further study by interested students. If you can’t wait, you can first read the following references to understand the cascade optimizer model in advance:

  1. The Cascades Framework for Query Optimization
  2. Orca: A Modular Query Optimizer Architecture for Big Data
  3. CMU SCS 15-721 (Spring 2019) : Optimizer Implementation (Part II)

Introduction to cascades optimizer

Cascades optimizer is a search framework born after optimization and adjustment of Goetz Graefe on the basis of volcano optimizer generator. This framework has the following concepts:

  • Expression: in the original paper, expression is used to refer to all expressions including the plan node (i.e., the SQL operator commonly known as SQL operator) and various function expressions (such as more than 200 built-in functions supported by MySQL). In the existing framework implementation of tidb, only the plan node is regarded as expression. Expression mostly contains child nodes, but each child node is not an expression, but a set of equivalent expression sets, that is, the group to be introduced next.
  • Group: represents the set of equivalent expressions, that is, the expressions in the same group are logically equivalent. Each expression group is represented as a child node. In the following illustration, groupG0Contains two equivalent expressions before and after predicate pushdown.
  • Transformation rule: it is an equivalent change rule applied to expression and group, which is used to expand the search space of optimizer. It is also the module that this time focuses on. Group in the figure belowG0The second group inExpressionThat’s the first groupExpressionAfter a predicate condition (filter) pushes down the transformation rule of a joinExpression
  • Pattern: describes a fragment of an execution plan. Note that this “fragment” is not a “subtree.”. This fragment can be any segment in the execution plan operator tree. Each transformation rule has its own pattern, which means that the rule is only used to satisfy the expression of the pattern.
  • Implementation rule: a rule that converts a logical operator to a physical operator. For example, a join can be converted into hashjoin, mergejoin, indexnested loopjoin, etc. Each transformation is completed by a corresponding implementation rule.

To queryselect * from t1 join t2 on t1.a = t2.a where t1.b > 1For example, after the transformation rule is pushed down through the predicates of the cascades optimizer, the groups and expressions in the search space will be as follows:

Become a contributor Series in ten minutes | add optimization rules for cascades planner

At present, the search process of cascades optimizer in tidb is as follows:

  1. Firstly, the abstract syntax tree (AST) is transformed into the initial logical execution plan, that is, the operator tree represented by logical plan.
  2. The cascades planner splits the initial logical plan tree equivalently intoGroupandGroupExprIn this way, we get the initial input of the cascades planner optimizer.
  3. The cascade planner divides the search process into two stages. The first stage is exploration, which continuously traverses the whole group, applies all feasible transformation rules, and generates new groups and groupexpr, iterating until no new groupexpr is born.
  4. In the second stage of implementation, the cascades planner searches for the best (lowest cost) physical execution plan for each group by applying the corresponding implementation rules to groupexpr.
  5. After the end of the second stage, the cascades planner will generate a final physical execution plan. The optimization process is finished, and the physical execution plan is handed over to the tidb execution engine module for further processing.

How an optimization rule works

At present, all the implementation codes of transformation rules are placed in theplanner/cascades/transformation_rules.goFile. We usePushSelDownProjectionAs an example, it briefly introduces the workflow of a transformation rule.

Transformation rule is an interface, which is defined as follows (omit the comment section)

type Transformation interface {
    GetPattern() *memo.Pattern
    Match(expr *memo.ExprIter) bool
    OnTransform(old *memo.ExprIter) (newExprs []*memo.GroupExpr, eraseOld bool, eraseAll bool, err error)

In cascades, each rule matches a local expression subtreeGetPattern()This is to return the pattern that the rule is going to match. The specific structure of the pattern is as follows (omit the comment part)

type Pattern struct {
    Children []*Pattern

What needs to be mentioned here is thatEngineTypeSetFor example, the tikv parameter can not be executed on the tikv (tikv) open source engine. For example, a tikv parameter can not be executed in the tikv column. This parameter is introduced to handle rules that only work on specific execution engines.

Patterns can be constructed with the help of patternsBuildPattern()as well asNewPattern()To complete. aboutPushSelDownProjectionAccording to this rule, the execution plan pattern that it works isProjection -> Selection

func (r *PushSelDownProjection) GetPattern() *memo.Pattern {
    return memo.BuildPattern(
        memo.NewPattern(memo.OperandProjection, memo.EngineTiDBOnly),

Match()Function is to make some more specific judgments after hitting pattern, because pattern only contains operator type and structure information of operator tree. In this case, for example, if a rule only takes effect on inner join, pattern can only judge that the operator is a join. To further determine whether it is an inner join, we need to rely on itMatch()Function. For most simple transformation rules, soMatch()The function simply returnstrueThat’s it.

OnTransform()Function is the main logic of the rule. Inside the function, we will create a new expression and return the appropriate oneeraseOldas well aseraseAllValue of. For example, predicate push-down will make the calculation as early as possible and reduce the amount of subsequent calculation. Therefore, the newly generated expression must be a better choiceeraseOldYou can go backtrue 。 Similarly, when the expression returned by a rule must be better than all other choices, such as an optimization rule discoverya > 1 and a < 1After constant false, the result of the judgment query must not be generated, so it is generatedTableDualThe new expression oferaseAllreturntrueLet the optimizer empty all other expressions in the same group.

PushSelDownProjectionOfOnTransform()The behavior of is shown in the figure below

Become a contributor Series in ten minutes | add optimization rules for cascades planner

  1. In the beginning, onlyG0andG1Two related groups.
  2. This optimization rule willSelectionpush toProjectionNext, create a new groupG2At the same timeG0New inProjection->G2OfGroupExpr

How to add a transformation rule

To add a transformation rule is to write a new structure and implement itTransformationThe three interfaces of this interface. The advantage of cascades architecture is that it does enough abstraction, so that adding rules doesn’t need to consider too many complicated things. If you feel that some places are not easy to write when adding a rule, you can immediately stop your keyboard and discuss with us in sig planner.

Of course, here are some notes to help you avoid going awry when adding transformation rules

  • OnTransform()Add sufficient comments in the function header and function process to explain what kind of transformation your rule has made, so that others can quickly understand what the rule has done when reading the code of this rule.
  • OnTransform()Do not make any changes to the original expression in the function, because the original expression may continue to trigger other behaviors later. If the modification is made, some unexpected chemical reactions may occur in the next trigger rule.
  • To be indefaultTransformationRuleMapRegister the rule in the. The map is currently the default of tidbRuleSet。 useRuleSetFor example, using different rule sets for TP and AP queries enables the optimizer to quickly make the same execution plan when processing TP queries, and applies more optimization rules to make execution plans with shorter execution time when processing AP queries.
  • Unit testing is essential. Currently, the testing of transformation rule istransformation_rules_test.goMedium. Test function can refer to other functions under the file, mainly to run a complete SQL test. In order to reduce the workload of modifying the test output, we record the test input and output separately in the file, and can update the output quickly through the command line. After adding the test function, you need to modify thetransformation_rules_suite_in.jsonFile to add input to the test, and then use thego test --recordThe correspondingxxx_out.jsonDocuments. Remember to check whether the output of the test meets the expectation and make sure that the test result is the equivalent transformation you want.
  • At present, the cascades optimizer is still in its early stage. Occasionally, there are some framework changes, which may cause some code conflicts to be solved. Please understand.

How to become a contributor

In order to facilitate discussion with the community on planner related matters, we have created ා sig planner in tidb community slack for exchange and discussion. After that, we will set up a special interest group for optimizer. No threshold is set. Interested students are welcome to join us. When you encounter some problems when you add rules, you can make complaints about channel and Tucao.

Participation process:

  1. In cascade tracking issueporting the existing rules in the old plannerChoose the function you are interested in and tell everyone that you will complete it.
  2. Add a rule and add unit tests to it.
  3. functionmake devTo ensure that all tests pass.
  4. Initiate a pull request and complete the merge to the main branch.

If you have any questions, please come to sig planner for questions and discussion.

Original reading:

Become a contributor Series in ten minutes | add optimization rules for cascades planner