MySQL left join avoid pit Guide

Time:2019-12-9

phenomenon

left joinIn our usemysqlThe process of query is very common, such as how many comments a blog article has, how many comments a commodity has in the mall, how many likes a comment has, and so on. But because of the rightjoinonwhereUnfamiliar keywords sometimes lead to inconsistent query results with expectations, so today I’ll summarize and avoid pits together.

Here I’ll give you a scenario and two questions. If you can answer both questions correctly, you don’t need to read this article.

Suppose there is a class management application, a table classes, which stores all classes, and a table students, which stores all students. The specific data is as follows (thanks to Liao Xuefeng’s online sql):

SELECT * FROM classes;

id    name
1 class 1
2 class two
3 class three
4 class four

SELECT * FROM students;

id  class_id  name   gender
1 1 Xiaoming M
2 1 small red f
3 1 small army M
4 1 millet f
5 2 small white f
6 2 minion M
7 2 Xiaolin M
8 3 small new F
9 3 Xiao Wang M
10 3 Xiaoli f

Now there are two requirements:

  1. Find out the name of each class and the corresponding number of female students
  2. Find out the total number of students in class one

For requirement 1, most people can come up with the following two SQL writing methods without thinking. Which is right?

SELECT c.name, count(s.name) as num 
    FROM classes c left join students s 
    on s.class_id = c.id 
    and s.gender = 'F'
    group by c.name

perhaps

SELECT c.name, count(s.name) as num 
    FROM classes c left join students s 
    on s.class_id = c.id 
    where s.gender = 'F'
    group by c.name

For requirement 2, most people can also come up with the following two SQL writing methods without hesitation. Which is right?

SELECT c.name, count(s.name) as num 
    FROM classes c left join students s 
    on s.class_id = c.id 
    Where c.name = first shift 
    group by c.name

perhaps

SELECT c.name, count(s.name) as num 
    FROM classes c left join students s 
    on s.class_id = c.id 
    And c.name = first shift 
    group by c.name

Please don’t continue to turn down!! First, give your own answer. The correct answer is below.
.
.
.
.
.
.
.
.
The answer is that the first statement of both requirements is correct. To understand this problem, you need to understand that MySQLleft joinThe following section will expand the implementation principle of.

root

mysqlaboutleft joinSimilar toNested loopFor slave processing, take the following statement as an example:

SELECT * FROM LT LEFT JOIN RT ON P1(LT,RT)) WHERE P2(LT,RT)

amongP1yesonFilter condition, if it is missing, it is considered asTRUEP2yeswhereFilter condition, missing also considered asTRUEThe execution logic of the statement can be described as follows:

For each row lt in LT {// traverses each row of the left table
  BOOL b = FALSE;
  For each row RT in RT such that P1 (LT, RT) {// traverse each row of the right table to find the row that meets the join condition
    If P2 (LT, RT) {// meets where filter condition
      T: = lt|rt; // merge the row and output the row
    }
    B = true; // lt has a corresponding row in RT
  }
  If (! B) {// after traversing RT, it is found that LT does not have a corresponding row in RT, then try to fill a row with null
    If P2 (LT, null) {// fill in null and meet the where filter condition
      T: = lt|null; // output the line filled with LT and null
    }         
  }
}

Of course, in practice, MySQL will use buffer to optimize and reduce the number of row comparisons, but this does not affect the key execution process, which is not covered in this article.

From this pseudo code, we can see two points:

  1. If you want to restrict the right table, you mustonConditions, ifwhereIf it is in progress, it may lead to data loss, and the row with no matching row in the left table and the right table will not appear in the final result, which violates ourleft joinUnderstanding. Because for the row with no right table matching row in the left table, after traversing the right tableb=FALSE, so I will try itNULLMake up the right table, but at this time ourP2The right table row is restricted. Null is not satisfiedP2(NULLGenerally, the restrictions will not be met unlessIS NULLThis will not be added to the final result, resulting in the loss of results.
  2. WithoutwhereConditions, regardless ofonWhat kind of restriction does the condition have on the left table? Each row of the left table will have at least one row synthesis result. For the left table row, if there is no corresponding row in the right table, then after the right table traversal endsb=FALSE, will use one lineNULLTo generate data, which is redundant. So you must use where to filter the left table.

Next, expand the execution results and error causes of the error statements of the two requirements:
Demand 1

name    num
Class 2
Class two 1
Class three 2

Demand 2

name    num
Class 4
Class two 0
Class three 0
Class four 0
  1. In requirement 1, due to the restriction on the right table in the where condition, the data is missing (there should be one for class 40Result)
  2. In requirement 2, due to the restriction on the left table in the on condition, the data is redundant (the results of other classes are also out, or wrong)

summary

Through the above phenomenon and analysis, we can draw a conclusion: in theleft joinIn the statement, the left table filter must be placed in the where condition, and the right table filter must be placed in the on condition, so that the result is not much, just good.

SQL seems to be simple, but there are many details in it. A little confusion will cause the result to be inconsistent with the expectation. Therefore, we should pay attention to these details at ordinary times to avoid mistakes at critical times.