Does the space character affect the group grouping of string fields? On the importance of skills

Time:2022-3-19

The company is going to ring the bell for listing, and the audit is in progress. Two students in our group also participated in some data analysis.

 

First introduce the key field of a user table (t_soho): user_ Name varchar (32) – user name and ID_ Card-varchar (255) – ID number (ciphertext storage), create_ Time datetime registration time

One of the statistics is grouped by ID number.

The two students who backup each other write SQL statements respectively. After execution, it is found that the results of the two sides are different. The following is the SQL of both parties, in which fun_ Decryption is the decryption function. The difference is that the result of the first statement is more than that of the second.

SELECT ... FROM t_soho WHERE ... GROUP BY `FUN_DECRYPTION`(id_card);
SELECT ... FROM t_soho WHERE ... GROUP BY id_card;

It can be seen that the difference is that one decrypts the data and the other does not decrypt it.

Normally, whether it is plaintext data or ciphertext data, it should not affect the counting. Well, since the results of the above two sides are different, there must be something wrong.

After analysis, the two students believe that it may be caused by the existence of space characters, such as:11011519920120718XCiphertext and11011519920120718X (there is a space after x) the ciphertext is different. If the program does not process this space character, t_ Two different ciphertext field values will appear in the SOHO table.

Analysis and regression analysis, that is, whether it is such a thing, we need to verify its accuracy.

A classmate soon confirmed that the above difference was caused by the space character. This is how he verifies it: select a, count (1) from (select ‘test’ as a union all select ‘test’) as table1 group by a

Another student later confirmed the correctness of this analysis. His verification method is to modify the data of the table, that is, there are two lines of the same text in the same field, one of which is followed by a space, and then find group by.

 

The previous “different way” is skill!

 

Let’s see another thing.

There is a complex report in Excel, which needs to count the activity of monthly registered users month by month.

This is troublesome. It takes a lot of effort to get all the data out of an SQL. One of our little friends wasted nine cattle and two tigers. It took an hour to finish it. His implementation steps are: after writing the SQL, copy the execution results into excel, then turn the rows into columns, and paste them into the cells of each month one by one; After this line is completed, continue to modify the month in SQL, and complete the data line of the next month in the same way until the data lines of all months are completed. Later, when checking, I found that there was a problem with his SQL, and all his previous efforts were wasted, so the boy couldn’t help sighing.

Take a popular example. For example, during the epidemic period, the vegetables and fruits we bought from the vegetable market should be disinfected and stored. One way is to disinfect and store in the same way, and the other way is to disinfect and then store in a unified way. Our suggestion is to use the latter.

In the above case, the row column conversion of Excel is a skill; The way of batch operation is skill, which saves the time spent switching back and forth between MySQL client and excel.

 

Skills are important. If you master certain skills, it will be icing on the cake!

 

The old mother cleaned up the housework, handed over a piece of golden crackers and nagged, “three small bags of cookies from shopping. These cookies are very delicious!” I heard it unconsciously, and then looked at the package. It’s Crispy rock roast cheese. “Cheese” is power. Hurry to arrange it!