Mysql database deletes duplicate data, and only one method instance is reserved

Time:2022-1-23

1. Problem introduction

Suppose a scenario, a user table, contains three fields. id,identity_ id,name。 Now ID number identity_ There are many duplicate data in ID and name, which need to be deleted, and only one valid data is retained.

2. Simulation environment

1. Log in to the MySQL database and create a separate test database mysql_ exercise


create database mysql_exercise charset utf8;

2. Create user table users


create table users(
					id int auto_increment primary key,
					identity_id varchar(20),
					name varchar(20) not null
     );

3. Insert test data

Insert into users values (0, '620616199409206512', 'Zhang San'),
						(0, '620616199409206512', 'Zhang San'),
						(0, '62062619930920651x', 'Li Si'),
						(0, '62062619930920651x', 'Li Si'),
						(0, '6206222199101206211', 'Wang Wu'),
						(0, '6206222199101206211', 'Wang Wu'),
						(0, '32223519990916233', 'Zhao Liu');

It can be executed several times to generate more duplicate data.

4. Solutions

(1) grouping according to ID number and name;

(2) Take out the maximum ID (or minimum ID) after grouping;

(3) Delete other fields except the maximum (or minimum) id;

5.First attempt (failed!!!)


delete from users where id not in (select max(id) from users group by identity_id,name);

report errors:

1093 (HY000): You can’t specify target table ‘users’ for update in FROM clause

Because in mysql, you cannot select the records of a table first, and then update and delete the records of the same table according to this condition.

The solution is to select the result from the select through the intermediate table again, so as to avoid errors,

This problem only occurs in mysql, MSSQL and Oracle.

Therefore, we can take out the SQL statements in brackets first and find the maximum (or minimum) id first.


select max_id from (select max(id) as max_id from users group by identity_id,name);

Then, I reported wrong again!!!

ERROR 1248 (42000): Every derived table must have its own alias

It is suggested that each derived table must have its own alias!

When the sub query is executed, the outer query will treat the inner query as a table, so we need to add an alias to the inner query

Continue to correct:

Give the maximum (or minimum ID) result as a new table, alias T, and query t.mix_ id。


select t.max_id from (select max(id) as max_id from users group by identity_id,name) as t;

The maximum (or minimum) ID can be found successfully, as shown in the following figure:

6. Second attempt (successful!!!)


delete from users where id not in (
		select t.max_id from 
		(select max(id) as max_id from users group by identity_id,name) as t
		);

Execution results:

The duplicate data is successfully deleted, and only the last added record is retained. Similarly, you can also keep the records added for the first time (that is, delete other records in each group except the minimum ID)

3. Knowledge expansion I: update data

Other scenario applications: the user table user_ The status of a user whose name in info is an empty string (“”) is changed to “0”


update user_info set status='0' where user_id in (select user_id from user_info where name='')

The following errors are also reported:

You can’t specify target table ‘user_info’ for update in FROM clause

Because in mysql, you can’t select records of a table first, and then update and delete records of the same table according to this condition. The solution is to select the results of the selection through the intermediate table again, so as to avoid errors.
Both of the following are acceptable!!!


update user_info set status='0' where user_id in 
	 (select user_id from (select user_id from user_info where name = '') t1);

The following can also be slightly different. The alias can be as or not, T1 user_ The ID is directly related to the user of the inner layer_ ID correspondence is also OK.


update user_info set status='0' where user_id in 
	(select t1.user_id from (select user_id from user_info where name='') as t1);

3.1 step by step analysis

(1) Use the following query results as intermediate tables:


select user_id from user_info where name='';

(2) Query the intermediate table again as the result set:


select user_id from (select user_id from user_info where name='') as t;

(3) Update data


update user_info set status='0' where user_id in 
	(select user_id from (select user_id from user_info where name='') as t1);

4. Extension exercise: delete duplicate data

Write an SQL query to delete all duplicate e-mail addresses in the person table. Only the one with the smallest ID is retained in the duplicate e-mail address.


+----+------------------+
| Id | Email      |
+----+------------------+
| 1 | [email protected] |
| 2 | [email protected]ple.com |
| 3 | [email protected] |
+----+------------------+

ID is the primary key of this table.

For example, after running your query statement, the person table above should return the following lines:


+----+------------------+
| Id | Email      |
+----+------------------+
| 1 | [email protected] |
| 2 | [email protected] |
+----+------------------+

Answer 1:


delete from Person where Id not in (
	select t.min_id from (
		select min(Id) as min_id from Person group by Email
		) as t
	);

Answer 2:


delete p1 from 
	Person as p1,Person as p2 
		where p1.Email=p2.Email and p1.Id > p2.Id;

summary

This is the end of this article about the method of deleting duplicate data in MySQL database. For more information about deleting duplicate data in mysql, please search the previous articles of developeppaer or continue to browse the relevant articles below. I hope you will support developeppaer in the future!