MySQL Data Table Merge and Reduplicate

Time:2019-8-12

Scene:

The crawled data is generated into a data table, which has the same structure as another main table and needs to be merged and de-duplicated.

Solution: (direct example)

  • First create two tables pep, pep2, where PEP is the main table

    CREATE TABLE IF NOT EXISTS `pep/pep2`(
    `id` INT UNSIGNED AUTO_INCREMENT,
    `no` VARCHAR(100) NOT NULL,
    PRIMARY KEY ( `id` )
    )ENGINE=InnoDB DEFAULT CHARSET=utf8;
  • Then insert two pieces of data into pep, and one piece of data in PEP 2 is the same as that in pep.

    insert into pep(no) values('abc');
    insert into pep(no) values('caa');
    
    insert into pep2(no) values('abc');
  • Insert pep2 data into PEP

    insert into pep (no) select no from pep2;
  • Grouping to recreate the new temporary table tmp

    create table tmp select id,no from pep group by no;

    Note: The ID field type of the table created is no longer primary key self-increasing

    Maybe you will make a mistake.
     ```Syntax error or access violation: 1055 Expression #1 of SELECT 
     list is not in GROUP BY clause and contains nonaggregated 
     column 'XXX.Y.ZZZZ' which is not functionally dependent on 
     columns in GROUP BY clause; this is incompatible with
      sql_mode=only_full_group_by
     ```
     Solution: Execute the following two orders:
     ```
     mysql> set global sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';
     
     mysql> set session sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';
     ```
  • Delete the PEP table and rename the TMP table PEP

    drop table pep;
    alter table tmp rename to pep;
  • Look at the desc structure and select * from pep to find that the field type of ID has changed, which needs to be changed back to the original type.

    alter table pep add primary key (id);
    alter table pep modify id int auto_increment;

There is also the use of join to do heavy, faster can also add a field (can be a few fields + the MD5 value), to create a unique index for this field, after inserting data, automatically filter back duplicate data.

Recommended Today

Preliminary Practice of Quality System Construction of Honeycomb Large Transportation Business

Quality is one of the key factors that determine the success of products and the sustainable development of enterprises. How to do a good job in the construction of quality system is a relatively large topic, covering a wide range, and there is no fixed measurement standard. When you open an Internet company recruitment website […]