Migration strategy for 100 million rows of MySQL data

Time: 2021-9-18

Steps

The subproject provides a data synchronization interface that lets you choose the number of sync threads and the batch size for inserts according to the situation.
Specific steps:

  1. Full migration: migrate all historical data currently in the source library (the new project provides a data migration interface that moves data in batches; this step covers the bulk of the data).
  2. Incremental migration: record the start time of the full migration, then re-migrate any data that changed while it was running.
  3. Data comparison: compare the data in Cassandra and MySQL through an interface until consistency exceeds 99.99%.
  4. Enable double write: once data comparison confirms the full and incremental migrations are sound, write to both stores. If double writing goes wrong, the comparison will surface that too.
  5. Cut reads over to Cassandra: once double write is confirmed stable, turn off MySQL writes.
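The five steps above can be sketched as a small dual-write wrapper. This is a minimal Python sketch with plain dicts standing in for MySQL and Cassandra; every name here (`DualWriteRepo`, the store arguments) is hypothetical, not the project's actual interface:

```python
# Sketch of the cutover phases as a dual-write repository.
# All names are illustrative stand-ins, not the project's real interfaces.

class DualWriteRepo:
    def __init__(self, mysql, cassandra):
        self.mysql = mysql                 # legacy store, source of truth at first
        self.cassandra = cassandra         # new store
        self.double_write = False          # flipped in step 4
        self.read_from_cassandra = False   # flipped in step 5

    def write(self, key, value):
        self.mysql[key] = value
        if self.double_write:
            self.cassandra[key] = value

    def read(self, key):
        store = self.cassandra if self.read_from_cassandra else self.mysql
        return store[key]

    def mismatches(self):
        """Step 3: keys whose values differ between the two stores."""
        return {k for k in self.mysql if self.cassandra.get(k) != self.mysql[k]}
```

In the real system the comparison runs during step 3 and again after double write is enabled, which is how problems in either write path are caught before reads are cut over.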

Test results

Spring-data-cassandra

  • 5 threads, 150 rows per insert batch per thread: 8.08 million rows in one hour, 134,000/min
  • 5 threads, 500 rows per insert batch per thread: 2.74 million rows in 15 minutes, 228,000/min

With the spring-data-cassandra driver the speed was far too slow. On inspection we found that although Spring Data Cassandra's API exposes batch insertion, the internal implementation inserts rows one by one.
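The gap matters because every round trip to the cluster carries fixed overhead, so genuine driver-level batches divide the request count. A runnable illustration with a stub session (`CountingSession` is a made-up stand-in, not a driver class):

```python
def chunked(rows, size):
    """Split a result set into insert batches of at most `size` rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

class CountingSession:
    """Stub standing in for a driver session; counts network round trips."""
    def __init__(self):
        self.requests = 0

    def execute(self, batch):
        self.requests += 1   # one round trip per call, regardless of batch size

rows = list(range(10_000))

# Row-at-a-time (what the Spring Data path effectively did): 10,000 round trips.
naive = CountingSession()
for row in rows:
    naive.execute([row])

# Driver-level batches of 1,000 rows: only 10 round trips.
batched = CountingSession()
for batch in chunked(rows, 1000):
    batched.execute(batch)
```

With a real driver the batch call would be a batch statement execution, but the round-trip arithmetic is the same.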

Cassandra-driver

For local runs, we set Cassandra's concurrent read/write thread count to 32, the batch size limit to 5000 KB, and the read/write timeout threshold to 500,000 milliseconds. The test results are as follows:

  • Single node:

    | Threads | MySQL fetch size | Insert batch size | Rows migrated | Rate | Est. time for 10M rows |
    | --- | --- | --- | --- | --- | --- |
    | 10 | 5000 | 1000 | 1.37 million in 3 min | 456,000/min | 21.9 min |
    | 10 | 5000 | 2000 | 760,000 in 2 min | 380,000/min | 26.3 min |
    | 6 | 5000 | 2000 | 990,000 in 2 min | 500,000/min | 20 min |
    | 6 | 5000 | 5000 | 520,000 in 1 min | 520,000/min | 19.2 min |
    | 5 | 5000 | 1000 | 4 million in 7 min | 570,000/min | 17.5 min |
    | 5 | 5000 | 2000 | 560,000 in 1 min | 560,000/min | 17.8 min |
    | 5 | 5000 | 5000 | 360,000 in 1 min | 360,000/min | 27.7 min |
  • Two nodes:

    | Threads | MySQL fetch size | Insert batch size | Rows migrated | Rate | Est. time for 10M rows |
    | --- | --- | --- | --- | --- | --- |
    | 5 | 5000 | 5000 | 7.9 million in 10 min | 790,000/min | 12.6 min |
    | 5 | 10000 | 5000 | 6.32 million in 10 min | 632,000/min | 15.8 min |
    | 10 | 5000 | 5000 | 6.5 million in 10 min | 650,000/min | 15.3 min |
  • Commitlog disabled, two nodes:

    | Threads | MySQL fetch size | Insert batch size | Rows migrated | Rate | Est. time for 10M rows |
    | --- | --- | --- | --- | --- | --- |
    | 5 | 5000 | 5000 | 7.6 million in 10 min | 760,000/min | 13.1 min |
  • Commitlog disabled, one node:

    | Threads | MySQL fetch size | Insert batch size | Rows migrated | Rate | Est. time for 10M rows |
    | --- | --- | --- | --- | --- | --- |
    | 5 | 5000 | 5000 | 6.9 million in 10 min | 690,000/min | 14.5 min |
    | 5 | 5000 | 5000 | 1.1 million in 2.5 min | 440,000/min | 22.7 min |
    | 5 | 1000 | 5000 | 1.33 million in 3 min | 440,000/min | 22.7 min |
    | 5 | 10000 | 5000 | 840,000 in 3 min | 280,000/min | 35.7 min |
    | 10 | 10000 | 5000 | 870,000 in 2 min | 435,000/min | 22.9 min |
  • Running on the server, three nodes:

    | Threads | MySQL fetch size | Insert batch size | Rate | Est. time for 10M rows |
    | --- | --- | --- | --- | --- |
    | 5 | 10000 | 5000 | 800,000/min | 12.5 min |
    | 5 | 20000 | 5000 | 850,000/min | 11.8 min |
    | 5 | 10000 | 10000 | 800,000/min | 12.5 min |
    | 10 | 10000 | 5000 | 760,000/min | 13.1 min |
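The quoted rates and 10-million-row estimates are plain ratio arithmetic; for example, the first single-node run:

```python
def rate_per_min(rows, minutes):
    """Observed migration rate."""
    return rows / minutes

def eta_minutes(total_rows, rows, minutes):
    """Projected time to move total_rows at the observed rate."""
    return total_rows / rate_per_min(rows, minutes)

# First single-node run: 1.37 million rows in 3 minutes.
rate = rate_per_min(1_370_000, 3)               # about 456,000/min
eta = eta_minutes(10_000_000, 1_370_000, 3)     # about 21.9 min for 10 million rows
```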

Scylla-driver

With the scylla-driver, throughput was essentially the same as with cassandra-driver.
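Either driver plugs into the same migration loop: N worker threads page through MySQL and write fixed-size batches to Cassandra. A runnable Python sketch with in-memory stubs for both stores (all function and variable names here are illustrative, not the actual tool):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# In-memory stand-ins for the real stores; every name here is illustrative.
SOURCE = list(range(100_000))   # pretend MySQL table
TARGET = []                     # pretend Cassandra table
_lock = threading.Lock()

def fetch_page(offset, fetch_size):
    """Stub for a paged MySQL query (SELECT ... LIMIT offset, fetch_size)."""
    return SOURCE[offset:offset + fetch_size]

def insert_batch(rows):
    """Stub for a Cassandra batch insert."""
    with _lock:
        TARGET.extend(rows)

def migrate_page(offset, fetch_size, insert_size):
    rows = fetch_page(offset, fetch_size)
    for i in range(0, len(rows), insert_size):
        insert_batch(rows[i:i + insert_size])

def migrate(threads=5, fetch_size=5000, insert_size=1000):
    offsets = range(0, len(SOURCE), fetch_size)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Each worker handles one MySQL page and splits it into insert batches.
        for _ in pool.map(
                lambda off: migrate_page(off, fetch_size, insert_size), offsets):
            pass

migrate()
```

The default arguments mirror one of the test configurations above (5 threads, fetch 5000, insert 1000); swapping the stubs for real MySQL paging and driver batch inserts would give a working migrator along these lines.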

Summary

  1. We installed Cassandra on three servers with identical Cassandra configuration. Migration speed was 450,000/min to 600,000/min on server 86, 700,000/min to 800,000/min on server 120, and 800,000/min to 850,000/min on server 106.
  2. During the migration we tried tuning Cassandra's concurrent threads, read/write timeouts, JVM parameters, and so on, testing each change in turn. We found the server disks were under constant heavy IO throughout, and the final bottleneck was Cassandra flushing data from the memtable to SSTables on disk.
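If memtable flushing is the bottleneck, the relevant knobs live in cassandra.yaml. The option names below are real, but the values are illustrative starting points, not the configuration used in this test:

```yaml
# cassandra.yaml (excerpt) - illustrative values, tune for your hardware
memtable_flush_writers: 4          # parallel flush threads (raise on SSDs/JBOD)
memtable_heap_space_in_mb: 2048    # on-heap memtable space before flush pressure
memtable_cleanup_threshold: 0.2    # fraction of memtable space that triggers a flush
concurrent_writes: 64              # write request concurrency
```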