Evaluating data migration schemes for millions of rows

Time: 2020-11-13

Preface

Recently the company has been refactoring an old project with ABP, and as part of that the database is being switched from SQL Server to MySQL. To grumble about the old stack for a moment: Windows Server 2008, SQL Server 2008 R2, .NET Framework 4.5, and now we embrace .NET Core. Back to the point. The tables involved range from 10w+ to 100w+ rows (1w = 10,000), and we will test at those sizes below. Switching databases, together with changes to the table structure, makes data migration unavoidable. There are not many migration schemes to choose from; here are the two I tried and benchmarked.

Multithreaded bulk write

private static async Task BatchInsertTestUsers(List<TestUser> testUsers)
{
    if (testUsers.Count == 0) return;

    // Splice all rows into one multi-row INSERT statement.
    var sqlText = new StringBuilder();
    sqlText.Append("INSERT INTO users (Id, Name, Age) VALUES ");

    foreach (var testUser in testUsers)
    {
        sqlText.Append($"({testUser.Id},'{testUser.Name}',{testUser.Age}),");
    }

    // Drop the trailing comma left by the loop.
    sqlText.Length--;

    using (IDbConnection conn = new MySqlConnection(DataMigrationConfig.MySqlConstr))
    {
        // ExecuteAsync is Dapper's extension method on IDbConnection.
        await conn.ExecuteAsync(sqlText.ToString());
    }
}
  • BatchInsertTestUsers splices the input collection into a single multi-row INSERT statement and executes it.
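
One caveat with the string-spliced SQL: any Name containing a single quote breaks the statement, and untrusted input opens the door to SQL injection. A minimal sketch of a safer splice, assuming the MySqlHelper.EscapeString helper that both MySql.Data and MySqlConnector provide (ToValuesTuple is a made-up name, not from the original code):

// Hedged sketch: escape each Name before splicing it into the VALUES list.
private static string ToValuesTuple(TestUser u)
{
    return $"({u.Id},'{MySqlHelper.EscapeString(u.Name)}',{u.Age})";
}
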
public static async Task RunMultiTasks(List<TestUser> users)
{
    const int pageSize = 10000;
    var batchCount = (int)Math.Ceiling(users.Count / (double)pageSize);
    var tasks = new List<Task>();

    var sw = Stopwatch.StartNew();

    for (var i = 0; i < batchCount; i++)
    {
        var batchInsertList = users.Skip(i * pageSize).Take(pageSize).ToList();

        // Task.Run with an async delegate returns the unwrapped Task, so
        // WhenAll below waits for the inserts themselves to finish, not
        // merely for them to be started.
        tasks.Add(Task.Run(() => BatchInsertTestUsers(batchInsertList)));
    }

    await Task.WhenAll(tasks);
    sw.Stop();
    Console.WriteLine($"multi thread batch insert time: {sw.ElapsedMilliseconds} ms");
}
  • RunMultiTasks splits the data into batches of 1w (10,000) rows and inserts them concurrently.
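
One thing to watch with 10,000-row statements: the spliced SQL can exceed the server's max_allowed_packet limit and be rejected. A quick hedged check with Dapper (defaults range from about 4 MB to 64 MB depending on the MySQL version):

// Minimal sketch: read the server's packet limit before settling on a batch
// size. 10,000 short rows fit easily, but wide rows may not.
using IDbConnection conn = new MySqlConnection(DataMigrationConfig.MySqlConstr);
var maxAllowedPacket = conn.ExecuteScalar<long>("SELECT @@max_allowed_packet");
Console.WriteLine($"max_allowed_packet = {maxAllowedPacket} bytes");
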

MySqlBulkLoader scheme

I came to know MySqlBulkLoader through SQL Server's SqlBulkCopy. Unlike SqlBulkCopy, however, MySqlBulkLoader does not support importing a collection directly: you first export the data to a .csv file, then have the loader read that .csv back in.

public static async Task Export(string filePath, List<TestUser> items)
{
    // IExporter and CsvExporter come from Magicodes.IE.
    IExporter exporter = new CsvExporter();
    await exporter.Export(filePath, items);
}
  • The export here uses Magicodes.IE (dotnetcore/Magicodes.IE), an open-source library from the Chinese .NET community; the CsvExporter ships in its Magicodes.IE.Csv package. The export code should be self-explanatory: a simple operation!
public static void Load(string filePath, string tableName)
{
    using MySqlConnection conn = new MySqlConnection(DataMigrationConfig.MySqlConstr);
    var bulk = new MySqlBulkLoader(conn)
    {
        FieldTerminator = ",",
        FieldQuotationCharacter = '"',
        EscapeCharacter = '"',
        LineTerminator = "\r\n",
        FileName = filePath,
        Local = true,                // read the file on the client side
        NumberOfLinesToSkip = 1,     // skip the CSV header row
        TableName = tableName,
        CharacterSet = "utf8mb4",
    };

    bulk.Load();
}
  • Because the database does not live on my own machine, Local = true is set so the client reads the local file and streams it to the server.
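
Note that with Local = true, client-side LOAD DATA has to be enabled on both ends or the load will fail. A hedged sketch of the connection string, assuming the MySqlConnector driver (server address and credentials are placeholders, not values from this post):

// MySqlConnector refuses to send a local file unless the connection string
// sets AllowLoadLocalInfile=true; the server must also have local_infile=ON.
public static class DataMigrationConfig
{
    public const string MySqlConstr =
        "Server=192.168.0.10;Port=3306;Database=migration_test;" +
        "Uid=root;Pwd=your_password;AllowLoadLocalInfile=true";
}
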

Test notes

  • The test was run locally; the database lives in a Docker container on an intranet machine backed by a mechanical hard disk. With an SSD the numbers would be better.
  • The test mainly inserts simple user records, defined as follows:
public class TestUser
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public int Age { get; set; }
    }
  • Insert performance is measured at 1w (10,000), 10w, and 100w rows, with the index enabled and disabled.
  • The test execution code is as follows:
class Program
{
    static async Task Main(string[] args)
    {
        var testData = DataGen.Run(100 * 10000);
        await RunMultiTasks(testData);
        await RunMySqlLoaderTask(testData);
    }

    public static async Task RunMultiTasks(List<TestUser> users)
    {
        await DataMigrateTask.RunMultiTasks(users);
    }

    public static async Task RunMySqlLoaderTask(List<TestUser> users)
    {
        var fileName = "users";
        var filePath = Path.Combine(Directory.GetCurrentDirectory(), fileName + ".csv");
        await DataMigrateTask.Export(filePath, users);
        var sw = Stopwatch.StartNew();
        DataMigrateTask.Load(filePath, "users");
        sw.Stop();
        Console.WriteLine($"MySqlBulkLoader time: {sw.ElapsedMilliseconds} ms");
    }
}
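
Main calls DataGen.Run to fabricate the test rows, but its implementation is not shown in the post. A minimal sketch of what such a generator might look like (the field values are my own assumptions):

// Hypothetical stand-in for the DataGen helper referenced above: produces
// sequential TestUser rows with a patterned name and a random age.
public static class DataGen
{
    private static readonly Random Rand = new Random();

    public static List<TestUser> Run(int count)
    {
        var users = new List<TestUser>(count);
        for (var i = 1; i <= count; i++)
        {
            users.Add(new TestUser { Id = i, Name = $"user_{i}", Age = Rand.Next(18, 60) });
        }
        return users;
    }
}
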

Test results

Here comes the important part: the numbers.

Scheme                               1w        10w       100w
RunMultiTasks                        367 ms    3548 ms   91263 ms
RunMySqlLoaderTask                   2031 ms   1597 ms   13105 ms
RunMultiTasks (index disabled)       233 ms    3230 ms   67040 ms
RunMySqlLoaderTask (index disabled)  1785 ms   1367 ms   12456 ms
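
The "index disabled" rows were measured with the table's secondary index removed during the load. The post does not show how this was done; one plausible way, sketched here with a made-up index name idx_users_name:

// Hedged sketch: drop the secondary index, run the bulk insert, then rebuild.
// Rebuilding once at the end is usually cheaper than maintaining the index
// row by row during a large load.
private static async Task RunWithIndexDisabled(Func<Task> bulkInsert)
{
    using IDbConnection conn = new MySqlConnection(DataMigrationConfig.MySqlConstr);
    await conn.ExecuteAsync("ALTER TABLE users DROP INDEX idx_users_name");
    try
    {
        await bulkInsert();
    }
    finally
    {
        await conn.ExecuteAsync("CREATE INDEX idx_users_name ON users (Name)");
    }
}
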

Finally

The tests above are simple and for reference only. When the data volume is large, the MySqlBulkLoader approach has an obvious advantage; below roughly 1w (10,000) rows, multithreaded batch inserts work better. Interested readers can download the code and play with it themselves. If you have a better scheme, I'd be glad to hear it.

  • Code address: datamigrationtest

Pitfalls

  • MySqlBulkLoader imports null values written as the literal NULL, not as \N the way the MySQL documentation describes (sketched below).
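
A hedged illustration of this pit (ToCsvRow is a made-up helper, not code from the post): when building CSV rows by hand, emit the unquoted literal NULL for a missing value.

// A row for a user whose Name is null comes out as:  42,NULL,18
private static string ToCsvRow(TestUser u) =>
    u.Name is null
        ? $"{u.Id},NULL,{u.Age}"
        : $"{u.Id},\"{u.Name}\",{u.Age}";
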