A simple encapsulation of the SQL Server batch insert scheme SqlBulkCopy, making batch inserts more convenient

Time:2021-9-24

1、 Introduction to SQL Server insertion schemes

  • For SQL Server, there are three common batch insertion approaches: Insert, BatchInsert, and SqlBulkCopy. Let's compare the speed of these three schemes

1. Ordinary Insert method

public static void Insert(IEnumerable<Person> persons)
{
    using (var con = new SqlConnection("Server=.;Database=DemoDataBase;User ID=sa;Password=8888;"))
    {
        con.Open();
        foreach (var person in persons)
        {
            using (var com = new SqlCommand(
                "INSERT INTO dbo.Person(Id,Name,Age,CreateTime,Sex)VALUES(@Id,@Name,@Age,@CreateTime,@Sex)",
                con))
            {
                com.Parameters.AddRange(new[]
                {
                    new SqlParameter("@Id", SqlDbType.BigInt) {Value = person.Id},
                    new SqlParameter("@Name", SqlDbType.VarChar, 64) {Value = person.Name},
                    new SqlParameter("@Age", SqlDbType.Int) {Value = person.Age},
                    new SqlParameter("@CreateTime", SqlDbType.DateTime)
                        {Value = person.CreateTime ?? (object) DBNull.Value},
                    new SqlParameter("@Sex", SqlDbType.Int) {Value = (int)person.Sex},
                });
                com.ExecuteNonQuery();
            }
        }
    }
}

2. BatchInsert: splicing a multi-row INSERT statement

public static void BatchInsert(Person[] persons)
{
    using (var con = new SqlConnection("Server=.;Database=DemoDataBase;User ID=sa;Password=8888;"))
    {
        con.Open();
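        // SQL Server allows at most 1000 row value expressions in a single INSERT ... VALUES statement, hence the paging by 1000 below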
        var pageCount = (persons.Length - 1) / 1000 + 1;
        for (int i = 0; i < pageCount; i++)
        {
            var personList = persons.Skip(i * 1000).Take(1000).ToArray();
            var values = personList.Select(p =>
                $"({p.Id},'{p.Name}',{p.Age},{(p.CreateTime.HasValue ? $"'{p.CreateTime:yyyy-MM-dd HH:mm:ss}'" : "NULL")},{(int) p.Sex})");
            var insertSql =
                $"INSERT INTO dbo.Person(Id,Name,Age,CreateTime,Sex)VALUES{string.Join(",", values)}";
            using (var com = new SqlCommand(insertSql, con))
            {
                com.ExecuteNonQuery();
            }
        }
    }
}

3. SqlBulkCopy insertion scheme

public static void BulkCopy(IEnumerable<Person> persons)
{
    using (var con = new SqlConnection("Server=.;Database=DemoDataBase;User ID=sa;Password=8888;"))
    {
        con.Open();
        var table = new DataTable();
        table.Columns.AddRange(new []
        {
            new DataColumn("Id", typeof(long)), 
            new DataColumn("Name", typeof(string)), 
            new DataColumn("Age", typeof(int)), 
            new DataColumn("CreateTime", typeof(DateTime)), 
            new DataColumn("Sex", typeof(int)), 
        });
        foreach (var p in persons)
        {
            table.Rows.Add(new object[] {p.Id, p.Name, p.Age, p.CreateTime, (int) p.Sex});
        }

        using (var copy = new SqlBulkCopy(con))
        {
            copy.DestinationTableName = "Person";
            copy.WriteToServer(table);
        }
    }
}

4. Speed comparison of the three schemes

Scheme        Rows      Time
Insert        1000      145.4351 ms
BatchInsert   1000      103.9061 ms
SqlBulkCopy   1000      7.021 ms
Insert        10000     1501.326 ms
BatchInsert   10000     850.6274 ms
SqlBulkCopy   10000     30.5129 ms
Insert        100000    13875.4934 ms
BatchInsert   100000    8278.9056 ms
SqlBulkCopy   100000    314.8402 ms
  • Comparing insertion efficiency, Insert is obviously far slower than SqlBulkCopy, with roughly a 20- to 40-fold performance gap. Next, we'll wrap SqlBulkCopy to make batch insertion more convenient

2、 SqlBulkCopy encapsulation code

1. Method introduction

Batch insert extension method signature

BulkCopy — synchronous batch insert method. Parameters:

  • SqlConnection connection — SQL Server connection object
  • IEnumerable<TModel> source — data source to insert in batch
  • string tableName = null — destination table name [null defaults to the entity type name]
  • int bulkCopyTimeout = 30 — batch insert timeout in seconds
  • int batchSize = 0 — number of rows per batch written to the database [0 means insert everything in one batch]. The most appropriate number depends on your environment, especially the row count and network latency. Personally, I start with the BatchSize property set to 1000 rows and see how it performs; if that works, I keep doubling it (to 2000, 4000, and so on) until performance degrades or a timeout occurs. Conversely, if a timeout occurs at 1000, I halve it (to 500, for example) until it works.
  • SqlBulkCopyOptions options = SqlBulkCopyOptions.Default — bulk copy options
  • SqlTransaction externalTransaction = null — transaction to execute within

BulkCopyAsync — asynchronous batch insert method; takes the same parameters as BulkCopy.
  • This method mainly solves two problems:
    • It eliminates hand-building a DataTable or an IDataReader implementation class. Hand-written conversions are hard to maintain: if you modify a field, you have to update all of those places, and in particular you must handle enum types by converting them to their underlying type (int by default)
    • You no longer create the SqlBulkCopy object yourself, configure the mapping of database columns, or set the various properties
  • This scheme is also used in our company to meet its demand for batch data insertion, such as third-party reconciliation data
  • This method uses Expression to dynamically generate the data conversion function, whose efficiency is close to hand-written native code; the extra conversion cost is very small [the biggest performance loss is the boxing and unboxing of value types]
  • The difference between this scheme and other schemes found online: it does not first convert the List to a DataTable and then hand that to SqlBulkCopy. Instead, an EnumerableReader implementing IDataReader wraps the List, so every time SqlBulkCopy inserts a row, one row of data is converted (a streaming sketch follows this list)
  • Advantages of the IDataReader scheme over the DataTable scheme:
    • Efficient: the DataTable scheme must finish the whole conversion before handing the data to SqlBulkCopy to write to the database, whereas the IDataReader scheme converts while SqlBulkCopy writes (for example, inserting 100,000 rows can be about 30% faster)
    • Less memory: the DataTable scheme materializes everything before the write and needs a lot of memory, while the IDataReader scheme converts as it writes and doesn't occupy much memory
    • Powerful: because conversion happens on the write side, and EnumerableReader is given an iterator, you can achieve the effect of continuously inserting data
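
As a concrete illustration of the streaming point above, here is a minimal sketch that feeds a lazy iterator straight into the BulkCopy extension defined later in this article (the ReadPersons generator is a hypothetical example source, not part of the encapsulation):

//Hypothetical lazy source: rows are produced on demand, never buffered as a whole
static IEnumerable<Person> ReadPersons(int count)
{
    var random = new Random();
    for (int i = 0; i < count; i++)
    {
        yield return new Person
        {
            Id = i + 1,
            Name = "Person" + i,
            Age = random.Next(1, 128),
            Sex = (Gender)random.Next(2)
        };
    }
}

using (var conn = new SqlConnection("Server=.;Database=DemoDataBase;User ID=sa;Password=8888;"))
{
    conn.Open();
    //Each row is converted only when SqlBulkCopy pulls it from the reader
    conn.BulkCopy(ReadPersons(1000000));
}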

2. Implementation principle

① Entity model and table mapping

  • Database table code
CREATE TABLE [dbo].[Person](
	[Id] [BIGINT] NOT NULL,
	[Name] [VARCHAR](64) NOT NULL,
	[Age] [INT] NOT NULL,
	[CreateTime] [DATETIME] NULL,
	[Sex] [INT] NOT NULL,
PRIMARY KEY CLUSTERED 
(
	[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
  • Entity class code
public class Person
{
    public long Id { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
    public DateTime? CreateTime { get; set; }
    public Gender Sex { get; set; }
}

public enum Gender
{
    Man = 0,
    Woman = 1
}
  • Create field mappings [without field mappings, data may be filled into the wrong columns, and a type mismatch will raise an error] [reason: without field mappings, columns are matched by ordinal position by default]
  • Mappings are created through SqlBulkCopy's ColumnMappings property, which maps the data columns to the columns in the database table
//Create batch insert objects
using (var copy = new SqlBulkCopy(connection, options, externalTransaction))
{
    foreach (var column in ModelToDataTable<TModel>.Columns)
    {
        //Create field mapping
        copy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
    }
}

② Convert entity to data row

  • Converting entities into data rows is done with reflection + Expression:
    • Reflection is used to obtain the type and property metadata that building the Expression requires
    • Expression is used to generate an efficient conversion function
  • The ModelToDataTable<TModel> type exploits the characteristics of static generic classes to cache results per generic argument
  • In the static constructor of ModelToDataTable<TModel>, the conversion function is generated and the property information to be converted is collected, both stored in static read-only fields to complete the cache (a minimal sketch of this caching pattern follows)
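
To make the caching idea concrete, here is a minimal standalone sketch of the static-generic-class pattern (names are illustrative, not the real ModelToDataTable<TModel> shown later; assumes System.Linq and System.Reflection are imported):

//The CLR runs the static constructor once per closed generic type, so
//PropertyCache<Person> and PropertyCache<Order> each compute and cache
//their own property array exactly once
public static class PropertyCache<T>
{
    public static readonly PropertyInfo[] ReadableProperties;

    static PropertyCache()
    {
        //Executed once per T, on first use of PropertyCache<T>
        ReadableProperties = typeof(T).GetProperties().Where(p => p.CanRead).ToArray();
    }
}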

③ The IDataReader overload for inserting data

  • EnumerableReader<TModel> implements the IDataReader interface; it reads model objects out of the iterator and converts them into data rows for SqlBulkCopy to read
  • SqlBulkCopy only ever calls three of its members: GetOrdinal, Read, and GetValue
    • GetOrdinal is called only for the first row, to look up the ordinal that each column name maps to [prerequisite: SqlBulkCopy's ColumnMappings property has been filled in]
    • The Read method advances the iterator to the next element and calls ModelToDataTable<TModel>.ToRowData.Invoke() to convert the model object into an object[] data row
    • The GetValue method returns the value at the specified index of the current row (a conceptual sketch of this call pattern follows)
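
Put together, the interaction looks roughly like this (a conceptual sketch of how SqlBulkCopy drives the reader, not its real internals):

//Once, per mapped column: resolve the column name to an ordinal
var ageOrdinal = reader.GetOrdinal("Age");
//Then, per row: advance the iterator, converting one model to an object[]...
while (reader.Read())
{
    //...and fetch each mapped column's value by ordinal
    var age = reader.GetValue(ageOrdinal);
}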

3. Complete code

Extension method class

public static class SqlConnectionExtension
{
    /// <summary>
    /// Bulk copy
    /// </summary>
    /// <typeparam name="TModel">Type of the inserted model objects</typeparam>
    /// <param name="connection">Database connection object</param>
    /// <param name="source">Data source to insert in batch</param>
    /// <param name="tableName">Destination table name [null defaults to the entity type name]</param>
    /// <param name="bulkCopyTimeout">Insert timeout in seconds</param>
    /// <param name="batchSize">Number of rows per batch written to the database [0 means insert everything in one batch]. The most appropriate number depends on your environment, especially the row count and network latency. Personally, I start with the BatchSize property set to 1000 rows and see how it performs; if that works, I keep doubling it (to 2000, 4000, and so on) until performance degrades or a timeout occurs. Conversely, if a timeout occurs at 1000, I halve it (to 500, for example) until it works.</param>
    /// <param name="options">Bulk copy options</param>
    /// <param name="externalTransaction">Transaction to execute within</param>
    /// <returns>Number of rows inserted</returns>
    public static int BulkCopy<TModel>(this SqlConnection connection,
        IEnumerable<TModel> source,
        string tableName = null,
        int bulkCopyTimeout = 30,
        int batchSize = 0,
        SqlBulkCopyOptions options = SqlBulkCopyOptions.Default,
        SqlTransaction externalTransaction = null)
    {
        //Create the reader
        using (var reader = new EnumerableReader<TModel>(source))
        {
            //Create the batch insert object
            using (var copy = new SqlBulkCopy(connection, options, externalTransaction))
            {
                //Destination table
                copy.DestinationTableName = tableName ?? typeof(TModel).Name;
                //Number of rows per batch written to the database
                copy.BatchSize = batchSize;
                //Timeout
                copy.BulkCopyTimeout = bulkCopyTimeout;
                //Create field mappings [without them, data may land in the wrong columns and a type mismatch raises an error, because columns are matched by ordinal by default]
                foreach (var column in ModelToDataTable<TModel>.Columns)
                {
                    //Create the field mapping
                    copy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
                }
                //Write the data to the database in batches
                copy.WriteToServer(reader);
                //Return the number of rows inserted
                return reader.Depth;
            }
        }
    }

    /// <summary>
    /// Bulk copy - asynchronous
    /// </summary>
    /// <typeparam name="TModel">Type of the inserted model objects</typeparam>
    /// <param name="connection">Database connection object</param>
    /// <param name="source">Data source to insert in batch</param>
    /// <param name="tableName">Destination table name [null defaults to the entity type name]</param>
    /// <param name="bulkCopyTimeout">Insert timeout in seconds</param>
    /// <param name="batchSize">Number of rows per batch written to the database [0 means insert everything in one batch]; see BulkCopy above for tuning advice</param>
    /// <param name="options">Bulk copy options</param>
    /// <param name="externalTransaction">Transaction to execute within</param>
    /// <returns>Number of rows inserted</returns>
    public static async Task<int> BulkCopyAsync<TModel>(this SqlConnection connection,
        IEnumerable<TModel> source,
        string tableName = null,
        int bulkCopyTimeout = 30,
        int batchSize = 0,
        SqlBulkCopyOptions options = SqlBulkCopyOptions.Default,
        SqlTransaction externalTransaction = null)
    {
        //Create the reader
        using (var reader = new EnumerableReader<TModel>(source))
        {
            //Create the batch insert object
            using (var copy = new SqlBulkCopy(connection, options, externalTransaction))
            {
                //Destination table
                copy.DestinationTableName = tableName ?? typeof(TModel).Name;
                //Number of rows per batch written to the database
                copy.BatchSize = batchSize;
                //Timeout
                copy.BulkCopyTimeout = bulkCopyTimeout;
                //Create field mappings [without them, data may land in the wrong columns and a type mismatch raises an error, because columns are matched by ordinal by default]
                foreach (var column in ModelToDataTable<TModel>.Columns)
                {
                    //Create the field mapping
                    copy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
                }
                //Write the data to the database in batches
                await copy.WriteToServerAsync(reader);
                //Return the number of rows inserted
                return reader.Depth;
            }
        }
    }
}
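
As a usage sketch of the asynchronous overload (inside an async method, assuming the demo connection string and the persons array from the test section; committing or rolling back an external transaction remains the caller's responsibility):

using (var conn = new SqlConnection("Server=.;Database=DemoDataBase;User ID=sa;Password=8888;"))
{
    conn.Open();
    using (var tran = conn.BeginTransaction())
    {
        try
        {
            //The bulk insert participates in the caller's transaction
            var qty = await conn.BulkCopyAsync(persons, externalTransaction: tran);
            tran.Commit();
            Console.WriteLine(qty + " rows inserted");
        }
        catch
        {
            tran.Rollback();
            throw;
        }
    }
}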

Encapsulated iterator data reader

    /// <summary>
    /// Iterator data reader
    /// </summary>
    /// <typeparam name="TModel">Model type</typeparam>
    public class EnumerableReader<TModel> : IDataReader
    {
        /// <summary>
        /// Instantiate a reader over an iterator
        /// </summary>
        /// <param name="source">Model source</param>
        public EnumerableReader(IEnumerable<TModel> source)
        {
            _source = source ?? throw new ArgumentNullException(nameof(source));
            _enumerable = source.GetEnumerator();
        }

        private readonly IEnumerable<TModel> _source;
        private readonly IEnumerator<TModel> _enumerable;
        private object[] _currentDataRow = Array.Empty<object>();
        private int _depth;
        private bool _release;

        public void Dispose()
        {
            _release = true;
            _enumerable.Dispose();
        }

        public int GetValues(object[] values)
        {
            if (values == null) throw new ArgumentNullException(nameof(values));
            var length = Math.Min(_currentDataRow.Length, values.Length);
            Array.Copy(_currentDataRow, values, length);
            return length;
        }

        public int GetOrdinal(string name)
        {
            for (int i = 0; i < ModelToDataTable<TModel>.Columns.Count; i++)
            {
                if (ModelToDataTable<TModel>.Columns[i].ColumnName == name) return i;
            }

            return -1;
        }

        public long GetBytes(int ordinal, long dataIndex, byte[] buffer, int bufferIndex, int length)
        {
            if (dataIndex < 0) throw new Exception("The starting index cannot be less than 0!");
            if (bufferIndex < 0) throw new Exception("The starting index of the target buffer cannot be less than 0!");
            if (length < 0) throw new Exception("The read length cannot be less than 0!");
            var numArray = (byte[])GetValue(ordinal);
            if (buffer == null) return numArray.Length;
            if (buffer.Length <= bufferIndex) throw new Exception("The starting index of the target buffer cannot exceed the buffer's range!");
            var freeLength = Math.Min(numArray.Length - bufferIndex, length);
            if (freeLength <= 0) return 0;
            //Copy only as many bytes as actually fit
            Array.Copy(numArray, dataIndex, buffer, bufferIndex, freeLength);
            return freeLength;
        }

        public long GetChars(int ordinal, long dataIndex, char[] buffer, int bufferIndex, int length)
        {
            if (dataIndex < 0) throw new Exception("The starting index cannot be less than 0!");
            if (bufferIndex < 0) throw new Exception("The starting index of the target buffer cannot be less than 0!");
            if (length < 0) throw new Exception("The read length cannot be less than 0!");
            var numArray = (char[])GetValue(ordinal);
            if (buffer == null) return numArray.Length;
            if (buffer.Length <= bufferIndex) throw new Exception("The starting index of the target buffer cannot exceed the buffer's range!");
            var freeLength = Math.Min(numArray.Length - bufferIndex, length);
            if (freeLength <= 0) return 0;
            //Copy only as many characters as actually fit
            Array.Copy(numArray, dataIndex, buffer, bufferIndex, freeLength);
            return freeLength;
        }

        public bool IsDBNull(int i)
        {
            var value = GetValue(i);
            return value == null || value is DBNull;
        }
        public bool NextResult()
        {
            //Move to next element
            if (!_enumerable.MoveNext()) return false;
            //Row count + 1
            Interlocked.Increment(ref _depth);
            //Get the data row
            _currentDataRow = ModelToDataTable<TModel>.ToRowData.Invoke(_enumerable.Current);
            return true;
        }

        public byte GetByte(int i) => (byte)GetValue(i);
        public string GetName(int i) => ModelToDataTable<TModel>.Columns[i].ColumnName;
        public string GetDataTypeName(int i) => ModelToDataTable<TModel>.Columns[i].DataType.Name;
        public Type GetFieldType(int i) => ModelToDataTable<TModel>.Columns[i].DataType;
        public object GetValue(int i) => _currentDataRow[i];
        public bool GetBoolean(int i) => (bool)GetValue(i);
        public char GetChar(int i) => (char)GetValue(i);
        public Guid GetGuid(int i) => (Guid)GetValue(i);
        public short GetInt16(int i) => (short)GetValue(i);
        public int GetInt32(int i) => (int)GetValue(i);
        public long GetInt64(int i) => (long)GetValue(i);
        public float GetFloat(int i) => (float)GetValue(i);
        public double GetDouble(int i) => (double)GetValue(i);
        public string GetString(int i) => (string)GetValue(i);
        public decimal GetDecimal(int i) => (decimal)GetValue(i);
        public DateTime GetDateTime(int i) => (DateTime)GetValue(i);
        public IDataReader GetData(int i) => throw new NotSupportedException();
        public int FieldCount => ModelToDataTable<TModel>.Columns.Count;
        public object this[int i] => GetValue(i);
        public object this[string name] => GetValue(GetOrdinal(name));
        public void Close() => Dispose();
        public DataTable GetSchemaTable() => ModelToDataTable<TModel>.ToDataTable(_source);
        public bool Read() => NextResult();
        public int Depth => _depth;
        public bool IsClosed => _release;
        public int RecordsAffected => 0;
    }

Model-object-to-data-row utility class

    /// <summary>
    /// Object-to-DataTable class
    /// </summary>
    /// <typeparam name="TModel">Generic type</typeparam>
    public static class ModelToDataTable<TModel>
    {
        static ModelToDataTable()
        {
            //If you need to exclude some columns, modify this code
            var propertyList = typeof(TModel).GetProperties().Where(w => w.CanRead).ToArray();
            Columns = new ReadOnlyCollection<DataColumn>(propertyList
                .Select(pr => new DataColumn(pr.Name, GetDataType(pr.PropertyType))).ToArray());
            //Generate the object-to-data-row delegate
            ToRowData = BuildToRowDataDelegation(typeof(TModel), propertyList);
        }

        /// <summary>
        /// Build the delegate that converts a model into a data row
        /// </summary>
        /// <param name="type">Incoming type</param>
        /// <param name="propertyList">Properties to convert</param>
        /// <returns>Data-row conversion delegate</returns>
        private static Func<TModel, object[]> BuildToRowDataDelegation(Type type, PropertyInfo[] propertyList)
        {
            var source = Expression.Parameter(type);
            var items = propertyList.Select(property => ConvertBindPropertyToData(source, property));
            var array = Expression.NewArrayInit(typeof(object), items);
            var lambda = Expression.Lambda<Func<TModel, object[]>>(array, source);
            return lambda.Compile();
        }

        /// <summary>
        /// Convert a property to a data expression
        /// </summary>
        /// <param name="source">Source variable</param>
        /// <param name="property">Property information</param>
        /// <returns>Expression that reads the property value</returns>
        private static Expression ConvertBindPropertyToData(ParameterExpression source, PropertyInfo property)
        {
            var propertyType = property.PropertyType;
            var expression = (Expression)Expression.Property(source, property);
            if (propertyType.IsEnum)
                expression = Expression.Convert(expression, propertyType.GetEnumUnderlyingType());
            return Expression.Convert(expression, typeof(object));
        }

        /// <summary>
        /// Get the data type
        /// </summary>
        /// <param name="type">Property type</param>
        /// <returns>Data type</returns>
        private static Type GetDataType(Type type)
        {
            //Enumeration is converted to the corresponding value type by default
            if (type.IsEnum)
                return type.GetEnumUnderlyingType();
            //Nullable type
            if (type.IsGenericType && type.GetGenericTypeDefinition() == typeof(Nullable<>))
                return GetDataType(type.GetGenericArguments().First());
            return type;
        }

        /// <summary>
        /// Column collection
        /// </summary>
        public static IReadOnlyList<DataColumn> Columns { get; }

        /// <summary>
        /// Object-to-data-row delegate
        /// </summary>
        public static Func<TModel, object[]> ToRowData { get; }

        /// <summary>
        /// Convert a collection to a DataTable
        /// </summary>
        /// <param name="source">Collection</param>
        /// <param name="tableName">Table name</param>
        /// <returns>The converted DataTable</returns>
        public static DataTable ToDataTable(IEnumerable<TModel> source, string tableName = "TempTable")
        {
            //Create table object
            var table = new DataTable(tableName);
            //Set column
            foreach (var dataColumn in Columns)
            {
                table.Columns.Add(new DataColumn(dataColumn.ColumnName, dataColumn.DataType));
            }

            //Cycle through each row of data
            foreach (var item in source)
            {
                table.Rows.Add(ToRowData.Invoke(item));
            }

            //Return table object
            return table;
        }
    }
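
Although the reader-based path avoids building a DataTable at all, the ToDataTable helper is still usable on its own (it also backs GetSchemaTable above); a quick sketch, assuming the persons array from the test section below:

//Convert any model collection into a DataTable with matching columns
var table = ModelToDataTable<Person>.ToDataTable(persons, "Person");
Console.WriteLine(table.Rows.Count + " rows, " + table.Columns.Count + " columns");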

3、 Testing the encapsulated code

1. Test code

Table creation code

CREATE TABLE [dbo].[Person](
	[Id] [BIGINT] NOT NULL,
	[Name] [VARCHAR](64) NOT NULL,
	[Age] [INT] NOT NULL,
	[CreateTime] [DATETIME] NULL,
	[Sex] [INT] NOT NULL,
PRIMARY KEY CLUSTERED 
(
	[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

Entity class code

  • The property names and types of the entity must correspond to the SQL Server column names and types
public class Person
{
    public long Id { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
    public DateTime? CreateTime { get; set; }
    public Gender Sex { get; set; }
}

public enum Gender
{
    Man = 0,
    Woman = 1
}

Test method

//Generate 100000 pieces of data
var persons = new Person[100000];
var random = new Random();
for (int i = 0; i < persons.Length; i++)
{
    persons[i] = new Person
    {
        Id = i + 1,
        Name = "Zhang San" + I,
        Age = random.Next(1, 128),
        Sex = (Gender)random.Next(2),
        CreateTime = random.Next(2) == 0 ? null : (DateTime?) DateTime.Now.AddSeconds(i)
    };
}

//Create database connection
using (var conn = new SqlConnection("Server=.;Database=DemoDataBase;User ID=sa;Password=8888;"))
{
    conn.Open();
    var sw = Stopwatch.StartNew();
    //Batch insert data
    var qty = conn.BulkCopy(persons);
    sw.Stop();
    Console.WriteLine(sw.Elapsed.TotalMilliseconds + "ms");
}

Batch insert execution result

226.4767ms
Please press any key to continue


4、 Code download

GitHub code address: https://github.com/liu-zhen-liang/PackagingComponentsSet/tree/main/SqlBulkCopyComponents
