Compare the performance of execute and execute any in pymysql, and how does it compare with native SQL

Time:2020-10-21

When inserting a large amount of data into MySQL today, I suddenly think of pymysql and executemany method. Which method is faster? How fast?

testing environment

  • python3
  • mysql
  • pymysql

The old rule, test code first:

class IN_sql():
    def __init__(self):
        Print ('initializing... ')
        self.connDB = pymysql.connect(host=DB_HOST,
                                      user=DB_USER,
                                      passwd=DB_PASSWD,
                                      db=DB_DB,
                                      charset='utf8mb4')
        self.connDB.ping()
        self.cu=self.connDB.cursor()

    def workOn(self):
        L=[]
        with open("./1W.txt","r",encoding="utf-8")as f:
            while True:
                res=f.readline().replace("\r","").replace("\n","")
                if not res:
                    break
                L.append(res)
        #Count the time taken to insert
        start = time.time()
        sql = "insert ignore into article(an) values(%s);"
        
        #Execute execute code
        for l in L:
            self.cu.execute(sql, l)
            
        #Executemany execute code
        #self.cu.executemany(sql, tuple(L))
        
        self.connDB.commit()
        Print ('total time (seconds): '+ str (round)( time.time () - start, 2)))

        self.cu.close()
        self.connDB.close()


if __name__ == '__main__':
    insert=IN_sql()
    insert.workOn()

test result

Method order time (s) 1W 10W 100W
execute 3.126 25.139 248.022
executemany 0.1 0.981 10.854

summary

You can see the obvious difference. If you need to insert the database in batch, it is better to use the executemany method, which is not an order of magnitude with execute!!!

==Recently, a friend asked, “there are many differences between execute any and native SQL statements?”?==
Here I have carried on the test again:

def workOn(self):
        L=[]
        with open("./100W.txt","r",encoding="utf-8")as f:
            while True:
                res=f.readline().replace("\r","").replace("\n","")
                if not res:
                    break
                L.append(res)
        #Count the time taken to insert
        start = time.time()
        print("start time:" + time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(start)))
        sql = "insert into article(an) values  "
        for l in L:
            sql += "('%s'), " % l
        sql = sql.rstrip(', ') + ';'
        self.cu.execute(sql)
        self.connDB.commit()
        end=time.time()
        print("end time:"+time.strftime("%Y-%m-%d %H:%M:%S",time.localtime(end)))
        Print ('total time (seconds): '+ str (round (end - start, 3)))

The final result: 20.176s

WHAT?

==Why is executing native SQL slower than executing any? In theory, native SQL should be the fastest. = =

According to the source code query, executemany actually combines all parameters into a SQL statement for execution (= = insert into article (an) values(), (), (), (), (), (), () =). The optimization is mainly in string splicing.

Finally, the test results show that: the string splicing of my own code took about 12s, writing SQL statements took about 8s, while the execution any string splicing took only about 2S.

Therefore, when you encounter a large number of data writing in the future, try to use the executemany method!

Recommended Today

CentOS install Chinese Fonts

preface When we encounter some Chinese display garbled problems, it may be because the server does not install Chinese fonts install 1. View system fontsBefore starting the installation, we first check the installed fonts in the system. To view the installed fonts in the system, we can use thefc-listIf there is no command in the […]