Python operating a PostgreSQL database: modifying 5W (50,000) rows in parallel with threads, and performance optimization

Date: 2022-06-17

Read all the data from the new XLS sheet, organize it, and return it as a list

In fact, not much code needs to change, but we should keep the functions and methods we wrote before so we can continue to use them.

excel2sql.py:

When we create the OpExcel object, we add default values for the file URL and the sheet name:

class OpExcel(object):

When we pass no arguments, the file we open is the original file.
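As a minimal sketch of what this might look like (the default path, sheet name, and method name below are assumptions, not the author's exact code):

```python
class OpExcel(object):
    """Open an .xls workbook and return its rows as a list.

    The default URL and sheet name are illustrative placeholders;
    with no arguments, the original file is opened.
    """

    def __init__(self,
                 url="./data/test.xls",   # assumed default file path
                 sheet_name="Sheet1"):    # assumed default sheet name
        self.url = url
        self.sheet_name = sheet_name

    def read_all(self):
        """Return every row of the sheet as a list of lists."""
        # xlrd handles the legacy .xls format; imported lazily so the
        # class can be constructed even if the package is absent.
        import xlrd
        book = xlrd.open_workbook(self.url)
        sheet = book.sheet_by_name(self.sheet_name)
        return [sheet.row_values(i) for i in range(sheet.nrows)]
```

Constructing it with explicit arguments then mirrors the call shown below in the text.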

When we open the new XLS file containing the region names and all their longitudes and latitudes, the call looks like this:

test = OpExcel(config.src_path + "\\data\\test.xls", "list of Chinese provinces, cities, and counties")

Then we can query the new file and its data.

The function `init_SampleViaProvince_Name` we wrote earlier returns the first-level locations and place names (as a list) of the province passed in as a parameter.

We modify it so that, when no argument is passed, it returns all the data by default:

# Query the longitude and latitude of each region name through the Gaode map API and return them as a list
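A sketch of how such a lookup could work against the Gaode (AMap) geocoding web service. The key, function names, and response handling here are assumptions for illustration; a real API key is required, and only the URL building and JSON parsing are shown as testable pieces:

```python
import json
import urllib.parse
import urllib.request

AMAP_KEY = "your-gaode-api-key"  # placeholder; substitute a real key


def build_geocode_url(address, key=AMAP_KEY):
    """Build the Gaode geocoding request URL for one region name."""
    params = urllib.parse.urlencode({"address": address, "key": key})
    return "https://restapi.amap.com/v3/geocode/geo?" + params


def parse_location(response_text):
    """Extract a (lng, lat) tuple from a Gaode geocode JSON reply."""
    data = json.loads(response_text)
    if data.get("status") != "1" or not data.get("geocodes"):
        return None
    lng, lat = data["geocodes"][0]["location"].split(",")
    return float(lng), float(lat)


def query_coordinates(address):
    """Fetch coordinates for one region name (needs network and a key)."""
    with urllib.request.urlopen(build_geocode_url(address)) as resp:
        return parse_location(resp.read().decode("utf-8"))
```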

Test:

f:/workspace/env/Scripts/python.exe "f:/workspace/City distance crawl/process_data/excel2sql.py"

success.

With the new function defined, we can use the previous method in the main function to load all the data into the database.

Optimizing the code and writing a more convenient logging system

In the config.py file:

# Configure the log format; afterwards, calling config.logger.info() is enough

Then, wherever we need a log line, we simply log through this logger.
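A minimal sketch of such a config.py logger, assuming the standard `logging` module; the logger name and format string are my choices, not the author's:

```python
import logging
import sys


def build_logger(name="city_distance"):
    """Construct a module-level logger with a readable format, so other
    modules can simply call config.logger.info(...)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on re-import
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s [%(levelname)s] %(filename)s:%(lineno)d %(message)s"))
        logger.addHandler(handler)
    return logger


logger = build_logger()
```

Guarding on `logger.handlers` matters because config.py may be imported by several modules; without it, every import would attach another handler and each message would print multiple times.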

For example, the code that initializes the database connection:

opsql.py

#!/usr/bin/python

We modify our previous database-operation class to add a function for emptying a table and a query function:

# Define a table-emptying operation

And the code that closes the connection after the call completes:

# Close the database connection
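A sketch of these two operations, assuming psycopg2 and placeholder table names and credentials (none of which come from the original code). Table names cannot be passed as SQL parameters, so the sketch whitelists them instead:

```python
def truncate_table(conn, table_name):
    """Empty one table. The identifier is checked against a fixed
    whitelist because it cannot be sent as a bound SQL parameter."""
    allowed = {"city_sample", "city_distance"}  # assumed table names
    if table_name not in allowed:
        raise ValueError("unexpected table: %r" % table_name)
    with conn.cursor() as cur:
        cur.execute("TRUNCATE TABLE " + table_name)
    conn.commit()


def close_db(conn):
    """Close the connection once all operations are done."""
    conn.close()


def connect():
    """Open the PostgreSQL connection (credentials are placeholders)."""
    import psycopg2  # imported lazily; requires the psycopg2 package
    return psycopg2.connect(host="localhost", port=5432,
                            dbname="citydb", user="postgres",
                            password="postgres")
```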

Test the emptying operation:

if __name__ == "__main__":


(env) PS F:\workspace> & f:/workspace/env/Scripts/python.exe "f:/workspace/City distance crawl/op_postgresql/opsql.py"

ok!nice

Using the same thread pool to operate the database: modify the previous code and optimize it

Again using thread-pool concurrency:

# Define the insert operation for the city sample table
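A sketch of the thread-pool pattern, with the actual row-writing callable injected so the example stays runnable without a database (the function names and worker count are assumptions):

```python
from concurrent.futures import ThreadPoolExecutor


def insert_city_samples(rows, insert_one, max_workers=8):
    """Insert rows concurrently. `insert_one` is whatever callable
    writes a single row (in the real project, presumably a psycopg2
    INSERT); injecting it keeps this sketch testable offline."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator so we wait for every insert,
        # and so any worker exception is re-raised here.
        list(pool.map(insert_one, rows))
```

Note that psycopg2 cursors should not be shared across threads; in a real run each worker would need its own connection or a connection pool.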

OK, let's write a test function in main.py and try it with the test data:

@config.logging_time
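The `logging_time` decorator presumably times the wrapped call and reports through the logger; a minimal sketch of such a decorator (the output format is my assumption):

```python
import functools
import time


def logging_time(func):
    """Report how long the wrapped function took to run."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # The real project would route this through config.logger.info().
        print("%s finished in %.3fs" % (func.__name__, elapsed))
        return result
    return wrapper
```

`functools.wraps` keeps the original function's name and docstring, which matters when several decorated functions share one log.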

It runs successfully:

(env) PS F:\workspace> & f:/workspace/env/Scripts/python.exe "f:/workspace/City distance crawl/main.py"

That was almost suspiciously fast. Let's check whether the data was really saved.


nice!

Retrieve the location data from the database, obtain distances and paths pairwise (1-1) through the Gaode API, and process a small amount of data first (level-0 cities)

Because the pairs we need to compare are:

- level-0 to level-0 distances
- level-1 to level-1 distances

Then we classify the two kinds of location pairs:

We need to generate 1-to-1 pairs from the list of cities. By the handshake principle (counting the edges of a complete undirected graph), I enumerate the node pairs with a recursive algorithm:
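One way to sketch that recursion (the function name is my own, not the author's):

```python
def all_pairs(cities):
    """Recursively enumerate every unordered 1-1 pair of cities.

    By the handshake principle, a complete undirected graph on n nodes
    has n*(n-1)/2 edges, so that is exactly how many pairs come back.
    """
    if len(cities) < 2:
        return []
    head, rest = cities[0], cities[1:]
    # pair the head with everyone after it, then recurse on the rest
    return [(head, other) for other in rest] + all_pairs(rest)
```

For very long city lists (the 2,800-city case later in the article) this recursion would exceed Python's default recursion limit; the iterative `itertools.combinations(cities, 2)` produces the same pairs without that risk.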

def get_disViaCoordinates(self,
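A sketch of what the request-building and response-parsing inside such a function might look like, assuming the Gaode distance-measurement web service; the parameter layout and response fields shown are my assumptions, and only the offline-testable parts are included:

```python
import json
import urllib.parse


def build_distance_url(origin, dest, key="your-gaode-api-key"):
    """Build a Gaode distance request for one coordinate pair.

    `origin` and `dest` are (lng, lat) tuples; type=1 asks for
    driving distance rather than the straight-line value.
    """
    params = urllib.parse.urlencode({
        "origins": "%s,%s" % origin,
        "destination": "%s,%s" % dest,
        "type": "1",
        "key": key,
    })
    return "https://restapi.amap.com/v3/distance?" + params


def parse_distance(response_text):
    """Pull (distance_in_meters, duration_in_seconds) out of the reply."""
    data = json.loads(response_text)
    if data.get("status") != "1" or not data.get("results"):
        return None
    first = data["results"][0]
    return int(first["distance"]), int(first["duration"])
```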

Write a test function that queries the data from the database, builds the one-to-one location pairs according to the handshake principle, and writes the distances and related data of the level-0 cities into the database; there are about 5W (50,000+) such rows.

@config.logging_time

When we run it, we can clearly see the thread count spike. Quite fun, haha.


We save the queried data into a variable; through the debug console we can see roughly 57,630 new rows:


We insert the level-0 city data into the database:

(env) PS F:\workspace> & f:/workspace/env/Scripts/python.exe "f:/workspace/City distance crawl/main.py"

View database:


The row count matches what we calculated earlier!


Operation succeeded!

but!

There are more than 2,800 level-1 cities but only about 300 level-0 cities; when the handshake principle is used to pair the data, the two cases are not even in the same order of magnitude.

Headache.

Finally, a question for discussion: using the handshake-principle formula (a complete undirected graph has n(n-1)/2 edges), the 2,800+ level-1 cities produce roughly 400W (about 4 million) 1-to-1 pairs. How can we insert them into the database more efficiently?
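Checking the arithmetic: 2800 × 2799 / 2 = 3,918,600, which is indeed about 400W. The article leaves the question open; one common approach (my suggestion, not the author's solution) is batched inserts, e.g. via psycopg2's `execute_values`, so millions of rows do not travel one statement per round trip. The table and column names below are placeholders:

```python
def chunked(rows, size=10000):
    """Split a large row list into batches of at most `size`."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]


def bulk_insert(conn, rows, batch_size=10000):
    """Batched insert via psycopg2's execute_values, which expands one
    INSERT ... VALUES into many tuples per server round trip."""
    from psycopg2.extras import execute_values  # requires psycopg2
    with conn.cursor() as cur:
        for batch in chunked(rows, batch_size):
            execute_values(
                cur,
                "INSERT INTO city_distance (origin, dest, distance) VALUES %s",
                batch)
    conn.commit()
```

For data volumes this size, PostgreSQL's `COPY` (exposed as `cursor.copy_from` in psycopg2) is usually faster still.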

This work is licensed under a CC license; reprints must credit the author and link to this article.

This article first appeared on my blog: Stray_Camel(^U^)ノ~YO