Learning Python Crawlers as a Beginner (5): Preparation (4), Database Basics



Life is short, I use Python

Previous articles in this series:

Learning Python Crawlers as a Beginner (1): Opening

Learning Python Crawlers as a Beginner (2): Preparation (1), Installing the Basic Libraries

Learning Python Crawlers as a Beginner (3): Preparation (2), Introduction to Linux

Learning Python Crawlers as a Beginner (4): Preparation (3), Introduction to Docker

This article covers one more fundamental: databases.

Once the crawler has collected its data, the data needs somewhere to live. Where should it be stored?

In a database, of course. Whoever just said "put it in Excel", please stop!


To be fair, Excel can be used too, and third-party libraries exist for reading and writing Excel files. Still, as an old programmer, I insist on a proper database.

Databases today fall into three groups: relational databases, non-relational databases, and a newer generation of databases.

Here they are again under their English names:

  • SQL (Structured Query Language): refers to relational databases. Main representatives: SQL Server, Oracle, MySQL, PostgreSQL.
  • NoSQL (Not Only SQL): generally refers to non-relational databases. Main representatives: MongoDB, Redis, CouchDB.
  • NewSQL: an umbrella term for various new scalable, high-performance databases. Main representatives: Clustrix, GenieDB, TiDB.

The databases used in this series are mainly MySQL and Redis.

Now, let's get started.
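To make the difference between the relational and key-value styles concrete, here is a minimal sketch using only the Python standard library. It is an illustration under assumptions, not how MySQL or Redis are actually accessed: sqlite3 stands in for a relational database, a plain dict stands in for a key-value store, and the table name pages and its columns are hypothetical.

```python
import sqlite3

# sqlite3 stands in for a relational database here (it is not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT, title TEXT)")
conn.execute(
    "INSERT INTO pages (url, title) VALUES (?, ?)",
    ("https://example.com", "Example Domain"),
)
conn.commit()

# Relational style: data lives in rows and columns and is queried with SQL.
row = conn.execute("SELECT url, title FROM pages WHERE id = 1").fetchone()

# Key-value style: a dict stands in for a store such as Redis;
# a value is looked up directly by its key, with no schema.
kv_store = {"page:1:title": "Example Domain"}
value = kv_store["page:1:title"]

print(row)
print(value)
conn.close()
```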

MySQL installation

On Windows, MySQL is available as an .exe installer. However, I do not recommend installing MySQL directly. This is where Docker, introduced in the previous article, comes in handy. Below we install MySQL in Docker.

This article uses Linux as the system environment. With Docker, the installation process and commands are essentially the same on Windows.

First, pull the MySQL image from the image registry to the local machine:

docker pull mysql:5.7

Wait for the progress bar to finish. Then use the following command to view the image that was just downloaded:

docker images


If the mysql image with tag 5.7 appears in the output, the download succeeded. All that remains is to start a container from the image:

docker run --name mysql --restart=always -p 3306:3306 -v /www/mysql/conf.d:/etc/mysql/conf.d -v /www/mysql/mysql.conf.d:/etc/mysql/mysql.conf.d -v /www/mysql/datadir:/var/lib/mysql -e MYSQL_ROOT_PASSWORD=123456 -e TZ=Asia/Shanghai -d mysql:5.7

Here is what each parameter means:

--name: the name given to the container once it starts.
--restart: the restart policy. With always, the container is started again automatically, for example after the server loses power and reboots, so no manual restart is needed.
-p: maps a port on the host to a port in the container, in the form host:container.
-v: mounts a host directory into the container. The configuration and data inside the container cannot be accessed directly, but we can mount them to directories on the host. The paths here follow Linux conventions; if you are on Windows, remember to change them.
-e: sets environment variables; here, the database root password and the time zone (Asia/Shanghai).
-d: runs the container in the background (detached mode).
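As a way to see how the flags fit together, the following sketch assembles the same docker run command from its parts in Python. The helper build_docker_run is hypothetical (it is not part of Docker or of any library); it only builds the argument list and does not start anything.

```python
import shlex


def build_docker_run(image, name, ports=(), volumes=(), env=None,
                     restart=None, detach=True):
    """Assemble a `docker run` argument list from its parts (hypothetical helper)."""
    args = ["docker", "run", "--name", name]
    if restart:
        args.append(f"--restart={restart}")      # restart policy
    for host_port, container_port in ports:
        args += ["-p", f"{host_port}:{container_port}"]   # host:container port
    for host_path, container_path in volumes:
        args += ["-v", f"{host_path}:{container_path}"]   # host:container path
    for key, value in (env or {}).items():
        args += ["-e", f"{key}={value}"]         # environment variable
    if detach:
        args.append("-d")                        # run in the background
    args.append(image)
    return args


mysql_args = build_docker_run(
    image="mysql:5.7",
    name="mysql",
    restart="always",
    ports=[(3306, 3306)],
    volumes=[
        ("/www/mysql/conf.d", "/etc/mysql/conf.d"),
        ("/www/mysql/mysql.conf.d", "/etc/mysql/mysql.conf.d"),
        ("/www/mysql/datadir", "/var/lib/mysql"),
    ],
    env={"MYSQL_ROOT_PASSWORD": "123456", "TZ": "Asia/Shanghai"},
)

# Prints the command as a single shell line.
print(shlex.join(mysql_args))
```

The resulting list could also be handed to subprocess.run(mysql_args) instead of being typed into a shell; on Windows, only the host side of each volume pair would need to change.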

With that, the MySQL installation is complete. You can check whether the container started normally with the following command:

docker ps


Well done! Simple, isn't it? With just three commands we have a standalone MySQL service. Better still, when installing through Docker, the commands are almost identical across different systems.

With MySQL installed, how do we look at the data in it? You can install a client called Navicat. It is paid software and not cheap, though, so I will not go into detail on how to use it.
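Besides docker ps, the mapped port can also be probed from Python. The sketch below uses only the standard library; port_is_open is a hypothetical helper name. Note the limits of the check: an open port shows that something is listening on 3306, not that MySQL has finished initializing or that the password is correct.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 3306 is the host port published by -p 3306:3306.
    print("MySQL port reachable:", port_is_open("127.0.0.1", 3306))
```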


The interface is clean and the operation is simple; there is even a Chinese version. You can explore how to use it on your own.

In addition, it is recommended that you install MySQL in Linux.

Redis installation

Let’s first introduce redis.

Redis is generally used as a cache. Because its data is stored in memory, its reads and writes are far faster than MySQL's. Data that is held only in memory is lost once the machine is powered off or restarted.

Redis can also persist data to disk. However, enabling persistence costs some performance.
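To illustrate what "used as a cache" means for a crawler, here is a minimal sketch of the check-the-cache-first pattern. It is not taken from this series' sample code. The function get_page, the key prefix page: and the class DictCache are all hypothetical; DictCache is an in-memory stand-in that offers only the get and set(..., ex=...) methods that a redis-py client also provides, so the example runs without a Redis server.

```python
def get_page(client, url, fetch, ttl_seconds=300):
    """Return the page for url, consulting the cache before calling fetch."""
    key = f"page:{url}"
    cached = client.get(key)
    if cached is not None:
        return cached                      # cache hit: no download needed
    body = fetch(url)                      # cache miss: do the slow work
    client.set(key, body, ex=ttl_seconds)  # keep it for ttl_seconds
    return body


class DictCache:
    """In-memory stand-in for a Redis client (expiry is accepted but ignored)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, ex=None):
        self._data[key] = value
        return True


calls = []


def fake_fetch(url):
    """Pretend download that records how often it is called."""
    calls.append(url)
    return f"<html>{url}</html>"


cache = DictCache()
first = get_page(cache, "https://example.com", fake_fetch)
second = get_page(cache, "https://example.com", fake_fetch)
print(first == second, len(calls))
```

With a real redis-py client in place of DictCache, values come back as bytes unless the client is created with decode_responses=True.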

Similarly, we install redis in docker.

Download redis:

docker pull redis

First, create a folder to serve as the mount directory for Redis. The folder I created is /www/redis/, which stores the Redis configuration file and data.
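The folder layout can also be prepared from Python. The sketch below is one possible way to do it; prepare_redis_mounts is a hypothetical helper, and the conf/ and data/ subfolders follow the mount paths used in the start command. It also creates an empty redis.conf placeholder, on the assumption that you will replace it with the real configuration file: when a host path given to -v does not exist, Docker creates it as a directory, which is not what we want for a file mount.

```python
import tempfile
from pathlib import Path


def prepare_redis_mounts(base):
    """Create conf/ and data/ under base and make sure conf/redis.conf is a file."""
    base = Path(base)
    conf_dir = base / "conf"
    data_dir = base / "data"
    conf_dir.mkdir(parents=True, exist_ok=True)
    data_dir.mkdir(parents=True, exist_ok=True)
    conf_file = conf_dir / "redis.conf"
    # Empty placeholder; replace it with the real configuration file.
    conf_file.touch(exist_ok=True)
    return conf_file, data_dir


# Demonstration in a temporary directory so nothing is written to /www.
# On the real host you would call prepare_redis_mounts("/www/redis").
with tempfile.TemporaryDirectory() as tmp:
    conf_file, data_dir = prepare_redis_mounts(Path(tmp) / "redis")
    print(conf_file.is_file(), data_dir.is_dir())
```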

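The start command for Redis is as follows. The image only loads a configuration file when redis-server is given its path, so the path of the mounted file is passed explicitly:

docker run -d -p 6379:6379 --restart=always -v /www/redis/conf/redis.conf:/usr/local/etc/redis/redis.conf -v /www/redis/data:/data --name docker-redis redis redis-server /usr/local/etc/redis/redis.conf --appendonly yes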

The Redis configuration file is too long to post here. I have uploaded it to the code repository; help yourself if you need it.

By the way, I am sharing this Redis cache service with everyone: reply "redis" in my official account to get the connection details for the cache service.

The hardware is low-spec, and this shared service is for testing only. Please do not run high-risk operations such as stress tests against it.

Excel installation

Office has now been updated to the 2019 version. I will not say much about how to install it, just two words: Office Tool. Those who know will know; those who do not can search Baidu, which will not let you down.

Connection library installation

We have covered installing the databases above. To connect to them from Python, we also need some third-party libraries.


To connect to MySQL from Python, you need to install pymysql.

The installation command is as follows:

pip install pymysql
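Once pymysql is installed, a quick connection test might look like the sketch below. The host, port, user and password mirror the docker run command used earlier in this article; the function names and the utf8mb4 charset are my own assumptions, and the query only asks the server for its version.

```python
def mysql_settings():
    """Connection settings matching the MySQL container started earlier."""
    return {
        "host": "127.0.0.1",
        "port": 3306,              # host port from -p 3306:3306
        "user": "root",
        "password": "123456",      # value of MYSQL_ROOT_PASSWORD
        "charset": "utf8mb4",      # assumption: full Unicode support
    }


def mysql_version(settings):
    """Connect with pymysql and return the server version string."""
    import pymysql  # third-party: pip install pymysql

    connection = pymysql.connect(**settings)
    try:
        with connection.cursor() as cursor:
            cursor.execute("SELECT VERSION()")
            (version,) = cursor.fetchone()
            return version
    finally:
        connection.close()


# Usage (requires the MySQL container to be running):
#     print(mysql_version(mysql_settings()))
```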


To connect to Redis from Python, you need to install redis-py.

The installation command is as follows:

pip install redis
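A similar quick test for Redis could look like this. It is a sketch under assumptions: the port comes from the -p 6379:6379 mapping above, while the function names, the key crawler:hello and the 60-second expiry are purely illustrative.

```python
def redis_settings():
    """Connection settings matching the Redis container started earlier."""
    return {
        "host": "127.0.0.1",
        "port": 6379,               # host port from -p 6379:6379
        "db": 0,
        "decode_responses": True,   # return str instead of bytes
    }


def redis_roundtrip(settings, key="crawler:hello", value="world"):
    """Write a value with an expiry and read it back using redis-py."""
    import redis  # third-party: pip install redis

    client = redis.Redis(**settings)
    client.set(key, value, ex=60)   # the key expires after 60 seconds
    return client.get(key)


# Usage (requires the Redis container to be running):
#     print(redis_roundtrip(redis_settings()))
```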

That is all for this article. I hope you will try these steps yourself. Thank you.

Sample code

Sample code GitHub

Sample code gitee

