PgBouncer best practices: Series 2

Date: 2021-05-13

Author: Wang Zhibin, a China PGCE, officially certified PostgreSQL lecturer, and invited gold-medal lecturer of Pangu Cloud Classroom.

Choosing a connection pooler should be backed by test data rather than guesswork. The test results below make the difference between the two approaches, direct connections and pooled connections, much easier to see (the data and test results are from Percona [3]):

PostgreSQL handles connections by forking its main operating system process into a dedicated child process for each new connection. At the operating system level this gives a complete view of the resource utilization of each PostgreSQL connection (the output below comes from the top command):

Table 1: Memory usage of direct connections (top output)

PID USER      PR NI VIRT RES  SHR S %CPU %MEM TIME+  COMMAND 

24379 postgres  20 0 346m 148m 122m R 61.7  7.4 0:46.36 postgres: sysbench sysbench ::1(40120)

24381 postgres  20 0 346m 143m 119m R 62.7  7.1 0:46.14 postgres: sysbench sysbench ::1(40124)

24380 postgres  20 0 338m 137m 121m R 57.7  6.8 0:46.04 postgres: sysbench sysbench ::1(40122)

24382 postgres  20 0 338m 129m 115m R 57.4  6.5 0:46.09 postgres: sysbench sysbench ::1(40126)

First, in terms of both time and memory, forking an operating system process is far more expensive than spawning a new thread inside an existing process. This cost becomes more and more significant as an application grows, which may be one reason why connection pooling tends to be needed early in the life cycle of a PostgreSQL-based application.
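As a quick sanity check of this process-per-connection model, you can count the backend processes on the server or ask PostgreSQL itself. This is a minimal sketch that assumes a Linux system with the instance running under the postgres OS user:

$ ps -u postgres -o cmd= | grep -c '^postgres: '     # backend (and auxiliary) processes of the instance
$ psql -U postgres -c "SELECT count(*) FROM pg_stat_activity;"     # sessions as seen from inside the database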

To illustrate the impact that connection pooling can have on PostgreSQL server performance, we took the earlier sysbench-tpcc tests against PostgreSQL and partially repeated them with PgBouncer acting as the connection pool.

In the first round of tests, the goal was to optimize PostgreSQL for a sysbench-tpcc workload running 56 concurrent clients (threads), the same number as the CPUs available on the server, with each run lasting 30 minutes. This time the goal is to vary the number of concurrent clients (56, 150, 300, and 600) and see how the server copes as the number of connections grows.

Transaction pooling is used for the tests because the sysbench-tpcc workload is made up of multi-statement and single-statement transactions. The complete configuration file, pgbouncer.ini, is shown below:

Table 2: pgbouncer.ini file

[databases]

sbtest = host=127.0.0.1 port=5432 dbname=sbtest

[pgbouncer]

listen_port = 6543

listen_addr = 127.0.0.1

auth_type = md5

auth_file = userslist.txt

logfile = pgbouncer.log

pidfile = pgbouncer.pid

admin_users = postgres

pool_mode = transaction

default_pool_size=56

max_client_conn=600

Besides pool_mode, the most important variables here are:

  • default_pool_size: how many server connections to allow per user/database pair.
  • max_client_conn: the maximum number of client connections allowed.

The userslist.txt file specified through auth_file contains only the user and password information used to connect to PostgreSQL; depending on the authentication method in use, the passwords in the file can be stored in plain text or hashed with MD5 or SCRAM.
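For reference, each line of userslist.txt is simply a quoted user name followed by a quoted password. The entries below are made-up placeholders: the first stores the password in plain text, the second in PgBouncer's MD5 form, which is the string md5 followed by the MD5 hash of the password concatenated with the user name:

"sysbench" "a-plaintext-password"
"postgres" "md5<32-hex-character-hash>"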

Another way to define users is to have PgBouncer query the PostgreSQL backend directly when needed. This is configured with the auth_user parameter, which can be set globally or per database. When it is set, PgBouncer connects to the PostgreSQL backend as that user and runs the query defined by auth_query to look up the user and password. If auth_user itself needs a password to connect, that password still has to be present in userslist.txt. For details, see the official PgBouncer website.
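A minimal sketch of that setup in pgbouncer.ini could look like the following; pgbouncer_auth is a hypothetical role created for this purpose, and the query shown is PgBouncer's documented default, which reads pg_shadow and therefore requires a role allowed to do so (or a SECURITY DEFINER lookup function):

[pgbouncer]
auth_type = md5
auth_user = pgbouncer_auth
auth_query = SELECT usename, passwd FROM pg_shadow WHERE usename = $1

The password of pgbouncer_auth itself would still need an entry in userslist.txt.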

Start pgbouncer as a daemon with the following command:

$ pgbouncer -d pgbouncer.ini
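Once the daemon is running, a quick way to confirm that it accepts connections is its built-in admin console; the postgres user can be used here because it was listed under admin_users above (its password must therefore also be present in userslist.txt):

$ psql -h 127.0.0.1 -p 6543 -U postgres pgbouncer -c "SHOW POOLS;"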

sysbench-tpcc is then run as before, except that each benchmark run lasts 30 minutes and the number of concurrent threads changes from run to run, starting with threads = 56. The following example is from the first run:

$ ./tpcc.lua --pgsql-user=postgres --pgsql-db=sbtest --time=1800 --threads=56 --report-interval=1 --tables=10 --scale=100 --use_fk=0  --trx_level=RC --pgsql-password=****** --db-driver=pgsql run > /var/lib/postgresql/Nando/56t.txt

For tests that use connection pooling, adjust the connection options to connect directly to pgbouncer instead of PostgreSQL. Note that it is still a local connection:

$ ./tpcc.lua --pgsql-user=postgres --pgsql-db=sbtest --time=1800 --threads=56 --report-interval=1 --tables=10 --scale=100 --use_fk=0  --trx_level=RC --pgsql-password=****** --pgsql-port=6543 --db-driver=pgsql run > /var/lib/postgresql/Nando/P056t.txt

After each sysbench TPCC execution, use the following command to clear the operating system cache:

$ sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches'

With default_pool_size = 56, the results are as follows:


TPS of sysbench-tpcc: comparing direct connections to PostgreSQL with connections pooled through PgBouncer

When running sysbench-tpcc with only 56 concurrent clients, a direct connection to PostgreSQL delivers 2.5 times higher throughput (TPS, transactions per second) than going through PgBouncer. At such a small scale, connection pooling brings no benefit, only overhead, and it clearly hurts performance.

However, when the benchmark runs with 150 concurrent clients, the benefits of connection pooling start to show: the TPS achieved through PgBouncer is clearly higher than with direct connections.

PgBouncer maintains that throughput even as the number of concurrent clients doubles and then quadruples. Instead of letting a flood of requests hit the server at once, the excess clients wait outside PgBouncer; only when a connection in its pool is released does PgBouncer let the next request through to PostgreSQL.
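This queueing can be watched while a benchmark is running: in the admin console, the SHOW POOLS output has a cl_active column that stays capped at the pool size and a cl_waiting column that shows the clients currently being held back by PgBouncer. One simple way to poll it, as a sketch reusing the admin connection shown earlier:

$ watch -n 5 'psql -h 127.0.0.1 -p 6543 -U postgres pgbouncer -c "SHOW POOLS;"'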

This strategy works very well for sysbench-tpcc; for other workloads the balance point may lie elsewhere.

For the tests above, PgBouncer's default_pool_size was set equal to the number of CPU cores available on the server (56). To explore how this parameter should be tuned, the tests were repeated with larger pools (150, 300, 600) and a smaller one (14). The results are as follows:
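Between runs, the pool size can be changed without restarting PgBouncer: edit default_pool_size in pgbouncer.ini and issue a RELOAD from the admin console, which applies most configuration changes on the fly. A sketch of one such change (the sed pattern assumes the file is written exactly as in Table 2):

$ sed -i 's/^default_pool_size=.*/default_pool_size=150/' pgbouncer.ini
$ psql -h 127.0.0.1 -p 6543 -U postgres pgbouncer -c "RELOAD;"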


How using PgBouncer affects sysbench-tpcc throughput: comparing different pool sizes

Using a much smaller pool (14), only a quarter of the number of available CPUs, still produces almost the same result, an indication that simply letting PgBouncer handle the connections is already paying off.

Doubling the pool size makes no real difference either. However, once the pool is pushed to 600 connections, far more than the number of available CPUs, throughput falls back to roughly what is achieved without any connection pooling. This holds even when the number of concurrent threads running equals the number of connections available in the pool (600), which suggests PostgreSQL itself imposes a practical limit here.

At first glance, setting the pool size equal to the number of CPUs available on the server seems a good idea, although there may be a hard limit somewhere around a pool size of 150. The table below summarizes the results obtained with the different pool sizes:

[Table: summary of the sysbench-tpcc results for the different pool sizes]

What these tests show is that, once the number of client connections grows, using a connection pool can significantly improve the efficiency of database processing.

To learn more about PostgreSQL hot topics, news, trends, and events, please visit the China PostgreSQL official website.

To get answers to PostgreSQL-related knowledge, technical, and work questions, please visit the official China PostgreSQL Q&A community.

To download more PostgreSQL-related materials, tools, and plug-ins, please visit the official China PostgreSQL download site.