Preliminary Study on the Efficiency Calculation of Postgresql Query

Time:2019-6-27

abstract

One important aspect of relational databases is query speed. The query speed directly affects the quality of a system.

Query speed generally needs to peek at the execution process through query planning.

The query path chooses the path with the lowest query cost to execute. And how does this cost come out?

Parameters and tables of major concern

Parameters:From the postgresql.conf file, you can view it through show


seq_page_cost = 1.0     # measured on an arbitrary scale
random_page_cost = 4.0     # same scale as above
cpu_tuple_cost = 0.01     # same scale as above
cpu_index_tuple_cost = 0.005   # same scale as above
cpu_operator_cost = 0.0025    # same scale as above
parallel_tuple_cost = 0.1    # same scale as above
parallel_setup_cost = 1000.0   # same scale as above 

Table (View): pg_class (focusing on relpages, reltuples), pg_stats

Analyzing the Cost Computing Process of Simple Queries

Establish simulation data, insert 100000 data into a table


create table test(id int, info text);
insert into test(id, info) select i, md5(i::text) from generate_series(1, 100000) t(i); 

The absence of an index

Analyzing the Cost Computing Process of Full Table Query

Posgres= analyze test;  Prevent no analysis
postgres=# explain select * from test;
       QUERY PLAN       
-------------------------------------------------------------
 Seq Scan on test (cost=0.00..1834.00 rows=100000 width=37)

1. Query the pg_class table to see the number of pages and rows in the test table


postgres=# select t.relpages, t.reltuples from pg_class t where t.relname = 'test';
 relpages | reltuples 
----------+-----------
  834 | 100000 

How was the cost calculated at 1834.00?

2. This process actually sequentially scans 834 pages, and the nodes emit 100,000 rows.

3. View configuration parameters


seq_page_cost = 1.0 
cpu_tuple_cost = 0.01 

4. The result is


postgres=# select 834 * 1.0 + 100000 * 0.01;
 ?column? 
----------
 1834.00 

5. The query cost is 1834.00. It is consistent with the query plan above.

Cost Calculating Process of Adding Conditions to the Full Table


postgres=# explain select * from test where id = 100;
      QUERY PLAN      
--------------------------------------------------------
 Seq Scan on test (cost=0.00..2084.00 rows=1 width=37)
 Filter: (id = 100) 

How did the cost of 2084.00 be calculated?

1. Query the pg_class table, pages, tuples and the example above

2. This process is a sequential test table, which emits 100,000 rows and filters 100,000 rows through cloud storage.

3. Look at the cost of a line of filtering operations


cpu_operator_cost = 0.0025 

4. The result is


postgres=# select 834 * 1.0 + 100000 * 0.01 + 100000 * 0.0025;
 ?column? 
-----------
 2084.0000

Indexing


```
create index on test(id);
```

Compare the following four scenarios

Index Only Scan


postgres=# explain select id from test where id = 100;
                 QUERY PLAN                 
-----------------------------------------------------------------------------
 Index Only Scan using test_id_idx on test (cost=0.29..8.31 rows=1 width=4)
  Index Cond: (id = 100) 

Index Scan


postgres=# explain select * from test where id = 100;
                QUERY PLAN                
-------------------------------------------------------------------------
 Index Scan using test_id_idx on test (cost=0.29..8.31 rows=1 width=37)
  Index Cond: (id = 100) 

Index Scan


postgres=# explain select * from test where id < 100;
                 QUERY PLAN                 
----------------------------------------------------------------------------
 Index Scan using test_id_idx on test (cost=0.29..10.11 rows=104 width=37)
  Index Cond: (id < 100) 

Unordered insertion of data


truncate table test;
insert into test(id, info) select i, md5(i::text) from generate_series(1, 1000000) t(i) order by random();

postgres=# explain select * from test where id < 100;
                 QUERY PLAN                 
----------------------------------------------------------------------------
 Bitmap Heap Scan on test (cost=5.22..380.64 rows=102 width=37)
  Recheck Cond: (id < 100)
  -> Bitmap Index Scan on test_id_idx (cost=0.00..5.19 rows=102 width=0)
     Index Cond: (id < 100)

conclusion

  • When there is an index, the cost will be greatly reduced.
  • The execution plan has a lot to do with the distribution of data.
  • Indexed analysis is a bit more complex, you can refer to the official source code first. I’ll add it later.

summary

Above is the whole content of this article. I hope the content of this article has some reference value for your study or work. Thank you for your support to developpaer.