An Example of Recursive Query with PostgreSQL Tree Structure

Time:2019-6-26

background

A common design for dealing with hierarchical structures with uncertain depths, such as organizational structures, is to store IDs and arent_IDs in a table and construct a tree by self-joining. This approach is very friendly to the process of writing data, but the query process becomes relatively complex. Without introducing MPTT model, a node and its subordinate subnodes must be queried by recursive algorithm.

Oracle provides Conneby extension grammar, which is simple and easy to use. But other RDBMS are not as humane (or I don’t know). Recently, PostgreSQL was used in the project to query tree data and record it.

Constructing sample data

drop table if exists demo.tree_data;
create table demo.tree_data (
 id integer,
 code text,
 pid integer,
 sort integer
);

Insert into demo. tree_data values (1,'China', null, 1);
Insert into demo. tree_data values (2,'Sichuan', 1, 1);
Insert into demo. tree_data values (3,'Yunnan', 1, 2);
Insert into demo. tree_data values (4,'Chengdu', 2, 1);
Insert into demo. tree_data values (5,'Mianyang', 2, 2);	
Insert into demo. tree_data values (6,'Wuhou District', 4, 1);
Insert into demo. tree_data values (7,'Kunming', 3, 1);

Conneby function

If you install the tablefunc extension, you can use the PG version of the connectby function. This is not as powerful as Oracle, but it meets the basic requirements.

-- The API is as follows
Conneby (text relname) -- Table name
  Text keyid_fld,--id field
  Text parent_keyid_fld -- parent ID field	
  [, text order by_fld], -- Sort fields
  Text start_with,--id value of the starting line
  Int max_depth -- Tree depth, 0 means infinite
  [, text branch_delim] -- path separator
- The basic usage is as follows. The returned field name and type must be defined through the AS clause.
select * 
	from connectby('demo.tree_data', 'id', 'pid', 'sort', '1', 0, '~')
	as (id int, pid int, lvl int, branch text, sort int);
	
--Query results
id | pid | lvl | branch | sort
----+-----+-----+---------+------
 1 | | 0 | 1 | 1
 2 | 1 | 1 | 1~2 | 2
 4 | 2 | 2 | 1~2~4 | 3
 6 | 4 | 3 | 1~2~4~6 | 4
 5 | 2 | 2 | 1~2~5 | 5
 3 | 1 | 1 | 1~3 | 6
 7 | 3 | 2 | 1~3~7 | 7
(7 rows)
Only the basic usage can query the relevant information of ID. If you want to query other fields such as code, you need additional join operations.
select 
	t.id, n.code, t.pid, p.code as pcode, lvl, branch
from (
	select * from connectby('demo.tree_data', 'id', 'pid', 'sort', '1', 0, '~')
		as (id int, pid int, lvl int, branch text, sort int)
) as t
	left join demo.tree_data as n on (t.id = n.id)
	left join demo.tree_data as p on (t.pid = p.id)
order by t.sort ;	

 id | code | pid | pcode | lvl | branch
----+--------+-----+-------+-----+---------
 1 | China | | | | 0 | 1
 2 | Sichuan | 1 | China | 1 | 1 ~ 2
 4 | Chengdu | 2 | Sichuan | 2 | 1 ~ 2 ~ 4
 6 | Wuhou District | 4 | Chengdu | 3 | 1 ~ 2 ~ 4 ~ 6
 5 | Mianyang | 2 | Sichuan | 2 | 1 ~ 2 ~ 5
 3 | Yunnan | 1 | China | 1 | 1 | 1 ~ 3
 7 | Kunming | 3 | Yunnan | 2 | 1 ~ 3 ~ 7
(7 rows)

PS: Although the node code can be queried by join, the branch part can not be directly converted to the corresponding code, which is not convenient to use.

CTE grammar

The recursive query of tree data is realized by using CTE grammar and with recursive. Although this method is less direct than connectby, it is more flexible and shows better results.

-- 
with recursive cte as
(
 - Query the root node first 
 select
 id, code, pid, '' as pcode,
 code as branch
 from demo.tree_data where id = 1
 union all
 - Recursive query of direct child nodes of root node through CT 
 select
 origin.id, origin.code, cte.id as pid, cte.code as pcode,
 cte.branch || '~' || origin.code
 from cte
 join demo.tree_data as origin on origin.pid = cte.id
)
select
 id,code, pid, pcode, branch, 
 By calculating the number of separators, the depth of the tree is simulated.
 (length(branch)-length(replace(branch, '~', ''))) as lvl
from cte;

-- 
 id | code | pid | pcode | branch  | lvl
----+--------+-----+-------+-----------------------+-----
 1 | China | | | China | 0
 2 | Sichuan | 1 | China | China ~Sichuan | 1
 3 | Yunnan | 1 | China | China ~ Yunnan | 1
 4 | Chengdu | 2 | Sichuan | China ~Sichuan ~Chengdu | 2
 5 | Mianyang | 2 | Sichuan | China ~Sichuan ~Mianyang | 2
 7 | Kunming | 3 | Yunnan | China ~Yunnan ~Kunming | 2
 6 | Wuhou District | 4 | Chengdu | China ~Sichuan ~Chengdu ~Wuhou District | 3
(7 rows)

Statement of execution process

As can be seen from the example above, the WITH RECURSIVE statement contains two parts

  • Non-recursive term (non-recursive part), which is the front part of union all in the preceding example
  • Recursive term, the latter part of union all in the preceding example

The implementation steps are as follows

  • Execute non-recursive term. (If union is used instead of union all, the result needs to be de-duplicated.) The result is a reference to result in recursive term, and this part of the result is put into a temporary working table
  • Repeat the following steps until the working table is empty: replace the recursive self-reference with the content of the working table, execute the recursive term (remove duplicate data if union is used instead of union all), and replace the working table with the result (if union is used instead of union all, then the result is de-duplicated).

Take query as an example to see the process

Execute non-recursive query

-- Step 1 implementation
 select
 id, code, pid, '' as pcode,
 code as branch
 from demo.tree_data where id = 1
 
The result set and working table are
 id | code | pid | pcode | branch
----+------+-----+-------+--------
 1 | China | | | China

Execute recursive query

-- Step 2 performs recursion, where the data in self-referencing CT is the result of step 1
 select
 origin.id, origin.code, cte.id as pid, cte.code as pcode,
 cte.branch || '~' || origin.code
 from cte
 join demo.tree_data as origin on origin.pid = cte.id
 
 The result set and working table are
 id | code | pid | pcode | branch 
----+--------+-----+-------+---------------------
 2 | Sichuan | 1 | China | China ~ Sichuan  
 3 | Yunnan | 1 | China | China ~ Yunnan

3. Continue executing recursive query until the result set and working table are empty

4. End the recursion and merge the results of the first three steps to get the final WITH RECURSIVE result set.

Strictly speaking, this process is an iterative process rather than a recursive one, but the key word RECURSIVE was established by the SQL Standards Committee, so PostgreSQL also extends the key word RECURSIVE.

summary

Above is the whole content of this article. I hope the content of this article has some reference value for your study or work. Thank you for your support to developpaer.