Live review | Japanese engineer Amit langote talks about table partition in Postgres

Time:2020-10-31

Table partitioning is a set of techniques for handling large data, as well as convenient options for managing data. In short, dividing a large table into smaller tables that are easier to manage is called partitioning.

On June 27, Beijing time, Amit langote, developer of PostgreSQL international community and engineer of EDB Japan company, was invited by China PostgreSQL branch to attend“ Postgres.Live “Cloud Technology Salon phase II” theme activity, chatting with Chinese PostgreSQL fans, followers and practitioners about the table partition in PostgreSQL.

No.1 technology sharing

In the lecture, Amit langote shared with you what partition is; the introduction of partition in Postgres, including previous partition, declarative partition, syntax, index, constraint, trigger, replication and optimization; and partition best practices and traps. The content is concise and interesting.

Amit langote mentioned that statement level triggers are allowed on partitioned tables after PostgreSQL V10, row level after triggers on partitioned tables after PostgreSQL V11, and row level before triggers on partitioned tables in the upcoming PostgreSQL V13.

In addition, until PostgreSQL V12, replication could only handle unpartitioned tables. In other words, the partition table cannot be added to the publication, and the subscription cannot receive it to the partition table. In the upcoming PostgreSQL V13, partition tables can be replicated.
Live review | Japanese engineer Amit langote talks about table partition in Postgres

No.2 Q & A interaction

In the subsequent Q & a interactive session, according to the questions raised by everyone on the spot in the chat bar, Bruce Momjian was happy to answer them one by one.

The following is a summary of the questions and answers:

Question: how big is the data table suitable for table partition, and how to select partition key?
A: usually it reaches 1TB, but some people can only become partition table when they reach 4tb, and some people will become partition table after 100GB, which depends on your application scenario. How to select the partition key is discussed in detail in my speech just now for your reference.

Q: what is the maximum number of partitions that PostgreSQL supports?
A: PostgreSQL doesn’t limit how many partitions you can create, but I don’t recommend creating more than 1000 partitions because it will affect your performance in other aspects.

Q: what business scenarios are suitable for table partitioning? Please give me an example of what you do.
A: I haven’t encountered the application scenario of table partition in my actual work. My job is to write code and write PostgreSQL. However, I have learned that many enterprises use time-related data to partition tables, which is quite common.

Q: what do you do when you have a large table and want to turn it into a partition table?
A: as I mentioned in my speech just now, there are many ways to do it, which are determined by your application scenarios. I don’t have a good indicator answer to answer this question.

Q: is there a tool to test the performance of the partition table? For example, before the actual product deployment, can we use it to do a simple performance test?
A: at present, there is no official PostgreSQL tool to test the partition table to see its performance. However, there is another tool called hypopg, which can create some fake data for testing, but it does not support the latest version of PG at present.
Hypopg address:https://hypopg.readthedocs.io…

Q: how is partitioning implemented in nginx or haproxy for high availability?
A: I don’t have a lot of suggestions in this respect. I mainly work on the kernel. I know that FDW can be used to implement fragmentation and implement high availability application scenarios together with partition tables.

Q: when my partition key is changed, my data does not belong to that partition. How does PG handle this?
A: in the previous version of PG, errors were reported directly. In the latest 12 and 13 versions, it will automatically do a partition key migration for you.

Q: does the hash partition of PG consider data distribution balance? Do users need to do it themselves?
A: the default hash algorithm used by PG can allocate data uniformly in theory. If you are not satisfied with the default hash algorithm, there is an option to use the custom “hash operator” class. PostgreSQL partitions allow users to determine what method (operator) to use for data distribution, whether range, list or hash.

Q: is there a tool to create partition tables automatically? Is there any other way than Shell Scripting?
A: Yes, you can refer to one called “PG”_ Partman “. It can create and manage partitions to make your operations easier.

Q: will PostgreSQL implement and design its distributed support based on current partition tables? For example, partition table + FDW.
A: now you can create similar settings with FDW plug-ins and partitions, but you can’t expect all the parts to work, such as you don’t have distributed atomic transactions, distributed snapshots, etc. You can implement a read-only partition across cluster silos, which may be better.

Question: how can the primary key limit be added to the partition table?
A: you can put the “primary key” in the partition table. Note that the value of “primary key” must be consistent with the partition key.

Q: what is the difference between PG and oracle? What are the advantages and disadvantages of each?
A: I don’t know much about Oracle. Oracle has been doing partition for a long time, and its user experience will be better. If you used Oracle partition for a long time, you may not be used to PG partition at first. But the basic functions are the same.

Q: does the partition table support online scale out? If so, can you tell us how to implement it to ensure that online business is not affected.
A: you can use FDW module and table partition to do the extension function. It is recommended to do read-only FDW extension.

Question: do partition keys have to create unique primary key indexes? Can indexes be different between partitions?
A: you can create the partition key of each partition as “primary key”, but I don’t think it is of great significance. Usually, we create the “primary key” on the secondary partition, so that each sub partition can achieve its effect. To create a “primary key”, you should make sure that it is consistent with your partition key.

Q: what is the basis of partition pruning in pg11?
A: in fact, there is no complicated logic. To put it simply, it is not necessary to scan the partitions that do not need to be scanned.

The above are the on-site questions and answers collected by Xiaobian.
Live review | Japanese engineer Amit langote talks about table partition in Postgres
Finally, I would like to thank Amit langote for being invited to participate in the“ Postgres.Live Secondly, thank volunteers Cary Huang and David Zhang for their accurate Chinese and English real-time translation, and finally thank PostgreSQL fans and followers from all walks of life for their participation and support from PostgreSQL practitioners.

Active playback video address:https://www.bilibili.com/vide…
How to get ppt activity:
Concerned about the official WeChat official account of China PostgreSQL branch and reply to “zoning”.

For more wonderful content, please pay attention to the following platforms and websites:

Official account of China Postgre SQL chapter (technical articles and technical activities):
PostgreSQL branch of Open Source Software Alliance

China postgre SQL branch technical Q & a community:
www.pgfans.cn

Official website of China postgre SQL branch:
www.postgresqlchina.com

Recommended Today

Introduction to pyzmq

Introduction to pyzmq ZMQ (hereinafter referred to as ZMQ) is a simple and easy-to-use transport layer, like a socket library framework, it makes socket programming more simple, concise and higher performance. Is a message processing queue library that can be flexibly scaled between multiple threads, kernel and mainframe. ZMQ’s clear goal is to “become part […]