On May 26, the annual PostgreSQL developer conference, PGCon 2020, took place as scheduled. Unlike previous years, this year's PGCon was held online because of the pandemic. Although there was no face-to-face communication, thanks to the careful arrangements of organizer Dan Langille the conference reached a wider audience and was packed with substantive content. Hubert Zhang (Zhang Huan), a Greenplum kernel engineer, and Asim Praveen jointly delivered a talk entitled "Distributed Snapshot and Global Deadlock Detector". In the talk, Hubert combined theory with examples to explain the technical approach to single-node deadlock detection in Postgres and distributed deadlock detection in a Postgres foreign server cluster.
Let's revisit the highlights of the talk in this article!
In the era of big data, with the explosive growth of data volume, demand for distributed databases keeps rising. As one of the most outstanding open-source databases, Postgres is also actively exploring and developing distributed solutions. Among them, the Postgres foreign server cluster is a very active distributed-Postgres topic on the Postgres developers' mailing list, pgsql-hackers. Through the foreign data wrapper (FDW) and partitioned table features, this scheme supports tables that are logically partitioned but physically stored on multiple different Postgres nodes. To guarantee the ACID properties of transactions in a distributed environment, the Postgres community is actively developing distributed-transaction patches based on the foreign server cluster (https://commitfest.postgresql… ).
But a distributed system must consider more than distributed transactions: global snapshots, global deadlock detection, and other issues also arise. Greenplum, as a pioneer and successful representative of distributed Postgres, has mature and stable solutions in many areas of distributed Postgres execution. Therefore Hubert, the speaker, draws on the principles and implementation of global deadlock detection in Greenplum and discusses how to build an efficient distributed deadlock detection system for a Postgres foreign server cluster.
Single-node deadlocks
First, let's look at single-node deadlocks. The following figure shows an example. Suppose there are two concurrent Postgres sessions, corresponding to two Postgres backend processes. Initially, process 1 holds lock A and process 2 holds lock B. Next, process 1 tries to acquire lock B and process 2 tries to acquire lock A. Because a lock is usually held until the end of its transaction, neither acquisition can ever succeed, and a local deadlock occurs.
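The scenario can be simulated in miniature. The sketch below (Python, purely illustrative and not part of Postgres) has each "process" hold one lock and then request the other; since neither lock will be released, both requests time out:

```python
import threading

lock_a, lock_b = threading.Lock(), threading.Lock()

lock_a.acquire()  # "process 1" holds lock A
lock_b.acquire()  # "process 2" holds lock B

# Process 1 now wants B and process 2 wants A. Because locks are
# normally held until the end of the transaction, neither request
# can ever succeed: this is the deadlock.
got_b = lock_b.acquire(timeout=0.1)  # process 1 -> lock B: fails
got_a = lock_a.acquire(timeout=0.1)  # process 2 -> lock A: fails
print(got_b, got_a)  # prints: False False
```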
PostgreSQL deadlock detector
Postgres handles this problem with a deadlock detector, which is responsible for detecting and breaking deadlocks. The detector uses a wait-for graph to model the wait relationships between backend processes. The nodes of the graph are identified by process ID (PID). An edge from node A to node B indicates that node A is waiting for a lock held by node B.
The basic idea of the PostgreSQL deadlock detector is as follows:
- If a lock acquisition fails, the process goes to sleep.
- A SIGALRM signal wakes the process after the deadlock timeout expires.
- The SIGALRM handler reads the PROCLOCK shared memory to build the wait-for graph. Starting from the current process, it checks whether the graph contains a cycle. A cycle means a deadlock, and the current process aborts itself to break it. In this way the Postgres deadlock detector handles local deadlocks.
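The cycle check can be sketched as follows (Python used only for illustration; the real detector is C code walking shared memory). `wait_for` maps each PID to the PIDs holding the locks it waits on, and we search from the sleeping process for a path that leads back to itself:

```python
def has_deadlock(wait_for, start_pid):
    """Depth-first search from the sleeping process; reaching start_pid
    again means the wait-for graph contains a cycle, i.e. a deadlock.

    wait_for: {pid: [pids holding the locks this pid waits for]}
    """
    stack, seen = [start_pid], set()
    while stack:
        pid = stack.pop()
        for holder in wait_for.get(pid, []):
            if holder == start_pid:
                return True          # found a path back: cycle
            if holder not in seen:
                seen.add(holder)
                stack.append(holder)
    return False

# Two processes waiting on each other form a cycle.
print(has_deadlock({1: [2], 2: [1]}, 1))  # prints: True
```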
Deadlock in Distributed Cluster
What about deadlocks in distributed clusters? How does a cluster differ from a single node?
Let's start with an example. In the figure below, we have a cluster with one master node and two slave nodes. Suppose there are two concurrent distributed transactions. First, distributed transaction 1 executes on node A, and then transaction 2 executes on node B. Next, transaction 1 tries to execute on node B but is blocked by transaction 2, so distributed transaction 1 is suspended. Meanwhile, transaction 2 tries to execute on node A and is blocked there by transaction 1, so distributed transaction 2 also hangs. At this point a deadlock has occurred.
Note that neither node A nor node B has a local deadlock, yet the cluster as a whole is deadlocked. Viewed from the master node, this is called a global deadlock.
Now let's look at a more concrete example with a Postgres foreign server cluster. In the figure below, two foreign servers play the role of the slave nodes in the previous figure. On the primary Postgres server, we create a partitioned table, placing one partition on foreign server A and another on foreign server B. Then we insert some rows, some of which land on foreign server A and the others on foreign server B.
Global deadlock detector in a distributed system
Next, we run the following update queries in two concurrent sessions, and both sessions hang due to a deadlock. However, the local Postgres deadlock detector on each foreign server cannot detect it.
So how can we solve this deadlock problem? The answer is to introduce a global deadlock detector into the distributed system.
In this talk, we propose an approach to implementing a global deadlock detector in a Postgres FDW cluster. The idea, however, is quite general and can serve as a reference for other Postgres cluster implementations. In fact, it draws on the implementation of the Greenplum global deadlock detector. First, the global deadlock detector is implemented as a Postgres background worker, which integrates well with Postgres: requirements such as high availability can be met through the background worker infrastructure. Second, we propose a centralized detection algorithm: a single worker process on the master node periodically collects transaction wait relationships and checks for deadlocks. Note that in Postgres's local deadlock detector, each backend process detects deadlocks starting from itself. Since we use a global detector, we must search the full wait-for graph, which calls for a more efficient algorithm than Postgres's per-vertex cycle search.
Global deadlock detector module
1. Wait-for graph
The global deadlock detector still uses a wait-for graph to model lock wait relationships, but it differs from Postgres's local deadlock detection in two ways. First, the graph covers the whole cluster, so we must merge the local wait-for graphs from each foreign server into one global graph. Second, a node in the graph is no longer a single Postgres process ID but a group of processes: we use the distributed transaction ID to identify a node.
The nodes in the wait-for graph have four main properties:
- The distributed transaction ID.
- The outgoing edge list.
- The incoming edge list.
- The PID and session ID of the lock waiter or holder.

An edge starts from the node waiting for a lock and points to the node holding it.
2. Edges of the wait-for graph

Edges in the wait-for graph represent lock wait relationships on individual nodes. An edge also has four main properties:
- The target node, which holds the lock.
- The source node, which waits for the lock.
- The edge type: not every lock is held until the end of a transaction. For example, a tuple lock can be released early, without waiting for the distributed transaction to commit; we use a virtual (dotted) edge to represent this kind of wait relationship. In contrast, a real edge represents a wait on a lock that is released only when the transaction ends. Later we will show how the global deadlock detection algorithm treats these two kinds of edges differently.
- The lock mode and lock type of the wait relationship.
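Putting the node and edge attributes together, a minimal sketch of the data structures might look like this (Python; the class and field names are our own illustration, not the actual patch):

```python
from dataclasses import dataclass, field
from enum import Enum

class EdgeType(Enum):
    REAL = "real"        # lock held until the transaction ends
    VIRTUAL = "virtual"  # lock may be released before the transaction ends

@dataclass
class Edge:
    waiter_dxid: int     # source node: waits for the lock
    holder_dxid: int     # target node: holds the lock
    edge_type: EdgeType
    lock_mode: str = ""  # e.g. "ExclusiveLock"
    lock_type: str = ""  # e.g. "tuple", "transactionid"

@dataclass
class Node:
    dxid: int                                       # distributed transaction ID
    out_edges: list = field(default_factory=list)   # locks this node waits for
    in_edges: list = field(default_factory=list)    # waiters on this node
    pid: int = 0
    session_id: int = 0
```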
How the global deadlock detector works
Next, let's look at how the cluster handles global deadlocks using the global wait-for graph.
The basic idea is as follows: a background worker process on the master node periodically builds the global wait-for graph by querying the cluster, then deletes the nodes and edges that cannot be part of a deadlock. This repeats until no more nodes or edges can be deleted. If any edges remain, a global deadlock exists and we need to choose a session to cancel.
Next, let’s describe the above steps in detail.
To build the wait-for graph, we need to collect lock information from each segment. This is a two-stage process.
1. Build the global graph
First, the Postgres internal function GetLockStatusData is used to obtain lock wait relationships from the PROCLOCK shared memory. We need to extend the LockInstanceData structure to carry the distributed transaction ID and a holdTillEndXact flag. The background worker process then collects this local lock information from every foreign server and assembles the global lock wait-for graph.
Each local wait-for graph entry includes the following attributes: the segment ID, the distributed transaction IDs of the lock waiter and the lock holder, a marker for real or virtual edge, and other attributes such as PID, session ID, lock type, and lock mode. Together these cover the four main attributes of nodes and edges.
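The merging half of this stage can be sketched as follows (Python, illustrative; the function name and tuple layout are our own assumptions). Each server contributes a list of local wait edges keyed by distributed transaction ID, and the worker flattens them into one global edge list plus an adjacency view:

```python
def merge_local_graphs(local_graphs):
    """Merge per-server wait edges into one global wait-for graph.

    local_graphs: {server_id: [(waiter_dxid, holder_dxid, edge_type), ...]}
    Returns the global edge list (each edge tagged with its originating
    server) and an adjacency map keyed by distributed transaction ID.
    """
    global_edges = []
    adjacency = {}
    for server, edges in local_graphs.items():
        for waiter, holder, edge_type in edges:
            global_edges.append((waiter, holder, server, edge_type))
            adjacency.setdefault(waiter, []).append(holder)
    return global_edges, adjacency

edges, adj = merge_local_graphs({
    "server1": [("A", "B", "virtual")],   # A waits for B on server 1
    "server2": [("B", "A", "real")],      # B waits for A on server 2
})
print(adj)  # prints: {'A': ['B'], 'B': ['A']}
```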
2. Eliminate nodes and edges
The next step is to eliminate irrelevant nodes and edges. We use a greedy, heuristic algorithm.
There are two strategies. The first is greedy on the global graph: delete every node whose out-degree is zero, together with its incoming edges. Here is an example: in the global graph, node D has out-degree zero, so it is deleted. The out-degree of node C then drops to zero, so node C is deleted as well.
The other strategy is greedy on the local graphs: examine every virtual edge on each local graph. If the node that the virtual edge points to has out-degree zero on that local graph, the blocking relationship represented by the dotted edge may disappear before the transaction ends, so the virtual edge can be eliminated as well.
In the example below, node C has out-degree 1 in the global graph, but out-degree 0 in the local graph of server 0, so we can delete the virtual edge from node A to node C.
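The two strategies can be combined into one loop. Below is an illustrative Python sketch (not the actual patch): each edge records its waiter, holder, originating server, and type; strategy 1 drops nodes with zero global out-degree, and strategy 2 drops virtual edges whose target has zero out-degree on that edge's local graph:

```python
def eliminate_wait_graph(edges):
    """Repeatedly apply both greedy strategies; return the edges that remain.

    edges: list of (waiter, holder, server, edge_type),
    edge_type in {"real", "virtual"}. A non-empty result means deadlock.
    """
    edges = list(edges)
    changed = True
    while changed:
        changed = False
        # Strategy 1 (global): a node that appears only as a holder has
        # out-degree zero, so drop every edge pointing at it.
        waiters = {w for w, _, _, _ in edges}
        for e in list(edges):
            if e[1] not in waiters:
                edges.remove(e)
                changed = True
        # Strategy 2 (local): a virtual edge whose target has out-degree
        # zero on the same server may vanish before commit, so drop it.
        for e in list(edges):
            w, h, s, k = e
            if k == "virtual" and not any(x[0] == h and x[2] == s for x in edges):
                edges.remove(e)
                changed = True
    return edges

# The chain example from the text: D has out-degree zero, deleting it
# makes C's out-degree zero, and everything is eliminated.
remaining = eliminate_wait_graph([("A", "C", "s0", "real"),
                                  ("C", "D", "s0", "real")])
print(len(remaining))  # prints: 0
```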
The last step of the global deadlock detector is to break the deadlock. A centralized detector differs from Postgres's local deadlock detector here: the latter can only abort the current process, while the former can choose to cancel any session according to a policy. Common policies include canceling the most recently started session, or ranking sessions by CPU, memory, or other resource usage.
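Victim selection itself can stay simple. Here is a sketch of the "cancel the latest session" policy (illustrative Python; the session fields are our own assumptions):

```python
def choose_victim(sessions):
    """Pick the session to cancel: here, the most recently started one.

    sessions: list of dicts with at least 'pid' and 'start_time'.
    Other policies could rank by CPU or memory usage instead.
    """
    return max(sessions, key=lambda s: s["start_time"])

# The youngest session (pid 30) is chosen as the victim.
victim = choose_victim([
    {"pid": 10, "start_time": 100},
    {"pid": 20, "start_time": 150},
    {"pid": 30, "start_time": 170},
])
print(victim["pid"])  # prints: 30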
So far we have introduced the design and algorithm of the global deadlock detector. Finally, let's walk through two more examples to better understand how it works.
The first is data preparation, as shown in the figure below.
In the first example, there are three concurrent sessions. Session C first updates the tuple with id = 2, taking the XID lock on server 1. Session A updates the tuple with val = 3, taking the XID lock on server 2. Next, session B updates the tuples with val = 3 or id = 2, and is blocked by session A on server 2 and by session C on server 1, respectively. Finally, session A updates the tuple with val = 2 on server 1.
Note that when session B fails to acquire the XID lock on server 1, it holds the tuple lock to ensure it can proceed once session C releases the XID lock. Session A is therefore blocked by session B on the tuple lock. Since the tuple lock is released before the distributed transaction ends, this is a virtual edge. The original global wait-for graph is shown in the upper left corner; you can see it contains a cycle.
Now let's see how the irrelevant nodes are eliminated. First, the out-degree of node C is zero, so we can delete the node and its incoming edges. Then, on the local wait-for graph of server 1, the node that the virtual edge points to (node B) has out-degree zero, so the virtual edge can also be deleted. After deleting the virtual edge, the out-degree of node A becomes zero, so A can be deleted, and finally node B. No edges remain, so there is no global deadlock in this case.
The second example, shown in the figure below, also involves three concurrent sessions. Session C first updates the tuple with id = 2, taking the XID lock on server 1. Session A then updates the tuple with val = 3, taking the XID lock on server 2. Session B tries to update the tuple with val = 2 and is blocked by session C on server 1.
Next, session A tries to update the tuple with val = 2 on server 1. As in case 1, session A is blocked by session B on the tuple lock, forming a virtual edge. Finally, session C updates the tuple with id = 3 and is blocked by session A, which holds the XID lock on server 2. The original global wait-for graph is in the upper left corner, and it also contains a cycle.
Recall the previous figure: the global wait-for graph of case 2 is the same as that of case 1; the only difference lies in the local graphs.
Now let's see how elimination proceeds. First, check the global graph: no node has out-degree zero, so no node can be deleted. Next, examine the virtual edges on the local graphs. There is a virtual edge from node A to node B, but node B's out-degree on that local graph is not zero, so the virtual edge cannot be deleted. Since no nodes or edges can be deleted, elimination fails and a global deadlock exists.
From these two cases we can conclude that even when the global wait-for graphs are identical, the global deadlock detection results can differ, because the local graphs differ.
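The two cases can be checked mechanically. The self-contained sketch below (Python, illustrative; edge tuples are `(waiter, holder, server, type)`, and the elimination loop is repeated here in compact form so the example stands alone) encodes both wait-for graphs: case 1 eliminates completely while case 2 does not:

```python
def run_elimination(edges):
    """Greedy elimination; a non-empty result means a global deadlock."""
    edges, changed = list(edges), True
    while changed:
        changed = False
        waiters = {w for w, _, _, _ in edges}
        for e in list(edges):              # strategy 1: global out-degree 0
            if e[1] not in waiters:
                edges.remove(e)
                changed = True
        for e in list(edges):              # strategy 2: local virtual edges
            w, h, s, k = e
            if k == "virtual" and not any(x[0] == h and x[2] == s for x in edges):
                edges.remove(e)
                changed = True
    return edges

# Case 1: B waits for C on server 1, B waits for A on server 2,
# and A waits for B on server 1 via an early-releasable tuple lock.
case1 = [("B", "C", "s1", "real"),
         ("B", "A", "s2", "real"),
         ("A", "B", "s1", "virtual")]

# Case 2: same global cycle, but the local placement differs.
case2 = [("B", "C", "s1", "real"),
         ("A", "B", "s1", "virtual"),
         ("C", "A", "s2", "real")]

print(len(run_elimination(case1)))  # prints: 0  -> no deadlock
print(len(run_elimination(case2)))  # prints: 3  -> deadlock remains
```

Note how in case 2 the virtual edge A → B survives because B still has an outgoing edge on server 1, exactly as in the walkthrough above.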
That covers the main content of this PGCon talk. To recap, the talk first discussed the implementation of Postgres's local deadlock detector, showed by example that the local detector cannot solve the global deadlock problem, and then presented the design and key considerations for implementing global deadlock detection in a Postgres foreign server cluster. Interested readers can follow the Greenplum Chinese community official account (GreenplumCommunity).