In the last issue, we mainly introduced how to use tables with different splitting rules. Next, we will introduce the global table.
What is a global table?
We have a definition: for dictionary table with small amount of data (such as supermarket goods). For the retail system, for example, the supermarket commodity SKU may have hundreds of thousands of data, and the SKU will not have a large number of additions, deletions and modifications, and there will be no concurrent addition, deletion and modification. For such data features, it is very suitable for global tables.
A global table is one copy of the same on multiple partitions.
Here’s an example. The sales table is a detailed sales situation table, which is split by time and date. As like as two peas, the date on the left is 20190211, the date on the right is 20190212, and the date field is split, and the commodity table can be copied exactly to each library. Because the commodity table is not often added, deleted or modified, and there is no great concurrent pressure, it is very suitable for global tables.
What’s the advantage of such a design? When I join, I can only route data according to the distribution area of the sales table, without paying attention to the distribution of the commodity table. In this way, direct routing is correct. Let’s see how the global table is configured in dble.
Global table configuration operation
Let’s go and see schema.xml In the document, we have previously defined two global tables. You can see TB here_ Global1 and TB_ Two configurations for global2. There are two properties, a type and a datanote.
Tb_ The datanode of the global2 table is dn1-4, which points to four nodes. In this case, as like as two peas, the number of copies is four copies, and it needs to maintain a table data in the same four nodes.
In this way, if the partitions involved in other splitting tables are also the four or a subset of their ranges. When inter database inner join is used, SQL can be distributed to middleware as a whole, and the correct results can be obtained by simple merging.
Join operation of global
It’s still TB_ Mod, or the table split by modulus 4. The other global table has only two pieces of data, which are placed in each node of this table configuration. At this time, I go to join. What will happen? Let’s look at the case.
First, let’s show you the capabilities of global tables, that is, how global tables work on four nodes. Let’s take a look at the global table. We should have written two pieces of data in advance. Now I add a piece of data to see how to maintain it in these four tables. This is an insert data. The ID is 1024, which can be found. Then we go to the real database node, ports 33061 and 33062. We’ve just seen DN1 through DN4. For the 33061 example, it is dB_ 1 and DB_ 3。 Let’s look at the next data state, and we can see TB_ global_ 2 already has this data in it. Let’s switch from DB_ Three of them are also three pieces of data. Let’s change it and demonstrate from port 33062. In fact, dble is implemented through two-phase commit distributed transactions. This is the basic structure of a global table with consistent data from four nodes. So let’s go back to port 8066 to do TB_ Join of mod table and global table.
Let’s talk about inner join first. Inner join is a relatively simple case. We can join with the local database directly. The result is correct, because there are only two pieces of data now. Then let’s see if it is the same as the ordinary join step through explain. I collect all the data to the middleware node and join again. The result should be six lines. In fact, the first four SQL contents are the same, but they are routed to different nodes. The whole join is directly distributed, and then it is simply merged in the middleware.
Let’s look at the left join again. It is possible to split the table left join global. Right join is actually a left join, which, in turn, is equivalent to a global table as a left join split table. At this time, we can find that the query results are different. We have come back to the cross database join we talked about. Such an implementation method is to collect the data of all tables. Then join in middleware instead of issuing statements directly.
You can imagine why right join can’t work?
In fact, this involves MySQL semantics. If I distribute the join as a whole, because my right join needs to filter out the global tables that some split tables don’t have. If I issue the right join directly, each partition will lose some of this type of data.
Although inner join of global table and split table can be optimized, left join will return to cross database query, so you need to strictly audit SQL to avoid performance problems.
OK, let’s introduce it here.
In order to facilitate reading, some spoken words are optimized without affecting learning. The manuscript and the video will be consistent as far as possible.