Micro course | Lesson 5: Horizontal split

Time: 2020-11-21

https://v.youku.com/v_show/id…

In the last installment, we looked at the directory structure after dble is installed. In this lesson, we introduce the second basic function: horizontal splitting.

Principle of horizontal split

The core function of dble is to split data horizontally. If you are not familiar with data splitting, here is a brief introduction. Suppose we start with a single, complete table that has grown too large, with more than 100 million rows. Queries, inserts, deletes, and updates against that one table all run into serious problems.

In the diagram, a routing algorithm f maps the value of a split column to one of several tables in other databases. Once a suitable algorithm has split the data across multiple tables, each table holds fewer rows and operations on it become more efficient. If you later need to scale out by adding nodes, the same algorithm controls how data is distributed across them. Next, let's look at how the three XML configuration files are set up.
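The routing idea can be sketched in a few lines of Python. This is only an illustration, assuming f is a plain modulo on an integer split column; the names `route` and `split_rows`, the shard count of 4, and the sample rows are hypothetical and are not dble's implementation.

```python
from collections import defaultdict

SHARD_COUNT = 4  # assumed number of shards, for illustration only


def route(split_value: int, shard_count: int = SHARD_COUNT) -> int:
    """Hypothetical routing function f: map a split-column value to a shard index."""
    return split_value % shard_count


def split_rows(rows, column="id", shard_count=SHARD_COUNT):
    """Group rows by the shard that their split column routes to."""
    shards = defaultdict(list)
    for row in rows:
        shards[route(row[column], shard_count)].append(row)
    return dict(shards)


# Eight sample rows end up spread evenly over the four shards.
rows = [{"id": i, "name": f"user{i}"} for i in range(1, 9)]
shards = split_rows(rows)
for index in sorted(shards):
    print(index, [row["id"] for row in shards[index]])
```

Each shard receives two of the eight rows, which is the point of the split: every individual table holds a fraction of the original data.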

Three XML configurations

https://v.youku.com/v_show/id…

1. rule.xml
As the name suggests, rule.xml holds the splitting rules. It has two major sections: table rules and functions. Each table rule is one splitting rule, and each function is a concrete splitting algorithm. Let's take the modulo algorithm as an example. The full set of options is in the documentation, so we will not go through it here. In general, you configure a function by giving it a name, choosing the algorithm, and setting the modulus. The details of the hash algorithm itself are also covered in the documentation, so I will not repeat them. The function name is referenced elsewhere in the configuration, so pay attention to this dependency. Now back to the table rule, which binds a column to a splitting algorithm. For example, our modulo table rule refers to the function just described and also names a column. Here the column is id, so rows are split by id modulo 4. That is how a rule is defined in rule.xml. How is the rule then used? For that we turn to schema.xml.
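The dependency between table rules and functions can be modeled as two lookup tables. The sketch below is not dble's XML format or its parser; the dictionary layout and the names `hashmod` and `rule_mod` are assumptions made up for illustration. It only shows that a table rule names a column and refers to a function by name, and that the reference must resolve.

```python
# Hypothetical in-memory model of the two rule.xml sections.
# Keys and values are illustrative, not dble's real element or attribute names.
FUNCTIONS = {
    "hashmod": {"algorithm": "mod", "modulus": 4},
}

TABLE_RULES = {
    "rule_mod": {"column": "id", "function": "hashmod"},
}


def validate_rules(table_rules, functions):
    """Return the names of table rules that reference an undefined function."""
    return [name for name, rule in table_rules.items()
            if rule["function"] not in functions]


def shard_index(rule_name, row):
    """Apply a table rule: read the split column, then run the named function."""
    rule = TABLE_RULES[rule_name]
    function = FUNCTIONS[rule["function"]]
    if function["algorithm"] != "mod":
        raise ValueError(f"unsupported algorithm: {function['algorithm']}")
    return row[rule["column"]] % function["modulus"]


print(validate_rules(TABLE_RULES, FUNCTIONS))  # [] means every reference resolves
print(shard_index("rule_mod", {"id": 10}))     # 10 % 4 == 2
```

Renaming a function without updating the table rules that use it is exactly the kind of broken dependency that `validate_rules` would report.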

2. schema.xml

schema.xml has roughly three parts. The first is the schema, which defines a logical database, and under it the tables. Each table has the attributes rule and datanode. The rule attribute names the splitting rule to apply. The datanode attribute lists the nodes the table is split across; these refer to the datanode definitions in the second layer of the XML. Each datanode in turn points to a datahost and a database. In this example, datahost 1 and 2 are two database instances that hold four databases between them, so the datanodes we built are deployed on different instances. This ties together the table structure, the splitting algorithm, and the real database instances. In short, the configuration has three layers: schema, datanode, and datahost. The datahost maps to an instance, the actual databases on it are mapped to datanodes, and the splitting algorithm routes rows to those datanodes. With that, the basic split configuration is complete.
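To make the three layers concrete, here is a rough Python model of the lookup chain from table to datanode to datahost. Only the schema name testdb comes from the lesson; the table name `orders`, the node, host, and database names, and the modulo routing are assumptions for illustration and do not reflect dble's actual file format.

```python
# Hypothetical model of the three schema.xml layers; all names are illustrative.

# Layer 3: each datahost stands for one database instance.
DATA_HOSTS = {
    "dataHost1": "mysql-instance-1",
    "dataHost2": "mysql-instance-2",
}

# Layer 2: each datanode is one database on one datahost.
DATA_NODES = {
    "dn1": {"data_host": "dataHost1", "database": "db_1"},
    "dn2": {"data_host": "dataHost2", "database": "db_2"},
    "dn3": {"data_host": "dataHost1", "database": "db_3"},
    "dn4": {"data_host": "dataHost2", "database": "db_4"},
}

# Layer 1: a schema contains tables; a table names its split column and datanodes.
SCHEMAS = {
    "testdb": {
        "orders": {"split_column": "id", "data_nodes": ["dn1", "dn2", "dn3", "dn4"]},
    },
}


def locate(schema, table, row):
    """Resolve a row to (datanode, instance, database), assuming modulo routing."""
    table_cfg = SCHEMAS[schema][table]
    nodes = table_cfg["data_nodes"]
    node_name = nodes[row[table_cfg["split_column"]] % len(nodes)]
    node = DATA_NODES[node_name]
    return node_name, DATA_HOSTS[node["data_host"]], node["database"]


print(locate("testdb", "orders", {"id": 6}))  # ('dn3', 'mysql-instance-1', 'db_3')
```

Four datanodes sit on two instances here, mirroring the example in the lesson where the datanodes are deployed across different instances.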

3. server.xml

server.xml starts with the system item, which holds system and feature parameters. Most of them can be left at their defaults; you only set the ones you care about. This is where you configure the IP and port and decide whether to enable optional features such as the compression protocol, slow query settings, the global table consistency check, logging, and XA transaction settings. The documentation describes this part in more detail. Besides the system configuration there is also a firewall configuration, which we will cover in detail later. Next, the user configuration. We just saw a management user, which is marked by the manager keyword. There are also ordinary users. For an ordinary user you can configure schemas to control which schemas the user can access, for example a schema called testdb. If I delete one of the schemas listed here, it is no longer visible when this user logs in, which is somewhat similar to MySQL. Manager users cannot be configured with schemas. For security, user passwords are stored encrypted; if a password was not encrypted before, we encrypt it, so the password field holds a ciphertext string. There is also a readonly switch that controls whether the user is read-only. That covers server.xml.
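The user rules described above can be summarized as a small access check. This is a simplified sketch, not dble's code: the `User` class, the helper functions, and the schema name `testdb2` are hypothetical, and treating a manager user as seeing no business schemas is an assumption made to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical model of a server.xml user entry."""
    name: str
    manager: bool = False
    schemas: list = field(default_factory=list)
    read_only: bool = False

    def __post_init__(self):
        # The lesson states that manager users cannot be configured with schemas.
        if self.manager and self.schemas:
            raise ValueError("manager users cannot be assigned schemas")


def visible_schemas(user, all_schemas):
    """Schemas an ordinary user sees after login (assumed empty for managers)."""
    if user.manager:
        return []
    return [schema for schema in all_schemas if schema in user.schemas]


def can_write(user, schema):
    """A write needs an ordinary, non-read-only user with access to the schema."""
    return (not user.manager) and (not user.read_only) and schema in user.schemas


ALL_SCHEMAS = ["testdb", "testdb2"]  # "testdb2" is made up for this example
app = User("app", schemas=["testdb"])
print(visible_schemas(app, ALL_SCHEMAS))  # ['testdb']
print(can_write(app, "testdb2"))          # False
```

Removing testdb from the user's schema list would make it disappear from `visible_schemas`, which matches the behavior described in the lesson.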

Summary

Let's recap. First, rule.xml defines the splitting algorithms. Then schema.xml defines the schemas and their relationships: within a schema, a table uses a splitting algorithm to route rows to datanodes. A datanode is a database on an instance, while a datahost maps directly to the instance. The different databases are presented to our table as shards. Finally, server.xml defines dble's basic parameters and user information. One point worth stressing: the dble project is derived from MYCAT. The configuration layout is not entirely logical, but to stay compatible with MYCAT, the user configuration has not been moved out of server.xml. In the future, the XML layout may be reorganized substantially to make it more sensible. That's all for this lesson.

https://actiontech.github.io/…

To make the text easier to read, some spoken expressions have been tidied up without affecting the content. The transcript stays as consistent with the video as possible.

dble and related project code repositories:

https://github.com/actiontech…
https://github.com/actiontech…
https://github.com/actiontech…