Highlights | Rust Chat Room: Xline Cross-Data Center Consistency Management

Time: 2022-11-24

On October 15, 2022, Datan Technology and the Rust language Chinese community jointly held a Rust Chat Room event. Shi Jicheng, co-founder of DatenLord, gave a talk on open source distributed storage technology, focusing on Datan Technology's new open source project Xline and on how this cross-cloud metadata KV store achieves high-performance data consistency management across data centers.

Introduction

Shi Jicheng first explained the motivation behind Xline. In 2021, UC Berkeley proposed the concept of Sky Computing. Unlike the cloud computing we are all familiar with, sky computing, as the name suggests, pictures a sky with many clouds in it, and its goal is to solve cross-cloud problems. How to break down the barriers between different clouds, connect them, and make the most of cross-cloud data is a hard problem we face today, and it is also our goal. With cloud computing today, we do not need to worry about resource deployment, scalability, and similar issues, because the cloud vendors already handle them. Once we go cross-cloud, however, migrating either computing power or data becomes a hard problem that has to be overcome. What Datan Technology is working on is the question of "how to do cross-cloud data interaction". We are committed to extending distributed systems from a single data center to a global scale, so that even if one data center goes down, users can still access their data.

The most important problem in cross-cloud storage is consensus. As business volume grows, the number of servers grows from one to many. Spreading the computing power keeps requests from concentrating on one server, but it also brings challenges: how to keep data consistent across servers, and how to preserve earlier decisions when a minority of servers are down. The system must not end up in a situation where every server insists on its own version of the truth; it must always retain one consistent result while staying highly available. This is what a consensus algorithm sets out to achieve.
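The requirement to "preserve earlier decisions when a minority of servers are down" is usually met with majority quorums: any two majorities share at least one server, so a decision acknowledged by a majority cannot be forgotten. The sketch below is only an illustration of that arithmetic; the function names are hypothetical and are not taken from Xline.

```rust
/// Smallest number of servers that forms a majority of `n` servers.
pub fn majority(n: usize) -> usize {
    n / 2 + 1
}

/// How many servers may be down while a majority is still reachable.
/// `saturating_sub` keeps the degenerate case `n == 0` from underflowing.
pub fn tolerated_failures(n: usize) -> usize {
    n.saturating_sub(majority(n))
}

/// A decision counts as preserved once a majority has acknowledged it
/// (assumption: simple majority quorums, as in Raft-style protocols).
pub fn is_committed(acks: usize, n: usize) -> bool {
    acks >= majority(n)
}
```

For a cluster of five servers this gives a majority of three, so two servers can fail without losing a decision that was already acknowledged by a majority.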

Next, Shi Jicheng introduced the Raft and Paxos algorithms and briefly described how each was developed and how it operates. The two are similar in their ideas and in the messages they pass, but Paxos has no fixed leader. There is no big difference in the number of messages transmitted: both need two round-trip times (2 RTT) to complete one consensus request.

Because latency between global data centers is a serious problem, the multi-raft solution came into being. It does not change the protocol itself; it only shards the data. This seems ideal, but problems remain. For example, if a data center region becomes unavailable for some reason, the raft group running in that data center cannot respond to requests, and the data it holds becomes unavailable. Could this be solved by replicating the data across different data centers? It could. But, as mentioned earlier, a request takes 2 RTT to complete, and with replicas placed in different data centers the latency becomes too high. So we have to ask: "Is 2 RTT really necessary?"
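The availability problem with sharding can be made concrete with a small sketch. It assumes a hypothetical layout in which each raft group lives in exactly one data center and keys are assigned to groups by hash; the `RaftGroup` type and both functions are illustrative names, not part of any real multi-raft implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical raft group pinned to a single data center.
pub struct RaftGroup {
    pub data_center: &'static str,
    pub available: bool,
}

/// Map a key to a shard index by hashing it.
/// `groups` must be non-zero; callers check this before calling.
pub fn shard_for(key: &str, groups: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % groups as u64) as usize
}

/// A key can be served only while the group owning its shard is reachable.
pub fn can_serve(key: &str, groups: &[RaftGroup]) -> bool {
    !groups.is_empty() && groups[shard_for(key, groups.len())].available
}
```

If the data center hosting the owning group goes down, `can_serve` returns false for every key in that shard, even though all other shards keep working. That is the gap that cross-data center replication is meant to close.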

This question prompted everyone to re-examine the Raft protocol: why exactly are there two RTTs? One is used to determine the position of the request in the log, that is, the global order; the other is used to distribute the request to followers so that data is not lost. The latter RTT is what makes it a consensus algorithm and cannot be given up, so we set out to improve the former. Global order brings us to MVCC. MVCC requires the system to maintain a physical or logical clock, and the version in a global order is, to some extent, a logical clock. The log id of the consensus protocol can be regarded as that version. At this point the problem does not seem to be solved, but if there is no contention, there is no need for version protection. The paper "Exploiting Commutativity For Practical Fast Replication" builds on exactly this principle with the CURP protocol. The advantage of CURP is that, when there are no conflicts, a request completes in one RTT; when there are conflicts, it falls back to a back-end protocol such as Raft or Paxos, which ultimately guarantees that every request has a global version.
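The fast-path decision described above can be sketched as follows. This is a heavily simplified illustration, not the CURP protocol or Xline's implementation: it assumes that two commands conflict whenever they touch the same key (reads and writes are not distinguished), and the `Command`, `Route`, and `Speculative` names are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical command, described only by the keys it touches.
#[derive(Clone, Debug)]
pub struct Command {
    pub keys: Vec<String>,
}

impl Command {
    pub fn new(keys: &[&str]) -> Self {
        Command { keys: keys.iter().map(|k| k.to_string()).collect() }
    }
}

/// Which path a request takes.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// No conflict: the request can complete in one round trip.
    Fast,
    /// Conflict: fall back to the back-end protocol for a global order.
    Slow,
}

/// Keys of commands accepted speculatively but not yet ordered by the back end.
pub struct Speculative {
    in_flight: HashSet<String>,
}

impl Speculative {
    pub fn new() -> Self {
        Speculative { in_flight: HashSet::new() }
    }

    /// Accept the command on the fast path only if it shares no key
    /// with any command that is still in flight.
    pub fn propose(&mut self, cmd: &Command) -> Route {
        if cmd.keys.iter().any(|k| self.in_flight.contains(k)) {
            return Route::Slow;
        }
        self.in_flight.extend(cmd.keys.iter().cloned());
        Route::Fast
    }

    /// Called once the back-end protocol has ordered the command.
    pub fn synced(&mut self, cmd: &Command) {
        for k in &cmd.keys {
            self.in_flight.remove(k);
        }
    }
}
```

Two commands on different keys both take `Route::Fast`, because their relative order does not matter. A command that touches a key still in flight takes `Route::Slow` until the earlier command has been ordered.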

Our product: Xline

Based on the CURP protocol, Datan Technology developed our product Xline, a metadata store compatible with the etcd interface. Shi Jicheng closed by saying that our ultimate hope is for Xline to replace etcd and deliver better performance in cross-cloud deployments, and also to split out the CURP protocol so that others can reuse it later. The Xline project is still at an early stage, with several interfaces and basic tests completed. The interfaces still need to be improved, and stability needs further iterations. We manually built a container environment to simulate cross-cloud scenarios and manually increased the latency of the network links. When the client and the leader are in the same data center, Xline shows no latency advantage; when they are in different data centers, Xline's advantage is quite clear. This result is consistent with the principle discussed earlier.
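A back-of-the-envelope model shows why the client's location matters. The sketch below is an assumption-laden simplification, not a measurement and not Xline code: it ignores processing time, treats a two-round protocol as the sum of two round trips, and treats the conflict-free fast path as one parallel round trip to the replicas whose replies are required. The function names and the millisecond figures in the usage note are made up for illustration.

```rust
/// Simplified two-round latency in milliseconds: the client reaches the
/// leader, and the leader then replicates to its followers.
pub fn two_round_latency(client_leader_rtt: u64, leader_follower_rtt: u64) -> u64 {
    client_leader_rtt + leader_follower_rtt
}

/// Simplified conflict-free fast path in milliseconds: the client contacts
/// the required replicas in parallel and waits for the slowest reply.
/// Returns 0 for an empty slice.
pub fn fast_path_latency(required_replica_rtts: &[u64]) -> u64 {
    required_replica_rtts.iter().copied().max().unwrap_or(0)
}
```

With an illustrative 1 ms round trip inside a data center and 100 ms between data centers, a client next to the leader sees `two_round_latency(1, 100)` = 101 ms against `fast_path_latency(&[1, 100, 100])` = 100 ms, which is practically no difference. A client in another data center sees `two_round_latency(100, 100)` = 200 ms against `fast_path_latency(&[100, 1, 100])` = 100 ms, roughly half.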

The Xline project is written in Rust. Everyone is welcome to take part in our open source project. GitHub link: https://github.com/datenlord/…

To watch "Rust Chat Room: Xline Cross-Data Center Consistency Management", follow the link below on Bilibili:
https://www.bilibili.com/vide…