What is the skill tree of a distributed storage engineer?

Time:2022-5-6

Many engineers who have just entered the storage industry, or who want to move into distributed storage, ask the same question: “What is the skill tree of a distributed storage engineer?”

So we put this question to our R&D experts.

Let’s start with the question itself: “What is the skill tree of a distributed storage engineer?” Storage in a broad sense includes databases, but distributed databases are a category of their own: they differ in technical characteristics, technical difficulties and application scenarios.

Distributed databases deserve a separate discussion, so here we focus on distributed storage other than distributed databases.

Distributed storage is still a very broad topic. By access semantics, it is generally divided into object storage, block storage and file storage.

Let’s start with object storage. Its application scenarios are relatively narrow. It is not like block storage, which is the cornerstone of cloud computing (in the IaaS era, creating a virtual machine requires a disk), nor like file storage, which has a long history and a wide range of applications. This is mainly determined by the access semantics of upper-layer applications: most of them access storage through virtualized block devices or POSIX file semantics.

In terms of technical implementation, the AWS S3 interface is the de facto standard for object storage. Objects are not modified in place, and object storage designs can generally tolerate short-lived data inconsistency between replicas.

Object storage pursues low cost, so implementing erasure coding (EC) is often necessary.
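To make the cost argument concrete, here is a minimal sketch in Python. It uses a single XOR parity block, the simplest form of erasure coding, rather than the Reed-Solomon style codes that production object stores more commonly use. The function names and the 4+1 layout are illustrative assumptions, not taken from any particular system.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the data blocks followed by one XOR parity block (k+1 layout)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild the single lost block from all surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


def storage_overhead(data_blocks, redundancy_blocks):
    """Raw bytes stored per byte of user data."""
    return (data_blocks + redundancy_blocks) / data_blocks


data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
stripe = encode(data)
rebuilt = recover(stripe, lost_index=2)   # pretend the disk holding block 2 failed
ec_cost = storage_overhead(4, 1)          # 4 data blocks + 1 parity block
replica_cost = storage_overhead(1, 2)     # three full copies
```

In this sketch the 4+1 stripe stores 1.25 bytes per byte of user data, against 3 bytes for triple replication, while still surviving the loss of any one block. Tolerating more than one simultaneous failure needs a stronger code than plain XOR.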

Next, we discuss distributed block storage and distributed file storage. What they have in common is the pursuit of high performance and strong data consistency. Distributed block storage generally provides virtual disks for virtual machines; these are the EBS-style products of the major public clouds. In most scenarios a volume is used by only one client. A distributed file system, by contrast, is often used by many clients concurrently and faces concurrent reads and writes of massive numbers of files, so it is technically harder.

Our yrcloudfile is positioned as a high-performance distributed file system, so we focus on the skill tree of distributed file system engineers. Each part could be discussed separately, so here we give only a brief introduction.

We consider the backbone of the skill tree for distributed file storage engineers to be:

1) Media:

Designing for HDDs is one thing, and designing for high-performance SSDs is quite another. For example, ordinary synchronous threads are enough on HDDs, but on SSDs synchronous threads cannot keep up with the hardware. Is IO polling needed? libaio is not easy to use; is the kernel's newer io_uring stable enough to run in production? These are the skills needed when designing distributed file storage for SSDs. In addition, the low latency of the device makes the performance cost of thread context switches stand out.
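The point about context switches can be illustrated with a little arithmetic. The sketch below assumes a blocking thread pays two context switches per IO (one when it sleeps, one when it wakes). All latency figures are assumed, order-of-magnitude values chosen for illustration only, not measurements; real numbers must come from benchmarking the target hardware.

```python
def switch_overhead_fraction(device_latency_us, switch_cost_us, switches_per_io=2):
    """Fraction of per-IO time spent on context switches for a blocking thread."""
    overhead = switch_cost_us * switches_per_io
    return overhead / (device_latency_us + overhead)


# Assumed, illustrative figures only.
HDD_LATENCY_US = 8000.0   # assumption: several milliseconds per random HDD read
SSD_LATENCY_US = 20.0     # assumption: tens of microseconds per fast SSD read
SWITCH_COST_US = 5.0      # assumption: a few microseconds per context switch

hdd_share = switch_overhead_fraction(HDD_LATENCY_US, SWITCH_COST_US)
ssd_share = switch_overhead_fraction(SSD_LATENCY_US, SWITCH_COST_US)
```

Under these assumed figures the switches are a rounding error on the HDD but about a third of each IO on the SSD, which is why polling and asynchronous submission interfaces become interesting on fast media.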

2) Metadata architecture:

Should the system have a dedicated metadata service, or be a distributed architecture without one?

After comparing the options, we chose the architecture with metadata. How should the directory tree be partitioned? How is cluster membership established and maintained? Under all kinds of abnormal network conditions, can every member of the cluster agree on whether a given node is dead or alive? These are also things a distributed file storage engineer needs in the skill tree.
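As one possible answer to the partitioning question, the sketch below places each file's metadata on a server chosen by hashing its parent directory. This is an illustrative scheme, not a description of how yrcloudfile or any other product does it; the function name `metadata_server_for` and the fixed server count are assumptions.

```python
import hashlib
import posixpath


def metadata_server_for(path, num_servers):
    """Pick a metadata server index by hashing the parent directory of `path`."""
    parent = posixpath.dirname(posixpath.normpath(path)) or "/"
    digest = hashlib.sha256(parent.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a stable integer.
    return int.from_bytes(digest[:8], "big") % num_servers


a = metadata_server_for("/projects/alpha/a.txt", 4)
b = metadata_server_for("/projects/alpha/b.txt", 4)
```

Because both files share a parent directory, `a` and `b` always land on the same server, so listing that directory touches one server only. The trade-offs are equally visible: one very hot directory cannot be spread out, renaming a directory changes where its entries hash to, and plain modulo placement reshuffles most directories when the server count changes.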

3) Reliability and consistency:

To survive the complete failure of a disk, fault tolerance is required. Generally this means redundancy, either multiple replicas or EC. How is data consistency between replicas ensured?
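One well-known way to approach the replica consistency question is quorum replication: with N replicas, require W acknowledgements per write and R replies per read such that W + R > N, so every read overlaps the latest successful write. The sketch below is a toy in-memory model under that assumption. The class and function names are hypothetical, and the caller supplies the version number, which a real system would obtain from a leader or sequencer; many file systems use primary-copy replication or a consensus protocol instead.

```python
class Replica:
    """A toy in-memory replica holding one versioned value."""

    def __init__(self):
        self.version = 0
        self.value = None
        self.online = True

    def write(self, version, value):
        if not self.online:
            return False              # no acknowledgement from a dead replica
        if version > self.version:    # ignore stale or duplicate writes
            self.version, self.value = version, value
        return True

    def read(self):
        return (self.version, self.value) if self.online else None


def quorum_write(replicas, version, value, w):
    """Succeed only if at least `w` replicas acknowledge the write."""
    acks = sum(rep.write(version, value) for rep in replicas)
    return acks >= w


def quorum_read(replicas, r):
    """Return the value with the highest version among at least `r` replies."""
    replies = [reply for reply in (rep.read() for rep in replicas) if reply is not None]
    if len(replies) < r:
        raise RuntimeError("not enough replicas reachable")
    return max(replies, key=lambda reply: reply[0])[1]


replicas = [Replica() for _ in range(3)]   # N = 3
replicas[2].online = False                 # one replica misses the write
ok = quorum_write(replicas, version=1, value=b"hello", w=2)
replicas[2].online = True                  # it comes back, still stale
replicas[0].online = False                 # a different replica fails
latest = quorum_read(replicas, r=2)
```

Here W = 2 and R = 2 with N = 3, so the read still returns the written value even though one of the two replicas it reaches is stale.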

4) Efficiency:

Many customers are used to a local Linux file system. Because of the page cache, it generally feels fast even when the underlying disk is an HDD. After switching to a distributed file system, the data is distributed: it may be local or it may be remote. Can performance still meet the customer's needs? How can that be guaranteed?

Distributed file systems generally use several kinds of cache, such as client-side caches. How is cache consistency among clients ensured?
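One common family of answers to the cache consistency question is server-driven invalidation: the server remembers which clients cache a file and tells them to drop it before a write by another client completes. The sketch below is a single-process toy model under that assumption. The `Server` and `Client` classes are hypothetical, and a real system would also need leases or timeouts to cope with clients that cannot be reached.

```python
class Server:
    """Toy file server that tracks which clients cache each path."""

    def __init__(self):
        self.files = {}     # path -> (version, data)
        self.cachers = {}   # path -> set of clients holding a cached copy

    def read(self, client, path):
        self.cachers.setdefault(path, set()).add(client)
        return self.files[path]

    def write(self, writer, path, data):
        version = self.files.get(path, (0, None))[0] + 1
        self.files[path] = (version, data)
        # Invalidate every other client's cached copy before returning.
        for client in self.cachers.pop(path, set()):
            if client is not writer:
                client.invalidate(path)


class Client:
    """Toy client with a local read cache."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, path):
        if path not in self.cache:    # cache miss: fetch and register with server
            self.cache[path] = self.server.read(self, path)
        return self.cache[path][1]

    def write(self, path, data):
        self.cache.pop(path, None)    # drop our own stale copy
        self.server.write(self, path, data)

    def invalidate(self, path):
        self.cache.pop(path, None)


server = Server()
a, b = Client(server), Client(server)
a.write("/f", b"v1")
first = b.read("/f")     # b now caches v1
a.write("/f", b"v2")     # server invalidates b's copy
second = b.read("/f")    # b refetches and sees v2
```

Without the invalidation step, the second read by `b` would be served from its cache and return the old data.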

We believe distributed file storage poses both theoretical and engineering challenges. In recent years, domestic technology has progressed rapidly and technical sharing has become very comprehensive. Many of the key theories of distributed storage can be found on the Internet, but newcomers need to work out how the pieces fit together in order to build their own knowledge tree.

The Internet has flattened not only the world but also the boundaries of knowledge. You can always find technical material on a subtopic; the difficulty is knowing that the subtopic exists and knowing what to look for.

For senior distributed storage engineers, we believe the main challenge is excellent engineering practice: making each module simpler, better designed and faster.

Those who are new to the industry and want to become distributed storage engineers, or more specifically distributed file storage engineers, need the chance to experience the actual process of building distributed storage. For them the main challenge is building a correct knowledge tree, which is also the goal of this article.

In our experience, building such a knowledge tree alone takes a long time and demands strong all-round ability.

Therefore, we highly recommend that you take part in a distributed storage project.

You can contribute to an open source project or join a distributed storage company.
