Powerdotnet platform software architecture design and implementation series (05): etcd distributed key-value storage platform

Time:2021-12-27

In Powerdotnet, etcd is currently used for the registration center and for configuration management (the familiar configuration center is only a small module in Powerdotnet). As a key part of the infrastructure, etcd is critically important.

This article briefly summarizes some personal experience in developing with, using, and managing etcd.

etcd originated at CoreOS. It was initially built to solve two problems in cluster management systems: distributed concurrency control during OS upgrades, and the storage and distribution of configuration files.

According to the official description ("a distributed, reliable key-value store for the most critical data of a distributed system"), etcd is a distributed, reliable key-value storage system for the critical data of a distributed system.

For those familiar with ZooKeeper, the functions etcd provides will look very similar. For example, etcd is a general-purpose, consistent metadata store, it provides a watch mechanism for change notification and distribution, and distributed systems use it as shared information storage.
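The watch mechanism mentioned above can be pictured with a small in-memory sketch. This is not etcd's API: `ToyStore` and its `watch`, `put`, and `delete` methods are hypothetical names chosen for illustration, and a real etcd server delivers events over the network rather than through local callbacks.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, str, str]  # (event type, key, value)


class ToyStore:
    """Hypothetical in-memory key-value store with prefix watches (not etcd's API)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def watch(self, prefix: str, callback: Callable[[Event], None]) -> None:
        # Register a callback for every change to a key under `prefix`.
        self._watchers[prefix].append(callback)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        self._notify(("PUT", key, value))

    def delete(self, key: str) -> None:
        # Only notify when the key actually existed.
        if self._data.pop(key, None) is not None:
            self._notify(("DELETE", key, ""))

    def _notify(self, event: Event) -> None:
        for prefix, callbacks in self._watchers.items():
            if event[1].startswith(prefix):
                for callback in callbacks:
                    callback(event)


store = ToyStore()
seen: List[Event] = []
store.watch("/config/", seen.append)
store.put("/config/db", "host=10.0.0.1")  # matches the watched prefix
store.put("/other/key", "ignored")        # outside the prefix, no event
store.delete("/config/db")
```

After these calls, `seen` holds one PUT and one DELETE event for `/config/db`; the write under `/other/` never reaches the watcher.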

In fact, etcd is said to be a project inspired by ZooKeeper and doozerd. Besides offering similar functionality, it focuses on the following four points:

1. Simple: an HTTP + JSON API that is easy to use with curl

2. Secure: optional SSL client certificate authentication

3. Fast: each instance supports about 1,000 writes per second

4. Reliable: properly distributed using the Raft algorithm
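To make the "HTTP + JSON" point concrete, here is a minimal sketch that builds a put request for etcd's v3 JSON gateway without sending it. The `/v3/kv/put` path applies to etcd 3.4 and later (earlier releases used a different prefix), the endpoint `http://127.0.0.1:2379` is an assumed local server, and `build_put_request` is a hypothetical helper name.

```python
import base64
import json
import urllib.request


def b64(text: str) -> str:
    # The v3 JSON gateway expects keys and values as base64 strings.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_put_request(endpoint: str, key: str, value: str) -> urllib.request.Request:
    """Build (but do not send) a put request for etcd's v3 JSON gateway."""
    body = json.dumps({"key": b64(key), "value": b64(value)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{endpoint}/v3/kv/put",  # assumption: etcd 3.4+ path prefix
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local endpoint; 2379 is etcd's default client port.
request = build_put_request("http://127.0.0.1:2379", "foo", "bar")
payload = json.loads(request.data)
# To send it against a running etcd: urllib.request.urlopen(request)
```

The same request can be issued with curl by posting the JSON body to the same URL, which is what makes the API convenient for quick experiments.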

The next two sections introduce the basic concepts and working principles of etcd, drawn mostly from existing material. Many thanks to the author of the article this material comes from; it was a great help to my own understanding.

1、 Etcd composition

Etcd is mainly divided into four parts:

1. HTTP server: handles API requests sent by users, as well as synchronization and heartbeat requests from other etcd nodes.

2. Store: handles the transactions behind the functions etcd supports, including data indexing, node state changes, monitoring and feedback, and event processing and execution. It is the concrete implementation of most of the API functions etcd offers to users.

3. Raft: the concrete implementation of the Raft strong-consistency algorithm, and the core of etcd.

4. WAL: the Write-Ahead Log is etcd's data storage mechanism. Besides keeping the state of all data and the node index in memory, etcd persists data through the WAL. In the WAL, all data is logged before it is committed. A snapshot is a copy of the state taken to keep the log from growing too large; an entry is the specific log content stored.

Usually, a user's request is forwarded by the HTTP server to the store for specific transaction processing. If the request modifies a node, it is handed to the Raft module for the state change and log recording, and then synchronized to the other etcd nodes to confirm that the data was delivered. Finally, the data is committed and synchronized again.
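The write-ahead idea described above (log first, apply second, replay on recovery) can be sketched in a few lines. This toy is only an illustration under simplifying assumptions: `ToyWal` is a hypothetical class, the "log" is a Python list standing in for a file on disk, and etcd's real WAL format, snapshots, and replication are all left out.

```python
import json
from typing import Dict, List


class ToyWal:
    """Hypothetical write-ahead log: record first, apply second (not etcd's format)."""

    def __init__(self) -> None:
        self.log: List[str] = []         # stands in for the on-disk WAL file
        self.state: Dict[str, str] = {}  # in-memory key-value state

    def put(self, key: str, value: str) -> None:
        entry = {"index": len(self.log) + 1, "op": "put", "key": key, "value": value}
        self.log.append(json.dumps(entry))  # 1. record the entry in the log
        self._apply(entry)                  # 2. only then change the state

    def _apply(self, entry: dict) -> None:
        if entry["op"] == "put":
            self.state[entry["key"]] = entry["value"]

    def recover(self) -> None:
        # After a crash the in-memory state is gone; replay the log to rebuild it.
        self.state = {}
        for line in self.log:
            self._apply(json.loads(line))


wal = ToyWal()
wal.put("/app/name", "orders")
wal.put("/app/name", "orders-v2")
wal.state.clear()  # simulate losing the in-memory state
wal.recover()      # state is rebuilt from the two logged entries
```

Because every change reaches the log before it reaches the state, replaying the log in order always reproduces the last committed value.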

As a highly available key-value storage system, etcd is designed for clustering from the start. etcd offers three ways to configure cluster startup: static configuration, etcd service discovery, and service discovery through DNS.

An etcd cluster usually consists of three or five nodes. Because the Raft algorithm needs the votes of a majority of nodes to make a decision, an odd number of nodes is recommended for an etcd cluster, typically 3, 5, or 7. The nodes cooperate to achieve distributed consistency through the Raft consensus algorithm. The algorithm elects one node as the leader, which is responsible for data synchronization and data distribution. When the leader fails, the system automatically elects another node as leader and completes data synchronization again. A client only needs to pick any one of the nodes to read and write data; etcd itself handles the internal state and data coordination.
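The preference for odd cluster sizes follows from simple majority arithmetic, sketched below. The function names `quorum` and `fault_tolerance` are chosen here for illustration.

```python
def quorum(members: int) -> int:
    # A majority is strictly more than half of the members.
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    # How many members can fail while a majority is still reachable.
    return members - quorum(members)


# cluster size -> (quorum, tolerated failures)
table = {n: (quorum(n), fault_tolerance(n)) for n in range(1, 8)}
```

A cluster of 3 tolerates one failure and so does a cluster of 4, while 5 and 6 both tolerate two. Adding an even-numbered member raises the quorum without improving fault tolerance, which is why 3, 5, or 7 nodes are the usual choices.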

Here is a brief summary of the important concepts of etcd:

Raft: the core of etcd, an algorithm to ensure strong consistency of distributed systems.

Node: an instance of the raft state machine.

Member: an etcd instance that manages a node and can provide services for client requests.

Cluster: an etcd cluster composed of multiple members that can work together.

Peer: what a member calls another member of the same etcd cluster.

Client: the client that sends HTTP requests to the etcd cluster.

WAL: write-ahead log, the log format etcd uses for persistent storage.

Snapshot: a snapshot of etcd's data state, taken to keep the WAL files from growing too large.

Leader: the node chosen by election in the Raft algorithm to process all data commits.

Follower: a node that did not win the election acts as a follower (subordinate node) in Raft and helps the algorithm provide its strong consistency guarantee.

Candidate: when a follower does not receive the leader's heartbeat for a certain period of time, it becomes a candidate and starts an election campaign.

Term: the period from when a node becomes leader until the next election is called a term.

Index: the number of a data item. Raft uses term and index together to locate data.
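The relationship between follower, candidate, leader, and term in the list above can be traced with a heavily simplified sketch. `ToyNode` is a hypothetical class that models only role changes and term numbers; it leaves out log entries, the index, vote requests over the network, and randomized timeouts, all of which the real Raft algorithm requires.

```python
from dataclasses import dataclass


@dataclass
class ToyNode:
    """Hypothetical, heavily simplified Raft node: roles and terms only."""

    cluster_size: int
    role: str = "follower"
    term: int = 0
    votes: int = 0

    def election_timeout(self) -> None:
        # No heartbeat from a leader: start a campaign in a new term.
        self.role = "candidate"
        self.term += 1
        self.votes = 1  # a candidate votes for itself

    def receive_vote(self) -> None:
        if self.role != "candidate":
            return
        self.votes += 1
        if self.votes > self.cluster_size // 2:  # majority reached
            self.role = "leader"

    def receive_heartbeat(self, leader_term: int) -> None:
        # A heartbeat carrying an equal or newer term sends the node back to follower.
        if leader_term >= self.term:
            self.term = leader_term
            self.role = "follower"
            self.votes = 0


node = ToyNode(cluster_size=3)
node.election_timeout()
role_after_timeout = node.role   # campaigning in term 1
node.receive_vote()
role_after_vote = node.role      # 2 of 3 votes is a majority
term_as_leader = node.term
node.receive_heartbeat(leader_term=5)  # a newer leader exists; step down
```

In a three-node cluster the candidate's own vote plus one more is already a majority, so a single granted vote is enough to become leader.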

2、 Etcd common usage scenarios

1. Service discovery

2. Message publishing and subscription

3. Load balancing

4. Distributed notification and coordination

5. Distributed lock

6. Distributed queue

7. Cluster monitoring and leader election

With so many common scenarios covered, etcd is clearly powerful. In most Internet applications it can now fully replace ZooKeeper, which is complex to operate and maintain.
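Of the scenarios listed above, service discovery is the one behind the registration center use mentioned at the start of this article. The sketch below shows the general pattern of registering instances under a key prefix with a time-to-live that must be refreshed. In etcd v3 this role is played by leases; here `ToyRegistry`, its methods, and the `/services/...` key layout are hypothetical, and time is passed in explicitly so the behavior is deterministic.

```python
from typing import Dict, List, Tuple


class ToyRegistry:
    """Hypothetical registry: entries expire unless their TTL is refreshed."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (address, expiry)

    def register(self, service: str, instance: str, address: str,
                 ttl: float, now: float) -> None:
        self._entries[f"/services/{service}/{instance}"] = (address, now + ttl)

    def keep_alive(self, service: str, instance: str, ttl: float, now: float) -> bool:
        key = f"/services/{service}/{instance}"
        entry = self._entries.get(key)
        if entry is None or entry[1] <= now:
            return False  # already expired; the instance must register again
        self._entries[key] = (entry[0], now + ttl)
        return True

    def discover(self, service: str, now: float) -> List[str]:
        # Return the addresses of all instances whose entry has not expired.
        prefix = f"/services/{service}/"
        return sorted(
            address
            for key, (address, expiry) in self._entries.items()
            if key.startswith(prefix) and expiry > now
        )


registry = ToyRegistry()
registry.register("orders", "a", "10.0.0.1:8080", ttl=10, now=0)
registry.register("orders", "b", "10.0.0.2:8080", ttl=10, now=0)
alive_at_5 = registry.discover("orders", now=5)    # both instances are alive
registry.keep_alive("orders", "a", ttl=10, now=8)  # only instance a refreshes
alive_at_15 = registry.discover("orders", now=15)  # instance b has expired
```

An instance that crashes simply stops refreshing, so it drops out of discovery results once its TTL runs out, with no explicit deregistration step.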

3、 Etcd management

With the basic concepts covered, we finally arrive at the exciting hands-on part ^_^.

To make etcd's capabilities reusable, we need to develop a powerful and flexible management back end for configuring and managing etcd dynamically. Of course, ready-made tools such as etcd-manager can also be used to manage etcd.

1. Etcd cluster

2. Etcd server

For each etcd server in the cluster, the back end can precisely locate data, gather statistics, and add, delete, or modify data.

The management back end includes, for example:

Statistics.

Key-value pair views, such as the configuration of the configuration center or the deployment information of API applications.

Key-value pair details.

Common tools, such as cluster tools, user roles and permissions, and dynamic tools.

The management back end has been developed, improved, and optimized specifically for common application scenarios. For this purpose an etcd client (Power.Etcd) was developed as well.

This client project may be open-sourced once there is a chance to tidy it up; the main obstacle is that some Powerdotnet-specific logic and dependencies need to be cleaned out first.

3. Etcd grouping

Large and medium-sized enterprises have many use scenarios for etcd, so it must be managed as a whole; otherwise operation, maintenance, and management problems will appear sooner or later.

Powerdotnet supports binding an etcdroute per system and per application, which makes grouped management possible.

First define an etcdroute. An etcdroute is equivalent to one group of distributed key-value pairs.

Then bind the etcdroute to specific applications. A business system gains the distributed key-value capability automatically just by calling the encapsulated public SDK (the distributed key-value client SDK is named Power.Etcd). It does not even need to write any configuration: a click of a button in the configuration center takes care of everything.
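The define-then-bind flow described above might be modeled roughly as follows. This is a guess at the shape of the idea, not the Powerdotnet implementation: the `EtcdRoute` fields (`endpoints`, `key_prefix`), the `RouteTable` class, and all names and addresses in the usage lines are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class EtcdRoute:
    """Hypothetical model of an etcdroute: one named group of key-value pairs."""

    name: str
    endpoints: Tuple[str, ...]  # assumption: a route points at an etcd cluster
    key_prefix: str             # assumption: a route owns a key namespace


class RouteTable:
    """Hypothetical lookup an SDK could use to find an application's route."""

    def __init__(self) -> None:
        self._routes: Dict[str, EtcdRoute] = {}
        self._bindings: Dict[str, str] = {}  # application name -> route name

    def define(self, route: EtcdRoute) -> None:
        self._routes[route.name] = route

    def bind(self, application: str, route_name: str) -> None:
        if route_name not in self._routes:
            raise KeyError(f"unknown etcdroute: {route_name}")
        self._bindings[application] = route_name

    def resolve(self, application: str) -> EtcdRoute:
        return self._routes[self._bindings[application]]

    def full_key(self, application: str, key: str) -> str:
        # Keys are namespaced by the route the application is bound to.
        return f"{self.resolve(application).key_prefix}/{key}"


table = RouteTable()
table.define(EtcdRoute("trade", ("http://10.0.0.1:2379",), "/trade"))
table.bind("order-api", "trade")
route = table.resolve("order-api")
key = table.full_key("order-api", "timeout")
```

With a lookup like this, application code only names itself; which cluster and which key namespace it uses is decided centrally and can be changed without touching the application.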

With the etcd distributed key-value management platform, etcd can be reused to the greatest possible extent, the connected applications are easier to manage, operate, and maintain, and troubleshooting and locating problems becomes simpler and more efficient.

References:

https://etcd.io/

https://github.com/etcd-io/etcd

https://www.infoq.cn/article/etcd-interpretation-application-scenario-implement-principle

https://www.cnblogs.com/alisystemsoftware/p/12016601.html

https://etcdmanager.io/

https://blog.csdn.net/shlazww/article/details/38736511

https://www.jianshu.com/p/5aed73b288f7

http://thesecretlivesofdata.com/raft