Cloud-native practice of the Douyu Live registry

Date: 2022-04-20

Authors

Kong Lingzhen, Chief Architect at Douyu, is fully responsible for planning and building the technical architecture of the entire Douyu platform. He has more than 10 years of experience in medium and large-scale Internet product architecture and specializes in architecture and solution design for high-concurrency, high-availability scenarios.

Yu Jing, an operations expert on Douyu's technical support team, is responsible for building Douyu's high-availability infrastructure. He specializes in areas such as registries and monitoring systems, and is also in charge of the infrastructure underpinning Douyu's multi-active architecture.

Tang Cong, Senior Engineer at Tencent Cloud, author of the Geek Time column "etcd Practical Course" and an active etcd contributor, is mainly responsible for the R&D and design of Tencent Cloud's large-scale Kubernetes/etcd platform, stateful service containerization, online-offline colocation, and more.

Chen Peng, a product architect for Tencent Cloud container services, has focused on the cloud-native field for many years. He has helped a large number of users adopt cloud-native containers and bring them to production, has rich front-line hands-on experience, and has published many cloud-native technology articles.

Business background and pain points

As an industry-leading game live-streaming platform, Douyu provides high-quality game live-streaming, interaction, and entertainment services to hundreds of millions of Internet users every day.

With the boom of the live-streaming market in recent years, Douyu, an Internet company with a solid reputation and track record in the industry, has seen explosive growth in its user base. The stability and technical challenges that such a large user base imposes on the platform have grown ever more severe. Douyu's old architecture is shown in the figure below; it carried certain risks and hidden dangers in both business support and architecture design.

Figure 1: Douyu's old architecture

To give users a better availability experience, Douyu urgently needed to solve the single-data-center problem and upgrade the old architecture from a single data center to multiple data centers.

Multi data center challenges

In upgrading from a single-active to a multi-active architecture, we faced a series of challenges in ensuring a trouble-free migration and upgrade, for example:

How can stateful services such as etcd and zookeeper be synchronized across multiple data centers?
Applications have complex tree-like or mesh dependencies on one another; where should the migration start?
Along what dimension should target boundaries be drawn, and how do we avoid businesses being so tightly welded together that there is nowhere to start?
If a problem occurs after migration, how do we recover quickly without affecting businesses that have already been migrated successfully?
Since many systems are involved in the upgrade from single-active to multi-active, this article, the first in a series on Douyu Live's multi-active transformation, focuses only on the registry module. We therefore begin by introducing etcd and zookeeper, the systems behind the registry.

The role of ZK / etcd

Dubbo relies on the registry to solve service registration and discovery in large-scale clusters. The architecture of the registry is shown below:

Dubbo supports zookeeper as the registry by default. Although newer versions also implement etcd support, it still lacks precedents of large-scale production use, and it is rare for the Java stack to use etcd as a registry.

When zookeeper is used as the Dubbo registry, the registration relationships form a tree structure, detailed in the figure below:

Zookeeper stores data in a file-system-like tree structure, whereas etcd uses flat key-value pairs. This difference makes synchronizing the registration relationships between the two considerably harder.
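To make the difference concrete, here is a minimal Go sketch (using etcd's clientv3 library) of reading a Dubbo-style provider list, which in zookeeper would be the children of /dubbo/<service>/providers, as a flat prefix query in etcd. The key layout is an illustrative assumption, not necessarily the exact layout Dubbo or zk2etcd uses.

package main

import (
    "context"
    "fmt"
    "strings"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
)

// listProviders models zookeeper's "list children of /dubbo/<service>/providers"
// as an etcd prefix query over flat keys. The /dubbo/... key layout here is an
// illustrative assumption.
func listProviders(ctx context.Context, cli *clientv3.Client, service string) ([]string, error) {
    prefix := "/dubbo/" + service + "/providers/"
    resp, err := cli.Get(ctx, prefix, clientv3.WithPrefix())
    if err != nil {
        return nil, err
    }
    urls := make([]string, 0, len(resp.Kvs))
    for _, kv := range resp.Kvs {
        // In zookeeper the provider URL is a child node name; here it is the
        // key suffix after the prefix.
        urls = append(urls, strings.TrimPrefix(string(kv.Key), prefix))
    }
    return urls, nil
}

func main() {
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"http://127.0.0.1:2379"}, // placeholder endpoint
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        panic(err)
    }
    defer cli.Close()

    providers, err := listProviders(context.Background(), cli, "com.example.DemoService")
    if err != nil {
        panic(err)
    }
    fmt.Println(providers)
}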

In addition, when migrating from zookeeper to etcd, the existing online services must not be harmed, let alone stopped, during the whole migration process, and if the migration fails it must be possible to roll back to zookeeper.

New architecture: same-city dual-active and multi-active

To achieve multi-active, we solved the challenges above through technical means and operational practices such as cross-data-center synchronization services, sorting out service dependencies and drawing boundaries, and controllable changes, and designed the new multi-active architecture shown in the figure below:

Figure 2: Douyu's new multi-active architecture

Under the new architecture, traffic can be scheduled at fine granularity by domain name or even URL, and the RPC layer can automatically route calls to the nearest instance. The registry's part of the architecture is shown below:

Figure 3: Old architecture of the Douyu registry

Registry multi-active solution selection and objectives

During the registry's multi-active transformation we faced a choice between several schemes, as shown in the table below:



For historical reasons, we have two registries, zookeeper (hereinafter zk) and etcd, plus four technology stacks: Java, Go, C++, and PHP. As a result, our registry landscape has some deficiencies. We hope to standardize on etcd to address the pain points and achieve the following goals:

Reduce maintenance costs: previously we had to operate and maintain two registries, zk and etcd. Worse, a multi-active solution would have to be adapted for both zk and etcd, doubling the R&D cost of the registry's multi-active work. Since etcd is part of Kubernetes, operating etcd is unavoidable anyway; this is the first reason for choosing etcd.

Embrace a more prosperous ecosystem: etcd has cloud-native managed offerings, some vendors manage Kubernetes clusters at the 10K-node level backed by etcd, and etcd comes with peripheral tools such as proxy, cache, and mirror. On the Java side, Dubbo also supports etcd as a registry. etcd has better prospects than zk; this is the second reason for choosing etcd.

Enhance cross-language capability: etcd communicates over HTTP or gRPC and supports long polling, giving it strong cross-language support, whereas zk requires a dedicated client, and apart from the Java client the clients in other languages are immature. We develop in four languages, Java, Go, C++, and PHP; this is the third reason for choosing etcd.

For the above reasons, we chose scheme 4. Its new architecture is shown in the figure below:

Figure 4: New architecture of the Douyu registry

Difficulties and challenges of registry multi-active

To implement the new registry and meet our design goals, the transformation had to overcome the following difficulties and challenges:

How do we solve zk's multi-data-center synchronization problem? In particular, the zookeeper watch mechanism is unreliable, so watch events may be lost. (correctness)
How do we solve etcd's multi-data-center synchronization problem? As the scheme comparison below shows, the community has no mature, production-ready solution. (correctness)
How do we solve the performance problem of cross-data-center reads? (performance)
How do we ensure service stability across data centers? What if a network link such as the intranet dedicated line is interrupted? In the design of the synchronization services, will the etcd/zk sync services fall back into an extremely slow full-synchronization path, and are the sync services themselves highly available? How should we design test cases for disaster-recovery drills? On the operations side, how do we quickly discover hidden risks, eliminate potential failures, and build a visual, flexible multi-active operations system? (stability, operability)

Analysis of the difficulties of registry multi-active

How do we keep old and new services interoperating during the migration?

Develop zk2etcd

Many of our Java businesses use the Dubbo framework for service governance, with zookeeper as the registry. We want all businesses developed in Java and Go to use etcd as the registry, which also paves the way for cross-language invocation.

Because there are so many businesses, the transformation and migration cycle will be long, expected to last one to two years. During this period we need to synchronize the registration data in zookeeper to etcd in real time while ensuring data consistency and high availability. No tool on the market met our needs, so we worked with Tencent Cloud's TKE team to develop zk2etcd, a tool that synchronizes zookeeper data to etcd. It has been open-sourced, and the overall design is introduced in detail below.

How do we achieve etcd cross-region disaster recovery?

Through the zk2etcd synchronization service we solved the zookeeper data migration problem, so the registry data of both old and new businesses is stored in etcd.

The importance of etcd is therefore self-evident: its availability determines our overall availability, and Douyu's current deployment architecture depends heavily on one core data center, whose failure would make everything unavailable. Douyu's next pain point is therefore to improve etcd's availability, with the goal of cross-city and cross-region disaster recovery for etcd.

Douyu's ideal etcd cross-city synchronization service should have the following characteristics:

After the cross-city disaster-recovery deployment, etcd read/write performance does not drop significantly and still meets the basic needs of business scenarios.
The synchronization component reaches production-grade availability, with complete consistency checking, logging, metrics monitoring, and so on.
Businesses without strong consistency requirements can access the nearby etcd cluster in the same region, while businesses that demand strong consistency can access the main etcd cluster.
If the main cluster fails, business operations staff can quickly promote the standby cluster to be the main cluster based on the consistency monitoring.
So what are the options, and what are the pros and cons of each? We ultimately evaluated the following schemes:

Single-cluster multi-location deployment scheme
etcd community make-mirror scheme
etcd community learner scheme
Tencent Cloud etcd-syncer scheme

Single-cluster multi-location deployment scheme

The single-cluster multi-location deployment scheme is shown below:

In this scheme, the etcd leader node replicates data to the follower nodes in each region via the raft protocol.

The advantages of this scheme are as follows:

Once the regional networks are interconnected, deployment is simple and no additional components need to be operated

Data is synchronized across cities with strong consistency; in a 3-node deployment, the failure of any one city can be tolerated without losing any data

After introducing its advantages, let’s look at its disadvantages, as follows:

In a 3-node deployment, any write request requires acknowledgement from at least two nodes. With nodes deployed in different locations, the ping latency rises from a few milliseconds to around 30 ms (Shenzhen to Shanghai), causing a sharp drop in write performance.

etcd reads are linearizable by default: when a follower receives a read request from a client, it must first obtain relevant information from the leader and confirm that its local data has caught up with the leader before returning data, in order to avoid reading stale data. This process also increases etcd read latency and lowers throughput. (A serializable-read sketch is shown after this list of disadvantages.)

Cross-city network quality also tends to fluctuate, causing jitter in service quality.

When configuring client access to the etcd cluster, multiple etcd nodes must be configured to avoid a single point of failure, which may cause clients to access remote etcd nodes and increase request latency.
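As a side note on the linearizable-read point above, etcd clients can explicitly opt into serializable reads, which are served from the local node without consulting the leader, trading freshness for latency. A minimal clientv3 sketch, with placeholder endpoints:

package main

import (
    "context"
    "fmt"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"http://etcd-nearby:2379"}, // placeholder endpoint
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        panic(err)
    }
    defer cli.Close()

    ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
    defer cancel()

    // Default: linearizable read. The serving node confirms with the leader
    // that its data is up to date before answering, adding cross-region RTT.
    lin, err := cli.Get(ctx, "/dubbo/", clientv3.WithPrefix())
    if err != nil {
        panic(err)
    }

    // Serializable read: answered from the local node's data, lower latency
    // but possibly slightly stale; suitable for "read nearby" scenarios.
    ser, err := cli.Get(ctx, "/dubbo/", clientv3.WithPrefix(), clientv3.WithSerializable())
    if err != nil {
        panic(err)
    }

    fmt.Println(len(lin.Kvs), len(ser.Kvs))
}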

etcd community make-mirror scheme

Having covered the single-cluster multi-location deployment scheme, let's look at the make-mirror scheme provided by the etcd community. Its schematic diagram is as follows:

In this scheme, independent etcd clusters are deployed in different cities, and cross-city data replication is achieved with the make-mirror tool provided by the etcd community.

The make-mirror tool works as follows:

After a synchronization prefix is specified, all data under this prefix is read from the main cluster through etcd's range-read interface and written to the destination etcd. (full synchronization)
The revision returned by that read request is then passed to etcd's watch interface to listen for all change events after that revision.
When the primary etcd cluster pushes key-value change events, make-mirror writes the data to the hot-standby cluster through the txn transaction interface (incremental synchronization; a sketch follows this list).
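Before looking at its pros and cons, here is a minimal Go sketch of the full-then-incremental flow just described, using the clientv3 library against two already-constructed clients src and dst (error handling, retries, and lease/auth data are omitted). It is a simplification for illustration, not make-mirror's actual code.

package mirror

import (
    "context"

    clientv3 "go.etcd.io/etcd/client/v3"
)

// mirror copies everything under prefix from src to dst once (full sync) and
// then replays subsequent changes from the watch stream (incremental sync).
func mirror(ctx context.Context, src, dst *clientv3.Client, prefix string) error {
    // Full synchronization: range-read the prefix from the source cluster.
    resp, err := src.Get(ctx, prefix, clientv3.WithPrefix())
    if err != nil {
        return err
    }
    for _, kv := range resp.Kvs {
        if _, err := dst.Put(ctx, string(kv.Key), string(kv.Value)); err != nil {
            return err
        }
    }

    // Incremental synchronization: watch from the revision after the snapshot.
    rev := resp.Header.Revision
    for wresp := range src.Watch(ctx, prefix, clientv3.WithPrefix(), clientv3.WithRev(rev+1)) {
        if err := wresp.Err(); err != nil {
            return err
        }
        for _, ev := range wresp.Events {
            switch ev.Type {
            case clientv3.EventTypePut:
                if _, err := dst.Put(ctx, string(ev.Kv.Key), string(ev.Kv.Value)); err != nil {
                    return err
                }
            case clientv3.EventTypeDelete:
                if _, err := dst.Delete(ctx, string(ev.Kv.Key)); err != nil {
                    return err
                }
            }
        }
    }
    return ctx.Err()
}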
The advantages of this scheme are as follows:

The main etcd cluster keeps high read/write performance and is largely unaffected by cross-region network latency and quality fluctuations
Businesses that can tolerate short-term inconsistency can access the nearest etcd cluster
Businesses that require strong consistency can access the main etcd cluster over the intranet dedicated line
It does not depend on a newer etcd version
After introducing its advantages, let’s look at its disadvantages, as follows:

When write volume is high, the standby cluster may lag behind and stale data may be read.
If the community make-mirror synchronization link is interrupted, exiting and restarting falls back into full synchronization mode, whose performance is poor and cannot meet production needs.
The built-in make-mirror tool lacks features such as leader election, data consistency checking, logs, and metrics, and is not production-ready.
Synchronization of non-key-value data, such as auth-related data and leases, is not supported.

etcd community learner scheme

Having covered the etcd community's make-mirror scheme, let's look at the learner scheme it provides. Its schematic diagram is as follows:

Its core principles are as follows:

etcd's raft algorithm library has supported learner nodes since 2017; see PR 8751 for details.
etcd 3.4, released in August 2019, officially supports learner nodes: a learner joins the cluster as a non-voting member, does not take part in elections or other votes, and only replicates data.
After receiving a write request, the leader replicates the log to the follower and learner nodes and maintains their log replication progress in an in-memory data structure called progress.
When the gap between a learner and the leader is small enough, the learner can be promoted to a voting member of the cluster (a minimal sketch of adding and promoting a learner follows this list).
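Before its pros and cons, here is a minimal sketch of what adding and later promoting a learner can look like with the clientv3 cluster API (etcd >= 3.4); the endpoints and peer URL below are placeholders.

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"http://etcd-main:2379"}, // placeholder endpoint
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        log.Fatal(err)
    }
    defer cli.Close()

    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    // Add the remote-region node as a non-voting learner.
    addResp, err := cli.MemberAddAsLearner(ctx, []string{"http://etcd-remote:2380"}) // placeholder peer URL
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("added learner, member ID %x\n", addResp.Member.ID)

    // Once the learner has caught up with the leader, it can be promoted to a
    // voting member (the promotion is rejected if the learner lags too far).
    if _, err := cli.MemberPromote(ctx, addResp.Member.ID); err != nil {
        log.Fatal(err)
    }
}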
The advantages of this scheme are as follows:

Once the regional networks are interconnected, deployment is simple: just add a learner node to the etcd cluster, with no additional components to operate
A learner node can synchronize all types of data, such as key-value, auth, and lease data
After introducing its advantages, let’s look at its disadvantages, as follows:

A learner node only serves serializable reads, so a service reading from it nearby may read stale data.
It depends on a newer etcd version: the learner feature requires etcd 3.4 or above, and only one learner node is allowed.
If the primary cluster fails entirely, the learner node cannot be quickly promoted to a writable, independent etcd cluster.
Having reviewed the existing schemes, we found that none of them meets the demands of a business production environment, so we implemented a production-grade etcd synchronization service ourselves; it is introduced in detail in the overall implementation section below.

How do we ensure the stability and operability of the etcd and zk synchronization services?

To ensure the stability of the etcd and zk synchronization services, we simulate five common types of fault and test the services' self-healing ability in these typical fault scenarios. The detailed test plan is as follows.

Fault scenarios

Redis flash-off (a dependency of the zk2etcd service), e.g., a redis version upgrade or a non-smooth capacity expansion.

zk2etcd going offline, e.g., OOM, container eviction, host failure.

etcd2etcd going offline, e.g., OOM, container eviction, host failure.

Brief network interruption (network flash-off).

Weak network environment, e.g., temporarily switching to the public network after the dedicated line is cut.

The actual triggers of the above five scenarios are diverse; only one trigger per scenario needs to be simulated.

Drill plan

Redis flash-off: make redis unreachable by modifying the hosts file; automatic correction stops during this period. After redis is restored, automatic correction resumes on its own.

zk2etcd offline: kill the container to simulate zk2etcd crashing; Kubernetes pulls it back up within 15 seconds, after which synchronization is normal and the data is consistent.

etcd2etcd offline: kill the container to simulate etcd2etcd crashing; Kubernetes pulls it back up within 15 seconds, after which synchronization is normal and the data is consistent.

Network flash-off: modify the hosts file to make zk and etcd unreachable, interrupting synchronization; then remove the hosts entry to simulate network recovery. After recovery, synchronization is normal and the data is consistent.

Weak network environment: simulate a weak network by switching to the public network. After the switch, the drop in synchronization efficiency stays within 4x, and a full synchronization can still be completed within 1 minute.

As for operability, both the etcd and zk synchronization services provide detailed metrics and logs. We have configured dashboards and alerting policies for every core and abnormal scenario.

Overall scheme implementation

Overall architecture

The etcd cluster multi-active architecture is composed as follows:

Legend:

Black solid line: dedicated-line access under normal conditions

Black dotted line: access over the public network

Red solid line: dedicated-line access after the etcd cluster's active/standby switchover

Red dotted line: public-network access after the etcd cluster's active/standby switchover

The etcd2etcd / zk2etcd data synchronization services are shown in the following diagram:


Engineering practice of the zk synchronization service

The storage structures of zookeeper and etcd differ, which makes synchronization harder. Zookeeper stores a tree, while etcd v3 stores a flat keyspace. Zookeeper cannot list all keys by prefix the way etcd can, and etcd cannot list the children of a directory the way zookeeper can, which further increases the difficulty of synchronization.

How do we detect data changes in zookeeper? Unlike etcd, whose watch can easily pick up the addition of any key, zookeeper requires recursively watching all nodes. On receiving a ChildrenChanged event, we fetch all child nodes of the node the event refers to, compare them with the data in etcd to find the newly added data, and put it into etcd. Similarly, deletion events on all nodes are watched recursively and the corresponding data in etcd is deleted. A minimal sketch of such a recursive watch is shown below.
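A minimal sketch of such a recursive watch, using the github.com/go-zookeeper/zk client (node handling, error paths, and the deletion side are simplified; this illustrates the mechanism, not zk2etcd's actual code):

package zkwatch

import (
    "strings"
    "time"

    "github.com/go-zookeeper/zk"
)

// watchTree recursively watches a zookeeper subtree. Zookeeper watches are
// one-shot, so after every event the children are re-listed and the watch is
// re-armed. Newly seen children are reported through onAdd and watched too.
func watchTree(conn *zk.Conn, path string, onAdd func(fullPath string)) {
    known := map[string]bool{}
    for {
        children, _, events, err := conn.ChildrenW(path)
        if err != nil {
            time.Sleep(time.Second) // back off and retry on transient errors
            continue
        }
        for _, c := range children {
            full := strings.TrimSuffix(path, "/") + "/" + c
            if !known[full] {
                known[full] = true
                onAdd(full)                     // e.g. compare with etcd and put the new data
                go watchTree(conn, full, onAdd) // recurse into the new child
            }
        }
        <-events // block until the one-shot watch (e.g. ChildrenChanged) fires
    }
}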

In addition, zookeeper's watch has an inherent defect: watches are one-shot, so you must re-watch after receiving an event. In theory events can be lost between two watches, typically when the same key changes many times in quick succession. Lost events would break data consistency, so we introduced automatic diff and correction: we compute the differences between the data in zookeeper and etcd. Each correction performs two rounds of diff calculation, because when data changes frequently a single round often contains "pseudo differences" caused by synchronization that is not strongly consistent. Once the diff results are confirmed, the remaining differences are fixed automatically. A sketch of this double-check-then-fix idea follows.
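A sketch of the double-check-then-fix idea; the pause between rounds and the function signatures are illustrative assumptions, not zk2etcd's actual interfaces:

package zkfix

import "time"

// fixWithDoubleCheck runs two rounds of diff separated by a pause and only
// fixes keys that are missing in BOTH rounds, filtering out "pseudo
// differences" caused by writes that are still in flight between zk and etcd.
func fixWithDoubleCheck(diffOnce func() map[string]string, putToEtcd func(key, value string)) {
    first := diffOnce()         // round 1: keys present in zk but missing in etcd
    time.Sleep(5 * time.Second) // give in-flight synchronization time to catch up (assumed interval)
    second := diffOnce()        // round 2
    for k, v := range second {
        if _, ok := first[k]; ok { // present in both rounds => a real difference, fix it
            putToEtcd(k, v)
        }
    }
}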

How do we solve the problem of coexisting with etcd2etcd? Under the same path, both etcd2etcd and zk2etcd write data. zk2etcd's automatic correction logic computes differences and corrects them, but we do not want it to mistakenly delete data written by etcd2etcd. We solved this by having zk2etcd use redis to store its state: whenever zk2etcd writes data to or deletes data from etcd, it records or removes the same key in redis:

When zk2etcd then computes differences for automatic correction, it only considers data written by this tool, avoiding deletion of data written by other synchronization tools. A minimal sketch of this ownership tracking is shown below.
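A minimal sketch of this ownership tracking with go-redis and clientv3; the redis set name and helper functions are hypothetical placeholders, not zk2etcd's real schema:

package ownership

import (
    "context"

    "github.com/redis/go-redis/v9"
    clientv3 "go.etcd.io/etcd/client/v3"
)

// ownKeysSet is a hypothetical redis set remembering which etcd keys zk2etcd wrote.
const ownKeysSet = "zk2etcd:written-keys"

// putSynced writes a key to etcd and records ownership in redis.
func putSynced(ctx context.Context, rdb *redis.Client, etcd *clientv3.Client, k, v string) error {
    if _, err := etcd.Put(ctx, k, v); err != nil {
        return err
    }
    return rdb.SAdd(ctx, ownKeysSet, k).Err()
}

// deleteIfOwned removes a key from etcd only if zk2etcd wrote it, so data
// written by etcd2etcd under the same path is never deleted by mistake.
func deleteIfOwned(ctx context.Context, rdb *redis.Client, etcd *clientv3.Client, k string) error {
    owned, err := rdb.SIsMember(ctx, ownKeysSet, k).Result()
    if err != nil || !owned {
        return err
    }
    if _, err := etcd.Delete(ctx, k); err != nil {
        return err
    }
    return rdb.SRem(ctx, ownKeysSet, k).Err()
}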

etcd2etcd engineering practice

To solve the etcd synchronization problem, we investigated the following two schemes; their principles are introduced in detail below:

etcd-syncer mirror-plus version

First, the mirror-plus version of etcd-syncer. As the name suggests, it is an enhanced version of the etcd community's make-mirror. To address the various shortcomings of make-mirror, it implements the following features and advantages:

Breakpoint resumption: an interruption of the synchronization link no longer forces a full re-synchronization, so dedicated-line or public-network quality jitter is no longer a worry
High availability: instances responsible for replicating the same data path support multi-replica deployment; if one replica fails, another replica acquires the lock within about 5 seconds and quickly resumes based on the previous instance's synchronization progress (a leader-election sketch follows this list)
Supports consistency checking (full data check, snapshot check)
Supports concurrent replication by multiple instances to improve performance (different instances are responsible for different paths); in production it is recommended to configure multiple instances, each responsible for a different path
Good operability: one-click deployment based on a Kubernetes deployment, rich metrics and logs, and complete E2E test cases covering core scenarios (HTTP/HTTPS, abnormal service interruption, network exceptions, etc.)
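The multi-replica takeover described above can be pictured with etcd's own concurrency primitives: each replica competes for a lock whose session TTL is about 5 seconds, so when the holder dies another replica acquires it and resumes from the recorded progress. A minimal sketch; the lock prefix and TTL are illustrative assumptions, not etcd-syncer's actual implementation.

package ha

import (
    "context"

    clientv3 "go.etcd.io/etcd/client/v3"
    "go.etcd.io/etcd/client/v3/concurrency"
)

// runWithLock blocks until this replica holds the lock for the given sync
// path, then runs the sync loop. If the holder crashes, its 5s session lease
// expires and a standby replica's Lock call returns, taking over the task.
func runWithLock(ctx context.Context, cli *clientv3.Client, pathID string, sync func(ctx context.Context) error) error {
    sess, err := concurrency.NewSession(cli, concurrency.WithTTL(5)) // TTL in seconds (assumed)
    if err != nil {
        return err
    }
    defer sess.Close()

    mu := concurrency.NewMutex(sess, "/etcd-syncer/locks/"+pathID) // assumed lock prefix
    if err := mu.Lock(ctx); err != nil {
        return err
    }
    defer mu.Unlock(context.Background())

    return sync(ctx) // the real service would resume from persisted progress here
}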
So what are its disadvantages? Because its core principle still relies on etcd's MVCC + watch features, it cannot guarantee strong data consistency and it only synchronizes key-value data:

Breakpoint resumption depends on how long MVCC historical revisions are retained; ideally the business should retain at least 1 hour of history.
When write volume is high, the standby cluster may lag behind and stale data may be read.
Synchronization of non-key-value data, such as auth-related data and leases, is not supported.

etcd-syncer raft version

To synchronize all types of data and remove the dependence on etcd MVCC historical data, Tencent Cloud also provides the raft version of etcd-syncer, a scheme that synchronizes based on the raft log.

Its deployment diagram is shown below: the etcd-syncer synchronization service joins the main etcd cluster as a learner node.

The main etcd cluster's leader replicates raft log data to etcd-syncer through MsgApp, snapshot, and other messages. etcd-syncer parses the raft log and applies the txn / delete / auth and other requests corresponding to the raft log entries to the destination etcd cluster. (A heavily simplified sketch of this parse-and-apply step is shown below.)
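A heavily simplified sketch of that parse-and-apply step, assuming raft log entries are already being received; only Put and DeleteRange are handled here, whereas the real etcd-syncer also handles txn, auth, lease, snapshots, and idempotency:

package raftsync

import (
    "context"

    "go.etcd.io/etcd/api/v3/etcdserverpb"
    clientv3 "go.etcd.io/etcd/client/v3"
    "go.etcd.io/etcd/raft/v3/raftpb"
)

// applyEntry decodes one normal raft log entry from the source cluster and
// replays the corresponding request on the destination cluster.
func applyEntry(ctx context.Context, dst *clientv3.Client, ent raftpb.Entry) error {
    if ent.Type != raftpb.EntryNormal || len(ent.Data) == 0 {
        return nil // config changes and empty entries are ignored in this sketch
    }
    var req etcdserverpb.InternalRaftRequest
    if err := req.Unmarshal(ent.Data); err != nil {
        return err
    }
    switch {
    case req.Put != nil:
        _, err := dst.Put(ctx, string(req.Put.Key), string(req.Put.Value))
        return err
    case req.DeleteRange != nil:
        _, err := dst.Delete(ctx, string(req.DeleteRange.Key))
        return err
    }
    return nil // other request types (txn, auth, lease, ...) omitted here
}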

It has the following advantages:

It has all the features and advantages of the mirror-plus version of etcd-syncer, without relying on etcd MVCC historical data.

Because it synchronizes data from etcd's underlying raft log, it can synchronize all data types, such as key-value, auth, and lease data.

It does not depend on a newer etcd version.

Complete disaster-recovery testing.

grpc-proxy

This scheme introduces the grpc-proxy service, which we were also using for the first time. To understand the performance of this proxy, we used the benchmark tool provided by etcd for the read and write tests and wrote a small tool ourselves for the watch test. Some of the test details follow.

Write test

Direct access via the etcd service's load-balancing entry point

Access to the etcd service through grpc-proxy

grpc-proxy can write normally whether its endpoints are configured with the dedicated line or the public network
For a fixed total number of keys written, the more connections and clients, the lower the total time
The larger the total number of keys written, the higher the average latency of a single write, but it remains at the millisecond level
When 100,000 keys are written at once, connecting directly to etcdserver produces "too many requests" errors, whereas grpc-proxy does not
Public-network performance is lower than dedicated-line performance
The average latency through grpc-proxy is higher than with a direct connection, but it meets our needs (a minimal latency-comparison sketch follows this list)
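For reference, a minimal sketch of how such a latency comparison can be made in Go; this is not etcd's official benchmark tool, and the addresses are placeholders.

package main

import (
    "context"
    "fmt"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
)

// avgPutLatency issues n sequential puts against one endpoint and returns the
// average latency. Run it once against the etcd load-balancer address and once
// against the grpc-proxy address to compare.
func avgPutLatency(endpoint string, n int) (time.Duration, error) {
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{endpoint},
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        return 0, err
    }
    defer cli.Close()

    start := time.Now()
    for i := 0; i < n; i++ {
        if _, err := cli.Put(context.Background(), fmt.Sprintf("/bench/key-%d", i), "value"); err != nil {
            return 0, err
        }
    }
    return time.Since(start) / time.Duration(n), nil
}

func main() {
    direct, _ := avgPutLatency("http://etcd-lb:2379", 1000)          // placeholder address
    proxied, _ := avgPutLatency("http://grpc-proxy-addr:23791", 1000) // placeholder address
    fmt.Println("direct:", direct, "via grpc-proxy:", proxied)
}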
Read test

Direct access via the etcd service's load-balancing entry point

Access to the etcd service through grpc-proxy

grpc-proxy can read normally whether its endpoints are configured with the dedicated line or the public network

The average latency through grpc-proxy is higher than with a direct connection, but within an acceptable range

Watch test

Using an etcdwatcher tool we wrote ourselves, we can test watch behavior through grpc-proxy: it lets us set the total number of watchers, the update interval, and the test duration, and it prints a summary at the end

./etcdwatch -num=100 -span=500 -duration=10 -endpoint=http://grpc-proxy-addr:23791
test done
total 100 task
0 task failed
current revision is 631490
least revision is 631490
0 task is not synced

Parameter description:

num: number of tasks

span: update interval, in milliseconds

duration: total test duration, in seconds

current revision: the revision of the latest write

least revision: the revision of the slowest-synchronizing task among the num tasks

failed being 0 indicates everything is normal; if "task not synced" appears, the watches have fallen behind the puts

From the above results, the failed count is 0 and the watch test passes. A minimal sketch of such a watcher tool is shown below.
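For reference, a minimal sketch of what such an etcdwatcher tool could look like: N watchers follow one test key while a writer updates it at a fixed interval, and at the end each watcher's last observed revision is compared with the last written revision. Flag parsing, the real tool's -num/-span/-duration options, and its report format are omitted; the endpoint and key are placeholders.

package main

import (
    "context"
    "fmt"
    "sync"
    "sync/atomic"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
    const (
        numWatchers = 100
        span        = 500 * time.Millisecond
        duration    = 10 * time.Second
        key         = "/etcdwatch/test" // placeholder test key
    )

    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"http://grpc-proxy-addr:23791"}, // the proxy endpoint under test
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        panic(err)
    }
    defer cli.Close()

    ctx, cancel := context.WithTimeout(context.Background(), duration)
    defer cancel()

    lastSeen := make([]int64, numWatchers)
    var wg sync.WaitGroup
    for i := 0; i < numWatchers; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            for resp := range cli.Watch(ctx, key) {
                // Remember the newest revision this watcher has observed.
                atomic.StoreInt64(&lastSeen[i], resp.Header.Revision)
            }
        }(i)
    }

    // Writer: update the key at the configured interval until the test ends.
    ticker := time.NewTicker(span)
    defer ticker.Stop()
    var lastWritten int64
writeLoop:
    for {
        select {
        case <-ctx.Done():
            break writeLoop
        case <-ticker.C:
            if resp, err := cli.Put(context.Background(), key, time.Now().String()); err == nil {
                lastWritten = resp.Header.Revision
            }
        }
    }
    wg.Wait()

    notSynced := 0
    for i := range lastSeen {
        if atomic.LoadInt64(&lastSeen[i]) < lastWritten {
            notSynced++
        }
    }
    fmt.Printf("total %d task\n", numWatchers)
    fmt.Printf("current revision is %d\n", lastWritten)
    fmt.Printf("%d task is not synced\n", notSynced)
}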

zk2etcd

We use version 1.2.5, deployed via a Kubernetes deployment

Simulating zk server loss of contact

Scenario
Make zk unresolvable by injecting an incorrect resolution address into the hosts file

Observations
No zk lost-connection error logs were found during this period
No anomalies were found in the monitoring metrics
After the restart, the number of fix operations did not increase noticeably (version 1.2.4 had a bug where full sync ran periodically but failed to notice keys that needed fixing, so restarting the zk2etcd service instance could cause the number of fix operations to jump significantly)

Simulating redis loss of contact

Simulated operations
09:56:49 inject an incorrect redis resolution address into the hosts file
10:07:34 restore redis
10:16:00 restart the synchronization service pod (the restart is to observe whether full sync works normally)

Observations
During this period the number of fix operations did not increase, and no obvious anomalies were found in the other monitoring metrics
After the instance restarted, there was no spike in the number of fix operations

Simulating etcd loss of contact

Simulated operations
16:15:10 etcd server loses contact

16:30 recovery

16:45 restart the pod

Observations
During this period the number of fix operations did not increase, and no obvious anomalies were found in the other monitoring metrics

After the restart, the number of fix operations increased (it is unclear whether full sync had not taken effect or whether something was updated and repaired after the restart)

Summary

As long as the full sync mechanism works normally, every abnormal scenario recovers once the next full sync is triggered

The minimum recovery time depends on the configured full sync interval (5m by default); businesses can tune this parameter according to how long an inconsistency they can tolerate

In addition, to guard against the case where full sync keeps running periodically after an incident but fails to notice the problem, the safest approach is to restart the zk2etcd service promptly after the incident

In an additional test with etcd accessed over the public network, the time taken to complete a full sync of zk and etcd increased somewhat, at the level of seconds, compared with the intranet

etcd2etcd

For the etcd2etcd synchronization service, we use a Kubernetes deployment with two replicas

Multi-replica backup capability

Expectation
After the working node fails, the standby node takes over the synchronization task within 5 seconds

Test plan
Deploy etcd-syncer with two instances

Kill the working node and observe

Conclusion
Active/standby switchover works normally during both incremental and full synchronization (note that if the switchover happens during a full sync, it continues as an incremental sync, which may make the comparison slow)

Breakpoint resumption capability

Expectation
After recovering from a fault, synchronization can resume from the breakpoint

In fact, in part 1 above, after the standby node was promoted and took over the synchronization work, fast_path changing to 1 already demonstrates the breakpoint-resumption capability. We also added several extra verification scenarios:

(a) Short-duration fault

Fault scenario

While synchronizing from the central etcd cluster to the hot-standby cluster, the -etcd-syncer-meta- key also exists in the source (central) etcd cluster, which triggers a synchronization service error (the same key cannot appear twice in one txn), resulting in data differences

Observations

Add a filter for -etcd-syncer-meta- to the synchronization service's runtime parameters; after the data catches up for a while, the final miss count drops back and levels off

(b) Long-duration fault

Fault scenario

Stop the synchronization service's deployment

Wait until the data of the two etcd clusters has diverged and a compaction has occurred, then start the synchronization service again

Observations

After the data divergence and the compaction, restart the synchronization service. The logs show that full synchronization is triggered because of the compaction

Synchronization service monitoring metrics: (a) dst miss keys soon drop back; (b) src miss keys increase and stay flat

Analysis

While the synchronization service was stopped, the number of keys in the source etcd changed a lot; the monitoring chart shows a decrease during this period, meaning some keys were deleted

This also exposes a small problem: src miss keys currently cannot be repaired automatically and require manual intervention to clean up the redundant keys

Reset triggering full synchronization
When synchronization has a large discrepancy (such as dst miss) and needs emergency repair, configure the --reset-last-synced-rev parameter to delete the breakpoint-resumption information and trigger a full synchronization that repairs the differences

Observations
Due to some anomaly, dst miss occurred during synchronization (the yellow line in the figure). To repair it, a new instance was run with the --reset-last-synced-rev parameter added

Analysis

slow_path is 1, indicating that full synchronization was triggered (the green line in the figure)

The dst miss count of the green-line instance did not increase, indicating that consistency was reached

Network failure
The dedicated line between the two etcd clusters is interrupted:

while incremental synchronization is in progress

while full synchronization is in progress

Test plan

When the dedicated line is interrupted and we switch to the public network, the etcd cluster access addresses in the runtime parameters must be modified, which means a restart occurs (the restart scenario has already been covered above and is not repeated here)

Summary

The etcd-syncer synchronization service has a sound active/standby mechanism that switches over promptly and effectively

Breakpoint resumption after short-duration faults meets expectations; for long-duration faults combined with complex compaction situations, src miss may appear after synchronization resumes and may require manual intervention

Configuring the --reset-last-synced-rev parameter works well for repairing src miss anomalies

About us

For more cloud-native cases and knowledge, follow the [Tencent Cloud Native] WeChat official account of the same name.

Benefit: reply "manual" in the official account's backend to get the "Tencent Cloud Native Roadmap Manual" and "Tencent Cloud Native Best Practices".
