How to achieve smooth scale-out and scale-in of DataServer in the Ant Financial service registry

Time:2020-8-2

SOFAStack (Scalable Open Financial Architecture Stack) is a financial-grade cloud native architecture developed independently by Ant Financial. It contains the components needed to build a financial-grade cloud native architecture and reflects best practices tempered in financial scenarios.

SOFARegistry is the service registry open sourced by Ant Financial. It is a highly available service registration center that can handle registration and subscription for massive numbers of services. It originated from the first version of Taobao's ConfigServer and has evolved to its fifth generation over the past ten years, driven by the business growth of Alipay / Ant Financial.

This article is the last in the series analyzing the SOFARegistry framework. The author is 404p (huamingyantu). The series is produced by the SOFA team and source code enthusiasts under the project name <SOFA:RegistryLab/>. The earlier articles in the series are listed at the end of this article.

GitHub address: https://github.com/sofastack/sofa-registry

Preface

In a microservice architecture, the service registry solves the problem of service discovery among microservices. When the number of services is small, each machine in the registry cluster keeps a full copy of the service data. With the massive number of services at Ant Financial, however, a single machine can no longer store all the service data, and data fragmentation becomes unavoidable. After fragmentation, each machine stores only part of the service data, so nodes going online or offline can easily cause data fluctuations that affect the normal operation of applications. This article introduces the fragmentation algorithm of SOFARegistry and the related core source code to show how Ant Financial solves these problems.

Introduction to service registry

In a microservice architecture, many services call each other behind the server side of an Internet application. For example, service A depends on service B on a call path, so service A needs to know the address of service B to complete the call. In a distributed architecture, each service is usually deployed as a cluster, and the machines in the cluster change constantly, so the address of service B is not fixed. To keep the service reliable, the caller needs to sense address changes of the service it calls.

Figure 1 service addressing in microservice architecture

Since tens of thousands of service callers all need to sense such changes, this capability has been factored out into a fixed architectural pattern in microservices: the service registry.

Figure 2 service registry

A service registry has two important roles: service provider and service consumer. The service caller is the consumer and the called service is the provider. A single machine often plays both roles, since it is called by other services and also calls other services. The provider publishes the information about the services it offers to the registry, and the consumer subscribes to learn whether the information about the services it depends on has changed.

Overall architecture of SOFARegistry

The architecture of SOFARegistry includes four roles: Client, Session, Data, and Meta, as shown in Figure 3.

Figure 3 overall architecture of sofa registry

  • Client layer

Application server cluster. The Client layer is the application layer. Each application system uses the publish and subscribe capabilities of the service registry by depending on the registry's client JAR package.

  • Session layer

Session server cluster. As the name suggests, the Session layer maintains sessions: it keeps long-lived connections to the application servers of the Client layer and receives their publish and subscribe requests. This layer stores only the publish and subscribe relationships of each service in memory, and simply relays the actual service information between the Client layer and the Data layer. The Session layer is stateless and can be scaled out as the Client layer grows.

  • Data layer

Data server cluster. The Data layer stores the applications' service registration data in partitions. Data is hashed by dataInfoId (the unique identifier of each piece of service data) and backed up in multiple replicas to ensure high availability. The rest of this article focuses on how the Data layer scales out and in smoothly as the data volume grows, without affecting the business.

  • Meta layer

Metadata server cluster. This cluster manages the server information of the Session server cluster and the Data server cluster, so its role is equivalent to a service registry inside the SOFARegistry architecture. The difference is that SOFARegistry, as a service registry, serves the broad application layer, while the Meta cluster serves the Session and Data clusters inside SOFARegistry. The Meta layer senses changes in Session nodes and Data nodes and notifies the other nodes in the cluster.

How to break through the bottleneck of single machine storage

At the business scale of Ant Financial, a single server can no longer store all the service registration data. SOFARegistry therefore adopts data fragmentation: each machine keeps only part of the data, and each fragment has multiple backup copies, so capacity can in theory be expanded without limit. Depending on how data is routed, common fragmentation schemes fall into two categories: range fragmentation and hash fragmentation.

Figure 4 data fragmentation

  • Range fragmentation

Each data fragment stores the values of one key range. For example, data can be partitioned by time period, with the keys of each hour placed on the corresponding node. The advantage of range fragmentation is that the data in a fragment is contiguous, which makes range queries possible. The disadvantage is that data is not scattered randomly, so it is prone to hot spots.

  • Hash fragmentation

Hash fragmentation distributes data randomly and evenly across nodes through a specific hash function. It does not support range queries, only point queries, that is, fetching the content of a data item by its key. Most KV (key-value) storage systems in the industry support this mode, including Cassandra, Dynamo, and Membase. The common hash fragmentation algorithms are hash modulus, consistent hashing, and virtual buckets.

Hash modulus

The hash function of the hash modulus method is as follows:

H(key) = hash(key) mod K

This is a key-to-machine function. key is the primary key of the data and K is the number of physical machines, so the key of a data item is routed directly to a physical machine. When K changes, the whole data distribution is affected: data on almost all nodes has to be redistributed, which is hard to complete smoothly without the system noticing.
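To make the cost of changing K concrete, here is a minimal sketch. The class and method names (HashModuloDemo, route, countMoved) are hypothetical and not part of SOFARegistry, and String.hashCode() merely stands in for whatever hash function a real system would use.

```java
final class HashModuloDemo {
    // H(key) = hash(key) mod K, using floorMod so negative hashes still map to 0..k-1.
    static int route(int hash, int k) {
        return Math.floorMod(hash, k);
    }

    // Convenience overload; String.hashCode() is only a stand-in hash (assumption).
    static int route(String key, int k) {
        return route(key.hashCode(), k);
    }

    // Counts how many hash values are routed to a different machine
    // when the cluster size changes from oldK to newK.
    static int countMoved(int[] hashes, int oldK, int newK) {
        int moved = 0;
        for (int hash : hashes) {
            if (route(hash, oldK) != route(hash, newK)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int[] hashes = new int[20];
        for (int i = 0; i < hashes.length; i++) {
            hashes[i] = i;
        }
        // Growing from 4 to 5 machines: prints "16 of 20 keys moved"
        System.out.println(countMoved(hashes, 4, 5) + " of " + hashes.length + " keys moved");
    }
}
```

With hash values 0 to 19, only the four values 0 to 3 keep their machine when K goes from 4 to 5, which is why this method redistributes most of the data on every cluster change.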

Figure 5 hash modulus

Consistent Hashing

Distributed hash table (DHT) is a common technology in P2P network and distributed storage. It is a distributed extension of hash table, that is, under the premise of each machine storing part of the data, how to route the data by hash. Its core is that each node not only keeps a part of the data, but also maintains a part of the routing, so as to realize decentralized distributed addressing and storage of P2P network nodes. DHT is a technical concept. One of the most common implementation methods in the industry is the Chord algorithm implementation of consistent hashing.

  • Hash space

The hash space in the consistent hash is a logical ring space shared by data and nodes. Data and machines can get their respective positions in the hash space through their respective hash algorithms.

Figure 6 data item and data node share hash space

Figure 7 shows a hash space with a binary length of 5. The space can represent the values 0 to 31 (2^5 = 32 values) and is a circular sequence whose end joins its beginning. The large circles on the ring represent machine nodes (generally virtual nodes). A node is denoted $N_i$, where $i$ is the node's position in the hash space. For example, if hashing a node's IP address and port number yields 7, then N7 denotes that node's position in the hash space. Because physical machines differ in configuration, a more powerful physical machine is usually virtualized into multiple nodes on the ring.

Figure 7 hash space of length 5

The nodes on the ring divide the hash space into several intervals, and each node stores the data of one interval. For example, node N14 stores data with hash values from 8 to 14, and node N7 stores data with hash value 31 and hash values 0 to 7. The small circles on the ring represent the actual data to be stored. Once a data item's position on the ring has been computed by hashing, the item looks clockwise for the nearest node and is stored on that node. For example, a data item whose hash is 16 is stored on node N18. In this way, data is distributed across the nodes of the cluster, which implements data fragmentation.
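The clockwise lookup described above can be sketched with a sorted map. This is an illustration only: the class HashRing is hypothetical, and the node positions {7, 14, 18, 23, 30} are an assumption chosen to match the intervals quoted in the text, since the figure itself is not reproduced here.

```java
import java.util.TreeMap;

final class HashRing {
    // Ring position -> node name, kept sorted so the clockwise successor is cheap to find.
    private final TreeMap<Integer, String> nodes = new TreeMap<>();

    HashRing(int... positions) {
        for (int position : positions) {
            nodes.put(position, "N" + position);
        }
    }

    // Returns the first node at or after the hash position, wrapping to the
    // smallest position when the hash lies past the last node on the ring.
    String locate(int hash) {
        if (nodes.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Integer position = nodes.ceilingKey(hash);
        return nodes.get(position != null ? position : nodes.firstKey());
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing(7, 14, 18, 23, 30); // assumed node set
        System.out.println(ring.locate(16)); // prints "N18"
        System.out.println(ring.locate(31)); // prints "N7" (wraps around)
    }
}
```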

  • Node offline

As shown in Figure 8, node N18 fails and is removed. The hash ring interval of N18 is then taken over by the next node clockwise, N23, whose storage interval expands from 19-23 to 15-23. After N18 goes offline, the data item with hash value 16 is stored on N23.

Figure 8 nodes offline in a consistent hash ring

  • Node Online

As shown in Figure 9, suppose a new node goes online in the cluster and the hash of its IP and port is 17, so the node is named N17. The hash range of N17 is then 15-17, and that of N23 shrinks to 18-23. After N17 goes online, the data item with hash value 16 is stored on N17.

Figure 9 nodes online in consistent hash ring

When nodes change dynamically, consistent hashing keeps the data balanced and avoids global rehashing and global data synchronization. However, the data ranges of the two adjacent nodes still change, which complicates data synchronization. Synchronization is generally implemented through an operation log, and with consistent hashing the operation log is usually tied to the data distribution. When the distribution is unstable, the location of the operation log moves as machines go online and offline, so accurate synchronization is hard to achieve. For example, the hash ring in the figures above has the values 0 to 31. If log files are named after these hash values, the file data-16.log is initially on node N18. After N18 goes offline, N23 also has a data-16.log. After N17 goes online, N17 also has a data-16.log. We therefore need a mechanism that keeps the location of the operation log unaffected by dynamic node changes.
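The offline and online steps above can be traced with a small sketch that lists which hash values change owner when the ring membership changes. The names (RingChurnDemo, owner, reassigned) are hypothetical, and the node positions are the same assumed set as in the text's examples plus a node at 30.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class RingChurnDemo {
    // Owner of a hash position: first node clockwise, wrapping around the ring.
    static int owner(TreeSet<Integer> ring, int hash) {
        Integer node = ring.ceiling(hash);
        return node != null ? node : ring.first();
    }

    // Hash values in [0, spaceSize) whose owner differs between the two rings.
    static List<Integer> reassigned(TreeSet<Integer> before, TreeSet<Integer> after, int spaceSize) {
        List<Integer> moved = new ArrayList<>();
        for (int hash = 0; hash < spaceSize; hash++) {
            if (owner(before, hash) != owner(after, hash)) {
                moved.add(hash);
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        TreeSet<Integer> initial = new TreeSet<>(Arrays.asList(7, 14, 18, 23, 30)); // assumed node set
        TreeSet<Integer> afterOffline = new TreeSet<>(initial);
        afterOffline.remove(18); // N18 goes offline
        TreeSet<Integer> afterOnline = new TreeSet<>(afterOffline);
        afterOnline.add(17);     // N17 goes online

        System.out.println(reassigned(initial, afterOffline, 32));     // prints "[15, 16, 17, 18]"
        System.out.println(reassigned(afterOffline, afterOnline, 32)); // prints "[15, 16, 17]"
    }
}
```

Only one interval moves per change, but hash value 16 is owned by N18, then N23, then N17, which is exactly the drift of data-16.log described above.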

Virtual bucket pre slicing

The virtual bucket method decouples the key-to-node mapping by introducing a layer of virtual buckets (slots) between data items and nodes. As shown in Figure 10, data routing takes two steps. First, the key is hashed to find the slot of the data item. Then, the slot-to-node mapping determines which node the data item is stored on. The number of slots is fixed, so the hash mapping between keys and slots does not change when nodes change, and the operation log of the data is also kept per slot, which makes data synchronization feasible.

Figure 10 virtual bucket pre slicing mechanism

The mapping between all slots and all nodes is stored in a routing table, and slots are spread across nodes as evenly as possible. When nodes change dynamically, only the entries in the routing table that relate slots to the changed nodes need to be modified. This both supports elastic scale-out and scale-in and reduces the difficulty of data synchronization.
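The two-step routing can be sketched as follows. SlotRouter and its methods are hypothetical names for illustration, the round-robin initial assignment is an assumption, and String.hashCode() again stands in for a real hash function.

```java
import java.util.Arrays;
import java.util.List;

final class SlotRouter {
    // Routing table: index is the slot number, value is the node that owns the slot.
    private final String[] slotToNode;

    // Spreads a fixed number of slots over the initial nodes round-robin (assumed policy).
    SlotRouter(int slotCount, List<String> nodes) {
        slotToNode = new String[slotCount];
        for (int slot = 0; slot < slotCount; slot++) {
            slotToNode[slot] = nodes.get(slot % nodes.size());
        }
    }

    // Step 1: key -> slot. Depends only on the key and the fixed slot count.
    int slotOf(String key) {
        return Math.floorMod(key.hashCode(), slotToNode.length);
    }

    // Step 2: slot -> node, looked up in the routing table.
    String nodeOf(String key) {
        return slotToNode[slotOf(key)];
    }

    // Moves the given slots to a node; only the routing table changes.
    void reassign(String node, int... slots) {
        for (int slot : slots) {
            slotToNode[slot] = node;
        }
    }

    public static void main(String[] args) {
        SlotRouter router = new SlotRouter(8, Arrays.asList("data1", "data2"));
        String key = "com.example.DemoService"; // hypothetical key
        int slot = router.slotOf(key);
        router.reassign("data3", slot);         // a new node takes over this slot
        System.out.println(router.slotOf(key) == slot); // prints "true"
        System.out.println(router.nodeOf(key));         // prints "data3"
    }
}
```

Because the key's slot never changes, a per-slot operation log stays addressable by the same slot number no matter which node currently holds it.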

Fragmentation choice of SOFARegistry

Comparing consistent hash fragmentation with virtual bucket fragmentation, we can summarize the differences. Consistent hashing is better suited to distributed cache scenarios such as Memcached, which focus on balanced data distribution, avoiding hot spots, and cache acceleration, and do not guarantee high data reliability. Virtual buckets are better suited to scenarios that achieve high reliability through multiple data replicas, such as Tair and Cassandra.

SOFARegistry is clearly a better fit for virtual buckets, because a service registry has high reliability requirements for its data. For historical reasons, however, SOFARegistry first chose consistent hash fragmentation, so it also ran into the synchronization problem caused by unstable data distribution. How did we solve it? We record operation logs in DataServer memory at dataInfoId granularity, and synchronize data between DataServers at dataInfoId granularity as well (a service is uniquely identified by a dataInfoId). This logging approach follows the same idea as virtual buckets, with each dataInfoId acting as a slot; it is a compromise adopted for historical reasons. In a service registry, a dataInfoId usually corresponds to one published service, so the total number is still fairly limited. At Ant Financial's current scale, each DataServer carries only tens of thousands of dataInfoIds, which is just enough to make the multi-replica synchronization scheme with dataInfoId as slot workable.

DataServer scaling related source code

Note: this source code walkthrough is based on version 5.3.0 of registry-server-data.

The core startup class of DataServer is DataServerBootstrap. It mainly contains three kinds of components: the bolt communication components between nodes, the event communication component within the JVM, and the timer components.

Figure 11 core components of dataserver bootstrap

  • External node communication components: this class holds three server communication objects for communicating with external nodes. httpServer mainly provides a set of HTTP interfaces for dashboard management, data queries, and so on; dataSyncServer mainly handles data synchronization related services; dataServer handles data related services. Judging from the registered handlers, the responsibilities of dataSyncServer and dataServer overlap.
  • Internal JVM communication component: the internal logic of DataServer is mainly implemented with an event-driven mechanism. Figure 12 lists the interaction flow of some events in the event center. As the figure shows, an event often has several posting sources, which makes an event center a good fit for decoupling event posting from event handling.
  • Timer components: for example, periodically checking node information and periodically checking data version information.

Figure 12 core event flow in dataserver

Data server node expansion

Suppose that, as the business grows, the data cluster needs to be scaled out with new data nodes. As shown in Figure 13, Data4 is a new data node. When Data4 starts, it is in the initialization state. In this state, write operations to Data4 are rejected and read operations are forwarded to other nodes. Meanwhile, the data on the existing nodes that belongs to the new node is pulled over by the new node and its replica nodes.

Figure 13: data server node expansion scenario

  • Forward read operation

Before the data synchronization is completed, all data reading operations on the new node will be forwarded to the data node that owns the data fragment.

GetDataHandler, the processor for service data queries:

public Object doHandle(Channel channel, GetDataRequest request) {
    String dataInfoId = request.getDataInfoId();
    if (forwardService.needForward()) {
        // ... the local node is not in the working state, so the read must be forwarded
        return forwardService.forwardRequest(dataInfoId, request);
    }
    // ... otherwise the request is served locally (omitted in this excerpt)
}

ForwardServiceImpl, the forwarding service:

public Object forwardRequest(String dataInfoId, Object request) throws RemotingException {
    // 1. get store nodes: the nodes that hold this dataInfoId in the local data center
    List<DataServerNode> dataServerNodes = DataServerNodeFactory
        .computeDataServerNodes(dataServerConfig.getLocalDataCenter(), dataInfoId,
                                dataServerConfig.getStoreNodes());

    // 2. find next node: the node that follows the local IP in the store node list
    boolean next = false;
    String localIp = NetUtil.getLocalAddress().getHostAddress();
    DataServerNode nextNode = null;
    for (DataServerNode dataServerNode : dataServerNodes) {
        if (next) {
            nextNode = dataServerNode;
            break;
        }
        if (null != localIp && localIp.equals(dataServerNode.getIp())) {
            next = true;
        }
    }

    // 3. invoke and return result
}

Forwarding a read operation takes three steps. First, the list of nodes that hold the requested data item is computed from the data center of the current machine (each data center has its own hash space), the dataInfoId, and the number of data replicas (3 by default). Second, the node that follows the local IP address in this list is selected as the target. Finally, the read request is forwarded to the target node, and the data item that was read is returned to the session node.
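The second step, picking the forward target, can be isolated into a small runnable sketch. ForwardTargetSketch and nextAfter are hypothetical names, nodes are reduced to plain IP strings, and returning an empty Optional when no successor exists is an assumption of this sketch, not a statement about how the original code handles that case.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class ForwardTargetSketch {
    // Returns the entry that directly follows localIp in the ordered store node list.
    // Empty when localIp is missing from the list or is its last entry.
    static Optional<String> nextAfter(List<String> storeNodeIps, String localIp) {
        boolean next = false;
        for (String ip : storeNodeIps) {
            if (next) {
                return Optional.of(ip);
            }
            if (ip.equals(localIp)) {
                next = true;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical addresses: the new node comes first, followed by its replicas.
        List<String> storeNodes = Arrays.asList("10.0.0.4", "10.0.0.1", "10.0.0.2");
        System.out.println(nextAfter(storeNodes, "10.0.0.4").orElse("none")); // prints "10.0.0.1"
    }
}
```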

Figure 14 read request of data server node during capacity expansion

  • Disable write operations

Until data synchronization is complete, writes to the new node are rejected so that new writes cannot introduce inconsistencies while the data is being synchronized.

PublishDataHandler, the processor for service publishing:

public Object doHandle(Channel channel, PublishDataRequest request) {
    if (forwardService.needForward()) {
        // ... the local node is not in the working state, so the write is refused
        response.setSuccess(false);
        response.setMessage("Request refused, Server status is not working");
        return response;
    }
}

Figure 15: write requests for data server node expansion

Data server node scale-in

Take Figure 16 as an example. The read and write requests for the data item with key 12 fall on node N14. When N14 receives a write request, it also synchronizes the data to its successor nodes N17 and N23 (assuming the replica count is 3). When N14 goes offline, MetaServer detects the lost connection to N14, removes N14, and pushes a NodeChangeResult request to every node. On receiving the request, each data node updates its local node information and recalculates the ring space. From then on, the read and write requests that used to fall on N14 fall on N17. Because N17 already holds a replica of the data that was on N14, the switch is smooth.
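The reason the switch is smooth can be checked with a short sketch that computes the replica set of a hash value before and after N14 leaves. ReplicaRing and storeNodes are hypothetical names, and the node positions {7, 14, 17, 23, 30} are assumed for illustration because the figure is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class ReplicaRing {
    // Primary owner of the hash plus its clockwise successors, up to `replicas` distinct nodes.
    static List<Integer> storeNodes(TreeSet<Integer> ring, int hash, int replicas) {
        List<Integer> result = new ArrayList<>();
        if (ring.isEmpty()) {
            return result;
        }
        Integer current = ring.ceiling(hash);
        if (current == null) {
            current = ring.first(); // wrap around the ring
        }
        int wanted = Math.min(replicas, ring.size());
        while (result.size() < wanted) {
            result.add(current);
            current = ring.higher(current);
            if (current == null) {
                current = ring.first();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<Integer> ring = new TreeSet<>(Arrays.asList(7, 14, 17, 23, 30)); // assumed node set
        System.out.println(storeNodes(ring, 12, 3)); // prints "[14, 17, 23]"
        ring.remove(14);                             // N14 goes offline
        System.out.println(storeNodes(ring, 12, 3)); // prints "[17, 23, 30]"
    }
}
```

The new primary for key 12, N17, was already in the old replica set, so it can serve requests immediately; only the newly added third replica has to catch up.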

Figure 16: smooth switching when the data server node is downsized

Data synchronization when node changes

MetaServer senses nodes going online or offline through their network connections. Every DataServer runs a task called ConnectionRefreshTask, which periodically polls MetaServer for the data node information. Note that besides DataServer actively pulling node information from MetaServer, MetaServer also actively sends NodeChangeResult requests to each node to announce node changes. Pushing and pulling lead to the same final result.

When the polled information shows that the data nodes have changed, a DataServerChangeEvent is posted to the EventCenter. If the handler of that event determines that the node information of the current data center has changed, it posts a new event, LocalDataServerChangeEvent. The handler of that event, LocalDataServerChangeEventHandler, checks whether the current node is a newly added node; if it is, it sends a NotifyOnlineRequest to the other nodes, as shown in Figure 17.

Figure 17 logic of new node when dataserver node goes online

LocalDataServerChangeEventHandler, the handler for node changes in the local data center:

public class LocalDataServerChangeEventHandler {
    // Same cluster data synchronizer
    private class LocalClusterDataSyncer implements Runnable {
        public void run() {
            if (LocalServerStatusEnum.WORKING == dataNodeStatus.getStatus()) {
                // if local server is working, compare sync data
                notifyToFetch(event, changeVersion);
            } else {
                dataServerCache.checkAndUpdateStatus(changeVersion);
                // if local server is not working, notify others that i am newer
                notifyOnline(changeVersion);
            }
        }
    }
}

Figure 17 shows how a newly added node handles a node change message. When a node that is already running online receives the node change message, the first part of the flow is the same. The difference is that LocalDataServerChangeEventHandler uses the hash ring to calculate the data fragment range of the changed node and its backup nodes (in the scale-out scenario, the changed node is the new node; in the scale-in scenario, the changed node is the successor of the offline node on the hash ring).

The current node traverses the data items in its own memory, filters out those that belong to the fragment range of the changed node, and then sends a NotifyFetchDatumRequest to the changed node and its backup nodes. After receiving the request, the handler on the changed node and its backup nodes pulls the data from the sender (NotifyFetchDatumHandler.fetchDatum), as shown in Figure 18.
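The filtering step can be sketched as follows. FetchNotifySketch, owner, and idsToNotify are hypothetical names, the dataInfoIds and hash values are made up for illustration, and the sketch only selects items by their primary owner on the updated ring; it does not model the real request objects or the backup node calculation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class FetchNotifySketch {
    // Owner of a hash position on a non-empty ring: first node clockwise, wrapping around.
    static int owner(TreeSet<Integer> ring, int hash) {
        Integer node = ring.ceiling(hash);
        return node != null ? node : ring.first();
    }

    // dataInfoIds held locally whose owner on the updated ring is the changed node.
    // These are the items the changed node would be told to fetch.
    static List<String> idsToNotify(Map<String, Integer> localHashes,
                                    TreeSet<Integer> updatedRing, int changedNode) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : localHashes.entrySet()) {
            if (owner(updatedRing, entry.getValue()) == changedNode) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Items held in memory by an existing node, with made-up ring positions.
        Map<String, Integer> local = new TreeMap<>();
        local.put("svcA", 16);
        local.put("svcB", 20);
        // Ring after the new node N17 has come online (assumed node set).
        TreeSet<Integer> ring = new TreeSet<>(Arrays.asList(7, 14, 17, 23, 30));
        System.out.println(idsToNotify(local, ring, 17)); // prints "[svcA]"
    }
}
```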

Figure 18: Logic of existing nodes when data server node changes

Summary

To handle registration and subscription for massive numbers of services, SOFARegistry uses the consistent hash algorithm to fragment data in the DataServer cluster, which breaks through the single-machine storage bottleneck and in theory allows unlimited scale-out. To achieve high availability of data, SOFARegistry records service data operations in DataServer memory at dataInfoId granularity and synchronizes data between DataServers at the same granularity, which ensures data consistency and makes smooth scale-out and scale-in of DataServer possible.

SOFARegistry Lab series

  • Service registry data consistency analysis | SOFARegistry analysis
  • How the service registry achieves second-level service online and offline notification | SOFARegistry analysis
  • Analysis of the session storage policy in the service registry | SOFARegistry analysis
  • Detailed explanation of the data fragmentation and synchronization scheme in the service registry
  • Service registry MetaServer function introduction and implementation analysis | SOFARegistry analysis
  • Service registry SOFARegistry analysis and optimization of service discovery
  • Introduction to the architecture of SOFARegistry, a registry for massive data