Century Lianhua's Road to Serverless

Author | Zhu Peng (Min Cang)
Source | Serverless official account

1、 Introduction to Century Lianhua Supermarket

1. Company profile

Hangzhou Lianhua Huashang Group Co., Ltd. was established in July 2002. Its main business covers retail formats such as shopping centers, hypermarkets, supermarkets and convenience stores. It was the unit responsible for building and securing the food supply warehouse for the G20 Hangzhou summit, and it is a leading retail enterprise in Zhejiang Province.

The group has more than 200 stores, and its business mainly involves POS transactions across brands such as Lianhua Supermarket, City Life and Tianhua Century City. In addition, its online app offers online ordering and home delivery, and from time to time runs promotions such as coupon giveaways and limited-time flash sales.

2. Evolution of Century Lianhua's technology architecture

  • From its establishment in 2002, the company used a physical stand-alone architecture.
  • In 2014, after the Double 12 incident, the company had to change and moved its business to a central computer room.
  • In 2018, as domestic public clouds matured, it began moving fully to the cloud.
  • In June 2019, database pressure on the public cloud became too high, and Century Lianhua began to explore a new architecture.
  • By November 2019, in only about four months, Century Lianhua had moved some businesses to Alibaba Cloud serverless products, including API Gateway, Function Compute and Tablestore. During Double 11 these three products performed so well that Century Lianhua decided to go all in on serverless.
  • By November 2020, going all in on serverless had greatly improved development efficiency across the company and significantly reduced costs.

2、 Technology architecture evolution

1. Physical stand-alone architecture

Under the physical stand-alone architecture used in 2014 and earlier, a supermarket usually had only 2 to 20 POS machines, that is, at most 20 clients. The architecture was very simple: a local database was deployed on one physical machine, and the trading system, member system and commodity management all ran in a single process. Operations such as retrieving a transaction, registering user information or adjusting commodity prices could only be done by connecting to that process through the admin client. Generally speaking, as long as a large supermarket bought a machine with strong enough performance, it could serve the requests from dozens of POS machines.

Advantages and disadvantages of stand-alone architecture:

1) Advantages

  • Simple architecture;
  • Not affected by the external network environment;
  • Because POS machines are spread across stores, the failure of a single machine has a relatively small impact.

2) Disadvantages

  • Data migration, querying and aggregation are difficult

Problems gradually surfaced in 2014. For example, from the headquarters in Hangzhou it was almost impossible to query the real-time transaction volume of a store in Huzhou. Cross-network queries over large amounts of data were hard to support.

  • Data distribution depends on periodic synchronization

For example, a membership card registered in store A was hard to use in store B, because store A's data reached store B only through scheduled synchronization. This caused many problems and was very inconvenient for consumers.

  • Failures are hard to repair promptly

We cannot station a professional maintenance engineer in every store. If a machine breaks down, we can only call the engineers at headquarters, and they can rarely reach the site quickly. This is a very serious problem.

  • Single point of failure, with difficult disaster recovery

Because all businesses run in one process, if that process hits an exception there is no way to hand the business over to another process.

  • Upgrades are difficult

We have hundreds of stores in Zhejiang Province. Each upgrade requires professional operations staff to deploy the new code package to machine after machine.

  • Deploying new business has a huge impact on a single machine

Take one case. On Double 12 in 2014, Alipay launched an offline promotion offering 50% off for payments made with Alipay Wallet. Nearly 100 brands and more than 20,000 stores nationwide took part, including Century Lianhua. On that day, however, large numbers of shoppers queuing in our supermarkets were unable to check out.

We had only just introduced the new payment method, all businesses ran in a single process, and the coupling was too tight. The concentrated checkout traffic was too heavy and caused payment failures; the machine could no longer accept requests, and the other business modules were affected as well. In the end we had to restart the machine. Because of this incident, Century Lianhua began to look for a new approach.

2. Central computer room deployment architecture

The biggest problem with the stand-alone architecture is that when something goes wrong in a store, engineers cannot get to the site promptly, especially when machines in several stores fail at the same time. The best remedy is to bring all the machines together so that data repair, operation and maintenance, and software upgrades can be done centrally.

From 2014 to 2018, Century Lianhua gradually moved the whole stand-alone architecture to a self-built central computer room. The approach was to split the database, trading system, member system and commodity management into separate processes. That way, if the member system goes down, customers can temporarily check out anonymously; if commodity management has a temporary problem, the store can keep selling as long as the trading system is healthy. With the coupling reduced, business continuity for the stores improved greatly.
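The degradation behavior described above (checkout continues anonymously when the member system is down) can be sketched roughly as follows. This is only an illustration: `MemberServiceDown`, `lookup_member`, `checkout` and the two demo services are hypothetical names, not Century Lianhua's actual code, and prices are plain integers for simplicity.

```python
class MemberServiceDown(Exception):
    """Raised when the (hypothetical) member system cannot be reached."""


def lookup_member(card_no, member_service):
    """Return member info, or None so the sale falls back to anonymous."""
    try:
        return member_service(card_no)
    except MemberServiceDown:
        return None  # degrade: the sale continues without member benefits


def checkout(items, card_no, member_service):
    """Total a sale; the trading path does not depend on the member system."""
    member = lookup_member(card_no, member_service)
    total = sum(price * qty for price, qty in items)
    return {"total": total, "member": member["name"] if member else "anonymous"}


def healthy_service(card_no):
    """Stand-in for a working member system."""
    return {"card_no": card_no, "name": "demo-member"}


def failing_service(card_no):
    """Stand-in for a member system that is down."""
    raise MemberServiceDown("member system unavailable")


if __name__ == "__main__":
    cart = [(10, 2), (5, 1)]  # (unit price, quantity)
    print(checkout(cart, "A001", healthy_service))
    print(checkout(cart, "A001", failing_service))
```

The point of the split is visible in `checkout`: the member lookup is an optional dependency whose failure is caught at the boundary, so the trading path still completes.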

We also set up a node in each store, which connects to the database and the system modules in the central computer room. If something goes wrong, it only needs to be repaired in the central computer room. Likewise, to adjust a commodity price, you only need to set it in the central computer room and then synchronize it to the nodes of all stores.

Improvements and shortcomings of the central computer room deployment architecture:

1) Improvements

  • Problems can be maintained and handled centrally;
  • All commodity price adjustments are distributed online;
  • Data can be queried and aggregated centrally.

2) Shortcomings

  • Administrators still have to manage machine-level details;
  • Downtime and network disconnection events are hard to investigate, and emergency plans are weak;
  • Hardware upgrades are expensive;
  • A large amount of hardware has to be purchased in advance for disaster recovery;
  • Batch deployment of software and systems is costly;
  • Resource budgeting is difficult.

3. Full cloud migration

After 2016, with the rapid development of domestic public clouds, moving fully to the cloud became an unstoppable trend. During this period Alibaba Cloud made many technical breakthroughs and improvements, such as releasing ECS for external use. From 2018 to 2019, Century Lianhua gradually migrated the system modules in its self-built computer room to the public cloud. The overall architecture changed little, so the migration was relatively smooth.

Improvements and shortcomings of the full cloud deployment:

1) Improvements

There are three main aspects:

  • No need to care about low-level details of hardware, network and operating system

For example, Alibaba Cloud ECS handles scheduling and early warning in advance, migrates user data and keeps multiple copies of the data to guard against disk failure.

  • Hardware upgrades are fast and simple

Suppose a team runs on a 4-core machine and finds that the business is growing quickly and the hardware needs an upgrade. All they need is an image: take a disk snapshot at night, apply for a new machine and restore the snapshot onto it, and the migration is done in what is effectively one click. For Century Lianhua this is very fast and a good experience for developers.

  • Machine expansion time is greatly shortened

The example above is vertical scaling of a single machine, such as going from 4 cores to 8 cores or from 16 GB to 32 GB of memory. There is also horizontal scaling, for example for the API of the trading system: as the business grows, it may need to expand from the original two machines to eight. In that case you only need to apply for the machines and deploy the image to each of them.

2) Shortcomings

There are six main aspects:

  • Resource budgeting is difficult

Because the traffic that a big promotion or similar event will bring cannot be estimated, the amount of hardware required cannot be calculated accurately.

  • Horizontal expansion

Horizontal expansion places high demands on R&D. One question is whether the service is stateless. If it is, horizontal expansion is relatively easy. If it is stateful, data may need to be cached, which raises database-related problems such as data expiration and consistency. Without a thorough understanding of these issues, horizontal expansion is hard to do.

  • Load (water level) monitoring

Many development teams have incomplete load monitoring. If several business systems are mixed on one machine, then when the load on that machine runs high it is especially hard to locate the problem quickly and to apply flow control, splitting or a temporary fix in time.

  • Financial budgeting is difficult

The reasons are similar to those for resource budgeting.

  • High cost of upgrades

Achieving upgrades that are lossless and invisible to users may involve keeping in-flight connections consistent with the database. If several modules have to be upgraded at the same time, the compatibility of their data structures also needs attention.

  • Single point of failure in the database

Many vendors put all their data in one database. If this is not handled properly, it becomes a single point of failure, so the data has to be split. With coarse-grained splitting you have to deal with transactions and locks, which greatly reduces efficiency; with fine-grained splitting, queries and sorting become difficult, which complicates the business implementation.
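The stateless versus stateful distinction raised under horizontal expansion above can be illustrated with a small sketch. It is a toy model under stated assumptions: `SharedStore` stands in for an external database or cache, and `StatefulInstance` and `StatelessInstance` are hypothetical service instances; none of these names come from the original system, and real stores also need to handle concurrency and expiry.

```python
class SharedStore:
    """Stand-in for an external store (database or cache) shared by all instances."""

    def __init__(self):
        self._data = {}

    def incr(self, key, amount=1):
        self._data[key] = self._data.get(key, 0) + amount
        return self._data[key]


class StatefulInstance:
    """Keeps member points in its own memory; other instances cannot see them."""

    def __init__(self):
        self.points = {}

    def add_points(self, member, amount):
        self.points[member] = self.points.get(member, 0) + amount
        return self.points[member]


class StatelessInstance:
    """Holds no data itself; every request reads and writes the shared store."""

    def __init__(self, store):
        self.store = store

    def add_points(self, member, amount):
        return self.store.incr(member, amount)


if __name__ == "__main__":
    # Two requests for the same member land on two different instances.
    x, y = StatefulInstance(), StatefulInstance()
    print(x.add_points("m1", 10), y.add_points("m1", 10))  # each sees only its own view

    store = SharedStore()
    a, b = StatelessInstance(store), StatelessInstance(store)
    print(a.add_points("m1", 10), b.add_points("m1", 10))  # both see the shared total
```

With stateless instances, adding a machine is just adding another `StatelessInstance`; the hard questions (expiration, consistency) move into the shared store, which is exactly where the text says the difficulty lies.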

4. Exploring and trying out serverless

1) Guarding against uncontrollable online traffic

During the big promotion in mid-2019, online user traffic was uncontrollable and the data volume was large. The single MySQL instance was overwhelmed, the storage layer failed, several systems were affected and the company suffered some losses.

After this incident, Century Lianhua wanted to replace MySQL outright. At that point we found an Alibaba Cloud product called Tablestore (table storage). Its biggest advantage is that users do not need to work out the ratio between traffic and the number of machines. When traffic grows, the backend automatically adds machines to serve highly concurrent reads; when concurrent requests drop in off-peak periods, the backend reclaims the machines. Users no longer need to care how many machines there are or how they are allocated.

To solve the problem of uncontrollable user traffic, Century Lianhua introduced the Alibaba Cloud product “API Gateway”. API Gateway can admit or throttle traffic separately for each channel. For example, if traffic from the WeChat channel is found to be abnormal, it can be rate-limited through API Gateway.
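To make the idea of per-channel flow control concrete, here is a minimal token-bucket sketch. It is a generic illustration of the technique, not the API or internal algorithm of Alibaba Cloud API Gateway; `TokenBucket`, `ChannelLimiter`, the channel names and the limits are all hypothetical, and time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Classic token bucket: refills at `rate` tokens per second up to `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full
        self.updated = now

    def allow(self, now):
        """Return True and consume one token if the request may pass."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ChannelLimiter:
    """One independent bucket per channel; unknown channels are rejected."""

    def __init__(self, limits):
        # limits maps channel name -> (rate per second, burst capacity)
        self.buckets = {ch: TokenBucket(r, c) for ch, (r, c) in limits.items()}

    def allow(self, channel, now):
        bucket = self.buckets.get(channel)
        return bucket.allow(now) if bucket else False


if __name__ == "__main__":
    limiter = ChannelLimiter({"wechat": (1, 2), "app": (100, 100)})
    # A burst of three WeChat requests at t=0: the third exceeds the burst size.
    print([limiter.allow("wechat", 0.0) for _ in range(3)])
    # The app channel has its own bucket and is unaffected.
    print(limiter.allow("app", 0.0))
```

Because each channel has its own bucket, throttling an abnormal channel leaves the others untouched, which is the property the paragraph above relies on.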

Compute is another very important issue. After some exploration, Century Lianhua found that Alibaba Cloud “Function Compute” fits our business scenarios very well. Activities such as scheduled flash sales and coupon giveaways produce huge traffic bursts, and by the time you discover that computing resources are short it is far too late to buy machines. The timely scale-out offered by Function Compute solves this problem well. Its data observation and exception alerting features also appealed to Century Lianhua.

Century Lianhua combined these three products to rebuild the original member query function, and in the end got through the 2019 Double 11 promotion successfully.

2) The new dawn brought by serverless

  • Rapid iterative deployment

Serverless brings faster R&D, more efficient operation and maintenance, and a decoupled architecture.

  • High concurrency and high elasticity

Serverless requires no manual capacity expansion or operational intervention.

  • Stable, reliable and safe

Serverless makes the overall experience of flash sales and promotions very smooth.

  • Data, operation and cost control

Serverless provides complete observability, monitoring and alerting, which makes life much easier for operations engineers. In addition, billing is based on the resources actually used, so resource utilization can reach 100%.

5. Function Compute 2.0 and all in serverless

  • Curve 1: an ECS-like scheme; the curve shows both resource shortages and wasted resources.
  • Curve 2: expanding machines incurs delays and errors and has to be done in advance; real-time responsiveness and scalability are poor.
  • Curve 3: the Function Compute 2.0 reservation mode, with reserved resources plus elastic resources, can expand in real time.
  • Resource management: manual operation and maintenance → operation and maintenance with cloud platform tools → operations-free serverless, achieving full automation.
  • Resource utilization: low utilization with budgeted procurement → higher utilization with limited elasticity → 100% resource utilization with serverless.
  • Resource cost: fixed expenditure → scaling according to resource policy → serverless adapting according to business policy.
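The contrast between fixed provisioning and "reserved plus elastic" resources in the curves above can be worked through with a small calculation. The demand figures, capacities and function names below are invented purely for illustration, and the elastic model is idealized: it assumes capacity follows demand instantly, with no cold-start delay.

```python
def fixed_capacity(demand, capacity):
    """Fixed provisioning: requests above capacity are dropped, idle capacity is wasted.

    Returns (dropped requests, utilization of provisioned capacity).
    """
    served = [min(d, capacity) for d in demand]
    dropped = sum(d - s for d, s in zip(demand, served))
    utilization = sum(served) / (capacity * len(demand))
    return dropped, utilization


def reserved_plus_elastic(demand, reserved):
    """Reserved baseline plus idealized elastic capacity that tracks demand exactly.

    Returns (dropped requests, utilization of provisioned capacity).
    """
    provisioned = [max(reserved, d) for d in demand]
    utilization = sum(demand) / sum(provisioned)
    return 0, utilization


if __name__ == "__main__":
    demand = [10, 10, 100, 10]  # hypothetical load per time slot, with one promotion peak
    print(fixed_capacity(demand, capacity=50))
    print(reserved_plus_elastic(demand, reserved=10))
```

In this toy series the fixed plan both drops requests at the peak and sits mostly idle off-peak, while the reserved-plus-elastic plan serves everything. The 100% figure only arises because the reserved baseline never exceeds demand and scaling is assumed to be instantaneous.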

After Double 11 in 2019, Century Lianhua quickly moved further onto the cloud and rebuilt its core online business as a middle-platform model on a fully serverless architecture. It adopted “Function Compute + API Gateway + OTS” as the compute, network and storage core, elastically supplying the resources needed for both everyday traffic and promotion peaks, and handled the 618, Double 11 and Double 12 promotions with ease.

Figure: Double 11 promotion in 2020

During the 2020 Double 11 promotion, Century Lianhua's online business ran entirely on serverless. The upper chart shows traffic over time and the lower chart shows call latency over time.

Figure: serverless helps Century Lianhua reduce costs and improve efficiency

3、 Summary of the architecture evolution

Architecture evolution from physical stand-alone to all in serverless:

  • Physical stand-alone
    • Simple architecture
    • Highly coupled
    • Difficult data synchronization
    • Difficult upgrades
    • Unable to expand horizontally
  • Self-built computer room
    • Unified maintenance and upgrades
    • Unified data synchronization
    • Difficult system deployment
    • High hardware cost
    • Non-business problems are hard to investigate
    • Temporary expansion is difficult
  • Full cloud
    • Simple hardware upgrades
    • Improved capacity expansion
    • Improved disaster recovery capability
    • High design requirements
    • Primitive monitoring and alerting
    • Database single point
    • Flow control problem
  • Serverless attempt
    • Database single point problem solved
    • Flow control problem solved
    • Scale-out
    • Monitoring and alerting
    • No up-front cost budgeting
    • High latency for some requests
  • All in serverless
    • Decoupling
    • Improved cold start experience
    • Improved R&D efficiency
    • Cost reduction

4、 Introduction to Function Compute

1. Overview of Alibaba Cloud Function Compute

Function Compute is presented as the most complete and feature-rich serverless product in China. It aims to make moving to the cloud and going serverless in one click a reality for developers.

2. Industry trends

Who is using Function Compute?

About the author:
Zhu Peng (alias Min Cang) is a front-line technical expert on Function Compute, focusing on the design and R&D of Function Compute resource scheduling.

This article is compiled from the [Serverless Live series] live broadcast on January 28.

