Introduction: In 2020 we made a very big upgrade to the underlying infrastructure of serverless. For example, compute was upgraded to the fourth-generation Shenlong (X-Dragon) architecture, storage was upgraded to Pangu 2.0, and the network moved onto the 100G Luoshen network. After the overall upgrade, performance doubled. The BaaS layer was also greatly expanded; for example, it now supports EventBridge and Serverless Workflow, further improving system capability.
1、 Achievements of Large-Scale Serverless Adoption in the Group
In addition, we cooperated with more than a dozen BUs (business units) in the Group to help business teams put serverless into production, including Double 11 core application scenarios, and helped them pass the Double 11 traffic peak successfully, proving that serverless performs very stably even in core application scenarios.
2、 Two Backgrounds and Two Advantages: Accelerating Serverless Adoption
1. Serverless background
Why were we able to implement serverless at scale so quickly within the Group? First, two preconditions made it possible:
The first background is moving to the cloud. The Group's migration to the cloud was an important prerequisite: only on the cloud can it enjoy the elasticity dividend, and on a purely internal cloud the later efficiency gains and cost reductions would be very hard to achieve. On Double 11 of 2019, Alibaba's core systems ran 100% on the cloud, and with that in place serverless could play a very important role.
The second background is comprehensive cloud-native transformation, which produced a powerful family of cloud-native products, empowered the Group's internal business, and helped the business achieve two main goals on the cloud: improving efficiency and reducing cost. In 2020, the core systems of Tmall's Double 11 became fully cloud native, with efficiency improved by 100% and cost reduced by 80%.
2. Two advantages of serverless
- Improving efficiency
For a standard cloud-native application, going from development to launch to operations means completing every work item marked in orange in the figure above before the microservice application can formally go live: first CI/CD code building, then the observability work of system operations. It needs not only configuration and wiring but also traffic estimation, security assessment, and traffic management across the whole data link, which demands a very high level of manpower. In addition, to improve resource utilization, different workloads must be co-located, raising the bar further.
As you can see, for traditional cloud-native applications the work items needed to bring a microservice online are very hard for developers and require multiple roles. In the serverless era, developers only complete the coding in the blue box in the figure above; the serverless R&D platform handles all the remaining work items and takes the business live directly.
- Reducing cost
Improving efficiency is mainly about saving labor cost, while reducing cost targets application resource utilization. A conventional application must reserve resources for its traffic peak, so the troughs cause great waste. In a serverless scenario you simply pay on demand and no longer reserve for the peak, which is serverless's biggest cost advantage.
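To make the peak-versus-trough trade-off concrete, here is a toy calculation comparing reserving instances for the daily peak against paying per request. All prices, throughput figures, and the traffic curve are invented for illustration, not product pricing:

```javascript
'use strict';
// Toy cost comparison for the peak-vs-trough argument above.
// All prices and traffic figures are invented, not product pricing.
const hourlyRequests = [
  100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 1000, // noon peak
  100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100,
];
const REQS_PER_INSTANCE_HOUR = 100; // assumed instance throughput
const INSTANCE_HOUR_PRICE = 1.0;    // reserved capacity, arbitrary unit
const REQUEST_PRICE = 0.01;         // pay-per-use, arbitrary unit

// Reserved mode must keep peak capacity online all day.
const peak = Math.max(...hourlyRequests);
const reservedCost =
  Math.ceil(peak / REQS_PER_INSTANCE_HOUR) * 24 * INSTANCE_HOUR_PRICE;

// Pay-per-use mode only pays for requests actually served.
const totalRequests = hourlyRequests.reduce((a, b) => a + b, 0);
const payPerUseCost = totalRequests * REQUEST_PRICE;

console.log({ reservedCost, payPerUseCost });
```

With this invented traffic curve, reserved capacity costs 240 units against 33 for pay-per-use; the flatter the troughs are relative to the peak, the larger the gap becomes.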
These two backgrounds and two advantages match the direction of cloud technology, so business teams within the Group embraced serverless immediately. Some large BUs even elevated serverless adoption to a campaign-level initiative to accelerate landing business scenarios on serverless. The scenarios implemented in the Group are already very rich, covering core applications, personalized recommendation, video processing, AI inference, business inspection, and more.
3、 Serverless Landing Scenario: Front-End Light Applications
At present, the front end is the fastest-moving and most widely adopted serverless scenario within the Group, covering more than ten BUs including Taobao/Tmall (Taoxi), Amap (Gaode), Fliggy (Feizhu), Youku, and Xianyu. So why is the front-end scenario a good fit for serverless?
The figure above is the capability model of a full-stack engineer. A typical micro application involves three roles: the front-end engineer, the back-end development engineer, and the operations engineer, who together release the application. To improve efficiency, the full-stack engineer role has emerged in recent years. A full-stack engineer needs the abilities of all three roles: not only front-end application development, but also back-end system development skills, plus attention to the underlying kernel and system resource management, which is obviously a very high bar for a front-end engineer.
In recent years the rise of Node.js has let front-end engineers also cover the back-end development role: with front-end development skills alone, one person can play both roles. The operations engineer, however, still could not be replaced.
The serverless platform solves the bottom three layers of the triangle above, which greatly lowers the threshold for a front-end engineer to become a full-stack engineer and is very tempting for front-end business developers.
Another reason is that the business characteristics match. Most front-end applications have pronounced traffic peaks, which require business evaluation in advance and carry an evaluation cost; front-end scenarios also iterate quickly and launch and retire fast, so operations cost is high; and without dynamic scaling there are resource fragments and waste. With serverless, the platform automatically takes care of all of these worries, so serverless is very attractive to the front end.
1. Front-end landing scenarios
The figure above lists the main front-end landing scenarios and their technical points:
BFF to SFF: BFF (backend for frontend) has traditionally been operated mainly by front-end engineers, but in the serverless era operations are handed entirely to the serverless platform; front-end engineers only need to write business code (SFF, Serverless for Frontend).
Client slimming: sink front-end business logic to the SFF layer, let the SFF layer reuse that logic, and hand operations to the serverless platform, making the client lighter and more efficient.
Cloud-device integration: one codebase running on multiple clients, a very popular development framework that also needs SFF as support.
CSR/SSR: serverless serves both server-side rendering and client-side rendering to display the front-end first screen quickly; combined with a CDN, serverless can act as a front-end acceleration solution.
NoCode: packaging on top of the serverless platform, a front-end page is built by dragging and dropping a few components; each component can be packaged and aggregated with serverless to achieve a NoCode effect.
Mid- and back-office scenarios: mainly the rich scenarios of monolithic applications, which can be fully hosted in serverless mode to bring mid- and back-office applications online while saving operations effort and reducing cost.
2. Front-end coding changes
What changes does coding undergo once serverless is applied in the front-end scenario?
Anyone with some knowledge of the front end knows it is generally divided into three layers: state, view, and logic. Abstracted business logic is sunk into cloud functions at the FaaS layer, and those cloud functions then provide services as FaaS APIs. In coding terms, logic can be abstracted into actions, each of which is backed by a FaaS API.
Take a simple page as an example. On the left of the page are rendering interfaces, such as fetching item details and shipping addresses, implemented on FaaS APIs; on the right is interaction logic, such as purchasing and adding to cart, also completed by FaaS APIs.
In page design, a FaaS API is not limited to a single page; it can be reused across multiple pages. By reusing or dragging these APIs together, a front-end page can be assembled, which is very convenient for the front end.
3. Front-end light application R&D efficiency: 1-5-10
After the front end adopted serverless, we summarize its effect on front-end R&D efficiency as 1-5-10, meaning:
Start in 1 minute: we distilled the main scenarios into application templates. When a user or business team starts a new business, it only needs to pick the corresponding template, which quickly generates the business code; the user then only writes the business function code to get started quickly.
Go live in 5 minutes: completely reuse the serverless operations platform and its built-in capabilities, such as gray (phased) release, and work with the front-end gateway and traffic shifting to complete canary testing and similar functions.
Troubleshoot in 10 minutes: once a function is live, business and system metrics are displayed. Metrics can drive alerts, and error logs are pushed to the user's console, helping users quickly locate and analyze problems and grasp the health of the whole serverless function within 10 minutes.
4. Results of front-end serverless adoption
What results has the front end achieved with serverless? We compared the performance and person-hours of three apps under the traditional R&D model and under the FaaS model. It is clear that, on top of the existing cloud-native baseline, efficiency improved by a further 38.89%, which is very effective for serverless and front-end applications. Serverless scenarios now cover almost the entire Group, helping business teams go serverless and achieve the two main goals of improving efficiency and reducing cost.
4、 Technology Output and New Scenarios
During the Group's serverless rollout we discovered many new business demands, such as: How can existing (stock) businesses be migrated quickly and cheaply? Can execution time be increased or extended? Can resource quotas be raised? We proposed solutions to these problems and abstracted them into product features. The important ones are introduced below:
1. Custom container images
The main purpose of custom images is seamless migration of existing business: helping users move business code to the serverless platform with zero code changes.
Migrating existing business is a very big pain point. A team cannot sustain two R&D models for long; that causes heavy internal friction. To move business teams onto the serverless R&D system, we had to offer a thorough migration plan that helps users transform to the serverless system: supporting new business on serverless is not enough; stock business must also migrate quickly at zero cost. We therefore launched the custom container feature.
Pain points of traditional monolithic web applications:
- Application modernization (fine-grained responsibility splitting, service governance, etc.) adds operations burden;
- Historical baggage makes serverless adoption hard: dependencies and configuration differ between the on-cloud and off-cloud business code;
- Capacity planning plus self-built operations and monitoring systems;
- Low resource utilization (low-traffic services monopolize resources).
Advantages of Function Compute + container images:
- Low-cost migration of monolithic applications;
- O&M-free;
- Automatic scaling with no capacity planning;
- 100% resource utilization, optimizing idle cost.
With custom containers, traditional monolithic web applications (built on Spring Boot, WordPress, Flask, Express, Rails, and other frameworks) can migrate to Function Compute as an image without any modification, avoiding the resource waste of low-traffic services monopolizing servers, while also enjoying the dividends of no capacity planning, automatic scaling, O&M-free operation, and so on.
2. Performance instances
Performance instances reduce usage limits and open up more scenarios. For example, the code package limit grows from 50 MB to 500 MB, execution time grows from 10 minutes to 2 hours, and instance specifications are more than 4x higher than before, supporting large 16-core/32 GB instances that let users run very long, heavy tasks.
Function Compute serves many scenarios, and along the way we received much feedback: too many constraints, a high usage threshold, insufficient resources for compute scenarios, and so on. For these scenarios we introduced performance instances, with the goal of relaxing the usage limits of Function Compute application scenarios and lowering the threshold; execution time and other quotas become flexibly configurable on demand.
The 16-core/32 GB instances we support deliver exactly the same computing power as ECS instances of the same specification, and can be applied to high-performance business scenarios such as AI inference and audio/video transcoding. This capability matters greatly for expanding application scenarios later.
Pain points of elastic instances:
- Elastic instances have many constraints, such as execution time and instance specifications, creating a certain usage threshold;
- Traditional monolithic applications and heavy-compute scenarios such as audio/video require splitting and refactoring the business, adding burden;
- Elastic instances make no explicit commitments on vCPU, memory, bandwidth, and other resource dimensions.
Value of performance instances:
- Relax Function Compute's usage limits and lower the threshold for enterprises;
- Stay compatible with traditional applications and heavy-compute scenarios;
- Give users clear resource commitments.
Current status and plan:
- Performance instances launch with higher specifications and clearer resource commitments;
- In the future, performance instances will gain higher stability SLAs and richer configuration options.
Applicable workloads are compute tasks, long-running tasks, and tasks insensitive to elastic scaling:
- Audio/video transcoding;
- AI inference;
- Other compute scenarios requiring high specifications.
Beyond the relaxed limits, performance instances retain all current Function Compute capabilities: pay-as-you-go, reserved mode, multiple concurrent requests per instance, integration with multiple event sources, multi-AZ disaster recovery, automatic scaling, application build and deployment, O&M-free operation, and so on.
3. Link tracing
Link tracing covers link restoration, topology analysis, and problem location.
A typical microservice does not complete all of its work in one function; it depends on upstream and downstream services. When those services are healthy, link tracing is generally not needed, but when a downstream service misbehaves, how do you locate the problem? That is when link tracing lets you quickly analyze upstream and downstream performance bottlenecks and pinpoint where the problem occurred.
Function Compute investigated many open-source solutions inside and outside the Group. It currently supports the X-Trace function, is compatible with open-source solutions, embraces open source, and provides OpenTracing-compatible product capabilities.
The figure above is a demo of link tracing. Through tracing you can visually see the database access cost of back-end services, avoiding the complex call relationships among a large number of services that would otherwise make troubleshooting harder. Function Compute also supports function-code-level link analysis, helping users optimize cold starts, key code paths, and so on.
From a business perspective, serverless products bring huge benefits, but the packaging also brings a phased problem: the black box. By providing link tracing and exposing the black box to users, we let users improve their own business with what they see inside it. This is a direction for improving the serverless user experience, and we will keep investing here to lower the cost of using serverless.
Pain points:
- Serverless products have great business benefits, but the packaging brings black-box problems;
- Serverless connects to the cloud ecosystem, and the large number of cloud services creates complex call relationships;
- Serverless developers still need link restoration, topology analysis, problem location, and so on.
Main advantages of FC + X-Trace:
- Function-code-level link analysis helps optimize cold starts and other key code paths;
- Service-call-level link tracing helps connect cloud ecosystem services and analyze distributed links.
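The span model behind these capabilities can be sketched in a few lines: every call records a span carrying a shared trace ID, so upstream and downstream timings can be stitched into one link. A real deployment would use an OpenTracing-compatible SDK rather than this toy recorder:

```javascript
'use strict';
// Toy span recorder illustrating the link-tracing data model.
const spans = [];

function traced(traceId, name, fn) {
  const start = Date.now();
  try {
    return fn();
  } finally {
    // Inner spans finish first, so they are recorded before their parent.
    spans.push({ traceId, name, durationMs: Date.now() - start });
  }
}

// A handler span wrapping a downstream "database" span:
const traceId = 'trace-001'; // a real tracer propagates this via headers
const rows = traced(traceId, 'handler', () =>
  traced(traceId, 'db.query', () => [{ id: 1 }]));

console.log(rows.length, spans.map((s) => s.name));
```

Grouping spans by trace ID is what lets a tracing backend draw the topology shown in the demo figure and attribute time to, say, the database call inside a function.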
4. Asynchronous configuration
In serverless scenarios we provide capabilities such as offline task processing and asynchronous message consumption; such workloads account for about 50% of Function Compute usage. With large-scale message consumption come many asynchronous-configuration questions that business teams often raise: Where do these messages come from? Where do they go? Through which services? How long does consumption take? What is the success rate? Making these visible and configurable is an important problem to solve right now.
The figure above shows how asynchronous configuration works. First, an asynchronous invocation is triggered from the event source specified by the user, and Function Compute immediately returns a request ID; the function then executes and can deliver its result back to Function Compute or to the MNS message queue. Destinations can be configured per event source, and those queues or topics can be consumed again; for example, a message that failed processing can be routed for a second round of processing.
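The retry-then-destination flow can be sketched as below. The retry count and the failure destination here are illustrative parameters, not product defaults:

```javascript
'use strict';
// Sketch of asynchronous invocation with retries and a failure
// destination (a stand-in for the MNS queue/topic in the figure above).
function invokeAsync(fn, event, { maxRetries = 2, onFailure } = {}) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return { status: 'succeeded', result: fn(event) };
    } catch (err) {
      if (attempt === maxRetries) {
        onFailure?.(event, err); // dead-letter the event for reprocessing
        return { status: 'failed', error: err.message };
      }
      // otherwise fall through and retry
    }
  }
}

// Usage: a consumer that always fails; its event lands in the "queue".
const deadLetter = [];
let attempts = 0;
const flaky = () => { attempts++; throw new Error('boom'); };
const outcome = invokeAsync(flaky, { id: 1 }, {
  onFailure: (evt) => deadLetter.push(evt),
});
console.log(outcome.status, attempts, deadLetter.length); // failed 3 1
```

Wiring the failure destination to another trigger is what enables the "second round of processing" described above: the dead-lettered event simply re-enters the system as a new event.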
Typical application scenarios:
- The first is the event closed loop, for example analyzing delivery results (such as collecting monitoring metrics and configuring alerts). On the production side, customers can use FC not only to consume events but also to actively produce them.
- The second is daily exception handling, for example failure handling and retry strategies.
- The third is message discarding: users can customize the retention time and discard useless messages in time to save resources, a great optimization for asynchronous scenarios.
About the author:
Zhao Qingjie (Lu Ling) currently works on Alibaba Cloud's native serverless team, focusing on serverless, PaaS, and distributed system architecture, and is committed to building a new-generation serverless technology platform that makes platform technology more inclusive. He previously worked at Baidu, where he was responsible for the company's largest PaaS platform, which carried 80% of online business, and has rich experience in PaaS and back-end distributed system architecture.
This article is original content from Alibaba Cloud and may not be reproduced without permission.