A guide from Alibaba technical expert Dongdao: hosted Knative works out of the box, and you pay nothing for its resident instances. Combined with the gateway capability provided by the SLB cloud product and the reserved-instance feature built on burstable instances, it can greatly cut your IaaS bill, and every penny you pay is put to use.

(Follow the Alibaba Cloud Native official account and reply "report" to download the complete survey report.)

The annual survey report released by CNCF shows that serverless technology gained further recognition in 2019. 41% of respondents said they were already using serverless, and another 20% planned to adopt serverless technology within the next 12-18 months. Among the many open source serverless projects, Knative is the most popular. As the figure below shows, Knative holds a 34% share, far ahead of OpenFaaS, making it the first choice for building one's own serverless platform.
Knative's popularity has much to do with the container ecosystem. Unlike the FaaS model, Knative does not require users to make drastic changes to their applications: as long as the application is containerized, it can be deployed on Knative. Moreover, Knative provides a more focused application model on top of Kubernetes, so users need not spend effort on application upgrades and traffic grayscale; all of that is completed automatically.

The development of virtual machines. Before the emergence of cloud computing, an enterprise had to rent physical machines in an IDC and deploy its applications directly on them. Physical machine performance has kept growing at the pace of Moore's law, so a single application could no longer make full use of an entire machine's resources, and a technology was needed to improve resource utilization. The naive answer is that if one application is not enough, deploy a few more. However, mixing multiple applications on the same physical machine brings many problems, such as:
- Port conflict
- Resource isolation
- Conflicting system dependencies and difficult operations and maintenance
Virtual machine technology solved these problems: deploying multiple VMs on one physical machine isolates the applications while sharing the hardware. As an enterprise grows, it may maintain many applications, each requiring frequent releases, upgrades and rollbacks, and perhaps a separate deployment in each region. This brings a heavy operations burden, and the thorniest part is the application's runtime environment. Container technology later emerged to address this. Containers not only offer an isolation experience close to that of a VM, but also bring a huge innovation: the container image. With a container image it is very easy to reproduce an application's runtime environment; developers only need to bake the application's dependencies into the image, and at run time the container serves directly from its own dependencies. This solves the runtime-environment problems of releasing, upgrading, rolling back and deploying across regions.
Once people began using container technology at scale, the burden of maintaining each instance's runtime environment dropped sharply. The biggest remaining problems were coordination among multiple instances and coordination among multiple applications.
Hence Kubernetes appeared soon after containers became popular. Unlike the earlier VM and container technologies, Kubernetes is by design a distributed, end-state-oriented system rather than a single-machine capability. Kubernetes abstracts a friendlier API over IaaS resource allocation; users need not worry about allocation details, because the Kubernetes controllers automatically perform allocation, failover and load balancing against the declared end state. Application developers no longer need to care where a particular instance runs, as long as Kubernetes can allocate resources when needed. Whether in the early physical-machine model or today's Kubernetes model, application developers never really wanted to manage underlying resources; they just want to run the application. In the physical-machine model, people had to monopolize a machine. In the Kubernetes model, people do not care which physical machine their business process runs on; in fact they cannot predict it in advance. As long as the application runs well, where it runs does not really matter.
The whole progression from physical machines through virtual machines and containers to Kubernetes has been about lowering the threshold for applications to use IaaS resources. A clear thread runs through this evolution: the coupling between IaaS and the application keeps getting looser. As long as the platform allocates the needed IaaS resources when the application has to run, the application owner is merely a consumer of IaaS and need not perceive any allocation details. Before introducing Knative, let us compare, using a web application, how traffic access and application releases are handled with plain Kubernetes versus with Knative.
As shown in the figure below, Kubernetes mode is on the left and Knative mode is on the right.
- In Kubernetes mode: 1. users must manage the Ingress controller themselves; 2. to expose a service, they must maintain the relationship between the Ingress and the Service; 3. to observe a grayscale during a release, they must rotate through multiple Deployments to complete the upgrade;
- In Knative mode, the user only needs to maintain a single Knative Service resource.
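To make the contrast concrete, here is a minimal sketch of the three resources a typical web application needs in Kubernetes mode (the names, image and host below are illustrative, not taken from this article's demo):

```yaml
# Kubernetes mode: three resources to author and keep in sync yourself.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coffee
spec:
  replicas: 2
  selector:
    matchLabels: {app: coffee}
  template:
    metadata:
      labels: {app: coffee}
    spec:
      containers:
      - name: coffee
        image: registry.example.com/coffee:v1   # illustrative image
---
apiVersion: v1
kind: Service
metadata:
  name: coffee
spec:
  selector: {app: coffee}
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: coffee
spec:
  rules:
  - host: coffee.example.com    # illustrative host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service: {name: coffee, port: {number: 80}}
```

In Knative mode, the Deployment, Service and Ingress above collapse into a single Knative Service resource, with revisions and traffic splitting handled for you.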
Of course, Knative cannot completely replace Kubernetes; it is built on Kubernetes' capabilities. Beyond the difference in which resources users manage directly, there is a large conceptual difference: Kubernetes is used to decouple IaaS from applications and to reduce the cost of IaaS resource allocation, so it focuses mainly on orchestrating IaaS resources, whereas Knative leans toward the application layer and is an application orchestrator with elasticity at its core. Knative is a serverless orchestration engine built on Kubernetes. Its goal is to establish a cloud-native, cross-platform serverless orchestration standard, which it realizes by integrating container building, workload management (dynamic scaling) and an eventing model. Serving is the core module that runs serverless workloads.
- Application Hosting
- Kubernetes is an abstraction oriented to IaaS management; deploying an application directly through Kubernetes means maintaining more resources
- With a Knative Service, a single resource defines the hosting of an application
- Traffic management
- Knative routes application traffic through a gateway and can then split it by percentage, which lays the foundation for capabilities such as elasticity and grayscale
- Grayscale Publishing
- Multi-version management is supported, making it easy for multiple versions of an application to serve online simultaneously
- Different versions can be given different traffic percentages, making functions such as grayscale release easy to realize
- Elasticity
- Knative's core cost-saving capability is elasticity, which automatically scales out when traffic rises and scales in when traffic falls
- Each grayscale version has its own elastic policy, tied to the traffic allocated to that version; Knative decides to scale out or in according to the amount of traffic allocated
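As a sketch of how these grayscale capabilities look in practice (the revision names and image below are illustrative assumptions), a Knative Service expresses the per-version traffic split directly in its spec:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: coffee
spec:
  template:
    metadata:
      name: coffee-v2                 # new revision being grayscaled in
    spec:
      containers:
      - image: registry.example.com/coffee:v2   # illustrative image
  traffic:
  - revisionName: coffee-v1
    percent: 90                       # 90% of traffic stays on the old version
  - revisionName: coffee-v2
    percent: 10                       # 10% grayscale to the new version
```

Adjusting the percentages and re-applying the resource gradually shifts traffic, and each revision is scaled independently according to the traffic it receives.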
Please move here or here to learn more.

Why ASK? Community Kubernetes requires you to purchase hosts in advance and register them as Kubernetes nodes before any pod can be scheduled. Purchasing hosts in advance does not fit the application's logic: application developers just want IaaS resources allocated when an application instance needs to run, and do not want to maintain complex IaaS resources. So a Kubernetes offering that is fully compatible with the community Kubernetes API, yet requires no operating or managing of complex IaaS resources yourself and allocates resources automatically when needed, fits the application's view of resource consumption far better. ASK follows exactly this idea and brings you the experience of serverless Kubernetes. ASK stands for Serverless Kubernetes, a kind of nodeless Kubernetes cluster. Users can deploy container applications directly without purchasing nodes, without node maintenance or capacity planning for the cluster, and pay on demand for the CPU and memory resources the applications are configured with. An ASK cluster provides full Kubernetes compatibility while greatly lowering the threshold for using Kubernetes, letting users focus on applications rather than on managing the underlying infrastructure.
In other words, you can create a Kubernetes cluster directly, without preparing ECS resources in advance, and then deploy your services. For a more detailed description of ASK, see here. When we analyzed the history of serverless, the main thread we identified was that the coupling between IaaS and the application keeps getting looser: as long as the platform allocates the corresponding IaaS resources when the application needs to run, the application owner is only a consumer of IaaS and need not perceive the allocation details. ASK is exactly such a platform, able to allocate IaaS resources at any moment. Knative is responsible for sensing the application's real-time state and automatically "applying" to ASK for IaaS resources (pods) when necessary. Together, Knative and ASK give you an even more thorough serverless experience.
For a more in-depth introduction to ASK, see Serverless Kubernetes – Ideals, Reality and the Future.

Highlight 1: an SLB-based gateway. Community Knative supports Istio, Gloo, Contour, Kourier and Ambassador as gateways by default. Istio is of course the most popular of these implementations, because it can serve both as the gateway and as a service mesh. Although these gateways are full-featured, running them for a serverless service somewhat contradicts the original intention. First, the gateway instances must run resident, and to ensure high availability at least two instances must back each other up.
Second, the control planes of these gateways must also run resident. The IaaS fees and the operations work for all of these resident instances are costs the business must bear. To give users the ultimate serverless experience, we implemented the Knative gateway on Alibaba Cloud SLB. It provides all the required functions and is backed at cloud-product level.
Having no resident resources not only saves your IaaS cost but also removes a lot of operations burden.

Highlight 2: low-cost reserved instances. Reserved instances are a feature unique to ASK Knative. By default, community Knative can scale to zero when there is no traffic, but the zero-to-one cold start after scaling to zero is hard to solve. A cold start involves not only IaaS resource allocation, Kubernetes scheduling and image pulling, but also the application's startup time, which ranges from milliseconds to minutes and is nearly uncontrollable at the general platform level. These problems exist in every serverless product, of course. Most traditional FaaS products run different functions on a shared IaaS pool; to keep the pool from filling up while keeping cold starts extremely short, FaaS products mostly resort to restricting user functions. For example:
- Request timeout: if a request is not handled within this time, it is considered failed;
- Burst concurrency: every function has a concurrency upper limit by default, and requests beyond this limit are throttled;
- CPU and memory: a function cannot exceed its CPU and memory caps.
ASK Knative's answer to this problem is to balance cost against cold start with low-priced reserved instances. Alibaba Cloud ECI comes in many specifications, with different compute power at different prices. Below is a price comparison between a standard compute instance and a burstable instance, both in the 2c4g configuration.
The comparison shows that the burstable instance is 46% cheaper than the standard compute one. Using a burstable instance to serve when there is no traffic therefore not only solves the cold start problem but also saves a lot of cost. Besides the price advantage, burstable instances have another bright spot: CPU credits.
A burstable instance can use CPU credits to cope with bursts. When its performance cannot meet the current load, it can seamlessly raise its compute performance by consuming the CPU credits it has accumulated, without affecting the environment or applications deployed on the instance. Through CPU credits you can allocate compute resources from the perspective of the overall business, seamlessly transferring compute capacity left over in off-peak periods to the peak periods.
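As a back-of-envelope illustration of the credit mechanism (the baseline percentage and the hourly accounting granularity here are simplified assumptions for demonstration, not Alibaba Cloud's actual billing parameters):

```shell
#!/bin/sh
# Toy model of a burstable instance's CPU credits: credits accrue while
# CPU usage stays below the baseline and are spent when usage exceeds it.
# One credit = one vCPU at 100% for one minute; usage is sampled hourly.
credit_balance() {
  baseline=$1; shift          # baseline CPU percentage, e.g. 20
  balance=0
  for usage in "$@"; do       # hourly average CPU usage percentages
    balance=$(( balance + (baseline - usage) * 60 / 100 ))
    if [ "$balance" -lt 0 ]; then balance=0; fi   # never goes negative
  done
  echo "$balance"
}

# Eight quiet hours at 5% bank credits; two busy hours at 80% spend them.
credit_balance 20 5 5 5 5 5 5 5 5 80 80
```

With a 20% baseline, each quiet hour banks 9 credits and each busy hour burns 36, so the trough must be long enough to fund the peak, which is exactly the trade the reserved-instance strategy exploits.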
See here for more details on burstable instances. ASK Knative's strategy, then, is to substitute burstable instances for standard compute instances during service troughs, and to switch seamlessly back to standard compute instances when the first request arrives. This cuts your cost during low-traffic periods, and the CPU credits earned during the trough can be consumed when the business peak comes, so every penny you pay is put to use.
Using burstable instances as reserved instances is only the default policy; you can designate any other instance type you want as the reserved-instance specification. Of course, you can also effectively turn reserved instances off by specifying a standard instance as the minimum reservation.

Demo. After the Serverless Kubernetes (ASK) cluster is created, you can apply for the Knative feature through the DingTalk group below, and then use the capabilities Knative provides directly in the ASK cluster.
Once the Knative feature is enabled, a Service named ingress-gateway is created in the knative-serving namespace. It is a LoadBalancer-type Service, and an SLB is created for it automatically through the CCM. As shown below, 188.8.131.52 is the public IP of the SLB, through which you can then access Knative services.

```
# kubectl -n knative-serving get svc
NAME              TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)        AGE
ingress-gateway   LoadBalancer   172.19.8.35   188.8.131.52   80:30695/TCP   26h
```

We will build the follow-up series of operations around a cafe.
Suppose the cafe offers two categories: coffee and tea. We deploy the coffee service first and the tea service later, and along the way demonstrate version upgrades, traffic grayscale, custom ingress and autoscaling.

Deploy the coffee service: save the following to a coffee.yaml file, then deploy it to the cluster with kubectl:
```
# cat coffee.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: coffee
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "10"
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:160e4dc8
        env:
        - name: TARGET
          value: "coffee"
```
Execute kubectl apply -f coffee.yaml to deploy the coffee service. After a moment, you should see that it has been deployed:
```
# kubectl get ksvc
NAME     URL                                 LATESTCREATED   LATESTREADY    READY   REASON
coffee   http://coffee.default.example.com   coffee-fh85b    coffee-fh85b   True
```

coffee.default.example.com is the unique subdomain Knative creates for each ksvc.
Now you can access the deployed service with curl by specifying the Host header and the SLB public IP. As shown below, "Hello coffee!" is the content returned by the coffee service.
```
# curl -H "Host: coffee.default.example.com" http://184.108.40.206
Hello coffee!
```
The autoscaler is a first-class citizen in Knative and the core capability through which Knative saves users money. Knative's default KPA (Knative Pod Autoscaler) policy automatically adjusts the number of pods according to real-time request traffic. First, let's look at the current pod information:
```
# kubectl get pod
NAME                                       READY   STATUS    RESTARTS   AGE
coffee-bwl9z-deployment-765d98766b-nvwmw   2/2     Running   0          42s
```

You can see that one pod is running. Next, let's prepare a load test.
Before starting the load test, recall the configuration of the coffee application. Its yaml contains one such annotation, shown below: autoscaling.knative.dev/target: "10" means the target concurrency of each pod is 10; if more than 10 concurrent requests arrive, new pods are scaled out to take them. See here for a more detailed introduction to the Knative autoscaler.
```
# cat coffee-v2.yaml
... ...
    metadata:
      name: coffee-v2
      annotations:
        autoscaling.knative.dev/target: "10"
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:160e4dc8
... ...
```
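The scaling rule just described can be sketched roughly as follows (a deliberate simplification of the KPA's behavior that ignores panic mode, scaling windows and scale-to-zero):

```shell
#!/bin/sh
# Concurrency-based scaling: run enough pods that the observed concurrent
# requests, spread across them, stay at or below the per-pod target.
desired_pods() {
  concurrency=$1
  target=${2:-10}   # matches autoscaling.knative.dev/target: "10"
  echo $(( (concurrency + target - 1) / target ))   # ceiling division
}

desired_pods 90 10   # 90 concurrent requests against a target of 10
```

With 90 concurrent requests and a per-pod target of 10, such a rule settles at about 9 pods, which is roughly what the load test below will show.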
OK, let's start the load test now. We use the hey command (download link here) to issue the requests:
```
hey -z 30s -c 90 --host "coffee.default.example.com" "http://220.127.116.11/?sleep=100"
```

The meaning of the above command:
- -z 30s means the load test lasts 30 seconds;
- -c 90 means 90 concurrent requests are used;
- --host "coffee.default.example.com" binds the Host header;
- "http://18.104.22.168/?sleep=100" is the request URL, where sleep=100 makes the test image sleep 100 ms per request, simulating a real online service.
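A quick back-of-envelope check of what this load should produce (assuming the 100 ms sleep dominates each request's latency):

```shell
#!/bin/sh
# Each of the 90 workers completes roughly one request per 100 ms,
# i.e. about 10 req/s per worker, sustained for the whole 30 s run.
concurrency=90
latency_ms=100
duration_ms=30000

rps=$(( concurrency * 1000 / latency_ms ))            # overall req/s
total=$(( concurrency * duration_ms / latency_ms ))   # total requests
echo "~$rps req/s, ~$total requests"
```

That is about 900 req/s and 27000 requests in total; with 90 requests in flight and a per-pod target of 10, this also explains why the pod count climbs toward 9 during the test.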
Run the command above and observe how the number of pods changes. You can watch the pod count with watch -n 1 'kubectl get pod'. In the figure below, the left side shows the pod count changing and the right side shows the load-test command running. The GIF shows the pods changing during the test: as the pressure rises, Knative scales out automatically, so the pod count grows; when the test finishes, Knative observes the traffic drop and scales in automatically. It is a fully automatic scale-out and scale-in process.
Reserved instances. In the highlights section we mentioned that ASK Knative uses reserved instances to solve the cold start and cost issues. Now let's look at how a reserved instance and a standard instance switch between each other. After the load test ends, check the pod count with kubectl get pod. You may find only one pod, whose name contains "-reserve-", meaning it is a reserved instance. It is already serving the traffic at this point. When there are no requests for a long time, Knative automatically scales out the reserved instance and scales the standard instances to zero to save cost.

```
# kubectl get pod
NAME                                               READY   STATUS    RESTARTS   AGE
coffee-bwl9z-deployment-reserve-85fd89b567-vpwqc   2/2     Running   0          5m24s
```

What happens if traffic arrives at this moment? Let's verify. The following GIF shows that when traffic comes in, the standard instances are scaled out automatically, and once they are ready, the reserved instance is scaled in.
The reserved instance uses ecs.t5-lc1m2.small (1c2g) by default. Some applications, however, allocate memory at startup by default (the JVM, for example); if an application needs 4 GB of memory, ecs.t5-c1m2.large (2c4g) may be the right reserved-instance specification. So we also provide a way for users to specify the reserved-instance specification: when submitting the Knative Service, set an annotation such as knative.aliyun.com/reserve-instance-eci-use-specs: ecs.t5-lc2m1.nano, which means ecs.t5-lc2m1.nano is used as the reserved-instance specification. Save the following to a coffee-set-reserve.yaml file:
```
# cat coffee-set-reserve.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: coffee
spec:
  template:
    metadata:
      annotations:
        knative.aliyun.com/reserve-instance-eci-use-specs: "ecs.t5-c1m2.large"
        autoscaling.knative.dev/target: "10"
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:160e4dc8
        env:
        - name: TARGET
          value: "coffee-set-reserve"
```
Execute kubectl apply -f coffee-set-reserve.yaml to submit it to the Kubernetes cluster. Wait a moment for the new version to scale down to its reserved instance, then check the pod list:
```
# kubectl get pod
NAME                                               READY   STATUS    RESTARTS   AGE
coffee-vvfd8-deployment-reserve-845f79b494-lkmh9   2/2     Running   0          2m37s
```
Inspect the reserved instance of the set-reserve version and you can see that it is now the 2c4g ecs.t5-c1m2.large specification:
```
# kubectl get pod coffee-vvfd8-deployment-reserve-845f79b494-lkmh9 -oyaml | head -20
apiVersion: v1
kind: Pod
metadata:
  annotations:
    ... ...
    k8s.aliyun.com/eci-instance-cpu: "2.000"
    k8s.aliyun.com/eci-instance-mem: "4.00"
    k8s.aliyun.com/eci-instance-spec: ecs.t5-c1m2.large
    ... ...
```
Upgrade the coffee service. Before upgrading, let's look at the current pod instance:
```
# kubectl get pod
NAME                                       READY   STATUS    RESTARTS   AGE
coffee-fh85b-deployment-8589564f7b-4lsnf   1/2     Running   0          26s
```

Now let's upgrade the coffee service. Save the following to a coffee-v1.yaml file:
```
# cat coffee-v1.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: coffee
spec:
  template:
    metadata:
      name: coffee-v1
      annotations:
        autoscaling.knative.dev/target: "10"
    spec:
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/knative-sample/helloworld-go:160e4dc8
        env:
        - name: TARGET
          value: "coffee-v1"
```
Compared with the current deployment, this version adds two things:
- The revision of this deployment is given the name coffee-v1 (if not set, a name is generated automatically);
- The environment variable is set to coffee-v1, so you can tell from the HTTP response that the v1 version is serving. Execute kubectl apply -f coffee-v1.yaml to deploy the v1 version, then verify again with curl -H "Host: coffee.default.example.com" http://22.214.171.124.
Within a few seconds the response becomes "Hello coffee-v1!". Throughout the process the service is never interrupted and no manual switch is needed: once the change is submitted, the switchover between the new and old version instances completes automatically.
```
# curl -H "Host: coffee.default.example.com" http://188.8.131.52
Hello coffee-v1!
```

Now look at the pod instances: the old version's pod has been automatically replaced by the new version's.
```
# kubectl get pod
NAME                                    READY   STATUS    RESTARTS   AGE
coffee-v1-deployment-5c5b59b484-z48gm   2/2     Running   0          54s
```

For more demos of complex features, please move here.

In conclusion, Knative is the most popular serverless orchestration framework in the Kubernetes ecosystem. Community-native Knative requires resident controllers and resident gateways to provide its services. These resident instances not only incur IaaS costs but also bring a heavy operations burden, which makes adopting serverless harder. So we fully host Knative Serving in ASK: it works out of the box, and you pay nothing for those resident instances. Besides providing the gateway capability through the SLB cloud product, we offer a reservation-specification feature based on burstable instances, which greatly reduces your IaaS spend during traffic troughs; the CPU credits accumulated during the troughs can be consumed at traffic peaks, so every penny you pay is put to use.