On December 14, 2019, UPYUN (Youpai Cloud) and the Apache APISIX community held the Guangzhou stop of the Open Talk series, "Best Practices for API Gateways and High-Performance Services". Li Ling, head of back-end technology at HelloTalk, Inc., gave a talk titled "HelloTalk's global exploration based on OpenResty".
Li Ling, head of back-end technology at HelloTalk, Inc., focuses on the architecture of services for overseas markets, IM services, and the related technology platforms built on Golang and C++. Li has 5 years of experience with OpenResty-based service governance and is an Apache APISIX committer.
The full text of the talk follows:
Hello everyone, I'm Li Ling from HelloTalk. Today I will mainly introduce what HelloTalk does as a business, and how and in which scenarios we use OpenResty and Apache APISIX.
HelloTalk: technically, a tiny global version of WeChat
HelloTalk is the world's largest social community for foreign language learning. Its 16 million users learn 150 foreign languages, take part in cross-cultural exchange, and make friends with language partners around the world. Users are widely distributed across China, Japan, South Korea, the United States, Europe, Brazil, and other countries and regions, and overseas users account for 80% of the total. From a technical point of view, HelloTalk is a tiny version of WeChat that serves the whole world.
HelloTalk has many KOL users abroad who help promote it on YouTube, Instagram, Twitter, and other platforms, so it is quite well known. The product combines chat, error correction, translation, and other functions. Users can correct each other's sentences while chatting, and voice-to-text and translation support more than 100 languages.
On the operations side, many companies do not know how to take the first step when they expand overseas, and they face the same problem on the technical side: how to go overseas and provide high-quality service to users worldwide. To give users in every country a good experience, OpenResty has been a great help to us.
As shown in the figure above, HelloTalk's users are scattered. We need to find the best balance point and deploy access nodes in the most cost-effective way. In the Asia-Pacific region, such as South Korea, Japan, and China's Yangtze River Delta and Pearl River Delta, users are relatively concentrated and easy to serve. In regions where users are highly dispersed (such as Europe, Africa, and the Middle East in the figure above), providing stable and reliable service is a much bigger challenge.
Why use OpenResty
In the early days, HelloTalk wrote its IM services in C++, using a high-performance network framework from a large tech company. The protocols were all defined internally, and HTTP was rarely used. For a small company this is costly: to expose an internal service externally, you have to develop your own proxy server, and every newly added command code has to be adapted, which is very troublesome.
So in 2015 HelloTalk began to introduce OpenResty. OpenResty serves as a proxy in front and converts external protocols directly to the internal services, which cut a lot of cost.
In addition, a service that is exposed externally needs WAF functionality. Some of our early APIs were implemented in PHP, and vulnerabilities in the framework were found from time to time, which led attackers to try all kinds of injections and attacks. The main method was to POST various PHP keywords or to carry PHP keywords in the URL.
At that time we solved this problem by adding a small amount of regular-expression-based code to OpenResty. We later found that even with the WAF function added, the performance loss was small.
- TLV: 0x09 + Header (20 bytes) + Body + 0x0A
In the early days of IM development, everyone wanted protocols to be short and concise, and HelloTalk's protocol header was also relatively simple, with everything carried directly over TCP. Interestingly, by adding one special byte before and one after, we can delimit the content in between, that is, 0x09 + Header (20 bytes) + Body + 0x0A, which basically ensures that packets do not get mixed up. Without the 0x09 and 0x0A delimiters at the start and end of each packet, there is still some probability of parsing a wrong packet.
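The framing described above can be sketched in Python as follows. This is only an illustration: the talk specifies the 0x09 start byte, the 20-byte header, and the 0x0A end byte, but not the header layout. The layout assumed here (a 4-byte big-endian body length followed by 16 opaque bytes) and the names `pack_frame` and `unpack_frame` are hypothetical.

```python
import struct

STX, ETX = 0x09, 0x0A   # start and end delimiter bytes from the talk
HEADER_LEN = 20         # header size from the talk

def pack_frame(body: bytes, meta: bytes = b"\x00" * 16) -> bytes:
    """Wrap body as 0x09 + header(20) + body + 0x0A.

    Assumed header layout: 4-byte big-endian body length + 16 opaque bytes.
    """
    if len(meta) != 16:
        raise ValueError("meta must be exactly 16 bytes")
    header = struct.pack(">I", len(body)) + meta
    return bytes([STX]) + header + body + bytes([ETX])

def unpack_frame(frame: bytes) -> bytes:
    """Validate the delimiters and the length field, then return the body."""
    if len(frame) < HEADER_LEN + 2:
        raise ValueError("frame too short")
    if frame[0] != STX or frame[-1] != ETX:
        raise ValueError("bad start or end delimiter")
    (body_len,) = struct.unpack(">I", frame[1:5])
    body = frame[1 + HEADER_LEN:-1]
    if len(body) != body_len:
        raise ValueError("length field does not match body size")
    return body
```

Checking both the delimiters and the length field is what lets a receiver reject a frame whose boundaries have slipped, rather than silently parsing garbage.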
- Because of the development cost of a custom, non-HTTP protocol, an efficient proxy service was urgently needed for protocol conversion
In the early days, HelloTalk used a TLV + PB (Protocol Buffers) protocol. The business was growing quickly and needed to switch to REST + JSON for external access, so the first step was to convert PB to JSON.
Protocol parsing raised a problem. With OpenResty we used the pbc parser written by Yunfeng, which is very cumbersome to write parsing code for: you must know the inner structure of the message. If the structure has three levels of nesting, you have to write three levels of checking code and unpack it level by level. Later we found that Apache APISIX is based on lua-protobuf, so we switched to the lua-protobuf library, which can convert a PB object directly into JSON and is very convenient.
- Safe parsing of the TCP protocol based on cosocket
Protocol parsing basically consists of reading from the socket continuously: read the length field in the packet header shown in the figure above, then read the body. As you can see, parsing the protocol by hand is troublesome, because every protocol has to be adapted.
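The read loop just described can be sketched in Python. This is not the cosocket code from the talk: `stream` stands for any object with a blocking `read(n)` method (for example the result of `socket.makefile("rb")`), the position of the length field within the 20-byte header is an assumption, and `MAX_BODY` is a hypothetical safety limit.

```python
import struct

HEADER_LEN = 20
MAX_BODY = 1 << 20  # hypothetical cap to reject absurd length fields

def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes, looping because a read may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("connection closed mid-frame")
        buf.extend(chunk)
    return bytes(buf)

def read_frame(stream):
    """Read one 0x09 + header(20) + body + 0x0A frame from the stream."""
    if read_exact(stream, 1) != b"\x09":
        raise ValueError("missing start byte")
    header = read_exact(stream, HEADER_LEN)
    # Assumption: the first 4 header bytes hold the big-endian body length.
    (body_len,) = struct.unpack(">I", header[:4])
    if body_len > MAX_BODY:
        raise ValueError("body length exceeds limit")
    body = read_exact(stream, body_len)
    if read_exact(stream, 1) != b"\x0a":
        raise ValueError("missing end byte")
    return header, body
```

Bounding the length field before reading the body is the main defensive step: it keeps a malformed or hostile header from forcing a very large read.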
- Fast implementation of a web IM
After we finished the C++ IM communication service, we saw that mainstream IM apps such as WhatsApp and WeChat all had a web IM. Using OpenResty, we quickly adapted our protocol for compatibility, and in about two weeks we had implemented a web IM version of HelloTalk on the server side.
Like the web version of WeChat, it supports scanning a code to log in and then chatting. The protocol is basically unchanged; we only added a layer of OpenResty in the middle to do the WebSocket protocol conversion.
- Control message frequency
Once a service is exposed publicly, some people will send messages to everyone at high frequency, so we need to rate-limit messages. We implemented this directly with resty.limit.req. The same applies to API rate control.
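To illustrate the idea behind this kind of limiting, here is a small leaky-bucket sketch in Python. It is loosely modeled on the rate-plus-burst behavior of request limiters such as resty.limit.req, but it is not that library's code; the class name, the parameters, and the per-key dictionary are assumptions for illustration.

```python
import time

class LeakyBucketLimiter:
    """Per-key leaky bucket: `rate` requests/second with up to `burst` excess."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.burst = burst
        self._state = {}  # key -> (excess, last_seen_timestamp)

    def allow(self, key: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        record = self._state.get(key)
        if record is None:
            excess = 0.0  # first request from this key is always accepted
        else:
            old_excess, last = record
            # Drain the bucket for the elapsed time, then add this request.
            excess = max(old_excess - self.rate * (now - last) + 1.0, 0.0)
            if excess > self.burst:
                return False  # rejected requests do not update the state
        self._state[key] = (excess, now)
        return True
```

With `rate=2` and `burst=1`, three back-to-back messages from one user yield accept, accept, reject, and the user is accepted again one second later. The key could be a user ID for message limiting or a client IP for API limiting.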
- WAF protects the PHP service
Anyone who has done PHP development knows that most intrusions work by injecting PHP function names and keywords. After I put all the PHP function names into the WAF, we never found a successful attack again, yet the logs showed many attempts, which means they were all intercepted before they could reach PHP.
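A minimal Python sketch of this kind of keyword filter is shown below. The talk does not list the actual rules, so the keyword tuple, the regular expression, and the function name `is_suspicious` are assumptions; a real rule set would be far longer and tuned against false positives.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical sample of PHP function names; not the list used in the talk.
PHP_KEYWORDS = (
    "eval", "assert", "system", "exec", "passthru",
    "shell_exec", "base64_decode", "phpinfo",
)

# Match a keyword used as a call, e.g. "eval(" or "system (".
PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, PHP_KEYWORDS)) + r")\s*\(",
    re.IGNORECASE,
)

def is_suspicious(url: str, body: str = "") -> bool:
    """Return True if the URL or POST body contains a PHP call pattern."""
    # Decode percent-encoding first so "eval%28" is seen as "eval(".
    return any(PATTERN.search(unquote_plus(part)) for part in (url, body))
```

A gateway would run such a check before proxying and reject the request with an error status when it returns True, logging the attempt.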
1. Fast implementation of a pure TCP protocol;
2. HTTP service exposure based on OpenResty;
3. API gateway (Apache APISIX) plus Golang microservice development and governance.
Challenges and problems in the process of internationalization
- HelloTalk's users are spread over very scattered regions, and we need a way to serve such a dispersed user base;
- About 20% of HelloTalk's users are in China and face firewall problems;
- Overseas language environments are as complex as the network environments, so language adaptation is hard to handle.
How to improve the global access quality of users
I compared the solutions offered by many service providers on the market:
1. Alibaba Cloud Global Acceleration (BGP + dedicated line), which is layer 4 acceleration.
2. Alibaba Cloud DCDN whole-site acceleration.
3. AWS Global Accelerator.
4. UCloud's PathX.
5. Dedicated line service (a VPC at both ends, a dedicated line in the middle, HTTPS offloaded at the edge).
But we need to consider two issues: cost and the real quality of service.
When solving the cross-border problem, we had to take into account the location of the 20% of users in China and of the company's headquarters, so we deployed on Alibaba Cloud's whole-site acceleration. Originally, all traffic was proxied to Alibaba Cloud in Hong Kong, with a VPC on each side and a dedicated line in the middle, and the public network as the alternative path. But we sometimes saw latency increase because of jitter on the dedicated line, so we built an OpenResty-based gateway proxy in Shenzhen. In practice, when the dedicated line is unavailable, traffic falls back to the public network; the public network latency is about 14 ms, versus 4 ms on the dedicated line.
This involves upstream health checking: when one line is blocked, you need to switch quickly to another. For this part we rely on a resty library provided by UPYUN.
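The switching logic can be illustrated with a small Python sketch. It is not the resty library mentioned above: the `Upstream` class, the failure threshold of 3, and `pick_upstream` are assumptions, and only the 4 ms and 14 ms figures come from the talk.

```python
from dataclasses import dataclass

@dataclass
class Upstream:
    name: str
    latency_ms: float
    fail_threshold: int = 3  # hypothetical: consecutive failures before marking down
    failures: int = 0

    def report(self, ok: bool) -> None:
        """Record a probe or request result; any success resets the counter."""
        self.failures = 0 if ok else self.failures + 1

    @property
    def healthy(self) -> bool:
        return self.failures < self.fail_threshold

def pick_upstream(upstreams):
    """Choose the lowest-latency upstream that is currently healthy."""
    healthy = [u for u in upstreams if u.healthy]
    if not healthy:
        raise RuntimeError("no healthy upstream available")
    return min(healthy, key=lambda u: u.latency_ms)

# Usage: prefer the dedicated line, fall back to the public network.
dedicated = Upstream("dedicated-line", latency_ms=4)
public = Upstream("public-network", latency_ms=14)
lines = [dedicated, public]
```

After three consecutive failed reports on `dedicated`, `pick_upstream(lines)` returns `public`; one successful report brings the dedicated line back.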
Between Alibaba's data center in Hong Kong and Tencent's data center in Hong Kong, it feels as if they are in the same zone: the latency we measured is about 0.3 ms to 0.4 ms.
Other overseas users are basically accelerated directly back to Alibaba Cloud in Hong Kong. With direct acceleration, however, the client's network quality is heavily affected by geography, so we set up several failover mechanisms to protect the user experience.
Access line control and flow management
- For example, the latency from Europe to Hong Kong is 244 ms to 150 ms;
- Dynamic upstream control (lua-resty-healthcheck) keeps the service reliable by switching flexibly between multiple service providers;
- Part of the logic can be handled directly at the edge, in a serverless style (the principle is based on pcall + loadstring). We are now migrating this serverless logic to Apache APISIX + etcd.
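The pcall + loadstring principle is to compile a snippet of source code at runtime and run it in protected mode, so that a faulty snippet cannot crash the gateway. A rough Python analogue, under the assumption that each snippet defines a function named `handle`, might look like this. The function names are hypothetical, and the sketch does no sandboxing, which a real deployment would need.

```python
def load_handler(source: str):
    """Compile a snippet and return its `handle` function (loadstring analogue)."""
    namespace = {}
    exec(compile(source, "<edge-snippet>", "exec"), namespace)
    handler = namespace.get("handle")
    if not callable(handler):
        raise ValueError("snippet must define handle(request)")
    return handler

def run_protected(handler, request):
    """Call the handler and capture any error (pcall analogue)."""
    try:
        return True, handler(request)
    except Exception as exc:
        return False, str(exc)

# Usage: in practice the snippet text would come from a config store such as etcd.
snippet = """
def handle(request):
    return {"status": 200, "body": request["path"].upper()}
"""
ok, result = run_protected(load_handler(snippet), {"path": "/ping"})
```

Here `ok` is True and `result` is the dictionary returned by the snippet; a snippet that raises yields `(False, <error message>)` instead of an unhandled exception.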
Access node and quality control
At present, HelloTalk's access nodes are mainly located in the eastern United States, Frankfurt, Singapore, Tokyo, and Hong Kong. Traffic from the United States may not always be able to reach Hong Kong directly; in that case it is routed back to Hong Kong through Germany according to the established mechanism. Japan and South Korea also route back to Hong Kong. We have many users in Brazil as well, but among cloud vendors only AWS has a presence there, so Brazilian users basically connect to the United States; if that connection fails, the client chooses among multiple lines. In principle this routing is handled by the cloud or CDN vendors, but we found that there are always some regions they do not serve well. So, to protect the user experience, we need our own failover mechanism that switches between multiple service providers and keeps the service reliable.
Choice of layer 7 and layer 4 acceleration
Many service providers offer both layer 7 and layer 4 acceleration, but each has problems to solve.
- Layer 4 acceleration: the SSL handshake takes too long and fails easily, the client's IP cannot be obtained, and it is inconvenient to collect node quality statistics.
Layer 4 acceleration cannot obtain the client's IP. (Note: some cloud vendors support it, but it requires applying a patch on the server.) The vendor passes this information inside the TCP packet, which is not very friendly. If the patch causes a problem, who takes responsibility?
In addition, monitoring quality becomes a problem. We need to know which line is good and which is not. Even with a switching mechanism in place, we need to know the real route that traffic takes. In practice, we carry the real client IP through every proxy layer. With Alibaba Cloud, for instance, the platform fills in a header for us and keeps passing the client's real IP on to the next node.
- Layer 7 acceleration: it cannot guarantee the long-lived connections that an IM service needs for reliable message delivery
The problem with layer 7 acceleration is that it turns the IM mechanism into long polling or short-connection polling. In practice we found that this consumes a lot of traffic, and an IM service needs long-lived connections to deliver messages reliably and promptly. However, most layer 7 acceleration products do not support WebSocket, and those vendors that do support WebSocket with HTTPS offload at the edge are very expensive, especially foreign ones such as AWS. In addition, if a cloud vendor's edge node goes down, users are badly affected. So we built a lot of logic into the client (a built-in IP mechanism) that spans multiple cloud vendors. When a failure occurs, the client can switch to another node and keep the connection quality.
Management scheme for global access in a multi-cloud environment
- Layer 7 acceleration that supports WebSocket. (Cloud service + self-built)
- Self-built low-rate VPC + dedicated channel. (Considering cost-effectiveness: IM does not carry much traffic; it mostly sends notification messages.)
- Mixed message sending and receiving: WebSocket + long polling + HTTPDNS + built-in IP failover mechanism
Of course, which IPs to build into the client is also a question. European users, for example, should certainly be given European IPs. That means we first have to store the European server IPs. How do we store them, and when? Here we allocate, cache, and update them through Tencent Cloud's HTTPDNS combined with the OpenResty timer mechanism. The IP in the figure above is the user's real IP; the HTTPDNS provider resolves the domain name according to this IP parameter.
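The allocate, cache, and update cycle can be sketched in Python as a small TTL cache. This does not call any real HTTPDNS API: `resolver` is an injected callable standing in for the provider lookup, and the class name, TTL value, and serve-stale behavior are assumptions for illustration.

```python
import time

class ResolvedIPCache:
    """Cache resolver results per (domain, client_ip) with a time-to-live."""

    def __init__(self, resolver, ttl: float = 300.0, clock=time.monotonic):
        self._resolver = resolver  # hypothetical: resolver(domain, client_ip) -> list of IPs
        self._ttl = ttl
        self._clock = clock
        self._entries = {}         # (domain, client_ip) -> (ips, expires_at)

    def get(self, domain: str, client_ip: str):
        key = (domain, client_ip)
        now = self._clock()
        cached = self._entries.get(key)
        if cached and cached[1] > now:
            return cached[0]       # still fresh
        try:
            ips = self._resolver(domain, client_ip)
        except Exception:
            if cached:
                return cached[0]   # resolver failed: serve the stale answer
            raise
        self._entries[key] = (ips, now + self._ttl)
        return ips
```

A periodic timer could call `get` for the known keys to refresh entries before they expire, so that client requests are normally answered from the cache.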
From a self-built API gateway to in-depth experience with Apache APISIX
Self-built API gateway with pseudo-dynamic configuration
In the early days we edited nginx.conf directly, because I believed that bare nginx would certainly give the highest performance. The problem is that many people do not remember the priority rules for location matching, and we often had to correct mistakes. Moreover, our requirements were relatively fixed: dynamically update SSL certificates, locations, and upstreams. Our practice at the time was similar to today's Kubernetes update mechanism, that is, generate nginx.conf from a template: template.conf + JSON -> PHP -> nginx.conf -> PHP CLI -> reload. After we came across Apache APISIX, we could consider replacing this approach.
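The template + JSON -> nginx.conf step can be sketched as follows. The original pipeline used PHP; this Python version only illustrates the idea, and the template contents, JSON field names, and `render_nginx_conf` are assumptions. Validating the result (for example with `nginx -t`) and reloading nginx would follow as separate steps.

```python
import json
from string import Template

# Hypothetical template covering the three things that changed most often:
# the upstream list, the SSL certificate, and the location.
SERVER_TEMPLATE = Template("""\
upstream $name {
$servers
}
server {
    listen 443 ssl;
    server_name $domain;
    ssl_certificate $cert;
    ssl_certificate_key $key;
    location $location {
        proxy_pass http://$name;
    }
}
""")

def render_nginx_conf(config_json: str) -> str:
    """Render an nginx server block from a JSON description."""
    cfg = json.loads(config_json)
    servers = "\n".join(f"    server {addr};" for addr in cfg["servers"])
    return SERVER_TEMPLATE.substitute(
        name=cfg["name"], servers=servers, domain=cfg["domain"],
        cert=cfg["cert"], key=cfg["key"], location=cfg["location"],
    )
```

Because `Template.substitute` raises `KeyError` on a missing field, an incomplete JSON document fails at render time instead of producing a broken configuration file.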
Why Apache APISIX became HelloTalk's choice:
- Our own requirements are relatively simple, and relying on an RDBMS would be too heavy and would bring extra maintenance cost;
- The code is very simple and easy to follow, within what one person can reasonably understand;
- It is based on etcd, which saves maintenance cost;
- The main maintainers of the project provide almost real-time online support, and responses in the QQ group and by e-mail are timely.