JD Zhilian Cloud, the cornerstone of JD Group's technical support, carried the PB-level data growth of the core business systems of JD Logistics and JD Retail during 11.11, with peak access reaching 258% of the peak during 618 in 2020. As of the early morning of November 12, JD's cumulative order amount for 11.11 exceeded 271.5 billion yuan, up 33% from 2019.
Facing the rapid yearly growth in orders and turnover for JD 618 and 11.11, the JD Zhilian Cloud database service, which supports most of the business systems behind JD, faces great challenges. This article shares how the JD Zhilian Cloud database department kept services stable and secure under the sudden surge in data pressure during the 11.11 promotion, and introduces the technical means used to safeguard JD's massive order volume.
Every stage in the life cycle of a user's order at JD (product search, purchase, add to cart, place order) requires a real-time response, especially placing the order. During the 11.11 promotion, because of product campaigns and time-limited offers, users tend to place orders in fixed time windows, such as the small peak just after midnight. When orders surge within such a concentrated window, the QPS in that window amounts to 30-50% of the whole-day total at ordinary times. To keep the order life cycle and query operations real-time under this load, JD uses the cloud database JCHDB as the technical cornerstone for data analysis.
The analytical cloud database JCHDB is an online analytical processing (OLAP) service built by JD Zhilian Cloud on ClickHouse. It adopts a distributed architecture and runs large queries in parallel across multiple cores and multiple nodes. Its query performance is 1 to 2 orders of magnitude faster than that of traditional open source databases, which fully meets the data analysis needs of business systems during the promotion.
Real-time analysis of JD's massive order volume requires fast queries and concurrent processing from the data warehouse. The order life cycle data is consumed from Kafka in real time, flows through Flink, and is then written to the JCHDB cluster. This scenario must guarantee real-time data analysis while also preserving cluster performance, so that query response time stays unchanged. JCHDB supports a write speed of about 50-200 MB/s for batch writes, but real-time order analysis tends to write small batches at high frequency, which is unfriendly to the cluster's ZooKeeper nodes. By tuning ZooKeeper's JVM parameters and the concurrency of the cloud disk, the cloud database team enabled users' high-frequency, small-batch writes to run stably on the JCHDB cluster.
In real-time write scenarios, data can be written faster than it can be merged, which results in frequent write failures. The cloud database team optimized the write path to the JCHDB cluster: provided that real-time query performance stays constant, data is written in batches as large and as infrequent as possible, so that the cluster's high-performance advantages are fully used. The team also tuned the cluster's parameters so that it serves business systems well under heavy pressure, which improves the overall stability of the system.
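The idea of trading many small writes for fewer large ones can be sketched as a buffer that flushes when it is full or when its oldest row has waited too long. This is a minimal illustration, not the team's implementation: the class name `BatchBuffer`, the thresholds `max_rows` and `max_wait_s`, and the `flush` callback (standing in for whatever actually writes to the cluster) are all assumptions made for the example.

```python
import time
from typing import Callable, List, Optional


class BatchBuffer:
    """Accumulate rows and hand them to `flush` in large, infrequent batches.

    Hypothetical helper: names and thresholds are illustrative only.
    """

    def __init__(
        self,
        flush: Callable[[List[dict]], None],
        max_rows: int = 10000,
        max_wait_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush            # callback that performs the real write
        self._max_rows = max_rows      # size bound: flush when this many rows wait
        self._max_wait_s = max_wait_s  # latency bound: flush when oldest row is this old
        self._clock = clock
        self._rows: List[dict] = []
        self._first_row_at: Optional[float] = None

    def add(self, row: dict) -> None:
        """Buffer one row; flush immediately if the size bound is reached."""
        if not self._rows:
            self._first_row_at = self._clock()
        self._rows.append(row)
        if len(self._rows) >= self._max_rows:
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes when the oldest buffered row is too old."""
        if self._rows and self._clock() - self._first_row_at >= self._max_wait_s:
            self.flush()

    def flush(self) -> None:
        """Write out everything buffered so far as a single batch."""
        if not self._rows:
            return
        batch, self._rows = self._rows, []
        self._first_row_at = None
        self._flush(batch)
```

The size bound keeps each write large, while the time bound caps how stale buffered rows can become, which is the constraint the text describes as keeping real-time query performance constant.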
The overall flow chart of JCHDB's analysis of massive real-time order data is as follows:
During 11.11, the JD Zhilian Cloud database team needs to steadily support the thousands of JD Group core business systems that have moved to the cloud, withstanding millions of QPS and PB-level data during the promotion. Early plan preparation, stress testing, plan drills and real-time monitoring are all essential. During the 11.11 promotion, cloud resources can be expanded on demand, and a complete business degradation plan is prepared to cope with sudden business pressure.
Every year, preparing for the 11.11 promotion is a critical moment when all departments of JD Group work together. Based on experience from 618, preparation for the 11.11 promotion is divided into 8 steps:
(1) Identify the scope of support;
(2) Business volume estimation and pre-inspection;
(3) Plan arrangement;
(4) Monitoring and alarm sorting;
(5) Business pressure test;
(6) Plan drill;
(7) 11.11 duty;
(8) Technical recovery.
For specific technical details of promotion preparation, please refer to the Jingdong promotion and preparation manual.
8-step flow chart of promotion preparation:
Plan preparation is an important part of how the cloud database keeps massive data stable and secure during 11.11. Through database-level technical means such as a high availability service architecture, automatic failover and an elastic capacity expansion mechanism, RDS has defined plans and response mechanisms in place for the 11.11 promotion: data can be backed up, failures can be switched over, and capacity can be expanded incrementally, so the massive data pressure of the promotion is handled calmly. Customers only need to focus on growing their business, without worrying about the trouble that rapid data growth brings to operations and maintenance, or about business pressure.
Data can be backed up: The JD Zhilian Cloud database adopts a high availability architecture and supports two deployment modes: single availability zone and multiple availability zones. When a single availability zone deployment is created, anti-affinity ensures that the primary and standby are never placed in the same rack, so a single rack failure cannot make the instance inaccessible. In multi availability zone deployments, network latency is kept within 2 milliseconds to keep standby replication timely. The self-developed sentinel system supports dynamic expansion and can monitor a large number of instances. It probes instances whose heartbeat reports are abnormal over HTTP, TCP and other protocols; when a majority of sentinel nodes vote that an instance is down, the instance is marked objectively offline and the automatic high availability process is initiated.
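The majority vote among sentinel nodes can be illustrated with a short sketch. The function name and the shape of the `votes` mapping are assumptions for the example, and "most sentinel nodes" is read here as a strict majority; the actual sentinel system may use a different quorum rule.

```python
from typing import Dict


def is_objectively_down(votes: Dict[str, bool]) -> bool:
    """Return True when a strict majority of sentinels report the instance down.

    `votes` maps a sentinel node name to True if that node considers the
    instance down. Hypothetical helper, not the real sentinel API.
    """
    if not votes:
        # No sentinel reported anything, so there is no basis for a failover.
        return False
    down_votes = sum(1 for is_down in votes.values() if is_down)
    return down_votes > len(votes) // 2
```

With three sentinels, two "down" votes are enough to mark the instance objectively offline; with four, a 2-2 split is not, which avoids triggering failover on a tie.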
Schematic diagram of cloud database high availability architecture:
Failover: Another advantage of the high availability architecture is second-level failover. When the cloud database management node receives an automatic high availability request from the sentinel system, it probes the faulty instance again, checks connectivity from both the user subnet and the management network, and starts the failover only after confirming that the primary database is truly offline. To ensure data integrity, the standby database first replays and applies its logs, and then the target IP behind the VIP is switched, so that the service becomes accessible again after a brief interruption. At the same time, the system automatically creates a new standby database to restore the high availability architecture.
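The order of operations in this failover flow can be sketched as follows. Every parameter is a hypothetical callback standing in for a real management-node action, and the example assumes the primary counts as "truly offline" only when it is unreachable from both the user subnet and the management network.

```python
from typing import Callable


def run_failover(
    primary_reachable_from_user_subnet: Callable[[], bool],
    primary_reachable_from_mgmt_network: Callable[[], bool],
    replay_standby_logs: Callable[[], None],
    switch_vip_to_standby: Callable[[], None],
    create_new_standby: Callable[[], None],
) -> bool:
    """Run the failover steps in order; return True if a failover happened.

    All callbacks are illustrative placeholders, not a real API.
    """
    # Step 1: probe again from both networks before acting on the request.
    # Assumption: reachability from either network means the primary is alive.
    if primary_reachable_from_user_subnet() or primary_reachable_from_mgmt_network():
        return False

    # Step 2: let the standby replay and apply pending logs first,
    # so that no committed data is lost when it takes over.
    replay_standby_logs()

    # Step 3: point the VIP at the standby so clients reconnect to it.
    switch_vip_to_standby()

    # Step 4: rebuild a standby to restore the high availability pair.
    create_new_standby()
    return True
```

Replaying logs before switching the VIP matters: reversing those two steps would let clients read from a standby that has not yet caught up.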
Automatic failover process of cloud database:
Incremental capacity expansion: The JD Zhilian Cloud database team and the cloud disk team cooperated closely to optimize read-write performance and to make full use of the cloud disk's elastic expansion and incremental snapshot technology, which makes database expansion simple and efficient. Instances on cloud disk can be expanded to any storage size within 3 to 5 minutes. Instances on local disk support in-place vertical expansion, which takes effect within seconds through online hot expansion to meet users' capacity needs.
Schematic diagram of cloud disk based capacity expansion of cloud database:
During 11.11, the JD Zhilian Cloud database department not only supported real-time ordering and analysis over massive order data, but also relied on a series of standardized schemes and a complete technical preparation process to meet the challenges of massive data and business load. During the “good start” of 11.11 this year, the overall QPS peak of the cloud database reached 5,024,000 queries per second and peak data traffic reached 1183 Gbps. JD Zhilian Cloud carried the traffic peak and ensured the smooth operation of all business systems during the promotion.
The JD Zhilian Cloud database provides one-stop database services covering creation, configuration, capacity expansion, monitoring and alerting, and performance analysis, and has moved from self-service operation and maintenance to automated operation and maintenance. It supports a large number of core businesses such as JD Retail, JD Logistics, JD AI and JD Health, and meets the challenges of peak data traffic and business pressure during the promotion through a series of standardized schemes and preparation processes. It also offers high service availability, high data reliability and online elastic scalability, which lets it withstand sudden peak pressure and demanding business scenarios. Having been tested through many major promotions, it is the best choice for enterprises moving to the cloud.
- 11.11 war preparation guide: DevOps
- 11.11 war preparation guide: PaaS
- 11.11 war preparation guide: security