Industry case | Application of MongoDB in Tencent retail Youma

Time:2022-5-25

This article describes how the Youma business of the Tencent smart retail team uses MongoDB. Adopting Tencent Cloud MongoDB as the main storage service has brought the business clear benefits: high performance, fast DDL operations, low storage cost and large storage capacity. Together these greatly reduce storage costs and make iterative development more efficient.

I Business scenario

From connecting consumers to connecting channel terminals, Tencent Youma builds on the digitization of goods to help enterprises upgrade digitally, including upgraded marketing and dynamic marketing capabilities. Tencent Youma consists of three sub-products: genuine product link, store link and member link.

Overall view of Tencent Youma

1.1 Genuine product link (Zhengpintong)

Tencent Youma genuine product link provides anti-counterfeiting and authentication capabilities. It traces each genuine product through the whole process with one code per item, and stores the full-link data on a blockchain to guarantee authenticity. It can also lead consumers directly to the brand's private domain to further convert traffic. In addition, Zhengpintong provides brand protection within the WeChat ecosystem: it blocks the spread of counterfeit brand websites and helps consumers identify counterfeit goods.

The product mainly includes the following core features:


1.2 Store link

Tencent Youma store link serves the four core roles of the retail chain: brands, dealers, industry representatives and terminal stores. Centered on the terminal sales store, it upgrades sales management and sales promotion.

The product mainly includes the following core features:


1.3 Member link

Tencent Youma member link is a SaaS plus customized-service product for retail brands. It takes code scanning as the starting point and connects online and offline scenarios. It provides a rich set of code-scanning and interactive activity models, together with an activity evaluation system, to help brands connect with consumers.

The product mainly includes the following core features:


II Code storage selection

The Tencent smart retail Youma business stores the QR code information of retail goods. This is the core data of smart retail and underpins the services that go "from connecting consumers to connecting channel terminals, realizing the digital upgrading of enterprises based on the digitization of goods". Code data storage is therefore the core problem of the project.

2.1 Requirements and candidate schemes

To solve the code storage problem, we first analyzed its characteristics. The main ones are:

Massive data: Tencent Youma generates the QR codes for commodities. As more and more commodities use Tencent Youma, the volume of QR code data has started to grow exponentially.

Association storage: codes have 1:1 and 1:n:n relationships with each other. These relationships must be stored and must support the corresponding association queries.

Multi-dimensional query: different application scenarios require conditional queries along different dimensions.

With these characteristics identified, and after investigating several options, we shortlisted two storage schemes:

  1. MySQL + ES: MySQL stores the code data in sharded databases and tables and serves the read and write scenarios that require high performance. Part of the data is then synchronized to ES as needed to handle the more complex query scenarios.
  2. MongoDB: MongoDB is the highest-ranked distributed NoSQL database worldwide. Its core features are a schema-free data model, high availability and distribution, which make it well suited to distributed storage.

2.2 Scheme analysis

2.2.1 MySQL + ES scheme analysis

MySQL + ES is a common storage solution that is widely used in many fields, for example to store member or commodity information. Its advantage is that it offers many query methods with different performance guarantees, so it can handle a wide variety of complex business queries.

In the common MySQL + ES architecture, writes go directly to MySQL, data changes are synchronized to ES through Canal + Kafka, and queries are served from MySQL or ES depending on the scenario. The following figure shows a possible architecture for the Tencent Youma business scenario:


As the architecture diagram shows, this scheme has several problems:

Data synchronization and consistency: this is not a concern when the data volume is small. At 10 billion or even 100 billion records, however, it becomes a very serious problem.

Data capacity: in general, a single MySQL table is best kept at around one million rows. If a single table grows too large, both reads and writes suffer. Storing hundreds of billions of records therefore requires thousands of tables or more. When the business itself has to maintain that many sharded tables, development and operations become almost unmanageable.

Cost: redundant data storage adds storage cost. ES also needs more machines and memory to guarantee data reliability and query performance. Moreover, ES inflates the data it stores, so the same data needs more disk space than in MySQL.

DDL operation and maintenance: once MySQL is split into sharded databases and tables, a DDL statement has to be applied to a large number of tables, which is very time-consuming and error-prone. In our previous project experience, with hundreds of tables and hundreds of thousands of rows per table, a simple DDL statement that adds a field takes an hour or more to complete.

Development cost: this scheme requires the business to maintain the sharded databases and tables, synchronize data, and choose a query engine for each requirement. The architecture as a whole is complex, and every business requirement has to be designed with care. Accidentally querying the wrong storage engine can lead to performance problems.

Horizontal scaling: to expand a sharded MySQL deployment, the business has to rehash the data manually. The cost is very high, and it is hard to handle reads and writes while the expansion is in progress.

2.2.2 MongoDB scheme analysis

MongoDB is a well-known distributed storage engine with many advantages, such as a schema-free data model, high availability, distribution and data compression. Although MongoDB is a NoSQL store, its WiredTiger storage engine, like InnoDB, is built on a B+ tree. MongoDB can therefore provide most of the query methods that MySQL supports while also providing distributed storage. As a result, with MongoDB we need neither redundant MySQL tables nor ES to support most distributed queries. In the Tencent Youma scenario, the MongoDB-based storage architecture is shown in the figure below:


As the figure shows, MongoDB avoids the data synchronization and consistency problems, the storage cost, and the resource, operations and development costs caused by redundant storage. After further testing and analyzing MongoDB's functionality and performance, we found the following additional advantages:

No DDL problem: because MongoDB is schema-free, MySQL's DDL problem does not arise.

Automatic data balancing: MongoDB rebalances automatically. When data is unevenly distributed, it migrates data so that the load stays even across shards.

Lower cost: MongoDB has built-in data compression, so it needs less disk space for the same data.

Higher performance: MongoDB makes maximal use of memory and, in most scenarios, performs close to an in-memory database. In our tests, the read performance of a single MongoDB shard is about 30,000 QPS.

More ways to read and write: MongoDB has no inverted index like ES, so the range of queries it supports is slightly narrower than that of ES. However, MongoDB covers most of the query capabilities of ES with much higher performance. Compared with MySQL, MongoDB's field types also support embedded objects and arrays, so it can meet more read and write requirements.
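The embedded objects and arrays mentioned above can be illustrated with a plain JavaScript object shaped like a MongoDB document. This is only a sketch: the box, carton and item hierarchy and every field name are hypothetical and are not the schema used by the business.

```javascript
// Hypothetical code document; all field names are assumptions made for
// illustration, not the real Youma schema.
const boxCode = {
  id: "BOX-0001",
  product: { sku: "SKU-42", name: "Sample product" }, // embedded object (1:1)
  cartons: [
    // array of embedded objects (1:n), each with its own nested array
    { id: "CARTON-01", items: ["ITEM-001", "ITEM-002"] },
    { id: "CARTON-02", items: ["ITEM-003"] },
  ],
};

// Walk the nested arrays to find which carton a given item code belongs to.
function findCartonOfItem(doc, itemId) {
  const carton = doc.cartons.find((c) => c.items.includes(itemId));
  return carton ? carton.id : null;
}

console.log(findCartonOfItem(boxCode, "ITEM-003"));
```

Keeping the related codes inside one document means a single read returns the whole association, whereas a relational layout would need a join across tables.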

2.3 Scheme comparison

The analysis above led us to a preliminary judgment that MongoDB is the better fit. To confirm its advantages, we compared MySQL + ES and MongoDB in depth across several aspects.

2.3.1 Storage cost comparison

MongoDB's storage advantages come mainly from two things: data compression and the absence of redundant storage.

To see disk usage more directly, we simulated the actual storage under MySQL + ES and under MongoDB for the Tencent Youma business scenario.

On the one hand, the MySQL + ES scheme has to combine redundant ES data with redundant MySQL tables to meet the requirements. The core code data stored in MySQL accounts for only 38.1% of the total disk usage. As mentioned earlier, MongoDB does not need redundant storage, so using MongoDB reduces the total data volume by 61.9%.


On the other hand, in tests on the same code data, MongoDB's snappy compression achieves a compression ratio of about 3:1 and its zlib compression about 6:1. So even though the business chooses snappy to keep the system stable, MongoDB still needs only one third of the disk space that MySQL uses.
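The two savings can be combined into a rough back-of-the-envelope estimate. The sketch below uses only the figures quoted in this section and assumes the compression ratio applies uniformly to the core code data, which is a simplification rather than a measured result.

```javascript
// Figures quoted in this section; combining them is a rough estimate only.
const coreShareOfTotal = 0.381; // core code data as a share of MySQL + ES disk usage
const snappyRatio = 3;          // approximate snappy compression ratio from the test
const zlibRatio = 6;            // approximate zlib compression ratio from the test

// Disk needed by MongoDB relative to the full MySQL + ES footprint,
// assuming only the core data is stored and it compresses uniformly.
function relativeDisk(coreShare, compressionRatio) {
  return coreShare / compressionRatio;
}

const withSnappy = relativeDisk(coreShareOfTotal, snappyRatio);
const withZlib = relativeDisk(coreShareOfTotal, zlibRatio);

console.log(`snappy: about ${(withSnappy * 100).toFixed(1)}% of the MySQL + ES footprint`);
console.log(`zlib: roughly ${Math.round(withZlib * 100)}% of the MySQL + ES footprint`);
```

Under these assumptions, dropping the redundant copies and compressing with snappy leaves MongoDB at roughly one eighth of the disk footprint of the MySQL + ES scheme.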


2.3.2 Development, operation and maintenance cost

No data synchronization link: MongoDB does not require data synchronization, so there is no Canal service or Kafka queue to maintain. This greatly reduces the difficulty of development, operation and maintenance.

Labor cost: under the MySQL + ES architecture, every field added or changed in the MySQL cluster costs some person-days of operations work and carries a risk of business jitter. It also delays iterative business releases, so it is both time-consuming and risky.

Development and maintenance cost: the MongoDB storage architecture is simple. There is a single store and no data consistency pressure.

Dynamic scaling: MongoDB supports dynamic capacity expansion at any time, so there is essentially no capacity ceiling. MySQL requires the data to be rehashed manually during expansion, and the business has to guarantee data consistency and integrity.

2.3.3 Performance comparison

In stress tests on the same 4-core, 8 GB (4c8g) machine configuration, MySQL and MongoDB have basically the same write performance under large data volumes. MySQL's read performance is about 6,000 QPS, and ES reaches only about 800 QPS. The read performance of a single MongoDB shard is about 30,000 QPS, which is much higher than that of MySQL and ES.
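To put these numbers side by side, the sketch below estimates how many 4c8g nodes each engine would need to serve a given read load. The target of 30,000 read QPS is a hypothetical workload, and the estimate ignores replication, headroom and query mix, so treat it as an illustration rather than a sizing guide.

```javascript
// Approximate per-node read throughput from the stress test above.
const readQpsPerNode = { mysql: 6000, es: 800, mongodb: 30000 };

// Minimum number of nodes needed to serve targetQps, ignoring replication
// and safety headroom (a deliberate simplification).
function nodesNeeded(targetQps, perNodeQps) {
  return Math.ceil(targetQps / perNodeQps);
}

const targetQps = 30000; // hypothetical read load
for (const [engine, qps] of Object.entries(readQpsPerNode)) {
  console.log(`${engine}: ${nodesNeeded(targetQps, qps)} node(s)`);
}
```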

2.3.4 Summary

The analysis and comparison above show that MongoDB has advantages in every aspect we examined. To make the differences between the schemes easier to see, here is comparative data across five aspects: function, performance, cost, scalability and maintainability:


To sum up, MongoDB is better than the alternatives in terms of service cost and maintainability. MongoDB therefore not only meets the core requirements of Tencent Youma but also fully meets the maintenance requirements of its other services.

III Optimization of the MongoDB sharded cluster

The retail Youma business is cost-sensitive and stores a large amount of data, while the real online read and write traffic is not very high (about 30,000 QPS is required). The cluster is therefore deployed on low-specification nodes of 4 cores and 8 GB each (4c8g per node).

3.1 Shard key selection and pre-splitting

Retail Youma data is queried by code ID, so the code ID is chosen as the shard key. This maximizes query performance, because an indexed query by code ID is served from a single shard. In addition, to avoid moveChunk operations caused by data imbalance between shards, we use hashed sharding and pre-split the collection in advance. MongoDB supports pre-splitting for hashed shard keys by default. Taking the Youma code detail table as an example, the pre-splitting is done as follows:

use db_code_xx  
sh.enableSharding("db_code_xx")  
// n is the actual number of shards
sh.shardCollection("db_code_xx.t_code_xx", {"id": "hashed"}, false,{numInitialChunks:8192*n})
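The idea behind hashed sharding with pre-splitting can be sketched in plain JavaScript. This is a conceptual model only: FNV-1a stands in for the hash function, the round-robin chunk placement is an assumption, and the shard and chunk counts are hypothetical. None of it reproduces MongoDB's internal implementation.

```javascript
// FNV-1a is a stand-in hash chosen to keep the sketch dependency-free;
// it is NOT the hash MongoDB uses for hashed shard keys.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return h;
}

// Pre-split the 32-bit hash space into equal ranges (chunks) and place the
// chunks on shards round-robin before any data is written.
function buildChunkMap(numShards, chunksPerShard) {
  const numChunks = numShards * chunksPerShard;
  const width = 2 ** 32 / numChunks;
  return { numShards, numChunks, width };
}

// A lookup by code ID hashes the ID, finds its chunk, and lands on one shard.
function routeToShard(chunkMap, codeId) {
  const chunk = Math.min(
    Math.floor(fnv1a(codeId) / chunkMap.width),
    chunkMap.numChunks - 1
  );
  return chunk % chunkMap.numShards;
}

const chunkMap = buildChunkMap(4, 8); // hypothetical: 4 shards, 8 chunks per shard
const counts = new Array(chunkMap.numShards).fill(0);
for (let i = 0; i < 10000; i++) {
  counts[routeToShard(chunkMap, `code-${i}`)]++;
}
console.log(counts); // number of sample IDs routed to each shard
```

Because the chunk ranges exist before the first write, new documents spread across the shards from the start instead of piling up on one shard and being migrated later.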

3.2 Setting the balancing window for off-peak hours

When chunk data is unbalanced between shards, automatic balancing is triggered. Because the MongoDB instance nodes are low-specification (4c8g), the balancing process causes the following problems:

CPU consumption is too high; the migration process can consume about 90% of the CPU

Business access jitters and latency increases

Slow-query log entries increase

Abnormal alarms increase

These problems are caused by the moveChunk data migration that takes place during balancing. To migrate data from one shard to another quickly, MongoDB keeps moving data continuously, which consumes a lot of CPU and results in business jitter.

The MongoDB kernel takes into account that balancing affects the business, so it supports a balancing window by default. With it, balancing can be scheduled away from business peaks, which avoids as much of the jitter caused by data migration as possible. For example, to set the balancing window to the off-peak period from 00:00 to 06:00, use the following commands:

use config  
db.settings.update({"_id":"balancer"},{"$set":{"activeWindow":{"start":"00:00","stop":"06:00"}}},true) 
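The effect of an active window can be illustrated with a small check of whether a given time of day falls inside it. This is a sketch of the concept, not MongoDB's balancer code. The helper names are hypothetical, and treating the stop time as exclusive is an assumption.

```javascript
// Convert an "HH:MM" string to minutes since midnight.
function toMinutes(hhmm) {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

// Return true when `now` falls inside the window [start, stop).
// Windows that cross midnight (for example 22:00 to 02:00) are handled too.
function inActiveWindow(now, start, stop) {
  const t = toMinutes(now);
  const s = toMinutes(start);
  const e = toMinutes(stop);
  return s <= e ? t >= s && t < e : t >= s || t < e;
}

console.log(inActiveWindow("03:30", "00:00", "06:00")); // inside the off-peak window
console.log(inActiveWindow("12:00", "00:00", "06:00")); // business hours, outside it
```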

3.3 Write majority optimization

Because the QR code data is core data, the client uses the writeConcern = {w: "majority"} configuration to avoid data loss and rollback in extreme cases. With this setting, data is written to a majority of the replica set members before the write is acknowledged to the client.

The concept of chain replication: suppose there are three nodes, A (primary), B (secondary) and C (secondary). If node B synchronizes data from node A and node C synchronizes data from node B, then A -> B -> C forms a chain synchronization structure, as shown in the following figure:

A multi-node MongoDB replica set can support chain replication. You can check whether chain replication is allowed in the current replica set with the following command:

cmgo-xx:SECONDARY> rs.conf().settings.chainingAllowed  
true  
cmgo-xx:SECONDARY>   

In addition, you can tell whether chain replication is actually in use by looking at the synchronization source of each node in the replica set. If a node's synchronization source is a secondary, chain replication exists in the replica set. See the output below:

cmgo-xx:SECONDARY> rs.status().syncSourceHost  
xx.xx.xx.xx:7021  
cmgo-xx:SECONDARY> 

Since the service is configured with write majority, chain replication can be turned off for performance reasons. In MongoDB it can be disabled with the following commands:

cfg = rs.config()  
cfg.settings.chainingAllowed = false
rs.reconfig(cfg)  

Benefit of chain replication: it can greatly reduce the pressure on the primary node from oplog synchronization.

Drawback of chain replication: when the write concern is majority, write requests take longer.

With write performance in mind, when the business uses the write majority strategy, chain replication is simply turned off to avoid the write performance degradation caused by the longer replication path.
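The trade-off can be illustrated with a deliberately simplified model that counts replication hops. It assumes every hop takes the same time and uses a hypothetical five-member replica set. Real replication latency depends on many more factors, so this is a sketch of the reasoning and not a measurement.

```javascript
// Number of members that must hold a write before a majority acknowledges it.
function majorityCount(members) {
  return Math.floor(members / 2) + 1;
}

// depths[i] is the number of replication hops between the primary and
// member i (the primary itself has depth 0). The write is majority-committed
// once the majorityCount-th closest member has received it.
function hopsUntilMajority(depths) {
  const sorted = [...depths].sort((a, b) => a - b);
  return sorted[majorityCount(depths.length) - 1];
}

// Hypothetical five-member replica set.
const withoutChaining = [0, 1, 1, 1, 1]; // every secondary syncs from the primary
const fullChain = [0, 1, 2, 3, 4];       // each secondary syncs from the previous one

console.log(hopsUntilMajority(withoutChaining));
console.log(hopsUntilMajority(fullChain));
```

In this model the majority write waits for more hops whenever a member needed for the majority sits further down the chain, which is the longer write path that disabling chain replication avoids.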

About the author:
CSIG Tencent youcode team and Tencent mongodb team
