Evolution and practice of vivo cloud service massive data storage architecture

Time: 2021-12-6

1、 Foreword

vivo cloud service lets users back up the contacts, SMS messages, notes, bookmarks and other data on their phones. The underlying storage uses MySQL databases.

As the vivo cloud service business has grown, the number of users has risen rapidly, and so has the volume of data stored in the cloud. This massive amount of data puts great pressure on back-end storage; the biggest pain point of the cloud service business in recent years has been how to store users' massive data.

2、 Facing challenges

From 2017 to 2018, the core product metric for the cloud service was user growth, and the product strategy was adjusted accordingly: once a user logs in to a vivo account, the cloud service data synchronization switch is turned on by default.

This strategy brought explosive growth: the user base jumped from millions to tens of millions, and the volume of data stored in the back end grew from tens of billions to hundreds of billions of records.

To solve the massive-data storage problem, the cloud service has applied all four classic "axes" of database and table sharding: horizontal table sharding, vertical table sharding, horizontal database sharding and vertical database sharding.

1. Horizontal table sharding

Thorny road 1: what should we do when the browser bookmark and notes modules each hold more than 100 million rows in a single table in a single database?

Anyone familiar with database and table sharding will answer immediately: when a single table grows too large, split it. That is what we did, splitting the single table of each of the browser bookmark and notes modules into 100 tables.

The hundreds of millions of rows of browser bookmark and notes data were then migrated into the 100 sub-tables, leaving each table with about 10 million rows.

This is the familiar first axe: horizontal table sharding.
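The article does not spell out the routing rule; as a minimal hypothetical sketch, the target sub-table could simply be derived from the user ID modulo the number of tables:

-- Hypothetical routing: pick one of the 100 bookmark sub-tables by user_id mod 100
-- (table naming and sharding key are illustrative, not vivo's actual scheme)
SELECT CONCAT('t_bookmark_', LPAD(MOD(987654321, 100), 2, '0')) AS target_table;
-- returns 't_bookmark_21'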

2. Horizontal database sharding

Thorny road 2: the contact and SMS data had already been split into tables, but only into 50 tables at first, and the database itself had not been split. After the explosive user growth, the total contact data in the single database reached billions of rows, with 50 million rows in a single table, and continued growth would seriously hurt MySQL performance. What should we do?

The second axe, horizontal database sharding: if one database cannot carry the load, split it into several. We split the original single database into 10 databases and expanded the original 50 contact and SMS tables to 100 tables each. Over the same period we migrated and re-routed billions of rows of existing data, which was very painful.
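Again purely as a hypothetical illustration (the real routing rules are not given in the article), a two-level rule could first pick one of the 10 databases and then one of the 100 tables inside it:

-- Hypothetical two-level routing across 10 databases x 100 tables
SET @uid := 987654321;
SELECT MOD(@uid, 10)              AS db_index,    -- which of the 10 databases
       MOD(FLOOR(@uid / 10), 100) AS table_index; -- which of the 100 tables in that database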

3. Vertical database sharding and vertical table sharding

Thorny road 3: at first, the data of all cloud service modules was stored together without separation.

When storage space became a bottleneck, we analyzed how the space was distributed across the modules' data:

  • Single-database disk capacity: 5 TB
  • Contact data: 2.75 TB of storage (55%)
  • SMS data: 1 TB of storage (20%)
  • All other module data combined: 500 GB (5%)
  • Remaining free space: 1 TB

Contact and SMS data together occupied 75% of the total space.

The remaining 1 TB could not sustain the continued growth of user data, and the outlook was not optimistic: once space ran out, every module would become unavailable. What should we do?

(The figure here showed the distribution of the cloud service's data storage space at that time.)

The third and fourth axes, vertical database sharding and vertical table sharding: we decoupled the storage of contact data, SMS data and the other modules' data, moving contacts and SMS into databases of their own.

At this point the cloud service had practiced all four axes of database and table sharding; everything that could be split apart had been split apart.

4. Dynamic capacity expansion based on a routing table

Thorny road 4: as described above, the contact data was split across a fixed set of 10 databases. Our preliminary assessment was that 10 databases * 100 tables would cover the business's data growth. We thought we could rest easy, but the growth of contact data exceeded expectations.

Nine months after the contact database was split out on its own, the storage usage of a single database had climbed from 35% to 65%. At that rate, the independently split contact databases would again run short of space within another six months.

How to solve it? Further capacity expansion was inevitable; the key question was which expansion strategy to adopt. A conventional expansion would again mean migrating and re-routing massive amounts of existing data, and the cost was too high.

After discussion within the technical team, and given the characteristics of the cloud service contact business (the number of contacts of existing users is basically stable, large batches of contacts are rarely added, so the growth of existing users' contact data is controllable), we finally adopted a dynamic capacity expansion scheme based on a routing table.

The features of this scheme are as follows (a sketch of such a routing table follows the list):

  • A user routing table is added to record which database and table each user's contact data is routed to;
  • New users' contact data is routed to the newly added databases, so it puts no storage pressure on the original databases;
  • Existing users' data does not move; it stays in the original databases;
  • In short, the original databases only have to absorb the data growth of existing users, while all new users are carried by the newly added databases.
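A minimal sketch of what such a routing table and lookup might look like (the actual schema is not given in the article; table and column names here are hypothetical):

-- Hypothetical user routing table
CREATE TABLE t_user_route (
  user_id     BIGINT UNSIGNED   NOT NULL PRIMARY KEY,
  db_index    TINYINT UNSIGNED  NOT NULL,   -- which contact database holds this user
  table_index SMALLINT UNSIGNED NOT NULL,   -- which sub-table inside that database
  create_time DATETIME          NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE = InnoDB;

-- New users are registered against the newly added databases;
-- existing users keep their original db_index / table_index.
INSERT INTO t_user_route (user_id, db_index, table_index) VALUES (100000001, 11, 37);

-- Every read or write of contact data first resolves the route:
SELECT db_index, table_index FROM t_user_route WHERE user_id = 100000001;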

Although the growth of existing users' contacts is controllable, we wanted the original databases to keep 60% of their storage space in reserve to absorb that growth. At the time the old databases had only 35% free space, which did not meet this requirement.

To reduce the space occupied by the data already in the old databases, we naturally turned to data compression.

3、 Pre-study of compression schemes

The cloud service team evaluated the following three schemes for compressing database data:

Scheme 1: the application compresses the data itself before saving it to the database

Advantages:

No change to the database is needed; the modification is fully contained in the application, which can freely choose which fields to compress.

Disadvantages:

An extra compression job has to be developed to handle the existing data, whose volume is enormous, and the time an application-level job would take is hard to control.

Once compressed data is stored in the database, the field contents are no longer readable when queried directly from the DB platform, which makes later troubleshooting harder.
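Scheme 1 would typically compress in application code, but MySQL's built-in COMPRESS()/UNCOMPRESS() functions illustrate the idea and the readability problem:

-- Illustration only (not vivo's implementation): a pre-compressed field is an opaque blob
SELECT COMPRESS('John Doe, 13800001234') AS stored_value;       -- unreadable binary
SELECT UNCOMPRESS(COMPRESS('John Doe, 13800001234')) AS value;  -- must be decompressed to read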

Scheme 2: the built-in data compression capability of MySQL's InnoDB engine

Advantages:

Compression relies on InnoDB's existing capability; no change to the upper-layer application is needed, and subsequent SELECT queries are unaffected.

Disadvantages:

It suits read-heavy, write-light scenarios with large data volumes, and is not suitable for businesses that demand very high query performance.

Scheme 3: switch the storage engine from InnoDB to TokuDB and use TokuDB's native data compression

Advantages:

TokuDB natively supports data compression with a choice of compression algorithms, handles write-heavy workloads well, and has natural advantages for storing large volumes of data.

Disadvantages:

MySQL needs an additional plug-in to support the TokuDB engine. At the time, no business inside the company had mature experience with TokuDB, so the risk of adopting it was unknown, and it would also add to the burden of ongoing DBA maintenance.

After weighing the options, we finally chose the second scheme: InnoDB's own compression capability.

The main reasons are as follows:

  • Simple to operate: the DBA can enable compression just by changing the row format of the existing InnoDB tables;
  • Controllable compression speed: in our tests, a table with 20 million rows can be compressed within one to two days;
  • Low transformation cost: the whole change only requires the DBA to run the relevant SQL to alter the table format; no upper-layer application code needs to change;
  • A good fit for the cloud service workload: user data backup and recovery is not a high-performance, high-QPS scenario, and most cloud service tables consist largely of string fields, which makes them well suited to compression.

4、 Compression scheme verification

1. Introduction to InnoDB compression capability

Before MySQL 5.1.38 there was only the built-in InnoDB storage engine, whose default file format is Antelope. This format supports two row formats, COMPACT and REDUNDANT, neither of which supports data compression.

From MySQL 5.1.38, the InnoDB Plugin introduced the Barracuda file format. Barracuda is fully compatible with Antelope and adds two more row formats, DYNAMIC and COMPRESSED (the latter supports data compression).
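Whether a given table already uses a compressible row format can be checked via information_schema (the table name below is hypothetical):

SELECT table_name, row_format, create_options
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name = 't_contact_00';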

2. Compression environment preparation

Modify the database configuration: change the file format from the default Antelope to Barracuda.

SET GLOBAL innodb_file_format=Barracuda;

SET GLOBAL innodb_file_format_max=Barracuda;

SET GLOBAL innodb_file_per_table=1;

Note: innodb_file_per_table must be set to 1, because data stored in the InnoDB system tablespace cannot be compressed. The system tablespace holds not only user data but also InnoDB's internal system information and can never be compressed, so each table needs its own tablespace for compression to apply.

Once these are set, you can run SHOW GLOBAL VARIABLES LIKE '%file_format%'; and SHOW GLOBAL VARIABLES LIKE '%file_per%'; to confirm that the changes have taken effect.

(Settings made this way take effect immediately but are not persistent: they are lost when the MySQL instance restarts. To make them permanent, add them to the MySQL configuration file.)
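For the MySQL 5.x versions discussed here, the equivalent permanent entries in the configuration file would look roughly like this (a sketch; the exact file location and surrounding options depend on the deployment):

[mysqld]
innodb_file_format     = Barracuda
innodb_file_format_max = Barracuda
innodb_file_per_table  = 1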

3. Verifying the compression effect

Prepare one table in the compressed format and one uncompressed table, with identical column definitions.

Compressed table:
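A minimal sketch of such a compressed table definition (the columns here are hypothetical, not the exact test schema):

CREATE TABLE t_compress (
  id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id      BIGINT UNSIGNED NOT NULL,
  contact_name VARCHAR(128)    NOT NULL,
  phone_number VARCHAR(32)     NOT NULL,
  extra_info   TEXT
) ENGINE = InnoDB
  ROW_FORMAT = COMPRESSED
  KEY_BLOCK_SIZE = 8;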

Description: ROW_FORMAT=COMPRESSED specifies the compressed row format, and KEY_BLOCK_SIZE=8 is recommended. The InnoDB page size defaults to 16 KB; the optional KEY_BLOCK_SIZE values 16, 8 and 4 set the compressed page size in KB, and the smaller the value, the stronger the compression. Balancing CPU cost against compression ratio, we recommend setting it to 8 in production.

Uncompressed table:
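The comparison table is identical except that it omits the compression options (again a hypothetical sketch):

CREATE TABLE t_nocompress (
  id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id      BIGINT UNSIGNED NOT NULL,
  contact_name VARCHAR(128)    NOT NULL,
  phone_number VARCHAR(32)     NOT NULL,
  extra_info   TEXT
) ENGINE = InnoDB;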

Prepare data: use a stored procedure to insert the same 100,000 rows into both the t_compress and t_nocompress tables, then compare the space the two tables occupy.
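A sketch of how such test data could be generated and the table sizes compared, assuming the hypothetical columns above:

-- Hypothetical data generator: inserts n identical rows into both tables
DELIMITER //
CREATE PROCEDURE fill_test_data(IN n INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < n DO
    INSERT INTO t_compress   (user_id, contact_name, phone_number, extra_info)
    VALUES (i, CONCAT('name_', i), CONCAT('1380000', LPAD(i, 4, '0')), REPEAT('x', 200));
    INSERT INTO t_nocompress (user_id, contact_name, phone_number, extra_info)
    VALUES (i, CONCAT('name_', i), CONCAT('1380000', LPAD(i, 4, '0')), REPEAT('x', 200));
    SET i = i + 1;
  END WHILE;
END //
DELIMITER ;

CALL fill_test_data(100000);

-- Approximate on-disk size of each table, in MB
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name IN ('t_compress', 't_nocompress');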

The t_compress table occupies 10 MB while the t_nocompress table occupies 20 MB: a compression ratio of 50%.

Note: the compression effect depends on the column types of the table. Typical data contains many repeated values, so it compresses well: string types such as CHAR, VARCHAR, TEXT and BLOB usually compress effectively, whereas binary data (integers or floating-point numbers) and already-compressed data (JPEG or PNG images) usually gain little from compression.

5、 Online practice

Based on the tests above, if a compression ratio of 50% can be achieved, the space used in the old contact databases can be brought down from 65% to roughly 33%, leaving more than 60% in reserve.

However, online data must be treated with awe. Before putting the scheme into production we had to verify it offline and also consider the following questions:

1. Do data compression and decompression affect the performance of the DB server?

We ran load tests to evaluate the impact on database server CPU before and after compression. The before-and-after comparison is as follows:

Data was inserted into a contact table that already held 20 million rows.

Before compression: inserting 50 contacts per request, 200 concurrent requests, for 10 minutes: TPS 150, CPU 33%.

After compression: inserting 50 contacts per request, 200 concurrent requests, for 10 minutes: TPS 140, CPU 43%.

After the table is compressed, frequent inserts do raise CPU usage, but TPS is not much affected. Across repeated load tests, the database server's CPU stabilized at around 40%, which is an acceptable range for the business.

2. Does changing the table's row format affect business SQL reads and writes and normal business functions?

We carried out both offline verification and online verification.

Offline verification: in the test environment we switched the contact tables to the compressed format and had test engineers help spot-check the full set of contact features; everything worked normally.

The pre-release environment then went through the same steps as the test environment, and all function points checked out.

Online verification: we chose the call-record module, whose data is less sensitive for users, compressed one table in one database, and then watched that table's reads and writes as well as user complaints.

After a week of continuous observation, the call-record data in that table could still be read and written normally, and no abnormal user feedback was received during that period.

3. The volume of online contact data is huge; how do we keep the service stable while compressing it?

We approached this as follows (the compression statement itself is sketched after the list):

  • Compress one contact table first and measure how long a single table takes.
  • Within a single database, compress several tables concurrently and watch the CPU. The DBA set a ceiling of 55% CPU; we gradually tuned the compression concurrency until the CPU held steady at around 55%, which gave the maximum number of tables a single database can compress at the same time.
  • Combining the first two steps, we could estimate roughly how long compressing every table in every database would take. After syncing the plan with the project team and the owners involved, we carried out the compression step by step.
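The compression of an existing table is essentially an in-place rebuild driven by a DDL statement of roughly the following form (table name hypothetical):

ALTER TABLE t_contact_00 ROW_FORMAT = COMPRESSED, KEY_BLOCK_SIZE = 8;

Because the rebuild copies and re-compresses every row, per-table duration and per-database concurrency are exactly the two knobs measured in the first two steps.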

(The figure here showed the final effect of data compression on the online contact databases.)

6、 Afterword

This article has described the massive-data storage challenges that came with the growth of the cloud service business, together with some of our experience in database and table sharding and in database data compression, in the hope that it can serve as a useful reference.

InnoDB data compression is suitable for the following scenarios:

  • Businesses with large data volumes whose database disks are under space pressure;
  • Read-heavy, write-light business scenarios; it is not suitable for businesses with high performance and high QPS requirements;
  • Tables whose structure contains a large proportion of string-type data, which usually compresses effectively.

Finally:

  • When sharding databases and tables, be sure to estimate data growth generously; the data migration forced by a later expansion is deeply painful.
  • Treat online data with awe: a scheme must be verified repeatedly offline before it is applied online.

Author: vivo platform product development team
