Support “go to o” database evaluation through self-developed database portrait tool

Time:2020-1-21

“Go to o” is a hot topic in recent years, which also brings about various doubts, including existing database evaluation, technology selection, etc. Going to o is a systematic project, which needs to be fully evaluated. In this paper, through self-developed tools, generate database portraits, provide first-hand data for o-assessment, and hope to give you reference.

1、 Common doubts

When considering going to o, many companies often face the problem of “not knowing enough about their own database”, and some doubts are inevitable:

[manager]

  • Is the cost of database going to o high?
  • A lot of work?
  • Is the construction period long?
  • Is there any risk?

[architect]

  • Can MySQL support the existing business scale?
  • Is there any technical risk?
  • Is it necessary to introduce sub databases and sub tables?
  • Do you need to introduce caching?
  • Is R & D complex?
  • How long does it take?
  • What are the characteristics of data access?
  • Is there a large amount of data before and after migration?

[developers]

  • Are there many complex SQL?
  • Is the amount of renovation large?
  • Are Oracle dialects and proprietary objects needed to be modified?
  • Wait

In the face of these problems, it is necessary to quickly understand the existing Oracle objects, statements, access characteristics, performance, etc., and then evaluate the technical solutions, migration solutions and subsequent workload. That is to say, we need to “picture” our database. Based on the above database portrait, it will be helpful to guide the whole work cycle of go o, including the following aspects:

  • Decision making stage: overall difficulty, cost (human and financial), technical risk
  • Architecture stage: technical scheme, object structure and performance evaluation
  • R & D stage: compatibility, complexity, testing
  • Migration stage: structure migration, data migration and data verification

Based on this kind of demand, some companies launch evaluation products, such as Alibaba’s database and application migration service (Adam), but such products often need to deploy agents, upload analysis packages, etc., which is not feasible for security sensitive enterprises. When my company started to work in o two years ago, it also faced this problem. Therefore, we specially developed a green version of the small program, which can be run locally for the convenience of evaluation.

Address: https://github.com/bjbean/ora

2、 Design ideas

Collect and summarize Oracle database information, including environment, space, object, access characteristics, resource cost and SQL statements, covering the actual operation of the database. For more targeted information collection, the tool sets partial thresholds through parameters. By running the command line and collecting information, the web version evaluation report is produced, which is visualized. It can not only be used as the evaluation basis for O removal, but also as the data reference for subsequent transformation.

3、 Interpretation of portraits

The following is an interpretation of the report data, and a description of the common model to o-mysql.

3.1 general information

Figure 1

Displays a summary of the collected targets, including IP, instances, users, and so on. Pay attention to the analysis time, the script will extract the database execution characteristics (within 24 hours), so it is recommended to run after the business peak.

3.2 spatial information

Figure 2

Space size is one of the key indexes to be considered in database selection, which will also affect subsequent migration. If the scale of the library is large, it should be considered to be split. The principle of splitting is to control the scale of single database as much as possible. In general, the following split priority principle can be followed:

1) Business layer vertical split

At the application level, data is split according to different business lines. For example, the e-commerce platform is split by order, user, commodity, inventory, etc. The separated parts are business cohesive and have no strong data dependency.

2) Business level split

In the same business, establish life cycle management for data, and make data hot and cold stratification. According to different data access characteristics of different layers, further splitting can be done. For example, in the e-commerce platform, orders can be divided into active orders (within two weeks, goods can be returned and exchanged), inactive orders (from two weeks to half a year, customer service can accept), and historical orders (more than half a year).

3) Application layer sub database sub table

If the scale of a single library is still large after the above-mentioned splitting, the technology of sub database and sub table can be considered. The usual way is to introduce the database middle layer, which can make one database virtual logically, but it can be divided into many databases physically. This is not a very “elegant” solution, because it is difficult to achieve application transparency. That is to say, there must be a compromise in R & D, at the expense of some database capabilities. Common technical solutions can be divided into three types: client, proxy and sidecar. Now, proxy mode is recommended (sidecar mode can be considered for container deployment).

4) Distributed database of basic layer

The more thorough way is to use distributed database directly. It provides a solution that can carry a larger scale (capacity, throughput). In recent years, the distributed database has gradually matured, popularized and implemented, and began to be used in key scenarios.

3.3 object information

Figure 3

For Oracle objects, there are different considerations in the transformation. Summary data and detailed data are provided in the report for easy query.

1) table

Too many tables directly affect the size of the data dictionary and the overall efficiency of the database. From the perspective of MySQL, we also need to consider file handle and other issues. This indicator is not fixed and needs to be considered as appropriate. This is more about data architecture to avoid too many data tables in a single database. It has experienced 100000 tables in a single database, with low performance; after optimization, it is integrated into 20000 optimization cases. If MySQL is selected, it is recommended that the number of tables in a single database should not exceed 5000; the total number of tables in the database * should not exceed 20000.

2) Table (large table)

Controlling the scale of a single table is one of the key points of design, which directly affects access performance. If the table is too large, the above principle should be considered for splitting. There is no general principle for table size, which can be configured by parameters. It can be set according to physical size or number of records. The key point here is the access mode of the table. For example, if the tables are simple kV access, the scale is better; if the access is more complex, it is recommended to set a lower threshold. For example, if MySQL is selected, large table complex queries or multi table associations are not good at scenarios. You can consider using ES, Solr + HBase and other methods to process complex queries asynchronously.

3) Table (partition table)

Since 9i and 10g, the partition function of Oracle has been improved and enhanced. It can be said that Oracle has become a powerful tool to deal with massive data. But for MySQL, partition is still not recommended. On the one hand, with the enhancement of hardware capacity, the single table bearing capacity becomes larger; on the other hand, MySQL partition needs to face the problems of “DDL amplification”, “lock change”, etc. If the team can manage the database middle tier well, it is recommended to use the table splitting technology with lower complexity. This may slightly increase the amount of research and development, but for operation and maintenance, there are many benefits.

4) Field (large object)

Large objects are not recommended in any database. If you use it, take advantage of the renovation work and get rid of it. Large object function is a chicken bone for database. The acid ability of database itself should focus on saving more important data.

Figure 4

5) Index (B-tree)

Too many indexes will affect DML efficiency and occupy a lot of space. The “index / table” can roughly reflect the reasonableness of the number of indexes. There are no recommended values here, which can be considered as appropriate. For any database, there is a similar problem: how to “build a strategic index strategy”. Here, you can refer to the following table (selected from Li Huazhi – massive database solutions) to sort out the index requirements. Create and maintain index scientifically.

6) Index (other)

In addition to the usual B + tree index, Oracle supports other types of indexes. If other databases are selected, these indexes need to be modified and implemented in other ways.

7) view

View, as the logical encapsulation of SQL statements, is very meaningful in some scenarios (such as security). However, it has high requirements for the optimizer, and Oracle has done a lot of work in this area (see the book “SQL optimization best practices” written by the author). But for MySQL, it is not recommended to use it, and transformation is considered.

8) Triggers / stored procedures / functions

For the database, it carries two kinds of capabilities: computing and storage. As the most difficult component of the whole infrastructure, it is important to give full play to the core capabilities of the database. Compared with the storage capacity, the computing capacity can be solved by the application layer, and the application layer is often easy to expand. In addition, considering future maintainability, portability and other factors, this part should be solved at the application end.

Figure 5

9) sequence

The sequence in Oracle can provide incremental, discontinuous support sequence number service. There is a similar implementation in mysql, which is done by adding attributes. This part should be able to be migrated, but if the concurrency is very large, you can also consider the solution of using the signer.

10) Synonyms

Synonyms are the expression of data coupling. No matter what database they are in, they should be discarded. Consider splitting at the business side, no longer relying on this feature.

3.4 visit characteristics

Figure 6

Here is a collection of top 20 databases with the largest number of DML in the past 24 hours. This directly reflects the “hot spot” object of the current system’s operation. These objects need to be evaluated after selection and before migration. The ability to consider splitting, caching and other means can reduce the hot spot pressure of these objects. It is not only limited to these objects, but also recommended to build a “business stress model”. After fully understanding and evaluating the business, the business logic is abstracted and transformed into a data stress model. The difficulty here lies in the abstract ability of business logic and the proportion evaluation of module business volume.

Form pseudo code similar to the following:

Figure 7

According to the above pseudo code, the pressure test code can be compiled. By using some tools to call test code, the pressure of simulation test is generated. This is significant for system transformation, upgrading, capacity expansion evaluation, new hardware selection, etc. In the specific work of going to o, whether the new technical scheme meets the needs can be evaluated and verified by this method. Use more business language to compare the bearing capacity changes before and after o removal. This is also one of the factors to consider whether the technical scheme is feasible. Of course, the above information only includes DML, which is not included in the query part. These data can be obtained from Oracle AWR. More complete, it can be considered to do the whole link pressure test in combination with the application.

3.5 resource consumption

Figure 8

The resource usage in the last 24 hours is listed here. These data have two main purposes:

1) Evaluate overall load

Because the above indicators are measured by Oracle, they cannot be directly compared to other databases. The load pressure can be evaluated with expert experience + historical data. One of the bases used to evaluate other alternatives. Some of these indicators (such as user calls, etc.) can be converted into quantitative indicators to guide subsequent testing and other work.

2) Evaluate bottlenecks

For a very prominent indicator, it indicates that there are bottlenecks in the existing business. When migrating to other solutions, it should be considered in the design phase as much as possible, and focus on the testing phase to reduce the possible technical risks.

3.6 SQL statement

Figure 9

The rewriting of SQL statement is the most headache in the whole migration work. Unless it is a complete refactoring, it is a task that needs to pay attention to SQL rewriting. It involves rewriting amount, complexity, performance comparison and many other contents, many of which still need manual screening.

The author once had such experience that the project team spent one month to complete the “structure + SQL” migration of a project, but then spent another three months to complete sentence optimization and even structural adjustment. The reason is that the statement cannot meet the performance requirements after the migration goes online. This is a process of adjustment while on-line, which is extremely painful. Therefore, it is of great significance to find out the existing SQL situation in the early stage for evaluating workload, rewriting difficulty and performance evaluation. The above part is to collect and analyze all SQL of users in history (you can turn on the details switch to display full SQL), which includes the following dimensions.

1) Total SQL

This indicator can approximately reflect the busy degree of business. In addition, it can also be used as a scale analysis basis for subsequent problematic statements.

2) Overlong SQL

Statements with more than the specified number of characters are listed here, and the threshold can be configured by parameters. If MySQL is considered, it is recommended to use “short and concise” SQL, which generally does not perform well in the face of complex SQL. So for these super long statements, they are all worthy of attention, at least the statements that are prone to problems.

3)ANTI SQL

Reverse query, database processing are more difficult, this part is also more test optimizer. Although in the newer version of MySQL, there is a good optimization of reverse query, but this part is still worthy of attention.

4)Oracle Syntax SQL

The writing method with Oracle characteristics, that is, the dialect of Oracle (such as special functions, pseudo columns, etc.), needs to be processed in the migration. Of course, there are also some manufacturers that have announced that their products are compatible with Oracle syntax, but they also recommend to do special tests for these products.

5)Join 3+ Table SQL

Multi table association is also a comparative test of optimizer. In particular, the efficiency of association between MySQL tables is low. It is not recommended to use more than two tables. Here are 3 or more associated queries, which need to be modified. For particularly complex queries, you can consider unloading them to the big data platform.

6)SubQuery SQL

Subqueries are similar to the above, and MySQL is not good at them. Although the optimizer can be optimized to some extent, it is still worthy of attention.

By Han Feng

Welcome to the public channel, Han Feng channel.

Source: Yixin Institute of Technology