Some thoughts on data management


With the rapid growth of the Internet, 5G, and big data, data has become an indispensable asset for enterprises, and data management matters more and more. Yet managing data well is not easy. Why?

Let’s first look at the definition of data:

Data refers to the symbols that record and identify objective events. It is the physical symbols, or a combination of them, that record the nature, state, and relationships of objective things. It is a recognizable, abstract symbol. (Baidu Encyclopedia)

Data are numerical characteristics or information obtained through observation. (Wikipedia)

In practice, data is an intangible resource that an enterprise obtains from its development and business activity and then uses to grow. It brings material benefits and also helps the enterprise keep improving.

Next, the definition of data management needs to be clear:

Data management is the management of data resources. According to DAMA's definition, data resource management is about developing appropriate architectures, policies, practices, and procedures to handle the enterprise data life cycle. This is a high-level, broad definition and does not necessarily touch on the specific operations of data management. (Wikipedia)

Data management is the process of effectively collecting, storing, processing, and applying data using computer hardware and software. Its purpose is to make full and effective use of data. The key to effective data management is data organization. (Baidu Encyclopedia)

Baidu's definition comes from the computing field. Wikipedia's is closer to what modern enterprises mean by data management.

In practice, the volume of data keeps rising as an enterprise develops, and while data helps the enterprise grow, it also brings many problems. Enterprises therefore need a complete set of concepts, procedures, and architecture for managing data so that the data can be put to better use.

Why manage data?

An enterprise managing its data is much like an individual managing personal finances. One important difference is that enterprises manage data in the hope that it will work more efficiently and at lower cost to produce business benefits. And just as financial management carries risk, do not launch a data management project or buy a data management product before you understand why you are managing data. First work out what data management means for the enterprise and what value it can bring.

The nature of managing data

Enterprises hope to achieve specific purposes at low cost and with high efficiency by managing data. That sentence has three key phrases: achieving a specific purpose, low cost and high efficiency, and managing data. Take them one at a time. First, managing data is professional work, so talent is the first requirement, just as you would not send pilots to fight with bayonets on the battlefield. Second, low cost and high efficiency mean clarifying the actual value of each piece of data, prioritizing accordingly, and eliminating useless data promptly. Finally, data management must serve a specific purpose of its own; never manage for the sake of managing.

So how should data be managed?

The answer: use data to manage data. The premise of data management is to quantify the work to be done. Once quantified, that work itself becomes data, and that data should drive decisions and management. Beyond that, an effective set of management methods is needed.

With the purpose and method in place, start formulating data specifications

Formulating data management specifications is difficult. Keep the goal in focus while drafting them and putting them into practice. There is no best system, only the most suitable one.

The following questions gauge an organization's data management capability. The answers should be quantified, should be produced by machine, and should be available within half an hour.

  • Can you state directly how much value each table contributes to data monetization? Or, if a table were unavailable, how much potential loss that would cause? (Virtual indicators are fine.)
  • Can an operation quality report be produced directly for each table? Can specific optimization suggestions be given in priority order?
  • Which tables can be taken offline directly?

You will find that answering these questions takes more than building a data management system. It also requires corresponding specifications and standards.

If you need to know how much value each table contributes to data monetization, there must be a recorded relationship between applications and tables. So when development work goes live, the specification must require at least that mapping to be submitted. And to keep the documented mapping from drifting apart from what actually runs, you must rely on an automated system.
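As a rough illustration of why the application-to-table mapping matters, here is a minimal Python sketch. The application names, table names, value figures, and the even-split rule are all assumptions made up for this example; a real enterprise would choose its own attribution rule.

```python
from collections import defaultdict

# Hypothetical mapping submitted at release time: application -> tables it reads.
APP_TABLES = {
    "sales_dashboard": ["orders", "customers"],
    "churn_model": ["customers", "events"],
}

# Hypothetical value assigned to each application (a virtual indicator).
APP_VALUE = {"sales_dashboard": 100.0, "churn_model": 60.0}


def table_values(app_tables, app_value):
    """Split each application's value evenly across the tables it depends on."""
    values = defaultdict(float)
    for app, tables in app_tables.items():
        if not tables:
            continue
        share = app_value.get(app, 0.0) / len(tables)
        for table in tables:
            values[table] += share
    return dict(values)


def unmapped_tables(all_tables, app_tables):
    """Tables that no application declares, so their value cannot be traced."""
    used = {t for tables in app_tables.values() for t in tables}
    return sorted(set(all_tables) - used)
```

With this in place, `table_values(APP_TABLES, APP_VALUE)` gives a per-table figure, and `unmapped_tables` lists tables that no submitted mapping accounts for.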

If you need a data quality report for each table, you must define the relevant quality indicators and be able to raise early warnings and act on them in time. This requires a data quality monitoring system.
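A minimal Python sketch of quality indicators with thresholds might look like the following. The two indicators (null rate and duplicate rate), the row format, and the threshold values are assumptions chosen for illustration, not a standard set.

```python
def null_rate(rows, column):
    """Fraction of rows where the column is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def duplicate_rate(rows, key):
    """Fraction of rows whose key value repeats an earlier row."""
    if not rows:
        return 0.0
    seen, dupes = set(), 0
    for row in rows:
        value = row.get(key)
        if value in seen:
            dupes += 1
        seen.add(value)
    return dupes / len(rows)


def quality_report(rows, key, column, thresholds):
    """Compute the indicators and list those that exceed their threshold."""
    indicators = {
        "null_rate": null_rate(rows, column),
        "duplicate_rate": duplicate_rate(rows, key),
    }
    warnings = [name for name, value in indicators.items()
                if value > thresholds[name]]
    return indicators, warnings
```

A scheduler could run `quality_report` on each table and send an alert whenever the warnings list is not empty.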

If you need to determine which tables can be taken offline directly, you must build a table life cycle management system that covers, for example, data lineage and impact analysis. Otherwise, how would you know the impact?
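Impact analysis of this kind can be sketched as a walk over a dependency graph. In the Python sketch below, the table names and edges are hypothetical, and "no downstream dependents" is used as a simplified stand-in for "safe to take offline"; a real system would also look at access logs and retention rules.

```python
from collections import deque

# Hypothetical dependency edges: upstream table -> objects that read from it.
LINEAGE = {
    "raw_orders": ["clean_orders"],
    "clean_orders": ["daily_sales", "orders_export"],
    "daily_sales": ["sales_dashboard"],
    "tmp_backfill_2019": [],
}


def downstream(lineage, table):
    """Return everything that directly or indirectly depends on a table."""
    impacted, queue = set(), deque([table])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted


def offline_candidates(lineage):
    """Tables with no downstream dependents at all."""
    return sorted(t for t in lineage if not downstream(lineage, t))
```

Calling `downstream(LINEAGE, "raw_orders")` shows what would break if that table disappeared, while `offline_candidates(LINEAGE)` lists tables nothing depends on.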

If operations and maintenance staff need to know what each table is, there must be a good data dictionary, with clear table naming conventions and metric definitions, to reduce management cost.
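A naming convention and a data dictionary are easy to check by machine. The Python sketch below assumes a made-up convention (a layer prefix such as `ods`, `dwd`, `dws`, or `ads`, followed by lowercase words joined with underscores) and assumes each dictionary entry should carry an `owner` and a `description`; both are illustrative choices, not rules from any particular product.

```python
import re

# Hypothetical convention: layer prefix, then lowercase words joined by underscores.
NAME_PATTERN = re.compile(r"^(ods|dwd|dws|ads)_[a-z0-9]+(_[a-z0-9]+)*$")

# Fields every dictionary entry is assumed to need.
REQUIRED_FIELDS = ("owner", "description")


def check_dictionary(entries):
    """Return {table_name: [issues]} for entries that break the rules."""
    problems = {}
    for name, meta in entries.items():
        issues = []
        if not NAME_PATTERN.match(name):
            issues.append("name violates convention")
        for field in REQUIRED_FIELDS:
            if not meta.get(field):
                issues.append(f"missing {field}")
        if issues:
            problems[name] = issues
    return problems
```

Run as part of the release process, a check like this keeps the dictionary from falling behind the tables that actually exist.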


As you can see, every data management rule exists to make sure the purpose is achieved. Together these rules grow into a very large data management system, but the point is to grasp the essence, because at the outset it is impossible to think of everything or do everything. We need to start from the source and decide where to begin.

Next, a word on data management tools.

A craftsman who wants to do good work must first sharpen his tools. Data volumes keep growing, and it is hard to guarantee stability through manual effort alone, which also carries great risk. Data management tools are therefore becoming more and more important in modern enterprises.

Not long ago, news of a programmer at Weimob deleting the database and running spread across the Internet. A few lines of code wiped more than 1 billion off the market value of the listed company in a single day and affected millions of users, with direct and indirect losses that are hard to measure. What sounds like a programmer joke should ring an alarm for every company, especially small and medium-sized companies whose data management is not standardized. For them, a "delete the database and run" incident would be devastating.

Clearly, data control and auditing are essential to keeping enterprise data running stably, especially for those who manage the data. Developers at different levels need detailed data operation permissions that state explicitly what is allowed and what is not. In addition, what users do to the data within their permissions, especially high-risk operations, should be audited and analyzed in detail. I consider these the most essential.
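To make permission control and auditing concrete, here is a minimal Python sketch. The roles, the permission sets, and the list of high-risk actions are hypothetical, and treating the first keyword of a statement as its action is a deliberate simplification; a real tool would parse SQL properly and store the audit trail somewhere tamper-resistant.

```python
from datetime import datetime, timezone

# Hypothetical role -> allowed SQL actions.
PERMISSIONS = {
    "developer": {"SELECT"},
    "senior_developer": {"SELECT", "INSERT", "UPDATE"},
    "dba": {"SELECT", "INSERT", "UPDATE", "DELETE", "DROP"},
}

# Actions assumed to be high risk and worth flagging in the audit trail.
HIGH_RISK = {"DELETE", "DROP", "TRUNCATE"}

# In-memory audit trail, used here only for illustration.
audit_log = []


def authorize(user, role, statement):
    """Decide whether a statement is allowed, and record the attempt either way."""
    words = statement.strip().split()
    action = words[0].upper() if words else ""  # naive: first keyword only
    allowed = action in PERMISSIONS.get(role, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "statement": statement,
        "allowed": allowed,
        "high_risk": action in HIGH_RISK,
    })
    return allowed
```

Every attempt is logged whether or not it is allowed, so denied high-risk operations remain visible for later review.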

Visualization also matters a great deal in data management. Some companies run hundreds of ETL tasks, and being able to tell quickly and simply whether each one succeeded directly determines the workload and difficulty of operations and maintenance.
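Before any dashboard can be drawn, the task results have to be condensed into something readable at a glance. A minimal Python sketch follows, assuming each run is a record with a hypothetical `task` name and a `status` string; neither field name comes from a specific scheduler.

```python
from collections import Counter


def summarize(runs):
    """Condense many ETL run records into counts plus a list of failures."""
    counts = Counter(run["status"] for run in runs)
    failed = sorted(run["task"] for run in runs if run["status"] == "failed")
    return {"total": len(runs), "counts": dict(counts), "failed": failed}


def status_line(summary):
    """One-line overview suitable for a dashboard header or a chat alert."""
    ok = summary["counts"].get("success", 0)
    return f"{ok}/{summary['total']} succeeded, {len(summary['failed'])} failed"
```

With hundreds of tasks, an operator reads one line and a short list of failures instead of scanning every log.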

At present there are not many data management products, and faced with complex, changeable enterprise environments, differing database setups, and differing user habits, those that exist often achieve only limited results.

What makes a good data management tool?

First, it must integrate into the enterprise's production environment. That is the basic prerequisite.

Second, enterprises now use more and more kinds of databases, such as NoSQL, NewSQL, and domestic databases, so the tool needs to support all the commonly used ones on a single platform, so that users do not have to juggle many separate tools.

Cloud and web applications are also increasingly widespread. If a system can be deployed once in the cloud and used by the whole team, with no per-person download or configuration, it improves collaboration, saves a great deal of time, and raises efficiency. The experience and efficiency of working with data cannot be ignored either: the functions that DBAs and developers use every day should come as standard.

Last and most important is data security. For DBAs, the two must-have functions are probably permission control and auditing.

Closing thoughts on data management

Data management is a systematic undertaking. It involves reengineering many processes and establishing new mechanisms, such as standardizing the development process, and its impact is felt everywhere. It also needs the support of management; without that, it will be hard to carry through.

Data management is also professional work, and professionals should give it their full attention. Everything else, tools included, is auxiliary. Without professional talent, good results are unlikely.


In the future, data volumes will keep growing and data management will become more complex and difficult. That is a challenge for data practitioners, and also an opportunity the times have given them.

Cloudquery, a unified web-based data management and control tool