Metadata, simply defined, is data that describes data. Wherever an enterprise has data, it has corresponding metadata, and only with complete and accurate metadata can we understand that data and make full use of its value. To make the concept concrete, taskctl walks through the types of metadata and their uses, with examples.
Metadata management covers the descriptive information produced at every stage of the data life cycle, including data generation, storage, processing and display, so that users can understand the context, relationships and attributes of their data. Depending on the object being described, metadata falls into three types: technical metadata, business metadata and management metadata.
The three types are described as follows:
- Technical metadata describes the concepts, relationships and rules of the technical domain in the data system. It mainly covers data structures and data processing, across all processing stages such as data source interfaces, data warehouse and data mart storage, ETL, OLAP, data encapsulation and front-end display;
- Business metadata describes the concepts, relationships and rules of the business domain in the data system. It mainly covers business terms, information classification, indicator definitions, business rules and similar information;
- Management metadata describes the concepts, relationships and rules of the management domain in the data system. It mainly covers personnel roles, job responsibilities, management processes and similar information.
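To make the three types concrete, here is a minimal sketch of how metadata records for one table might be modelled. The class names, the table `dw.orders` and every attribute value are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class MetadataType(Enum):
    TECHNICAL = "technical"
    BUSINESS = "business"
    MANAGEMENT = "management"


@dataclass
class MetadataRecord:
    """One piece of metadata attached to a data entity (hypothetical model)."""
    entity: str                 # the data entity being described
    kind: MetadataType          # which of the three types the record belongs to
    attributes: dict = field(default_factory=dict)


# Illustrative records for a made-up table named "dw.orders".
records = [
    MetadataRecord("dw.orders", MetadataType.TECHNICAL,
                   {"columns": ["order_id", "amount"], "loaded_by": "etl_orders"}),
    MetadataRecord("dw.orders", MetadataType.BUSINESS,
                   {"term": "Customer order", "indicator": "total order amount"}),
    MetadataRecord("dw.orders", MetadataType.MANAGEMENT,
                   {"owner_role": "data steward", "process": "monthly review"}),
]


def by_type(items, kind):
    """Return the records that belong to one metadata type."""
    return [r for r in items if r.kind is kind]


business = by_type(records, MetadataType.BUSINESS)
print(business[0].attributes["term"])
```

The same entity carries all three kinds of description, which is why a metadata repository usually keys records by entity and type rather than storing one flat description.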
Scope of metadata management
Metadata management should not be limited to the enterprise data warehouse, data marts and management-oriented applications. It should also bring the metadata of the enterprise's business systems under unified management, so that metadata is managed from the source and across the complete data life cycle.
Data map presentation
A data map presents the metadata of the data system's entities and processing steps as a hierarchical topology diagram. By controlling the level of detail shown at each layer, it supports graphical queries and auxiliary analysis in development, operation and maintenance, and business scenarios.
Blood relationship analysis
Blood relationship analysis (also known as data lineage or pedigree analysis) starts from an entity and traces its processing back to the data source interfaces of the data system. Different types of entities involve different types of transformation. For an underlying warehouse entity, only ETL processing is involved; for a warehouse summary, both ETL processing and warehouse summary processing may be involved; for an indicator, the indicator generation process is involved in addition to the above. Data source interface entities are provided by the source systems as the input of the data system, while every other data entity has gone through one or more types of processing. Blood relationship analysis lets users look at each of these processing steps as needed and see what the step does, what input it requires and what output it produces.
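The upstream trace described above can be sketched as a simple graph walk. This is a minimal illustration, not how any particular product implements it; the entity and process names are made up, and the lineage is assumed to be stored as a mapping from each entity to the processes and inputs that produce it.

```python
# Each entity maps to the (process, input entity) pairs that produce it.
# Entities with no entry are treated as data source interfaces.
# All names are hypothetical.
PRODUCED_BY = {
    "ods_orders": [("etl_load", "src_orders")],
    "dw_order_summary": [("warehouse_summary", "ods_orders")],
    "ind_monthly_revenue": [("indicator_generation", "dw_order_summary")],
}


def trace_lineage(entity, produced_by):
    """Walk upstream from `entity`; return (input, process, output) steps."""
    steps = []
    seen = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles and repeated inputs
        seen.add(current)
        for process, source in produced_by.get(current, []):
            steps.append((source, process, current))
            stack.append(source)
    return steps


def source_interfaces(entity, produced_by):
    """Return the upstream entities that nothing in the system produces."""
    inputs = {source for source, _, _ in trace_lineage(entity, produced_by)}
    return sorted(i for i in inputs if i not in produced_by)


for source, process, target in trace_lineage("ind_monthly_revenue", PRODUCED_BY):
    print(f"{source} --[{process}]--> {target}")
print(source_interfaces("ind_monthly_revenue", PRODUCED_BY))
```

Each step names its input, its process and its output, which corresponds to the three questions the analysis is meant to answer for every processing step.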
Impact analysis
Impact analysis looks for the process entities or other entities that depend on a given entity. If necessary, all dependents can be found recursively. This function supports assessing the scope of impact when an entity changes or needs to be modified.
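Impact analysis is the downstream counterpart of the upstream trace. The sketch below assumes dependencies are stored as "node depends on inputs" and inverts that mapping to find dependents; all names are hypothetical, and the `recursive` flag mirrors the optional recursive search mentioned above.

```python
from collections import deque

# node -> the entities or processes it depends on (hypothetical names)
DEPENDS_ON = {
    "etl_load": ["src_orders"],
    "ods_orders": ["etl_load"],
    "warehouse_summary": ["ods_orders"],
    "dw_order_summary": ["warehouse_summary"],
    "report_sales": ["dw_order_summary"],
}


def build_dependents(depends_on):
    """Invert the dependency mapping: input -> nodes that depend on it."""
    dependents = {}
    for node, inputs in depends_on.items():
        for inp in inputs:
            dependents.setdefault(inp, []).append(node)
    return dependents


def impacted(entity, depends_on, recursive=True):
    """Return the nodes affected if `entity` changes."""
    dependents = build_dependents(depends_on)
    direct = dependents.get(entity, [])
    if not recursive:
        return sorted(direct)
    found = set()
    queue = deque(direct)
    while queue:
        node = queue.popleft()
        if node in found:
            continue
        found.add(node)
        queue.extend(dependents.get(node, []))
    return sorted(found)


print(impacted("ods_orders", DEPENDS_ON, recursive=False))
print(impacted("ods_orders", DEPENDS_ON))
```

The non-recursive call answers "what breaks first", while the recursive call gives the full scope to review before a change is released.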
Entity association analysis
Entity association analysis examines how specific data is used by looking at the other entities associated with an entity and the processes it participates in. The result is a network of the entity, its related entities and its processes, which helps show how important the entity is. This function can support impact assessment for requirement changes.
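As a rough sketch of the idea, the function below collects the processes an entity takes part in and the other entities reached through those processes. The flow records and names are hypothetical, and counting associations is only one simple proxy for importance.

```python
# (input entity, process, output entity); all names are hypothetical.
FLOWS = [
    ("src_orders", "etl_load_orders", "ods_orders"),
    ("ods_orders", "summarize_orders", "dw_order_summary"),
    ("ods_orders", "build_customer_view", "dm_customer"),
    ("src_customers", "build_customer_view", "dm_customer"),
]


def associations(entity, flows):
    """Return the processes an entity participates in and its related entities."""
    # First pass: every process where the entity is an input or an output.
    processes = {p for s, p, t in flows if entity in (s, t)}
    # Second pass: every other entity touched by those processes.
    related = set()
    for source, process, target in flows:
        if process in processes:
            related.update({source, target})
    related.discard(entity)
    return {"processes": sorted(processes), "entities": sorted(related)}


result = associations("ods_orders", FLOWS)
print(result["processes"])
print(result["entities"])
```

An entity with many associated processes and entities, such as `ods_orders` here, is one whose change requests deserve a closer look.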
Entity difference analysis
Entity difference analysis compares the metadata of different entities and shows their differences in graphics and tables, including names, attributes, data lineage and differences that affect other parts of the system. A data system contains many similar entities. These entities (e.g. data tables) may differ only slightly in name or attributes, and some attributes share a name but belong to different applications. Small differences like these directly affect statistical results, so they need to be clearly understood. This function helps unify statistical definitions and evaluate the differences between similar entities.
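A minimal sketch of such a comparison, assuming each entity's metadata is available as a dictionary. The two tables, their columns and their types are invented for illustration.

```python
# Two similar tables described by hypothetical metadata.
orders_sales = {
    "name": "dm_sales.orders",
    "columns": {"order_id": "bigint", "amount": "decimal(12,2)"},
    "upstream": ["ods_orders"],
}
orders_finance = {
    "name": "dm_finance.orders",
    "columns": {"order_id": "bigint", "amount": "decimal(18,4)", "currency": "char(3)"},
    "upstream": ["ods_orders"],
}


def diff_entities(a, b):
    """Return {property: (value in a, value in b)} for properties that differ."""
    report = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            report[key] = (a.get(key), b.get(key))
    return report


def diff_columns(a, b):
    """Return (column, type in a, type in b) rows for columns that differ."""
    rows = []
    for col in sorted(set(a) | set(b)):
        left, right = a.get(col), b.get(col)
        if left != right:
            rows.append((col, left, right))
    return rows


print(sorted(diff_entities(orders_sales, orders_finance)))
for col, left, right in diff_columns(orders_sales["columns"], orders_finance["columns"]):
    print(f"{col:10} {str(left):15} {str(right):15}")
```

Here the `amount` column has the same name in both tables but a different precision, which is exactly the kind of small difference that can change a statistical result.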
Indicator consistency analysis
Indicator consistency analysis graphically compares the data flow diagrams of two indicators to determine whether their calculation processes are consistent. It is a specific application of blood relationship analysis. By showing whether the data objects and transformations at each stage of the two data flows match, it helps users understand the context of each indicator and see the differences between indicators that share a name but are maintained by different departments, which improves users’ trust in indicator values.
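One simple way to sketch this comparison is to represent each indicator's data flow as a set of steps and compare the sets. The representation, the stage names and the two "revenue" indicators below are all assumptions made for illustration.

```python
# Each step is (stage, input object, transformation); names are hypothetical.
REVENUE_SALES = {
    ("etl", "src_orders", "load_orders"),
    ("summary", "ods_orders", "sum_amount_by_month"),
    ("indicator", "dw_order_summary", "revenue_incl_tax"),
}
REVENUE_FINANCE = {
    ("etl", "src_orders", "load_orders"),
    ("summary", "ods_orders", "sum_amount_by_month"),
    ("indicator", "dw_order_summary", "revenue_excl_tax"),
}


def compare_indicators(first, second):
    """Report whether two indicator data flows contain the same steps."""
    only_first = sorted(first - second)
    only_second = sorted(second - first)
    return {
        "consistent": not only_first and not only_second,
        "only_in_first": only_first,
        "only_in_second": only_second,
    }


report = compare_indicators(REVENUE_SALES, REVENUE_FINANCE)
print(report["consistent"])
print(report["only_in_first"], report["only_in_second"])
```

In this example the two flows agree through the ETL and summary stages and diverge only at indicator generation, which tells the user where two same-named indicators stop being comparable.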
Auxiliary application optimization
Metadata accurately describes the data, the data processing and the relationships between data in the data system. Metadata analysis functions such as blood relationship analysis, impact analysis and entity association analysis can identify the technical resources related to a system application and, combined with the application life cycle management process, assist in optimizing the applications of the data system.
Auxiliary security management
The data stored and various analysis applications provided by the enterprise data platform involve all kinds of sensitive information about the company’s operation. Therefore, in the process of data system construction, a comprehensive security management mechanism and measures must be adopted to ensure the data security of the system.
The security management module of the data system manages data sensitivity, customer privacy information and the audit logs of every stage of the data system, and monitors data access and function use. To control access to sensitive data and customer privacy information and to make permissions more fine-grained, the security management module should be based on metadata: the metadata management module provides the definitions of sensitive data and customer privacy information and assists the security management module in carrying out the related security controls.
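The division of labour described above can be sketched as follows: metadata supplies the sensitivity labels, and the security layer uses them to mask values and write audit records. The labels, roles, table and masking rule are all hypothetical, not the behaviour of any specific product.

```python
# Sensitivity labels as they might be defined in metadata (hypothetical).
SENSITIVITY = {
    ("customers", "name"): "privacy",
    ("customers", "phone"): "privacy",
    ("customers", "credit_limit"): "sensitive",
    ("customers", "city"): "public",
}

# Which labels each role may read (hypothetical roles).
ROLE_CLEARANCE = {
    "analyst": {"public"},
    "risk_officer": {"public", "sensitive"},
    "privacy_admin": {"public", "sensitive", "privacy"},
}


def read_row(table, row, role, audit_log):
    """Return a copy of `row` with values the role may not see masked."""
    allowed = ROLE_CLEARANCE.get(role, set())
    result = {}
    for column, value in row.items():
        # Assumption: columns without a label are treated as sensitive.
        level = SENSITIVITY.get((table, column), "sensitive")
        visible = level in allowed
        result[column] = value if visible else "***"
        audit_log.append((role, table, column, "read" if visible else "masked"))
    return result


log = []
row = {"name": "Li", "phone": "123", "credit_limit": 5000, "city": "Chengdu"}
print(read_row("customers", row, "analyst", log))
print(len(log))
```

Keeping the labels in metadata rather than in application code means that reclassifying a column changes access behaviour everywhere the label is consulted.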
Metadata-based development management
The main stages of a data system project are requirements analysis, design, development, testing and going live. A development management application can provide functions to manage and support the workflow, related resources, rules and constraints, and input and output information of these stages.
Which method is more suitable for scheduling metadata entry?
Open source scheduling tools generally support editing scheduling metadata through forms (e.g. XXL-JOB) or as plain text files, such as XML workflow definitions in Oozie or job property files in Azkaban.
Traditional commercial scheduling software (e.g. Control-M) has to support the design of very large numbers of jobs, so it uses Excel template documents for batch editing and then imports them into the scheduling system.
In addition to traditional graphical drag-and-drop of job nodes with job property forms, and batch editing with Excel templates, taskctl also supports an advanced XML code IDE editor for designing jobs at scale.
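To show how a tabular template and an XML definition relate, here is a minimal sketch that turns template rows into XML job definitions using the Python standard library. CSV text stands in for an Excel sheet, and the element names (`flow`, `job`, `command`, `depends`) are invented for illustration; they are not the schema of taskctl, Oozie or any other scheduler.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A batch-editing template, one job per row (hypothetical columns).
TEMPLATE_CSV = """name,command,depends_on
load_orders,load_orders.sh,
summarize_orders,summarize.sh,load_orders
build_report,report.sh,summarize_orders
"""


def rows_to_xml(csv_text):
    """Convert template rows into a made-up XML job definition."""
    root = ET.Element("flow")
    for row in csv.DictReader(io.StringIO(csv_text)):
        job = ET.SubElement(root, "job", name=row["name"])
        ET.SubElement(job, "command").text = row["command"]
        if row["depends_on"]:
            ET.SubElement(job, "depends").text = row["depends_on"]
    return ET.tostring(root, encoding="unicode")


xml_text = rows_to_xml(TEMPLATE_CSV)
jobs = ET.fromstring(xml_text).findall("job")
print([j.get("name") for j in jobs])
```

Both forms carry the same scheduling metadata. A spreadsheet is convenient for bulk edits by non-developers, while a text format such as XML works better with version control and code editors.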
This article has briefly described metadata and shown through examples how it is used; I hope it makes the core purpose of metadata clear. If you are interested or have questions, you are welcome to leave a comment or share the article and discuss it with me. We will select 10 commenters, plus 20 readers who share the article link and send me a private message, and give each an official free-use license for taskctl.