Data modeling refers to the abstract organization of real-world data: determining the scope of the database and how the data is organized, up to the point where it becomes an actual database. After the conceptual model produced by system analysis is converted into a physical model, database entities and the relationships between them (entities are usually tables) are created in tools such as Visio or ERwin.
2. Basic flow of data modeling
1. Identify the data and its related processes; for example, a salesperson needs to view the online product catalog and submit new customer orders.
2. Define data, such as data type, size, and default values.
3. Ensure data integrity using business rules and validation checks.
4. Define operational procedures, such as security checks and backups.
5. Choose a data storage technology, such as relational, hierarchical, or indexed storage.
6. Be aware that modeling often affects a company's management in unexpected ways. For example, when new insight emerges into which data elements should be maintained by which organizations, data ownership, and with it the implicit responsibility for data maintenance, accuracy, and timeliness, is often called into question. Data design frequently makes a company realize how interdependent its data systems are, and encourages it to pursue the efficiency gains, cost savings, and strategic opportunities that coordinated data planning brings.
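Steps 2 and 3 above can be sketched in code. The following is a minimal, hypothetical illustration (the entity, field names, and rules are invented for this example, not taken from any real schema) of defining data with a type, size, and default value, and enforcing integrity with validation checks:

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical entity illustrating steps 2-3: field types, sizes,
# default values, and business-rule validation checks.
@dataclass
class CustomerOrder:
    order_id: int
    customer_name: str          # size limit enforced below
    quantity: int = 1           # default value
    order_date: date = field(default_factory=date.today)

    def __post_init__(self):
        # Validation checks standing in for database constraints.
        if len(self.customer_name) > 50:
            raise ValueError("customer_name exceeds 50 characters")
        if self.quantity <= 0:
            raise ValueError("quantity must be positive")

order = CustomerOrder(order_id=1, customer_name="Acme Ltd")
```

In a real database these rules would typically live in column definitions and CHECK constraints; the sketch only shows the same decisions being made explicitly.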
3. Types of data modeling
1. ER model
The ER model in OLAP differs from the ER model in OLTP. The essential difference is that it is a subject-oriented abstraction from the perspective of the enterprise, rather than an abstraction of the entity-relationship structure of a specific business process.
2. Star model
The star model is an implementation of the dimensional model on a relational database. In this model, each business process is represented by a fact table, which stores the numeric measures of events, surrounded by multiple dimension tables that hold the textual context in which the events occur. This star-like structure is often called a "star join". The model focuses on letting users complete requirements analysis quickly, and it responds well to large, complex queries. In complex scenarios, the snowflake model can be further derived from the star model.
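The fact-plus-dimensions structure can be sketched with pandas. The table and column names below are hypothetical, chosen only to show a fact table of numeric measures joined ("star join") to dimension tables holding context:

```python
import pandas as pd

# Minimal star-schema sketch: one fact table of sales events plus two
# dimension tables. All names and values are made up for illustration.
fact_sales = pd.DataFrame({
    "date_key": [20240101, 20240101, 20240102],
    "product_key": [1, 2, 1],
    "amount": [100.0, 250.0, 80.0],   # numeric measure of each event
})
dim_product = pd.DataFrame({
    "product_key": [1, 2],
    "product_name": ["Widget", "Gadget"],
})
dim_date = pd.DataFrame({
    "date_key": [20240101, 20240102],
    "weekday": ["Mon", "Tue"],
})

# The "star join": the fact table joined to each surrounding dimension.
star = (fact_sales
        .merge(dim_product, on="product_key")
        .merge(dim_date, on="date_key"))
by_product = star.groupby("product_name")["amount"].sum()
```

In a warehouse the same shape would be expressed as SQL joins over physical tables; the dataframe version just makes the structure visible in a few lines.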
3. Multidimensional model
The multidimensional model is another implementation of the dimensional model. When data is loaded into an OLAP multidimensional database, it is stored and indexed using formats and techniques designed for dimensional data. Performance aggregations, or precomputed summary tables, are usually created and managed by the multidimensional database engine. Thanks to precomputation, indexing strategies, and other optimizations, a multidimensional database can deliver high-performance queries.
4. Data modeling case
1. The Smartbi big data mining platform offers a rich, extensible set of algorithms
The data mining platform supports a variety of efficient, practical machine learning algorithms covering classification, regression, clustering, prediction, and association. It includes a range of trainable models: logistic regression, decision tree, random forest, naive Bayes, support vector machine, linear regression, K-means, DBSCAN, and Gaussian mixture model. In addition to these core algorithms and modeling functions, the platform provides essential data preprocessing functions, including field splitting, row filtering and mapping, column selection, random sampling, null-value filtering, column merging, row merging, joins, row selection, duplicate removal, sorting, adding sequence numbers, adding calculated fields, and more.
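Smartbi exposes this flow through a visual interface; purely as an illustration (this is scikit-learn, not a Smartbi API), the same preprocess-then-train pattern for one of the listed algorithms, logistic regression, looks like this in code:

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real business dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = Pipeline([
    ("scale", StandardScaler()),     # a preprocessing step
    ("clf", LogisticRegression()),   # one of the listed algorithms
])
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```

The point of such platforms is that each pipeline stage above becomes a draggable node with a configuration panel rather than code.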
2. The Smartbi big data mining platform is fully functional and integrates seamlessly into enterprise BI applications
1) Suitable for large enterprises
Distributed cloud computing with linear scalability and guaranteed performance; seamless integration with the BI platform; one-click publishing of mining models; a model library that improves knowledge reuse and reduces duplicated investment; support for cross-database queries; unified control of data access; and training automation with model self-learning.
2) Suitable for ordinary users
Intuitive flow-based modeling, a minimalist node configuration interface, support for visual exploration, easy insight into data quality and data distribution, online help for process nodes, and automatic tuning of model hyperparameters.
3) Professional algorithm ability
Five categories of mature machine learning algorithms are built in; text analysis and processing are supported; mining algorithms can be extended with Python; and data processing capabilities can be extended with SQL.
3. The Smartbi big data mining platform is easy to learn and use, completing data processing and modeling in one stop