Concept and characteristics of data mining
Data mining is the non-trivial process of extracting implicit, previously unknown, and potentially valuable information from large amounts of data in a database. It is a decision support process that draws mainly on artificial intelligence, machine learning, pattern recognition, statistics, database technology, and visualization. It analyzes enterprise data in a highly automated way, performs inductive reasoning, and uncovers latent patterns, helping decision makers adjust market strategies, reduce risk, and make sound decisions.
Data mining is a technique for analyzing data and finding the patterns (rules) hidden in a large data set. It consists of three main steps: data preparation, pattern finding, and pattern representation. Data preparation selects the required data from the relevant sources and integrates it into a data set for mining. Pattern finding applies some method to discover the patterns contained in that data set. Pattern representation expresses the discovered patterns in a form that users can understand as easily as possible. Data mining tasks include association analysis, cluster analysis, classification analysis, anomaly analysis, specific group analysis, and evolution analysis.
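The three steps can be illustrated with a minimal Python sketch of one of the listed tasks, association analysis. Everything in it is an assumption made for illustration: the basket data, the function names prepare, find_patterns and represent, and the 40% support threshold. It is not a specific product's method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw sources: each row is one shopping basket.
source_a = [["bread", "milk"], ["bread", "butter", "milk"]]
source_b = [["milk", "butter"], ["bread", "milk"], ["butter"]]

def prepare(*sources):
    """Data preparation: merge the sources into one data set of item sets."""
    return [frozenset(basket) for source in sources for basket in source]

def find_patterns(baskets, min_support=0.4):
    """Pattern finding: keep item pairs present in at least min_support of baskets."""
    pair_counts = Counter()
    for basket in baskets:
        pair_counts.update(combinations(sorted(basket), 2))
    total = len(baskets)
    return {pair: count / total
            for pair, count in pair_counts.items()
            if count / total >= min_support}

def represent(patterns):
    """Pattern representation: express the result as readable sentences."""
    return [f"{a} and {b} appear together in {support:.0%} of baskets"
            for (a, b), support in sorted(patterns.items())]

baskets = prepare(source_a, source_b)
patterns = find_patterns(baskets)
report = represent(patterns)
for line in report:
    print(line)
```

Real association analysis usually goes on to derive rules with confidence values; the sketch stops at pair support to keep the three steps visible.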
Basic steps of data mining
1. Define the problem
Before starting knowledge discovery, the first and most important requirement is to understand the data and the business problem. The goal must be clearly defined; that is, you must decide exactly what you want to achieve. For example, if you want to improve the use of e-mail, the goal might be to “increase the user utilization rate” or to “increase the value of each single use”. The models built to solve these two problems are almost completely different, so a decision must be made.
2. Build the data mining database
Building the data mining database involves the following steps: data collection, data description, data selection, data quality evaluation and data cleaning, merging and integration, building metadata, loading the data mining database, and maintaining the data mining database.
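A few of these steps (merging, data quality evaluation, cleaning, and building metadata) can be sketched in plain Python. The records, the field names, and the helpers quality_report and clean_and_merge are hypothetical; a real database build would normally rely on a database or a data-frame library.

```python
# Hypothetical records collected from two sources; None marks a missing value.
crm_rows = [
    {"id": 1, "age": 34, "city": "Paris"},
    {"id": 2, "age": None, "city": "Lyon"},
]
web_rows = [
    {"id": 2, "age": None, "city": "Lyon"},  # duplicate of a CRM record
    {"id": 3, "age": 51, "city": None},
]

def clean_and_merge(*sources):
    """Merging and cleaning: combine sources, keeping the first row seen per id."""
    merged = {}
    for source in sources:
        for row in source:
            merged.setdefault(row["id"], row)
    return list(merged.values())

def quality_report(rows):
    """Data quality evaluation: count missing values per field."""
    fields = sorted({key for row in rows for key in row})
    return {f: sum(1 for row in rows if row.get(f) is None) for f in fields}

mining_db = clean_and_merge(crm_rows, web_rows)
missing = quality_report(mining_db)

# Minimal metadata describing the loaded data set.
metadata = {"fields": sorted(mining_db[0]), "row_count": len(mining_db)}
print(missing, metadata)
```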
3. Analyze the data
The purpose of analysis is to find the data fields that have the greatest influence on the predicted output and to decide whether derived fields need to be defined. If the data set contains hundreds of fields, browsing and analyzing them by hand is time-consuming and tiring, so a powerful software tool with a good interface is needed to help with this work.
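One simple way to screen fields, shown here only as a sketch, is to rank them by the absolute value of their Pearson correlation with the target. The field names and values are made up, and correlation captures only linear relationships, so this is one possible screening heuristic rather than the method the article has in mind.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def rank_fields(fields, target):
    """Sort fields by absolute correlation with the target, strongest first."""
    scores = {name: abs(pearson(values, target)) for name, values in fields.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical candidate fields and prediction target.
fields = {
    "visits": [1, 2, 3, 4, 5],
    "age": [30, 25, 41, 22, 35],
    "constant": [7, 7, 7, 7, 7],
}
target = [10, 20, 30, 40, 50]

ranking = rank_fields(fields, target)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```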
4. Prepare the data
Data preparation is the last step before building the model. It can be divided into four parts: selecting variables, selecting records, creating new variables, and transforming variables.
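The four parts can be shown together in a short Python sketch. The customer records, the field names, and the choice of transformations (a log scale and a 0/1 flag) are assumptions for illustration only.

```python
from math import log

# Hypothetical customer records.
records = [
    {"id": 1, "income": 52000, "debt": 13000, "segment": "retail", "active": True},
    {"id": 2, "income": 48000, "debt": 24000, "segment": "business", "active": False},
    {"id": 3, "income": 75000, "debt": 15000, "segment": "retail", "active": True},
]

def prepare_records(rows):
    prepared = []
    for row in rows:
        # Selecting records: keep only active customers.
        if not row["active"]:
            continue
        prepared.append({
            # Selecting variables: "id" carries no signal, so it is left out.
            # Transforming variables: put income on a log scale.
            "income_log": log(row["income"]),
            # Creating new variables: ratio of debt to income.
            "debt_ratio": row["debt"] / row["income"],
            # Transforming variables: encode the category as a 0/1 flag.
            "is_retail": 1 if row["segment"] == "retail" else 0,
        })
    return prepared

prepared = prepare_records(records)
print(prepared)
```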
5. Build the model
Modeling is an iterative process. Different models need to be examined carefully to determine which is most useful for the business problem at hand. Training and testing a data mining model requires dividing the data into at least two parts: one for training the model and one for testing it. First build the model on one part of the data, then test and verify it on the rest. Sometimes a third data set, called the validation set, is also used: because the test set may be influenced by the characteristics of the model, an independent data set is needed to verify the model's accuracy.
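A minimal sketch of such a split in Python follows. The 60/20/20 proportions, the fixed seed, and the function name split_dataset are assumptions; the naming follows this article, where the validation set is the independent third set, although some texts swap the roles of the validation and test sets.

```python
import random

def split_dataset(rows, train=0.6, test=0.2, seed=0):
    """Shuffle the rows and split them into training, test, and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    train_set = shuffled[:n_train]
    test_set = shuffled[n_train:n_train + n_test]
    validation_set = shuffled[n_train + n_test:]  # independent third set
    return train_set, test_set, validation_set

rows = list(range(100))  # stand-in for 100 records
train_set, test_set, validation_set = split_dataset(rows)
print(len(train_set), len(test_set), len(validation_set))
```

Shuffling before splitting guards against any ordering in the source data (for example, records sorted by date) leaking into one of the sets.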
6. Evaluate the model
After the model is built, its results must be evaluated and its value explained. The accuracy measured on the test set is meaningful only for the data used to build the model. In practical applications, you also need to understand the types of errors the model makes and the costs associated with them. Experience shows that a valid model is not necessarily a correct one, mainly because of the various assumptions implicit in how the model was built. It is therefore very important to test the model directly in the real world: apply it first on a small scale, collect test data, and roll it out widely only once the results are satisfactory. Once the model has been built and verified, there are two main ways to use it. The first is to provide a reference for analysts; the second is to apply the model to different data sets.
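The difference between accuracy and error cost can be made concrete with a small Python sketch for a binary classifier. The labels, the predictions, and the cost figures (a missed positive costing five times a false alarm) are hypothetical.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a and p:
            counts["tp"] += 1  # correctly flagged
        elif a and not p:
            counts["fn"] += 1  # missed positive
        elif p:
            counts["fp"] += 1  # false alarm
        else:
            counts["tn"] += 1  # correctly ignored
    return counts

# Hypothetical true labels and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
accuracy = (counts["tp"] + counts["tn"]) / len(actual)

# Assumed business costs per error type.
costs = {"fp": 1.0, "fn": 5.0}
total_cost = counts["fp"] * costs["fp"] + counts["fn"] * costs["fn"]
print(accuracy, total_cost)
```

Two models with the same accuracy can have very different total costs if one makes mostly false alarms and the other mostly misses, which is why the error types matter in practice.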
The above was shared by Smartbi. For more practical industry content, follow our next installment. Smartbi is a well-known domestic BI brand that focuses on the research, development, and service of business intelligence (BI) and big data analysis platform software. After years of independent research and development, it has accumulated extensive best-practice experience in business intelligence and integrated the data analysis and decision support requirements of various industries. It meets end users' big data analysis needs, such as enterprise-level reporting, data visualization analysis, self-service exploratory analysis, data mining modeling, and AI-assisted analysis.