Data mining is the process of uncovering hidden information in large volumes of data by means of algorithms. It is closely tied to computer science and draws on many techniques, including statistics, online analytical processing (OLAP), information retrieval, machine learning, expert systems (which rely on rules drawn from past experience), and pattern recognition.
I have summarized the common data mining methods below:
1. Analysis method of data mining — decision tree method
Decision trees are well suited to classification and prediction. A tree expresses its knowledge as rules, and each rule is a sequence of questions; answering them one after another leads to the desired result. A typical decision tree has a root at the top and many leaves at the bottom, and it splits the records into successively smaller subsets. The records in each subset may be described by a simple rule. Decision trees also come in different shapes, such as binary trees, ternary trees, or trees that mix branching factors.
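The idea of reaching a result by asking one question after another can be sketched in a few lines of Python. This is only an illustration: the tree, the field names (`income`, `age`), the thresholds, and the class labels are all made up for the example, and a real tree would be learned from data rather than written by hand.

```python
# Hypothetical hand-written tree: each internal node asks one question
# about a record; each leaf (a plain string) is the predicted class.
TREE = {
    "question": ("income", 50000),   # is record["income"] >= 50000?
    "yes": {
        "question": ("age", 30),     # is record["age"] >= 30?
        "yes": "likely buyer",
        "no": "possible buyer",
    },
    "no": "unlikely buyer",
}

def classify(record, node=TREE):
    """Walk from the root to a leaf, answering one question per level."""
    while isinstance(node, dict):
        field, threshold = node["question"]
        node = node["yes"] if record[field] >= threshold else node["no"]
    return node

print(classify({"income": 62000, "age": 41}))  # likely buyer
```

Each path from the root to a leaf reads as one rule, for example "income >= 50000 and age >= 30 implies likely buyer".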
2. Analysis method of data mining — neural network method
The neural network method simulates the structure and function of the biological nervous system. It is a nonlinear prediction model that is learned through training. It treats each node as a processing unit that imitates a neuron in the human brain, and the nodes are joined by weighted connections. It can handle a variety of data mining tasks, such as classification, clustering, and feature mining. Learning in a neural network consists mainly of adjusting the connection weights. Its advantages are robustness to noise, nonlinear learning, and associative memory, and it can give accurate predictions in complex situations. It has two main disadvantages. First, it copes poorly with high-dimensional variables, its intermediate learning process cannot be observed, and it behaves as a "black box" whose output is hard to explain. Second, training takes a long time. Within data mining, neural networks are applied to classification and prediction as well as to clustering (for example, with self-organizing maps).
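To show what "learning by modifying weights" means, here is a minimal sketch of a single artificial neuron trained with the classic perceptron rule. It is not a full neural network; the training data (the logical AND function), the learning rate, and the epoch count are assumptions chosen to keep the example tiny.

```python
def predict(weights, bias, x):
    """Fire (1) if the weighted sum of the inputs is positive, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=20, lr=1):
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            # Learning = nudging each weight in the direction that
            # reduces the error on this sample.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Toy training set: the logical AND of two inputs.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND)
print([predict(weights, bias, x) for x, _ in AND])  # [0, 0, 0, 1]
```

The learned weights are just numbers with no obvious meaning, which is a small-scale view of the "black box" problem described above.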
3. Analysis method of data mining — association rule method
Association rules describe relationships between data items in a database: when certain items occur in a transaction, other items are likely to occur in the same transaction. In other words, they capture associations hidden in the data. In customer relationship management, mining the large volume of records in an enterprise's customer database can reveal interesting associations and identify the key factors that affect marketing results. These findings inform product positioning, pricing and customization, customer acquisition, segmentation and retention, and marketing and promotion, and they provide a basis for decisions such as marketing risk assessment and fraud prediction.
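Two standard measures for judging a rule such as "bread implies milk" are support (how often the items appear together) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both; the shopping baskets are invented sample data.

```python
# Invented sample transactions (one set of items per basket).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions)

def support(itemset):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions with the antecedent, the share that also
    contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)

print(support({"bread", "milk"}))       # 0.6
print(confidence({"bread"}, {"milk"}))  # 0.75
```

Mining algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds; this sketch only evaluates one rule that is given in advance.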
4. Analysis method of data mining — genetic algorithm
The genetic algorithm simulates reproduction, mating, and gene mutation as they occur in natural selection and heredity. It is a machine learning method based on evolutionary theory that uses recombination, crossover, mutation, and natural selection to generate rules. Its guiding principle is "survival of the fittest", and it has the properties of implicit parallelism and easy combination with other models. Its main advantage is that it can handle many data types and process them in parallel. Its disadvantages are that it needs many parameters, encoding the problem is difficult, and the amount of computation is generally large. Genetic algorithms are often used to optimize neural networks and can solve problems that are difficult for other techniques.
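The selection, crossover, and mutation steps can be sketched on a toy problem. The example below evolves bit strings toward the string with the most 1s (the textbook "OneMax" task); the population size, mutation rate, generation count, and seed are arbitrary assumptions, and a real application would need its own encoding and fitness function.

```python
import random

def fitness(bits):
    """Toy fitness: the number of 1s in the bit string."""
    return sum(bits)

def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]       # selection: fitter half survives
        children = [parents[0][:]]           # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)
            child = a[:cut] + b[cut:]        # single-point crossover
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]       # mutation: occasional bit flip
            children.append(child)
        pop = children
    return max(pop, key=fitness)

best = evolve()
print(fitness(best), "of", len(best), "bits set")
```

Even this small example exposes the tuning burden mentioned above: four parameters plus the encoding have to be chosen before the search can run.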
5. Analysis method of data mining — cluster analysis method
Cluster analysis divides a set of data into several categories according to similarity and difference. Its goal is to make the data within one category as similar as possible and the data in different categories as dissimilar as possible. Clustering algorithms fall into four broad families: hierarchical methods, partitioning methods, density-based methods, and grid-based methods. Commonly used classical clustering methods include k-means, k-medoids, and ISODATA.
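As an illustration of a partitioning method, here is a minimal k-means sketch in plain Python. The 2-D points and the starting centroids are invented for the example; practical implementations choose the starting centroids more carefully and stop when the assignments no longer change.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(clusters[0])  # [(1, 1), (1.5, 2), (1, 0.5)]
print(clusters[1])  # [(8, 8), (9, 9.5), (8.5, 9)]
```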
6. Analysis method of data mining — fuzzy set method
The fuzzy set method applies fuzzy set theory to fuzzy evaluation, fuzzy decision-making, fuzzy pattern recognition, and fuzzy cluster analysis. Fuzzy set theory uses a membership degree, a value between 0 and 1, to describe how strongly something has a fuzzy attribute. The more complex a system is, the fuzzier it tends to be.
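A membership degree can be made concrete with a small sketch. The triangular membership function below is one common choice among many, and the temperature ranges for "warm" and "hot" are assumptions made up for the example. The `min` and `max` operators are the standard fuzzy intersection and union.

```python
def triangular(x, a, b, c):
    """Membership rises linearly from a to the peak at b, then falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy sets over temperature in degrees Celsius.
def warm(t):
    return triangular(t, 15, 25, 35)

def hot(t):
    return triangular(t, 25, 35, 45)

t = 28
print(warm(t), hot(t))        # 0.7 0.3
print(min(warm(t), hot(t)))   # 0.3  ("warm AND hot")
print(max(warm(t), hot(t)))   # 0.7  ("warm OR hot")
```

Unlike an ordinary set, where 28 degrees would be either warm or not, the same temperature here belongs to both sets to different degrees.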
7. Analysis method of data mining — web page mining
Web mining draws on the massive amount of data on the web. It lets an enterprise collect and analyze information about politics, the economy, policy, science and technology, finance, markets, competitors, supply and demand, customers, and so on. The enterprise can then concentrate on the external environment information and internal operating information that has a significant, or potentially significant, impact on it. Based on the results, it can find problems in its management processes and possible early signs of a crisis, and use this information to identify, analyze, evaluate, and manage that crisis.
8. Analysis method of data mining — logistic regression analysis
Regression analysis captures how attribute values in a transaction database change over time. It produces a function that maps data items to a real-valued prediction variable and uncovers dependencies between variables or attributes. Its main research problems are the trend characteristics of data series, the forecasting of data series, and the correlations among data. Logistic regression is the variant used when the quantity to predict is a category rather than a real value.
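A function that maps data items to a real-valued prediction can be sketched with the simplest case, a straight line fitted by ordinary least squares. The monthly sales figures are invented and deliberately lie exactly on a line so that the result is easy to check by hand; real series are noisy.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented series: month number and units sold.
months = [1, 2, 3, 4, 5]
sales = [10, 12, 14, 16, 18]

slope, intercept = fit_line(months, sales)
print(slope, intercept)        # 2.0 8.0
print(slope * 6 + intercept)   # 20.0 (forecast for month 6)
```

The slope is the trend of the series, and evaluating the fitted function at a future month gives the forecast.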
9. Analysis method of data mining — rough set method
Rough set theory is a mathematical tool for dealing with vague, imprecise, and incomplete problems. It can be used for data reduction, discovering correlations in data, evaluating the significance of data, and so on. Its advantages are that the algorithm is simple and that it can find the internal regularities of a problem automatically, without prior knowledge about the data. Its disadvantage is that it cannot handle continuous attributes directly, so they must be discretized first. The discretization of continuous attributes is therefore a hard problem that limits the practicality of rough set theory.
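The core construction of rough set theory is a pair of approximations: objects that certainly belong to a concept (lower approximation) and objects that possibly belong to it (upper approximation), given only the attributes available. The sketch below computes both; the patient table is invented and its attributes are already discrete.

```python
from collections import defaultdict

def approximations(objects, attributes, target):
    """Return (lower, upper) approximations of the set `target`.

    objects:    dict mapping object name -> dict of attribute values
    attributes: attribute names used to tell objects apart
    """
    # Objects with identical values on the chosen attributes are
    # indiscernible and fall into the same block.
    blocks = defaultdict(set)
    for name, values in objects.items():
        blocks[tuple(values[a] for a in attributes)].add(name)

    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper

# Invented decision table.
patients = {
    "p1": {"headache": "yes", "temp": "high"},
    "p2": {"headache": "yes", "temp": "high"},
    "p3": {"headache": "yes", "temp": "normal"},
    "p4": {"headache": "no", "temp": "high"},
    "p5": {"headache": "no", "temp": "high"},
    "p6": {"headache": "no", "temp": "normal"},
}
has_flu = {"p1", "p2", "p4"}

lower, upper = approximations(patients, ["headache", "temp"], has_flu)
print(sorted(lower))  # ['p1', 'p2']
print(sorted(upper))  # ['p1', 'p2', 'p4', 'p5']
```

Patients p4 and p5 look identical on the two attributes but differ in outcome, so they fall in the boundary between the two approximations. If `temp` were a raw reading instead of a category, almost every patient would form a separate block, which is why discretization has to come first.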
10. Analysis method of data mining — connection analysis
Link analysis (also translated as connection analysis) takes relationships as its subject and builds a wide range of applications on the relationships between people, between things, or between people and things. For example, a telecommunications provider can use link analysis to collect when and how often customers use the phone, infer their usage preferences, and propose plans that benefit the company. Beyond the telecommunications industry, more and more marketers also use link analysis for research that benefits their enterprises.
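The telecommunications example can be sketched by treating call records as links between customers. The call log below is invented, and counting calls per pair and contacts per customer is only the simplest possible form of link analysis.

```python
from collections import Counter, defaultdict

# Invented call log: (caller, callee).
calls = [
    ("alice", "bob"), ("alice", "bob"), ("alice", "carol"),
    ("bob", "carol"), ("dave", "alice"), ("alice", "bob"),
]

# Link strength: number of calls between each pair, ignoring direction.
link_strength = Counter(frozenset(pair) for pair in calls)

# Contact sets: who is linked to whom.
contacts = defaultdict(set)
for caller, callee in calls:
    contacts[caller].add(callee)
    contacts[callee].add(caller)

strongest_pair, n_calls = link_strength.most_common(1)[0]
most_connected = max(contacts, key=lambda c: len(contacts[c]))

print(sorted(strongest_pair), n_calls)  # ['alice', 'bob'] 3
print(most_connected)                   # alice
```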
These are the common data mining methods the editor has to share today. Guangzhou smart software Co., Ltd. (hereinafter referred to as smart BI) is a state-recognized "high-tech enterprise" focusing on business intelligence (BI) and big data analysis software products and services. We have more than 15 years of product R&D experience in the BI field and provide complete big data analysis software products and solutions, together with supporting consulting, implementation, training, and maintenance services.