Time: 2019-10-28
Process and methods of data mining. 1. Tasks: association analysis, cluster analysis, classification analysis, anomaly analysis, specific group analysis, evolution analysis. 2. Methods: statistics, online analytical processing, information retrieval, machine learning. Classification. Practical applications: application classification, trend prediction, recommendation of related products. Regression analysis. Practical application: forecasting sales trends. Clustering. Practical application: […]

Time: 2019-10-7
This article is a repost; I will add some of my own ideas to it in the future. I recently looked at schools abroad that separate machine learning from data mining. Data mining mainly deals with databases: learning about data warehouses and using Oracle software. Machine learning seems to be closer to […]

Time: 2019-10-6
Preface. This course is well suited to beginners. It is much simpler than Ng’s open course at Stanford University, with less mathematics and more compact content, and it covers a wide range of topics. It aims to build a bridge between beginners and machine learning. It is worth mentioning that this course is closer to […]

Time: 2019-10-5
This chapter is dense with substance. It is tiring to read, but the payoff is correspondingly large. Keep reading and writing. Writing formulas is very tiring; I hope SegmentFault can support LaTeX formulas as soon as possible. I have never managed to get a firm grasp of optimization, and I am weak in both its theory and its practice, so I will strive to win this […]

Time: 2019-10-4
General methods. 2.2 Subgradient method. Converting to common LP and SDP problems. These general methods do not exploit the structure of the 1-norm problem itself, so their convergence is slow. Methods such as LP and SDP also pursue a level of optimization accuracy that is not important in the field of machine learning. It is important […]

Time: 2019-10-3
Finally, we can enter formulas directly. I hope SF will get better and better. At present, formula rendering is a little slow and still has some problems. (Block) Coordinate Descent Algorithm. The so-called (B)CD means that at each iteration, not all parameters are optimized, but only one parameter […]
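The excerpt above is cut off, but the idea it names, updating one parameter per step while the others stay fixed, can be illustrated with a minimal sketch. The quadratic objective, the function name `coordinate_descent`, and the example values are illustrative assumptions, not taken from the original post.

```python
def coordinate_descent(A, b, n_sweeps=100):
    """Minimize f(x) = 0.5 * x^T A x - b^T x for a symmetric
    positive-definite matrix A, one coordinate at a time."""
    n = len(b)
    x = [0.0] * n
    for _ in range(n_sweeps):
        for i in range(n):
            # Hold every x[j] with j != i fixed and solve df/dx_i = 0 exactly.
            rest = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - rest) / A[i][i]
    return x


# Hypothetical 2x2 example; the minimizer of f solves A x = b.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = coordinate_descent(A, b)
```

For this smooth quadratic each coordinate update has a closed form; a block variant would update a group of coordinates together instead of a single one.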

Time: 2019-10-2
I looked up a lot of information about this problem. The error means that the CSV reader expected nine attribute (feature / dimension) values but read only one, and that the third line has a problem. The data also contains all kinds of stray formatting spaces, and so on. In fact, there is nothing wrong with setting up the attribute […]

Time: 2019-10-2
Data science is OSEMN (pronounced like "awesome"), which covers Obtaining, Scrubbing, Exploring, Modeling, and iNterpreting data. As a data scientist, I spend a lot of time on the command line, especially when obtaining, scrubbing, and exploring data. And I’m not the only one who does. Recently, Greg Reda […]

Time: 2019-10-1
LibShortText is an open-source Python toolkit for short-text classification (titles, text messages, questions, sentences, etc.). Built on LibLinear, it is further optimized for short text. Its main features are as follows: support for multi-class classification; direct text input, with no separate feature-vectorization preprocessing; bigram features, with no stop-word removal and no part-of-speech filtering; a linear-kernel SVM classifier […]

Time: 2019-9-30
As a layman in data mining, I have long boasted of being a “kitchen knife school” engineer, and I have worked on three recommendation system projects. The three systems serve different user groups, business scenarios, regions, and cultures. So lately, I’ve been wondering whether the recommendation system, which originated from […]

Time: 2019-9-29
Business Understanding. The task is classification. We need to analyze how nine speakers pronounce the Japanese vowels "ae", and then identify the nine speakers from that analysis. The ae.train file is the training data set. The ae.test file is used to test the training effect. The size_ae.test file records the corresponding blocks of data in […]

Time: 2019-9-28
Sigmoid function. Logistic regression is a basic regression and classification algorithm in the field of data mining. For a long time, my understanding of logistic regression was limited to its name. That changed when I interviewed an intern and talked with him about the popular radio rankings being built, where it was necessary to map the weighted scores of […]
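The excerpt is cut off before it says what the weighted scores are mapped to, so the following is only a general illustration of how a sigmoid squashes any real-valued score into the interval (0, 1). The function name `sigmoid` and the sample scores are assumptions for the sketch, not details from the original post.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written so that
    large negative or positive inputs do not overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, exp(z) is small, so this form stays finite.
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical weighted scores; larger scores map closer to 1.
scores = [-3.0, 0.0, 2.5]
mapped = [sigmoid(s) for s in scores]
```

Because the function is monotonically increasing, the mapped values keep the same ordering as the raw scores while becoming comparable on a fixed scale.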