Python developers need good data mining tools: how much value you get from your data largely depends on having the right tools to clean, prepare, merge, and analyze it. This article introduces eight widely used Python data mining libraries; bookmark it if you find it useful.
Link to the original text: https://developer.51cto.com/a…
Author: fast Internet
Gensim is a library for topic modeling of text. It is mainly used for natural language tasks such as text similarity calculation, LDA, and word2vec. Gensim supports a range of models including TF-IDF, LSA, LDA, and word2vec, supports streamed training, and provides APIs for common tasks such as similarity calculation and information retrieval.
TensorFlow is Google's open source numerical computing framework. It builds deep learning models flexibly using data flow graphs and is widely applied in image classification, audio processing, recommendation systems, and natural language processing. It is one of the most popular machine learning frameworks today.
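To make the data flow graph idea a little more concrete, here is a minimal sketch assuming TensorFlow 2.x. It fits a straight line with gradient descent; the `@tf.function` decorator asks TensorFlow to trace the Python function into a graph. The data, learning rate, and step count are arbitrary choices for illustration.

```python
import tensorflow as tf

@tf.function  # traces this Python function into a TensorFlow graph
def mse(w, b, x, y):
    pred = w * x + b
    return tf.reduce_mean(tf.square(pred - y))

x = tf.constant([0.0, 1.0, 2.0, 3.0])
y = tf.constant([1.0, 3.0, 5.0, 7.0])  # toy data generated from y = 2x + 1
w = tf.Variable(0.0)
b = tf.Variable(0.0)

for _ in range(500):
    with tf.GradientTape() as tape:     # records operations for differentiation
        loss = mse(w, b, x, y)
    dw, db = tape.gradient(loss, [w, b])
    w.assign_sub(0.05 * dw)             # plain gradient descent step
    b.assign_sub(0.05 * db)

print(float(w.numpy()), float(b.numpy()))
```

After training, `w` and `b` should land close to the slope and intercept used to generate the data.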
SciPy is a scientific computing library built on NumPy. It works with NumPy arrays and matrices and provides a large number of numerical computing modules, including interpolation, linear algebra, signal and image processing, fast Fourier transforms, optimization, and ordinary differential equation solvers, which lets it meet a wide range of requirements flexibly.
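A short sketch of three of the modules listed above (linear algebra, optimization, and ODE solving). The specific equations are toy problems chosen so the answers are easy to check by hand.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Linear algebra: solve the system 3x + y = 9, x + 2y = 8.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
rhs = np.array([9.0, 8.0])
x = linalg.solve(A, rhs)

# Optimization: minimize f(t) = (t - 2)^2, whose minimum is at t = 2.
res = optimize.minimize_scalar(lambda t: (t - 2.0) ** 2)

# ODE: solve dy/dt = -y with y(0) = 1 on [0, 1]; the exact solution is exp(-t).
sol = integrate.solve_ivp(
    lambda t, y: -y, (0.0, 1.0), [1.0], rtol=1e-8, atol=1e-10
)
y_end = sol.y[0, -1]

print(x, res.x, y_end)
```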
NumPy provides array support, vectorized operations, efficient mathematical functions, and linear algebra routines. Moreover, SciPy, Matplotlib, pandas, and many other libraries are built on top of NumPy. Its arrays are faster than Python's built-in lists for numerical work. Because NumPy's built-in functions run in compiled C code, it is recommended to use them as much as possible instead of explicit Python loops.
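The advice about preferring built-in functions can be sketched as follows. The arrays are toy values; the list comprehension at the end is included only to show the equivalent pure-Python form that the vectorized expression replaces.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

elementwise = a * b + 1.0   # vectorized arithmetic, no Python-level loop
dot = np.dot(a, b)          # linear algebra: inner product
total = np.sum(a)           # built-in reduction instead of Python's sum()

# Same elementwise result computed with plain lists, for comparison.
as_list = [p * q + 1.0 for p, q in zip(a.tolist(), b.tolist())]

print(elementwise, dot, total)
```

On small arrays the difference is negligible; the benefit of the vectorized form generally shows up as arrays grow.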
Matplotlib is a Python plotting package built on NumPy. It provides command-style plotting functions and is mainly used to draw statistical charts. It is one of the easier data visualization tools to use. It is mainly used for two-dimensional plotting: a few lines of code are enough to generate charts such as histograms, bar charts, and scatter plots. Three-dimensional plotting is also supported, but only for fairly simple figures.
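As a small example of the "few lines of code" claim, this sketch draws a histogram and a scatter plot side by side. The random data is synthetic, and the `Agg` backend is selected only so the script can run without a display; the figure is rendered to an in-memory buffer here, but passing a filename to `savefig` would write it to disk instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display required
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=500)  # synthetic sample data

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
counts, edges, _ = ax1.hist(data, bins=20)   # histogram of the sample
ax1.set_title("Histogram")
ax2.scatter(data[:-1], data[1:], s=5)        # each value against the next one
ax2.set_title("Scatter")

buf = io.BytesIO()
fig.savefig(buf, format="png")               # render the figure as PNG bytes
plt.close(fig)
```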
pandas is an essential tool for data mining in Python and is familiar to many people. It is built on NumPy. It provides good data reading and writing functions and supports adding, deleting, modifying, and querying data. Its data processing features are powerful and include time series analysis, making it convenient to analyze and explore data.
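The add, delete, modify, and query operations, plus a simple time series aggregation, might look like the sketch below. The column names and sales figures are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=6, freq="D"),
        "sales": [10, 12, 9, 15, 11, 14],  # hypothetical daily figures
    }
)

df["doubled"] = df["sales"] * 2          # add a column
df.loc[df["sales"] < 10, "sales"] = 10   # modify rows matching a condition
df = df.drop(columns="doubled")          # delete a column
busy = df.query("sales >= 12")           # query rows

# Time series: index by date and aggregate into 3-day totals.
totals = df.set_index("date")["sales"].resample("3D").sum()

print(busy)
print(totals)
```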
Scikit-learn is an excellent Python library for machine learning. It provides a complete toolbox covering data preprocessing, regression, classification, clustering, prediction, and model evaluation. Its main limitation is deep learning: it offers only basic multilayer perceptron models and is not designed for building deep networks. For classical machine learning, however, it is very practical.
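A typical workflow combining preprocessing, classification, and evaluation could be sketched like this. The choice of the bundled iris dataset, logistic regression, and the split parameters are assumptions made for the example, not recommendations from the original article.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing and classifier chained into a single estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # mean accuracy on held-out data
print(accuracy)
```

Swapping `LogisticRegression` for another estimator usually requires no other changes, since scikit-learn estimators share the same `fit` and `score` interface.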
Keras is a Python library for deep learning. It can build not only ordinary neural networks but also a variety of deep learning models, such as autoencoders, recurrent neural networks, recursive neural networks, and convolutional neural networks. It also runs fast, simplifies the steps of building and training a model, and is highly customizable. It can easily build a deep neural network with hundreds of input nodes.
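For a sense of how little code a network takes, here is a minimal sketch assuming the Keras bundled with TensorFlow 2.x (`tensorflow.keras`). It builds a small fully connected network with 100 input nodes and trains it on synthetic data; the layer sizes, labels, and epoch count are arbitrary.

```python
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 100)).astype("float32")   # synthetic inputs, 100 features
y = (X.sum(axis=1) > 0).astype("float32")           # synthetic binary labels

model = keras.Sequential([
    keras.Input(shape=(100,)),                       # 100 input nodes
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

history = model.fit(X, y, epochs=5, batch_size=32, verbose=0)
probs = model.predict(X[:4], verbose=0)              # predicted probabilities
print(probs.shape)
```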