Datasets in Python’s sklearn Library


1、 Introduction to sklearn

Scikit-learn, commonly abbreviated as sklearn, is a machine learning library for Python. It is one of the most complete general-purpose machine learning libraries available. Its strength lies not only in the number of algorithms it implements but also in its extensive, detailed documentation and examples, which are easy to follow and can serve as a machine learning tutorial in their own right.

2、 Types of sklearn datasets

sklearn provides several kinds of datasets:

  • Bundled (packaged) datasets: sklearn.datasets.load_<name>
  • Downloadable datasets: sklearn.datasets.fetch_<name>
  • Generated (synthetic) datasets: sklearn.datasets.make_<name>
  • Datasets in svmlight/libsvm format: sklearn.datasets.load_svmlight_file(…)
  • Datasets downloaded from the mldata.org online repository: sklearn.datasets.fetch_mldata(…); this function was removed in scikit-learn 0.22, and sklearn.datasets.fetch_openml replaces it
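As a quick illustration of the naming convention above, the sketch below loads one bundled dataset and generates one synthetic dataset. load_iris and make_classification are real scikit-learn functions picked as examples; the make_classification argument values are arbitrary assumptions. No fetch_<name> function is called, because those need network access on first use.

```python
from sklearn.datasets import load_iris, make_classification

# Bundled dataset: shipped inside scikit-learn, so no download is needed.
iris = load_iris()
print(iris.data.shape, iris.target.shape)  # (150, 4) (150,)
print(iris.target_names.tolist())          # ['setosa', 'versicolor', 'virginica']

# return_X_y=True skips the Bunch object and returns (X, y) directly.
X, y = load_iris(return_X_y=True)

# Generated dataset: make_<name> builds synthetic data on the fly.
# The argument values here are arbitrary choices for the example.
X_syn, y_syn = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
print(X_syn.shape, y_syn.shape)  # (200, 5) (200,)

# fetch_<name> functions (downloadable datasets) need network access the first
# time they run, so none is called here.
```

The load_* and make_* families return the same kind of (X, y) arrays, so the rest of a workflow does not care which family the data came from.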

3、 sklearn datasets by category

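1. Dataset utility functions

clear_data_home: delete the data home cache directory and everything in it

get_data_home: return the path of the sklearn data directory (by default ~/scikit_learn_data)

load_files: load text files organized into one subfolder per category

dump_svmlight_file: write a dataset to a file in svmlight/libsvm format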

load_svmlight_file: load a dataset from a file in svmlight/libsvm format

load_svmlight_files: load several svmlight/libsvm-format files at once
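The utility functions above can be exercised offline with a throwaway directory. This is a sketch that assumes a writable temporary folder; the file name toy.svmlight and the small matrix are made up for illustration. get_data_home is pointed at the temporary folder so that clear_data_home never touches the real cache.

```python
import os
import tempfile

import numpy as np
from scipy import sparse
from sklearn.datasets import (
    clear_data_home,
    dump_svmlight_file,
    get_data_home,
    load_svmlight_file,
    load_svmlight_files,
)

with tempfile.TemporaryDirectory() as tmp:
    # Use a throwaway folder instead of the real cache (~/scikit_learn_data by default).
    cache = get_data_home(os.path.join(tmp, "sk_cache"))  # created if missing
    print(os.path.isdir(cache))    # True
    clear_data_home(cache)         # deletes the folder and everything in it
    print(os.path.exists(cache))   # False

    # Write a tiny dense dataset in svmlight/libsvm format (hypothetical data).
    X = np.array([[0.0, 1.5, 0.0], [2.0, 0.0, 3.0]])
    y = np.array([0, 1])
    path = os.path.join(tmp, "toy.svmlight")
    dump_svmlight_file(X, y, path, zero_based=True)

    # Read it back: X comes back as a sparse matrix, y as a float array.
    X_loaded, y_loaded = load_svmlight_file(path, n_features=3, zero_based=True)
    print(sparse.issparse(X_loaded), np.allclose(X_loaded.toarray(), X))  # True True

    # load_svmlight_files reads several files and gives them a common feature count.
    X1, y1, X2, y2 = load_svmlight_files([path, path], zero_based=True)
    print(X1.shape, X2.shape)  # (2, 3) (2, 3)
```

Passing zero_based explicitly on both sides avoids relying on the loader's automatic guess about whether feature indices start at 0 or 1.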

2. Text classification and clustering datasets

fetch_20newsgroups: 20 Newsgroups news text classification dataset

fetch_20newsgroups_vectorized: 20 Newsgroups dataset, already vectorized into numeric features

fetch_rcv1: Reuters (RCV1) English news text classification dataset
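Downloading 20 Newsgroups or RCV1 requires network access, so the sketch below substitutes a tiny hypothetical corpus laid out the way load_files expects (one subfolder per category) and turns it into a TF-IDF matrix. fetch_20newsgroups returns a Bunch with the same data, target and target_names fields, so the vectorizing step should carry over; the documents and category names are invented for illustration.

```python
import os
import tempfile

from sklearn.datasets import load_files
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus: one subfolder per category, one text file per document.
docs = {
    "sports": ["the team won the match", "a late goal decided the game"],
    "tech": ["the new chip is faster", "the update fixes a kernel bug"],
}

with tempfile.TemporaryDirectory() as root:
    for category, texts in docs.items():
        os.makedirs(os.path.join(root, category))
        for i, text in enumerate(texts):
            with open(os.path.join(root, category, f"{i}.txt"), "w", encoding="utf-8") as fh:
                fh.write(text)

    # load_files reads every file into memory and labels it by its folder name.
    corpus = load_files(root, encoding="utf-8", random_state=0)

print(corpus.target_names)  # ['sports', 'tech']

# Turn the raw strings into a sparse TF-IDF feature matrix (one row per document).
X = TfidfVectorizer().fit_transform(corpus.data)
print(X.shape)
```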

Face recognition datasets

fetch_lfw_pairs: Labeled Faces in the Wild (LFW) face-pair dataset

fetch_lfw_people: Labeled Faces in the Wild (LFW) people dataset

fetch_olivetti_faces: Olivetti faces dataset

3. Image datasets

load_sample_image: load a single sample image

load_sample_images: load all sample images

load_digits: handwritten digits dataset
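load_digits is bundled with scikit-learn, so it is the easiest image dataset to try offline. The sketch below shows how the 8x8 images relate to the flattened feature matrix; the sample-image loaders are left out because they may need an extra image library to decode their JPEG files.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
# .images keeps each sample as an 8x8 grid; .data holds the same pixels
# flattened into 64 features, ready for an estimator.
print(digits.images.shape)  # (1797, 8, 8)
print(digits.data.shape)    # (1797, 64)
print(np.array_equal(digits.images[0].ravel(), digits.data[0]))  # True

# n_class limits the dataset to the digits 0 .. n_class - 1.
X5, y5 = load_digits(n_class=5, return_X_y=True)
print(sorted(set(y5.tolist())))  # [0, 1, 2, 3, 4]
```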

4. Medical datasets

load_breast_cancer: breast cancer dataset

load_diabetes: diabetes dataset

load_linnerud: Linnerud physical exercise dataset
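A short sketch that loads the three medical datasets and prints their shapes; it assumes only a standard scikit-learn install. The three cover different tasks: breast cancer is binary classification, diabetes is regression, and Linnerud is multi-output regression.

```python
from sklearn.datasets import load_breast_cancer, load_diabetes, load_linnerud

# Binary classification: tumour measurements labelled malignant or benign.
cancer = load_breast_cancer()
print(cancer.data.shape, cancer.target_names.tolist())  # (569, 30) ['malignant', 'benign']

# Regression: the target is a continuous measure of disease progression.
X_diab, y_diab = load_diabetes(return_X_y=True)
print(X_diab.shape, y_diab.shape)  # (442, 10) (442,)

# Multi-output regression: 3 exercise features predict 3 physiological targets.
linnerud = load_linnerud()
print(linnerud.data.shape, linnerud.target.shape)  # (20, 3) (20, 3)
print(linnerud.feature_names, linnerud.target_names)
```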

5. Other datasets

load_wine: wine dataset

load_iris: iris dataset

load_boston: Boston housing dataset (deprecated in scikit-learn 1.0 and removed in 1.2)

fetch_california_housing: California housing dataset

fetch_kddcup99: KDD Cup 99 intrusion detection dataset

fetch_species_distributions: species distribution dataset

fetch_covtype: forest covertype dataset

fetch_mldata: datasets downloaded online from mldata.org (removed in scikit-learn 0.22)
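The sketch below contrasts bundled and downloadable datasets: load_wine and load_iris read files shipped with scikit-learn, while fetch_california_housing is called with download_if_missing=False against an empty temporary cache, so it fails fast instead of downloading. The temporary directory is an assumption made to keep the example offline; in normal use data_home is omitted and the function downloads and caches the data on first call.

```python
import tempfile

import numpy as np
from sklearn.datasets import fetch_california_housing, load_iris, load_wine

# Bundled datasets load instantly from the scikit-learn installation.
wine = load_wine()
iris = load_iris()
print(wine.data.shape, np.bincount(wine.target))  # (178, 13) [59 71 48]
print(iris.data.shape, np.bincount(iris.target))  # (150, 4) [50 50 50]

# fetch_* datasets are downloaded once and cached under data_home. With
# download_if_missing=False and an empty cache the call raises instead of
# reaching the network, which makes the caching behaviour visible offline.
with tempfile.TemporaryDirectory() as empty_cache:
    try:
        fetch_california_housing(data_home=empty_cache, download_if_missing=False)
        cached = True
    except OSError:
        cached = False
print(cached)  # False
```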

