What is the “dimension” in machine learning?


In machine learning, “dimension” is a high-frequency word that comes up all the time. For example, random forests sample features at random so that they avoid computing over every dimension; sklearn requires the feature matrix you pass in to be at least two-dimensional; and one purpose of feature selection is to lower an algorithm’s computational cost by reducing dimensions… I used these phrases routinely until one day a friend asked me, “What exactly is a dimension?” and I…

After thinking it over carefully, I have summed it up as follows:

1. For arrays and Series

For arrays and Series, the dimension is given by the shape attribute: the number of values that shape returns is the number of dimensions. Data that has only one axis apart from its index is one-dimensional (shape returns a single number, the count of elements along that axis). Two-dimensional data (shape returns rows × columns) is also called a table. A single table is at most two-dimensional; stacking several tables produces higher dimensions. When an array holds two tables of three rows and four columns, shape returns (2, 3, 4), i.e. (higher dimension, rows, columns). When an array holds two groups of two such tables, the data is four-dimensional and shape returns (2, 2, 3, 4).
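A quick way to check this is to build data of each shape with NumPy and pandas and print shape alongside ndim (the number of dimensions, which always equals len(shape)). The values are arbitrary and only there for illustration.

```python
import numpy as np
import pandas as pd

# A Series: apart from its index, it has a single axis
s = pd.Series([10, 20, 30])
print(s.shape, s.ndim)                    # (3,) 1

# One table: 3 rows x 4 columns
table = np.arange(12).reshape(3, 4)
print(table.shape, table.ndim)            # (3, 4) 2

# Two such tables stacked: (higher dimension, rows, columns)
two_tables = np.arange(24).reshape(2, 3, 4)
print(two_tables.shape, two_tables.ndim)  # (2, 3, 4) 3

# Two groups of two tables each: four-dimensional
two_groups = np.arange(48).reshape(2, 2, 3, 4)
print(two_groups.shape, two_groups.ndim)  # (2, 2, 3, 4) 4
```

Each extra level of grouping adds one more number to the front of shape.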

Each table in an array can be a feature matrix or a DataFrame. These structures always contain exactly one table, so they always have rows and columns: rows are samples and columns are features. For such a table, “dimension” could in principle mean either the number of samples or the number of features; unless stated otherwise, it means the number of features. Not counting the index, one feature makes the data one-dimensional, two features make it two-dimensional, and n features make it n-dimensional.
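The two senses of “dimension” can be seen side by side in a small DataFrame. The column names and values below are hypothetical. As a container, the table always has ndim equal to 2, while the dimension of the data in the machine-learning sense is the number of columns, shape[1]. Note that shape does not count the index.

```python
import pandas as pd

# Hypothetical feature matrix: 5 samples (rows) x 3 features (columns)
df = pd.DataFrame(
    {
        "height": [170, 165, 180, 175, 160],
        "weight": [65, 55, 80, 70, 50],
        "age": [30, 25, 40, 35, 22],
    }
)

n_samples, n_features = df.shape  # the index is not counted in shape
print(df.ndim)      # 2: as a container, a single table is always 2-D
print(n_samples)    # 5 samples
print(n_features)   # 3 features, so this is "three-dimensional data" in the ML sense
```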

2. For graphs (coordinate spaces)

For graphs, the dimension is the number of feature vectors that span the space, where a feature vector can be understood as a coordinate axis. One feature vector defines a straight line, which is one-dimensional. Two mutually perpendicular feature vectors define a plane, i.e. a two-dimensional rectangular coordinate system. Three mutually perpendicular feature vectors define a space, i.e. a three-dimensional rectangular coordinate system. More than three mutually perpendicular feature vectors define a high-dimensional space that the human eye can neither see nor imagine.
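To make the axis picture concrete, here is a minimal NumPy sketch. The three vectors are an arbitrary choice of mutually perpendicular vectors, and np.linalg.matrix_rank counts how many independent directions a set of vectors spans, which is the dimension of the line, plane or space they define.

```python
import numpy as np

# Three mutually perpendicular vectors in 3-D space (chosen for illustration)
vectors = np.array([
    [1.0, 1.0, 0.0],
    [1.0, -1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Perpendicular means every pairwise dot product is 0 (off-diagonal of the Gram matrix)
gram = vectors @ vectors.T
off_diagonal = gram - np.diag(np.diag(gram))
print(np.allclose(off_diagonal, 0.0))  # True

# The first k vectors span a k-dimensional space: a line, then a plane, then 3-D space
for k in range(1, 4):
    print(k, np.linalg.matrix_rank(vectors[:k]))  # prints 1 1, then 2 2, then 3 3

# Nothing stops us from going past three: five perpendicular axes span a 5-D space,
# which NumPy handles fine even though we cannot picture it
print(np.linalg.matrix_rank(np.eye(5)))  # 5
```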

3. Dimensions in dimensionality reduction algorithms

In dimensionality reduction algorithms, “reducing dimensions” means reducing the number of features in the feature matrix. As we said in last week’s lecture, one purpose of dimensionality reduction is to make algorithms run faster and perform better. But there is another need: data visualization. From the graph above, we can see that the dimensions of a graph and of a feature matrix correspond to each other: one feature corresponds to one feature vector, that is, one coordinate axis. Therefore a feature matrix with three or fewer features can be visualized, which helps us quickly understand how the data are distributed, while a feature matrix with more than three features cannot be visualized, and the nature of its data is hard to grasp.
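As a sketch of dimensionality reduction for visualization, the snippet below uses principal component analysis (PCA), one common technique among several, to project hypothetical 5-feature data onto 2 axes that can then be plotted. It is written in plain NumPy (an SVD of the mean-centered data); in practice sklearn.decomposition.PCA(n_components=2).fit_transform(X) should give the same coordinates up to the sign of each component. The function name pca_2d is made up for this example.

```python
import numpy as np

def pca_2d(X):
    """Project X (n_samples x n_features) onto its first two principal components."""
    X_centered = X - X.mean(axis=0)  # PCA works on mean-centered data
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T     # coordinates along the top 2 directions

# Hypothetical data: 100 samples with 5 features, too many axes to plot directly
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

X_2d = pca_2d(X)
print(X.shape, "->", X_2d.shape)  # (100, 5) -> (100, 2)
```

The two columns of X_2d can then be drawn as an ordinary 2-D scatter plot (for example with matplotlib), which is exactly what the 5-feature matrix could not offer.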

Well, that’s my summary of dimensions. If you have new ideas, you are welcome to discuss them with me~