Linear algebra in deep learning



This series of articles is a set of reading notes on the book Deep Learning. The book is an excellent reference for learning deep learning, but it is fairly difficult, so these notes work best when read alongside the original. If you are not reading the book, the notes assume you have a general college-level background in mathematics.


Easily confused basic concepts

  • Scalar: a single number
  • Vector: an ordered array of numbers, written as a row or a column
  • Matrix: a 2D array of numbers
  • Tensor: an array with any number of dimensions (a 0-dimensional tensor is a scalar, a 1-dimensional tensor is a vector, a 2-dimensional tensor is a matrix)
  • Transpose: mirror the matrix across its main diagonal, so that rows become columns

Defining a matrix in NumPy and transposing it:

import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
# .T swaps rows and columns; reshape(3, 2) would only re-chunk the elements
a = a.T
print(a)

[[1 4]
 [2 5]
 [3 6]]
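As a small illustration of the scalar / vector / matrix / tensor hierarchy listed above, the sketch below checks the number of dimensions of a few NumPy arrays. The variable names are made up for this example.

```python
import numpy as np

scalar = np.array(3.0)                 # 0-dimensional tensor
vector = np.array([1.0, 2.0, 3.0])     # 1-dimensional tensor
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])         # 2-dimensional tensor
tensor = np.zeros((2, 3, 4))           # 3-dimensional tensor

dims = [t.ndim for t in (scalar, vector, matrix, tensor)]
print(dims)  # [0, 1, 2, 3]

# Transposing mirrors the matrix across its main diagonal:
# element (i, j) of the original becomes element (j, i).
transposed = matrix.T
print(transposed.shape)                      # (3, 2)
print(transposed[2, 0] == matrix[0, 2])      # True
```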

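Basic arithmetic operations

Matrix multiplication follows the same rules as in college mathematics. Note that in NumPy the star (*) multiplies element by element, while dot performs matrix multiplication: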

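a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

print(a * b)             # element-wise product
print(a.dot(b))          # matrix product, method form
print(np.dot(a, b))      # matrix product, function form
print(np.linalg.inv(a))  # inverse of a

# Star (*): element-wise product
[[ 5 12]
 [21 32]]

# Dot product: a.dot(b)
[[19 22]
 [43 50]]

# Dot product: np.dot(a, b)
[[19 22]
 [43 50]]

# Inverse operation
[[-2.   1. ]
 [ 1.5 -0.5]]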
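A quick way to sanity-check an inverse like the one above is to multiply it back with the original matrix and compare against the identity. This is only a sketch; the names a_inv and identity_check are chosen for this example.

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
a_inv = np.linalg.inv(a)

# A matrix times its inverse gives the identity, up to rounding error,
# so compare with a tolerance instead of exact equality.
identity_check = np.allclose(a @ a_inv, np.eye(2))
print(identity_check)  # True

# The @ operator performs matrix multiplication, like np.dot for 2-D arrays.
same_as_dot = np.allclose(a @ a_inv, np.dot(a, a_inv))
print(same_as_dot)     # True
```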


A norm is a function that measures the size (length) of a vector or a matrix. Mathematically, norms include vector norms and matrix norms.

Vector norm

Let's first discuss the norm of a vector. A vector has a direction and a size, and the size is measured by a norm. The $L^p$ norm is defined as:

$$\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}, \quad p \ge 1$$

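Strictly speaking, a norm is any function $f$ that satisfies the following properties:

  • $f(x) = 0 \Rightarrow x = 0$
  • $f(x + y) \le f(x) + f(y)$ (the triangle inequality)
  • $f(\alpha x) = |\alpha| f(x)$ for every scalar $\alpha$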
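The defining properties of a norm (only the zero vector has size zero, the triangle inequality, and scaling by the absolute value of a scalar) can be spot-checked numerically. The sketch below does this for the Euclidean norm on two random vectors. It is a check on one example, not a proof, and the helper name norm is chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)
y = rng.normal(size=5)
alpha = -2.5

def norm(v):
    """Euclidean (L2) norm of a 1-D array."""
    return np.linalg.norm(v)

# The zero vector has norm 0.
zero_maps_to_zero = bool(norm(np.zeros(5)) == 0.0)

# Triangle inequality, with a tiny slack for floating-point rounding.
triangle_holds = bool(norm(x + y) <= norm(x) + norm(y) + 1e-12)

# Scaling a vector by alpha scales its norm by |alpha|.
homogeneous = bool(np.isclose(norm(alpha * x), abs(alpha) * norm(x)))

print(zero_maps_to_zero, triangle_holds, homogeneous)  # True True True
```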

  • When p = 2, the $L^2$ norm ($\|x\|_2$, often simplified to $\|x\|$) is called the Euclidean norm, and it gives the Euclidean distance from the origin to the point. It contains a square root, so to get rid of it we can use the squared $L^2$ norm, $\|x\|_2^2 = x^\top x$, which saves one square-root operation. In the loss functions discussed later, the norm and the squared norm lead to the same optimization goal, so the squared norm is more commonly used: it is simpler and faster to compute.
  • When p = 1, the $L^1$ norm ($\|x\|_1 = \sum_i |x_i|$) is the sum of the absolute values of the elements of the vector. In machine learning, the $L^1$ norm is better than the $L^2$ norm at distinguishing elements that are exactly zero from elements that are small but non-zero.
  • When p = 0, the $L^0$ "norm" is not actually a norm, and most places that mention it stress that it is not a norm in the true sense. It counts how many non-zero elements the vector has. It is still very useful in practice and is applied in regularization and sparse coding in machine learning. One example: to judge whether a user name and password are correct, compare them with the stored values. If the $L^0$ "norm" of the mismatch is 0, the login succeeds; if it is 1, either the user name or the password is wrong; if it is 2, both are wrong. It is enough to know that this exists, so that related material is recognizable later.
  • When p is infinite, the $L^\infty$ norm is also called the infinity norm or the max norm. It is the largest absolute value among the elements of the vector: $\|x\|_\infty = \max_i |x_i|$.
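The four cases above can be tried out with np.linalg.norm, whose ord argument selects the norm. The vector x below is an arbitrary example chosen so that the results are easy to verify by hand.

```python
import numpy as np

x = np.array([3.0, 0.0, -4.0])

l2 = np.linalg.norm(x, ord=2)          # sqrt(9 + 0 + 16) = 5.0
l2_squared = x @ x                     # 25.0, no square root needed
l1 = np.linalg.norm(x, ord=1)          # 3 + 0 + 4 = 7.0
l0 = np.linalg.norm(x, ord=0)          # number of non-zero entries = 2.0
linf = np.linalg.norm(x, ord=np.inf)   # largest absolute value = 4.0

print(l2, l2_squared, l1, l0, linf)
```

For a 1-D array, ord=0 is documented as the count of non-zero entries, which matches the remark that the $L^0$ "norm" is not a true norm: scaling x by 2 leaves it unchanged.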

Matrix norm

For matrix norms, we only cover the Frobenius norm, which is simply the square root of the sum of the squares of all elements in the matrix. It has other equivalent definitions, shown below, where $A^*$ is the conjugate transpose of $A$, $\operatorname{tr}$ is the trace, and $\sigma_i$ are the singular values of $A$:

$$\|A\|_F = \sqrt{\sum_{i,j} |a_{ij}|^2} = \sqrt{\operatorname{tr}(A^* A)} = \sqrt{\sum_i \sigma_i^2}$$
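The equivalent definitions of the Frobenius norm (element-wise, via the trace, and via the singular values) can be compared numerically. This is a sketch on one small example matrix; the variable names are chosen for this example.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Square root of the sum of squared elements: sqrt(1 + 4 + 9 + 16) = sqrt(30).
by_elements = np.sqrt(np.sum(np.abs(A) ** 2))

# Square root of the trace of (conjugate transpose of A) times A.
by_trace = np.sqrt(np.trace(A.conj().T @ A))

# Square root of the sum of squared singular values.
singular_values = np.linalg.svd(A, compute_uv=False)
by_singular_values = np.sqrt(np.sum(singular_values ** 2))

# NumPy's built-in Frobenius norm for comparison.
builtin = np.linalg.norm(A, ord="fro")

print(by_elements, by_trace, by_singular_values, builtin)
```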

Singular value decomposition

We are familiar with eigendecomposition: $A = V \operatorname{diag}(\lambda) V^{-1}$. Singular value decomposition is similar: $A = U D V^\top$, where $A$ is an $m \times n$ matrix, $U$ is an $m \times m$ orthogonal matrix, $D$ is an $m \times n$ diagonal matrix, and $V$ is an $n \times n$ orthogonal matrix. The elements on the diagonal of $D$ are called singular values, and the non-zero singular values are the square roots of the eigenvalues of $A^\top A$ or $A A^\top$. The left singular vectors (the columns of $U$) are the eigenvectors of $A A^\top$, and the right singular vectors (the columns of $V$) are the eigenvectors of $A^\top A$.

A singular matrix cannot be inverted, yet inversion is a very useful tool for studying matrices, so we fall back on the next best thing: the pseudo-inverse, which is the closest substitute for a matrix inverse. The idea is to turn the matrix into the most convenient form for studying its other properties. Singular value decomposition reduces the matrix to a diagonal form with $r$ non-zero elements on the main diagonal, where $r$ is the rank, and zeros everywhere else. The pseudo-inverse is built from that form as $A^+ = V D^+ U^\top$, where $D^+$ takes the reciprocal of each non-zero diagonal element of $D$ and transposes the result. This technique is also common in statistical learning and is very handy in machine learning.
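The sketch below walks through singular value decomposition and a pseudo-inverse built from it, using np.linalg.svd on a small rank-deficient matrix. The matrix A, the tolerance tol, and the variable names are assumptions made for this example. NumPy also provides np.linalg.pinv, which should be preferred in practice.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])   # 3 x 2 with rank 1, so it has no ordinary inverse

U, s, Vt = np.linalg.svd(A)            # U: 3x3, s: singular values, Vt: 2x2
D = np.zeros(A.shape)
D[:len(s), :len(s)] = np.diag(s)       # place the singular values in a 3x2 D
reconstructed = U @ D @ Vt             # should reproduce A

# Squared singular values match the eigenvalues of A^T A.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]   # reorder to descending
squares_match = np.allclose(s ** 2, eigvals)

# Build D^+: reciprocal of each non-zero singular value, in the transposed shape.
tol = 1e-10                            # assumed cutoff for "non-zero"
s_inv = np.array([1.0 / v if v > tol else 0.0 for v in s])
D_plus = np.zeros(A.shape[::-1])
D_plus[:len(s), :len(s)] = np.diag(s_inv)
A_pinv = Vt.T @ D_plus @ U.T           # 2 x 3 pseudo-inverse

# Two of the defining (Penrose) conditions of a pseudo-inverse.
penrose_1 = np.allclose(A @ A_pinv @ A, A)
penrose_2 = np.allclose(A_pinv @ A @ A_pinv, A_pinv)
print(squares_match, penrose_1, penrose_2)
```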


  • Diagonal matrix: only the main diagonal contains non-zero elements;
  • Unit vector: a vector with unit norm, $\|x\|_2 = 1$;
  • Orthogonal vectors: $x^\top y = 0$; if both vectors are non-zero, the angle between them is 90 degrees;
  • Orthonormal: mutually orthogonal, each with norm 1;
  • Orthogonal matrix: the row vectors are orthonormal and the column vectors are orthonormal, so $A^\top A = A A^\top = I$;
  • Eigendecomposition: the matrix is decomposed into eigenvectors and eigenvalues;
  • Eigenvalues and eigenvectors: the $\lambda$ and $v$ in $A v = \lambda v$;
  • Positive definite, positive semidefinite and negative definite: all eigenvalues are positive, non-negative and negative, respectively.
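Several of the definitions above can be seen together in one small example: the eigendecomposition of a real symmetric matrix. This is only a sketch. The matrix A is chosen for this example, and np.linalg.eigh is used because it is intended for symmetric (Hermitian) matrices.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])   # real symmetric, eigenvalues 1 and 3

eigenvalues, Q = np.linalg.eigh(A)   # columns of Q are orthonormal eigenvectors

# Each eigenvalue / eigenvector pair satisfies A v = lambda v.
pairs_ok = all(
    np.allclose(A @ Q[:, i], eigenvalues[i] * Q[:, i]) for i in range(2)
)

# Q is an orthogonal matrix: Q^T Q = Q Q^T = I.
orthogonal = bool(
    np.allclose(Q.T @ Q, np.eye(2)) and np.allclose(Q @ Q.T, np.eye(2))
)

# Eigendecomposition of a symmetric matrix: A = Q diag(lambda) Q^T.
rebuilt = Q @ np.diag(eigenvalues) @ Q.T

# All eigenvalues positive means A is positive definite.
positive_definite = bool(np.all(eigenvalues > 0))
print(eigenvalues, pairs_ok, orthogonal, positive_definite)
```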


One of the defining features of linear algebra is that everything hangs together "on one long string": it is a unified body of knowledge whose parts are closely linked, which makes it very elegant. It has important applications in deep learning, so it is worth learning well.

If you need a refresher, it is strongly recommended to work through the course once; you can check here. Happy learning!

  • This article was first published on the official account RAIS; you are welcome to follow it.