NumPy: the basics of n-dimensional arrays, all in one article

Time:2021-8-15

Abstract: NumPy is one of the most important extension libraries for Python and an essential tool for getting started with machine learning programming. A programmer abroad has illustrated the basic operations of NumPy graphically, making them easy and fun to learn.

NumPy is one of the most important extension libraries for Python and an essential tool for getting started with machine learning programming. For beginners, however, NumPy's large number of operations is hard to memorize.

Recently, a programmer abroad illustrated the basic operations of NumPy graphically, making them easy and fun to learn. The post collected more than 500 upvotes in the machine learning community on Reddit in less than half a day.

Now let’s follow his tutorial and learn together!

The tutorial is divided into three parts: vectors (one-dimensional arrays), matrices (two-dimensional arrays), and arrays of three or more dimensions.

NumPy arrays and Python lists

Before getting to the main content, let's look at the difference between NumPy arrays and Python lists.

At first glance, NumPy arrays look like Python lists. Both serve as containers that support getting and setting elements as well as inserting and removing them.

The two have a lot in common, and most basic operations look the same on both.

Compared with Python lists, NumPy arrays have the following characteristics: they are more compact, especially when there is more than one dimension; vectorized operations on them are faster than the equivalent operations on lists; but appending elements at the end is slower.

Appending an element at the end has amortized complexity O(1) for a Python list and O(n) for a NumPy array.
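The difference in appending can be sketched as follows. This is a minimal illustration with made-up variable names: np.append does not grow the array in place but allocates a new array and copies the data, which is where the O(n) cost comes from.

```python
import numpy as np

py_list = [1, 2, 3]
py_list.append(4)            # grows in place, amortized O(1)

arr = np.array([1, 2, 3])
grown = np.append(arr, 4)    # allocates a new array and copies, O(n)

print(py_list)               # the list itself was extended
print(arr)                   # the original array is unchanged
print(grown)                 # the new, longer array
```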

Vectors (one-dimensional arrays)

Vector initialization

One way to create a NumPy array is to convert a Python list directly. The element type of the array is inferred from the list elements.

NumPy arrays cannot grow the way Python lists can, because no spare space is reserved at the end of the array.

Therefore, it is common either to build a Python list, operate on it, and then convert it to a NumPy array, or to preallocate the necessary space with np.zeros or np.empty.

Sometimes we need to create an empty array with the same shape and element type as an existing array.

In fact, every function that creates an array filled with a constant value has a _like counterpart (zeros_like, ones_like, empty_like, full_like) that creates an array of the same shape and type as an existing one.
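A short sketch of the initialization options discussed above. The array contents and variable names are only illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])        # dtype is inferred from the list (integers)

z = np.zeros(3, dtype=int)     # preallocated and filled with zeros
e = np.empty(3)                # preallocated, contents are arbitrary

for k in range(3):             # fill the preallocated array afterwards
    e[k] = k * 0.5

z_like = np.zeros_like(a)      # same shape and dtype as a, filled with zeros
f_like = np.full_like(a, 7)    # same shape and dtype as a, filled with 7
```

np.empty skips the initialization step, so it only makes sense when every element is going to be overwritten anyway.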

In NumPy, you can use arange or linspace to initialize an array with a monotonic sequence.

If you need a floating-point array such as [0., 1., 2.], you can change the type of the arange output: arange(3).astype(float).

But there is a better way: arange is sensitive to the type of its arguments. If you pass integers, it generates an integer array; if you pass a floating-point number, as in arange(3.), it generates a floating-point array.

However, arange is not particularly good at handling floating-point numbers.

This is because 0.1 is a finite decimal fraction for us, but not for a computer. In binary, 0.1 is an infinite fraction and has to be truncated somewhere. That is why passing a fractional step to arange is usually a bad idea: we may run into a bug where the array does not contain the number of elements we expected, which hurts the readability and maintainability of the code.

This is where linspace comes in handy. It is not affected by rounding errors and always generates the number of elements you ask for.
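The following sketch contrasts the two functions. The endpoints 0.5 and 0.8 are chosen only for illustration; the point is that the length of the arange result depends on floating-point rounding, while linspace is told the number of points directly.

```python
import numpy as np

ints = np.arange(3)          # integer arguments give an integer array
floats = np.arange(3.)       # a float argument gives a float array

# With a fractional step, rounding in (stop - start) / step decides
# whether the endpoint sneaks in, so the length is hard to predict.
risky = np.arange(0.5, 0.8, 0.1)

# linspace takes the number of points instead of the step.
safe = np.linspace(0.5, 0.8, 4)

print(len(risky), len(safe))
```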

For testing purposes, it is often necessary to generate random arrays. NumPy provides random integers as well as uniformly and normally distributed random numbers, among other distributions.
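A minimal sketch of generating random vectors. It uses the Generator interface (np.random.default_rng), which is an assumption about style rather than what the original figure showed; the legacy functions such as np.random.randint and np.random.uniform offer similar functionality.

```python
import numpy as np

rng = np.random.default_rng(seed=0)             # seeded for reproducibility

ints = rng.integers(0, 10, size=3)              # integers in [0, 10)
unif = rng.uniform(1.0, 10.0, size=3)           # uniform on [1, 10)
norm = rng.normal(loc=5.0, scale=2.0, size=3)   # mean 5, standard deviation 2
```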

Vector indexing

Once the data is stored in an array, NumPy provides simple ways to get it back out.

Indexing comes in many forms, such as taking a specific range, indexing from right to left, taking only the elements at odd positions, and so on.

But these are all so-called views: they do not store the data themselves, and if the original array is changed after being indexed, the view reflects the change.

These indexing methods also allow you to assign to and modify the contents of the original array, so special care is needed: only an explicit copy of the array (for example, a.copy()) is independent of the original. With the other methods you may corrupt the original data.
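The view behaviour can be checked with a small sketch (the values are arbitrary):

```python
import numpy as np

a = np.arange(1, 6)        # values 1, 2, 3, 4, 5

view = a[1:4]              # a slice is a view onto the same memory
view[0] = 99               # writes through to a

safe = a[1:4].copy()       # an explicit copy owns its data
safe[0] = -1               # a is not affected

print(a)                   # values are now 1, 99, 3, 4, 5
```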

Another very useful way to get data out of a NumPy array is Boolean indexing, which lets you combine logical operators to select the elements that satisfy a condition.

Note: Python's chained comparison 3 <= a <= 5 does not work on NumPy arrays.

As above, Boolean indexing can also be used to modify an array. Two common cases of this have dedicated functions, np.where and np.clip.
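A brief sketch of Boolean indexing and of the two helper functions, using an arbitrary sample array:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 6, 5, 4, 3, 2, 1])

# Combine conditions with & and parentheses instead of 3 <= a <= 5
picked = a[(a >= 3) & (a <= 5)]      # 3, 4, 5, 5, 4, 3

b = a.copy()
b[b > 5] = 0                         # assignment through a Boolean mask

same = np.where(a > 5, 0, a)         # the same result without mutating a
clipped = np.clip(a, 2, 5)           # limit every value to the range [2, 5]
```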

Vector operations

Arithmetic is one of the places where NumPy's speed is most noticeable. NumPy's vector operators run at C++ speed and avoid slow Python loops.

NumPy allows an entire array to be manipulated like an ordinary number (addition, subtraction, multiplication, division, exponentiation).

△ As in Python, a // b is integer division of a by b, and x ** n is xⁿ.

Vectors can be combined with scalars in the same way.

Most mathematical functions have NumPy counterparts that operate on vectors.

Dot products and cross products of vectors are available as well.
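The operations listed above can be sketched with two small example vectors (the values are arbitrary):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

total = a + b            # element-wise: 5, 7, 9
quot = b // a            # integer division: 4, 2, 2
power = a ** 2           # power: 1, 4, 9
scaled = a * 10          # the scalar is applied to every element

roots = np.sqrt(power)   # math functions act on whole vectors: 1., 2., 3.

dot = np.dot(a, b)       # 1*4 + 2*5 + 3*6 = 32, can also be written a @ b
cross = np.cross(a, b)   # -3, 6, -3
```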

We can also compute trigonometric functions, inverse trigonometric functions, and the hypotenuse.

Arrays can be rounded to integers.

△ floor rounds down, ceil rounds up, and round rounds halves to the nearest even number.
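The three rounding modes behave as follows on a few sample values, chosen here to include exact halves:

```python
import numpy as np

x = np.array([0.5, 1.5, 2.5, -1.1])

down = np.floor(x)     # 0., 1., 2., -2.
up = np.ceil(x)        # 1., 2., 3., -1.
even = np.round(x)     # 0., 2., 2., -1.  (halves go to the nearest even value)
```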

NumPy can also perform the basic statistical operations (maximum and minimum, mean, variance, standard deviation).

However, the sorting functions have fewer features than their Python list counterparts.

Searching for an element in a vector

Unlike Python lists, NumPy arrays do not have an index method.

  • One way to find an element is np.where(a == x)[0][0], which is neither elegant nor fast, because it has to examine every element of the array even if the item being sought is at the very beginning.
  • A faster way is to use next((i[0] for i, v in np.ndenumerate(a) if v == x), -1), accelerated with Numba.
  • Once the array is sorted, things get better: v = np.searchsorted(a, x); return v if a[v] == x else -1 has complexity O(log n). It is very fast indeed, but it requires O(n log n) time to sort the array first.
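The first and third approaches can be wrapped into small helpers. The function names find_first and find_sorted are hypothetical, and the bounds check in find_sorted is an addition of this sketch: searchsorted may return len(a) when x is larger than every element, in which case a[v] would be out of range.

```python
import numpy as np

def find_first(a, x):
    """Linear search: index of the first x in a, or -1. O(n)."""
    hits = np.where(a == x)[0]
    return int(hits[0]) if hits.size else -1

def find_sorted(a, x):
    """Binary search in a sorted array: index of x, or -1. O(log n)."""
    v = np.searchsorted(a, x)
    return int(v) if v < len(a) and a[v] == x else -1

a = np.array([7, 3, 9, 3, 1])
s = np.sort(a)                  # 1, 3, 3, 7, 9

print(find_first(a, 3))         # 1
print(find_sorted(s, 7))        # 3
print(find_sorted(s, 10))       # -1
```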

Comparing floating-point numbers

The function np.allclose(a, b) compares floating-point arrays with a given tolerance.

  • np.allclose assumes that all the compared numbers are on a scale of 1. For example, it considers 1e-9 and 2e-9 to be equal. For a finer comparison, specify the scale explicitly through atol: np.allclose(1e-9, 2e-9, atol=1e-17) == False.
  • math.isclose makes no assumptions about the numbers being compared and relies on the user to supply a reasonable abs_tol value: math.isclose(0.1 + 0.2 - 0.3, 0, abs_tol=1e-8) == True.

In addition, np.allclose has some minor issues in its formula for absolute and relative tolerances. For example, for some numbers allclose(a, b) != allclose(b, a). These issues are resolved in the math.isclose function.
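These comparisons can be tried directly. The asymmetric pair at the end uses an exaggerated rtol of 0.105, chosen only to make the effect visible; np.allclose tests |a - b| <= atol + rtol * |b|, so the second argument sets the scale.

```python
import math
import numpy as np

default_tol = np.allclose(1e-9, 2e-9)               # True: default atol=1e-8 dominates
tight_tol = np.allclose(1e-9, 2e-9, atol=1e-17)     # False
near_zero = math.isclose(0.1 + 0.2 - 0.3, 0, abs_tol=1e-8)   # True

# Swapping the arguments changes the answer
ab = np.allclose(10, 9, rtol=0.105, atol=0)         # False: 1 > 0.105 * 9
ba = np.allclose(9, 10, rtol=0.105, atol=0)         # True:  1 <= 0.105 * 10

# math.isclose is symmetric in its arguments
sym = math.isclose(10, 9, rel_tol=0.105) == math.isclose(9, 10, rel_tol=0.105)
```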

Matrices (two-dimensional arrays)

NumPy used to have a dedicated matrix class, but it is now deprecated. The words "matrix" and "2D array" are therefore used interchangeably below.

The syntax for initializing a matrix is similar to that for a vector.

Double parentheses are required in calls such as np.zeros((3, 2)): the shape is passed as a tuple because the second positional parameter is reserved for dtype.
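A few ways of initializing a matrix, as a sketch with arbitrary shapes and values:

```python
import numpy as np

z = np.zeros((3, 2))              # shape given as a tuple: 3 rows, 2 columns
f = np.full((3, 2), 7)            # constant fill
ident = np.eye(3)                 # identity matrix
m = np.array([[1, 2, 3],
              [4, 5, 6]])         # from nested lists

zi = np.zeros((3, 2), int)        # the second positional argument is dtype
```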

Random matrices are generated in much the same way as random vectors.

Two-dimensional indexing is more convenient than with nested lists.

As with one-dimensional arrays, a slice of a matrix is a view: the data is not actually copied, and when the array is modified, the change is reflected in the slice as well.

Axis parameter

In many operations (such as summation), we need to tell NumPy whether to operate across rows or across columns. To have a universal notation that works for any number of dimensions, NumPy introduces the concept of an axis: the value of the axis argument is simply the number of the index in question. The first index is axis=0, the second index is axis=1, and so on.

Therefore, in a two-dimensional array, axis=0 means column-wise and axis=1 means row-wise.
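A small sketch of the axis argument with sum, on an arbitrary 2 by 3 matrix:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

total = a.sum()            # 21, over all elements
cols = a.sum(axis=0)       # 5, 7, 9: the first index is collapsed, one sum per column
rows = a.sum(axis=1)       # 6, 15: the second index is collapsed, one sum per row
```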

Matrix arithmetic

In addition to the ordinary operators (+, -, *, /, // and **), which work element-wise, there is an @ operator that computes the matrix product.

In the first part we saw element-wise operations on vectors. NumPy also allows mixed element-wise operations between a vector and a matrix, and even between two vectors.
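The difference between the element-wise product and the matrix product, and the mixed vector and matrix operations, can be sketched as follows (all values are arbitrary):

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[2, 0],
              [0, 2]])

elementwise = a * b              # [[2, 0], [0, 8]]
product = a @ b                  # [[2, 4], [6, 8]]

row = np.array([10, 20])         # shape (2,), treated as a row
col = np.array([[10], [20]])     # shape (2, 1), a column

plus_row = a + row               # row added to every row:       [[11, 22], [13, 24]]
plus_col = a + col               # column added to every column: [[11, 12], [23, 24]]
table = row * col                # two vectors give a full table: [[100, 200], [200, 400]]
```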

Row vector and column vector

As can be seen from the above example, row vectors and column vectors are treated differently in a two-dimensional array.

By default, one-dimensional arrays are treated as row vectors in two-dimensional operations. Therefore, when multiplying a matrix by a row vector, you can use either shape (n,) or shape (1, n), and the result will be the same.

If a column vector is required, note that the transpose of a one-dimensional array is the same one-dimensional array, so transposing alone does not produce a column.

The two operations that do turn a one-dimensional array into a two-dimensional column vector are reshape, which rearranges the array, and newaxis, which creates a new index.

The -1 argument tells reshape to calculate the size of that dimension automatically. None in square brackets is a shortcut for np.newaxis, which adds an empty axis at the specified position.
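A minimal sketch of producing column and row vectors from a one-dimensional array:

```python
import numpy as np

a = np.array([1, 2, 3])          # shape (3,)

same = a.T                       # still shape (3,): transposing a 1-D array changes nothing

col1 = a.reshape(-1, 1)          # shape (3, 1); -1 lets NumPy infer the length
col2 = a[:, np.newaxis]          # shape (3, 1)
col3 = a[:, None]                # None is a shorthand for np.newaxis
row = a[None, :]                 # shape (1, 3), an explicit 2-D row vector
```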

Therefore, there are three types of vectors in NumPy: one-dimensional arrays, two-dimensional row vectors, and two-dimensional column vectors. Any of them can be converted explicitly into any other.

By the rule above, a one-dimensional array is implicitly interpreted as a two-dimensional row vector, so it is usually not necessary to convert between those two.

Matrix manipulation

There are two main functions for joining matrices, hstack and vstack.

Both work fine when stacking only matrices or only vectors. But when it comes to mixed stacking of one-dimensional arrays and matrices, only vstack works as expected, while hstack raises a dimension mismatch error.

That is because, as mentioned above, a one-dimensional array is interpreted as a row vector rather than a column vector. The workaround is either to convert it to a column vector or to use column_stack, which does the conversion automatically.
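The mixed stacking case can be reproduced with a small matrix and a one-dimensional array (the values are arbitrary):

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])
v = np.array([5, 6])                  # 1-D array, shape (2,)

below = np.vstack((m, v))             # v is used as a row: shape (3, 2)

try:
    np.hstack((m, v))                 # a (2, 2) and a (2,) array do not line up
    hstack_failed = False
except ValueError as err:
    hstack_failed = True
    print("hstack failed:", err)

right1 = np.hstack((m, v[:, None]))   # convert to a column first: shape (2, 3)
right2 = np.column_stack((m, v))      # or let column_stack do the conversion
```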

The reverse operation of stacking is splitting:


A matrix can be replicated in two ways: tile works like copy and paste, while repeat works like collated printing.
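The difference between the two replication functions shows up clearly on a 2 by 2 example:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])

tiled = np.tile(a, (1, 2))           # the whole block pasted twice side by side
# [[1 2 1 2]
#  [3 4 3 4]]

repeated = np.repeat(a, 2, axis=1)   # each column duplicated in place
# [[1 1 2 2]
#  [3 3 4 4]]
```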

Specific columns and rows can be removed with delete.

The inverse operation is insert.

Like hstack, the append function cannot automatically transpose a one-dimensional array, so once again the vector has to be reshaped or given an extra dimension, or column_stack has to be used instead.

In fact, if all we need to do is add constant values to the borders of the array, the pad function should be sufficient.
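A sketch of delete, insert, and pad on an arbitrary 3 by 4 matrix. All three return new arrays and leave the input untouched.

```python
import numpy as np

a = np.arange(1, 13).reshape(3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]

no_col = np.delete(a, 1, axis=1)                  # drop column 1: shape (3, 3)
no_row = np.delete(a, [0, 2], axis=0)             # drop rows 0 and 2: shape (1, 4)

back = np.insert(no_col, 1, [2, 6, 10], axis=1)   # put the column back

padded = np.pad(a, 1, mode="constant")            # one layer of zeros on every side
```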

Meshgrid

Suppose we want to create a matrix whose elements are computed from their indices.

The two straightforward methods shown in the original figure are both slow because they use Python loops. The MATLAB way of dealing with this kind of problem is to create a meshgrid.

The meshgrid function accepts an arbitrary set of indices, mgrid accepts only slices, and indices can only generate complete index ranges. fromfunction calls the provided function just once, with the I and J arguments as described above.

But there is actually a better way in NumPy. There is no need to spend memory on the entire I and J matrices. It is sufficient to store only vectors of the right shape, and the broadcasting rules will take care of the rest.

Without the indexing='ij' argument, meshgrid changes the order of the arguments: J, I = np.meshgrid(j, i). This is the 'xy' mode, which is useful for visualizing 3D plots.
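The approaches can be compared on a small example. The target function j - i and the 2 by 3 size are assumptions made for this sketch, since the matrix from the original figure is not available.

```python
import numpy as np

i = np.arange(2)     # row indices 0..1
j = np.arange(3)     # column indices 0..2

# Full index matrices, MATLAB style
I, J = np.meshgrid(i, j, indexing="ij")     # both of shape (2, 3)
full = J - I

# Broadcasting: a column and a row are enough
compact = j[None, :] - i[:, None]           # shape (2, 3), same values

# Default 'xy' mode: arguments and results swap places, the grids are the same
Jxy, Ixy = np.meshgrid(j, i)
```

The broadcasting version never materializes I and J, which matters when the grid is large.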

Besides initializing functions over a two- or three-dimensional grid, meshgrids can also be used for indexing arrays.

Matrix statistics

Just like the statistical functions mentioned earlier, the functions for two-dimensional arrays accept an axis argument and act along the corresponding axis.

In two and more dimensions, the argmin and argmax functions return the index of the minimum and maximum value, respectively. When no axis is given, this is an index into the flattened array.
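A sketch of argmax and argmin on a matrix. np.unravel_index converts a flattened index back into row and column coordinates; the sample values are arbitrary.

```python
import numpy as np

a = np.array([[4, 8, 5],
              [9, 3, 1]])

flat = a.argmax()                        # 3: index into the flattened array
pos = np.unravel_index(flat, a.shape)    # (1, 0): row and column of the maximum

per_col = a.argmax(axis=0)               # 1, 0, 0: row of the maximum in each column
per_row = a.argmin(axis=1)               # 0, 2: column of the minimum in each row
```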

The axis parameter can also be used for all and any functions:


Matrix sorting

Although the axis argument is useful for the functions listed above, it does not help with sorting in two dimensions.

axis is in no way a replacement for the key argument of Python's list sort. However, NumPy has several ways of sorting by column:

1. Sort the array by the first column: a[a[:, 0].argsort()]

Here argsort returns the array of indices of the original array in sorted order.

This trick can be repeated, but care must be taken so that the next sort does not scramble the results of the previous one:

a = a[a[:, 2].argsort()]
a = a[a[:, 1].argsort(kind='stable')]
a = a[a[:, 0].argsort(kind='stable')]

2. There is a helper function, lexsort, that sorts by all the given columns in the way described above, but it always operates on rows, for example:

  • a[np.lexsort(np.flipud(a[:, [2, 5]].T))]: sort by column 2 first and then, where the values in column 2 are equal, by column 5;
  • a[np.lexsort(np.flipud(a.T))]: sort by all columns in left-to-right order.

3. There is also an order argument to sort, but it is neither fast nor easy to use if you start with an ordinary (unstructured) array.

4. Doing it in pandas may be a better choice, since this particular operation is more readable and less error-prone there:

  • pd.DataFrame(a).sort_values(by=[2, 5]).to_numpy(): sort by column 2 and then by column 5.
  • pd.DataFrame(a).sort_values(by=list(range(a.shape[1]))).to_numpy(): sort by all columns in left-to-right order.
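The first two NumPy approaches from the list above can be checked against each other on a small matrix. The data and the choice of columns 0 and 1 as sort keys are illustrative.

```python
import numpy as np

a = np.array([[2, 1, 5],
              [1, 9, 3],
              [2, 0, 4],
              [1, 7, 8]])

by_first = a[a[:, 0].argsort(kind="stable")]    # sort rows by column 0 only

# Sort by column 0, then by column 1: the least significant key goes first
step = a[a[:, 1].argsort(kind="stable")]
two_pass = step[step[:, 0].argsort(kind="stable")]

# lexsort treats the LAST key as the primary one, hence the flipud
lex = a[np.lexsort(np.flipud(a[:, [0, 1]].T))]
```

Both two_pass and lex order the rows by column 0 and break ties by column 1, whereas by_first keeps tied rows in their original order.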

Higher-dimensional array operations

When you create a 3D array by reshaping a one-dimensional vector or converting a nested Python list, the meaning of the indices is (z, y, x).

The first index is the number of the plane, and the next two move within the plane.

This index order is convenient, for example, for keeping a stack of grayscale images: a[i] is then a shortcut for referencing the i-th image.

However, this index order is not universal. When working with RGB images, the (y, x, z) order is usually used: the first two indices are pixel coordinates, and the last one is the color coordinate (RGB in Matplotlib and BGR in OpenCV).

This way, a specific pixel is easy to reference: a[i, j] gives the RGB tuple of the pixel at position (i, j).

Therefore, the actual command for creating a particular geometry depends on the conventions of the domain you are working in.

Obviously, NumPy functions like hstack, vstack, or dstack are not aware of these conventions. The index order hard-coded in them is (y, x, z), the RGB image order.

△ RGB image array (for simplicity, only 2 colors in the above figure)

If your data is laid out differently, it is more convenient to stack images with the concatenate function, passing the explicit index number in the axis argument.
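A sketch of concatenate with an explicit axis, assuming two tiny grayscale images stored in (z, y, x) order:

```python
import numpy as np

# Two images of 2 rows by 3 columns, each wrapped in a stack of height 1
a = np.zeros((1, 2, 3))
b = np.ones((1, 2, 3))

stacked = np.concatenate((a, b), axis=0)   # one more plane along z: shape (2, 2, 3)
side = np.concatenate((a, b), axis=2)      # joined along x: shape (1, 2, 6)
```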

If thinking in terms of axis numbers is inconvenient, the array can be converted to the layout that is hard-coded into hstack and its relatives.

This conversion involves no actual copying. It only shuffles the order of the indices.

Another operation that shuffles the index order is array transposition. Examining it may make us more familiar with 3D arrays.

Depending on the axis order we have chosen, the actual command to transpose all the planes of the array differs: for generic arrays it swaps indices 1 and 2, and for RGB images it swaps indices 0 and 1.
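The two transposition conventions can be sketched with arrays of zeros; only the shapes matter here, and the sizes are arbitrary.

```python
import numpy as np

stack = np.zeros((5, 4, 3))          # (z, y, x): 5 planes of 4 rows by 3 columns
t1 = stack.transpose(0, 2, 1)        # transpose every plane: swap indices 1 and 2
# equivalent to np.swapaxes(stack, 1, 2)

rgb = np.zeros((4, 3, 3))            # (y, x, z): 4 by 3 pixels, 3 color channels
t2 = rgb.transpose(1, 0, 2)          # transpose the image: swap indices 0 and 1

print(t1.shape, t2.shape, stack.T.shape)   # (5, 3, 4) (3, 4, 3) (3, 4, 5)
```

In each case the result is a view on the same data, so no copying takes place.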

Interestingly, the default axes argument of transpose (which is also the only mode of a.T) reverses the index order, which matches neither of the two index order conventions above.

Finally, there is one more function that can save a lot of Python loops when working with multidimensional arrays and make the code more concise: the Einstein summation function, einsum.

It sums the arrays along the repeated indices.
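A few small einsum calls illustrate the rule. The subscript strings are standard einsum notation, and the arrays are arbitrary examples.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.arange(12).reshape(3, 4)

prod = np.einsum("ij,jk->ik", a, b)      # j is repeated, so it is summed: same as a @ b
trace = np.einsum("ii->", np.eye(3))     # repeated index on one array: the trace
rows = np.einsum("ij->i", a)             # j is dropped from the output: row sums
```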

Finally, to master NumPy, you can work through the "100 NumPy exercises" project on GitHub to check what you have learned.

This article is shared from the Huawei cloud community “numpy graphics: Mastering the basic knowledge points of n-dimensional arrays, it’s enough to read this article”, original author: hwcloudai.
