Abstract: NumPy is one of the most important extension libraries for Python and an essential tool for getting started with machine learning programming. An overseas programmer has illustrated NumPy's basic operations graphically, making the learning process easy and engaging.
NumPy is one of the most important extension libraries for Python and an essential tool for getting started with machine learning programming. However, beginners often find NumPy's large number of operations hard to remember.
Recently, an overseas programmer illustrated the basic operations of NumPy graphically, making the learning process easy and engaging. The post collected more than 500 upvotes in the Reddit machine learning community in less than half a day.
Now let’s follow his tutorial and learn together!
NumPy arrays and Python lists
Before getting to the main content, let's take a look at the difference between NumPy arrays and Python lists.
At first glance, NumPy arrays look like Python lists. Both can be used as containers that support getting and setting elements, as well as inserting and removing them.
There are many similarities between the two. The following is an example of their operation:
Compared with Python lists, NumPy arrays have the following characteristics:
They are more compact, especially with more than one dimension; vectorized operations on them are faster than the same operations on Python lists; but appending elements at the end is slower than with Python lists.
△ Appending an element at the end is O(1) for a Python list and O(n) for a NumPy array
One way to create a NumPy array is to convert it directly from a Python list. The element type of the array is inferred from the type of the list elements.
NumPy arrays cannot grow the way Python lists do, because no space is reserved at the end of the array.
Therefore, it is common either to define a Python list, operate on it, and then convert it to a NumPy array, or to use np.zeros or np.empty to initialize the array and preallocate the necessary space:
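The two approaches described above can be sketched as follows. This is a minimal illustration; the variable names are made up for the example.

```python
import numpy as np

# Convert an existing Python list; the dtype is inferred from the elements.
from_list = np.array([1, 2, 3])
from_floats = np.array([1.0, 2.0, 3.0])

# Preallocate when the final size is known, then fill in place.
n = 5
squares = np.zeros(n, dtype=int)
for i in range(n):
    squares[i] = i * i

# np.empty only reserves memory; its contents are arbitrary until assigned.
scratch = np.empty(n)
scratch[:] = 1.5
```

np.empty skips the zero-filling step, so it only makes sense when every element is going to be overwritten anyway.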
Sometimes we need to create an empty array with the same size and element type as the existing array:
In fact, every function that creates an array filled with a constant has a _like counterpart, which creates a constant array of the same shape and type as an existing one:
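A short sketch of the _like family, assuming a small integer matrix as the template array:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

zeros = np.zeros_like(a)      # same shape and dtype, filled with 0
ones = np.ones_like(a)        # filled with 1
sevens = np.full_like(a, 7)   # filled with an arbitrary constant
blank = np.empty_like(a)      # same shape and dtype, uninitialized contents
```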
In NumPy, you can use arange or linspace to initialize an array with a monotonic sequence:
If you need a floating-point array such as [0., 1., 2.], you can change the type of the arange output: np.arange(3).astype(float).
But there is a better way: the arange function is sensitive to the data type of its arguments. If integers are passed, it generates an integer array; if a floating-point number is passed, as in np.arange(3.), it generates a floating-point array.
However, arange is not particularly good at handling floating-point numbers:
This is because 0.1 is a finite decimal for us but not for computers. In binary, 0.1 is an infinite fraction and has to be truncated somewhere. This is why passing a fractional step to arange is usually a bad idea: we may run into a bug in which the array does not have the number of elements we expect, which hurts the readability and maintainability of the code.
This is where linspace comes in handy. It is not affected by rounding errors and always generates the requested number of elements.
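The difference can be seen in a small experiment. The endpoints 0.5 and 0.8 are chosen only for illustration; how many elements arange returns here depends on floating-point rounding, which is exactly the problem.

```python
import numpy as np

# With a fractional step, the number of elements depends on rounding,
# so the endpoint may or may not sneak in.
stepped = np.arange(0.5, 0.8, 0.1)

# linspace takes the number of points explicitly and includes both ends.
spaced = np.linspace(0.5, 0.8, 4)

print(len(stepped), stepped)
print(len(spaced), spaced)
```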
For testing purposes, it is often necessary to generate random arrays. NumPy can generate random integers as well as random numbers drawn from uniform, normal, and other distributions:
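As a hedged sketch, the following uses NumPy's Generator interface (np.random.default_rng), which may differ from the functions shown in the original figure. The seed and the distribution parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seeded for reproducibility

ints = rng.integers(0, 10, size=5)        # integers in [0, 10)
uniform = rng.uniform(1.0, 10.0, size=5)  # floats in [1, 10)
normal = rng.normal(5.0, 2.0, size=5)     # mean 5, standard deviation 2
```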
Once the data is stored in the array, numpy provides a simple way to get it out:
Various kinds of indexing are shown above, such as taking a specific interval, indexing from right to left, and taking only the odd-indexed elements.
But slices like these are so-called views: they do not store the data themselves, and if the original array is changed after being indexed, the change is reflected in the view.
These indexing methods also allow assignment that modifies the contents of the original array, so take special care: only the last approach below copies the array. Any of the others may corrupt the original data:
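A minimal sketch of the view and copy behavior described above, using a small example array:

```python
import numpy as np

a = np.arange(1, 6)        # [1 2 3 4 5]

view = a[1:4]              # a slice is a view onto the same memory
view[0] = 20               # ...so this also changes a[1]

safe = a[1:4].copy()       # an explicit copy owns its data
safe[0] = -1               # a is unaffected

fancy = a[[0, 2]]          # indexing with a list of positions also copies
fancy[0] = 99              # a is unaffected
```

np.shares_memory(a, view) can be used to check whether two arrays refer to the same data.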
Another very useful way to get data out of a NumPy array is Boolean indexing, which lets you use all kinds of logical operators to select the elements that meet a condition:
Note: Python's chained comparisons such as 3 <= a <= 5 do not work on NumPy arrays.
As shown above, Boolean indexing can also be used to assign to arrays. Two of its common use cases have dedicated functions, namely np.where and np.clip:
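The points above can be sketched as follows; the thresholds are arbitrary.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7])

# Chained comparisons are not supported; combine two masks with & instead.
mask = (a >= 3) & (a <= 5)
selected = a[mask]                 # [3 4 5]

# Boolean masks can also be assigned through.
b = a.copy()
b[b > 5] = 0                       # [1 2 3 4 5 0 0]

# Dedicated helpers for two common patterns.
replaced = np.where(a > 5, 0, a)   # same result as the masked assignment
clipped = np.clip(a, 2, 6)         # [2 2 3 4 5 6 6]
```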
Arithmetic is one of the places where NumPy's speed is most noticeable. NumPy's vectorized operators run at C++-level speed, avoiding slow Python loops.
NumPy allows an entire array to be manipulated like an ordinary number (addition, subtraction, multiplication, division, power):
△ As in Python, a // b means a div b (integer division), and x ** n means xⁿ
Vectors can also perform similar operations with scalars in the same way:
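A brief sketch of element-wise arithmetic between two vectors and between a vector and a scalar:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 8, 16])

total = a + b        # [ 5 10 19]
quotient = b / a     # true division, always floating point
floored = b // a     # integer division: [4 4 5]
powers = a ** 2      # [1 4 9]

scaled = a * 3       # a scalar is applied to every element: [3 6 9]
```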
Most mathematical functions have NumPy counterparts that operate on vectors:
Dot products and cross products of vectors are also supported:
We can also compute trigonometric functions, inverse trigonometric functions, and the hypotenuse:
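A small sketch of the vector products and the hypotenuse function mentioned above. The input values are arbitrary, and the function choices (the @ operator, np.cross, np.hypot) are one reasonable reading of the missing figure.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

dot = a @ b                 # scalar product: 1*4 + 2*5 + 3*6 = 32
cross = np.cross(a, b)      # vector product: [-3  6 -3]

legs = np.array([3.0, 5.0])
hyp = np.hypot(legs, np.array([4.0, 12.0]))   # sqrt(x**2 + y**2): [ 5. 13.]

angle = np.arcsin(np.sin(0.5))                # inverse undoes the function
```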
Arrays can be rounded to integers:
△ floor rounds down; ceil rounds up; round rounds to the nearest integer, with ties going to the even one
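The three rounding modes can be compared on one array. The values are chosen so that two of them (1.5 and 2.5) are exact ties.

```python
import numpy as np

a = np.array([1.1, 1.5, 1.9, 2.5])

down = np.floor(a)      # [1. 1. 1. 2.]
up = np.ceil(a)         # [2. 2. 2. 3.]
nearest = np.round(a)   # ties go to the even neighbor: [1. 2. 2. 2.]
```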
Numpy can also perform the following basic statistical operations (maximum and minimum value, average value, variance, standard deviation):
However, the sorting functions have fewer capabilities than their Python list counterparts:
Searching for an element in a vector
Unlike Python lists, NumPy arrays do not have an index method.
- One way to find an element is np.where(a == x)[0][0], which is neither elegant nor fast, because it has to scan every element of the array even if the item is found right at the beginning.
- A faster way is to accelerate next((i[0] for i, v in np.ndenumerate(a) if v == x), -1) with Numba.
- Once the array is sorted, the situation improves: v = np.searchsorted(a, x); return v if a[v] == x else -1 has a complexity of O(log n). It is indeed very fast, but it first requires O(n log n) time for sorting.
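The first and third options above can be wrapped into small helper functions. This is only a sketch: find_first and find_sorted are hypothetical names, and the extra bounds check is an addition, needed because searchsorted can return len(a) when x is larger than every element.

```python
import numpy as np

def find_first(a, x):
    """Index of the first occurrence of x in a 1-D array, or -1."""
    hits = np.where(a == x)[0]      # scans the whole array
    return int(hits[0]) if hits.size else -1

def find_sorted(a, x):
    """Index of x in a sorted 1-D array via binary search, or -1."""
    v = np.searchsorted(a, x)
    # searchsorted may return len(a), so guard before indexing.
    return int(v) if v < len(a) and a[v] == x else -1

unsorted = np.array([5, 3, 7, 3])
ordered = np.array([1, 3, 5, 7])
print(find_first(unsorted, 3), find_sorted(ordered, 5))
```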
Comparing floating-point numbers
The function np.allclose(a, b) compares floating-point arrays within a given tolerance:
- np.allclose assumes that all the compared numbers are on a scale of about 1. For example, in the figure above it considers 1e-9 and 2e-9 to be equal. For a finer comparison, you need to specify the scale through atol: np.allclose(1e-9, 2e-9, atol=1e-17) == False.
- math.isclose makes no assumptions about the numbers being compared and relies on the user to supply a reasonable abs_tol value: math.isclose(0.1 + 0.2 - 0.3, 0.0, abs_tol=1e-8) == True.
In addition, np.allclose has some minor issues in its formula for absolute and relative tolerances. For example, for some numbers allclose(a, b) != allclose(b, a). These issues are resolved in the math.isclose function.
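The tolerance behavior of both functions can be checked directly. The sketch below relies on the documented defaults (atol=1e-8 for np.allclose, abs_tol=0.0 for math.isclose).

```python
import math
import numpy as np

# The default atol=1e-8 dominates for tiny numbers, so these look "equal".
tiny_equal = np.allclose(1e-9, 2e-9)                  # True
tiny_strict = np.allclose(1e-9, 2e-9, atol=1e-17)     # False

# math.isclose defaults to abs_tol=0.0, so comparisons against zero
# need an explicit absolute tolerance.
residual = 0.1 + 0.2 - 0.3
near_zero = math.isclose(residual, 0.0, abs_tol=1e-8)  # True
near_zero_default = math.isclose(residual, 0.0)        # False
```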
NumPy used to have a dedicated matrix class, but it is now deprecated. Therefore, the terms matrix and 2D array are used interchangeably below.
The syntax of matrix initialization is similar to that of vector:
Double parentheses are required here because the second positional parameter is reserved for dtype.
The generation of random matrix is also similar to the generation of vector:
Two-dimensional indexing is more convenient than indexing nested lists:
As with one-dimensional arrays, the "view" label in the figure above means that slicing does not actually copy the array. When the array is modified, the changes are also reflected in the slice.
In many operations (such as summation), we need to tell NumPy whether to operate across rows or across columns. To have a general notation that works for any number of dimensions, NumPy introduces the concept of axis: the axis argument is simply the number of the index in question. The first index is axis=0, the second index is axis=1, and so on.
Therefore, in a two-dimensional array, axis=0 operates column-wise and axis=1 operates row-wise.
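A minimal illustration of the axis argument, using summation on a small 2 x 3 matrix:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

everything = a.sum()         # 21
by_column = a.sum(axis=0)    # collapses the first index: [5 7 9]
by_row = a.sum(axis=1)       # collapses the second index: [ 6 15]
```

The axis that is named is the one that disappears from the result's shape.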
In addition to the ordinary operators (such as +, -, *, /, // and **), which work element-wise, there is also the @ operator, which computes the matrix product:
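The difference between the element-wise * and the matrix product @ can be seen on two small matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

elementwise = A * B   # [[ 5 12] [21 32]]
product = A @ B       # [[19 22] [43 50]]
```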
In the first part, we saw how vectors combine with scalars. More generally, NumPy allows mixed element-wise operations between a vector and a matrix, or even between two vectors:
Row vector and column vector
As can be seen from the above example, row vectors and column vectors are treated differently in a two-dimensional array.
By default, one-dimensional arrays are treated as row vectors in two-dimensional operations. Therefore, when multiplying a matrix by a row vector, you can use a shape of either (n,) or (1, n), and the result will be the same.
If a column vector is required, it can be obtained by transposing a two-dimensional row vector; note that transposing a one-dimensional array has no effect:
Two operations can produce a two-dimensional column vector from a one-dimensional array: rearranging with reshape and creating a new index with newaxis:
The -1 argument here tells reshape to calculate the size of that dimension automatically. None in the square brackets acts as a shortcut for np.newaxis, which adds an empty axis at the specified position.
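A short sketch of the conversions between the three kinds of vectors, starting from a one-dimensional array:

```python
import numpy as np

a = np.array([1, 2, 3])            # shape (3,)

still_1d = a.T                     # transposing a 1-D array changes nothing
col_reshape = a.reshape(-1, 1)     # shape (3, 1); -1 means "work it out"
col_newaxis = a[:, None]           # same result, None is np.newaxis
row_2d = a[None, :]                # shape (1, 3)
col_from_row = row_2d.T            # transposing a 2-D row gives a column
```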
Therefore, there are three types of vectors in NumPy: one-dimensional arrays, two-dimensional row vectors, and two-dimensional column vectors. This is a schematic diagram of the explicit conversions between them:
According to the rules, a one-dimensional array is implicitly interpreted as a two-dimensional row vector, so it is usually not necessary to convert between these two; the corresponding area is shaded in gray.
There are two main functions for joining matrices:
Both functions work fine when stacking only matrices or only vectors. However, when a one-dimensional array is stacked together with a matrix, only vstack works as expected, while hstack fails with a size mismatch error.
This is because, as mentioned above, a one-dimensional array is interpreted as a row vector rather than a column vector. The solution is to convert it to a column vector, or to use column_stack, which does this automatically:
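The mixed stacking case described above can be reproduced with a 2 x 2 matrix and a vector of length 2:

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])
v = np.array([5, 6])

below = np.vstack((m, v))              # v becomes a new bottom row

# hstack cannot attach a 1-D array to the side of a matrix.
try:
    np.hstack((m, v))
    hstack_failed = False
except ValueError:
    hstack_failed = True

beside = np.hstack((m, v[:, None]))    # convert to a column first
beside_auto = np.column_stack((m, v))  # or let column_stack do it
```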
The reverse operation of stacking is splitting:
A matrix can be replicated in two ways: tile works like copy-and-paste, while repeat works like collated printing.
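A minimal comparison of the two replication functions on a 2 x 2 matrix:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])

tiled = np.tile(a, (2, 3))           # whole block pasted 2 x 3 times
rows_twice = np.repeat(a, 2, axis=0) # each row duplicated in place
cols_twice = np.repeat(a, 2, axis=1) # each column duplicated in place
```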
Specific columns and rows can be removed with delete:
The inverse operation is insert:
The append function, just like hstack, cannot automatically transpose a one-dimensional array, so once again the vector has to be reshaped or given an extra dimension, or column_stack has to be used instead:
In fact, if all we need to do is add constant values to the borders of the array, then the pad function is sufficient:
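The following sketch exercises delete, insert, and pad on a 3 x 4 example matrix. The positions and fill values are arbitrary.

```python
import numpy as np

a = np.arange(1, 13).reshape(3, 4)    # rows: 1..4, 5..8, 9..12

no_col = np.delete(a, 1, axis=1)      # drop the second column
no_rows = np.delete(a, [0, 2], axis=0)  # drop the first and last rows

with_col = np.insert(a, 1, 0, axis=1)   # new column of zeros at position 1

# Surround the matrix with a one-element border of zeros.
padded = np.pad(a, 1, mode="constant", constant_values=0)
```

All of these return new arrays; the original a is left unchanged.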
If we want to create the following matrix:
Both methods are slow because they use Python loops. In MATLAB, the usual way to handle this kind of problem is to create a meshgrid:
The meshgrid function accepts an arbitrary set of indices, mgrid accepts only slices, and indices can only generate complete index ranges. fromfunction calls the provided function only once, with the I and J arguments as described above.
But in fact, there is a better way in NumPy. There is no need to spend memory on the entire I and J matrices. It is sufficient to store only vectors of the right shape, and the broadcasting rules will take care of the rest:
Without the indexing='ij' argument, meshgrid swaps the order of the arguments: J, I = np.meshgrid(j, i). This is the "xy" mode, which is useful for visualizing 3D plots.
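The meshgrid and broadcasting approaches can be compared side by side. Because the target matrix from the original figure is not available here, the sketch uses I - J as a stand-in function of the two indices.

```python
import numpy as np

i = np.arange(3)
j = np.arange(4)

# Full index matrices, both of shape (3, 4).
I, J = np.meshgrid(i, j, indexing="ij")
full = I - J

# Broadcasting: a (3, 1) column against a (1, 4) row gives the same result
# without ever materializing I and J.
compact = i[:, None] - j[None, :]

# Default "xy" mode: arguments and results are in swapped order.
J_xy, I_xy = np.meshgrid(j, i)
```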
In addition to initializing functions over 2D or 3D grids, meshgrids can also be used to index arrays:
Just like the statistical functions mentioned earlier, statistical operations on a two-dimensional array accept the axis argument and act accordingly:
In two and more dimensions, the argmin and argmax functions return the index of the minimum and maximum value, respectively (as an index into the flattened array):
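A small sketch of argmax in two dimensions. np.unravel_index converts the flattened index back into row and column coordinates.

```python
import numpy as np

a = np.array([[4, 8, 5],
              [9, 3, 1]])

flat = a.argmax()                            # index into the flattened array: 3
row, col = np.unravel_index(flat, a.shape)   # back to 2-D coordinates: (1, 0)

per_column = a.argmax(axis=0)                # row of the maximum in each column
per_row = a.argmin(axis=1)                   # column of the minimum in each row
```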
The axis parameter can also be used for all and any functions:
Although the axis parameter is useful for the functions listed above, it is not helpful for two-dimensional sorting:
The axis argument is by no means a substitute for the key argument of Python list sorting. However, NumPy offers several ways to sort by column:
1. Sort the array by the first column: a[a[:,0].argsort()]
Here argsort returns the array of indices into the original array after sorting.
This technique can be repeated, but care must be taken so that the next sort does not scramble the results of the previous one:
a = a[a[:,2].argsort()]
a = a[a[:,1].argsort(kind='stable')]
a = a[a[:,0].argsort(kind='stable')]
2. There is a helper function, lexsort, which sorts by all available columns in the way described above, but it always operates row-wise, for example:
- a[np.lexsort(np.flipud(a[:,[2,5]].T))]: sort by column 2 first and then by column 5;
- a[np.lexsort(np.flipud(a.T))]: sort by all columns from left to right.
3. There is also an order argument, but it is neither fast nor easy to use if you start with an ordinary (unstructured) array.
4. pandas may be a better choice, because this particular operation is more readable and less error-prone there:
- pd.DataFrame(a).sort_values(by=[2,5]).to_numpy(): sort by column 2 and then by column 5.
- pd.DataFrame(a).sort_values(by=list(range(a.shape[1]))).to_numpy(): sort by all columns from left to right.
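The first two approaches in the list above can be checked against each other on a small matrix. The data is made up for illustration; the pandas variants are left out to keep the sketch dependent on NumPy only.

```python
import numpy as np

a = np.array([[3, 1, 2],
              [1, 2, 9],
              [3, 0, 7],
              [1, 2, 4]])

# Sort rows by the first column only.
by_first = a[a[:, 0].argsort(kind="stable")]

# Repeated stable sorts: least significant column first, most significant last.
b = a[a[:, 2].argsort()]
b = b[b[:, 1].argsort(kind="stable")]
b = b[b[:, 0].argsort(kind="stable")]

# lexsort treats the LAST key as the primary one, hence the flipud.
c = a[np.lexsort(np.flipud(a.T))]
```

Both b and c end up with the rows in plain lexicographic order, left column first.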
High dimensional array operation
When you create a 3D array by reshaping a one-dimensional vector or by converting nested Python lists, the meaning of the indices is (z, y, x).
The first index is the number of the plane, and the remaining two are the coordinates within that plane:
This index order is convenient, for example, for keeping a stack of grayscale images: a[i] is a shortcut for referencing the i-th image.
However, this index order is not universal. When processing RGB images, the (y, x, z) order is usually used: the first two are pixel coordinates, and the last is the color coordinate (RGB in Matplotlib and BGR in OpenCV):
In this way, a specific pixel can be easily referenced: a[i, j] gives the RGB tuple of the pixel at (i, j).
Therefore, the actual command to create a specific geometry depends on the convention of the domain being processed:
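The two conventions can be sketched with zero-filled arrays. The image sizes and the pixel value are arbitrary.

```python
import numpy as np

# (z, y, x): a stack of 10 grayscale images, each 32 rows by 48 columns.
stack = np.zeros((10, 32, 48))
fourth_image = stack[3]             # shape (32, 48)

# (y, x, z): one RGB image, 32 rows by 48 columns, 3 color channels.
rgb = np.zeros((32, 48, 3), dtype=np.uint8)
rgb[5, 7] = (255, 128, 0)           # set one pixel
pixel = rgb[5, 7]                   # the color triple of that pixel
```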
Obviously, NumPy functions like hstack, vstack, or dstack are not aware of these conventions. The index order hard-coded in them is (y, x, z), the RGB image order:
△ RGB image array (for simplicity, only 2 colors in the above figure)
If the layout of the data is different, it is more convenient to stack images with the concatenate command and provide an explicit index number in the axis parameter:
If thinking in terms of axis numbers is inconvenient, the array can be converted to the form that is hard-coded into hstack and its relatives:
This conversion does not involve any actual copying. It only reshuffles the order of the indices.
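A hedged sketch of both options: concatenate with an explicit axis, and reordering the indices without copying. np.moveaxis is used here as one way to reorder indices; the original figure may have used a different call.

```python
import numpy as np

a = np.zeros((2, 3, 4))    # (z, y, x) layout
b = np.ones((2, 3, 4))

along_z = np.concatenate((a, b), axis=0)   # shape (4, 3, 4)
along_y = np.concatenate((a, b), axis=1)   # shape (2, 6, 4)
along_x = np.concatenate((a, b), axis=2)   # shape (2, 3, 8)

# Move the first index to the end: (z, y, x) -> (y, x, z).
moved = np.moveaxis(a, 0, -1)              # shape (3, 4, 2), still a view
is_view = np.shares_memory(a, moved)
```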
Another operation that reshuffles the index order is array transposition. Examining it may make us more familiar with 3D arrays.
Depending on the axis order we have chosen, the actual command to transpose all planes of the array will differ: for generic arrays it swaps indices 1 and 2, and for RGB images it swaps indices 0 and 1:
Interestingly, the default axes argument of transpose (which is also the only mode of a.T) reverses the index order, which matches neither of the two index order conventions above.
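The effect of the different axes arguments is easiest to see on the shapes:

```python
import numpy as np

a = np.zeros((2, 3, 4))

reversed_axes = a.T                          # default: shape (4, 3, 2)
swap_1_2 = np.transpose(a, (0, 2, 1))        # shape (2, 4, 3)
swap_0_1 = np.transpose(a, (1, 0, 2))        # shape (3, 2, 4)
```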
Finally, there is another function that can save a lot of Python loops when dealing with multidimensional arrays and make the code more concise. This is the Einstein summation function, einsum:
It sums the arrays along the repeated indices.
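A few small einsum expressions, chosen for illustration, show how repeated or omitted indices are summed over:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

product = np.einsum("ij,jk->ik", A, B)   # j is repeated: matrix product
trace = np.einsum("ii->", A)             # i is repeated: sum of the diagonal
col_sums = np.einsum("ij->j", A)         # i is dropped: sum over rows
```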
Finally, to master NumPy, you can work through the GitHub project 100 NumPy Exercises to test what you have learned.
This article is shared from the Huawei cloud community “numpy graphics: Mastering the basic knowledge points of n-dimensional arrays, it’s enough to read this article”, original author: hwcloudai.