# Preface

These are my notes from learning TF 2.0 earlier. They were first written in a private ynote and then rewritten on SF (SegmentFault).

TF 2.0 GPU installation tutorial: https://segmentfault.com/a/11

I had used TF1 and its manual session mechanism before, and it was a headache. TF 2.0 no longer requires any of that.

TF 2.0 is easier to understand (it has gradually moved closer to Python and NumPy).

TF 2.0 uses the Keras interface to build network layers, which is more convenient.

Layers and models defined through the TF 2.0 Keras interface implement the call method (invoked through `__call__`), which means most instances can be called directly like functions.
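A minimal sketch of what "callable like a function" looks like in practice. The layer type (`Dense`) and its size are arbitrary choices for illustration, not something taken from these notes:

```python
import tensorflow as tf
from tensorflow import keras

# A Dense layer with 2 output units (arbitrary size, chosen only for this sketch).
layer = keras.layers.Dense(2)

x = tf.ones([4, 3])   # 4 samples, 3 attributes each
y = layer(x)          # calling the instance runs the layer's call() logic

print(callable(layer))
print(y.shape)
```

The instance is used exactly like a function: input tensor in, output tensor out, with no session involved.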

# Row and column axis

Take a list as an example (an abstract picture: slices of bread stacked up...)

```
[                # the outermost layer, no need to memorize it
    [1, 2, 3],   # bread slice 1 (first sample)
    [4, 5, 6],   # bread slice 2 (second sample)
]
```

- Each inner list represents one sample; for example, [1,2,3] as a whole is the first sample.
- The innermost elements are attribute values; e.g. 1, 2 and 3 are all attribute values.
- Example: element 5 taken on its own reads as "second sample, attribute value 5" (of course, both the row and the column index still start from 0).

Take the data just now as an example:

```
t = tf.constant(
    [
        [1., 2., 3.],
        [4., 5., 6.]
    ]
)
print(tf.reduce_sum(t, axis=0))   # sum, squashing top to bottom: aggregates across samples
>> tf.Tensor([5. 7. 9.], shape=(3,), dtype=float32)
print(tf.reduce_sum(t, axis=1))   # sum, squashing left to right: aggregates across attributes
>> tf.Tensor([ 6. 15.], shape=(2,), dtype=float32)
```

Note: NumPy axes work the same way. At first I tried to memorize them as the x-axis and y-axis, but I could never remember them; the concepts got too tangled.

But if you can't remember, then every time you use an operation or aggregation API you spend a long time sorting it out in your head again, which wastes time.

So you have to practice until you understand it. The goal: "see at a glance what the data in a dimension means and what an operation along an axis does."

My own way of remembering (axis = 0, axis = 1):

- Axis 0 usually represents samples (squashing top to bottom)
- Axis 1 usually represents attributes (squashing left to right)

Common functions that take the axis parameter (or are often used alongside it):

```
tf.reduce_sum()    # sum
tf.reduce_mean()   # mean
tf.reduce_max()    # max
tf.reduce_min()    # min
tf.square()        # square (element-wise, takes no axis)
tf.concat()        # concatenate (axis is required)
# Note: for the reduce_* functions, if axis is not passed, all dimensions are reduced.
```
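To make the "samples vs. attributes" reading concrete, here is a small sketch. It uses NumPy rather than TensorFlow, relying on the note above that NumPy axes behave the same way; the variable names are made up for illustration:

```python
import numpy as np

t = np.array([[1., 2., 3.],
              [4., 5., 6.]])   # 2 samples, 3 attributes

col_mean = t.mean(axis=0)   # collapse the samples: one value per attribute
row_max = t.max(axis=1)     # collapse the attributes: one value per sample
total = t.sum()             # no axis given: reduce over every dimension

print(col_mean)   # [2.5 3.5 4.5]
print(row_max)    # [3. 6.]
print(total)      # 21.0
```

The axis you pass is the one that disappears from the result shape: `axis=0` on a (2, 3) array leaves shape (3,), and `axis=1` leaves shape (2,).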

# Common import

```
# Used almost every time
import numpy as np
import tensorflow as tf
from tensorflow import keras
# Optional imports
import os, sys, pickle
import scipy
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split   # train/test split
```

# Tensor & operator

## Constant (ordinary tensor)

### Definition:

```
c = tf.constant([[1, 2, 3.], [4, 5, 6.]])   # a dot after a number makes the dtype float32
print(c)
>> tf.Tensor([[1. 2. 3.] [4. 5. 6.]], shape=(2, 3), dtype=float32)
```

### Six operations (addition, subtraction, multiplication and division, matrix multiplication, matrix transposition)

Let's start with matrix multiplication (it is covered in college math, so I won't go through the calculation itself):

```
# Syntax: a @ b
# Requirement: number of columns of a == number of rows of b (must be equal)
# e.g. (5 rows, 2 columns) @ (2 rows, 10 columns) = (5 rows, 10 columns)
# Special case for 3-D input: dimension 0 must match
t1 = tf.ones([2, 20, 30])
t2 = tf.ones([2, 30, 50])
print((t1 @ t2).shape)
>> (2, 20, 50)   # dimension 0 is unchanged; the last 2 dimensions follow ordinary matrix multiplication
```

Matrix transpose:

```
tf.transpose(t)
# It can do more than a plain transpose: it can also permute dimensions
t2 = tf.transpose(t, [1, 0])   # rows become columns; same as a basic transpose (axis order reversed)
# Or take a tensor of shape (2, 100, 200, 3) as an example
t = tf.ones([2, 100, 200, 3])
print(tf.transpose(t, [1, 3, 0, 2]).shape)
>> (100, 3, 2, 200)
# original axis 1 -> now axis 0
# original axis 3 -> now axis 1
# original axis 0 -> now axis 2
# original axis 2 -> now axis 3
```

Addition, subtraction, multiplication and division all support the "broadcasting mechanism":

An intuitive explanation of the broadcasting mechanism:

```
Let me try to explain it in plain words:
1. My shape is different from yours, but I will do my best to expand to your shape so we can compute together.
2. If there are gaps after expanding, I copy myself to fill them (if they cannot be filled exactly, the computation is not possible).
3. The small shape yields to the big shape (I am thinner than you, so I move; you don't have to...).
e.g.:
t = tf.constant(
    [
        [1, 2, 3],
        [4, 5, 6],
    ]
)
t + [1, 2, 1]
Process analysis:
[1,2,1] is clearly the smaller shape, so it automatically tries to grow into the larger shape ->
Step 1, reshape (the outer frame now matches, but there is still a gap inside):
[
    [1,2,1],
]
Step 2, fill (copy itself into the gap):
[
    [1,2,1],
    [1,2,1],    <- this is the copy
]
Step 3, compute (add element by element):
[               [               [
    [1,2,3],  +     [1,2,1],  =     [2,4,4],
    [4,5,6],        [1,2,1],        [5,7,7],
]               ]               ]
```

Abstract (broadcast mechanism) demonstration:

```
Suppose the shape of t1 is [5,200,1,50]
Suppose the shape of t2 is [5,1]
Note: everything shown below is the SHAPE of a tensor, not its data!
[5,200,1,50]
[5,1]             # at the start, the two shapes have different ranks and are not aligned
------------------------
[5,200,1,50]
      [5,50]      # right-align, then expand this row's 1 to 50
------------------------
[5,200,5,50]      # expand this row's 1 to 5
      [5,50]
------------------------
[5,200,5,50]
[1,  1,5,50]      # this row gains two new dimensions, filled with 1 by default
------------------------
[5,200,5,50]
[1,200,5,50]      # expand this row's 1 to 200
------------------------
[5,200,5,50]
[5,200,5,50]      # expand this row's 1 to 5
Notes:
1. In each dimension, the two sizes must be equal or one of them must be 1; otherwise it is an error, e.g.:
[5,200,1,50]
      [5,20]
   # alignment again starts from the right, but 50 and 20 differ and neither is 1, so they cannot be aligned: error
2. If dimensions are missing:
   - everything is still right-aligned
   - then, starting from the right, expand the size-1 entries of the existing dimensions
   - then add the missing dimensions, with size 1 by default
   - then expand those new dimensions (they are 1 by default, so they can always be expanded)
```
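The right-align-then-expand procedure above can be written down as a few lines of plain Python. This is only a sketch of the rule as described here, not TensorFlow's actual implementation; `broadcast_shape` is a made-up helper name, and NumPy is used purely as a cross-check:

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    # Right-align: pad the shorter shape with leading 1s.
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)      # sizes match, or b's 1 expands to da
        elif da == 1:
            result.append(db)      # a's 1 expands to db
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)

ok = broadcast_shape((5, 200, 1, 50), (5, 1))
reference = np.broadcast(np.empty((5, 200, 1, 50)), np.empty((5, 1))).shape
print(ok, reference)

try:
    broadcast_shape((5, 200, 1, 50), (5, 20))   # 50 vs 20, neither is 1
    failed = False
except ValueError:
    failed = True
print(failed)
```

The first call reproduces the walkthrough (both shapes end at `(5, 200, 5, 50)`), and the second reproduces the error case from note 1.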

Of course, all of the above are automatic broadcast mechanisms during operation

You can also broadcast manually:

```
t1 = tf.ones([2, 20, 1])   # original shape [2, 20, 1]
print(tf.broadcast_to(t1, [5, 2, 20, 30]).shape)
[5,2,20,30]
  [2,20, 1]
-----------
[5,2,20,30]
  [2,20,30]
-----------
[5,2,20,30]
[1,2,20,30]
-----------
[5,2,20,30]
[5,2,20,30]
Note: with manual broadcasting, only the original shape can add dimensions or expand sizes to reach the target shape;
the target shape itself never changes.
```

Using tf.expand_dims + tf.tile instead of tf.broadcast_to

Same example as above: I want to turn the shape [2,20,1] into [5,2,20,30]

```
t1 = tf.ones([2, 20, 1])
a = tf.expand_dims(t1, axis=0)           # insert an axis at index 0, result [1,2,20,1]
print(tf.tile(a, [5, 1, 1, 30]).shape)   # result (5, 2, 20, 30)
Process:
[5,2,20,30]
  [2,20, 1]
-----------
[5,2,20,30]
[1,2,20, 1]    # tf.expand_dims(t1, axis=0): a new axis is inserted at index 0 (rank increases)
-----------
[5,2,20,30]
[5,2,20,30]    # tf.tile(a, [5,1,1,30]): each argument is the repeat count for the matching axis: 1*5 2*1 20*1 1*30
```

The difference between tile and broadcasting:

- tile is a physical copy: memory use grows
- broadcasting is a virtual copy (the replication is implicit, done only for the computation, with no extra memory)
- tile can repeat an axis of any size n an integer number of times m (n becomes n * m)
- broadcasting can only expand an axis whose original size is 1 (1 becomes n, n an integer)
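As a quick sanity check on the two routes, the sketch below builds the same [5,2,20,30] result both ways and compares them. Random input is used only so that an accidental match is unlikely; the variable names are illustrative:

```python
import tensorflow as tf

t1 = tf.random.uniform([2, 20, 1])

# Route 1: manual broadcast straight to the target shape.
via_broadcast = tf.broadcast_to(t1, [5, 2, 20, 30])

# Route 2: add the missing axis, then physically repeat along axes 0 and 3.
via_tile = tf.tile(tf.expand_dims(t1, axis=0), [5, 1, 1, 30])

same = bool(tf.reduce_all(tf.equal(via_broadcast, via_tile)))
print(via_broadcast.shape, via_tile.shape, same)
```

Both routes give identical values here; the difference listed above is in what can be expanded (only size-1 axes for broadcasting, any axis for tile).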

Squeezing dimensions (tf.squeeze):

It removes every dimension of size 1 (just like a * 1 = a in mathematics)

```
print(tf.squeeze(tf.ones([2,1,3,1])).shape)
>>> (2, 3)
```

Of course, you can also squeeze one specific dimension (when none is specified, all size-1 dimensions are removed):

```
print(tf.squeeze(tf.ones([2,1,3,1]), axis=-1).shape)
>>> (2, 1, 3)
```

### Index & slice

Key point: whether indexing or slicing, rows and columns are separated by a comma, and both row and column indexes start from 0.

Index: take a value

```
print(t[1, 2])   # before the comma is the row index, after the comma is the column index
>> tf.Tensor(6.0, shape=(), dtype=float32)
```

Slice: take a sub-structure (there are two ways)

Method 1 (colon slice):

```
print(t[:, 1:])   # before the comma are the rows: a bare ":" means all rows; after the comma are the columns: "1:" means the second column through the last
>> tf.Tensor([[2. 3.] [5. 6.]], shape=(2, 2), dtype=float32)
```

Method 2 (ellipsis slice): (I suspect that people who don't know NumPy have never heard of Python's `...`, i.e. the Ellipsis object)

Run this line yourself first:

```
print(... is Ellipsis)
>>> True
```

Back to the main topic: (the `...` slice is meant for multi-dimensional data; for two dimensions, plain `:` is enough)

```
# Take 3-D as an example; "row" and "column" no longer fit here.
# Shape is (2, 2, 2)
t = tf.constant(
    [                   # dimension 1
        [               # dimension 2
            [1, 2],     # dimension 3
            [3, 4],
        ],
        [
            [5, 6],
            [7, 8],
        ],
    ]
)
# Pseudocode: t[dim-1 slice, dim-2 slice, dim-3 slice]
# Code: t[:, :, 0:1]
# Dimension 1 untouched, dimension 2 untouched, dimension 3 takes one element
# Result: shape is (2, 2, 1)
[                   # dimension 1
    [               # dimension 2
        [1],        # dimension 3
        [3],
    ],
    [
        [5],
        [7],
    ],
]
```

If it doesn't make sense yet, read it a few more times.

Notice that even though I am not slicing dimensions 1 and 2, I still have to write two `:` as placeholders.

So if there were 100 dimensions and I only wanted to slice the last one, leaving the first 99 untouched, would I have to write 99 `:` placeholders?

No, the following code solves that:

`print(t[..., 0:1])  # this is what ... is for (note: in practice it is mainly useful with NumPy and TensorFlow)`
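A small sketch comparing the two spellings on a 4-D array. NumPy is used here for brevity (the `...` syntax is the same for tensors); the shape is arbitrary:

```python
import numpy as np

t = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

explicit = t[:, :, :, 0:1]   # one ":" placeholder per leading dimension
short = t[..., 0:1]          # "..." stands in for all the leading ":"

print(explicit.shape, short.shape)
print(np.array_equal(explicit, short))
```

Both expressions select the same data; `...` just saves writing out the placeholders.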

Tensor to NumPy:

`t.numpy()  # tensor to NumPy array`

## Variable

Definition:

```
v = tf.Variable(   # note: the V in Variable is upper case
    [
        [1, 2, 3],
        [4, 5, 6]
    ]
)
```

Variable assignment (the variable is modified in place):

```
# Note: once a variable is defined, its shape is fixed; only values of the same shape can be assigned
v.assign(
    [
        [1, 1, 1],
        [1, 1, 1],
    ]
)
print(v)
>> <tf.Variable 'Variable:0' shape=(2, 3) dtype=int32, numpy=array([[1, 1, 1],[1, 1, 1]])>
```

Variable value (equivalent to conversion to tensor):

```
# Note: the variable itself is of Variable type; reading its value gives a Tensor (this includes slice reads, index reads, etc.)
print( v.value() )
>> tf.Tensor([[1 2 3] [4 5 6]], shape=(2, 3), dtype=int32)
```

Variable index & slice assignment:

```
# Constant: immutable, so it can only be read, never assigned.
# Variable: can be both read and assigned.
# v.assign(xx) is similar to Python's v = xx
v[0, 1].assign(100)            # index assignment; .assign plays the role of =
v[0, :].assign([10, 20, 30])   # note: slice assignment needs a container (e.g. a list)
# Special note: as said before, a variable's shape is immutable; what gets assigned is the data.
# So whenever you assign, be careful not to change the variable's original shape.
# Taking slice assignment as an example:
#   you must supply exactly as many values as you sliced out, in a matching structure.
#   For instance: if you dig a small cube out of a big cube, you have to fill the hole with a small cube of the same shape.
# There are also two related APIs:
v.assign_add()   # like Python's +=
v.assign_sub()   # like Python's -=
```
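The assignment calls above can be chained into one runnable sketch. The starting values match the earlier definition of `v`; the `tf.ones_like` increment is just an illustrative choice of argument for `assign_add`:

```python
import tensorflow as tf

v = tf.Variable([[1, 2, 3],
                 [4, 5, 6]])

v[0, 1].assign(100)             # single element: row 0 becomes [1, 100, 3]
v[0, :].assign([10, 20, 30])    # whole first row: exactly 3 values are supplied
v.assign_add(tf.ones_like(v))   # like v += 1, applied element-wise

result = v.numpy().tolist()
print(result)                   # [[11, 21, 31], [5, 6, 7]]
print(v.shape)                  # the shape is still (2, 3)
```

Every call modifies `v` in place, and the shape stays (2, 3) throughout.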

Variable indexing & slicing (reading values)

`Same as for constants (omitted)`

Variable to numpy

`print(v.numpy())`

## Ragged tensor

Definition:

```
rag_tensor = tf.ragged.constant(
    [
        [1, 2],
        [2, 3, 4, 5],
    ]
)
# Each row is allowed to have a different length
```

Concatenation: to concatenate ragged tensors, use tf.concat(..., axis=...)

```
Axis 0: vertical concatenation (samples stacked vertically); anything can be concatenated, and the result is still a ragged tensor
Axis 1: horizontal concatenation (attributes joined side by side); the inputs need the same number of samples, otherwise it is an error
Summary: samples concatenate freely along the vertical axis; attributes concatenate horizontally only when the number of samples is equal
```

RaggedTensor to normal tensor:

```
# Note: a normal tensor requires aligned lengths; rows that are too short are padded with 0 at the end
tensor = rag_tensor.to_tensor()
```
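A short sketch of the concatenation rules and the padding behaviour described above. The tensors `r2` and `r3` are made-up examples added for illustration:

```python
import tensorflow as tf

r1 = tf.ragged.constant([[1, 2], [2, 3, 4, 5]])
r2 = tf.ragged.constant([[7], [8, 9]])        # same number of rows as r1
r3 = tf.ragged.constant([[6], [7, 8], [9]])   # different number of rows

rows = tf.concat([r1, r3], axis=0)   # stacking samples: any row counts work
cols = tf.concat([r1, r2], axis=1)   # joining attributes: row counts must match
dense = r1.to_tensor()               # short rows are padded with 0 at the end

print(rows.to_list())
print(cols.to_list())
print(dense.numpy().tolist())
```

Concatenating `r1` and `r3` along `axis=1` would fail, since they have 2 and 3 rows respectively.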

## Sparse tensor

Characteristics (can be thought of as recording indexes):

- Only the coordinates of non-zero values are recorded. `indices` parameter: each sub-list is one coordinate
- Although only coordinates are recorded, after conversion to a normal tensor only those coordinates hold values; every other position is 0
- The overall extent is determined by `dense_shape`

Definition:

```
s = tf.SparseTensor(
    indices=[[0, 1], [1, 0], [2, 3]],   # note: indices must be given in order (left to right, top to bottom)
    values=[1, 2, 3],                   # the three coordinates above get the values 1, 2 and 3 respectively
    dense_shape=[3, 4]                  # overall shape of the tensor
)
print(s)
>> SparseTensor(indices=tf.Tensor([[0 1], [1 0],[2 3]], shape=(3, 2), dtype=int64)...
```

Converting to a normal tensor (after conversion, what you see are the actual stored values)

```
tensor = tf.sparse.to_dense(s)
print(tensor)
>> tf.Tensor([ [0 1 0 0],[2 0 0 0],[0 0 0 3] ], shape=(3, 4), dtype=int32)
```

When you call to_dense() as above, you may run into an error:

```
error: is out of range
# The cause is how tf.SparseTensor(indices=...) was created. As mentioned earlier, the indices should be written in order (left to right, top to bottom).
# Of course, you can also sort with the reorder API first, and then convert:
# e.g.:
_ = tf.sparse.reorder(s)          # sort the indices first
tensor = tf.sparse.to_dense(_)    # then convert
```
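Here is a sketch of the reorder-then-convert fix on indices that are deliberately written out of order. The data is the same as in the definition above, only shuffled:

```python
import tensorflow as tf

s = tf.SparseTensor(
    indices=[[2, 3], [0, 1], [1, 0]],   # deliberately NOT in row-major order
    values=[3, 1, 2],
    dense_shape=[3, 4],
)

ordered = tf.sparse.reorder(s)        # sorts indices (and their values) into row-major order
dense = tf.sparse.to_dense(ordered)

print(ordered.indices.numpy().tolist())
print(dense.numpy().tolist())
```

After reordering, the values travel with their coordinates, so the dense result is the same matrix as before.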

# tf.function

This API is used as a decorator; it converts Python code into TF operations and a graph structure as efficiently as it can.

```
import tensorflow as tf
import numpy as np
@tf.function
def f():
    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])
    return a + b
print(f())
>>> tf.Tensor([5 7 9], shape=(3,), dtype=int32)
```

You may have noticed that no TF syntax at all is written inside the f() function we defined; it is only decorated with a single line, @tf.function.

Yet the return value of the call is a Tensor.

That is what the @tf.function decorator does!

Of course, TF operations can also be written inside the function; that is no problem.

Note, however, that variables should not be defined inside the function. Any variable you need should be defined outside it.

```
a = tf.Variable([1, 2, 3])   # if you need a tf.Variable, create it outside
@tf.function
def f():
    # a = tf.Variable([1, 2, 3])   # variables cannot be defined in here!
    pass
```
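A minimal sketch of the "variable outside, operations inside" pattern. The `counter` variable and `increment` function are hypothetical names introduced only for this example:

```python
import tensorflow as tf

counter = tf.Variable(0)   # created once, outside the decorated function

@tf.function
def increment(step):
    counter.assign_add(step)      # a TF op acting on the outer variable
    return counter.read_value()   # returned as a Tensor

first = increment(tf.constant(1))
second = increment(tf.constant(2))
print(int(first), int(second))    # 1 3
```

The variable keeps its state between calls, while the function body itself is traced into a graph.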

# Concatenation (tf.concat)

My understanding is:

```
# It works like combining like terms: one item may differ, and all the others must be exactly the same.
# Precondition: at most one dimension may differ in size (note: at most)
t1 = tf.ones([2, 5, 6])
t2 = tf.ones([6, 5, 6])
print(tf.concat([t1, t2], axis=0).shape)   # axis=0: only axis 0 is merged; the other axes stay unchanged
>> (8, 5, 6)
```

# Stacking, adds a dimension (tf.stack)

My understanding: it is like carrying in elementary-school arithmetic (a carry adds a new position, here a new dimension, to hold the count)

```
# Precondition: all tensors must have exactly the same shape.
tf1 = tf.ones([2, 3, 4])
tf2 = tf.ones([2, 3, 4])
tf3 = tf.ones([2, 3, 4])
print(tf.stack([tf1, tf2, tf3], axis=0).shape)
# Picture three groups of [2, 3, 4]; the count 3 becomes a new dimension inserted at the index given by axis.
>> (3, 2, 3, 4)
# If this were tf.concat(), the result would be (6, 3, 4)
```

# Splitting with dimension reduction (tf.unstack)

tf.unstack and tf.stack are inverse operations. Along the axis you specify, the tensor is split into several pieces, and that dimension is removed at the same time.

```
a = tf.ones([3, 2, 3, 4])
for x in tf.unstack(a, axis=0):
    print(x.shape)
# The results are as follows (split into three [2,3,4] tensors)
>>> (2, 3, 4)
>>> (2, 3, 4)
>>> (2, 3, 4)
```
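To see that the two really are inverses, the sketch below stacks three distinguishable tensors and unstacks them again. Filling each piece with a different constant is just a device for telling them apart:

```python
import tensorflow as tf

# Three (2, 3) tensors filled with 0.0, 1.0 and 2.0 respectively.
parts = [tf.fill([2, 3], float(i)) for i in range(3)]

stacked = tf.stack(parts, axis=0)         # new axis in front: (3, 2, 3)
stacked_last = tf.stack(parts, axis=-1)   # new axis at the end: (2, 3, 3)
restored = tf.unstack(stacked, axis=0)    # back to a list of three (2, 3) tensors

round_trip = all(
    bool(tf.reduce_all(tf.equal(p, r))) for p, r in zip(parts, restored)
)
print(stacked.shape, stacked_last.shape, len(restored), round_trip)
```

The `axis` argument of `tf.stack` decides where the new dimension goes; unstacking along that same axis recovers the original pieces.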

# Splitting without dimension reduction (tf.split)

### Syntax:

The difference between tf.split and tf.unstack: tf.unstack splits evenly into size-1 pieces and removes the dimension, while tf.split lets you specify the size of each piece and keeps the dimension.

```
a = tf.ones([2, 4, 35, 8])
for x in tf.split(a, axis=3, num_or_size_splits=[2, 2, 4]):
    print(x.shape)
# Result:
>>(2, 4, 35, 2)   # last dimension 2
>>(2, 4, 35, 2)   # last dimension 2
>>(2, 4, 35, 4)   # last dimension 4
```

### Usage scenario:

Suppose we want to cut the dataset into 3 parts (train / test / valid) in the ratio 6:2:2

Method 1: (scikit-learn, split twice in a row)

```
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
x_train, x_valid, y_train, y_valid = train_test_split(x_train, y_train, test_size=0.25)   # 0.25 of the remaining 80% = 20% of the total
# The source code shows that if test_size is not passed, the default is 0.25.
# Reasoning: train_test_split only produces two parts per call, so we need to split twice:
# 1. First split the complete dataset into (remaining training set, test set)
# 2. Then split the remaining training set into (final training set, validation set)
```

Method 2: (tf.split)

```
x = tf.ones([1000, 5000])
y = tf.ones([1000, 1])
x_train, x_test, x_valid = tf.split(
    x,
    num_or_size_splits=[600, 200, 200],   # cut into 3 parts
    axis=0
)
y_train, y_test, y_valid = tf.split(
    y,
    num_or_size_splits=[600, 200, 200],   # also cut into 3 parts
    axis=0
)
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
print(x_valid.shape, y_valid.shape)
# Result
>>> (600, 5000) (600, 1)
>>> (200, 5000) (200, 1)
>>> (200, 5000) (200, 1)
```

# Advanced indexing (tf.gather)

In NumPy this kind of indexing is called fancy indexing (if I remember correctly)

```
data = tf.constant([6, 7, 8])    # the actual data
index = tf.constant([2, 1, 0])   # the indexes
print(tf.gather(data, index))
>> tf.Tensor([8 7 6], shape=(3,), dtype=int32)
```

# Sorting (tf.sort)

```
data = tf.constant([6, 7, 8])
print(tf.sort(data, direction='DESCENDING'))   # or 'ASCENDING'; the default is ascending
# tf.argsort() takes the same arguments, but returns the indexes that would sort the data
```

# Top-K (tf.math.top_k)

Find the k largest values (better than sorting first and then slicing)

```
a = tf.math.top_k([6, 7, 8], 2)   # find the 2 largest; returns an object
print(a.indices)                  # indexes of the 2 largest values
print(a.values)                   # the 2 largest values
>> tf.Tensor([2 1], shape=(2,), dtype=int32)
>> tf.Tensor([8 7], shape=(2,), dtype=int32)
```
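For comparison, the sketch below gets the same answer both ways: with `tf.math.top_k`, and with the sort-then-slice route mentioned above. The input data is a made-up unsorted example:

```python
import tensorflow as tf

data = tf.constant([6, 9, 7, 8])
k = 2

top = tf.math.top_k(data, k=k)                               # values and indices in one call
by_sort = tf.sort(data, direction='DESCENDING')[:k]          # sort everything, keep the first k
by_argsort = tf.argsort(data, direction='DESCENDING')[:k]    # same idea, for the indexes

print(top.values.numpy().tolist(), top.indices.numpy().tolist())
print(by_sort.numpy().tolist(), by_argsort.numpy().tolist())
```

Both routes agree here; `top_k` simply avoids sorting the elements you are going to throw away.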

# tf.GradientTape (custom differentiation)

### Partial derivative

```
v1, v2 = tf.Variable(1.), tf.Variable(2.)   # variables are watched automatically
c1, c2 = tf.constant(1.), tf.constant(2.)   # constants are not watched automatically
y = lambda x1, x2: x1**2 + x2**2
with tf.GradientTape(persistent=True) as tape:
    # By default a tape is released after one gradient() call.
    # persistent=True keeps it alive, but it must be released manually later.
    # Constants are not watched automatically, so we call watch() on them by hand
    tape.watch(c1)   # for a variable, no watch() is needed
    tape.watch(c2)
    f = y(c1, c2)    # call the function and record the result
c1_, c2_ = tape.gradient(f, [c1, c2])
# Parameter 2: pass several independent variables and get back that many partial derivatives
# c1_ is the partial derivative with respect to c1
# c2_ is the partial derivative with respect to c2
del tape   # release the tape manually
```

### Finding the second-order partial derivative (gradient nesting)

```
v1, v2 = tf.Variable(1.), tf.Variable(2.)   # here we use variables
y = lambda x1, x2: x1**2 + x2**2
with tf.GradientTape(persistent=True) as tape2:
    with tf.GradientTape(persistent=True) as tape1:
        f = y(v1, v2)
    once_grads = tape1.gradient(f, [v1, v2])   # first-order partial derivatives
# List comprehension: differentiate each first-order partial derivative again (note: with tape2)
twice_grads = [tape2.gradient(once_grad, [v1, v2]) for once_grad in once_grads]   # second-order partial derivatives
print(twice_grads)
del tape1   # release
del tape2   # release
```
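One detail worth seeing in numbers: for x1² + x2² the mixed second derivatives are zero, and by default `tape.gradient` reports a target that does not depend on a source as `None` rather than 0. The sketch below is a variation on the nested-tape code that passes `unconnected_gradients=tf.UnconnectedGradients.ZERO` to get a full numeric 2x2 result; that argument is my addition, not part of the original notes:

```python
import tensorflow as tf

v1, v2 = tf.Variable(1.), tf.Variable(2.)

with tf.GradientTape(persistent=True) as tape2:
    with tf.GradientTape() as tape1:
        f = v1 ** 2 + v2 ** 2
    first = tape1.gradient(f, [v1, v2])   # [2*v1, 2*v2] = [2.0, 4.0]

# Differentiate each first-order derivative again; unconnected pairs become 0 instead of None.
second = [
    tape2.gradient(g, [v1, v2], unconnected_gradients=tf.UnconnectedGradients.ZERO)
    for g in first
]
hessian = [[float(x) for x in row] for row in second]
del tape2   # release the persistent tape

print([float(g) for g in first])
print(hessian)
```

The diagonal entries are 2 (the second derivative of x²), and the off-diagonal entries are 0 because neither first derivative depends on the other variable.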

### Explanation

```
# Derivative with one independent variable: pass a single variable
tape1.gradient(f, v1)
# Derivatives with several independent variables: pass a list containing all of them
tape1.gradient(f, [v1, v2])
```

### SGD (stochastic gradient descent)

Method 1: by hand (without an optimizer)

```
v1, v2 = tf.Variable(1.), tf.Variable(2.)   # here we use variables
y = lambda x1, x2: x1 ** 2 + x2 ** 2        # quadratic function of two variables
learning_rate = 0.1                          # learning rate
for _ in range(30):                          # number of iterations
    with tf.GradientTape() as tape:          # differentiation scope
        f = y(v1, v2)
    d1, d2 = tape.gradient(f, [v1, v2])      # d1 is the partial derivative w.r.t. v1, d2 w.r.t. v2
    v1.assign_sub(learning_rate * d1)
    v2.assign_sub(learning_rate * d2)
print(v1)
print(v2)
# Summary of the procedure:
# 1. Take the partial derivatives with respect to v1 and v2 (d1, d2 = tape.gradient(f, [v1, v2]))
# 2. Decrease v1 and v2 according to their partial derivatives (decrement = learning rate * partial derivative)
# 3. Wrap the first two steps in a loop (with a set number of iterations) and repeat: 1-2-1-2-1-2...
```
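For this particular function the loop can be checked by hand: the gradient of x² is 2x, so each step multiplies the variable by (1 - 2 * 0.1) = 0.8, and after 30 steps it equals the start value times 0.8³⁰. The sketch below redoes the loop in plain Python with that analytic gradient (no TensorFlow); `gradient_descent` is a made-up helper name:

```python
def gradient_descent(x0, learning_rate=0.1, steps=30):
    """Minimize x**2 starting from x0, using the analytic gradient 2*x."""
    x = x0
    for _ in range(steps):
        grad = 2 * x                # derivative of x**2
        x -= learning_rate * grad   # same update as assign_sub(learning_rate * d)
    return x

v1 = gradient_descent(1.0)
v2 = gradient_descent(2.0)
print(v1, v2)
print(0.8 ** 30, 2 * 0.8 ** 30)   # closed-form values for comparison
```

Both variables shrink toward 0, the minimum of the function, which is what the printed `v1` and `v2` in the TensorFlow loop should show as well (up to float32 precision).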

Method 2: use a TensorFlow optimizer to do the gradient descent

```
v1, v2 = tf.Variable(1.), tf.Variable(2.)   # here we use variables
y = lambda x1, x2: x1 ** 2 + x2 ** 2        # quadratic function of two variables; in practice this is usually the loss
learning_rate = 0.1                          # learning rate
optimizer = keras.optimizers.SGD(learning_rate=learning_rate)   # initialize the optimizer
for _ in range(30):                          # number of iterations
    with tf.GradientTape() as tape:
        f = y(v1, v2)
    d1, d2 = tape.gradient(f, [v1, v2])      # d1 is the partial derivative w.r.t. v1, d2 w.r.t. v2
    # This is the part that differs: before, we decreased the variables by hand;
    # now the SGD optimizer does it for us. We only pass (partial derivative, variable) pairs in this format.
    optimizer.apply_gradients(
        [
            (d1, v1),
            (d2, v2),
        ]
    )
    # This format is usually produced with zip(), e.g.:
    # model = keras.models.Sequential([......])
    # .......
    # grads = tape.gradient(f, model.trainable_variables)
    # optimizer.apply_gradients(
    #     zip(grads, model.trainable_variables)
    # )
print(v1)
print(v2)
# Summary of the procedure:
# 1. Take the partial derivatives with respect to v1 and v2 (d1, d2 = tape.gradient(f, [v1, v2])). This step is unchanged.
# 2. Pass the partial derivatives and the variables to optimizer.apply_gradients(); the SGD optimizer applies the decrease for us.
# 3. Still wrap the first two steps in a loop (with a set number of iterations) and repeat: 1-2-1-2-1-2...
# Note: other optimizers such as Adam use more complex update formulas, and writing those by hand is troublesome.
# In that case it is better to use a ready-made optimizer such as keras.optimizers.Adam. The general steps are:
# 1. Instantiate an optimizer object
# 2. Call its apply_gradients([(partial derivative, variable)]) method
```