Notes on TensorFlow 2 Deep Learning (I): TensorFlow Foundations


This series of notes records the process of learning TensorFlow 2, mainly based on


First of all, it should be made clear that TensorFlow is a scientific computing library for deep learning algorithms. Internally, data is stored in Tensor objects, and all operations (ops) are likewise based on tensor objects.

Data types

TensorFlow has three fundamental data types: numeric, string and Boolean.

Numeric type: this is further divided into the cases below. (In TensorFlow, for convenience, scalars, vectors and matrices are generally all referred to as tensors without distinction; we need to judge for ourselves from the number of dimensions and the shape of the tensor.)

1. Scalar

The number of dimensions (also called the rank) is 0 and the shape is [].

a = tf.constant(1.2)  # create a scalar

2. Vector

The number of dimensions is 1, the length is arbitrary, and the shape is [n].

x = tf.constant([1,2.,3.3]) 

3. Matrix

The number of dimensions is 2, the length of each dimension is arbitrary, and the shape is [n, m] (n rows and m columns).

b = tf.constant([[1,2],[3,4]]) 

4. Tensor

Arrays with more than two dimensions (dim > 2) are collectively called tensors.

c = tf.constant([[[1,2],[3,4]],[[5,6],[7,8]]])  # define a three-dimensional tensor
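
The four cases above differ only in rank and shape, which can be checked directly. The following is a minimal sketch, assuming TensorFlow 2 with eager execution; the variable names are illustrative and not from the original notes.

```python
import tensorflow as tf

scalar = tf.constant(1.2)                    # rank 0, shape []
vector = tf.constant([1, 2., 3.3])           # rank 1, shape [3]
matrix = tf.constant([[1, 2], [3, 4]])       # rank 2, shape [2, 2]
tensor3 = tf.constant([[[1, 2], [3, 4]],
                       [[5, 6], [7, 8]]])    # rank 3, shape [2, 2, 2]

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("tensor3", tensor3)]:
    # shape.rank is the number of dimensions; as_list() gives each length
    print(name, t.shape.rank, t.shape.as_list())
```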

String type:

a = tf.constant('Hello, Deep Learning.')

The tf.strings module provides common utility functions for the string type, such as join(), length() and split() for joining, measuring and splitting strings, for example tf.strings.lower(a).
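
As a rough illustration of these utilities, the sketch below applies them to the same string. It assumes TensorFlow 2 with eager execution; the choice of separator is only for demonstration.

```python
import tensorflow as tf

a = tf.constant('Hello, Deep Learning.')

lower = tf.strings.lower(a)                       # convert to lowercase
length = tf.strings.length(a)                     # length in bytes
words = tf.strings.split(a, sep=' ')              # split on single spaces
joined = tf.strings.join([a, a], separator=' ')   # join two strings

print(lower.numpy(), length.numpy(), words.numpy(), joined.numpy())
```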


Boolean type:

a = tf.constant(True)

Numerical precision

The commonly used precision types are tf.int16, tf.int32, tf.int64, tf.float16, tf.float32 and tf.float64, where tf.float64 is the same as tf.double.

For most deep learning algorithms, tf.int32 and tf.float32 are generally used and meet the precision requirements. For algorithms with higher precision requirements, such as reinforcement learning, tf.int64 and tf.float64 can be used to store tensors.

By accessing the dtype attribute of a tensor, we can check the precision it is stored with: a = tf.constant(np.pi, dtype=tf.float16), print(a.dtype)

Converting precision: a = tf.cast(a, tf.float32)

Converting between Boolean and integer types is also legal and is a common operation. By default, 0 represents False and 1 represents True. When casting a number to Boolean in TensorFlow, any non-zero value is treated as True.
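
The dtype check and the casts described above can be tried together. This is a small sketch assuming TensorFlow 2 and NumPy are installed; the names a32, flags, nums, as_int and as_bool are illustrative.

```python
import numpy as np
import tensorflow as tf

a = tf.constant(np.pi, dtype=tf.float16)
print(a.dtype)                       # storage precision of the tensor

a32 = tf.cast(a, tf.float32)         # widen to 32-bit floating point

flags = tf.constant([True, False])
as_int = tf.cast(flags, tf.int32)    # True becomes 1, False becomes 0

nums = tf.constant([-1, 0, 1, 2])
as_bool = tf.cast(nums, tf.bool)     # every non-zero value becomes True
print(as_int.numpy(), as_bool.numpy())
```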

Tensors to be optimized

To distinguish tensors that need gradient information from those that do not, TensorFlow adds a special data type that supports recording gradient information: tf.Variable. The tf.Variable type adds attributes such as name and trainable to the ordinary tensor type to support building the computation graph. Because gradient computation consumes a lot of computing resources and the related parameters are updated automatically, tensors that do not need to be optimized, such as the input X of a neural network, should not be wrapped in tf.Variable. Conversely, tensors that do need gradients and optimization, such as W and b of a neural network layer, must be wrapped in tf.Variable so that TensorFlow can track the relevant gradient information. The tf.Variable() function converts an ordinary tensor into a tensor to be optimized.

a = tf.constant([-1, 0, 1, 2])
aa = tf.Variable(a)
aa.name, aa.trainable

('Variable:0', True)


When a Variable object is created, the optimization flag is enabled by default. You can set trainable=False to mark the tensor as not needing optimization.

In addition to creating a Variable from an ordinary tensor, you can also create one directly: a = tf.Variable([[1,2],[3,4]])

A tensor to be optimized can be regarded as a special kind of ordinary tensor. An ordinary tensor can also be temporarily added to the list of tensors whose gradient information is tracked, through GradientTape.watch().
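
To make the difference concrete, here is a minimal sketch, assuming TensorFlow 2 with eager execution. The function y = w * x^2 and the names w, x and frozen are chosen only for illustration.

```python
import tensorflow as tf

w = tf.Variable(3.0)     # tracked automatically because it is a trainable Variable
x = tf.constant(2.0)     # ordinary tensor, not tracked by default

with tf.GradientTape() as tape:
    tape.watch(x)        # temporarily track the ordinary tensor as well
    y = w * x * x        # y = w * x^2

dy_dw, dy_dx = tape.gradient(y, [w, x])
print(dy_dw.numpy(), dy_dx.numpy())   # dy/dw = x^2, dy/dx = 2 * w * x

frozen = tf.Variable(1.0, trainable=False)   # Variable excluded from optimization
print(frozen.trainable)
```

Without the call to tape.watch(x), the gradient with respect to x would come back as None, since only trainable variables are watched automatically.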

Creating tensors

1. Create from NumPy arrays and Python lists

With tf.convert_to_tensor you can create a new tensor and import the data stored in a Python list or a NumPy array into it: tf.convert_to_tensor([1,2.]) | tf.convert_to_tensor(np.array([[1,2.],[3,4]]))

Note that NumPy stores floating-point arrays with 64-bit precision by default. When such an array is converted to a tensor, the precision is tf.float64, which can be converted to tf.float32 when needed.

Both tf.constant() and tf.convert_to_tensor() can automatically convert a NumPy array or a Python list into a tensor.
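
The precision difference between the two sources can be observed directly. A short sketch, assuming TensorFlow 2 and NumPy; the variable names are illustrative.

```python
import numpy as np
import tensorflow as tf

from_list = tf.convert_to_tensor([1, 2.])                        # Python list
from_array = tf.convert_to_tensor(np.array([[1, 2.], [3, 4]]))   # NumPy array

# The list becomes float32, while the NumPy array keeps its float64 precision
print(from_list.dtype, from_array.dtype)

as_float32 = tf.cast(from_array, tf.float32)   # convert when float32 is wanted
```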

2. Create all-0 and all-1 tensors

Consider the linear transformation y = Wx + b. If the weight matrix W is initialized to an all-1 matrix and the bias b to an all-0 vector, the bias term vanishes and the output of the linear layer is simply y = Wx (which equals x itself in the scalar case), a comparatively good initial state for the layer. With tf.zeros() and tf.ones() we can create all-0 or all-1 tensors of arbitrary shape. For example, create an all-0 matrix and an all-1 matrix:
tf.zeros([2,2])   |    tf.ones([3,2])

With tf.zeros_like and tf.ones_like it is convenient to create an all-0 or all-1 tensor with the same shape as a given tensor: tf.zeros_like(a)

The tf.*_like functions are convenience functions; the same result can be obtained with tf.zeros(a.shape) and so on.

3. Create a custom value tensor

tf.fill([2,2], 99) 

4. Create tensors from known distributions

The normal distribution (or Gaussian distribution) and the uniform distribution are two of the most common distributions, and creating tensors sampled from them is very useful. For example, in convolutional neural networks, initializing the convolution kernel tensor W from a normal distribution helps the training of the network; in generative adversarial networks, the hidden variable z is generally sampled from a uniform distribution.

With tf.random.normal(shape, mean=0.0, stddev=1.0) you can create a tensor of shape shape sampled from a normal distribution with mean mean and standard deviation stddev.

With tf.random.uniform(shape, minval=0, maxval=None, dtype=tf.float32) you can create a tensor sampled from a uniform distribution over the interval [minval, maxval).
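
A brief sketch of both samplers follows, assuming TensorFlow 2. The shapes, the seed and the names kernel, z and ints are arbitrary choices for illustration.

```python
import tensorflow as tf

tf.random.set_seed(0)   # fix the global seed so repeated runs are comparable

# 3x3 tensor sampled from a normal distribution with mean 0 and stddev 1
kernel = tf.random.normal([3, 3], mean=0.0, stddev=1.0)

# 4x8 tensor sampled uniformly from [-1, 1)
z = tf.random.uniform([4, 8], minval=-1.0, maxval=1.0)

# Integer sampling requires maxval to be given explicitly
ints = tf.random.uniform([5], minval=0, maxval=10, dtype=tf.int32)

print(kernel.shape, z.shape, ints.numpy())
```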

5. Create sequences

tf.range(start, limit, delta=1) creates a sequence over [start, limit) with step delta, excluding limit itself: tf.range(1,10,delta=2)

Typical applications of tensors

1. Scalar

In TensorFlow, the scalar is the easiest to understand. It is a simple number with 0 dimensions and shape []. One typical use of scalars is representing error values and various metrics, such as accuracy (acc), precision and recall.

out = tf.random.uniform([4,10])  # randomly simulate the network output


y = tf.constant([2,3,2,0])  # randomly construct the true labels of the samples

y = tf.one_hot(y, depth=10)  # one-hot encoding

loss = tf.keras.losses.mse(y, out)  # compute the MSE of each sample

loss = tf.reduce_mean(loss)  # average MSE


2. Vector

Vectors are a very common data carrier. For example, in fully connected layers and convolutional neural network layers, the bias tensor b is represented by a vector. As shown in the figure, a bias value is added to each output node of a fully connected layer, and the biases of all output nodes are expressed together in vector form: b = [b1, b2]^T.


# z = Wx, simulate the input z of the activation function

z = tf.random.normal([4,2])

b = tf.zeros([2])  # simulate the bias vector

z = z + b  # add the bias (here z with shape [4,2] and b with shape [2] can be added directly; why this works is explained in the broadcasting section)


For a network layer created through the high-level interface class Dense(), the tensors W and b are stored inside the class and are created and managed by it automatically. The bias variable can be viewed through the bias member of the fully connected layer.

fc = tf.keras.layers.Dense(3)  # create a Wx + b layer with 3 output nodes (the original book writes this as layers.Dense(3), which presupposes that layers has been imported)

fc.build(input_shape=(2,4))  # create the W and b tensors through the build function; the number of input nodes is 4

fc.bias  # view the bias

3. Matrix

Matrices are also a very common type of tensor. For example, the batch input of a fully connected layer has the shape X = [b, d_in], where b represents the number of input samples, i.e. the batch size, and d_in represents the length of the input features. For example, if the feature length is 4, an input containing two samples can be expressed as a matrix: x = tf.random.normal([2,4])

You can view the weight matrix W through the kernel member of the fully connected layer:

fc = tf.keras.layers.Dense(3)  # define a fully connected layer with 3 output nodes (originally written as layers.Dense(3))
fc.build(input_shape=(2,4))  # define the number of input nodes of the fully connected layer as 4
fc.kernel  # view the weight matrix W
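
The shapes of the stored weight matrix and bias vector can be verified as follows. This is a sketch assuming TensorFlow 2 with its bundled Keras; the batch of two samples with four features mirrors the example above.

```python
import tensorflow as tf

fc = tf.keras.layers.Dense(3)      # fully connected layer with 3 output nodes
fc.build(input_shape=(2, 4))       # 4 input features, so W is 4x3 and b has length 3

print(fc.kernel.shape, fc.bias.shape)

x = tf.random.normal([2, 4])       # batch of 2 samples, 4 features each
out = fc(x)                        # computes x @ W + b
print(out.shape)
```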


4. Three-dimensional tensor

A typical application of three-dimensional tensors is representing sequence signals, in the format X = [b, sequence len, feature len], where b represents the number of sequence signals, sequence len represents the number of sampling points of the sequence signal in the time dimension, and feature len represents the feature length of each point.

To make it easier for neural networks to process strings, words are generally encoded as fixed-length vectors through an embedding layer. For example, if the word "a" is encoded as a vector of length 3, then two sentences of equal length (5 words each) can be expressed as a 3-D tensor of shape [2,5,3], where 2 represents the number of sentences, 5 represents the number of words, and 3 represents the length of the word vector.

5. Four-dimensional tensor

Only 3- and 4-dimensional tensors are discussed here. Tensors with more than 4 dimensions are seldom used; for example, 5-dimensional tensor representations appear in meta learning, and they can be understood in a similar way to 3- and 4-dimensional tensors.

4-dimensional tensors are widely used in convolutional neural networks to store feature map data. The format is generally defined as [b, h, w, c], where b represents the number of inputs, h and w represent the height and width of the feature map respectively, and c represents the number of channels of the feature map. Some deep learning frameworks, such as PyTorch, instead use the format [b, c, h, w]. Picture data is one kind of feature map: for a color picture with the three RGB channels, each picture contains h rows and w columns of pixels, and each pixel needs three values to represent the color intensity of the RGB channels, so a picture can be represented as [h, w, 3].

Dimension transformation

The basic dimension transformations include changing the view (reshape), inserting a new dimension (expand_dims), deleting a dimension (squeeze), exchanging dimensions (transpose), copying data (tile), etc.

1. Reshape

tf.reshape(x, [2, -1]), where the argument -1 indicates that the length of that axis is derived automatically so that the total number of elements in the view stays unchanged, which makes it more convenient to write.
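
A small sketch of how -1 is resolved, assuming TensorFlow 2. The 96-element input and the target shapes are arbitrary examples.

```python
import tensorflow as tf

x = tf.range(96)                  # 96 elements in a single dimension

a = tf.reshape(x, [2, -1])        # -1 is inferred as 96 / 2 = 48
b = tf.reshape(x, [2, 4, 12])     # explicit view with 2 * 4 * 12 = 96 elements
c = tf.reshape(x, [2, -1, 3])     # -1 is inferred as 96 / (2 * 3) = 16

print(a.shape, b.shape, c.shape)
```

Only one axis may be given as -1, and the remaining lengths must divide the total number of elements exactly.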

2. Add / delete dimensions

Increase dimension

The data of a 28×28 grayscale picture is stored as a tensor of shape [28,28]. Adding a new dimension at the end of the tensor, defined as the channel dimension, changes the shape to [28,28,1]:

x = tf.random.uniform([28,28],maxval=10,dtype=tf.int32) 

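x = tf.expand_dims(x, axis=2)  # append the channel dimension at the end; the shape becomes [28,28,1]

x = tf.expand_dims(x, axis=0)  # insert a new dimension at the front; the shape becomes [1,28,28,1]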

Delete dimension

x = tf.squeeze(x, axis=0) 

If you do not specify the axis argument, as in tf.squeeze(x), all dimensions of length 1 are deleted by default.
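
The effect of the axis argument on the shape can be followed step by step. This sketch assumes TensorFlow 2; the intermediate names with_channel, with_batch, no_batch and plain are illustrative.

```python
import tensorflow as tf

x = tf.random.uniform([28, 28], maxval=10, dtype=tf.int32)

with_channel = tf.expand_dims(x, axis=2)           # [28, 28] -> [28, 28, 1]
with_batch = tf.expand_dims(with_channel, axis=0)  # [28, 28, 1] -> [1, 28, 28, 1]

no_batch = tf.squeeze(with_batch, axis=0)          # remove only the first dimension
plain = tf.squeeze(with_batch)                     # remove every dimension of length 1

print(with_channel.shape, with_batch.shape, no_batch.shape, plain.shape)
```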

3. Exchange dimensions

x = tf.random.normal([2,32,32,3])

tf.transpose(x, perm=[0,3,1,2])  # exchange dimensions: [b, h, w, c] becomes [b, c, h, w]

shape=(2, 3, 32, 32)
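
The shape above is what results from moving the channel axis forward with tf.transpose. A minimal sketch, assuming TensorFlow 2; the permutation and the names nchw and back are chosen for illustration.

```python
import tensorflow as tf

x = tf.random.normal([2, 32, 32, 3])        # [b, h, w, c]

# perm lists, for each new axis, which old axis it is taken from
nchw = tf.transpose(x, perm=[0, 3, 1, 2])   # [b, c, h, w]
back = tf.transpose(nchw, perm=[0, 2, 3, 1])  # restore [b, h, w, c]

print(nchw.shape, back.shape)
```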


4. Data replication

With tf.tile(b, multiples=[2,1]) the data is copied once along the axis=0 dimension (doubling its length) and is not copied along the axis=1 dimension.
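
The meaning of multiples is easiest to see on a tiny tensor. A sketch assuming TensorFlow 2; the values in b are arbitrary.

```python
import tensorflow as tf

b = tf.constant([[1, 2]])                    # shape [1, 2]

tiled = tf.tile(b, multiples=[2, 1])         # repeat rows twice: shape [2, 2]
tiled_both = tf.tile(b, multiples=[2, 2])    # repeat rows and columns: shape [2, 4]

print(tiled.numpy())
print(tiled_both.numpy())
```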


Broadcasting

Broadcasting, also called the broadcasting mechanism (automatic expansion may be a more fitting name), is a lightweight way of replicating tensors. It logically expands the shape of the tensor data, but the actual storage copy is performed only when needed. In most scenarios, broadcasting can avoid actually copying the data and complete the logical operation through optimized means, which saves a great deal of computation compared with the tf.tile function.

A = tf.random.normal([32,1])
tf.broadcast_to(A, [2,32,32,3])
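
Broadcasting also happens implicitly in ordinary arithmetic, which is why a [4,2] tensor and a [2] vector can be added. A small sketch, assuming TensorFlow 2; the values are arbitrary.

```python
import tensorflow as tf

z = tf.ones([4, 2])
b = tf.constant([10., 20.])            # shape [2]

# Shapes are aligned from the right: [2] is treated as [1, 2], then expanded to [4, 2]
out = z + b

explicit = tf.broadcast_to(b, [4, 2])  # the same expansion written out explicitly
print(out.numpy())
print(explicit.shape)
```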


Mathematical operations

1. Addition, subtraction, multiplication and division

Addition, subtraction, multiplication and division are the most basic mathematical operations, implemented by the tf.add, tf.subtract, tf.multiply and tf.divide functions respectively. TensorFlow has overloaded the +, -, * and / operators, and it is generally recommended to use the operators directly. Floor division and remainder are also common operations, implemented by the // and % operators respectively.
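
The overloaded operators can be compared side by side. A sketch assuming TensorFlow 2; the integer inputs are arbitrary, and note that true division of integer tensors yields a floating-point result.

```python
import tensorflow as tf

a = tf.constant([7, 8, 9])
b = tf.constant([2, 2, 2])

total = a + b        # same as tf.add(a, b)
diff = a - b         # same as tf.subtract(a, b)
prod = a * b         # same as tf.multiply(a, b)
true_div = a / b     # true division, result is floating point
floor_div = a // b   # floor division
remainder = a % b    # remainder

print(true_div.numpy(), floor_div.numpy(), remainder.numpy())
```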

2. Power

With tf.pow(x, a) we can easily compute the power operation y = x^a.

For the common square and square root operations, tf.square(x) and tf.sqrt(x) can be used.

3. Exponential and logarithm

With tf.pow(a, x) or the ** operator, the exponential operation a^x can be easily computed. In particular, the natural exponential e^x can be computed with tf.exp(x).

The natural logarithm log_e x can be computed with tf.math.log(x). To compute a logarithm with another base, use the change-of-base formula for logarithms.
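
The change-of-base formula is log_a x = ln x / ln a. The sketch below applies it alongside the power and exponential functions, assuming TensorFlow 2; log_base is a hypothetical helper written for this example, not a TensorFlow function.

```python
import tensorflow as tf

x = tf.constant([1., 2., 3.])

squared = tf.pow(x, 2)     # x^2, equivalent to x ** 2
exp2 = 2 ** x              # 2^x via the overloaded ** operator
e_x = tf.exp(x)            # natural exponential e^x


def log_base(value, base):
    """Hypothetical helper: log_base(value) = ln(value) / ln(base)."""
    return tf.math.log(value) / tf.math.log(base)


log10 = log_base(tf.constant([1., 10., 100.]), 10.)
print(squared.numpy(), exp2.numpy(), log10.numpy())
```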

4. Matrix multiplication

Matrix multiplication can be conveniently performed with the @ operator, and can also be done with tf.matmul(a, b).

Matrix multiplication in TensorFlow can be done in batches, that is, tensors a and b may have more than 2 dimensions. When a and b have more than 2 dimensions, TensorFlow performs matrix multiplication on the last two dimensions of a and b, and all preceding dimensions are treated as batch dimensions.

The matrix multiplication function supports the automatic broadcasting mechanism:

a = tf.random.normal([4,28,32])
b = tf.random.normal([32,16])
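
With these two shapes, b is shared across the batch dimension of a. The following sketch, assuming TensorFlow 2, compares the implicit broadcast with an explicit one; the names c and d are illustrative.

```python
import tensorflow as tf

a = tf.random.normal([4, 28, 32])
b = tf.random.normal([32, 16])

c = a @ b   # b is broadcast over the batch dimension of a

# The same product with the broadcast written out explicitly
d = tf.matmul(a, tf.broadcast_to(b, [4, 32, 16]))

print(c.shape, d.shape)
```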