# Notes on Andrew Ng’s deep learning course: neural networks, supervised learning and deep learning

Time: 2020-02-12

This article on neural networks, supervised learning, and deep learning is part of the author’s note series on Andrew Ng’s Deep Learning Specialization, covering both notes and code implementations from the course.

# Neural network, supervised learning and deep learning

Deep learning is gradually changing the world, from traditional Internet businesses such as web search and advertising recommendation to industries such as health care and autonomous driving. Just as the electrical revolution a century ago created new pillar industries, AI is now the power source of a new era, driving rapid technological progress. The first part of this course focuses on how to build a neural network, including a deep neural network, and how to train it with data; by the end of this part, we will build a deep neural network for recognizing animals. The second part covers deep learning in practice, including hyperparameter tuning, regularization, and how to choose among optimization algorithms such as momentum, RMSprop, and Adam. The third part explains how to structure a machine learning project: how to preprocess data, feed it into model training, and split out training and cross-validation sets. The fourth part focuses on convolutional neural networks (CNNs) and how to build classic CNN models. In the last part, we will build sequence models (seq2seq and others) to solve natural language processing tasks; typical sequence models include RNNs and LSTMs.

# Neural network

Deep learning is often associated with training large-scale neural networks, and in this chapter we will see what a neural network is. Take the classic house-price prediction problem as an example: suppose we have data for six houses, each record containing the house’s size and price, and we want a suitable function that predicts the price from the size. If we use linear regression to solve this problem, we can fit a function of the form `y = kx + b`, shown as the black line below:

We know that a house price cannot be negative, so we can instead use the ReLU (rectified linear unit) function to describe the relationship between size and price, shown as the blue line above. We can abstract the problem as taking the house size x as input and outputting the price y; a neuron is simply a function that accepts the input and, after a suitable operation, outputs the target value:
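As a minimal sketch of such a single ReLU neuron, assuming a hypothetical weight `w` and bias `b` (not fitted to any real housing data):

```python
import numpy as np

def relu(z):
    """Rectified linear unit: max(0, z), so outputs are never negative."""
    return np.maximum(0.0, z)

def neuron(x, w, b):
    """A single neuron: a weighted input passed through ReLU."""
    return relu(w * x + b)

# Illustrative weight and bias; the course fits these from data.
sizes = np.array([50.0, 80.0, 120.0])   # house sizes
prices = neuron(sizes, w=2.0, b=-40.0)  # predicted prices
```

For a very small house the weighted input goes negative, and ReLU clamps the predicted price to zero instead of returning a negative value.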

The figure above shows the simplest single-neuron network; a complex neural network is composed of many such neurons connected and stacked in layers. For example, the actual house price is affected by the size, the number of bedrooms, the zip code, and the affluence of the neighborhood. An ideal neural network automatically builds the hidden units, that is, the relationships between the input features, to make the best prediction:

Given the input, one of the tasks of a neural network is to build the hidden layers for us automatically; each hidden unit takes all the features of the input layer as its inputs.
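The densely connected structure described above can be sketched as a tiny forward pass; the layer sizes and random weights here are illustrative assumptions, not values from the course:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, W1, b1, W2, b2):
    """Forward pass of a tiny fully connected network:
    every hidden unit sees every input feature."""
    h = relu(W1 @ x + b1)   # hidden layer of 3 units
    return W2 @ h + b2      # output layer: predicted price

# Hypothetical weights: 4 input features -> 3 hidden units -> 1 output.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4)); b1 = np.zeros(3)
W2 = rng.normal(size=(1, 3)); b2 = np.zeros(1)

x = np.array([120.0, 3.0, 94110.0, 0.8])  # size, bedrooms, zip, affluence
y = forward(x, W1, b1, W2, b2)
```

Note that no hidden unit is wired to a hand-picked subset of features; training is what decides which combinations of inputs each hidden unit ends up representing.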

# Supervised learning

There are many kinds of neural networks, but so far most of the economically valuable ones rely on what machine learning calls supervised learning. In supervised learning, the correspondence between features and outputs is known in the training data set, and the goal is to find a correct representation of the input-output relationship. For example, in online advertising, currently one of the most profitable deep learning applications, the input is information about the page being shown and about the user, and the neural network predicts whether the user will click on the ad; by showing each user the ads they are most likely to be interested in, the actual click-through rate increases. The following table lists several common application domains with their inputs and outputs:

Computer vision has also developed rapidly in recent years; one typical application is photo tagging, where we input a picture and retrieve the most similar images. Speech recognition translates a user’s audio input into text; machine translation converts sentences between languages, for example turning an English paragraph into its Chinese counterpart. In autonomous driving, we may input a camera image together with radar readings to determine the relative positions of other cars on the road. Different applications call for different types of neural networks: for the house-price prediction above we can use a standard neural network, while for image applications we prefer a convolutional neural network (CNN).

For sequential data, such as an audio stream unfolding over time, which can be represented as a one-dimensional time series, we usually use an RNN. In text processing we often represent text as a character sequence, again typically handled by RNNs. For more complex applications such as autonomous driving, where images and text must be processed together, we use a hybrid network architecture.

Another pair of common concepts in model training is structured versus unstructured data. Structured data resembles what is stored in a relational database: in the house-price example above, we would have a table with columns such as size and number of bedrooms. Each feature in structured data, for example room size, number of bedrooms, or a user’s age, has an interpretable meaning. By contrast, typical unstructured data such as audio, text, or images uses pixel values or individual words as the components of the feature vector, and those values rarely have a directly interpretable meaning. Humans have evolved to interpret unstructured data well, and machines are now steadily improving at it with deep learning techniques.

# Deep learning

The theoretical foundations and technical concepts behind deep learning have existed for decades. In this part, we discuss why deep learning took off only in recent years. The following figure describes the relationship between data set size and algorithm performance (accuracy, precision, etc.):

For classical machine learning algorithms such as support vector machines and logistic regression, performance keeps improving at first as the amount of data grows from zero, but it soon hits a ceiling beyond which more data barely helps. With the advent of the mobile-Internet era, we can obtain massive data from websites, mobile applications, and sensors on other electronic devices; this data not only opened the big-data era but also provided a solid foundation for the development of deep learning. As the figure above shows, the larger the neural network, the faster its performance improves with growing data volume, and the higher its performance ceiling.
Another important cornerstone of the rise of deep learning is the improvement in computing power, which refers not only to new generations of CPUs and GPUs but also to innovations in fundamental optimization algorithms that let us train neural networks faster. For example, early networks used the sigmoid function as the activation; as |x| grows, its gradient approaches zero, which makes model convergence slow. ReLU largely avoids this problem: its gradient stays constant at 1 for all positive inputs. Simply switching from sigmoid to ReLU can greatly improve training efficiency, which in turn makes it practical to build more complex networks.
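This vanishing-gradient claim is easy to check numerically; the sketch below uses the standard textbook derivatives of the two activations (it is not code from the course):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)); vanishes as |z| grows
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    # ReLU gradient is a constant 1 for any positive input
    return 1.0 if z > 0 else 0.0

for z in (0.0, 5.0, 10.0):
    print(z, sigmoid_grad(z), relu_grad(z))
```

At z = 0 the sigmoid gradient peaks at 0.25, but by z = 10 it has shrunk below 1e-4, while the ReLU gradient stays at 1; this is why gradient descent updates shrink with saturated sigmoid units but not with active ReLU units.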