Getting started with deep learning? Here are five things you should know


By Purva Huilgol
Compiled by Flin
Source | Analytics Vidhya

Starting your deep learning career?

For novices, deep learning is a complex and daunting field. Concepts such as hidden layers, convolutional neural networks and backpropagation keep coming up as you try to get to grips with deep learning topics.

It’s not easy, especially if you are on an unstructured learning path and have not understood the basic concepts first. You will stumble around like a tourist in a foreign city without a map!

The good news is that you don’t need an advanced degree or doctorate to study and master deep learning. However, before entering the deep learning world, you should understand (and master) some key concepts.

In this article, I will introduce five such basic concepts. I also suggest that you enrich your deep learning experience through the following resources:

The five essentials for starting your deep learning journey are:

  1. Setting up your system

  2. Python Programming

  3. Linear algebra and calculus

  4. Probability and statistics

  5. Key machine learning concepts

Let’s introduce them one by one.

1. Setting up your system

To learn a new skill (such as cooking), you first need to have all the equipment. You will need tools such as knives, cookware and, of course, a gas stove! You also need to know how to use these tools.

Likewise, for deep learning it is important to set up your system and to understand the tools you need and how to use them.

Whether you are using Windows, Linux or macOS, you must understand the basic commands. This is a convenient table for your reference:

This is a great tutorial to start using Git and basic Git commands:

The deep learning boom has not only brought breakthrough research in the field of AI, but has also pushed the limits of computer hardware.

GPU (graphics processing unit):

For most deep learning projects, you will need a GPU to process image and video data. You can also build a deep learning model on a laptop / PC without a GPU, but training will be very time-consuming. The main advantages a GPU offers are:

  1. It allows parallel processing
  2. In a CPU + GPU combination, the CPU assigns the complex tasks to the GPU and handles the other tasks itself, saving a lot of time

This is a wonderful video that explains the difference between GPU and CPU:

You do not need to purchase a GPU or install one on your computer. A variety of cloud computing resources offer GPUs free of charge or at very low cost. In addition, some come with practice datasets and their own tutorials preloaded. Examples are Paperspace Gradient, Google Colab and Kaggle Kernels.

On the other hand, there are full-fledged servers that require some installation steps and some customization, such as Amazon Web Services EC2.

The following table describes the options you have:

Deep learning also led Google to develop its own type of processing unit, built specifically for neural networks and deep learning tasks: the TPU.

A TPU, or tensor processing unit, is essentially a coprocessor used alongside a CPU. For deep learning workloads, TPUs can be cheaper than GPUs and much faster, which makes building deep learning models more affordable.

Google Colab also provides a free TPU (a cloud version, not the full enterprise version). This is Google’s own Colab tutorial on using TPUs and building models on them: Colab notebooks | Cloud TPU.

To sum up, this is the basic minimum hardware requirement to start building a deep learning model:

2. Python Programming

Let’s continue with the cooking analogy. You now know how to handle knives and the gas stove. But what about the skills and recipes needed to actually cook food?

This is where the software needed for deep learning comes in. Python is the programming language used across the industry for deep learning.

However, Python alone is not enough for the calculations and operations that deep learning requires. The additional functionality is provided by libraries. A library can contain hundreds of small tools, called functions, that we can use in our programs.

Although you don’t need to be a coding ninja, you do need to understand the basics of Python programming.

That said, instead of trying to master the vast ocean of Python programming, it is better to learn a few specific libraries dedicated to machine learning and data processing.

Anaconda is a distribution that helps you keep track of Python versions and libraries. It is a convenient all-in-one tool that is very popular, easy to use, and clearly documented. Here is how to install Anaconda.

So what do I mean by Python basics? Let’s discuss it in more detail.

Note: you can start learning Python in our free course

1. Variables and data types in Python

The main data types in Python are:

  • int: whole numbers
  • float: decimal numbers
  • str (string): a single character or a sequence of characters
  • bool: holds one of two Boolean values, True and False

2. Operators in Python

There are five main operator types in Python:

  • Arithmetic operators: +, -, *, / etc
  • Comparison operators: for example, <=, >=, ==
  • Logical operators: and, or, not
  • Identity operators: is, is not
  • Membership operators: in, not in
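
As a quick illustration of the data types and operator families listed above, here is a minimal sketch. The variable names and values are hypothetical and chosen only for demonstration.

```python
# Hypothetical values, chosen only for illustration.
marks = 72            # int
average = 68.5        # float
name = "Asha"         # str
passed = marks >= 40  # bool, produced by a comparison operator

# Arithmetic operators
total = marks + 10    # 82
ratio = marks / 8     # 9.0 (true division always returns a float)

# Logical operators combine Boolean values
above_average_pass = passed and marks > average   # True

# Identity operators compare object identity, not equality
nothing = None
is_missing = nothing is None                      # True

# Membership operators test whether a value is inside a container
has_letter = "a" in name                          # True

print(type(marks).__name__, type(average).__name__,
      type(name).__name__, type(passed).__name__)  # int float str bool
```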

3. Data structures in Python

Python provides a variety of data structures that can be used for different purposes. Each data structure has its own properties, which determine what kind of data it is suited to store. These properties are:

  • Ordered: the elements are stored in a specific order. This order stays the same no matter how and when we use the data structure (unless we explicitly change it)

  • Immutable: the data structure cannot be changed once it is created. A mutable data structure, by contrast, can be changed

In data science, the most commonly used data structures are:

  • Lists: ordered and mutable

Example: we have a list like this:

my_list = [1, 3, 7, 9]

This order stays the same everywhere the list is used. We can also change the list, for example by deleting 7 or adding 11.

  • Tuple: similar to a list (ordered), but unlike a list, tuples are immutable

Example: tuples can be declared as:

my_tuple = ("apple", "banana", "cherry")

Here too the order stays the same, but unlike with the list, we cannot delete “cherry” from the tuple or add “orange” to it.

  • Sets: unordered and mutable, although they can only hold unique values

Example: a set is declared with curly braces:

my_set = {'apple', 'banana', 'cherry'}

No order is defined for a set.

  • Dictionaries: collections of key-value pairs. Dictionaries are mutable, and their values are accessed through keys rather than positions. Since Python 3.7 a dictionary preserves the order in which its keys were inserted; in older versions it was unordered. A dictionary can only have unique keys, although different keys may hold the same value.

Example: a dictionary also uses curly braces, with entries in key: value format:

my_dict = {"brand": "Ford", "model": "Mustang", "year": 1964}

Here, “brand”, “model” and “year” are keys with the values “Ford”, “Mustang” and 1964, respectively. In Python 3.7 and later the keys are printed in insertion order; in older versions the order could differ.
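
To make the difference between mutable and immutable structures concrete, here is a small sketch that reuses the four examples above. The extra "color" key and the flag `tuple_is_immutable` are hypothetical additions for illustration.

```python
my_list = [1, 3, 7, 9]
my_list.remove(7)      # lists are mutable: delete a value...
my_list.append(11)     # ...or add one
print(my_list)         # [1, 3, 9, 11]

my_tuple = ("apple", "banana", "cherry")
tuple_is_immutable = False
try:
    my_tuple[2] = "orange"   # tuples are immutable, so this raises TypeError
except TypeError as err:
    tuple_is_immutable = True
    print("Cannot modify tuple:", err)

my_set = {"apple", "banana", "cherry"}
my_set.add("apple")    # duplicates are ignored: a set keeps unique values only
print(len(my_set))     # 3

my_dict = {"brand": "Ford", "model": "Mustang", "year": 1964}
my_dict["year"] = 1965      # values are looked up and changed by key
my_dict["color"] = "red"    # hypothetical extra key, added for illustration
print(my_dict["brand"], my_dict["year"])   # Ford 1965
```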

4. Control flow in Python

Control flow means controlling the order in which code is executed. Code runs line by line, and the outcome of one line can determine which line runs next:

Conditional statements

We set conditions using the comparison operators we saw earlier.

  • if-else: what would you like to eat today, a hamburger or a salad? If you want the healthier choice, you pick the salad; otherwise, if you just want fast food and don’t care about calories, you pick the hamburger. This is what an if-else conditional statement does

Example: you need to check whether a student passes or fails. If the student’s marks are >= 40, the student has passed; otherwise, the student has failed.

In this case, our conditional statement will be:

if marks >= 40:
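
The condition above is only the first line of an if-else statement. A complete, runnable sketch might look like this; the score of 55 and the variable name `result` are assumptions made for illustration.

```python
marks = 55  # hypothetical score, chosen for illustration

if marks >= 40:
    result = "Pass"
else:
    result = "Fail"

print(result)  # Pass
```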

  • for loop: used to iterate over a sequence. The sequence can be a string (a sequence of characters) or any of the data structures above, such as a list, set, tuple or dictionary

Example: we have a list of values from 1 to 5. We need to multiply each value in this list by 3:

numbers_list = [1, 2, 3, 4, 5]

for each_number in numbers_list:
    print(each_number * 3)

Try the code snippet above and you’ll see how simple Python is!

Interestingly, unlike in some other programming languages, the elements of a data structure do not all have to be of the same type. We can have a list like ["John", 153, 78.5, "A+"] or even a nested list like [["A", 56], ["B", 36.5]]. It is this diversity and flexibility that makes Python so popular among data scientists!

You can also take advantage of the following free courses on the basics of Python and Pandas:

5. Pandas in Python

This is one of the first libraries you will encounter when starting machine learning and deep learning. Pandas is a very popular library, and it is essential for both.

We store data in a variety of formats, such as CSV (comma-separated values) files, Excel worksheets, etc. To process the data in these files, pandas provides a data structure called a DataFrame (you can think of it as a table).

The DataFrame, together with the large number of operations pandas provides on it, makes pandas a staple library for machine learning and deep learning.
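
As a rough sketch of what working with a DataFrame looks like, the example below reads a tiny CSV and runs two common operations. The CSV content, names and marks are hypothetical, and an in-memory string stands in for a file on disk; it assumes pandas is installed.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """name,marks
Asha,72
Ben,38
Chen,91
"""

# read_csv accepts a file path or a file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)              # (3, 2): three rows, two columns
print(df["marks"].mean())    # 67.0

# Boolean filtering: keep only the rows where marks >= 40
passed = df[df["marks"] >= 40]
print(passed["name"].tolist())   # ['Asha', 'Chen']
```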

If you are new to pandas, you can take this free and easy course:

Now, if you look back at the list of five things we started with, you may have a question: where does the mathematics of deep learning come in?

Well, let’s find out!

3. Linear algebra and calculus for deep learning

There is a common misconception that deep learning requires advanced knowledge of linear algebra and calculus. Well, let me eliminate this misunderstanding here.

You just need to remember your high school math to start the deep learning journey!

Let’s take a simple example. We have images of cats and dogs, and we want the machine to tell us which animal appears in any given image:

Now we can easily identify cats and dogs here. But how will the machine distinguish between the two? The only way is to give the data to the model in numerical form, and this is where we need linear algebra. We basically convert the images of cats and dogs into numbers. These numbers can be expressed as vectors or matrices.

We will introduce some key terms and some important resources from which you can learn.

Linear algebra for deep learning

1. Scalars and vectors: a scalar has only magnitude, whereas a vector has both direction and magnitude.

  • Dot product: the dot product of two vectors returns a scalar value
  • Cross product: the cross product of two vectors returns another vector orthogonal (right angle) to the two vectors

Example: if we have 2 vectors a = [1, -3, 5] and b = [4, -2, -1], then:

a) Dot product:

a . b = (a1 * b1) + (a2 * b2) + (a3 * b3) = (1 * 4) + (-3 * -2) + (5 * -1) = 5

b) Cross product:

a X b = [c1, c2, c3] = [13, 21, 10]

where

c1 = (a2 * b3) - (a3 * b2)
c2 = (a3 * b1) - (a1 * b3)
c3 = (a1 * b2) - (a2 * b1)
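
The two products can be checked with a few lines of plain Python. This is a minimal sketch using the vectors a = [1, -3, 5] and b = [4, -2, -1] from the example; the helper names `dot` and `cross` are hypothetical. Evaluating the formulas gives 5 for the dot product and [13, 21, 10] for the cross product.

```python
def dot(a, b):
    """Dot product: sum of element-wise products (a scalar)."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product of two 3-dimensional vectors (a vector)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return [a2 * b3 - a3 * b2,
            a3 * b1 - a1 * b3,
            a1 * b2 - a2 * b1]

a = [1, -3, 5]
b = [4, -2, -1]

print(dot(a, b))      # 5
c = cross(a, b)
print(c)              # [13, 21, 10]

# The cross product is orthogonal to both inputs, so both dot products are 0.
print(dot(a, c), dot(b, c))   # 0 0
```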

2. Matrices and matrix operations: a matrix is an array of numbers arranged in rows and columns. For example, the image of the cat above can be written as a matrix of pixels:

Just like numbers, two matrices can be added and subtracted. However, operations such as multiplication and division work a little differently from ordinary arithmetic:

  • Scalar multiplication: when we multiply a matrix by a single scalar value, we multiply every element of the matrix by that scalar

  • Matrix multiplication: multiplying 2 matrices means computing the dot products of the rows of the first with the columns of the second. The result is a new matrix with as many rows as the first matrix and as many columns as the second, so its size can differ from both inputs

  • Transpose of a matrix: we swap the rows and columns of a matrix to get its transpose

  • Inverse of a matrix: conceptually, this is similar to the reciprocal of a number. Multiplying a matrix by its inverse gives the identity matrix
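
The four operations above can be sketched in plain Python with nested lists. The matrices A and B and all helper names are hypothetical, and the inverse is only implemented for the 2 x 2 case; in practice a library such as NumPy would normally be used.

```python
def scalar_multiply(k, m):
    """Multiply every element of matrix m by the scalar k."""
    return [[k * x for x in row] for row in m]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Each entry is the dot product of a row of a with a column of b."""
    cols_b = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix via its determinant."""
    (p, q), (r, s) = m
    det = p * s - q * r
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[s / det, -q / det], [-r / det, p / det]]

A = [[1, 2], [3, 4]]          # hypothetical 2 x 2 matrix
B = [[1, 2, 3], [4, 5, 6]]    # hypothetical 2 x 3 matrix

print(scalar_multiply(2, A))   # [[2, 4], [6, 8]]
print(matmul(A, B))            # [[9, 12, 15], [19, 26, 33]] (2 x 3 result)
print(transpose(B))            # [[1, 4], [2, 5], [3, 6]]
A_inv = inverse_2x2(A)
print(matmul(A, A_inv))        # [[1.0, 0.0], [0.0, 1.0]] (the identity matrix)
```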

You can refer to this excellent Khan Academy course on linear algebra to understand the above concepts in detail. You can also look at 10 powerful applications of linear algebra here.

Calculus for deep learning

The value we try to predict, say y, is whether the image shows a cat or a dog. This value can be expressed as a function of the input variables / input vector. Our main goal is to make the predicted value as close as possible to the actual value.

Now, imagine processing images of thousands of cats and dogs. They look really cute, but you can imagine that dealing with that many images and numbers is not easy!

Because deep learning involves large amounts of data and complex models, working with it usually consumes a lot of time and resources. This is why it is important to optimize our deep learning model so that it predicts as accurately as possible without using too much time and too many resources.

This is the key role of calculus in deep learning: optimization.

In any deep learning or machine learning model, we can express the output as a mathematical function of the input variables. We therefore need to see how the output changes with each input variable. We need derivatives to do this, because a derivative represents a rate of change.

Derivatives and partial derivatives: simply put, the derivative measures how much the output value changes when we change the input value. In mathematical terms:

If y = f(x), then the derivative of y with respect to x is given as
dy/dx = change in y / change in x

Geometrically, if we plot f(x) as a graph, the derivative at a point is the slope of the tangent to the graph at that point.
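
The "change in y over change in x" idea can be tried out numerically. The sketch below estimates a derivative with a small step h (a central difference); the function f(x) = x^2 and the step size are assumptions chosen for illustration.

```python
def derivative(f, x, h=1e-6):
    """Central-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

def f(x):
    return x ** 2      # hypothetical function; its exact derivative is 2x

slope = derivative(f, 3.0)
print(round(slope, 4))   # 6.0, the slope of the tangent at x = 3
```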

Here is a diagram to help you understand it:

The derivative we saw above involves only one variable, x. In deep learning, however, the final output y may depend on hundreds of variables. In this case, we need to calculate the rate of change of y with respect to each input variable. This is where partial derivatives come in.

Partial derivative: basically, we pick one variable and hold all the other variables constant. We then calculate the derivative of y with respect to the variable we picked. Repeating this for every variable gives us the derivative of y with respect to each of them.
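
Here is a minimal numerical sketch of that idea: nudge one variable at a time while the others stay fixed. The two-variable function and the evaluation point are hypothetical.

```python
def partial_derivative(f, point, index, h=1e-6):
    """Estimate df/dx_index at `point`, holding every other variable fixed."""
    up = list(point)
    down = list(point)
    up[index] += h
    down[index] -= h
    return (f(up) - f(down)) / (2 * h)

def f(v):
    x1, x2 = v
    return x1 ** 2 + 3 * x1 * x2     # hypothetical function of two variables

point = [2.0, 1.0]
# Analytically: df/dx1 = 2*x1 + 3*x2 = 7 and df/dx2 = 3*x1 = 6 at this point
gradient = [partial_derivative(f, point, i) for i in range(len(point))]
print([round(g, 4) for g in gradient])    # [7.0, 6.0]
```

The list of all partial derivatives is called the gradient, which is the quantity gradient descent uses.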

Chain rule: in general, y may be a much more complex function of the input variables. So how do we calculate the derivative? The chain rule helps us here:

If y = f(g(x)), where g(x) is a function of x, and f is a function of g(x), then
dy/dx = df/dg * dg/dx

Let’s consider a relatively simple example:

y = sin(x^2)

Here g(x) = x^2 and f(g) = sin(g), so the chain rule gives:

dy/dx = d(sin(x^2))/d(x^2) * d(x^2)/dx = cos(x^2) * 2x
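
One way to convince yourself of this result is to compare it with a numerical estimate. The sketch below does that for y = sin(x^2); the evaluation point x0 = 1.5 and the function names are assumptions.

```python
import math

def y(x):
    return math.sin(x ** 2)

def dy_dx_chain_rule(x):
    # outer derivative cos(g) evaluated at g = x^2, times inner derivative 2x
    return math.cos(x ** 2) * 2 * x

def dy_dx_numeric(x, h=1e-6):
    # central-difference estimate, used only as an independent check
    return (y(x + h) - y(x - h)) / (2 * h)

x0 = 1.5   # hypothetical evaluation point
analytic = dy_dx_chain_rule(x0)
numeric = dy_dx_numeric(x0)
print(abs(analytic - numeric) < 1e-6)   # True
```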

Learning resources for Deep Learning Calculus:

4. Probability and statistics for deep learning

Like linear algebra, “statistics and probability” is a mathematical world of its own. For beginners, it can be very daunting. Even experienced data scientists sometimes find it challenging to recall advanced statistical concepts.

However, it is undeniable that statistics is the backbone of machine learning and deep learning. Concepts from probability and statistics (such as descriptive statistics and hypothesis testing) are very important in industry, where the interpretability of a deep learning model is a top priority.

Let’s start with the basic definition:

  • Statistics is the study of data

  • Descriptive statistics is the study of mathematical tools for describing and representing data

  • Probability measures the likelihood of an event

Descriptive statistics

Let me give a simple example. Suppose you are given the scores of 1,000 students in an entrance examination (scored out of 100). Someone asks you: how did the students do in this exam? Would you read out every student’s score to that person? You could, but you would more likely start by saying that the average score is 68. This is the mean of the data.

Similarly, we can make further simple statements based on the data:

With just a few lines, we can say that most students did well, but not many scored very high in the test. This is descriptive statistics. We used only five values to represent the data of 1,000 students.
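
As a sketch of how such a summary can be computed, the example below uses Python's built-in `statistics` module. The ten scores are hypothetical stand-ins for the 1,000 students, and the choice of these five summary values is an assumption.

```python
import statistics

# Hypothetical scores for ten students (the example above has 1,000).
scores = [35, 48, 52, 60, 68, 68, 71, 75, 83, 90]

# Five values that summarize the whole dataset
summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "min": min(scores),
    "max": max(scores),
}
print(summary)

# Spread of the data around the mean (population standard deviation)
spread = statistics.pstdev(scores)
print(round(spread, 2))
```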

Other key terms are also used in descriptive statistics, such as:

  • Standard deviation
  • Variance
  • Normal distribution
  • Central limit theorem


Based on the same example, suppose you are asked: if I randomly select one of these 1,000 students, what is the chance that he / she passed the exam? The concept of probability helps you answer this question. A probability of 0.6 means that there is a 60% chance that he / she passed (assuming a pass mark of 40).
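
With the data in hand, this probability can be estimated as the fraction of students at or above the pass mark. The scores below are hypothetical, so the result differs from the 0.6 used in the text.

```python
scores = [35, 48, 52, 60, 68, 68, 71, 75, 83, 90]   # hypothetical scores
pass_mark = 40

# Number of favourable outcomes divided by the total number of outcomes
passed = sum(1 for s in scores if s >= pass_mark)
p_pass = passed / len(scores)
print(p_pass)   # 0.9
```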

Hypothesis testing and inferential statistics can be used to answer other questions about the same data (as follows):

  • Can the entrance examination be considered difficult?
  • Are the students’ high scores the result of hard work, or were the questions in the exam simply easy?

You can learn all about statistics and probability from the following resources:

5. Key machine learning concepts for deep learning

Here’s the good news: you don’t need to know the full range of machine learning algorithms that exist today. Not that they don’t matter, but from the perspective of getting started with deep learning, you don’t need to know many of them.

However, there are some concepts that are essential for building your foundation and that you should be familiar with. Let’s review them.

Supervised and unsupervised algorithms

  • Supervised learning: in these algorithms, we know the target variable (what we want to predict) and we know the input variables (the independent features that contribute to the target variable). We then fit an equation that describes the relationship between the input variables and the target variable, and apply it to the data we have. Examples: KNN, SVM, linear regression, etc.

  • Unsupervised learning: in unsupervised learning, we do not know the target variable. It is mainly used to cluster data into groups, which we can then try to interpret. Examples of unsupervised learning include k-means clustering, the Apriori algorithm, etc.

Evaluation metrics

Building a predictive model is not the only step required in deep learning. You need to check how good the model is and keep improving it until you reach the best model.

So, how do we judge the performance of a deep learning model? We use evaluation metrics. Depending on the task, there are different evaluation metrics for regression and classification.

  • Classification evaluation metrics:

    • Confusion matrix

    • Accuracy

    • Precision and recall

    • F1 score

    • AUC-ROC

    • Log loss

  • Regression evaluation metrics:

    • RMSE

    • RMSLE

    • R2 and adjusted R2
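
Several of these metrics are simple to compute by hand. The sketch below derives accuracy, precision, recall and the F1 score from the four counts of a confusion matrix, and then computes RMSE for a small regression example. All labels and values are hypothetical.

```python
import math

# Hypothetical binary labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# The four cells of the confusion matrix
pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(tp, tn, fp, fn)                      # 3 3 1 1
print(accuracy, precision, recall, f1)     # 0.75 0.75 0.75 0.75

# Regression: root mean squared error on hypothetical values
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
print(round(rmse, 3))                      # 1.291
```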

Evaluation metrics are very important in deep learning. Whether in research or in industry, your deep learning model will be judged by the values of its evaluation metrics.

Validation techniques

A deep learning model trains itself on the data provided to it. However, as mentioned above, we need to improve this model and check its performance. The true power of the model can only be observed when we give it new data (cleaned, of course).

But how can we improve the model? Do we give it new data every time we want to change a parameter? You can imagine how time-consuming and expensive such a task would be!

That’s why we use validation. We divide the whole dataset into three parts: training, validation and test. Here is a simple sentence to help you remember:

We train the model on the training set, improve it on the validation set, and finally make predictions on the test set, which the model has not seen until then.

Some common cross-validation strategies are k-fold cross-validation and leave-one-out cross-validation (LOOCV).
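
The splitting logic itself is short. Below is a minimal sketch of a three-way split and of k-fold cross-validation on a toy dataset; the 60 / 20 / 20 proportions and the helper `k_fold_splits` are assumptions, and real data would normally be shuffled first.

```python
data = list(range(10))   # hypothetical dataset; the numbers stand in for rows

# A 60 / 20 / 20 split into training, validation and test sets.
train, valid, test = data[:6], data[6:8], data[8:]
print(train, valid, test)   # [0, 1, 2, 3, 4, 5] [6, 7] [8, 9]

def k_fold_splits(samples, k):
    """Yield (train, validation) pairs; each fold is the validation set exactly once."""
    fold_size = len(samples) // k
    for i in range(k):
        start = i * fold_size
        # the last fold absorbs any leftover samples
        end = start + fold_size if i < k - 1 else len(samples)
        yield samples[:start] + samples[end:], samples[start:end]

# 4-fold cross-validation on the non-test data
folds = list(k_fold_splits(data[:8], k=4))
for train_part, valid_part in folds:
    print(train_part, valid_part)
```

Setting k equal to the number of samples turns this into leave-one-out cross-validation, since every validation fold then holds a single sample.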

This is a comprehensive article on validation techniques and how to implement validation techniques in Python: using cross validation to improve model performance (in Python / R)

Gradient descent

Let’s go back to the calculus we saw earlier and the need for optimization. How do we know we have reached the best model? We make small changes to the equation, and after each change we check whether the prediction has moved closer to the actual value.

Taking small steps in the most promising direction is the basic intuition behind gradient descent. Gradient descent is one of the most important concepts you will encounter, and often revisit, in deep learning.
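
To see the idea in action, here is a minimal sketch of gradient descent fitting a straight line y = w*x + b by minimizing the mean squared error. The data, the learning rate of 0.05 and the 2,000 steps are assumptions chosen so that this toy problem converges.

```python
# Hypothetical data generated from y = 2x + 1, so the best fit is w = 2, b = 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0          # initial guess
learning_rate = 0.05     # assumed step size: too large diverges, too small is slow
n = len(xs)

for step in range(2000):
    # Gradients of the mean squared error with respect to w and b
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
    grad_b = 2 / n * sum(errors)
    # Take a small step against the gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))   # 2.0 1.0
```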

Explanation and implementation of gradient descent in Python: introduction to gradient descent algorithm (and variants) in machine learning.

Linear models

What are the simplest equations you can think of? Let me list some:

  1. Y = x + 1
  2. 4x + 3y -2z = 56
  3. Y = x /(1-x)

Have you noticed what these three equations have in common? The first two are linear functions, and the third can be made linear with a suitable transformation of its variables. What if we could use such functions to predict the value of Y?

Models built this way are called linear models. You would be surprised to learn how popular linear models are in industry. They are not too complex, they are interpretable, and with gradient descent done properly we can also get high evaluation scores! Moreover, linear models form the basis of deep learning. For example, did you know that you can build a logistic regression model with a simple neural network?

Here is a detailed guide covering not only linear and logistic regression, but also other linear models: seven regression types and techniques in data science.

Overfitting and underfitting

You will often encounter situations where your deep learning model performs well on the training set but gives poor accuracy on the validation set. This happens because the model has learned every pattern in the training set, including noise, so it fails to generalize to the validation set. This is called overfitting the data, and it is typical of a model that is too complex.

On the other hand, if your deep learning model performs poorly on both the training set and the validation set, it is probably underfitting. Think of it as applying a linear equation (an overly simple model) to data that is actually nonlinear (complex):

A simple analogy for overfitting and underfitting is a student in a mathematics class:

  • Overfitting corresponds to the student who learned all the questions discussed in class by rote, but could not answer different questions on the same concepts in the exam

  • Underfitting corresponds to the student who does well neither in class nor in the exam. What we are aiming for is the model / student who does not need to memorize all the questions discussed in class, but does well enough in the exam to show that he / she understands the concepts
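
The two failure modes can be reproduced on a toy problem. In this sketch, a model that always predicts the training mean stands in for an overly simple model, and a model that memorizes the training labels stands in for an overly complex one. The data, the noise pattern and both models are hypothetical.

```python
# Hypothetical data: the true relationship is y = x^2; training labels carry noise.
train_x = [0, 1, 2, 3, 4, 5, 6, 7]
noise = [3, -3, 3, -3, 3, -3, 3, -3]
train_y = [x ** 2 + e for x, e in zip(train_x, noise)]
valid_x = [0.4, 1.4, 2.4, 3.4, 4.4, 5.4, 6.4]
valid_y = [x ** 2 for x in valid_x]

def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

mean_y = sum(train_y) / len(train_y)

def underfit_model(x):
    """Too simple: always predicts the training mean, ignoring x."""
    return mean_y

def overfit_model(x):
    """Memorizes the training set: returns the label of the nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

for name, model in [("underfit", underfit_model), ("overfit", overfit_model)]:
    # prints: model name, training error, validation error
    print(name, mse(model, train_x, train_y), mse(model, valid_x, valid_y))
```

The memorizing model has zero error on the training set but a clearly higher error on the validation set, while the mean model has a large error on both.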

Look at this intuitive explanation of over fitting and under fitting, and the comparison between them: over fitting and under fitting in machine learning.

Bias and variance

In the simplest terms, bias is the difference between the actual value and the predicted value. Variance measures how much the output changes when we change the training data.

Let’s quickly summarize what the figure above shows:

  1. Upper left: a very accurate model, so its error will be very low, which means that both bias and variance are low. All data points hit the bull’s-eye

  2. Upper right: the predicted data points are centered on the bull’s-eye (low bias), but they are far away from each other (high variance)

  3. Lower left: the predicted values are clustered together (low variance), but far from the bull’s-eye (high bias)

  4. Lower right: the predicted data points are neither close to the bull’s-eye (high bias) nor close to each other (high variance)

High bias and high variance both increase the error. Generally, high bias indicates underfitting, while high variance indicates overfitting. It is very difficult to achieve both low bias and low variance; one usually comes at the cost of the other.

In terms of model complexity, we can use the following figure to determine the optimal complexity of the model:


Like pandas, there is another library that forms the foundation of machine learning. scikit-learn (imported as sklearn) is the most popular machine learning library. It contains a large number of machine learning algorithms that you can apply to your data in the form of functions.

In addition, sklearn has functions for the evaluation metrics, for cross-validation and for scaling / standardizing data.

Here is a practical sklearn example:

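from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# assumption: the diabetes dataset and an 80/20 split stand in for your own data
X, y = datasets.load_diabetes(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)

regr = linear_model.LinearRegression()

# train your data - remember how we train the model on our train set?
regr.fit(X_train, y_train)

# predict on our validation set to improve it
y_pred = regr.predict(X_valid)

# evaluation metrics: MSE
print('Mean Squared Error:', mean_squared_error(y_valid, y_pred))
# ... further improvement of our model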

We can build a simple linear regression model with less than 10 lines of code!

Here are some good resources to learn more about sklearn:


In this article, we covered five basic things to understand before building your first deep learning model. From here, you will encounter popular deep learning frameworks such as PyTorch and TensorFlow. They are used from Python, and because you now have a good grasp of Python, you can easily learn how to use them.

Here are some good articles on these frameworks:

Once you have built your foundation on these five pillars, you can explore more advanced concepts, such as hyperparameter tuning, backpropagation, etc. These are the concepts on which I built my own deep learning knowledge.
