[Collection] A 6,000-word popular science guide to artificial intelligence that high school students can understand - Jinkey original


0 Introduction

This article is a set of reading notes on Basic Artificial Intelligence (High School Edition). The book has very good illustrations that turn difficult concepts into pictures, so these notes reuse pictures from the book (the copyright of the pictures belongs to Shangtang Science and Technology, i.e. SenseTime).

Some of the concepts in the book are still rather obscure, and a complete beginner cannot grasp the subtle differences between some concepts and methods. So I explain them from my own understanding and leave out some difficult details to present them in a form that is easier to understand.

1 Overview of Artificial Intelligence

1.1 Brief History

1.2 Application Areas


Security

  1. Real-time detection of pedestrians and vehicles in video.
  2. Automatically spotting abnormal behavior in video (for example, drunk pedestrians or vehicles driving the wrong way) and promptly raising alarms with specific location information.
  3. Automatically estimating crowd density and direction of flow, detecting the potential dangers of overcrowding early, and helping staff guide and manage the flow of people.

Medical care

  1. Automatic analysis of medical images. These techniques can automatically find the key parts of medical images and compare and analyze them.
  2. Reconstructing three-dimensional models of internal organs from multiple medical images, to help doctors plan operations and make sure they go well.
  3. Providing health advice and disease risk warnings for each of us, so that we can live healthier lives.

Intelligent customer service

Intelligent customer service can talk with customers like a person. It can understand a customer's question, analyze what it means (for example, whether the customer is asking about the price or about the product's functions), and respond accurately, appropriately and individually.


Autonomous driving

Today's cars can perceive the driving environment in real time through a variety of sensors, including video cameras, lidar and satellite positioning systems (the BeiDou Navigation Satellite System, BDS, and the Global Positioning System, GPS). An intelligent driving system can analyze these perception signals together, plan the driving route in real time by combining maps with traffic indications (such as traffic lights and road signs), and issue commands that control the car.

Industrial manufacturing

Helping factories automatically detect defects of different shapes.

1.3 Concept

What is AI?
Artificial intelligence is a technology that simulates human cognitive abilities with machines.

AI has three training methods: supervised learning, unsupervised learning and reinforcement learning. They are introduced one by one below.

2 Is this an iris? (classifier)

2.1 Feature Extraction

Human sensory features
Number and color of petals

Hand-designed features
First decide which features to use, then convert them into specific values through measurement.

Deep learning features
Not covered here; they are discussed later.

2.2 Perceptron

The teacher set a problem:

To tell two kinds of iris apart, you have to draw a straight line that separates them. You can draw countless straight lines, but which one is the best?

What should I do? I am a poor student, so I just guess.

  1. Pick three numbers, a = 0.5, b = 1.0, c = -2, and plug them into y = a * x[1] + b * x[2] + c.
  2. Substitute the two features of each flower for x[1] and x[2]. For example, plugging in (4, 1) gives y[predicted] = 1. Here y[actual] = 1 (in the sample set, Iris versicolor is labeled 1 and Iris setosa is labeled -1), so y[actual] - y[predicted] = 0.
  3. Repeat the steps above for every flower and combine all the gaps between actual and predicted values into Loss 1.

  1. But how do you know whether it is the best line? Keep guessing! Just guess, like betting on the World Cup.
  2. Follow the gradient of the loss with respect to a, b and c (a gradient is a derivative, high school students!) and keep guessing new numbers in the direction of descent. The process looks roughly like this:


The "gap between actual and predicted values" used above is in fact a kind of loss function. There are other loss functions, such as the straight-line distance between two points or cosine similarity, which can also measure the difference between predicted and actual results.

Key point: the loss function is the gap between reality and the ideal (cruel).
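
The guess-and-adjust procedure above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the four samples are made up, and the "gap" is taken to be the squared difference between actual and predicted values so that its derivative is easy to write down. The book may define the loss differently.

```python
# Hypothetical samples: (x1, x2) features with a label of +1 or -1.
samples = [((4.0, 1.0), 1), ((5.0, 2.0), 1), ((1.0, 0.5), -1), ((2.0, 0.2), -1)]


def predict(params, x):
    a, b, c = params
    return a * x[0] + b * x[1] + c


def loss(params):
    # Assumption: sum of squared gaps between actual and predicted values.
    return sum((y - predict(params, x)) ** 2 for x, y in samples)


def gradient(params):
    # Partial derivatives of the loss with respect to a, b and c.
    ga = gb = gc = 0.0
    for x, y in samples:
        gap = predict(params, x) - y
        ga += 2 * gap * x[0]
        gb += 2 * gap * x[1]
        gc += 2 * gap
    return ga, gb, gc


def train(params, learning_rate=0.01, steps=200):
    # Keep "guessing" new numbers in the direction that lowers the loss.
    for _ in range(steps):
        grads = gradient(params)
        params = tuple(p - learning_rate * g for p, g in zip(params, grads))
    return params


start = (0.5, 1.0, -2.0)   # the first guess from the text
trained = train(start)
print(loss(start), loss(trained))
```

With the starting guess from the text, the sample (4, 1) is predicted as exactly 1, and the loss after training is lower than the loss of the first guess.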

2.3 Support Vector Machine

Method | Difference
Perceptron | Picks the line that makes the total gap between predicted and actual values over all points as small as possible.
Support vector machine | Picks the line that makes the distance from the closest points to the line (the margin) as large as possible.

* The different criteria also lead to different loss functions (but it is still guessing).

Intuitively, the bigger the gap between the two classes, the better (old drivers, shut up!).

2.4 Multi-class Classification

What if there are many kinds of flowers? In a botany class, the teacher invited a peony expert, a lotus expert and a plum blossom expert.
The teacher took out a flower for each expert to identify. The peony expert gave a probability of 0.013 that it was a peony, the lotus expert gave 0.265 that it was a lotus, and the plum blossom expert gave 0.722 that it was a plum blossom. After combining the experts' opinions, the teacher told the students that it was a plum blossom.

Xiao Ming: Is the teacher silly? The teacher can't even tell what flower it is and needs three experts.
Teacher: Get out of my classroom.

The actual computation uses the binary classifiers trained with the methods of 2.2 and 2.3 to output a classification score for each class (for example, the classifiers for the three flowers output -1, 2 and 3 respectively). How can these scores be turned into probabilities? That is the job of the normalized exponential function, Softmax (for binary classification, use the Sigmoid function). The formula is not given here; the table in the book gives an intuitive feel for it.
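
For readers who do want to see the calculation, here is a minimal sketch of the standard softmax formula (each score is exponentiated and divided by the sum of all the exponentials). The function name is illustrative. Feeding in the scores -1, 2 and 3 from the text gives roughly 0.013, 0.265 and 0.721, in line with the probabilities quoted for the three experts.

```python
import math


def softmax(scores):
    # Subtracting the maximum keeps exp() from overflowing; it does not change the result.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([-1, 2, 3])
print([round(p, 3) for p in probs])
```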

2.5 Unsupervised Learning

In 2.2 we could measure the gap between predicted and actual values because the biology teacher had told the poor student which samples were Iris setosa and which were Iris versicolor. But if the teacher does not reveal the actual category of each sample, the student has no idea what the samples are.

So what should we do?

Introductory machine learning courses always use the iris data set, which gets boring. Let's change the scene:

Suppose you run a live-streaming platform and are recruiting a batch of new streamers. You have a pile of candidates, but you only have their bust and hip measurements. Eight résumés are in front of you. You don't know which candidates are more capable and will attract more fans, and you don't have time to interview them all, so how do you choose?

  1. First, plot their bust and hip measurements on a two-dimensional coordinate chart.
  2. With a random stroke of your hand, divide them into two groups (blue and yellow). This is "clustering into two classes".
  3. Compute the center of each cluster in some way (such as the average). The closer a point is to a cluster center, the more similar it is to that cluster.
  4. Compute the distance from each point to the center of the blue cluster and to the center of the yellow cluster.
  5. If a point is nearer to the yellow cluster center but you randomly marked it as blue (the small square with a red border in the picture), relabel it as yellow.
  6. Because the membership of each group has now changed, recalculate the cluster centers as in step 3.
  7. Repeat step 4 (calculating distances to the centers) -> step 5 (adjusting the yellow and blue labels) -> step 3 (calculating the centers), and continue until the membership of the blue and yellow clusters no longer changes. Then stop the loop.
  8. The candidates have now been divided into two classes.

Without any supervision, the computer has divided the candidates into two classes. Next, we can put two streamers from each class on the platform to see which class performs better, and then use the features of the better cluster to recruit more capable streamers.

Xiao Ming: So what's so great about that? I can tell at a glance that the yellow group is more capable.
Teacher: Get out of my classroom.

The algorithm used above for clustering the candidates is called the K-means algorithm. K is the number of clusters (which has to be specified by hand). In the example above, K = 2. To split the data into three classes, use K = 3; the training process works the same way, just with three centers.
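
The loop described above (compute the centers, reassign every point to its nearest center, stop when nothing changes) can be sketched as follows. The eight two-dimensional points and the deliberately bad starting labels are made up for illustration, and the handling of an empty cluster is an assumption of this sketch rather than something the text specifies.

```python
def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def k_means(points, labels, k):
    """Repeat 'compute centers -> reassign points' until the labels stop changing."""
    while True:
        centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Assumption: an empty cluster simply borrows one of the points as its center.
            centers.append(mean(members) if members else points[c])
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            return labels, centers
        labels = new_labels


# Hypothetical data: eight candidates, two measurements each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.3),
          (5.0, 5.0), (5.2, 4.8), (4.9, 5.1), (5.1, 5.3)]
initial = [0, 1, 0, 1, 0, 1, 0, 1]   # the "random stroke of your hand"
labels, centers = k_means(points, initial, k=2)
print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1]
```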


3 What is this object? (image recognition)

3.1 Feature Extraction

Human sensory features
Petal color, petal length, whether it has wings (to tell cats from birds), whether it has a mouth and eyes (to tell aircraft from birds)

Feature | Kitten | Bird | Aircraft | Car
Feature 1: has wings? | no | yes | yes | no
Feature 2: has eyes? | yes | yes | no | no

Hand-designed features
Sensory features are quantified into numerical features such as color (RGB), edges (rounded corners, right angles, triangles) and texture (waves, lines, grids).

Deep learning features
Image features are extracted by convolution.

Key point: convolution extracts the useful information from an image. It is like the way a picture you send gets compressed to a smaller size, yet you can still make out its main content.

1-dimensional convolution: 1*5+2*4+3*3=22, 1*4+2*3+3*2=16, 1*3+2*2+3*1=10

2-dimensional convolution: 1*2+3*0+2*4+4*2=18…

Through convolution, we can obtain feature information from the image, such as edges.
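
As a rough sketch, the sliding multiply-and-add behind these numbers can be written as below. The 1-D call reproduces the three results 22, 16 and 10 from the signal (5, 4, 3, 2, 1) and the kernel (1, 2, 3). The 2-D image and kernel are made up for illustration. Note the assumption that the kernel is not flipped, which is how deep learning libraries usually apply it.

```python
def conv1d(signal, kernel):
    # Slide the kernel along the signal; at each position multiply element-wise and add up.
    n = len(signal) - len(kernel) + 1
    return [
        sum(k * s for k, s in zip(kernel, signal[i:i + len(kernel)]))
        for i in range(n)
    ]


def conv2d(image, kernel):
    # Same idea in two dimensions: slide the kernel over every position where it fits.
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out


print(conv1d([5, 4, 3, 2, 1], [1, 2, 3]))   # [22, 16, 10]

# Hypothetical 3x3 image and 2x2 kernel.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
print(conv2d(image, kernel))                # [[6, 8], [12, 14]]
```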

3.2 The Difference Between Deep Learning and Traditional Pattern Classification

Since traditional pattern classification already exists, why do we need neural networks?

The difference is that traditional pattern classification requires hand-designed features, such as petal length and color. Deep learning skips the step of designing features by hand: the convolution operations extract features automatically, and the training of the classifier is folded into the same neural network, which makes the learning end-to-end.

Key point: end-to-end means getting the output directly from the input, with no middleman taking a cut.

3.3 Problems of Deep (Multi-layer) Neural Networks

Generally speaking, more layers improve a neural network's accuracy. However, making the network deeper also leads to the following problems:

Overfitting
A top student memorizes the answers to the predicted college entrance exam questions without understanding them. If the exam happens to contain exactly the predicted questions, the student answers them correctly, but not otherwise. We can say the student has "overfit" the predicted questions.

The opposite: underfitting
A poor student cannot even memorize the predicted questions. Even if the exam questions are exactly the same as the predicted ones, he answers only 30% correctly. You can say this kind of student is underfitting (and needs a spanking).

If you are interested, you can read more about the following problems.
Vanishing and exploding gradients
Here is a formula that is popular and inspirational on the Internet. In a multi-layer network the weights are multiplied together. For example, if the weight of every layer is 0.01, then after passing through 100 layers the result is 0.01 to the 100th power, which is extremely small, and learning by gradient descent becomes very slow. (It is like releasing a small ball from the rim of a bowl: it rolls more and more slowly as it nears the bottom.)
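
A tiny sketch makes the scale of this effect concrete. It only multiplies one number by itself layer after layer, which is a deliberate simplification of what happens in a real network. The second call, with a weight of 2.0, is an added illustration of the opposite problem (explosion) and is not a figure from the text.

```python
def scale_after(weight, layers):
    # Multiply the same weight once per layer, as in the simplified picture above.
    product = 1.0
    for _ in range(layers):
        product *= weight
    return product


vanishing = scale_after(0.01, 100)   # about 1e-200: the signal practically disappears
exploding = scale_after(2.0, 100)    # about 1.27e30: the signal blows up
print(vanishing, exploding)
```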

Non-convex optimization
The learning process may stop at a local minimum, because the gradient (slope) there is zero. If learning stops at a local minimum rather than the global minimum, the learned model is not accurate enough.
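
The effect can be reproduced with a small sketch. The function below is a made-up curve with two valleys, chosen only for illustration. Gradient descent started on the right side settles in the shallower valley (a local minimum near x = 1.13), while a start on the left side reaches the deeper one (the global minimum near x = -1.30).

```python
def f(x):
    # Hypothetical non-convex function with two valleys.
    return x ** 4 - 3 * x ** 2 + x


def slope(x):
    # Derivative of f.
    return 4 * x ** 3 - 6 * x + 1


def descend(x, learning_rate=0.01, steps=1000):
    # Plain gradient descent: keep stepping downhill until the slope is (almost) zero.
    for _ in range(steps):
        x -= learning_rate * slope(x)
    return x


right = descend(2.0)    # gets stuck in the local minimum
left = descend(-2.0)    # reaches the global minimum
print(right, f(right))
print(left, f(left))
```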

Look at the picture and feel it.

The bottom of your story is not the bottom. What top are you talking about?


Uniform initialization, batch normalization and shortcut connections involve quite a lot of mathematical logic, so they are not explained here.

3.4 Applications

Face recognition

In autonomous driving, the pictures taken from the top of the car are cut into small blocks, and each block is checked for whether it contains a car, a pedestrian or a dog, whether a light is red or green, and which traffic sign it shows. The distance to an object can be judged with radar and similar sensors.

4 What is this song? (speech recognition)

4.1 Feature Extraction

Human sensory features
Volume, pitch, timbre

Sound is digitized (converted from an acoustic signal into an electrical signal) through sampling, quantization and coding.

Hand-designed features
Mel frequency has high resolution at low frequencies and low resolution at high frequencies. This resembles the human ear: within a certain frequency range, people are sensitive to low-frequency sound and insensitive to high-frequency sound. The relationship is:
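
A commonly used form of this relationship is mel = 2595 * log10(1 + f / 700), with f in hertz. Whether the book uses exactly this variant is an assumption here, since several versions of the Mel scale exist. The sketch below shows that a 100 Hz step covers far more of the Mel scale at low frequencies than at high frequencies.

```python
import math


def hz_to_mel(frequency_hz):
    # One widely used Mel-scale formula; other variants exist.
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)


low_step = hz_to_mel(200) - hz_to_mel(100)      # 100 Hz step at low frequency
high_step = hz_to_mel(8100) - hz_to_mel(8000)   # 100 Hz step at high frequency
print(low_step, high_step)
```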

The average value of the spectrum in each frequency range represents the sound energy in that range. There are 26 frequency ranges, so a 26-dimensional feature is obtained. After the cepstrum operation, a 13-dimensional result is obtained: the Mel-frequency cepstral coefficients (MFCCs).

Deep learning features
Features are extracted with the 1-D convolution introduced in 3.1.

4.2 Applications

Classification of Musical Style

Input: Audio file
Features: Sound features
Output: Music type

Speech transcription

Input: Audio file
Features: Sound features
Output: Acoustic model (for example, the 26 English letters)

The output of the acoustic model is then fed into another learner.

Input: Acoustic model
Features: Semantics and vocabulary
Output: A fluent sentence (see section 6 for how to make the computer output fluent sentences)

Identifying a song by listening
Scan the music with a sliding window (dividing it into small segments), and extract the features of each segment with the method of 4.1 to get a feature vector. Apply the same operation to the songs in the database and to the clip recorded by the user, then calculate the similarity between the two feature vectors (for example with the cosine formula for the angle between two vectors, or with the formula for the distance between two points).
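
The two similarity measures mentioned here can be sketched as follows. The song names and three-dimensional feature vectors are made up for illustration; real audio features would have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: 1 means same direction, 0 means perpendicular.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    # Straight-line distance between two points.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical database of song feature vectors and a user recording.
database = {"song_a": [1.0, 2.0, 3.0], "song_b": [3.0, 0.5, 0.1]}
recording = [0.9, 2.1, 2.9]

best = max(database, key=lambda name: cosine_similarity(database[name], recording))
print(best)   # song_a
```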

5 What are the people in a video doing? (video understanding, action recognition)

5.1 Introduction

A video is essentially a sequence of still frames. Because of persistence of vision (when the human eye observes a scene, the light signal passed to the brain does not disappear immediately, which creates the impression that the picture is continuous), the frames appear continuous, and that is video.
To identify objects in a video, we can apply the image recognition and classification methods described above to single frames in real time.

But video has a more important attribute than a still image: action (behavior).

How can actions be analyzed from a continuous video?

Take the husky (Erha) above as an example: the pixels of its legs move around relative to the yellow bounding box (the box and the dog as a whole are relatively static). Here we introduce a concept for describing movement: optical flow (a pixel moving from one location to another). The optical flow formed by the moving pixels is used as the training feature (X) of the neural network and "running" as the training target (Y). After many training iterations, the machine fits a function Y = f(X) that determines whether the object in the video is running.

5.2 Optical Flow

Suppose that
1) The motion of objects in two adjacent frames is very small.
2) The color of objects in two adjacent frames remains almost unchanged.

As for how the neural network tracks a certain pixel, there is no explanation here.

The vector pointing from a point's position at time t to its position at time t + 1 is the optical flow of that point. It is a two-dimensional vector.

This is the optical flow of a whole picture:

The optical flow (trajectories) of an entire video looks like this:

Different dashed lines represent the trajectories of different points moving across the image.

Suppose the video is width pixels wide and height pixels high and has m frames in total. Then the video can be represented by a tensor of shape width * height * m * 2 (a three-dimensional array of two-dimensional flow vectors), and this tensor is fed to the neural network for classification training.

As a further optimization, the optical flow can be simplified to eight directions: add every optical flow vector of a video frame to the nearest of these eight directions to get the optical flow histogram of that frame, which gives an 8-dimensional feature vector.
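
The eight-direction histogram can be sketched as below. Two details are assumptions of this sketch, not statements from the book: each flow vector contributes its length (rather than a count of 1) to its bin, and bin 0 is centered on "pointing right", with the following bins spaced 45 degrees apart in order of increasing angle.

```python
import math


def flow_histogram(flows, bins=8):
    """Accumulate flow vectors (dx, dy) into direction bins."""
    hist = [0.0] * bins
    width = 2 * math.pi / bins
    for dx, dy in flows:
        magnitude = math.hypot(dx, dy)
        if magnitude == 0:
            continue  # a pixel that did not move has no direction
        angle = math.atan2(dy, dx) % (2 * math.pi)
        # Shift by half a bin so that bin 0 is centered on angle 0.
        index = int((angle + width / 2) // width) % bins
        hist[index] += magnitude
    return hist


# Hypothetical flow vectors of one frame.
frame_flows = [(1, 0), (2, 0), (0, 1), (-1, 0), (0, 0)]
print(flow_histogram(frame_flows))
# [3.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```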

6 What does a text express? (natural language processing)

6.1 Feature Extraction

Number | Sentence | Class
1 | It is scientifically proved that swimming is beneficial to physical development. | Sports
2 | Fu Yuanhui won the gold medal in the Olympic swimming competition. | Sports
3 | Excellent reading is a very useful knowledge management application. | Tool
4 | An article explains the application of impression notes in knowledge management. | Tool

Here are four sentences. First, segment them into words:

Number | Sentence
1 | It is scientifically proved that swimming is beneficial to physical development.
2 | Fu Yuanhui won the gold medal in the Olympic swimming competition.
3 | Excellent reading is a very useful knowledge management application.
4 | An article explains the application of impression notes in knowledge management.

Remove stop words (adverbs, prepositions, punctuation marks and so on); text processing generally uses a stop word list.

Number | Sentence
1 | scientific proof swimming beneficial physical development
2 | Fu Yuanhui won gold medal Olympic swimming competition
3 | Excellent reading useful knowledge management application
4 | article explains impression notes knowledge management application

Encode the vocabulary

Vectorize the sentences

In this way, each sentence becomes a 19-dimensional feature vector. We then feed the 19-dimensional feature vector in as X and use the text class (such as positive or negative) as the training label Y, with an ordinary convolutional network or an LSTM recurrent neural network. The model obtained through iterative training can be used for sentiment analysis or text classification tasks.
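
The two steps "encode the vocabulary" and "vectorize the sentences" can be sketched as a simple bag-of-words encoding. The English token lists below are an illustrative stand-in for the segmented Chinese sentences, so the vocabulary here has 18 entries rather than the 19 of the original. Counting occurrences (rather than marking presence with 0 or 1) is also an assumption of this sketch.

```python
def build_vocabulary(tokenized_sentences):
    # Give every distinct word an index, in order of first appearance.
    vocabulary = {}
    for tokens in tokenized_sentences:
        for token in tokens:
            if token not in vocabulary:
                vocabulary[token] = len(vocabulary)
    return vocabulary


def vectorize(tokens, vocabulary):
    # One dimension per vocabulary word; the value is how often the word occurs.
    vector = [0] * len(vocabulary)
    for token in tokens:
        vector[vocabulary[token]] += 1
    return vector


# Hypothetical tokens after segmentation and stop word removal.
sentences = [
    ["scientific", "proof", "swimming", "beneficial", "physical", "development"],
    ["Fu Yuanhui", "won", "gold medal", "Olympic", "swimming", "competition"],
    ["Excellent reading", "useful", "knowledge management", "application"],
    ["article", "explains", "impression notes", "knowledge management", "application"],
]

vocabulary = build_vocabulary(sentences)
vectors = [vectorize(tokens, vocabulary) for tokens in sentences]
print(len(vocabulary))
print(vectors[0])
```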

6.2 Advanced

Word vectorization
Some words are synonyms: "fierce" and "bull" (both slang for "awesome"), or the two Chinese words for "computer". From the steps above we might conclude that "fierce" and "bull" are two completely different words, but in fact their meanings are similar. How can an AI learn this? We need to enrich the meaning of each word along multiple dimensions, such as:

For example, on a gender dimension, male is 1, female is 0 and genderless is 0.5. Expanding to more dimensions gives the feature vector of the word "man": (1, 0, 0.5, 0, 1).

Inverse Document Frequency
The more often a word appears in one class of articles and the less often it appears in other classes, the better it represents the class of an article.
For example, "swimming" appears often in sports articles (2 times) but rarely in tool articles (0 times), so it is more representative of sports articles than the other words (which appear 1 time).

Suppose a sentence contains N words in total and a given word occurs T times in it, and suppose there are X sentences in total, W of which contain the word. Then the word's TF-IDF (term frequency times inverse document frequency) is T/N * log(X/W).
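
The formula translates directly into code. In this sketch the logarithm is assumed to be the natural logarithm, since the text does not name a base, and the example numbers are made up: a word that occurs once in a 6-word sentence and appears in 2 of 4 sentences.

```python
import math


def tf_idf(t, n, x, w):
    # t: occurrences of the word in the sentence, n: words in the sentence,
    # x: total number of sentences, w: sentences that contain the word.
    return (t / n) * math.log(x / w)


score = tf_idf(t=1, n=6, x=4, w=2)
print(round(score, 4))              # 0.1155
print(tf_idf(t=1, n=6, x=4, w=4))   # 0.0: a word found in every sentence carries no information
```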

6.3 Applications

7 Let the computer draw pictures (generative adversarial networks)

Once upon a time, there was a man who made money by copying the paintings of famous artists and selling them. He began to copy a famous painting:

His first attempt looked like this:

The connoisseur could see at a glance that it was a fake. So he went back and painted a second version, then a third.

After 100,000 rounds of this "paint, then appraise" process, the copyist finally produced a painting that the connoisseur took for the original and bought at a high price.

This generate (paint) and discriminate (spot the fake) pattern is the core of a GAN.

The generator arranges random pixels in an orderly way to form a meaningful picture. The discriminator then measures the difference between the generated picture and the real picture and points out the direction in which the generator should improve. After many rounds of training, the generator learns to draw pictures that pass for real.

How does a computer turn random pixels into meaningful pictures? Let's look at a simplified example.

Some evenly distributed points on a straight line, after being transformed by y=2x+1, follow a different distribution. In the same way, a picture of randomly arranged pixels, after being transformed by some function f(x), becomes a meaningful picture. The generator keeps approximating f(x), just as the perceptron in 2.2 fits a straight line.

Key point: a function can transform a data distribution (as Cook says: what is straight can be bent).

8 How does AlphaGo play Go? (reinforcement learning)

8.1 Rough Understanding

Supervised/unsupervised learning: get every single task as right as possible
Reinforcement learning: carry out multiple tasks to achieve a final goal

If every task is done accurately, won't the final goal be achieved automatically? Let's take an example:

The boss of a wholesale store asked her manager, Bill, to increase sales. Bill instructed his salesman Charles to sell more radios. Charles won a large, profitable order, but then the company could not deliver the radios because they were in short supply. Who should be blamed? From the boss's point of view, Charles's behavior disgraced the company. But from Bill's point of view, Charles successfully completed his sales task, and Bill also increased sales. (The Society of Mind, Chapter 7.7)

8.2 AlphaGo

The most primitive way to play Go is to traverse the decision tree, scanning the board from the upper left corner to the lower right corner, where every empty position is a branch. We can then predict the probability of winning for each line of play and find the move most likely to win. This is the move predictor.

However, because the 19x19 Go board is huge, the search space is as large as 10 to the power of 360, and exhausting all possible moves is practically impossible, like looking for a needle in a haystack.

To reduce the complexity, the key is to reduce the breadth and depth of search.

When we grow a small potted plant, if we don't prune the branches and leaves, nutrients are wasted on branches that will never grow well. Withered or misshapen branches need to be pruned in time so that nutrients reach the normal branches (or the branches growing in the direction we want).

In the same way, if limited computing power is wasted on exhausting all Go moves, deducing the game becomes very slow, and finding the best plan takes a lot of time.

Can we speed up the choice of good moves by pruning the move predictor's huge decision tree? How do we judge which "branches" are good and which are bad? This requires a board value assessor (which board positions have a higher probability of winning). It discards the low-value positions first and stops traversing below them, which reduces the breadth and depth of the search.

Among them,
The move predictor has a formal name: the policy network.
The value assessor has a formal name: the value network.
The policy network uses Monte Carlo tree search to play the game out from the current position (with random moves) to the end. A final win returns a positive value and a loss returns a negative one. The algorithm then traces back step by step along the moves played in that game, raising the score of the moves chosen by the winner along the path and lowering the score of the loser's moves, so that winning moves are more likely to be chosen when the same position comes up again. This speeds up move selection, and the network that plays out these games is called the fast rollout network.

With the policy network, the value network and Monte Carlo tree search combined, two copies of the program play against each other, so that the networks are trained continuously and learn how to play.

8.3 Definition

Next, let's go through the boring definitions.

What is reinforcement learning?

Reinforcement learning is often used when what matters is not whether a single judgment is accurate, but whether the whole course of action brings the greatest benefit, for example in board games, stock trading or business decision-making.

The goal of reinforcement learning is to obtain a policy that guides action.
For example, in Go the policy guides each move according to the position on the board; in stock trading the policy tells us when to buy and when to sell.

A reinforcement learning model generally consists of the following parts:

A set of changing states (state)

For Go, it is the distribution of black and white stones on the board.
For stock trading, it is the price of each stock.

A set of available actions

For Go, it is the positions where a stone can be placed.
For stock trading, it is the number of shares bought or sold at each point in time.

An environment that interacts with the decision-making agent
The environment determines how the state changes after each action.

A player's move (agent) affects the game (environment), and the environment rewards (win) or punishes (loss) the player.
A trader's buying or selling (agent) affects the stock price (environment, since supply and demand determine price), and the environment rewards (profit) or punishes (loss) the trader.

Reward rule
When the decision-making agent changes the state through an action, it is rewarded or punished (a negative reward).
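
The four parts (states, actions, environment, reward rule) can be put together in a toy sketch. Everything in it is made up for illustration: the "corridor" environment, its five positions, and the two fixed policies. It shows only how the pieces interact and does not include any learning.

```python
class Corridor:
    """Hypothetical environment: positions 0..4; reaching 4 wins, reaching 0 loses."""

    def __init__(self, start=2):
        self.state = start   # the current state

    def step(self, action):
        # The environment decides how the state changes after each action
        # and applies the reward rule.
        self.state += action
        if self.state >= 4:
            return self.state, 1, True     # reward
        if self.state <= 0:
            return self.state, -1, True    # punishment (negative reward)
        return self.state, 0, False


ACTIONS = (-1, +1)   # the set of available actions: step left or step right


def run_episode(policy, max_steps=20):
    # The policy maps a state to an action; the total reward measures how good it is.
    env = Corridor()
    state, total = env.state, 0
    for _ in range(max_steps):
        state, reward, done = env.step(policy(state))
        total += reward
        if done:
            break
    return total


def always_right(state):
    return +1


def always_left(state):
    return -1


print(run_episode(always_right))   # 1
print(run_episode(always_left))    # -1
```

A reinforcement learning algorithm would search for the policy with the highest total reward instead of having it written by hand.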

If you have time, I suggest reading the book Basic Artificial Intelligence (High School Edition) yourself.

Link https://jinkey.ai/post/tech/5…
The author of this article is Jinkey (WeChat official account jinkey-love, official website https://jinkey.ai).
Reproduction is permitted as long as the attribution is not altered. Deleting or modifying the copyright information in this paragraph when reproducing the article will be regarded as infringement of intellectual property rights, and we reserve the right to pursue legal liability. This is hereby declared.
