Abstract: This article uses Python code to complete the whole task of classifying iris flowers without calling any machine learning package. It is suitable for beginners who want to read, understand, and experience the general process of a machine learning method.
Have you tried the plant recognition apps launched by various companies, such as Microsoft flower recognition and flower companion? When you see a flower whose scientific name you don’t know, you just open the app, take a photo of the plant you want to identify, and upload it. The app automatically identifies the species of the flower and introduces it in detail. The principle behind it is simple: it is a process of image classification. The uploaded image is matched against a data set stored on the phone or on the network and assigned to the corresponding category. With the application of deep learning, the accuracy of image classification keeps rising, and on some data sets it has surpassed human performance.
Compared with traditional neural network methods, deep learning methods generally place higher demands on the size of the data set and on the hardware platform. If you simply want to understand the basic process of an image classification task, it is better to start with a small data set and a traditional neural network. This article walks the reader through a classification task on the Iris data set. The Iris data set has a long history in machine learning, much longer than that of the now commonly used handwritten digit data set (MNIST), and it comes from the famous British statistician and biologist Ronald Fisher. In this article, a neural network model for the iris data is built from scratch, without relying on relevant software libraries, and trained to obtain good results.
The Iris data set is one of the most commonly used data sets for testing machine learning algorithms. It contains four features (sepal length, sepal width, petal length, and petal width) measured for three species of iris (versicolor, virginica, and setosa), with 50 instances (rows of data) per species. Let’s look at the distribution of the sample data.
We will use a neural network to build a classification model on this data set. For simplicity, only petal length and petal width are used as features, and only two species are kept: versicolor and virginica. Let’s step through training the neural network on this sample data set in Python:
Step 1: prepare the iris data set
Import the Iris data set into Python and subset the data to keep only the rows and columns of interest:
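A minimal sketch of this step, using only plain Python. It assumes the data is available as text in the UCI "iris.data" layout (four measurements followed by the species name on each row). The six rows below are only a short excerpt in that layout, and the names load_iris_subset, SAMPLE_ROWS, and LABELS, as well as the 0/1 label encoding, are choices made for this illustration:

```python
# Assumption: each row follows the UCI "iris.data" layout:
# sepal_length,sepal_width,petal_length,petal_width,species
SAMPLE_ROWS = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""

# Illustrative label encoding (an assumption): versicolor -> 0, virginica -> 1.
LABELS = {"Iris-versicolor": 0, "Iris-virginica": 1}


def load_iris_subset(csv_text):
    """Return (X, Y): [petal length, petal width] pairs and 0/1 labels."""
    X, Y = [], []
    for line in csv_text.splitlines():
        fields = line.strip().split(",")
        if len(fields) != 5 or fields[4] not in LABELS:
            continue  # skip blank lines and rows of other species (setosa)
        X.append([float(fields[2]), float(fields[3])])  # petal length, petal width
        Y.append(LABELS[fields[4]])
    return X, Y


X, Y = load_iris_subset(SAMPLE_ROWS)
print(X)
print(Y)
```

Passing the text of the full data file instead of the excerpt should yield the 100 versicolor and virginica rows described above.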
The blue dots represent the versicolor species and the red dots represent the virginica species. The neural network built in this article will be trained on these data in order to classify the species correctly.
Step 2: initialize parameters (weights and bias)
Let’s build a neural network with a single hidden layer and set the hidden layer size to 6:
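One possible way to write the initialization, as a sketch rather than the original listing. The parameter layout (a dictionary with W1, b1, W2, b2 stored as nested lists), the fixed seed, and the 0.01 scaling factor are assumptions made here; only the hidden layer size of 6 comes from the text:

```python
import random


def initialize_parameters(n_x, n_h, n_y, seed=2):
    """Create weights and biases for an n_x -> n_h -> n_y network."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    scale = 0.01               # assumed scale: small weights keep tanh out of saturation
    return {
        "W1": [[rng.gauss(0, 1) * scale for _ in range(n_x)] for _ in range(n_h)],
        "b1": [0.0] * n_h,
        "W2": [[rng.gauss(0, 1) * scale for _ in range(n_h)] for _ in range(n_y)],
        "b2": [0.0] * n_y,
    }


# Two input features (petal length, petal width), 6 hidden units, 1 output unit.
params = initialize_parameters(n_x=2, n_h=6, n_y=1)
print(len(params["W1"]), len(params["W1"][0]))  # rows and columns of W1
print(len(params["W2"]), len(params["W2"][0]))  # rows and columns of W2
```

The weights start as small random numbers so that the hidden units do not all learn the same thing, while the biases can safely start at zero.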
Step 3: forward propagation
During forward propagation, tanh is used as the activation function of the first (hidden) layer and sigmoid as the activation function of the second (output) layer:
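The forward pass for a single sample might look like the sketch below. It assumes the parameters are stored in a dictionary with keys W1, b1, W2, b2 as nested lists; the toy parameter values are hand-picked for illustration and are not trained weights:

```python
import math


def sigmoid(z):
    """Logistic function, written so that math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward_propagation(x, params):
    """Forward pass for one sample x; returns hidden and output activations."""
    # Layer 1: a1 = tanh(W1 x + b1)
    a1 = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
          for row, b in zip(params["W1"], params["b1"])]
    # Layer 2: a2 = sigmoid(W2 a1 + b2)
    a2 = [sigmoid(sum(w * h for w, h in zip(row, a1)) + b)
          for row, b in zip(params["W2"], params["b2"])]
    return a1, a2


# Hand-picked toy parameters (hypothetical): 2 inputs -> 2 hidden units -> 1 output.
toy_params = {
    "W1": [[0.5, -0.5], [0.0, 0.0]],
    "b1": [0.0, 0.0],
    "W2": [[0.0, 0.0]],
    "b2": [0.0],
}
a1, a2 = forward_propagation([1.0, 1.0], toy_params)
print(a1, a2)
```

The sigmoid output lies between 0 and 1, so it can be read as the predicted probability of the second class.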
Step 4: calculate cost function
The goal is to minimize the cost function. In this article, cross-entropy is used as the cost function:
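For two classes, the cross-entropy cost is the average over all samples of -(y * log(a) + (1 - y) * log(1 - a)), where y is the 0/1 label and a is the predicted probability. A small sketch follows; the function name and the clipping constant eps are assumptions added here to keep log() away from zero:

```python
import math


def compute_cost(predictions, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for a, y in zip(predictions, labels):
        a = min(max(a, eps), 1.0 - eps)  # clip so log() never sees 0 (eps is an assumption)
        total += y * math.log(a) + (1 - y) * math.log(1.0 - a)
    return -total / len(labels)


# An undecided model (probability 0.5 everywhere) versus a confident, correct one.
print(compute_cost([0.5, 0.5], [0, 1]))
print(compute_cost([0.9, 0.1], [1, 0]))
```

The first call gives log(2), about 0.693, which is the cost of a model that has learned nothing yet; confident correct predictions give a lower cost.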
Step 5: back propagation
Back propagation computes the derivatives of the cost function with respect to each weight and bias:
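The gradients for this two-layer network can be sketched as follows, assuming a single sigmoid output unit, a parameter dictionary with keys W1, b1, W2, b2 stored as nested lists, and the hidden and output activations of every sample kept from the forward pass (A1 and A2 below). With sigmoid output and cross-entropy cost, the output error simplifies to a2 - y, and the derivative of tanh is 1 - a1^2. The function and variable names are choices made for this illustration:

```python
def backward_propagation(params, X, Y, A1, A2):
    """Average gradients of the cross-entropy cost over all m samples.

    A1[i] and A2[i] are the hidden and output activations of sample X[i].
    Assumes one output unit, so each label in Y is a single 0/1 value.
    """
    m = len(X)
    n_h, n_x = len(params["W1"]), len(X[0])
    grads = {
        "dW1": [[0.0] * n_x for _ in range(n_h)],
        "db1": [0.0] * n_h,
        "dW2": [[0.0] * n_h],
        "db2": [0.0],
    }
    for x, y, a1, a2 in zip(X, Y, A1, A2):
        dz2 = a2[0] - y  # output error for sigmoid + cross-entropy
        grads["db2"][0] += dz2 / m
        for j in range(n_h):
            grads["dW2"][0][j] += dz2 * a1[j] / m
            # Error pushed back through W2 and the tanh derivative (1 - a1^2).
            dz1 = params["W2"][0][j] * dz2 * (1.0 - a1[j] ** 2)
            grads["db1"][j] += dz1 / m
            for i in range(n_x):
                grads["dW1"][j][i] += dz1 * x[i] / m
    return grads


# Tiny hand-checkable case (hypothetical values): 1 input, 1 hidden unit, 1 sample.
toy_params = {"W1": [[0.1]], "b1": [0.0], "W2": [[2.0]], "b2": [0.0]}
grads = backward_propagation(toy_params, X=[[3.0]], Y=[1], A1=[[0.5]], A2=[[0.8]])
print(grads)
```

In the toy case the output error is 0.8 - 1 = -0.2, so by hand dW2 = -0.2 * 0.5 = -0.1 and dW1 = 2.0 * (-0.2) * (1 - 0.25) * 3.0 = -0.9, which the function reproduces up to floating-point rounding.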
Step 6: update the parameters
Use the gradients calculated during back propagation to update the weights and biases:
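A plain gradient-descent update could be written as in the sketch below: each parameter moves a small step against its gradient. The dictionary layout (W1, b1, W2, b2 and dW1, db1, dW2, db2 as nested lists) and the learning rate of 0.1 are assumptions of this sketch, since the text does not state them:

```python
def update_parameters(params, grads, learning_rate=0.1):
    """Return new parameters after one gradient-descent step."""
    new_params = {}
    for name in ("W1", "W2"):  # weight matrices: lists of rows
        new_params[name] = [
            [w - learning_rate * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(params[name], grads["d" + name])
        ]
    for name in ("b1", "b2"):  # bias vectors: flat lists
        new_params[name] = [
            b - learning_rate * g for b, g in zip(params[name], grads["d" + name])
        ]
    return new_params


# Hypothetical values, chosen only to make the arithmetic easy to follow.
params = {"W1": [[1.0, 2.0]], "b1": [0.5], "W2": [[-1.0]], "b2": [0.0]}
grads = {"dW1": [[0.5, -0.5]], "db1": [1.0], "dW2": [[2.0]], "db2": [-1.0]}
updated = update_parameters(params, grads, learning_rate=0.1)
print(updated)
```

Returning a new dictionary instead of modifying the old one keeps the previous parameters available for comparison.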
Step 7: build a neural network
Combine all of the above functions to create the neural network model. In summary, the model calls the functions in the following order:
1. Initialize parameters
2. Forward propagation
3. Calculate the cost function
4. Back propagation
5. Update parameters
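The five steps above can be combined into one training function, sketched here in plain Python. The defaults of 6 hidden units, 10,000 iterations, and a printout every 1,000 iterations follow the settings used in this article; the learning rate of 0.1, the seed, the 0.01 weight scale, the helper names, and the four-point toy data set in the demo are assumptions made for illustration:

```python
import math
import random


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x, p):
    """Return (hidden activations, output probability) for one sample."""
    a1 = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
          for row, b in zip(p["W1"], p["b1"])]
    a2 = sigmoid(sum(w * h for w, h in zip(p["W2"][0], a1)) + p["b2"][0])
    return a1, a2


def nn_model(X, Y, n_h=6, num_iterations=10000, learning_rate=0.1,
             print_every=1000, seed=2):
    rng = random.Random(seed)
    n_x, m = len(X[0]), len(X)
    # 1. Initialize parameters
    p = {"W1": [[rng.gauss(0, 1) * 0.01 for _ in range(n_x)] for _ in range(n_h)],
         "b1": [0.0] * n_h,
         "W2": [[rng.gauss(0, 1) * 0.01 for _ in range(n_h)]],
         "b2": [0.0]}
    costs = []
    for it in range(num_iterations):
        dW1 = [[0.0] * n_x for _ in range(n_h)]
        db1, dW2, db2, cost = [0.0] * n_h, [0.0] * n_h, 0.0, 0.0
        for x, y in zip(X, Y):
            # 2. Forward propagation
            a1, a2 = forward(x, p)
            # 3. Calculate the cost function (cross-entropy, clipped for safety)
            a = min(max(a2, 1e-12), 1.0 - 1e-12)
            cost -= (y * math.log(a) + (1 - y) * math.log(1.0 - a)) / m
            # 4. Back propagation
            dz2 = a2 - y
            db2 += dz2 / m
            for j in range(n_h):
                dW2[j] += dz2 * a1[j] / m
                dz1 = p["W2"][0][j] * dz2 * (1.0 - a1[j] ** 2)
                db1[j] += dz1 / m
                for i in range(n_x):
                    dW1[j][i] += dz1 * x[i] / m
        costs.append(cost)
        if print_every and it % print_every == 0:
            print("Cost after iteration %d: %f" % (it, cost))
        # 5. Update parameters
        for j in range(n_h):
            p["W2"][0][j] -= learning_rate * dW2[j]
            p["b1"][j] -= learning_rate * db1[j]
            for i in range(n_x):
                p["W1"][j][i] -= learning_rate * dW1[j][i]
        p["b2"][0] -= learning_rate * db2
    return p, costs


def predict(x, p):
    """Class 1 if the output probability exceeds 0.5, else class 0."""
    return 1 if forward(x, p)[1] > 0.5 else 0


# Toy data (hypothetical, not the iris measurements): two separable groups.
X = [[-1.0, -1.0], [-1.0, -0.5], [1.0, 1.0], [1.0, 0.5]]
Y = [0, 0, 1, 1]
params, costs = nn_model(X, Y, num_iterations=2000, print_every=500)
print([predict(x, params) for x in X])
```

For the iris task, X would hold the petal length and petal width of each flower and Y the 0/1 species labels. Raw measurements are larger than the toy values here, so the learning rate may need adjusting.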
Step 8: run the model
Set the number of hidden layer nodes to 6 and the maximum number of iterations to 10,000, and print the training results every 1,000 iterations:
Step 9: draw the classification boundary
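Without a plotting library, the classification boundary can still be inspected as a text picture: evaluate the classifier on a grid of petal length and petal width values and print one character per grid point. The sketch below assumes a prediction function that takes the two features and returns 0 or 1. The function toy_predict is only a hypothetical stand-in for the trained network, and the axis ranges are assumed values:

```python
def draw_decision_regions(predict, x_min, x_max, y_min, y_max, cols=40, rows=12):
    """Return a text picture: '#' where predict(x, y) == 1, '.' where it is 0."""
    lines = []
    for r in range(rows):
        # The top row of the picture corresponds to the largest y value.
        y = y_max - (y_max - y_min) * r / (rows - 1)
        line = ""
        for c in range(cols):
            x = x_min + (x_max - x_min) * c / (cols - 1)
            line += "#" if predict(x, y) == 1 else "."
        lines.append(line)
    return "\n".join(lines)


# Hypothetical stand-in for the trained network's prediction function.
def toy_predict(petal_length, petal_width):
    return 1 if petal_length + 2.0 * petal_width > 8.0 else 0


# Horizontal axis: petal length from 3.0 to 7.0; vertical axis: petal width from 1.0 to 2.5.
print(draw_decision_regions(toy_predict, 3.0, 7.0, 1.0, 2.5))
```

The line where '.' changes to '#' is the classification boundary; with the trained network in place of toy_predict, that line would generally be curved rather than straight.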
As can be observed from the figure, only four points were misclassified. Although we could tune the model to further improve its training accuracy, doing so would likely lead to overfitting.
This article is original content of the Yunqi Community and may not be reproduced without permission.