An introduction to fine-tuning machine learning and deep learning models using the following techniques: random search, automated hyperparameter tuning, and artificial neural network tuning.
A machine learning model has two different types of parameters:
- Hyperparameters = all the parameters that the user can set arbitrarily before training starts (for example, the number of estimators in a random forest).
- Model parameters = parameters that are instead learned during model training (for example, the weights in neural networks or linear regression).
The model parameters define how to use the input data to obtain the desired output, and they are learned during training. Hyperparameters, instead, determine the structure of our model in the first place.
Tuning a machine learning model is an optimization problem. We have a set of hyperparameters, and our goal is to find the combination of their values that gives us the minimum (e.g., loss) or the maximum (e.g., accuracy) of a function.
This is especially important when comparing how different machine learning models perform on a dataset. For example, it would be unfair to compare an SVM model with tuned hyperparameters against a random forest model that has not been optimized.
In this article, the following hyperparameter optimization methods are described:
- Manual search
- Random search
- Grid search
- Automated hyperparameter tuning (Bayesian optimization, genetic algorithms)
- Artificial neural network (ANN) tuning
To demonstrate how to perform hyperparameter optimization in Python, I decided to perform a complete data analysis on the Credit Card Fraud Detection Kaggle dataset. The goal is to correctly classify which credit card transactions should be marked as fraudulent or genuine (binary classification). The dataset was anonymized before distribution, so the meaning of most of the features has not been disclosed.
In this case, I decided to use only a subset of the dataset to speed up training time and to ensure a perfect balance between the two classes. In addition, only a small number of features are used to make the optimization task more challenging. The final dataset is shown in the following figure (Figure 2).
First of all, we need to divide the dataset into a training set and a test set.
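A minimal sketch of this split with scikit-learn's train_test_split. The synthetic data from make_classification, the 30% test size and the random seeds are assumptions for illustration; they stand in for the Kaggle subset, which is not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic, roughly balanced stand-in for the credit card subset (assumption).
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=4,
    weights=[0.5, 0.5], random_state=0,
)

# Hold out 30% of the rows for testing; stratify keeps the class ratio
# the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=101
)

print(X_train.shape, X_test.shape)
```

The test set is kept aside until the end, so that every tuning method is scored on data it has not seen.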
In this article, we will use a random forest classifier as the model to optimize.
A random forest model is composed of a large number of uncorrelated decision trees, which together form an ensemble. In a random forest, each decision tree makes its own prediction, and the most common prediction is selected as the output of the whole model.
Now we can start by calculating the accuracy of the baseline model.
Using the random forest classifier with the default scikit-learn parameters achieves 95% overall accuracy. Now let’s see if some optimization techniques can be applied to improve accuracy.
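A baseline of this kind could be computed roughly as follows. The data is a synthetic stand-in (an assumption), so the printed accuracy will not match the figure quoted above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the credit card subset (assumption).
X, y = make_classification(n_samples=1000, n_features=6, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)

# Default hyperparameters; only the seed is fixed for reproducibility.
baseline = RandomForestClassifier(random_state=101)
baseline.fit(X_train, y_train)

baseline_accuracy = accuracy_score(y_test, baseline.predict(X_test))
print(f"Baseline accuracy: {baseline_accuracy:.3f}")
```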
When using manual search, we will select some model hyperparameters based on our judgment / experience. Then we train the model, evaluate its accuracy, and restart the process. Repeat the cycle until satisfactory accuracy is obtained.
The main parameters of the random forest classifier are as follows:
- criterion = the function used to evaluate the quality of a split.
- max_depth = the maximum number of levels allowed in each tree.
- max_features = the maximum number of features to consider when splitting a node.
- min_samples_leaf = the minimum number of samples that can be stored in a tree leaf.
- min_samples_split = the minimum number of samples a node needs in order to be split.
- n_estimators = the number of trees in the ensemble.
You can find out more about random forest parameters in the scikit-learn documentation.
As an example of manual search, I tried specifying the number of estimators in the model. Unfortunately, this did not lead to any improvement in accuracy.
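Manual search amounts to a loop like the one sketched here: pick a value by hand, train, evaluate, repeat. The candidate values for n_estimators and the synthetic data are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the credit card subset (assumption).
X, y = make_classification(n_samples=600, n_features=6, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)

# Hand-picked candidate values for the number of trees (assumption).
candidates = [10, 50, 200]
scores = {}
for n_estimators in candidates:
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=101)
    model.fit(X_train, y_train)
    scores[n_estimators] = accuracy_score(y_test, model.predict(X_test))

for n_estimators, accuracy in scores.items():
    print(f"n_estimators={n_estimators}: accuracy={accuracy:.3f}")
```

The cycle stops when one of the scores is judged good enough; nothing guarantees that a hand-picked value beats the default.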
In random search, we create a grid of hyperparameters and train / test the model on only some random combinations of these hyperparameters. In this example, I also decided to perform cross-validation on the training set.
When performing machine learning tasks, we usually divide the dataset into a training set and a test set. This is done so that we can test our model after training it (in this way we can check its performance on unseen data). When cross-validation is used, we further divide the training set into N partitions to make sure that our model is not overfitting our data.
One of the most commonly used cross-validation methods is K-fold validation. In K-fold, we divide the training set into N partitions, then iteratively train the model on N-1 partitions and test it on the remaining partition (changing the left-out partition in each iteration). Once the model has been trained N times, we average the results obtained in each iteration to get the overall result.
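The partitioning itself can be sketched in plain Python. The helper name k_fold_indices is hypothetical and the example uses 10 samples and 4 folds only to keep the output short; in practice scikit-learn's KFold or cross_val_score would normally do this work.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) pairs for K-fold validation."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=10, n_folds=4))
for train, validation in folds:
    print(f"train on {len(train)} samples, validate on {validation}")
```

Each sample lands in the validation partition exactly once, and the model is trained on the other partitions every time.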
Using cross-validation when performing hyperparameter optimization is very important. In this way, we can avoid choosing hyperparameters that work very well on the training data but poorly on the test data.
Now we can start implementing random search by first defining a grid of hyperparameters that will be randomly sampled when calling RandomizedSearchCV(). For this example, I decided to divide the training set into 4 folds (cv = 4) and to select 80 as the number of combinations to sample (n_iter = 80). Then, using the scikit-learn best_estimator_ attribute, we can retrieve the set of hyperparameters that performed best during training and use it to test our model.
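A sketch of what such a random search might look like. The grid values and the synthetic data are assumptions, and n_iter is lowered to 10 here so the example runs quickly; the article samples 80 combinations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the credit card subset (assumption).
X, y = make_classification(n_samples=400, n_features=6, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)

# Illustrative hyperparameter grid (values are assumptions).
param_distributions = {
    "criterion": ["gini", "entropy"],
    "max_depth": [5, 10, 20, None],
    "max_features": ["sqrt", "log2", None],
    "min_samples_leaf": [1, 4, 8],
    "min_samples_split": [2, 5, 10],
    "n_estimators": [20, 50, 100],
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=101),
    param_distributions=param_distributions,
    n_iter=10,  # number of random combinations to try (80 in the article)
    cv=4,       # 4-fold cross-validation on the training set
    random_state=101,
)
search.fit(X_train, y_train)

# best_estimator_ is refitted on the whole training set with the best combination.
best_model = search.best_estimator_
print(search.best_params_)
print(f"Test accuracy: {best_model.score(X_test, y_test):.3f}")
```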
After training the model, we can visualize how changing some of its hyperparameters affects the overall model accuracy (Figure 4). In this case, I decided to look at how changing the number of estimators and the criterion affects our random forest accuracy.
We can then take this a step further by making the visualization more interactive. In the chart below, we can examine (using the slider) how varying the number of estimators, together with the min_samples_split and min_samples_leaf values considered in the model, affects the overall accuracy of the model.
Now we can evaluate how the model performs using random search. In this case, using random search leads to a consistent increase in accuracy compared with our baseline model.
In grid search, we build a grid of hyperparameters and train / test our model on every possible combination.
To choose the parameters to use in grid search, we can now look at which parameters worked best with random search and build a grid around them to see if we can find a better combination.
Grid search can be implemented in Python using the scikit-learn GridSearchCV() function. Also in this case, I decided to divide the training set into 4 folds (cv = 4).
When using grid search, all the possible combinations of the parameters in the grid are tried. In this case, 51,200 combinations (2 × 10 × 4 × 4 × 4 × 4 × 10) will be used during training. In contrast, in the random search example, only 80 combinations were used.
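A small sketch of grid search, with the grid kept deliberately tiny. The grid values and the synthetic data are assumptions. ParameterGrid is used only to show that the number of combinations is the product of the number of values per hyperparameter.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, ParameterGrid, train_test_split

# Synthetic stand-in for the credit card subset (assumption).
X, y = make_classification(n_samples=400, n_features=6, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)

# Illustrative grid: 2 x 2 x 2 = 8 combinations (values are assumptions).
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [5, 10],
    "n_estimators": [20, 50],
}
n_combinations = len(ParameterGrid(param_grid))
print(f"Combinations to try: {n_combinations}")

grid_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=101),
    param_grid=param_grid,
    cv=4,  # every combination is trained and scored on 4 folds
)
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)
print(f"Test accuracy: {grid_search.best_estimator_.score(X_test, y_test):.3f}")
```

Because every combination is fitted once per fold, the cost grows multiplicatively with each hyperparameter added to the grid.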
Compared with random search, grid search is slower, but it can be more effective overall because it goes through the whole search space. Random search, instead, can be faster but might miss some important points in the search space.
Automated hyperparameter tuning
When using automated hyperparameter tuning, the model hyperparameters to use are identified with techniques such as Bayesian optimization, gradient descent and evolutionary algorithms.
Bayesian optimization can be performed in Python using the Hyperopt library. Bayesian optimization uses probability to find the minimum of a function. The final aim is to find the input value to a function that gives us the lowest possible output value.
Bayesian optimization has been shown to be more efficient than random, grid or manual search. It can therefore lead to better performance in the testing phase and reduced optimization time.
In Hyperopt, Bayesian optimization can be implemented by giving three main parameters to the fmin() function:
- Objective function = defines the loss function to minimize.
- Domain space = defines the range of input values to test (in Bayesian optimization, this space creates a probability distribution for each of the hyperparameters used).
- Optimization algorithm = defines the search algorithm used to select the best input values to use in each new iteration.
In addition, the maximum number of evaluations to perform can also be defined in fmin().
Bayesian optimization can reduce the number of search iterations by choosing the input values with past outcomes in mind. In this way, we can concentrate our search from the beginning on values that are closer to our desired output.
Now we can run our Bayesian optimizer using the fmin() function. A Trials() object is first created so that we can later visualize what was going on while fmin() was running (for example, how the loss function changed and how the hyperparameters were used).
We can now retrieve the best set of parameters identified and test our model using the dictionary of best values created during training. Some of the parameters are stored in this dictionary numerically, as indices, so we need to convert them back to strings before passing them to our random forest.
The classification report using Bayesian optimization is shown below.
Genetic algorithms try to apply natural selection mechanisms to machine learning contexts. They are inspired by the Darwinian process of natural selection and are therefore also often called evolutionary algorithms.
Suppose we create a population of N machine learning models with some predefined hyperparameters. We can then calculate the accuracy of each model and decide to keep only half of the models (the best-performing ones). Now we can generate offspring with hyperparameters similar to those of the best models, so as to get a population of N models again. At this point, we can recalculate the accuracy of each model and repeat the cycle for a defined number of generations. In this way, only the best models survive at the end of the process.
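The loop described above can be sketched in plain Python. Everything here is illustrative: evolve, toy_fitness, sample and mutate are hypothetical names, and toy_fitness is a made-up score standing in for the accuracy of a trained model.

```python
import random


def evolve(fitness, sample, mutate, population_size=10, generations=5, seed=0):
    """Keep the best half each generation and refill with mutated survivors."""
    rng = random.Random(seed)
    population = [sample(rng) for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: population_size // 2]
        # Offspring are slightly perturbed copies of randomly chosen survivors.
        offspring = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + offspring
    return max(population, key=fitness)


def toy_fitness(params):
    # Made-up score that peaks at n_estimators=120 and max_depth=8 (assumption).
    return -abs(params["n_estimators"] - 120) - 10 * abs(params["max_depth"] - 8)


def sample(rng):
    return {"n_estimators": rng.randrange(10, 300), "max_depth": rng.randrange(1, 20)}


def mutate(params, rng):
    return {
        "n_estimators": max(1, params["n_estimators"] + rng.randint(-20, 20)),
        "max_depth": max(1, params["max_depth"] + rng.randint(-2, 2)),
    }


best = evolve(toy_fitness, sample, mutate)
print(best, toy_fitness(best))
```

Because the survivors are carried over unchanged, the best score in the population can never get worse from one generation to the next.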
To implement genetic algorithms in Python, we can use the TPOT automated machine learning library. TPOT is built on the scikit-learn library and can be used for either regression or classification tasks.
The following code snippet shows the training report and the best parameters identified using genetic algorithms.
The overall accuracy of our random forest model optimized with genetic algorithms is shown below.
Artificial neural network (ANN) tuning
Using the KerasClassifier wrapper, you can apply grid search and random search to deep learning models in the same way as with scikit-learn machine learning models. In the following example, we will try to optimize some ANN parameters, such as how many neurons to use in each layer and which activation function and optimizer to use. More examples of deep learning hyperparameter optimization are available here.
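As a dependency-light illustration of the same idea, the sketch here swaps in scikit-learn's MLPClassifier for the Keras wrapper, so it is not the Keras code the article refers to. The grid covers the same kinds of choices (layer sizes, activation function, optimizer); the values, the synthetic data and cv=3 are assumptions.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier

# Short training runs may stop before convergence; silence that warning here.
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Synthetic stand-in for the credit card subset (assumption).
X, y = make_classification(n_samples=300, n_features=6, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101
)

# Neurons per layer, activation function and optimizer (values are assumptions).
param_grid = {
    "hidden_layer_sizes": [(8,), (16, 8)],
    "activation": ["relu", "tanh"],
    "solver": ["adam", "sgd"],
}

ann_search = GridSearchCV(
    estimator=MLPClassifier(max_iter=200, random_state=101),
    param_grid=param_grid,
    cv=3,
)
ann_search.fit(X_train, y_train)

print(ann_search.best_params_)
print(f"Test accuracy: {ann_search.best_estimator_.score(X_test, y_test):.3f}")
```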
The overall accuracy scored using our artificial neural network (ANN) can be seen below.
Now we can compare how all the different optimization techniques performed on this exercise. Overall, random search and evolutionary algorithms performed best.
The results obtained depend heavily on the chosen grid space and on the dataset used. Therefore, in different situations, different optimization techniques will perform better than others.