A hyperparameter optimization technique: Hyperopt

Time: 2021-5-3

By guest blog
Compiled by VK
Source: Analytics Vidhya

Introduction

In a machine learning project, you follow a series of steps until you reach your goal. One of those steps is optimizing the hyperparameters of the model you choose. This task always comes after model selection (choosing the model that performs best among the candidates).

What is hyperparameter optimization?

Before defining hyperparameter optimization, you need to know what hyperparameters are. In short, hyperparameters are the parameters that control the learning process, and they have a significant impact on the performance of a machine learning model.

Examples of hyperparameters in the random forest algorithm are the number of estimators (n_estimators), the maximum depth (max_depth), and the split criterion (criterion). These parameters are tunable and directly affect the quality of the trained model.

Hyperparameter optimization is finding the combination of hyperparameter values that achieves maximum performance on the data in a reasonable amount of time. It plays an important role in the prediction accuracy of a machine learning algorithm, and it is often considered the most difficult part of building a machine learning model.

Most machine learning algorithms come with default hyperparameter values. The defaults don't always perform well across different types of machine learning projects, which is why you need to optimize them to find the combination that gives the best performance.

Good hyperparameters can make an algorithm shine.

There are some common strategies for optimizing hyperparameters:

(a) Grid search

This is a widely used traditional method that determines the optimal values for a given model through exhaustive tuning. Grid search works by trying every possible combination of the parameter values supplied, which means the full search takes a lot of time and can lead to very high computational cost.
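As a quick illustration (not from the original notebook), a minimal grid search with scikit-learn's GridSearchCV might look like this; the toy data and parameter values are arbitrary examples:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy data standing in for a real dataset
X, y = make_classification(n_samples=200, random_state=0)

# Grid search tries every combination: 2 x 2 = 4 candidates, each cross-validated
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

With larger grids the number of fits grows multiplicatively, which is exactly the cost problem described above.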

Note: you can learn how to implement grid search here: https://github.com/Davisy/Hyperparameter-Optimization-Techniques/blob/master/GridSearchCV.ipynb

(b) Random search

This method works differently: random combinations of hyperparameter values are used to find the best solution for the constructed model. The disadvantage of random search is that it sometimes misses important points (values) in the search space.
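For comparison, here is a minimal random-search sketch with scikit-learn's RandomizedSearchCV (again with illustrative values); it samples a fixed number of random combinations instead of trying them all:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# Only n_iter random combinations are evaluated, not the full grid
param_dist = {"n_estimators": randint(50, 300), "max_depth": randint(2, 10)}
search = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_dist,
                            n_iter=8, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_)
```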

Note: you can learn more about the random search method here: https://github.com/Davisy/Hyperparameter-Optimization-Techniques/blob/master/RandomizedSearchCV.ipynb

Hyperparameter optimization techniques

In this series of articles, I will introduce you to different advanced hyperparameter optimization techniques/packages that can help you get the best parameters for a given model. We will look at the following:

  • Hyperopt
  • Scikit-Optimize
  • Optuna

In this article, I'll focus on the implementation of Hyperopt.

What is Hyperopt?

Hyperopt is a powerful Python library for hyperparameter optimization developed by James Bergstra. Hyperopt uses a form of Bayesian optimization for parameter tuning that allows you to get the best parameters for a given model. It can optimize a model with hundreds of parameters on a large scale.

Features of Hyperopt

Hyperopt has four important features you need to know in order to run your first optimization.

(a) Search space

Hyperopt provides different functions to specify ranges for input parameters; these define the stochastic search space. The most common search options are:

  • hp.choice(label, options) - This can be used for categorical parameters; it returns one of the options, which should be a list or tuple. Example: hp.choice("criterion", ["gini", "entropy"])
  • hp.randint(label, upper) - This can be used for integer parameters; it returns a random integer in the range [0, upper). Example: hp.randint("max_features", 50)
  • hp.uniform(label, low, high) - This returns a value uniformly distributed between low and high. Example: hp.uniform("max_leaf_nodes", 1, 10)

Other options you can use include:

  • hp.normal(label, mu, sigma) - This returns a real value drawn from a normal distribution with mean mu and standard deviation sigma
  • hp.qnormal(label, mu, sigma, q) - This returns a value like round(normal(mu, sigma) / q) * q
  • hp.lognormal(label, mu, sigma) - This returns exp(normal(mu, sigma))
  • hp.qlognormal(label, mu, sigma, q) - This returns a value like round(exp(normal(mu, sigma)) / q) * q

You can learn more about search space options here: https://github.com/hyperopt/hyperopt/wiki/FMin#21-parameter-expressions

Note: each optimizable stochastic expression has a label (for example, n_estimators) as the first argument. These labels are used to return the chosen parameter values to the caller during optimization.

(b) Objective function

This is the function to minimize. It takes hyperparameter values from the search space as input and returns a loss. During optimization we train the model with the chosen hyperparameter values, predict the target feature, evaluate the prediction error, and return it to the optimizer. The optimizer decides which values to check next and iterates again. You will learn how to create an objective function in the practical example.

(c) fmin

The fmin function is the optimization function that iterates over different sets of algorithms and their hyperparameters and then minimizes the objective function. fmin takes five inputs:

  • Objective function of minimization

  • Defined search space

  • The search algorithm to use, such as random search, TPE (Tree of Parzen Estimators), or adaptive TPE.

    Note: hyperopt.rand.suggest and hyperopt.tpe.suggest provide the logic for a sequential search of the hyperparameter space.

  • Maximum number of evaluations

  • Trials object (optional)

Example:

from hyperopt import fmin, tpe, hp, Trials

trials = Trials()

best = fmin(fn=lambda x: x ** 2,
            space=hp.uniform('x', -10, 10),
            algo=tpe.suggest,
            max_evals=50,
            trials=trials)

print(best)

(d) Trials object

The trials object stores all the hyperparameters, losses, and other information, which means you can access them after running the optimization. Trials also lets you save and load important information and then continue the optimization process. (You'll learn more in the practical example.)

from hyperopt import Trials 

trials = Trials()

Having covered the important features of Hyperopt, here is how to use it:

  • Initialize the space to search.

  • Define the objective function.

  • Select the search algorithm to use.

  • Run the hyperopt function.

  • Analyze the evaluation outputs stored in the trials object.

Hyperopt in practice

Now that you understand the important features of Hyperopt, in this practical example we will use the mobile price dataset. The task is to create a model that predicts the price of a mobile device as 0 (low cost), 1 (medium cost), 2 (high cost), or 3 (very high cost).

Install hyperopt

You can install hyperopt from PyPI.

pip install hyperopt

Then import the important packages

# Import packages
import numpy as np 
import pandas as pd 
from sklearn.ensemble import RandomForestClassifier 
from sklearn import metrics
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler 
from hyperopt import tpe, hp, fmin, STATUS_OK,Trials
from hyperopt.pyll.base import scope

import warnings
warnings.filterwarnings("ignore")

Dataset

Let's load the dataset from the data directory. For more information about this dataset: https://www.kaggle.com/iabhishekofficial/mobile-price-classification?select=train.csv

#Loading data

data = pd.read_csv("data/mobile_price_data.csv")

Check the first five rows of the dataset.

#Read data

data.head()

As you can see, our dataset contains a variety of numerical features.

Let’s look at the shape of the dataset.

#Show shape

data.shape

(2000, 21)

In this dataset, we have 2000 rows and 21 columns. Now let’s look at the list of features in this dataset.

#Display list 

list(data.columns)
['battery_power', 'blue', 'clock_speed', 'dual_sim', 'fc', 'four_g', 'int_memory', 'm_dep', 'mobile_wt', 'n_cores', 'pc', 'px_height', 'px_width', 'ram', 'sc_h', 'sc_w', 'talk_time', 'three_g', 'touch_screen', 'wifi', 'price_range']

You can find the meaning of each column name here: https://www.kaggle.com/iabhishekofficial/mobile-price-classification

Splitting the dataset into target and independent features

This is a classification problem, so we separate the target feature from the independent features in the dataset. Our target is price_range.

#Split data into features and targets

X = data.drop("price_range", axis=1).values 
y = data.price_range.values

Preprocessing data sets

Then we use the StandardScaler method from scikit-learn to standardize the independent features.

#Standardized characteristic variables

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

Define the parameter space for optimization

We will tune three hyperparameters of the random forest algorithm: n_estimators, max_depth, and criterion.

space = {
    "n_estimators": hp.choice("n_estimators", [100, 200, 300, 400, 500, 600]),
    # quniform returns a float; cast to int so scikit-learn accepts max_depth
    "max_depth": scope.int(hp.quniform("max_depth", 1, 15, 1)),
    "criterion": hp.choice("criterion", ["gini", "entropy"]),
}

We have set ranges of values for the selected hyperparameters. Next we define the objective function.

Define the minimization function (objective function)

Our minimization function is called hyperparameter_tuning, and the classification algorithm whose hyperparameters we optimize is random forest. I use cross-validation to avoid overfitting, and the function returns the loss value and a status.

# Define the objective function

def hyperparameter_tuning(params):
    clf = RandomForestClassifier(**params, n_jobs=-1)
    acc = cross_val_score(clf, X_scaled, y, scoring="accuracy").mean()
    return {"loss": -acc, "status": STATUS_OK}

Note: remember that Hyperopt minimizes the function, which is why I added a minus sign to acc.

Fine-tuning the model

Finally, we instantiate a Trials object, fine-tune the model, and then print the best loss with its hyperparameter values.

# Initialize the Trials object
trials = Trials()

best = fmin(
    fn=hyperparameter_tuning,
    space = space, 
    algo=tpe.suggest, 
    max_evals=100, 
    trials=trials
)

print("Best: {}".format(best))
100%|█████████████████████████████████████████████████████████| 100/100 [10:30<00:00, 6.30s/trial, best loss: -0.8915]
Best: {'criterion': 1, 'max_depth': 11.0, 'n_estimators': 2}

After hyperparameter optimization, the best loss is -0.8915, obtained with n_estimators=300, max_depth=11, and criterion="entropy" in the random forest classifier, so the accuracy of the model is 89.15%. (For hp.choice parameters, fmin reports the index of the chosen option, which is why Best shows criterion: 1 and n_estimators: 2.)

Analyze the results using the trials object

The trials object can help us check all the return values calculated during the experiment.

(1) trials.results

This shows the list of dictionaries returned by the objective function during the search.

trials.results
[{'loss': -0.8790000000000001, 'status': 'ok'}, {'loss': -0.877, 'status': 'ok'}, {'loss': -0.768, 'status': 'ok'}, {'loss': -0.8205, 'status': 'ok'}, {'loss': -0.8720000000000001, 'status': 'ok'}, {'loss': -0.883, 'status': 'ok'}, {'loss': -0.8554999999999999, 'status': 'ok'}, {'loss': -0.8789999999999999, 'status': 'ok'}, {'loss': -0.595, 'status': 'ok'}, ...]
(2) trials.losses()

This shows the list of losses.

trials.losses()
[-0.8790000000000001, -0.877, -0.768, -0.8205, -0.8720000000000001, -0.883, -0.8554999999999999, -0.8789999999999999, -0.595, -0.8765000000000001, -0.877, ...]
(3) trials.statuses()

This displays a list of status strings.

trials.statuses()
['ok', 'ok', 'ok', 'ok', 'ok', 'ok', 'ok', 'ok', 'ok', 'ok', 'ok', 'ok', 'ok', 'ok', 'ok', 'ok', 'ok', 'ok', 'ok', ...]

Note: the trials object can be saved, passed to Hyperopt's built-in plotting routines, or analyzed with your own custom code.

Conclusion

Congratulations, you have reached the end of this article!

You can download the dataset and notebooks used in this article here: https://github.com/Davisy/Hyperparameter-Optimization-technologies

Link to the original article: https://www.analyticsvidhya.com/blog/2020/09/alternative-hyperparameter-optimization-technique-you-need-to-know-hyperopt/

Welcome to panchuang AI blog:
http://panchuang.net/

Sklearn machine learning official Chinese document:
http://sklearn123.com/

Welcome to the panchuang blog Resource Hub:
http://docs.panchuang.net/