Launching Optuna 2.0
Since the first major release in January this year, we have witnessed tremendous community effort in the form of pull requests, issues, and use cases, something uncommon among hyperparameter optimization (HPO) frameworks. The framework has grown to accommodate a large number of new features, including hyperparameter importance evaluation, sophisticated pruning algorithms that save computing resources, and tight integration with LightGBM. This article introduces these updates and shows how the project has evolved since the last version. We will also share our plans for the future, so that you know what features to expect in upcoming releases.
If you have used Optuna before, you will quickly understand what follows. If you are new to the framework, please refer to our previous blog post (https://medium.com/optuna/optuna-v1-86192cd09be5), which explains the concepts behind Optuna.
- Hyperparameter importance
- Performance improvements
- Hyperband pruning
- New CMA-ES sampler
- Integration with third-party frameworks
- Collaboration with PyTorch
- Future plans
Here are some of the most important features in this release.
Hyperparameter importance
Although Optuna is designed to handle any number of hyperparameters, we generally recommend keeping the number of parameters small to reduce the dimensionality of the search space. In practice, only a few parameters often play a dominant role in determining the overall performance of a model. Since version 2.0, we have introduced a new module, `optuna.importance`, which evaluates how much each parameter contributes to overall performance. `optuna.importance.get_param_importances` takes a study as its argument and returns a dictionary mapping each hyperparameter to its importance value, a float ranging from 0.0 to 1.0, where higher means more important. You can also try different importance evaluation algorithms by changing the `evaluator` argument, including fANOVA, a sophisticated algorithm based on random forests. Since different algorithms evaluate importance in different ways, we plan to increase the number of available algorithms in future releases.
study.optimize(...)

importances = optuna.importance.get_param_importances(study)

# Specify which algorithm to use.
importances = optuna.importance.get_param_importances(
    study, evaluator=optuna.importance.FanovaImportanceEvaluator()
)
You do not have to process the importance data yourself. Optuna also provides `optuna.visualization.plot_param_importances`, which shares the same interface as `optuna.importance.get_param_importances` and returns a Plotly figure that is helpful for visual analysis.
fig = optuna.visualization.plot_param_importances(study)
Below is an importance plot for a neural network written in PyTorch. It shows that the learning rate `lr` is dominant.
The hyperparameter importances are computed by mean decrease impurity. The different colors of the horizontal bars in the figure distinguish the parameter types: integer, floating-point, and categorical.
Performance improvements

In the new version of Optuna, hyperparameter sampling under an RDB backend, the pruning of unpromising trials, and the general processing of optimization history have all become significantly faster. Through a carefully designed cache mechanism and improved database queries, we made numerous improvements to the underlying storage layer responsible for reading and writing trial data (this is particularly important for distributed optimization, where different nodes share data through database tables). This allows Optuna to be applied to more general black-box optimization problems. The improvement is especially noticeable when running lightweight objective functions or handling large numbers of trials, situations where the sampling process was once a performance bottleneck.
The figure compares the optimization time of version 2.0 and version 1.0 on a simple objective function that samples floating-point numbers with the tree-structured Parzen estimator (`TPESampler`). Version 2.0 is clearly faster as the number of parameters and trials grows. Note: on each suggestion, `TPESampler` gathers the suggested values from all previous trials, and the y-axis in the figure is logarithmic. The storage engine used in this comparison is a MySQL database.
Hyperband pruning

Pruning is important when optimizing computationally expensive objective functions. It lets you detect and stop unpromising trials at an early stage, saving computing resources and finding good solutions in less time. This is a common situation in deep learning, the main application area of Optuna.
For example, you may need to train neural networks with millions of parameters, which can take hours or days. Hyperband is a pruning algorithm built on the earlier successive halving algorithm (`SuccessiveHalvingPruner`). Successive halving can significantly reduce the time required per trial, but it is known to be sensitive to how it is configured, and Hyperband addresses this problem. There are several ways to do so; Optuna chooses a heuristic approach that further reduces the configuration burden, so that users without a background in the underlying techniques can use it easily. It was first introduced as an experimental feature in version 1.1, but it is now stable in both interface and performance. Experiments show that it outperforms other pruners on common benchmarks, including the median pruner (Optuna's default), as you can see in the benchmark results below.
study = optuna.create_study(
    pruner=optuna.pruners.HyperbandPruner(max_resource="auto")
)
Compared with previous pruners, including the median pruner (`TPE median`), Hyperband (`TPE hyperband`) not only converges faster but is also more stable across multiple runs (see the variance in the shaded regions). In the figure, one budget corresponds to 100 training epochs. `TPE NOP` represents no pruning.
New CMA-ES sampler
`optuna.samplers.CmaEsSampler` is a new CMA-ES sampler. It is faster than the previous CMA-ES sampler, which lived under the `optuna.integration` submodule. The new sampler can handle large numbers of trials, so it should be suitable for a wider range of problems. In addition, while the previous CMA-ES sampler did not take pruned trials into account during optimization, the new one has an experimental feature that makes more effective use of the information obtained from pruned trials. The sampler was developed by @c-bata, one of the main contributors to Optuna and the author of the `cmaes` library it uses.
In the past, you might have created a study like this:
study = optuna.create_study(sampler=optuna.integration.CmaEsSampler())
Now you can do the same with the new submodule:
study = optuna.create_study(sampler=optuna.samplers.CmaEsSampler())
Or, if you want to keep using the original sampler, note that it has been renamed:
study = optuna.create_study(sampler=optuna.integration.PyCmaSampler())
The new CMA-ES sampler converges faster when pruned trials are taken into account during optimization.
Integration with third-party frameworks
Optuna has various submodules that integrate with third-party frameworks. These include gradient boosting frameworks such as LightGBM and XGBoost, deep learning frameworks in the PyTorch and TensorFlow ecosystems, and many others. In the following sections, we introduce some of the most important integrations related to this release.
LightGBM

LightGBM is a well-established Python framework for gradient boosting. Optuna provides several integration modules that tightly integrate with it. In particular, `optuna.integration.lightgbm.train` provides efficient stepwise tuning of hyperparameters and is a drop-in replacement for `lightgbm.train`, so users do not need to modify their code.
For cross-validation and integration with other Optuna components, such as studies that record optimization history and support distributed deployment, Optuna also provides `optuna.integration.lightgbm.LightGBMTuner` and `optuna.integration.lightgbm.LightGBMTunerCV`.
MLflow

MLflow is a popular framework for managing machine learning pipelines and life cycles. MLflow Tracking is a particularly useful tool for monitoring experiments through an interactive GUI. Thanks to `MLflowCallback`, using MLflow Tracking to track HPO experiments in Optuna is now very simple: you only need to register a callback function in Optuna's optimization process.
Redis storage

Optimization algorithms and optimization history are clearly separated in Optuna's architecture. The storage layer abstracts the persistence of optimization history over various backends (such as an RDB or in-memory storage). An RDB can be used for distributed optimization or for persisting history, while in-memory storage suits fast experiments that need neither distributed optimization nor persistent records. Redis, an in-memory key-value store, is often used for caching thanks to its flexibility and performance. In this release, we have experimentally added a Redis storage, which offers a middle ground between the existing RDB and in-memory storage. The Redis storage is easy to set up and can serve as a fallback for users who cannot configure an RDB.
Documentation

The official documentation (https://optuna.readthedocs.io) has been redesigned and improved, giving it a completely new look. By creating separate short pages for each function or class, readability is greatly improved and it is easier to jump between pages.
Collaboration with PyTorch
PyTorch is a popular deep learning framework, and deep learning is an important application area for Optuna. Optuna recently joined the PyTorch ecosystem, and we are continuing to add features that make it easier to integrate with PyTorch and other frameworks within that ecosystem.
In addition to the existing integration modules for PyTorch Lightning and PyTorch Ignite, we have added an integration module for AllenNLP, a PyTorch-based machine learning framework dedicated to natural language processing (NLP). When using AllenNLP, users usually define the model and training process in a Jsonnet file. Combined with Optuna, this is inconvenient because parameters must be read from and written to files. `AllenNLPExecutor` allows you to optimize these model definitions with just a few lines of code. See @himkt's blog post for details.
Using Optuna in PyTorch reinforcement learning
The team behind the ChainerRL reinforcement learning library has ported it to PyTorch, renaming it PFRL, and it already includes Optuna support. In fact, as part of PFRL, the baseline algorithm for the MineRL (Minecraft RL) competition at NeurIPS 2020 already uses Optuna.
Future plans

Although we initially planned many features for this release, the final result exceeded expectations. For example, some of the integration modules above were not in our original plan but were developed by a growing number of contributors. We will keep reviewing which features to include and will continue to welcome community input on new areas where Optuna can be useful.
As for concrete development plans, multi-objective (MO) optimization is a major feature on the roadmap. With multi-objective optimization, you can optimize objective functions based on multiple criteria. A typical scenario is maximizing model accuracy while minimizing FLOPs (the number of floating-point operations). The MO algorithms in Optuna optimize both conditions simultaneously, providing the user with the so-called Pareto front formed by the best trials (in the multi-objective setting, the trial that is best on a single objective cannot be considered globally optimal). If you are interested, multi-objective optimization is already available as an experimental feature, implemented with sophisticated algorithms such as NSGA-II. Because its API is very similar to Optuna's existing API, you can start using it with only small changes to existing code. In upcoming releases we will refine and finalize the API, and we look forward to seeing you use this feature in production, or use and develop new MO algorithms.
As with previous versions, feedback, code, and comments from a large number of contributors were essential to this release.
A03ki, AnesBenmerzoug, Crissman, HideakiImamura, Muragaruhae, PhilipMay, Rishav1, RossyWhite, VladSkripniuk, Y-oHr-N, arpitkh101, araffin, barneyhill, bigbird555, c-bata, cafeal, chris-chris, crcrpar, d1vanloon, daikikatsuragawa, djKooks, dwiel, festeh, g-votte, gorogoroumaru, harupy, hayata-yamamoto, hbredin, henry0312, higumachan, himkt, hross, hugovk, hvy, iwiwi, jinnaiyuu, keisuke-umezawa, kuroko1t, momijiame, mottodora, nmasahiro, nuka137, nzw0301, oda, okapies, okdshin, pablete, r-soga, scouvreur, seungjaeryanlee, sfujiwara, shino-yoshi, sile, smly, suecharo, tadan18, tanapiyo, tohmae, toshihikoyanase, upura, victorhcm, y0z, yasuyuky, ytsmiling, yutayamazaki, zishiwu123 and zzorba.
We thank everyone who has followed this project from the beginning and everyone who has joined us along the way.
Optuna documentation in Chinese

Optuna documentation in Chinese: https://zh-cn.optuna.org (the original site is blocked in some regions; please use this alternate link).