In artificial intelligence, once the network is built and the training data is prepared, algorithm engineers usually need to tune a range of model parameters to get better results. Tuning is not easy, though. It often means all-night sessions of parameter debugging and result verification and a large number of experiments, which costs both time and compute.
At this point it is tempting to try automatic hyperparameter search, only to start worrying about the extra training cost of the compute it requires.
Don’t panic! Baidu’s full-featured AI development platform BML comes with a free compute quota and automatic hyperparameter search!
Let’s first introduce Baidu’s recently upgraded BML. What exactly is it?
The full-featured AI development platform BML (Baidu Machine Learning) is a one-stop AI development service. It provides machine learning and deep learning capabilities and cost-effective computing resources for enterprise and individual developers, helping enterprises quickly build high-precision AI applications. BML covers the full AI development lifecycle, from data collection, data cleaning, data labeling, intelligent labeling and multi-person labeling, through model training and production, to model management and the management of cloud and offline inference services.
BML has Baidu’s ultra-large-scale pre-trained models built in, so a high-precision model can be obtained with only a small amount of data. BML currently supports three development and modeling methods: script parameter tuning, Notebook and custom jobs, which flexibly match the development habits of enterprise developers.
In the script parameter tuning workflow, BML already presets the model hyperparameters. However, because user datasets are rich and varied, it is hard for one preset to train well on every dataset. Users can adjust the hyperparameters themselves, but manual tuning is very labor-intensive. To reduce the effort users spend on tuning, BML’s R&D team worked day and night and launched automatic hyperparameter search, which automatically searches for a better-performing hyperparameter combination and saves users the trouble of tuning.
What are the highlights of Baidu BML’s automatic hyperparameter search technology?
Provides a variety of search algorithms
“Hyperparameters”, unlike the parameters of each layer inside the model’s network structure, are the parameters that have to be set and adjusted by hand, based on human experience, to improve model performance. Common hyperparameters include learning_rate and batch_size. In hyperparameter search, the model is complex, each evaluation is computationally expensive, and every hyperparameter has a wide range of values, so the search space is very large. That is why “automatic” hyperparameter search is needed.
Compared with manual tuning, automatic hyperparameter search removes the step in which a person inspects the experimental results and adjusts the parameters, and replaces it with a search algorithm.
BML provides the following search algorithms:
1. Random search
As the name suggests, random search samples parameter values at random within each parameter’s range, combines them into a candidate set, and trains with each candidate to compare the results. Random search is a general and efficient method. It is usually used as a baseline and suits situations where efficiency matters most, but it cannot guarantee that the best hyperparameters will be found.
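The idea can be sketched in a few lines of Python. This is a minimal illustration, not BML’s implementation: the search space format, the `sample_config` and `random_search` helpers, and the toy objective that stands in for a real training run are all assumptions made for the example.

```python
import math
import random


def sample_config(space, rng):
    """Draw one hyperparameter combination from the (hypothetical) search space."""
    config = {}
    for name, spec in space.items():
        if spec["type"] == "log_uniform":
            # Sample uniformly in log space so every order of magnitude is equally likely.
            low, high = math.log(spec["low"]), math.log(spec["high"])
            config[name] = math.exp(rng.uniform(low, high))
        elif spec["type"] == "choice":
            config[name] = rng.choice(spec["values"])
        else:
            raise ValueError(f"unknown parameter type: {spec['type']}")
    return config


def random_search(objective, space, num_trials, seed=0):
    """Evaluate num_trials random combinations and keep the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(num_trials):
        config = sample_config(space, rng)
        score = objective(config)  # in practice: train the model and return a metric
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical search space for illustration only.
SPACE = {
    "learning_rate": {"type": "log_uniform", "low": 1e-5, "high": 1e-1},
    "batch_size": {"type": "choice", "values": [8, 16, 32, 64]},
}


def toy_objective(config):
    """Toy stand-in for training: highest at learning_rate=1e-3, batch_size=32."""
    lr_penalty = abs(math.log10(config["learning_rate"]) + 3)
    bs_penalty = abs(config["batch_size"] - 32) / 32
    return -(lr_penalty + bs_penalty)


best_config, best_score = random_search(toy_objective, SPACE, num_trials=50)
print(best_config, best_score)
```

The learning rate is sampled on a log scale because useful values usually span several orders of magnitude. Each trial is independent of the others, which is what makes random search easy to run in parallel.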
2. Bayesian search
Initial hyperparameter points are first sampled at random from the search space. A probabilistic model is then fitted to the metric results of the points evaluated so far, the most promising points are inferred from that model, and those points are evaluated by running trials. Repeating this loop finds suitable hyperparameters within a limited number of trials. Sequential model-based optimization (SMBO) is one paradigm of Bayesian search and consists of two parts: a surrogate model and an acquisition function. Depending on the choice of surrogate model and acquisition function, Bayesian search has many implementations. TPE (tree-structured Parzen estimator) is one with good global exploration ability: its surrogate model is built by kernel density estimation (KDE), and it uses EI (expected improvement) as the acquisition function to generate new sampling points.
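To make the SMBO loop concrete, here is a heavily simplified, one-dimensional TPE-style sketch. It is an illustration under stated assumptions rather than BML’s implementation: it searches only log10(learning_rate), uses a fixed KDE bandwidth, and picks the candidate with the largest density ratio l(x)/g(x), which is the quantity TPE maximizes in place of computing EI directly. The function names, the `gamma` split and the toy objective are all hypothetical.

```python
import math
import random


def kde(points, bandwidth):
    """Return a Gaussian kernel density estimate built from `points`."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

    return density


def tpe_search(objective, low, high, n_init=5, n_trials=30, gamma=0.25,
               n_candidates=24, seed=0):
    """Simplified 1-D TPE: returns ((best_x, best_score), history)."""
    if n_init < 2:
        raise ValueError("n_init must be at least 2")
    rng = random.Random(seed)
    bandwidth = (high - low) / 10.0  # fixed bandwidth, an assumption of this sketch
    # 1. Random initial points.
    history = []
    for _ in range(n_init):
        x = rng.uniform(low, high)
        history.append((x, objective(x)))
    while len(history) < n_trials:
        # 2. Split observations into "good" (top gamma fraction) and "bad".
        ranked = sorted(history, key=lambda item: item[1], reverse=True)
        n_good = min(len(ranked) - 1, max(1, int(gamma * len(ranked))))
        good = [x for x, _ in ranked[:n_good]]
        bad = [x for x, _ in ranked[n_good:]]
        # 3. Surrogate model: one density per group.
        l, g = kde(good, bandwidth), kde(bad, bandwidth)
        # 4. Acquisition: sample candidates near good points, keep the best ratio.
        candidates = [
            min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
            for _ in range(n_candidates)
        ]
        x_next = max(candidates, key=lambda x: l(x) / (g(x) + 1e-12))
        # 5. Evaluate the chosen point and add it to the history.
        history.append((x_next, objective(x_next)))
    return max(history, key=lambda item: item[1]), history


def toy_objective(log_lr):
    """Toy stand-in for a training run: best at log10(learning_rate) = -3."""
    return -(log_lr + 3.0) ** 2


(best_log_lr, best_score), history = tpe_search(toy_objective, low=-5.0, high=-1.0)
print(f"best learning_rate ~ {10 ** best_log_lr:.2e}, score {best_score:.4f}")
```

Unlike random search, each new point depends on all earlier results, so later trials concentrate in the regions that have worked well so far.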
3. Evolutionary algorithms
An evolutionary algorithm is a hyperparameter search strategy based on the concept of a population. It treats hyperparameter configurations as a population, optimizes multiple populations in parallel, applies survival of the fittest within each population, and finally outputs the best model. This process (shown in the figure below) is inspired by genetic algorithms. The population is initialized at random. Survival of the fittest consists of two steps, exploit and explore: an individual may copy the parameters of a well-performing individual, and it may also explore new hyperparameter combinations by perturbing its current values at random.
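The exploit and explore steps can be sketched as a single update of a population. This is a minimal illustration and not BML’s implementation: the quarter-sized top and bottom groups, the 0.8/1.2 perturbation factors, the re-initialization probability and the toy scores are assumptions chosen for the example.

```python
import math
import random


def exploit_and_explore(population, scores, rng, bounds=(1e-5, 1e-1),
                        perturb_factors=(0.8, 1.2), resample_prob=0.1):
    """Replace the worst individuals with perturbed copies of the best ones."""
    order = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
    cutoff = max(1, len(population) // 4)
    top, bottom = order[:cutoff], order[-cutoff:]
    next_population = [dict(individual) for individual in population]
    for idx in bottom:
        if idx in top:
            continue  # population too small to separate best from worst
        # Exploit: copy the hyperparameters of a well-performing individual.
        parent = population[rng.choice(top)]
        if rng.random() < resample_prob:
            # Occasionally re-initialize at random instead of perturbing.
            lr = rng.uniform(*bounds)
        else:
            # Explore: perturb the copied value by a random factor.
            lr = parent["learning_rate"] * rng.choice(perturb_factors)
        next_population[idx] = {"learning_rate": min(bounds[1], max(bounds[0], lr))}
    return next_population


rng = random.Random(0)
# Random initialization of the population (one hyperparameter for brevity).
population = [{"learning_rate": 10 ** rng.uniform(-5, -1)} for _ in range(8)]
# Toy scores standing in for validation metrics: best near learning_rate=1e-3.
scores = [-abs(math.log10(ind["learning_rate"]) + 3) for ind in population]
next_population = exploit_and_explore(population, scores, rng)
print(next_population)
```

In a full search this update would be applied repeatedly, with each individual trained for a while between updates so that the scores reflect its current hyperparameters.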
Image source: https://arxiv.org/pdf/1711.09846v1.pdf
Baidu has proposed PSHE2, a gradient-free optimization algorithm based on stochastic differential equations. It uses a Hamiltonian dynamical system to search for the point with the lowest “potential energy” in the parameter space, which replaces purely random perturbation and speeds up convergence. Finding the optimum in hyperparameter search comes down to how the hyperparameter combination is updated, so that the algorithm converges to the optimal solution faster and more reliably. PSHE2 decides the next update direction, subject to some random perturbation, from each hyperparameter’s own best values in past iterations. The process is shown in the figure.
Image source: https://github.com/PaddlePaddle/PaddleHub/blob/release/v1.5/docs/tutorial/autofinetune.md
Comparison of automatic hyperparameter search methods
The table above summarizes the advantages and disadvantages of these search methods. In short, grid search and random search are relatively simple to implement and do not use prior knowledge when choosing the next set of hyperparameters; of the two, random search is the more efficient. Bayesian search and evolutionary algorithms use information from previous rounds to search iteratively, which significantly improves search efficiency.
Implementation of BML automatic hyperparameter search: system architecture
BML’s automatic hyperparameter search is built on Baidu’s self-developed automatic hyperparameter search service. As shown in the figure below, the service runs on Baidu Intelligent Cloud CCE computing power and supports many concurrent automatic search tasks. To make the service genuinely easy to use, the architecture puts particular emphasis on concurrent search efficiency and system fault tolerance.
A hyper parameter search task includes the following processes:
- The business platform submits the user’s configuration for the hyperparameter search task to the hyperparameter search service, which creates a search experiment and records it in the DB.
- The search service submits the task to the experiment controller, which initializes and creates the trial management module and manages the experiment lifecycle.
- A trial is one concrete training run. An experiment produces multiple trials to explore the final results of different hyperparameter combinations. The tuner is the hyperparameter generation module. It recommends the hyperparameter values for the next trial according to the selected search algorithm. Inside the trial management module, the exp manager generates the trials, requests the hyperparameters for each trial from the tuner, and sends the trial task information to the trial scheduler.
- The trial scheduler interacts with the underlying resources to actually start each trial, and manages the lifecycle of all trials.
- After each trial finishes, it reports its metrics and other information to the exp manager, which passes them on to the tuner and records them in the DB.
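The flow above can be sketched as a small in-process program. This is only an illustration of how the roles relate to each other, not Baidu’s service code: the class names, method names and the in-memory list standing in for the DB are all assumptions, and the tuner here simply samples at random.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trial:
    """One concrete training run with a fixed hyperparameter combination."""
    trial_id: int
    params: dict
    status: str = "PENDING"
    metric: Optional[float] = None


class RandomTuner:
    """Hypothetical tuner: recommends the hyperparameters for the next trial."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.observations = []

    def suggest(self):
        return {"learning_rate": 10 ** self.rng.uniform(-5, -1)}

    def report(self, params, metric):
        # A smarter tuner (Bayesian, evolutionary) would use these results.
        self.observations.append((params, metric))


class TrialScheduler:
    """Hypothetical scheduler: actually starts a trial and tracks its status."""

    def __init__(self, train_fn):
        self.train_fn = train_fn

    def run(self, trial):
        trial.status = "RUNNING"
        try:
            trial.metric = self.train_fn(trial.params)
            trial.status = "SUCCEEDED"
        except Exception:
            trial.status = "FAILED"
        return trial


class ExpManager:
    """Hypothetical exp manager: generates trials and collects their results."""

    def __init__(self, tuner, scheduler, max_trials):
        self.tuner = tuner
        self.scheduler = scheduler
        self.max_trials = max_trials
        self.db = []  # in-memory stand-in for the database

    def run(self):
        for trial_id in range(self.max_trials):
            trial = Trial(trial_id, self.tuner.suggest())
            self.scheduler.run(trial)
            if trial.status == "SUCCEEDED":
                self.tuner.report(trial.params, trial.metric)
            self.db.append(trial)
        succeeded = [t for t in self.db if t.status == "SUCCEEDED"]
        return max(succeeded, key=lambda t: t.metric) if succeeded else None


def toy_train(params):
    """Toy stand-in for training: the metric peaks at learning_rate=1e-3."""
    return 1.0 / (1.0 + abs(params["learning_rate"] - 1e-3) * 100)


manager = ExpManager(RandomTuner(), TrialScheduler(toy_train), max_trials=5)
best_trial = manager.run()
print(best_trial)
```

Keeping the tuner separate from the scheduler means the search algorithm can be swapped without touching the code that launches and monitors training runs.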
BML automatic hyperparameter search has the following main features:
1. Easy to use: compared with the complex configuration of similar products, BML keeps hyperparameter configuration as simple as possible while still exposing the configuration items users need. Anything that can be automated is hidden from the user.
2. Rich model selection: the feature is connected to the wide range of models available in script parameter tuning, so a search for the corresponding task can be set up through configuration alone, without writing any code!
3. Fault tolerance: an automatic hyperparameter search task trains the model many times and runs for a long time overall. Because GPU memory is limited, some of the hyperparameter combinations found by the search cannot run successfully. To balance search quality against usable output, BML adds a threshold on the number of failed training runs and splits the complex state management of multiple tasks, such as experiments and trials, across the modules of each layer, so that users get usable search results whenever possible.
4. Early stopping and sampling: the search framework supports automatic early stopping, which ends the search once the configured target result is reached. It also supports manual early stopping from the interface, which reduces waiting time and avoids unnecessary compute consumption. When a large dataset is selected, the data can be sampled automatically to shorten training during the search, so that suitable hyperparameters are found as soon as possible.
5. Efficient distributed intelligent search: deep learning models often take a long time to train. For large datasets or complex models, searching serially on a single machine is practically unusable. In some search algorithms, such as grid search and random search, every trial can be trained independently, so all trials can be parallelized directly. Other search algorithms are iterative by nature, but the trials within one iteration are still independent of each other, so the search can be parallelized within each iteration. BML implements an intelligent scheduling system that applies a different concurrency strategy to each type of algorithm, which greatly reduces the overall search time.
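Two of the ideas above, running the independent trials of one iteration concurrently and stopping once too many trials have failed, can be sketched as follows. This is a minimal single-machine illustration using a thread pool, not BML’s distributed scheduler. The function names, the failure threshold semantics and the toy training function (which fails for large batch sizes to mimic running out of GPU memory) are assumptions.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_batch(train_fn, configs, max_workers):
    """Run independent trials concurrently; a failed trial yields metric None."""

    def safe_train(config):
        try:
            return config, train_fn(config)
        except Exception:
            return config, None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `configs`.
        return list(pool.map(safe_train, configs))


def parallel_search(train_fn, propose_batch, n_iterations, batch_size,
                    max_workers, max_failures):
    """Iterative search: trials inside one iteration run in parallel."""
    results, failures = [], 0
    for _ in range(n_iterations):
        # An iterative algorithm may use earlier results to propose this batch.
        configs = propose_batch(results, batch_size)
        for config, metric in run_batch(train_fn, configs, max_workers):
            if metric is None:
                failures += 1
            else:
                results.append((config, metric))
        if failures > max_failures:
            break  # too many failed trials: stop and return what we have
    return results, failures


_rng = random.Random(0)


def propose_random(results, n):
    """Hypothetical proposer that ignores history (random search)."""
    return [{"batch_size": _rng.choice([16, 32, 64, 128])} for _ in range(n)]


def toy_train(config):
    """Toy stand-in for training; large batches fail like a GPU memory error."""
    if config["batch_size"] > 64:
        raise MemoryError("out of GPU memory")
    return 1.0 - abs(config["batch_size"] - 32) / 100


results, failures = parallel_search(toy_train, propose_random, n_iterations=3,
                                    batch_size=4, max_workers=4, max_failures=5)
print(len(results), "successful trials,", failures, "failed")
```

Successful results are kept even when the failure threshold ends the search early, which mirrors the goal of returning usable results whenever possible.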
Hands-on practice: how to use automatic hyperparameter search
- First, go to https://ai.baidu.com/bml/app/project/script/list and create a script parameter tuning project. If you already have a project, you can use it directly! The project types that currently support hyperparameter search are image classification (single-label and multi-label) and object detection, so create a project of one of these types.
- Create a new task in the project. After configuring the network, data and script for the task, you will see the “configure hyperparameters” option. If hyperparameter search results already exist, you can check “existing hyperparameter search results” to use them. If this is your first time, select “automatic hyperparameter search”.
- BML currently supports three hyperparameter search algorithms, as shown in the figure: Bayesian search, random search and evolutionary algorithms. Select the one that fits your needs. See the technical documentation for a description of each configuration item.
3.1 Parameter description of Bayesian search
[Number of initial points]: the number of parameter points used to initialize Bayesian search. The algorithm infers the optimum from the information at these points. Valid values are 1-20.
[Maximum concurrency]: the number of trials run at the same time in Bayesian search. The higher the concurrency, the more efficient the search. Valid values are 1-20. This concurrency is also limited by the number of GPUs selected at the bottom of the page, so the actual concurrency is the smaller of the two.
[Hyperparameter range setting]: either the default configuration or a manual configuration. In the default configuration, Baidu engineers have already set a reasonably reliable search range for each network and GPU card type, which can be used as is. You can also configure it manually and customize the range of each hyperparameter. Object detection supports custom search ranges for the following hyperparameters:
[Maximum search times]: the maximum number of hyperparameter combinations that will be generated and run as trials. The search may stop earlier if the target is reached, which saves cost.
[Data sampling ratio]: during hyperparameter search, the original dataset is sampled before training to speed up the search. Sampling is not recommended when the dataset is small, because it may hurt the final result. It is only needed when the dataset is large.
[Max mAP / max accuracy]: the mAP (object detection) or accuracy (image classification) that the model is expected to reach. Once a trial reaches this value, the search stops so that no further search time is wasted.
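How these options interact can be expressed in a few lines. The sketch below only restates the rules described above (actual concurrency is the smaller of the configured concurrency and the GPU count, and the search stops on reaching either the target metric or the maximum search times). The function names and the order in which the two stop conditions are checked are assumptions, not BML’s code.

```python
def effective_concurrency(max_concurrency, gpu_count):
    """Actual number of simultaneous trials: the smaller of the two limits."""
    return min(max_concurrency, gpu_count)


def should_stop(completed_trials, best_metric, max_search_times, target_metric):
    """Return (stop, reason) for the search after a trial has finished."""
    if best_metric is not None and best_metric >= target_metric:
        return True, "target metric reached"
    if completed_trials >= max_search_times:
        return True, "maximum search times reached"
    return False, ""


# Example: concurrency 10 requested but only 4 GPUs selected.
print(effective_concurrency(max_concurrency=10, gpu_count=4))
# Example: target mAP of 0.90 reached on the third of at most 20 trials.
print(should_stop(completed_trials=3, best_metric=0.91,
                  max_search_times=20, target_metric=0.90))
```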
3.2 Parameter description of random search
Random search is the simplest option. It needs no additional algorithm-specific parameters. The other common options have the same meaning as in Bayesian search, so refer to the Bayesian search description.
3.3 Parameter description of evolutionary algorithm
The evolutionary algorithm performs well, but it requires more options to be set:
[Number of iteration rounds]: the number of iterations the evolutionary algorithm runs, from 5 to 50.
[Disturbance interval]: the evolutionary algorithm applies a random perturbation every few epochs. The randomness keeps the result from converging to a local optimum.
[Disturbance ratio]: similar to chromosome crossover, the best and worst individuals in a population are crossed according to the disturbance ratio in each iteration.
[Random initialization probability]: during a perturbation, there is a certain probability that an individual’s hyperparameters are re-initialized.
[Population individual quantity]: an individual represents one hyperparameter setting, and a population contains multiple individuals.
The other options have the same meaning as in Bayesian search and are not repeated here. Configuring the evolutionary algorithm requires some understanding of how it works. If you are not familiar with the algorithm, you can simply use the default values provided by Baidu!
- Once the hyperparameter options are set, select the type and number of GPU cards and the maximum search time, and submit the task! The default search time is 24 hours. Hyperparameter search runs many trials, so it takes a long time and requires some patience. The more GPU cards you select, the more trials run concurrently and the less time it takes from submitting the task to completing the search.
- Shortly after the task is submitted, it enters the “hyperparameter search” status. You can then see the progress of each trial, including its status, log and accuracy (mAP).
- After the hyperparameter search training finishes, detailed evaluation results are available for the five best trials, which can also be used for subsequent verification and release. If the data was sampled during the search, you can start a new training task and train on the full dataset with the hyperparameters that gave satisfactory results, to obtain the model performance on the complete data.
Results speak for themselves: hyperparameter search improves results by up to 20%+
We compared ordinary script parameter tuning with hyperparameter search on tasks such as image classification, object detection and instance segmentation. The figure below compares three setups on the BML platform: the default script tuning parameters, hyperparameter search with the evolutionary algorithm, and hyperparameter search with Bayesian search. In the figure, the left vertical axis is model accuracy and the right vertical axis is the improvement ratio achieved by the hyperparameter search algorithm. Hyperparameter search improves the results on every dataset. Even when accuracy with the default parameters already exceeds 85%, hyperparameter search still improves it by about 5%. When the default parameters perform poorly, the gain is larger, reaching 22%.
Usable automatic hyperparameter search for deep learning is usually seen as something only large companies can set up, because it needs cluster computing resources that ordinary developers find hard to get. With Baidu’s full-featured AI development platform BML, developers on a limited budget can use automatic hyperparameter search too, boosting development efficiency to rocket speed and escaping the shackles of manual “alchemy”. BML also currently offers new users 100 hours of free P4 GPU computing power. Come and claim it!
BML official website: https://ai.baidu.com/bml/