ICLR 2021 | Meituan AutoML paper: robust neural architecture search with DARTS-

Time: 2022-01-11

Background

Meituan's growing consumer-facing and merchant-facing businesses have broad and strong demands for artificial intelligence (AI) technology. On the consumer side, beyond food delivery, Meituan covers more than 200 life-service scenarios such as in-store dining and hotel and travel, all of which rely on AI to improve the user experience. On the merchant side, AI helps merchants improve efficiency and understand their operations: for example, fine-grained analysis of user reviews can profile the current state of a merchant's service, benchmark its competitiveness, and provide business-district insights, so as to give merchants refined operating suggestions.

At present, the R&D areas of Meituan AI include natural language understanding, knowledge graphs, search, speech recognition, speech generation, face recognition, character recognition, video understanding, image editing, AR, environment prediction, behavior planning, motion control, and more. Two key ingredients for landing AI in these scenarios are large-scale data and advanced deep learning models, and the design and iterative updating of high-quality models remain a pain point of AI development; automation is urgently needed to assist and improve production efficiency. The technology born in this setting is automated machine learning (AutoML). AutoML is regarded as the future of model design: it can free AI algorithm engineers from the tedious trial and error of manual design.

Google formally proposed neural architecture search (NAS) in 2017 [1] to generate model architectures automatically. The technique has been highly anticipated by the industry and has become a core component of AutoML. With growing compute and continuously iterated NAS algorithms, NAS has produced a series of far-reaching vision architectures such as EfficientNet and MobileNetV3, and has also been applied in many directions across vision, NLP, speech, and other fields [2,3]. As "AI that generates AI models", NAS is of great significance; Meituan has carried out in-depth research and maintained active exploration in this direction.

This article introduces DARTS-, a joint work between Meituan and Shanghai Jiao Tong University [4] that will be published at ICLR 2021. ICLR (International Conference on Learning Representations) was founded in 2013 by deep learning pioneers and Turing Award winners Yoshua Bengio and Yann LeCun. Although ICLR has been established for only seven years, it is widely recognized by the academic community and is regarded as one of the top conferences in deep learning. ICLR's h5-index is 203, ranking 17th among all scientific publications and surpassing NeurIPS, ICCV, and ICML. This year ICLR received 2,997 submissions and accepted 860 papers, including 53 orals (acceptance rate 6%), 114 spotlights, and 693 posters, for an overall acceptance rate of 28.7%.

Introduction to neural network architecture search

The main task of neural architecture search (NAS) is to find the optimal model within limited time and resources. NAS consists of three parts: the search space, the search algorithm, and model evaluation. NAS was first validated on visual classification tasks, where the common search spaces fall into two types: cell-based and block-based. The former features a rich graph structure, with identical cells stacked in series to form the final network; the latter is a plain chain-style structure, where the search focuses on choosing the building block at each layer.

Classified by search algorithm, NAS mainly includes reinforcement learning (RL), evolutionary algorithms (EA), and gradient-based optimization. RL methods obtain feedback by generating and evaluating models, adjust the generation policy according to the feedback to produce new models, and repeat this loop until the optimum is reached. EA methods encode the model structure into "genes" that can be crossed over and mutated, producing a new generation of genes through genetic operators until the best is found. The advantage of EA is that it can handle multiple objectives: the quality of a model is measured along several dimensions such as parameter count, inference latency, and accuracy, and EA is well suited to exploring and evolving along all of them. However, both RL and EA are time-consuming, mainly limited by the model-evaluation stage, which usually trains every candidate from scratch on a small budget. The more recent one-shot line of work trains a single supernetwork containing all sub-structures and uses it to evaluate all subnetworks, which greatly improves NAS efficiency. In the same period, the gradient-based DARTS method proved even more efficient and has become the mainstream choice among NAS methods.

DARTS, short for Differentiable Architecture Search, was proposed by Hanxiao Liu, a researcher at Carnegie Mellon University (CMU) [5]; it greatly improves search efficiency and is widely recognized by the industry. The differentiable method is based on gradient optimization. It first defines a substructure (cell) as a directed acyclic graph (DAG) with four intermediate nodes (gray boxes in Figure 1 below); each edge offers several candidate operators (represented by edges of different colors), whose outputs are summed with softmax weights and fed to the next node. Stacking such cells forms the backbone of the network. DARTS regards search as optimizing this stacked backbone (also called a supernetwork, or over-parameterized network): each candidate on an edge is given a structural weight, and the structural weights and network weights are updated by gradient descent in an interleaved fashion. After optimization, the operator with the dominant structural weight on each edge (drawn with thick lines) is kept to form the final subnetwork, which is the search result (Figure 1(d) shows the final cell structure). This step (from Figure 1(c) to 1(d)) rigidly truncates continuous structural weights into discrete values, e.g. 0.2 to 1 and 0.02 to 0, which produces the so-called discretization gap.

[Figure 1: The DARTS search process, from the over-parameterized cell (a DAG with candidate operators on every edge) to the discretized final cell in (d)]
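To make the continuous relaxation and the discretization step concrete, here is a minimal PyTorch sketch of a mixed edge (our illustration, not the authors' code; the candidate operator set and channel handling are simplified assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedEdge(nn.Module):
    """One edge of a DARTS cell: a softmax-weighted sum of candidate operators."""
    def __init__(self, channels: int):
        super().__init__()
        # Hypothetical candidate set; the real DARTS space has more operators.
        self.ops = nn.ModuleList([
            nn.Identity(),                                          # skip connection
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.MaxPool2d(3, stride=1, padding=1),
        ])
        # One learnable structural weight (alpha) per candidate operator.
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)        # continuous relaxation
        return sum(w * op(x) for w, op in zip(weights, self.ops))

    def discretize(self):
        # Hard truncation from continuous weights to a single operator,
        # which is the source of the "discretization gap" described above.
        return self.ops[int(self.alpha.argmax())]
```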

Difficulties in neural network architecture search

The main difficulties that current neural architecture search still has to solve can be briefly summarized as follows:

  • Efficiency of the search process: the compute and time consumed by the search algorithm must stay within an acceptable range, so that it can be widely used in practice and directly support architecture search on business datasets;
  • Effectiveness of the search results: the searched model should perform well on multiple datasets and generalize and transfer across domains; for example, a searched classification backbone should transfer well to detection and segmentation tasks and perform well there;
  • Robustness of the search results: besides being effective, the results of repeated searches should be relatively stable, that is, the reliability of search should be high and the cost of trial and error low.

Shortcomings and improvements of differentiable methods

The main deficiency of differentiable architecture search is poor robustness: it is prone to performance collapse, in which the supernetwork performs very well during search but the derived subnetwork contains a large number of skip connections that severely weaken the final model. Many improvements have been built on DARTS, such as Progressive DARTS [6], Fair DARTS [7], RobustDARTS [8], and Smooth DARTS [9]. Among them, RobustDARTS, a full-score paper at ICLR 2020, proposes using the Hessian eigenvalue as an indicator of DARTS performance collapse, but computing the eigenvalue is very time-consuming; moreover, in the standard DARTS search space, the model found by RobustDARTS on CIFAR-10 is not outstanding. This made us think about how to improve robustness and effectiveness at the same time. The industry has offered different analyses and solutions to these two problems, the representative ones being Fair DARTS (ECCV 2020), RobustDARTS (ICLR 2020), and Smooth DARTS (ICML 2020).

Fair DARTS observed the prevalence of skip connections and analyzed their likely cause. It argues that during differentiable optimization the skip connection enjoys an unfair advantage in a competitive environment, which allows it to win the competition easily. FairDARTS therefore relaxes the competitive environment (softmax-weighted sum) into a cooperative one (sigmoid-weighted sum), so that the unfair advantage no longer takes effect. Its final operator selection also differs from DARTS: by thresholding, e.g. keeping operators whose structural weight exceeds 0.8, the skip connection can coexist with other operators. However, this effectively enlarges the search space, since in the original DARTS subnetwork only one operator is ultimately kept on each edge between two nodes. (A small sketch of the two relaxations is shown below.)
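As a rough sketch of the difference (our simplification, not the official FairDARTS code), the competitive softmax sum is replaced by independent sigmoid gates, and selection is done by thresholding:

```python
import torch

def darts_edge_output(alpha, op_outputs):
    """DARTS: operators compete through a softmax over alpha."""
    weights = torch.softmax(alpha, dim=0)
    return sum(w * o for w, o in zip(weights, op_outputs))

def fairdarts_edge_output(alpha, op_outputs):
    """FairDARTS: each operator has an independent sigmoid gate (cooperation)."""
    gates = torch.sigmoid(alpha)
    return sum(g * o for g, o in zip(gates, op_outputs))

def fairdarts_select(alpha, threshold=0.8):
    # Threshold truncation: several operators (skip included) may be kept at once.
    return [i for i, g in enumerate(torch.sigmoid(alpha).tolist()) if g > threshold]
```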

RobustDARTS (R-DARTS for short) judges whether collapse occurs during optimization by computing the Hessian eigenvalue. The paper argues that the loss landscape contains sharp local minima (the right-hand point in Figure 5(a)), and that discretization (from α* to α^disc) shifts the solution away from a sharp but well-optimized point toward a poorly optimized one, degrading the final model. R-DARTS found that this process is closely correlated with the Hessian eigenvalue (Figure 5(b)). It therefore suggests stopping the optimization when the Hessian eigenvalue changes too sharply, or using regularization to avoid large changes of the eigenvalue.

[Figure 5: (a) sharp local minima in the loss landscape and the shift caused by discretization; (b) correlation with the Hessian eigenvalue, from R-DARTS]
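For reference, the dominant Hessian eigenvalue that R-DARTS monitors can be estimated with Hessian-vector products and power iteration; below is a minimal sketch (our illustration, assuming the architecture parameters form a single tensor `alpha` and that `val_loss` was computed from them):

```python
import torch

def dominant_hessian_eigenvalue(val_loss, alpha, iters: int = 20):
    """Estimate the largest eigenvalue of d^2(val_loss)/d(alpha)^2 by power iteration."""
    grad = torch.autograd.grad(val_loss, alpha, create_graph=True)[0]
    v = torch.randn_like(alpha)
    v = v / v.norm()
    eig = torch.tensor(0.0)
    for _ in range(iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. alpha.
        hv = torch.autograd.grad((grad * v).sum(), alpha, retain_graph=True)[0]
        eig = (v * hv).sum()                 # Rayleigh quotient estimate
        v = hv / (hv.norm() + 1e-12)
    return eig.item()
```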

Smooth DARTS (SDARTS) follows the same criterion as R-DARTS but adopts perturbation-based regularization, which implicitly constrains the Hessian eigenvalue. Specifically, SDARTS applies a certain amount of random perturbation to the structural weights, making the supernetwork more robust to interference and smoothing the loss landscape.
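A minimal sketch of this random-smoothing idea (our illustration of an SDARTS-RS-style step; the perturbation scale and parameter layout are assumptions): before each supernetwork weight update, the structural weights are perturbed by random noise and restored afterwards.

```python
import torch

def perturbed_weight_step(supernet, alpha, w_optimizer, criterion, inputs, targets,
                          epsilon: float = 1e-2):
    noise = epsilon * torch.randn_like(alpha)
    with torch.no_grad():
        alpha.add_(noise)                   # perturb the structural weights
    loss = criterion(supernet(inputs), targets)
    w_optimizer.zero_grad()
    loss.backward()
    w_optimizer.step()                      # train w to be insensitive to the perturbation
    with torch.no_grad():
        alpha.sub_(noise)                   # restore the original alpha
```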

DARTS-

Analysis of the working mechanism of skip connections

We first analyze the performance-collapse phenomenon from the working mechanism of skip connections. ResNet [11] introduced the skip connection so that, during back-propagation, shallow layers always receive a gradient component coming directly from deep layers, which alleviates vanishing gradients. This is captured by the following formula (i, j, k are layer indices, x is the input, W are the weights, and F is the residual computation unit):

x_j = x_i + Σ_{k=i}^{j-1} F(x_k, W_k),    ∂L/∂x_i = ∂L/∂x_j · ( 1 + ∂( Σ_{k=i}^{j-1} F(x_k, W_k) ) / ∂x_i )

To clarify the effect of the skip connection on the performance of a residual network, we ran a group of confirmatory experiments on ResNet in which a learnable structural weight β is attached to the skip connection, so that each block computes x_{k+1} = β·x_k + F(x_k, W_k) and the gradient calculation becomes:

∂L/∂x_i = ∂L/∂x_j · ( β^{j-i} + ∂( Σ_{k=i}^{j-1} β^{j-1-k} F(x_k, W_k) ) / ∂x_i )

In three experiments we initialized β to 0, 0.5, and 1.0 respectively, and found that β always grows rapidly toward 1 (Figure 2), increasing the flow of gradient from deep layers to shallow layers and thus alleviating gradient vanishing.

[Figure 2: Trajectories of the learnable skip-connection weight β on ResNet under different initializations, all converging toward 1]
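A sketch of the kind of confirmatory experiment described above (our reconstruction; the block design and logging are assumptions): a residual block whose identity path is scaled by a learnable scalar β, initialized to 0, 0.5, or 1.0, and monitored during training.

```python
import torch
import torch.nn as nn

class ScaledSkipBlock(nn.Module):
    """Residual block with a learnable weight beta on its skip connection."""
    def __init__(self, channels: int, beta_init: float = 0.5):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.beta = nn.Parameter(torch.tensor(beta_init))

    def forward(self, x):
        # beta-scaled identity path plus the residual branch F(x)
        return self.beta * x + self.f(x)

# During training, logging block.beta.item() each epoch shows it drifting toward 1.
```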

In DARTS the skip connection behaves similarly to the one in ResNet: when it carries a learnable parameter, its structural weight shows the same tendency to grow, which helps the training of the supernetwork. The problem, however, as Fair DARTS [7] pointed out, is that this gives the skip connection an unfair advantage over the other operators.

Solution to collapse: adding an auxiliary skip connection

Based on the above analysis, DARTS- points out that the skip connection (the skip operator in Figure 1 below) plays a dual role:

  • As a candidate operator in its own right, it participates in building the subnetwork.
  • Together with the other operators it forms a residual structure, which promotes the optimization of the supernetwork.

The first role is the one we expect it to play, competing fairly with the other operators. The second role is the source of its unfair advantage: it helps the optimization, but it interferes with our inference of the final search result.

To separate out the second role, we propose adding an extra auxiliary skip connection whose structural weight β decays from 1 to 0 (linear decay is used for simplicity), which keeps the structure of the supernetwork and the subnetwork consistent by the end of search. Figure 1(b) shows the resulting connection between two nodes in a cell; the corresponding edge output is written out below.

[Figure 1(b): An edge in the DARTS- supernetwork: the candidate operators plus an auxiliary skip connection weighted by a decaying coefficient β]
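Written out, the edge output with the auxiliary skip connection takes roughly the following form (a sketch of the formulation described above; the linear schedule is the simple choice mentioned in the text, with t the current epoch and T the total number of search epochs):

```latex
\bar{o}^{(i,j)}(x) \;=\; \beta_t \, x \;+\; \sum_{o \in \mathcal{O}}
\frac{\exp\!\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\!\big(\alpha_{o'}^{(i,j)}\big)} \, o(x),
\qquad \beta_t = 1 - \frac{t}{T}
```

Here the candidate set O still contains the original skip connection, so once β has decayed to 0 the supernetwork degenerates to the standard DARTS formulation.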

Apart from the extra auxiliary skip connection, the optimization process of DARTS- is similar to that of DARTS: first a supernetwork is built as in Figure 1(b) and a decay schedule for β is chosen, then the supernetwork weights w and the structural weights α are optimized by alternating iterations. See Algorithm 1 below for details.

[Algorithm 1: The DARTS- search procedure with a decaying auxiliary skip connection]
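A high-level sketch of this alternating optimization (our reconstruction of the procedure outlined in Algorithm 1; `supernet.beta`, `derive_subnet`, and the optimizer split are hypothetical names, not the released code):

```python
import torch

def darts_minus_search(supernet, w_optimizer, alpha_optimizer, train_loader,
                       val_loader, criterion, epochs: int):
    for epoch in range(epochs):
        # Linear decay of the auxiliary skip weight from 1 to 0 over the search.
        supernet.beta = 1.0 - epoch / epochs
        for (x_tr, y_tr), (x_val, y_val) in zip(train_loader, val_loader):
            # Step 1: update the structural weights alpha on validation data.
            alpha_optimizer.zero_grad()
            criterion(supernet(x_val), y_val).backward()
            alpha_optimizer.step()
            # Step 2: update the supernetwork weights w on training data.
            w_optimizer.zero_grad()
            criterion(supernet(x_tr), y_tr).backward()
            w_optimizer.step()
    # After search, keep the strongest operator per edge (argmax over alpha).
    return supernet.derive_subnet()
```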

With this method we discard the practice of detecting performance collapse through indicators, such as the Hessian eigenvalue in R-DARTS, while still eliminating the collapse of DARTS, hence the name DARTS-. In addition, according to the convergence theory of PR-DARTS [12], the auxiliary skip connection balances the competition among operators, and once β has decayed, fair competition among the operators is preserved.

Analysis and validation

Trend of the Hessian eigenvalue

In the search spaces used by R-DARTS and DARTS, DARTS- found subnetworks whose performance improves (Figure 4(b)) even though the Hessian eigenvalue changes sharply (Figure 4(a)). This result is a counterexample to the criterion proposed by R-DARTS: if we adopted the R-DARTS stopping criterion, we would miss some good models. It also shows that DARTS- can yield model structures different from those of R-DARTS.

[Figure 4: (a) the Hessian eigenvalue varies sharply during the DARTS- search, while (b) the subnetwork accuracy keeps improving]

Validation accuracy landscape

The accuracy landscape on the validation set reflects, to some extent, how easy the model is to optimize. The landscape of DARTS near the optimum (Figure 3(a)) is relatively steep, with dense and uneven contours, whereas that of DARTS- is gentle and smooth with more uniform contours. A smoother landscape is less likely to contain sharp local optima, which reduces the discretization gap to a certain extent.

[Figure 3: Validation-accuracy landscapes around the optimum for (a) DARTS and (b) DARTS-]

Experimental results

Model structures

Figure 9 shows the network structures found in the DARTS search space S0 and in the RobustDARTS search spaces S1-S4. Figure 10 shows the result of searching directly on the ImageNet dataset in the MobileNetV2-like search space.

[Figure 9: Cells found by DARTS- in search spaces S0 and S1-S4]

[Figure 10: Architecture found by DARTS- when searching directly on ImageNet in the MobileNetV2-like search space]

Classification task results

DARTS- achieves industry-leading results on the standard classification datasets CIFAR-10 and ImageNet, as shown in the following table:

[Table: Comparison of DARTS- with other NAS methods on CIFAR-10 and ImageNet]

In the search spaces S1-S4 proposed by RobustDARTS to test robustness, the models found by DARTS- outperform those of R-DARTS and SDARTS.

[Table: Results of DARTS- in the RobustDARTS search spaces S1-S4]

NAS algorithm evaluation

NAS-Bench-201 [10] is one of the benchmarks used to compare NAS algorithms. DARTS- also achieves better results than other NAS algorithms on it, and its best result is close to the best model in the benchmark.

[Table: Results on NAS-Bench-201]

Transferability

Used as the backbone network, DARTS- also outperforms previous NAS models on the COCO object detection task, reaching 32.5% mAP.

[Table: Object detection results on COCO with DARTS- as the backbone]

Overall, DARTS- inherits the efficiency of DARTS and demonstrates its robustness and effectiveness on standard datasets, on the NAS benchmark, and in the R-DARTS search spaces. It also shows domain-transfer ability on detection tasks, which confirms the merit of the search method itself, addresses some of the open problems in neural architecture search, and will play a positive role in advancing NAS research and applications.

Summary and outlook

DARTS-, Meituan's paper accepted by ICLR 2021, re-examines why DARTS search results are not robust, analyzes the dual role of the skip connection, and proposes separating the two roles by adding an auxiliary skip connection with a decaying coefficient, so that the original skip connection inside the cell acts only as a candidate operator. We also analyze in depth the eigenvalue indicator that R-DARTS relies on, and find counterexamples to its use as a sign of performance collapse. Going forward, DARTS-, as an efficient, robust, and general search method, is expected to be extended to more tasks and applied in other fields. For more details, please refer to the original paper. The experimental code has been open-sourced on GitHub.

AutoML technology can be applied to computer vision, speech, NLP, search and recommendation, and other fields. The AutoML algorithm team of the Visual Intelligence Center aims to empower the company's business and accelerate the deployment of algorithms through AutoML. A patent application has been filed for this work, and the algorithm has been integrated into Meituan's automated vision platform to speed up the production and iteration of models. Beyond vision scenarios, we will also explore applications in business scenarios such as search and recommendation, autonomous vehicles, optimization, and speech.

About the authors

Xiangxiang, Xiaoxing, Zhang Bo, and Xiaolin are all from the Visual Intelligence Center of Meituan.

References

  1. Learning Transferable Architectures for Scalable Image Recognition, https://arxiv.org/abs/1707.07012.
  2. NAS-FPN: Learning scalable feature pyramid architecture for object detection, https://arxiv.org/abs/1904.07392.
  3. Auto-deeplab: Hierarchical neural architecture search for semantic image segmentation, https://arxiv.org/abs/1901.02985.
  4. DARTS-: Robustly Stepping out of Performance Collapse Without Indicators, https://openreview.net/forum?id=KLH36ELmwIB.
  5. DARTS: Differentiable Architecture Search, https://arxiv.org/pdf/1806.09055.pdf.
  6. Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation, https://arxiv.org/pdf/1904.12760.
  7. Fair DARTS: Eliminating Unfair Advantages in Differentiable Architecture Search, https://arxiv.org/pdf/1911.12126.pdf.
  8. Understanding and Robustifying Differentiable Architecture Search, https://openreview.net/pdf?id=H1gDNyrKDS.
  9. Stabilizing Differentiable Architecture Search via Perturbation-based Regularization, https://arxiv.org/abs/2002.05283.
  10. NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search, https://openreview.net/forum?…
  11. Deep Residual Learning for Image Recognition, https://arxiv.org/abs/1512.03385.
  12. Theory-inspired path-regularized differential network architecture search, https://arxiv.org/abs/2006.16537.


This article is produced by Meituan's technical team and the copyright belongs to Meituan. You are welcome to reprint or use the content of this article for non-commercial purposes such as sharing and communication, provided you indicate "content reproduced from the Meituan technical team". This article may not be reproduced or used commercially without permission. For any commercial use, please email [email protected] to apply for authorization.