# DARTS: A classic gradient-based network search method that opens up end-to-end network search | ICLR 2019

Time: 2021-6-11

DARTS is a classic NAS method that breaks with the earlier discrete search paradigm and allows end-to-end network search. Because DARTS updates the architecture by gradient, the update direction is more accurate, and the search is much faster than with earlier methods: searching on CIFAR-10 takes only 4 GPU days.

Source: Xiaofei’s algorithm Engineering Notes official account

Paper: DARTS: Differentiable Architecture Search

• Paper code: https://github.com/quark0/darts

# Introduction

Most popular neural architecture search methods select among discrete candidate networks, whereas DARTS searches over a continuous search space and uses gradient descent to optimize the architecture according to its performance on the validation set. The paper's main contributions are:

• Building on bilevel optimization, the paper proposes DARTS, a novel gradient-based neural architecture search method that applies to both convolutional and recurrent architectures.
• Experiments show that gradient-based architecture search is highly competitive on the CIFAR-10 and PTB datasets.
• The search is very efficient and needs only a small number of GPU days, mainly thanks to gradient-based optimization.
• Architectures learned by DARTS on CIFAR-10 and PTB can be transferred to ImageNet and WikiText-2.
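
For reference, the bilevel optimization named in the first point is commonly written as shown here. The symbols are assumptions that this article does not define: $\alpha$ stands for the continuous architecture parameters, $w$ for the network weights, and $\mathcal{L}_{train}$ and $\mathcal{L}_{val}$ for the training and validation losses.

$$\min_{\alpha} \; \mathcal{L}_{val}\big(w^{*}(\alpha), \alpha\big) \quad \text{s.t.} \quad w^{*}(\alpha) = \arg\min_{w} \; \mathcal{L}_{train}(w, \alpha)$$

The upper level chooses the architecture by validation loss, while the lower level trains the weights for that architecture on the training set.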

# Differentiable Architecture Search

### Search Space

The overall search framework of DARTS follows NASNet: it searches for a cell as the building block of the network and then stacks cells into a convolutional network or a recurrent network. A cell is a directed acyclic graph consisting of an ordered sequence of $N$ nodes. Each node $x^{(i)}$ represents an intermediate representation (such as a feature map in a convolutional network), and each edge $(i, j)$ represents an operation $o^{(i,j)}$ applied to $x^{(i)}$. Each cell has two inputs and one output. For a convolutional cell, the inputs are the outputs of the previous two cells. For a recurrent cell, the inputs are the input at the current step and the state from the previous step. In both cases, the cell output is obtained by merging the outputs of all intermediate nodes (for example, by concatenation). Each intermediate node is computed from all of its predecessors:

$$x^{(j)} = \sum_{i<j} o^{(i,j)}\big(x^{(i)}\big)$$
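
The node computation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: scalars stand in for feature maps, and the function name `cell_forward` and the toy operations `zero`, `identity`, and `double` are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

Op = Callable[[float], float]


def cell_forward(inputs: List[float], num_intermediate: int,
                 edge_ops: Dict[Tuple[int, int], Op]) -> List[float]:
    """Run one cell and return the outputs of all intermediate nodes."""
    nodes = list(inputs)  # the first nodes are the cell inputs
    for j in range(len(inputs), len(inputs) + num_intermediate):
        # Node j sums the edge operations applied to every earlier node i < j.
        nodes.append(sum(edge_ops[(i, j)](nodes[i]) for i in range(j)))
    # The cell output merges all intermediate nodes; a real cell would
    # concatenate feature maps, here we simply return them as a list.
    return nodes[len(inputs):]


# Toy operations (assumptions for illustration only).
def zero(x: float) -> float:
    return 0.0  # "no connection" between two nodes


def identity(x: float) -> float:
    return x


def double(x: float) -> float:
    return 2.0 * x


edge_ops = {
    (0, 2): identity, (1, 2): double,
    (0, 3): zero, (1, 3): identity, (2, 3): double,
}
out = cell_forward([1.0, 2.0], 2, edge_ops)  # out == [5.0, 12.0]
```

Node 2 receives `identity(1.0) + double(2.0)`, and node 3 receives `zero(1.0) + identity(2.0) + double(5.0)`; the `zero` edge contributes nothing, which is how a missing connection is expressed.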

A special zero operation is included to indicate that there is no connection between two nodes. DARTS thereby turns the task of learning a cell into the task of learning the operations on its edges. Since the overall search framework is the same as in NASNet and similar methods, this article focuses on how DARTS performs gradient-based search.

### Continuous Relaxation and Optimization

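Let $\mathcal{O}$ be the set of candidate operations, where each operation represents a function $o(\cdot)$ applied to $x^{(i)}$. To make the search space continuous, the original discrete choice of operation is relaxed into a softmax-weighted sum over all candidate operations:

$$\bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}} \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)} \, o(x)$$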
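
As a rough illustration of this relaxation, the sketch below mixes a few toy operations with softmax weights. It uses plain Python scalars instead of tensors, and the names `softmax`, `mixed_op`, `ops`, and `alphas` are assumptions made for this example rather than identifiers from the DARTS code.

```python
import math
from typing import Callable, List


def softmax(alphas: List[float]) -> List[float]:
    """Turn raw architecture weights into probabilities that sum to 1."""
    m = max(alphas)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x: float, ops: List[Callable[[float], float]],
             alphas: List[float]) -> float:
    """Softmax-weighted sum of every candidate operation applied to x."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops))


# Three toy candidates: zero, identity, and doubling (illustrative only).
ops = [lambda x: 0.0, lambda x: x, lambda x: 2.0 * x]
alphas = [0.0, 0.0, 0.0]  # equal weights, so each operation gets 1/3
y = mixed_op(3.0, ops, alphas)  # (0 + 3 + 6) / 3, about 3.0
```

Because the output is a smooth function of `alphas`, the architecture weights can be updated by gradient descent just like ordinary network weights.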

The mixing weights of the operations on edge $(i, j)$ are expressed as a vector $\alpha^{(i,j)}$ of dimension $|\mathcal{O}|$, so the whole architecture search reduces to learning the continuous variables $\alpha = \{\alpha^{(i,j)}\}$, as shown in Figure 1. At the end of the search, each mixed operation $\bar{o}^{(i,j)}$ is replaced by the most likely operation $o^{(i,j)} = \arg\max_{o \in \mathcal{O}} \alpha_o^{(i,j)}$ to construct the final network.
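
The final selection step can be sketched as a per-edge argmax over the learned weights. This is a simplified illustration only: the function name `discretize`, the operation names in `OP_NAMES`, and the example weights are hypothetical placeholders, not values from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical candidate operation names, used only for illustration.
OP_NAMES = ["zero", "skip_connect", "sep_conv_3x3", "max_pool_3x3"]


def discretize(alpha: Dict[Tuple[int, int], List[float]],
               op_names: List[str]) -> Dict[Tuple[int, int], str]:
    """Pick, for every edge, the operation with the largest weight.

    Softmax is monotonic, so the argmax over the raw weights equals the
    argmax over the softmax probabilities.
    """
    chosen = {}
    for edge, weights in alpha.items():
        best = max(range(len(weights)), key=lambda k: weights[k])
        chosen[edge] = op_names[best]
    return chosen


# Made-up architecture weights for two edges into node 2.
alpha = {
    (0, 2): [0.1, 1.5, 0.3, -0.2],
    (1, 2): [0.0, -1.0, 2.2, 0.4],
}
final = discretize(alpha, OP_NAMES)
# final == {(0, 2): "skip_connect", (1, 2): "sep_conv_3x3"}
```

The sketch covers only the argmax rule stated above; any additional pruning applied when building the final network is outside its scope.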

# Experiments and Results

Search cost comparison, where the reported run is the best result of multiple searches.

Architectures found by the search.

Performance comparison on CIFAR-10.

Performance comparison on PTB.

Performance comparison when transferred to ImageNet.

# Conclusion

