The idea of NetAdapt is ingenious and effective: it splits the overall optimization target into a series of smaller targets and brings actual, measured metrics into the optimization loop. It can automatically generate a family of platform-aware simplified networks that combine fast search with good accuracy and latency.
Source: Xiaofei's Algorithm Engineering Notes WeChat official account
Paper: NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications
Address: https://arxiv.org/abs/1804.03230
Paper code: https://github.com/denru01/netadapt
Introduction
Lightweight networks are mainly obtained in two ways, network structure optimization and quantization, but neither guarantees that a network will perform well across different devices. Moreover, current methods are mostly guided by indirect metrics (computation or parameter counts), which often diverge from the actual on-device measurements.
This paper therefore proposes NetAdapt, a platform-aware automated network simplification method. Its logic is shown in Figure 1: a network that meets the expected resource budget is obtained gradually through iterative optimization. NetAdapt brings direct resource metrics into the optimization process, supports multiple resource constraints simultaneously, and can quickly search for a platform-specific simplified network.
Methodology: NetAdapt
Problem Formulation
The main goal of NetAdapt is to solve the following non-convex constrained optimization problem:

$$\max_{Net}\; Acc(Net) \quad \text{subject to} \quad Res_j(Net) \le Bud_j,\; j = 1,\dots,m$$

where $Net$ is a simplified network derived from the initial pretrained network, $Acc(\cdot)$ computes accuracy, $Res_j(\cdot)$ computes the consumption of resource $j$, and $Bud_j$ is the budget for resource $j$, which serves as the optimization constraint and can be latency, energy, memory, or another resource.
NetAdapt divides the above optimization objective into a series of smaller objectives that are solved iteratively:

$$\max_{Net_i}\; Acc(Net_i) \quad \text{subject to} \quad Res_j(Net_i) \le Res_j(Net_{i-1}) - \Delta R_{i,j},\; j = 1,\dots,m$$

$Net_i$ is the highest-accuracy network produced by iteration $i$, and $Net_0$ is the initial pretrained model. As the iterations proceed, the network's resource consumption keeps shrinking. $\Delta R_{i,j}$ specifies how much of resource $j$ is removed in iteration $i$; the overall idea is similar to learning-rate scheduling. When $Res_j(Net_i) \le Bud_j$ holds for all resources $j$, the algorithm stops, outputs the best network from each iteration, and a suitable network is selected from among them.
Algorithm Overview
Suppose the only optimization target is latency; it can be reduced by removing filters from convolutional or fully-connected layers. The algorithm logic of NetAdapt is shown in Algorithm 1.
Figure 2 shows the details of a single iteration. Layer by layer (or network unit by unit), NetAdapt selects how many filters to keep (Choose # of Filters) and which filters to keep (Choose Which Filters). The filter count is chosen based on empirical estimation (discussed later). Note that entire filters are removed rather than individual weights; for example, a layer's $512$ filters of size $3\times 3$ might be cut down to $256$ filters of size $3\times 3$. When a filter is removed, its corresponding output feature map is removed as well. Optimizing each layer produces a candidate simplified network, which then undergoes a short-term fine-tune to recover accuracy.
After the above steps, NetAdapt generates $K$ simplified networks in a single iteration, and the one with the highest accuracy is selected as the starting network for the next iteration (Pick Highest Accuracy). Once the current iteration's network meets the resource budget, the optimization loop exits, and the best networks produced across the iterations are fine-tuned until convergence (Long-Term Fine-Tune).
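The outer loop above can be sketched as a toy program. This is only an illustration under heavy simplifying assumptions: a "network" is just a list of per-layer filter counts, the resource model is the total filter count, and `accuracy` is a synthetic stand-in (real NetAdapt fine-tunes and evaluates each candidate). All function names here are hypothetical, not from the paper's code.

```python
def resource(net):
    """Stand-in resource model: total number of filters."""
    return sum(net)

def accuracy(net):
    """Synthetic accuracy proxy with diminishing returns per layer."""
    return sum(f ** 0.5 for f in net)

def simplify_layer(net, k, target_res):
    """Shrink layer k just enough so the whole net meets target_res."""
    others = resource(net) - net[k]
    new_f = target_res - others
    if new_f < 1:            # this layer alone cannot meet the constraint
        return None
    new_net = list(net)
    new_net[k] = min(net[k], new_f)
    return new_net

def netadapt(net, budget, delta_r):
    # Each iteration tightens the constraint by delta_r, generates one
    # candidate per layer, and keeps the highest-accuracy candidate.
    while resource(net) > budget:
        target = resource(net) - delta_r
        candidates = [simplify_layer(net, k, target) for k in range(len(net))]
        candidates = [c for c in candidates if c is not None]
        if not candidates:
            break
        net = max(candidates, key=accuracy)  # Pick Highest Accuracy
    return net

small = netadapt([64, 128, 256], budget=300, delta_r=50)
```

The proxy accuracy's diminishing returns make the loop prefer shrinking the widest layer first, which mirrors the intuition that over-provisioned layers are the cheapest place to save resources.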
Algorithm Details

Choose Number of Filters
The number of filters to keep in the current layer is determined by empirical estimation: the filter count is gradually reduced, the resource consumption of each simplified network is estimated, and the largest filter count that satisfies the current resource constraint is selected. Note that reducing the filter count of the current layer also changes the related dimensions of the following layer, which must be accounted for in the resource estimate as well.
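A minimal sketch of this choice, assuming a crude cost model in which a layer's cost is `in_channels * out_filters` (a MAC-count proxy). The point it illustrates is the dimension coupling: shrinking layer `k` also shrinks layer `k+1`'s input channels, so the total cost must be recomputed for the whole trial network. The names are hypothetical.

```python
def total_cost(filters, in_ch=3):
    """Crude cost model: sum of in_channels * out_filters over layers."""
    cost = 0
    for f in filters:
        cost += in_ch * f
        in_ch = f              # next layer's input channels
    return cost

def max_filters_meeting_budget(filters, k, budget):
    """Largest filter count for layer k whose total cost fits the budget."""
    for f in range(filters[k], 0, -1):   # gradually reduce the filter count
        trial = list(filters)
        trial[k] = f
        if total_cost(trial) <= budget:
            return f
    return None
```

For a two-layer net `[32, 64]`, shrinking layer 0 to `f` filters gives a cost of `3*f + f*64`, so both the layer's own cost and the next layer's cost shrink together.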

Choose Which Filters
There are many ways to choose which filters to keep. This paper uses a simple magnitude-based method: keep the $N$ filters with the largest L2 norms, where $N$ is the filter count determined in the previous step.
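The magnitude-based selection can be sketched in a few lines of NumPy. `weights` is assumed to have the usual `(out_filters, in_channels, kH, kW)` layout; the helper name is ours, not the paper's.

```python
import numpy as np

def keep_top_filters(weights, n):
    """Keep the n filters with the largest L2 norm; return them and their indices."""
    # Flatten each filter and compute its L2 norm.
    norms = np.linalg.norm(weights.reshape(weights.shape[0], -1), axis=1)
    # Indices of the n largest norms, restored to their original order.
    keep = np.sort(np.argsort(-norms)[:n])
    return weights[keep], keep

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4, 3, 3))      # 8 filters of shape 4x3x3
pruned, kept = keep_top_filters(w, 4)  # keep the 4 largest-norm filters
```

In a real network the same `kept` index set would also be used to slice the input channels of the following layer's weights, matching the dimension adjustment described above.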

Short-/Long-Term Fine-Tune
In each iteration of NetAdapt, a short-term fine-tune with a relatively small number of steps is applied to recover accuracy, which is very important for heavily simplified networks. Without it, a network's accuracy may drop to zero, causing the algorithm to select the wrong candidate. As the algorithm proceeds, the networks keep being trained but never to convergence, so after the final series of adapted networks is obtained, a long-term fine-tune until convergence is performed as the last step.
Fast Resource Consumption Estimation
During adaptation, the resource consumption of each simplified network must be measured, which may be very slow and, because devices are limited, hard to parallelize; this can become the bottleneck of the algorithm.
To solve this, the paper builds layer-wise lookup tables, which provide the empirical estimation mentioned above. Each table pre-computes the resource consumption of the corresponding layer under different input dimensions and filter counts; note that layers with the same input size and configuration can share table contents. To estimate, look up each layer in its table and accumulate the layer-wise consumption into a network-wise estimate. The logic is shown in Figure 3.
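The lookup-and-accumulate step can be sketched as follows. Here the "measurements" are synthetic stand-ins (a MAC-proportional latency plus fixed overhead); in practice each table entry would come from an actual on-device measurement. All names are hypothetical.

```python
def build_table(in_channels_opts, filter_opts, measure):
    """Pre-compute (input_channels, num_filters) -> measured latency."""
    return {(i, f): measure(i, f) for i in in_channels_opts for f in filter_opts}

def estimate_latency(filters, table, in_ch=3):
    """Network-wise estimate: sum of layer-wise table lookups."""
    total = 0.0
    for f in filters:
        total += table[(in_ch, f)]
        in_ch = f              # next layer's input channels
    return total

# Synthetic "measurement": latency proportional to MACs plus fixed overhead.
fake_measure = lambda i, f: 0.001 * i * f + 0.1
table = build_table([3, 16, 32, 64], [16, 32, 64], fake_measure)
latency = estimate_latency([32, 64], table)
```

Because the tables are indexed only by input dimensions and filter count, one table serves every candidate network in every iteration, which is what makes the per-candidate estimate fast.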
Figure 4 compares the estimated latency with the measured latency during the optimization of MobileNetV1; the two values are highly correlated.
Experiment Results
Comparison of NetAdapt with other network simplification methods on the small MobileNetV1 (50%).
Comparison of NetAdapt with other network simplification methods on the small MobileNetV1 (100%) across different devices.
Conclusion
The idea of NetAdapt is ingenious and effective: it splits the overall optimization target into a series of smaller targets and brings actual, measured metrics into the optimization loop. It can automatically generate a family of platform-aware simplified networks that combine fast search with good accuracy and latency.