Lightweight Network Overview – backbone network


The core of lightweight network design is to shrink a network's size and increase its speed while preserving accuracy as far as possible. This article briefly surveys lightweight backbone networks, mainly covering the series and networks described below.

SqueezeNet series

The SqueezeNet series is an early, classic family of lightweight networks. SqueezeNet uses the Fire module to compress parameters, and SqueezeNext improves on it by adding separable convolution. Although the SqueezeNet series is not as widely used as MobileNet, its architectural ideas and experimental conclusions are still worth borrowing.


SqueezeNet is one of the earliest works to focus on lightweight networks. It uses the Fire module to compress parameters.


The core module of SqueezeNet is the Fire module, whose structure is shown in Figure 1. The input is first compressed in dimension by the squeeze convolution layer ($1 \times 1$ convolution) and then expanded by the expand convolution layer ($1 \times 1$ and $3 \times 3$ convolutions). The Fire module has three parameters: the number of $1 \times 1$ kernels in the squeeze layer, $s_{1 \times 1}$; the number of $1 \times 1$ kernels in the expand layer, $e_{1 \times 1}$; and the number of $3 \times 3$ kernels in the expand layer, $e_{3 \times 3}$. In general, $s_{1 \times 1} < (e_{1 \times 1} + e_{3 \times 3})$.
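To make the compression effect of the Fire module concrete, the following minimal sketch counts its weights and compares them with a plain $3 \times 3$ convolution that produces the same number of output channels. The function names and the channel sizes are assumptions chosen for illustration, and biases are ignored.

```python
def fire_params(c_in, s1x1, e1x1, e3x3):
    """Weight count of one Fire module (biases ignored)."""
    squeeze = c_in * s1x1 * 1 * 1        # 1x1 squeeze convolution
    expand_1x1 = s1x1 * e1x1 * 1 * 1     # 1x1 branch of the expand layer
    expand_3x3 = s1x1 * e3x3 * 3 * 3     # 3x3 branch of the expand layer
    return squeeze + expand_1x1 + expand_3x3


def plain_conv3x3_params(c_in, c_out):
    """Weight count of a single 3x3 convolution (biases ignored)."""
    return c_in * c_out * 3 * 3


# Hypothetical sizes: 128 input channels, s = 16, e1x1 = e3x3 = 64
fire = fire_params(128, 16, 64, 64)          # 12288
plain = plain_conv3x3_params(128, 64 + 64)   # 147456
print(fire, plain, plain / fire)             # 12288 147456 12.0
```

Because the expand layer only sees the $s_{1 \times 1}$ squeezed channels, keeping $s_{1 \times 1}$ small is what drives the saving.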


SqueezeNext is an upgraded version of SqueezeNet that is compared directly against MobileNet. It is built from standard convolutions, analyzes actual inference speed, and concentrates its optimizations on the overall network structure.


The design of SqueezeNext follows the residual structure. Instead of the depthwise separable convolution that was popular at the time, it directly uses separable convolution (a $K \times 1$ convolution followed by a $1 \times K$ convolution). The design is mainly based on the following strategies:

  • Low Rank Filters
      The core idea of low-rank decomposition is to decompose a large matrix into several small matrices. Here, CP decomposition is used to decompose a $K \times K$ convolution into separate $K \times 1$ and $1 \times K$ convolutions, which reduces the parameter count from $K^2$ to $2K$.
  • Bottleneck Module
      The number of parameters depends on the input and output dimensions. Although depthwise separable convolution can reduce the amount of computation, it does not run efficiently on edge devices. Therefore, the squeeze layer of SqueezeNet is used to compress the input dimension: two consecutive squeeze layers are placed at the beginning of each block, and each one halves the dimension.
  • Fully Connected Layers
      In AlexNet, the fully connected layers account for 96% of the model's parameters. SqueezeNext uses a bottleneck layer to reduce the input dimension of the fully connected layer, thereby reducing the number of network parameters.
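The low-rank strategy above can be illustrated with a tiny example. A $K \times K$ kernel that happens to be rank 1 is exactly the outer product of a $K \times 1$ column and a $1 \times K$ row, so it can be stored with $2K$ weights instead of $K^2$. The helper names and the sample vectors below are assumptions for illustration; in a real network the two thin filters would be learned rather than factored from a given kernel.

```python
def outer(col, row):
    """Build the K x K kernel equivalent to a K x 1 filter followed by a 1 x K filter."""
    return [[c * r for r in row] for c in col]


def full_weights(k):
    return k * k      # one dense K x K kernel


def separable_weights(k):
    return 2 * k      # one K x 1 filter plus one 1 x K filter


col = [1, 2, 1]       # hypothetical 3 x 1 filter
row = [-1, 0, 1]      # hypothetical 1 x 3 filter
kernel = outer(col, row)
print(kernel)                                     # [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
print(full_weights(3), separable_weights(3))      # 9 6
print(full_weights(7), separable_weights(7))      # 49 14
```

The saving grows with the kernel size, since $K^2 / 2K = K / 2$.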

ShuffleNet series

The ShuffleNet series is a very important family of lightweight networks. ShuffleNetV1 proposes the channel shuffle operation, which lets the network use group convolution for acceleration. ShuffleNetV2 overturns most of V1's design: starting from practical measurements, it proposes the channel split operation, which accelerates the network while reusing features, and achieves good results.

ShuffleNet V1

The core of ShuffleNet is the channel shuffle operation, which restores the information exchange between groups so that the network can make full use of pointwise group convolution. This not only reduces the bulk of the network's computation but also allows the convolutions to use more channels.


In some current mainstream networks, pointwise convolution is commonly used to reduce dimension and thereby network complexity. However, because the input dimension is high, pointwise convolution is itself very expensive, and for small networks this cost causes a significant performance drop. For example, in a ResNeXt unit, pointwise convolution accounts for 93.4% of the computation. The paper therefore introduces group convolution and first discusses two ways of implementing it:

  • Figure 1a is the most direct method: all operations are strictly separated by group along the channel dimension. As a result, each output is associated with only a small part of the input, which blocks the information flow between groups and reduces representational power.
  • Figure 1b redistributes the output channels: the output of each group is first divided into several subgroups, and each subgroup is then fed into a different group, which preserves the information flow between groups well.

The idea of Figure 1b can be implemented simply with the channel shuffle operation, as shown in Figure 1c. Assuming a convolution layer with $g$ groups outputs $g \times n$ channels, the output is first reshaped to $(g, n)$ with reshape(), then transposed with transpose(), and finally flattened back to $g \times n$ channels with flatten().
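The reshape, transpose, flatten sequence can be sketched on a plain list of channel labels. This is a minimal illustration using only the standard library; the function name and the labels are assumptions, and a real implementation would apply the same index permutation to the channel axis of a tensor.

```python
def channel_shuffle(channels, groups):
    """Permute channels so that each group receives channels from every other group."""
    n, remainder = divmod(len(channels), groups)
    if remainder:
        raise ValueError("number of channels must be divisible by groups")
    # reshape: (g * n,) -> (g, n)
    grid = [channels[i * n:(i + 1) * n] for i in range(groups)]
    # transpose: (g, n) -> (n, g)
    transposed = [[grid[i][j] for i in range(groups)] for j in range(n)]
    # flatten: (n, g) -> (g * n,)
    return [c for row in transposed for c in row]


# Two groups (a, b) of three channels each
print(channel_shuffle(["a0", "a1", "a2", "b0", "b1", "b2"], groups=2))
# ['a0', 'b0', 'a1', 'b1', 'a2', 'b2']
```

After the shuffle, the next group convolution with two groups sees channels from both a and b in each of its groups, which is the cross-group information flow described for Figure 1b.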

ShuffleNet V2

The pointwise group convolution and bottleneck structure of ShuffleNetV1 increase the memory access cost (MAC), causing a computational overhead that cannot be ignored. To achieve both high performance and high accuracy, the key is to obtain wide convolutions with equal input and output dimensions without resorting to dense convolution or too many groups. Starting from practice and guided by actual inference speed, the ShuffleNetV2 paper summarizes four design guidelines for lightweight networks and proposes ShuffleNetV2 accordingly, balancing accuracy and speed. The channel split operation is a particular highlight: it divides the input features into two parts, achieving a feature reuse effect similar to DenseNet.


  The unit structure of ShuffleNetV1 is shown in Figure 3a and 3b. ShuffleNetV2 adds the channel split operation on top of V1, as shown in Figure 3c. At the beginning of each unit, the feature map is split into two parts with $C - C'$ and $C'$ channels. One branch is passed straight through, and the other contains three convolutions with equal input and output dimensions. The pointwise convolutions in V2 are no longer grouped, because the split at the beginning of the unit already acts as a form of grouping. After the convolutions, the two branches are concatenated, restoring the unit's input size, and a channel shuffle follows. There is no element-wise addition here, which also saves some computation. In the implementation, concat / channel shuffle / channel split are merged to further improve performance.
  The unit is modified slightly for spatial downsampling, as shown in Figure 3d. The channel split operation is removed, so the output's spatial size is halved while its number of channels is doubled.
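The data flow of the basic (non-downsampling) unit can be sketched on a list of channel values. This is only an illustration under stated assumptions: the names are hypothetical, $C' = C / 2$ is assumed, and the `branch` argument stands in for the three convolutions of the real unit. With two equally sized groups, the channel shuffle reduces to interleaving the two halves.

```python
def channel_split(channels, c_prime):
    """Split into C - C' pass-through channels and C' channels to be transformed."""
    keep = len(channels) - c_prime
    return channels[:keep], channels[keep:]


def shufflenet_v2_unit(channels, branch):
    """Channel split -> transform one half -> concat -> channel shuffle (2 groups)."""
    c_prime = len(channels) // 2                 # assumption: C' = C / 2
    identity, active = channel_split(channels, c_prime)
    active = branch(active)                      # stand-in for the three convolutions
    if len(active) != len(identity):
        raise ValueError("branch must keep the channel count, and C must be even")
    # Concat followed by a 2-group channel shuffle is an interleave of the halves
    return [c for pair in zip(identity, active) for c in pair]


out = shufflenet_v2_unit(list(range(8)), branch=lambda xs: [x * 10 for x in xs])
print(out)   # [0, 40, 1, 50, 2, 60, 3, 70]
```

Half of the output channels are untouched inputs, which is the feature reuse effect, and no element-wise addition is needed to merge the branches.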



MnasNet

This paper proposes a neural architecture search method for mobile devices. The method has two main ideas. First, multi-objective optimization incorporates the model's latency on real devices into the search. Second, a factorized hierarchical search space keeps the layers of the network diverse while the search space itself remains simple. As a result, MnasNet achieves a better trade-off between accuracy and latency.
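As a rough illustration of how latency can be folded into a single search objective, the sketch below uses the weighted-product form reported in the MnasNet paper, reward = ACC × (LAT / T)^w. The function name, the sample numbers, and the default w = -0.07 (the value the paper reports using) are assumptions for illustration, not the authors' code.

```python
def mnas_reward(accuracy, latency_ms, target_ms, w=-0.07):
    """Accuracy scaled by a soft penalty that grows as latency exceeds the target."""
    return accuracy * (latency_ms / target_ms) ** w


# Hypothetical candidates with the same accuracy but different measured latency
on_target = mnas_reward(0.75, latency_ms=80.0, target_ms=80.0)
too_slow = mnas_reward(0.75, latency_ms=160.0, target_ms=80.0)
faster = mnas_reward(0.75, latency_ms=40.0, target_ms=80.0)
print(on_target, too_slow < on_target < faster)   # 0.75 True
```

Because the exponent is negative, a model slower than the target is penalized smoothly rather than rejected outright, which lets the search trade a little latency for accuracy and vice versa.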

MobileNet series

The MobileNet series, from Google, is a very important family of lightweight networks. MobileNetV1 builds a lightweight network from depthwise separable convolutions. MobileNetV2 proposes the innovative inverted residual with linear bottleneck unit; although the number of layers increases, overall accuracy and speed both improve. MobileNetV3 combines AutoML with manual fine-tuning to build an even lighter network.


MobileNetV1 builds a very lightweight, low-latency model from depthwise separable convolutions, and its size can be further controlled through two hyperparameters. The model can be deployed on edge devices, which gives it great practical significance.


MobileNet reduces computation through depthwise separable convolution, which factorizes a standard convolution into a depthwise convolution and a $1 \times 1$ pointwise convolution. Each layer is followed by BN and ReLU.
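The saving from this factorization can be checked with a short multiply-accumulate count. The sketch below assumes a $k \times k$ kernel, $m$ input channels, $n$ output channels and an $f \times f$ output feature map; the function names and layer sizes are chosen for illustration.

```python
def standard_conv_cost(k, m, n, f):
    """Multiply-accumulates of a standard k x k convolution."""
    return k * k * m * n * f * f


def separable_conv_cost(k, m, n, f):
    """Multiply-accumulates of a depthwise k x k plus a pointwise 1 x 1 convolution."""
    depthwise = k * k * m * f * f     # one k x k filter per input channel
    pointwise = m * n * f * f         # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 32 -> 64 channels, 56 x 56 feature map
std = standard_conv_cost(3, 32, 64, 56)     # 57802752
sep = separable_conv_cost(3, 32, 64, 56)    # 7325696
print(sep / std, 1 / 64 + 1 / 9)            # both are about 0.1267
```

The ratio equals $1/n + 1/k^2$, so with $3 \times 3$ kernels the separable form needs roughly 8 to 9 times less computation.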



MobileNetV2 first shows that high-dimensional features can in fact be expressed by compact low-dimensional features, and then proposes a new layer unit, the inverted residual with linear bottleneck. This structure resembles a residual unit and includes a shortcut. The difference is that its input and output dimensions are small: in between, a pointwise convolution first expands the dimension, a depthwise convolution then extracts features, and a linear projection finally reduces the dimension again. This preserves network performance well while making the network lighter.



MobileNetV3 first builds the network with AutoML and then refines it by manual fine-tuning. The search uses platform-aware NAS for global search and NetAdapt for local search. The manual fine-tuning adjusts the structure of the first and last few layers of the network, adds SE modules to the bottleneck, and proposes the computationally efficient h-swish nonlinearity.
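For reference, h-swish replaces the sigmoid in swish with a piecewise-linear ReLU6 term, h-swish(x) = x · ReLU6(x + 3) / 6. The scalar sketch below is a minimal illustration with assumed function names; the sample inputs are arbitrary.

```python
import math


def relu6(x):
    return min(max(x, 0.0), 6.0)


def h_swish(x):
    """Hard swish: x * ReLU6(x + 3) / 6, built only from cheap clamp and multiply."""
    return x * relu6(x + 3.0) / 6.0


def swish(x):
    """Original swish, x * sigmoid(x), shown only for comparison."""
    return x / (1.0 + math.exp(-x))


for x in (-4.0, -1.0, 0.0, 1.0, 3.0):
    print(x, h_swish(x), swish(x))
```

For x ≥ 3 the function is the identity and for x ≤ -3 it is zero, so no exponential is needed at inference time.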


CondenseNet

DenseNet achieves good performance through feature reuse, but the paper argues that its internal connections contain a lot of redundancy and that early features do not need to be reused by much later layers. The paper therefore proposes CondenseNet, based on learned group convolution: during training it automatically sparsifies the network structure, selects the best input-output connection pattern, and finally converts the result into a conventional group convolution structure.


Learning the group convolution takes several stages. The first half of training consists of multiple condensing stages, in which the network is trained repeatedly with a sparsity-inducing regularizer and unimportant filters are then pruned. The second half is the optimization stage, in which the pruned network is trained with its structure fixed.

ESPNet series

The core of the ESPNet series is the dilated convolution pyramid. Each level uses a different dilation rate, so multi-scale features can be fused without increasing the number of parameters. Compared with depthwise separable convolution, the depthwise separable dilated convolution pyramid is more cost-effective. In addition, the HFF multi-scale feature fusion method is also worth borrowing.



  ESPNet is a lightweight network for semantic segmentation whose core is the ESP module. As shown in figure (a), the module consists of a pointwise convolution and a dilated convolution pyramid, which respectively reduce computational complexity and resample features with different effective receptive fields. The ESP module is more efficient than other convolution factorization methods (MobileNet / ShuffleNet). ESPNet reaches 112 fps / 21 fps / 9 fps on GPU / laptop / edge device.
  In addition, the paper finds that although the dilated convolution pyramid brings a larger receptive field, concatenating its outputs directly produces odd gridding artifacts. To solve this, the paper proposes the HFF operation of figure (b), which adds the outputs hierarchically before concatenation. Compared with adding extra convolutions for post-processing, HFF removes the gridding artifacts without much extra computation. Furthermore, to ensure gradient flow through the network, a shortcut connection from input to output is added to the ESP module.
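Two ideas from the ESP module can be sketched in a few lines: the effective kernel size of a dilated convolution, and the hierarchical addition performed by HFF before concatenation. This is a simplified illustration with assumed names and values. It treats each branch output as a short list of numbers and applies a plain running sum over all branches, whereas an actual implementation may handle the first branch differently.

```python
from itertools import accumulate


def effective_kernel(k, dilation):
    """Side length covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


def hff(branch_outputs):
    """Hierarchically add branch outputs (running sum), then concatenate them."""
    def add(a, b):
        return [x + y for x, y in zip(a, b)]

    fused = list(accumulate(branch_outputs, add))
    return [v for branch in fused for v in branch]


# Same 3 x 3 weights, growing receptive field (hypothetical dilation rates)
print([effective_kernel(3, d) for d in (1, 2, 4, 8)])   # [3, 5, 9, 17]

# Three branches ordered by increasing dilation, two values each
print(hff([[1, 1], [2, 2], [4, 4]]))                    # [1, 1, 3, 3, 7, 7]
```

Each concatenated slice already contains the contributions of the smaller dilation rates, which is what smooths out the grid pattern without extra convolutions.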



ESPNetv2 makes the model lighter still, building on ESPNet and the design of depthwise separable convolution. First, the pointwise convolution is replaced by a grouped pointwise convolution, and the computationally expensive dilated convolution is replaced by a depthwise separable dilated convolution. HFF is still used to eliminate gridding artifacts, and one more feature extraction step is added on the output features, giving the structure in figure (b). Since computing $K$ pointwise convolutions separately is equivalent to a single pointwise group convolution with $K$ groups, and group convolution is implemented more efficiently, the design is refined into the final structure in figure (c).



ChannelNets

This paper proposes the concept of channel-wise convolution, which makes the connections between input and output channels sparse rather than fully connected. Unlike the strict grouping of group convolution, it relates input channels to output channels by sliding a convolution along the channel dimension, which better preserves information exchange between channels. Building on channel-wise convolution, the paper further proposes channel-wise depthwise separable convolution and uses this structure to construct ChannelNets, replacing the network's final fully connected layer + global pooling operation.



PeleeNet

Building on DenseNet's dense connectivity and a series of structural optimizations, this paper proposes PeleeNet, a network architecture for mobile devices, and combines it with SSD to obtain the object detection network Pelee. According to the experiments, PeleeNet and Pelee are good choices in terms of both speed and accuracy.

IGC series

The core of the IGC series is the extreme use of group convolution. A conventional convolution is decomposed into multiple group convolutions, which removes a large number of parameters. In addition, the complementary principle and the permutation operation ensure information flow between groups with the fewest parameters. On the whole, however, although the IGC module reduces parameters and computation, the network structure becomes more cumbersome, which may make it slower in real use.



The interleaved group convolution (IGC) module consists of a primary group convolution and a secondary group convolution, which extract features from the primary and secondary partitions respectively. The primary partitions are obtained by grouping the input features: for example, the input is divided into $L$ partitions of $M$ channels each, while the corresponding secondary partitioning has $M$ partitions of $L$ channels each. The primary group convolution extracts grouped features from the input feature map, while the secondary group convolution, a $1 \times 1$ convolution, fuses the outputs of the primary group convolution. The IGC module resembles depthwise separable convolution in form, but the concept of grouping runs through the whole module and is the key to saving parameters. In addition, two permutation modules are added to ensure information exchange between channels.



IGCV1 decomposes the original convolution into two group convolutions, reducing parameters while keeping feature extraction complete. However, the authors found that because the primary and secondary group convolutions are complementary in their number of groups, the secondary convolution generally has few groups, each group has many channels, and the secondary convolution kernel is therefore dense. IGCV2 thus proposes interleaved structured sparse convolution, which replaces the original secondary group convolution with several consecutive sparse group convolutions, each with enough groups to keep its kernel sparse.



Building on the ideas of IGC and the bottleneck, IGCV3 combines low-rank and sparse convolution kernels to form a dense convolution kernel. As shown in Figure 1, IGCV3 uses low-rank sparse kernels (the bottleneck module) to expand the dimension of the grouped input features and to reduce the dimension of the output, with a depthwise convolution in between to extract features. It also introduces a relaxed complementary principle, analogous to the strict complementary principle of IGCV2, to handle group convolutions whose input and output dimensions differ.

FBNet series

The FBNet series is a family of lightweight networks based entirely on NAS. It analyzes the shortcomings of existing search methods and adds innovative improvements step by step. FBNet combines DNAS with resource constraints, FBNetV2 adds search over channel counts and input resolution, and FBNetV3 uses accuracy prediction for fast architecture search.



This paper proposes FBNet, which uses differentiable neural architecture search (DNAS) to find hardware-aware lightweight convolutional networks. The flow is shown in Figure 1. DNAS represents the whole search space as a supernet and turns the problem of finding the optimal architecture into that of finding the optimal distribution over candidate blocks. The distribution is trained by gradient descent, and a different block can be chosen for each layer of the network. To estimate network latency better, the actual latency of every candidate block is measured and recorded in advance, so the latency of a network can be obtained simply by summing the recorded latencies of the blocks in its structure.
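The latency lookup idea can be sketched as follows. The block names, the latency values, and the use of a plain softmax over per-layer architecture parameters are all assumptions for illustration, and the paper's actual sampling scheme may differ. The point is that a table of pre-measured block latencies makes both the total latency of a concrete network and the expected latency under a block distribution cheap to compute.

```python
import math

# Hypothetical lookup table: pre-measured latency of each candidate block (microseconds)
LATENCY_US = {"skip": 0, "k3_e1": 800, "k3_e6": 2100, "k5_e6": 3000}


def network_latency(blocks, table=LATENCY_US):
    """Latency of a concrete network: sum of the recorded latencies of its blocks."""
    return sum(table[name] for name in blocks)


def expected_layer_latency(logits, candidates, table=LATENCY_US):
    """Expected latency of one layer under a softmax distribution over candidates."""
    weights = [math.exp(x) for x in logits]
    total = sum(weights)
    return sum(w / total * table[name] for w, name in zip(weights, candidates))


print(network_latency(["k3_e1", "k5_e6", "skip"]))               # 3800
print(expected_layer_latency([0.0, 0.0], ["k3_e1", "k3_e6"]))    # 1450.0
```

Since the expected latency is a smooth function of the logits, it can be added to the training loss and optimized by gradient descent alongside accuracy.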



DNAS samples the optimal subnet by training a supernet that contains all candidate networks. Although the search is fast, it consumes a lot of memory, so its search space is generally smaller than that of other methods, and memory and computation grow linearly with the number of search dimensions. To solve this problem, the FBNetV2 paper proposes DMaskingNAS, which adds the number of channels and the input resolution to the supernet in the form of masks and sampling respectively, enlarging the search space by up to $10^{14}$ times with only a small increase in memory and computation.



The FBNetV3 paper argues that most current NAS methods search only over network architecture and do not consider whether the training parameters used when evaluating network performance are appropriate, which may degrade model performance. It therefore proposes JointNAS, which searches for the most accurate combination of training parameters and network architecture under resource constraints. FBNetV3 departs completely from the design of FBNetV2 and FBNet. The accuracy predictor and genetic algorithm it uses are already widely used in the NAS field; its main highlight is that training parameters are included in the search, which matters a great deal for performance.



EfficientNet

This paper studies model scaling in depth and proposes a compound scaling method, which chooses the scaling ratios of width, depth, and resolution in a more principled way so that the model reaches higher accuracy. In addition, the paper obtains EfficientNet through neural architecture search (NAS); combined with compound scaling, it achieves high accuracy with few parameters.
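Compound scaling can be summarized in a few lines. The sketch below assumes the form reported in the EfficientNet paper, in which depth, width and resolution are scaled by $\alpha^\phi$, $\beta^\phi$ and $\gamma^\phi$ under the constraint $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2$, together with the base values the paper reports ($\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$). The function names are hypothetical.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # depth, width, resolution bases (as reported in the paper)


def compound_scale(phi):
    """Multipliers for depth, width and input resolution at compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def flops_multiplier(phi):
    """FLOPs grow roughly with depth * width^2 * resolution^2."""
    depth, width, resolution = compound_scale(phi)
    return depth * width ** 2 * resolution ** 2


for phi in range(4):
    print(phi, compound_scale(phi), flops_multiplier(phi))
```

With these bases, each increment of $\phi$ roughly doubles the computation, so a single coefficient controls the whole model family.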



GhostNet

A trained network generally has rich, even redundant, feature maps that ensure a thorough understanding of the input, and similar feature maps look like ghosts of one another. Redundant features are nevertheless an important characteristic of the network, so the paper argues that rather than avoiding them, it is better to obtain them in a cost-efficient way. It therefore proposes the Ghost module, which extracts more features with fewer parameters. First, a primary convolution (the convolution operation alone, not a whole convolution layer) with fewer output channels produces a small set of feature maps; then a series of simple linear operations on those maps generates additional ones. In this way, the Ghost module reduces the overall parameter count and computation without changing the number of output feature maps.
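A rough parameter count shows where the saving comes from. The sketch assumes that the primary convolution produces $1/s$ of the output channels and that each remaining feature map is generated from a primary one by a cheap $d \times d$ depthwise operation. The function names, the ratio $s = 2$, and the layer sizes are assumptions for illustration, and biases are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights of an ordinary k x k convolution."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k, s=2, d=3):
    """Weights of a Ghost module producing the same number of output channels."""
    if c_out % s:
        raise ValueError("c_out must be divisible by s")
    intrinsic = c_out // s                        # maps from the primary convolution
    primary = conv_params(c_in, intrinsic, k)
    cheap = (s - 1) * intrinsic * d * d           # one d x d filter per generated map
    return primary + cheap


ordinary = conv_params(64, 128, 3)     # 73728
ghost = ghost_params(64, 128, 3)       # 37440
print(ordinary, ghost, ordinary / ghost)
```

The compression ratio approaches $s$ when the input channel count is large, because the cheap operations cost almost nothing next to the primary convolution.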



WeightNet

This paper proposes WeightNet, a simple and efficient network with dynamically generated weights, which unifies the characteristics of SENet and CondConv in weight space. A grouped fully connected layer is added after the activation vector to generate the convolution kernel weights directly. It is computationally very efficient, and the trade-off between accuracy and speed can be tuned through hyperparameters.



MicroNet

This paper proposes MicroNet, a lightweight network for extremely low computation budgets, built on two core ideas: Micro-Factorized Convolution and Dynamic Shift-Max. Micro-Factorized Convolution decomposes the original convolution into several small convolutions through low-rank approximation, maintaining input-output connectivity while reducing the number of connections. Dynamic Shift-Max increases node connectivity and strengthens nonlinearity through dynamic fusion of features across groups, compensating for the performance loss caused by reduced network depth.



MobileNeXt

This paper analyzes in depth the design concept and shortcomings of the inverted residual block, and proposes the sandglass block, which is better suited to lightweight networks, together with MobileNeXt, which is built on it. The sandglass block consists of two depthwise convolutions and two pointwise convolutions; some of the convolutions are not followed by an activation, and the shortcut is built on high-dimensional features. According to the paper's experiments, MobileNeXt performs better in parameter count, computation, and accuracy.

For more, follow the WeChat official account [Xiaofei’s Algorithm Engineering Notes].
