# ShuffleNetV1 / V2 Overview | Lightweight Networks

Time: 2021-7-24

The ShuffleNet series is a very important family of lightweight networks. ShuffleNetV1 proposes the channel shuffle operation, which lets the network exploit group convolutions for acceleration, while ShuffleNetV2 overturns most of V1's design choices and, starting from practical considerations, proposes the channel split operation, speeding up the network while reusing features and achieving strong results.
Source: Xiaofei’s algorithm Engineering Notes official account

## ShuffleNet V1

Paper: ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

#### Introduction

Neural networks keep getting more accurate, but their inference speed drops accordingly, so in practical applications a compromise between speed and accuracy is unavoidable. The paper therefore analyzes where small networks spend their time and proposes ShuffleNet. This article first introduces ShuffleNet's core operations, channel shuffle and group convolutions, then the structure of the ShuffleNet unit, and finally the overall ShuffleNet architecture.

#### Channel Shuffle for Group Convolutions

In current mainstream networks, pointwise convolution is usually used to reduce dimensionality and keep network complexity down. However, when the input dimension is high, pointwise convolution itself becomes very expensive, and for small networks these costly pointwise convolutions cause significant performance degradation; in the ResNeXt unit, for example, pointwise convolutions account for 93.4% of the computation. The paper therefore applies group convolution to pointwise convolutions as well, and first discusses two possible implementations:

• Fig. 1a is the most direct approach: all operations are kept strictly group-isolated, but each output is then related to only a small part of the input, which blocks information flow between groups and weakens the representational power.
• Fig. 1b redistributes the output dimensions: the output of each group is first divided into several subgroups, and each subgroup is fed into a different group, which preserves information flow between groups.

The idea of Fig. 1b can be implemented simply with the channel shuffle operation. As shown in Fig. 1c, suppose a convolution layer with $g$ groups outputs $g \times n$ channels: first reshape the output to $(g, n)$, then transpose, and finally flatten back to $g \times n$ channels.
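
This reshape-transpose-flatten recipe maps directly to a few tensor operations. Below is a minimal PyTorch sketch (the function name and signature are illustrative, not the paper's released code):

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Channel shuffle: reshape to (g, n), transpose, flatten back."""
    b, c, h, w = x.size()
    assert c % groups == 0, "channel count must be divisible by groups"
    # reshape the g*n channels into (g, n)
    x = x.view(b, groups, c // groups, h, w)
    # swap the group axis and the per-group channel axis
    x = x.transpose(1, 2).contiguous()
    # flatten back to g*n channels
    return x.view(b, c, h, w)
```

For example, `channel_shuffle(torch.randn(1, 12, 8, 8), groups=3)` interleaves the 12 channels across the 3 groups, so the next grouped convolution sees channels from every group.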

#### ShuffleNet Unit

Based on the channel shuffle operation, the paper proposes two ShuffleNet units, starting from the basic residual structure in Fig. 2a, which contains a $3 \times 3$ depthwise convolution for feature extraction:

• Fig. 2b shows the ShuffleNet unit for feature maps of unchanged size. The initial $1 \times 1$ convolution layer is replaced with pointwise group convolution + channel shuffle. The second pointwise group convolution restores the unit's input dimension so that element-wise addition with the shortcut is possible. Following the recommendations of the depthwise separable convolution paper, the latter two convolutions are followed by BN only, not BN + ReLU. The paper also tried appending another channel shuffle after the second pointwise group convolution, but this did not improve accuracy much.
• Fig. 2c shows the ShuffleNet unit that halves the feature map size, used for downsampling between blocks. It adds a $3 \times 3$ average pooling (stride 2) on the shortcut and replaces the final element-wise addition with channel concatenation, increasing the output dimension at little extra cost. A sketch of both variants follows this list.
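
A minimal PyTorch sketch of the two variants, reusing the `channel_shuffle` helper defined above; the class name and hyperparameter choices (e.g., the bottleneck width being a quarter of the output width) are assumptions for illustration, not the authors' released code:

```python
import torch
import torch.nn as nn

class ShuffleNetUnit(nn.Module):
    """Sketch of the ShuffleNet v1 unit: Fig. 2b (stride=1) / Fig. 2c (stride=2)."""
    def __init__(self, in_c: int, out_c: int, groups: int = 3, stride: int = 1):
        super().__init__()
        self.stride = stride
        self.groups = groups
        # the concat branch must leave room for the pooled shortcut channels
        branch_out = out_c - in_c if stride == 2 else out_c
        mid_c = out_c // 4  # assumed bottleneck width: 1/4 of the output
        self.gconv1 = nn.Sequential(  # pointwise group conv + BN + ReLU
            nn.Conv2d(in_c, mid_c, 1, groups=groups, bias=False),
            nn.BatchNorm2d(mid_c), nn.ReLU(inplace=True))
        self.dwconv = nn.Sequential(  # 3x3 depthwise conv, BN only (no ReLU)
            nn.Conv2d(mid_c, mid_c, 3, stride=stride, padding=1,
                      groups=mid_c, bias=False),
            nn.BatchNorm2d(mid_c))
        self.gconv2 = nn.Sequential(  # restores the branch dimension, BN only
            nn.Conv2d(mid_c, branch_out, 1, groups=groups, bias=False),
            nn.BatchNorm2d(branch_out))
        self.shortcut = nn.AvgPool2d(3, stride=2, padding=1) if stride == 2 else None
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.gconv1(x)
        out = channel_shuffle(out, self.groups)  # helper from the sketch above
        out = self.gconv2(self.dwconv(out))
        if self.stride == 2:  # Fig. 2c: concatenate with the pooled shortcut
            out = torch.cat([self.shortcut(x), out], dim=1)
        else:                 # Fig. 2b: element-wise addition with the shortcut
            out = out + x
        return self.relu(out)
```

For instance, `ShuffleNetUnit(24, 240, groups=3, stride=2)` halves the spatial size while growing the channels from 24 to 240 via concatenation (Fig. 2c), while `ShuffleNetUnit(240, 240, groups=3)` is the size-preserving variant (Fig. 2b).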

The ShuffleNet unit is computationally efficient. For an input of $c \times h \times w$ with bottleneck dimension $m$, a ResNet unit costs $hw(2cm + 9m^2)$ FLOPs, a ResNeXt unit costs $hw(2cm + 9m^2/g)$ FLOPs, and a ShuffleNet unit costs only $hw(2cm/g + 9m)$ FLOPs, where $g$ is the number of convolution groups. Given the same compute budget, this reduction means ShuffleNet can afford feature maps with more channels, which matters a great deal in small networks.
Note that although depthwise convolution usually has low theoretical complexity, it is hard to implement efficiently on real hardware, so ShuffleNet applies depthwise convolution only to the bottleneck features (the lower-dimensional ones).
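
Plugging illustrative numbers (not from the paper) into the three formulas above makes the gap concrete:

```python
# FLOPs per unit for an input of c x h x w with bottleneck dimension m
# and g groups, using the formulas above (values chosen for illustration)
c, m, h, w, g = 240, 60, 28, 28, 3

resnet     = h * w * (2 * c * m + 9 * m ** 2)       # plain bottleneck
resnext    = h * w * (2 * c * m + 9 * m ** 2 / g)   # grouped 3x3 conv
shufflenet = h * w * (2 * c * m / g + 9 * m)        # grouped 1x1 + depthwise 3x3

print(f"ResNet:     {resnet / 1e6:.1f} MFLOPs")     # ~48.0
print(f"ResNeXt:    {resnext / 1e6:.1f} MFLOPs")    # ~31.0
print(f"ShuffleNet: {shufflenet / 1e6:.1f} MFLOPs") # ~7.9
```

At this setting the ShuffleNet unit costs roughly a sixth of the plain ResNet bottleneck.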

## ShuffleNet V2

#### Experiment

The ShuffleNetV2 unit is applied to large networks for comparison.

The performance of ShuffleNetV2 as a detection-network backbone is compared.

Its performance is compared with mainstream classification networks of different sizes.

#### Conclusion

Starting from practice and guided by actual inference speed, the paper summarizes five design guidelines for lightweight networks and, following them, proposes ShuffleNetV2, which balances accuracy and speed. The channel split operation in particular is a highlight, achieving DenseNet-like feature reuse, as sketched below.
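
A minimal sketch of the channel split idea, again reusing the `channel_shuffle` helper from earlier; `conv_branch` stands in for the unit's convolution path and is purely illustrative:

```python
import torch

def shufflenet_v2_unit(x: torch.Tensor, conv_branch) -> torch.Tensor:
    # channel split: half the channels pass through untouched
    # (DenseNet-like feature reuse), the other half is transformed
    x1, x2 = torch.chunk(x, 2, dim=1)
    out = torch.cat([x1, conv_branch(x2)], dim=1)
    # a channel shuffle then mixes information between the two halves
    return channel_shuffle(out, groups=2)
```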

## Conclusion

The ShuffleNet series is a very important family of lightweight networks. ShuffleNetV1 proposes the channel shuffle operation, which lets the network exploit group convolutions for acceleration, while ShuffleNetV2 overturns most of V1's design choices and, starting from practical considerations, proposes the channel split operation, speeding up the network while reusing features and achieving strong results.