# IGC Series: Networks Built Entirely from Group Convolutions, Taking Group Convolution to the Extreme | Lightweight Networks

Time: 2021-6-13

The core of the IGC series is taking group convolution to the extreme. A regular convolution is decomposed into several group convolutions, which removes a large number of parameters, and the complementarity principle together with permutation operations keeps information flowing between groups with as few parameters as possible. Overall, however, although the IGC module reduces parameters and computation, it makes the network structure more complicated, which may make the network slower in practice.

Source: Xiaofei's Algorithm Engineering Notes official account

# IGCV1

Paper: Interleaved Group Convolutions for Deep Neural Networks

### Introduction

The interleaved group convolution (IGC) module consists of a primary group convolution and a secondary group convolution, which extract features from the primary partitions and the secondary partitions respectively. The primary partitions are obtained by grouping the input features. For example, if the input features are divided into $L$ partitions of $M$ dimensions each, the corresponding secondary partitions are $M$ partitions of $L$ dimensions each. The primary group convolution extracts group features from the input feature map, while the secondary group convolution, a $1\times 1$ convolution, fuses the outputs of the primary group convolution. The IGC module resembles a depthwise separable convolution in form, but the notion of grouping runs through the whole module, which is the key to saving parameters. In addition, two permutation operations are inserted into the module to ensure information exchange between channels.

### Interleaved Group Convolutions

The IGC module is shown in Figure 1. The primary convolution extracts group features from the input; its outputs are then sampled at intervals (interleaved) so that the secondary convolution can fuse features across groups. The outputs of the secondary convolution are concatenated, restored to the order before sampling, and used as the final output.

• ##### Primary group convolutions

Suppose there are $L$ primary partitions in total, each containing $M$ dimensions. The primary group convolution is computed as in Formula 1:

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_L \end{bmatrix} = \begin{bmatrix} W^p_{11} & 0 & \cdots & 0 \\ 0 & W^p_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & W^p_{LL} \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_L \end{bmatrix}$$

where $z_l$ is the $(MS)$-dimensional feature vector extracted from partition $l$ according to the kernel size, $S$ is the kernel size, $W^p_{ll}$ is the convolution kernel of partition $l$ with size $M \times (MS)$, and $x = [z_1^{\top}\ z_2^{\top}\ \cdots\ z_L^{\top}]^{\top}$ is the input of the primary group convolution.
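To make the block-diagonal structure concrete, here is a minimal pure-Python sketch for a single pixel. It assumes the $S$ spatial taps of each partition are already unfolded into $z_l$ (so each $z_l$ has $MS$ entries), and the helper names `primary_group_conv` and `block_diagonal` are hypothetical, not from the paper. It checks that running each group independently gives the same result as multiplying by the full block-diagonal $W^p$.

```python
import random


def primary_group_conv(z_parts, kernels):
    """Apply y_l = W^p_ll z_l independently for every primary partition l.

    z_parts: L vectors, each of length M*S (unfolded receptive field of partition l).
    kernels: L matrices, each M x (M*S).
    """
    return [
        [sum(w * z for w, z in zip(row, z_l)) for row in W_ll]
        for W_ll, z_l in zip(kernels, z_parts)
    ]


def block_diagonal(kernels):
    """Place the L blocks W^p_ll on the diagonal of an (L*M) x (L*M*S) matrix W^p."""
    n_rows, n_cols = len(kernels[0]), len(kernels[0][0])
    n_blocks = len(kernels)
    W = [[0.0] * (n_cols * n_blocks) for _ in range(n_rows * n_blocks)]
    for l, W_ll in enumerate(kernels):
        for r, row in enumerate(W_ll):
            for c, w in enumerate(row):
                W[l * n_rows + r][l * n_cols + c] = w
    return W


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


L, M, S = 3, 2, 9  # 3 primary partitions of 2 channels each, 3x3 kernel
rng = random.Random(0)
kernels = [[[rng.uniform(-1, 1) for _ in range(M * S)] for _ in range(M)] for _ in range(L)]
z_parts = [[rng.uniform(-1, 1) for _ in range(M * S)] for _ in range(L)]

y_grouped = [v for y_l in primary_group_conv(z_parts, kernels) for v in y_l]
x = [v for z_l in z_parts for v in z_l]            # x = [z_1^T ... z_L^T]^T
W_p = block_diagonal(kernels)
y_dense = matvec(W_p, x)                            # y = W^p x, same result
nonzeros = sum(1 for row in W_p for w in row if w != 0.0)  # L*M*M*S = 108 of 324
```

Only $L \cdot M \cdot MS$ of the $(LM)\times(LMS)$ entries are non-zero, which is where the parameter saving of the primary convolution comes from.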

• ##### Secondary group convolutions

The outputs of the primary group convolution, $\{y_1, y_2, \cdots, y_L\}$, are rearranged into $M$ secondary partitions, each containing $L$ dimensions, so that each secondary partition contains features from different primary partitions (one from each):

$$\bar{y}_m = [y_{1m}\ y_{2m}\ \cdots\ y_{Lm}]^{\top}$$
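The rearrangement into secondary partitions is a transpose of an $L \times M$ index grid: secondary partition $m$ collects the $m$-th output of every primary partition. The pure-Python sketch below builds that interleave permutation and its inverse, then counts channel-level connection paths through primary groups of size $M$, the interleave, secondary groups of size $L$ and the inverse permutation. Spatial taps are ignored, and all helper names are hypothetical.

```python
def interleave(L, M):
    """Permutation from primary order to secondary order.

    y_{l,m} (0-based) sits at index l*M + m in primary order and at index
    m*L + l in secondary order. Returned as perm[new_index] = old_index.
    """
    return [l * M + m for m in range(M) for l in range(L)]


def apply_perm(perm, v):
    return [v[i] for i in perm]


def inverse(perm):
    inv = [0] * len(perm)
    for new, old in enumerate(perm):
        inv[old] = new
    return inv


def path_counts(L, M):
    """paths[o][i]: number of routes from input channel i to output channel o."""
    n = L * M
    pos = inverse(interleave(L, M))  # pos[k] = secondary-order index of channel k

    def primary(k, i):    # y_k depends on x_i (same primary group of size M)
        return k // M == i // M

    def secondary(o, k):  # x'_o depends on y_k (same secondary group of size L)
        return pos[o] // L == pos[k] // L

    return [[sum(1 for k in range(n) if secondary(o, k) and primary(k, i))
             for i in range(n)] for o in range(n)]


labels = ["y11", "y12", "y21", "y22", "y31", "y32"]   # L=3, M=2, primary order
perm = interleave(3, 2)
secondary_order = apply_perm(perm, labels)   # ['y11', 'y21', 'y31', 'y12', 'y22', 'y32']
restored = apply_perm(inverse(perm), secondary_order)  # back to primary order
counts = path_counts(3, 2)  # every entry is 1: exactly one path per (input, output)
```

Every output depends on every input through exactly one intermediate channel, which is why the two sparse group convolutions together still behave like a dense convolution.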

Here $\bar{y}_m$ is the $m$-th secondary partition and $y_{lm}$ is the $m$-th element of $y_l$. The secondary group convolution is computed on each secondary partition:

$$\bar{z}_m = W^d_{mm}\,\bar{y}_m$$

where $W^d_{mm}$ is the $1\times 1$ convolution kernel of the $m$-th secondary partition, with size $L \times L$. The outputs of the secondary group convolution are then rearranged back into the order of the primary partitions, giving the $L$ partitions $\{x'_1, x'_2, \cdots, x'_L\}$:

$$x' = P\bar{z} = P W^d P^{\top} y$$

where $P^{\top}$ is the permutation matrix that produced the secondary partitions ($\bar{y} = P^{\top} y$). Combining the primary and secondary convolutions, the IGC module can be summarized as

$$x' = P W^d P^{\top} W^p x$$

where $W^p$ and $W^d$ are block-diagonal matrices. Defining $W = P W^d P^{\top} W^p$ as the composite convolution kernel, the IGC module can be regarded as a regular convolution whose kernel is the product of two sparse kernels.

### Analysis

• ##### Wider than regular convolutions

Consider the input at a single pixel. The number of parameters of the IGC module is

$$T_{igc} = L \cdot M \cdot MS + M \cdot L \cdot L = G^2\left(\frac{S}{L} + \frac{1}{M}\right)$$

where $G = ML$ is the number of dimensions covered by IGC. For a regular convolution whose input and output dimensions are both $C$, the number of parameters is

$$T_{rc} = S\,C^2$$

Given the same number of parameters, $T_{igc} = T_{rc} = T$, we get $C^2 = \frac{1}{S}T$ and $G^2 = \frac{1}{S/L + 1/M}T$. For $S = 3 \times 3$ and $L > 1$, we have $S/L + 1/M \le S/2 + 1 < S$, so $G > C$: the IGC module can process more input dimensions than a regular convolution.

• ##### When is the widest?

The paper studies how the partition numbers $L$ and $M$ affect the convolution width. Since $1/M = L/G$, the parameter count can be rewritten and bounded with the AM-GM inequality:

$$T = \frac{S}{L}G^2 + LG \ge 2\sqrt{S G^3}$$

This inequality (formula 12 in the paper) holds with equality when $\frac{S}{L}G^2 = LG$, i.e. when $L = MS$. For a given number of parameters, the upper bound of the convolution width is therefore

$$G \le \sqrt[3]{\frac{T^2}{4S}}$$

and the width is largest when $L = MS$. The paper lists the widths under different settings, which confirm that the width is largest when $L \simeq 9M$ (i.e. $S = 9$).
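The parameter formulas and the width bound above can be checked with a few lines of arithmetic. The sketch below is pure Python; the helper names and the example budget (chosen so that $L = MS$ holds with integer values) are assumptions for illustration.

```python
import math


def igc_params(L, M, S):
    """Weights of one IGC module for a single output pixel (biases ignored).

    Primary:   L groups, each M channels x S taps -> M channels: L * M * M * S
    Secondary: M groups of 1x1 convs, each L -> L channels:     M * L * L
    """
    return L * M * M * S + M * L * L


def regular_width(T, S):
    """Channels C of a regular conv with the same budget: S * C^2 = T."""
    return math.sqrt(T / S)


def igc_width_bound(T, S):
    """Upper bound on G = L*M for budget T: G <= (T^2 / (4S))^(1/3)."""
    return (T * T / (4 * S)) ** (1 / 3)


S = 9                                  # 3x3 kernel
T = igc_params(L=18, M=2, S=S)         # L = M*S, so the bound is attained; T = 1296
C = regular_width(T, S)                # a regular conv with this budget gets 12 channels
G_max = igc_width_bound(T, S)          # about 36 = 18 * 2 channels for IGC
T_other = igc_params(L=9, M=4, S=S)    # same G = 36 but L != M*S costs 1620 weights
```

With the same 1296 weights, the IGC module covers three times as many channels as a regular $3\times 3$ convolution, and splitting the same 36 channels as $L = 9$, $M = 4$ needs more weights, matching the equality condition $L = MS$.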
• ##### Wider leads to better performance?

A fixed parameter budget means that the number of effective parameters in the primary and secondary convolutions is fixed. When more input dimensions are covered, the composite kernel becomes larger and sparser, which may degrade performance. The paper therefore also compares the performance of different configurations, as shown in Figure 3.

### Experiment

Network structures used in the small-scale experiments, with their parameter and computation counts; note the IGC + BN + ReLU structure.

Performance comparison on CIFAR-10.

Performance comparison with SOTA methods on multiple datasets.

### Conclusion

The IGC module uses two layers of group convolution together with permutation operations to save parameters and computation. The design is simple and ingenious, and the paper provides a thorough derivation and analysis of IGC. Note, however, that the paper argues for the lightness of the IGC module only through parameter and computation counts, and as the ShuffleNetV2 paper points out, parameter count and computation are not equivalent to inference latency.

# IGCV2

Paper: IGCV2: Interleaved Structured Sparse Convolutional Neural Networks

• Address: https://arxiv.org/abs/1804.06202

### Introduction

IGCV1 decomposes a regular convolution into two group convolutions, reducing parameters while still extracting complete information. However, the authors observe that because the primary and secondary convolutions are complementary in their numbers of groups, the secondary convolution usually has few groups, each group has many dimensions, and its kernel is therefore dense. IGCV2 thus proposes interleaved structured sparse convolutions, which replace the original secondary convolution with several consecutive sparse convolutions, each with enough groups to keep its kernel sparse.
### Interleaved Structured Sparse Convolutions

The core structure of IGCV2 is shown in Figure 1. Several sparse convolutions replace the original dense secondary convolution, which can be formulated as

$$x' = P_L W_L P_{L-1} W_{L-1} \cdots P_1 W_1 x$$

where each $P_l W_l$ is a sparse matrix: $P_l$ is a permutation matrix used to rearrange dimensions, and $W_l$ is a block-diagonal (group convolution) matrix whose groups each have $K_l$ dimensions.

IGCV2 follows a complementarity principle in its design: each group of one group convolution must be connected to every group of the other group convolution, and only through a single dimension of each group, i.e. there is exactly one connection between any pair of groups. As Figure 1 shows, the key lies in the permutation. From the complementarity principle, the input dimension $C$ and the number of dimensions per group $K_l$ of each layer satisfy

$$C = \prod_{l=1}^{L} K_l$$
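The complementarity condition can be checked at the channel level. In this pure-Python sketch (an illustration, not the paper's code), the permutations $P_l$ are kept implicit: channels are indexed in mixed radix $(K_1, \dots, K_L)$, and group convolution $l$ connects channels that agree on every digit except digit $l$, which is equivalent to permuting so that digit $l$ varies fastest and then grouping contiguously. All helper names are hypothetical.

```python
from math import prod


def digits(c, radices):
    """Mixed-radix digits of channel index c (least significant first)."""
    out = []
    for r in radices:
        out.append(c % r)
        c //= r
    return out


def layer_mask(radices, l):
    """0/1 connectivity of the l-th group convolution W_l.

    Channels share a group iff they agree on every digit except digit l,
    so each group has K_l = radices[l] channels.
    """
    C = prod(radices)
    d = [digits(c, radices) for c in range(C)]
    return [[int(all(d[o][j] == d[i][j] for j in range(len(radices)) if j != l))
             for i in range(C)] for o in range(C)]


def compose(A, B):
    """Path counts when B is applied first, then A."""
    n = len(A)
    return [[sum(A[o][k] * B[k][i] for k in range(n)) for i in range(n)]
            for o in range(n)]


radices = (2, 2, 2)            # K_1, K_2, K_3, so C = 2 * 2 * 2 = 8
masks = [layer_mask(radices, l) for l in range(len(radices))]
paths = masks[0]
for mask in masks[1:]:
    paths = compose(mask, paths)
# paths[o][i] == 1 everywhere: dense composite, exactly one path per pair
partial = compose(masks[1], masks[0])   # only two layers: K_1 * K_2 = 4 < C
# partial has zeros: channels that differ in digit 2 are never connected
weights = sum(sum(row) for m in masks for row in m)  # 3 * 8 * 2 = 48 vs 64 dense
```

When $\prod K_l = C$ the composite kernel is dense with a single path between every input and output; dropping a layer breaks this, which is the situation the complementarity relation rules out.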

In addition, following a derivation similar to IGCV1's, the number of parameters of IGCV2 is minimized when $L = \log(SC)$, where $S$ is the convolution kernel size. Here the secondary convolutions are all taken to be $1 \times 1$.
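One way to reproduce $L = \log(SC)$ is sketched below in pure Python. The cost model is an assumption consistent with the text: the first group convolution has $S$ spatial taps and the remaining $L-1$ are $1\times 1$, with $K_1 \cdots K_L = C$. AM-GM then gives a lower bound $C \cdot L \cdot (SC)^{1/L}$, whose continuous minimizer is $L = \ln(SC)$. The channel count $C = 64$ is a hypothetical example.

```python
import math


def igcv2_param_bound(L, S, C):
    """Idealized lower bound on IGCV2 weights per pixel: C * L * (S*C)**(1/L).

    Assumed cost model: the first group conv has S spatial taps (C * S * K_1
    weights), the other L-1 group convs are 1x1 (C * K_l each), and
    K_1 * ... * K_L = C. AM-GM on the L terms S*K_1, K_2, ..., K_L, whose
    product is S*C, gives sum >= L * (S*C)**(1/L).
    """
    return C * L * (S * C) ** (1.0 / L)


S, C = 9, 64
best_L = min(range(1, 16), key=lambda L: igcv2_param_bound(L, S, C))
continuous_L = math.log(S * C)  # minimizer of L * a**(1/L) is L = ln(a)
```

For $S = 9$ and $C = 64$, $\ln(576) \approx 6.36$ and the best integer depth is $L = 6$. In practice the $K_l$ must be integers, so the bound is only approached, not met exactly.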

### Discussions

The paper also discusses several aspects of the IGCV2 design:

• Non-structured sparse kernels: instead of structured sparse matrices, a regularization method is used to drive the convolution kernels toward sparsity. The paper finds that this restricts the expressive power of the network.
• The complementarity principle is not necessary. It is only a criterion for designing group convolutions efficiently; the composite convolution can be made even sparser, without fully connecting inputs and outputs.
• Sparse matrix multiplication versus low-rank matrix multiplication: low-rank matrix factorization is a common compression method, whereas sparse matrix factorization is rarely studied. Future work could explore combining sparse and low-rank matrix factorization to compress convolutional networks.

### Experiment

Network structure comparison; the primary convolution of IGCV2 is a depthwise convolution.

Comparison with networks of similar structure.

Comparison with SOTA networks.

### Conclusion

IGCV2 further sparsifies IGCV1, replacing the original dense secondary convolution with several sparse convolutions. The paper again uses thorough derivations to analyze the principle and hyperparameters of IGCV2. However, as noted above, parameter count and computation are not equivalent to inference latency, so IGCV2 still needs to be benchmarked on real devices.

# IGCV3

Paper: IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks

### Introduction

Building on the ideas of IGC and the bottleneck structure, IGCV3 combines low-rank and sparse convolution kernels to form a dense convolution kernel. As shown in Figure 1, IGCV3 uses low-rank sparse convolution kernels (a bottleneck module) to expand the dimension of the grouped input features and to reduce the output dimension, with a depthwise convolution in the middle to extract features. In addition, a relaxed complementarity principle, analogous to the strict complementarity principle of IGCV2, is introduced to handle group convolutions whose input and output dimensions differ.

### Interleaved Low-Rank Group Convolutions

IGCV3 mainly extends the IGCV2 structure by introducing low-rank group convolutions in place of the original group convolutions. The block contains a low-rank pointwise group convolution with $G_1$ groups, a depthwise convolution, and a low-rank pointwise group convolution with $G_2$ groups; the two low-rank pointwise convolutions respectively expand the feature dimension and restore it to the original size. In the paper's formulation, $P^1$ and $P^2$ are permutation matrices, $W^1$ is the $3 \times 3$ depthwise convolution, and $\hat{W}^0$ and $W^2$ are low-rank sparse matrices. Each kernel of $\hat{W}^0$ is a vector in $\mathbb{R}^C$ with only $\frac{C}{G_1}$ non-zero weights; these are the kernels of the first group convolution, used to expand the dimension. $W^2_g$ is the kernel of the second group convolution (the third group convolution in Figure 2), used to reduce the dimension back to the original size.
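The effect of the two group counts on model size is easy to tally. The pure-Python sketch below counts weights of an expand, depthwise, reduce block; the widths $C = 32$ and $C_{int} = 192$ are hypothetical, and setting $G_1 = G_2 = 1$ gives a MobileNetV2-style inverted bottleneck for comparison.

```python
def igcv3_block_params(C, C_int, G1, G2, k=3):
    """Weights of an IGCV3-style block (biases and BN ignored).

    expand:    1x1 group conv C -> C_int with G1 groups: C * C_int / G1
    depthwise: k x k kernel per intermediate channel:     C_int * k * k
    reduce:    1x1 group conv C_int -> C with G2 groups: C_int * C / G2
    """
    for channels, groups in ((C, G1), (C_int, G1), (C_int, G2), (C, G2)):
        if channels % groups:
            raise ValueError(f"{channels} channels not divisible by {groups} groups")
    return C * C_int // G1 + C_int * k * k + C_int * C // G2


C, C_int = 32, 192                                  # hypothetical widths (6x expansion)
bottleneck = igcv3_block_params(C, C_int, 1, 1)     # MobileNetV2-style: 14016
igcv3_d = igcv3_block_params(C, C_int, 1, 2)        # G1=1, G2=2: 10944
grouped = igcv3_block_params(C, C_int, 2, 2)        # G1=G2=2: 7872
```

Grouping the pointwise layers shrinks the two dominant terms, which leaves room to widen $C_{int}$ at the same budget.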

Because the input and output dimensions of the group convolutions in IGCV3 differ, the complementarity principle proposed by IGCV2 cannot be satisfied (there are multiple connection paths between an input and an output), and the permutation cannot be used as before. To solve this problem, the paper proposes the concept of super-channels: the input, output and intermediate dimensions are each divided into $C_s$ super-channels. Each input or output super-channel contains $\frac{C}{C_s}$ dimensions, and each intermediate super-channel contains $\frac{C_{int}}{C_s}$ dimensions. As shown in Figure 2, the complementarity principle is satisfied at the level of super-channels and the permutation is performed on that basis; this defines the relaxed complementarity principle.
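Why strict complementarity breaks can be seen by counting paths at the channel level. This pure-Python sketch assumes contiguous expand groups and an interleave before the reduce convolution (intermediate channel $k$ feeds reduce group $k \bmod G_2$); these layout choices and the helper name are illustrative assumptions, not the paper's exact construction.

```python
def igcv3_path_counts(C, C_int, G1, G2):
    """Paths from input channel i to output channel o in expand -> depthwise -> reduce.

    Assumptions for illustration: expand group g holds inputs [g*C/G1, (g+1)*C/G1)
    and intermediates [g*C_int/G1, (g+1)*C_int/G1); intermediate k feeds reduce
    group k % G2, which produces outputs [g*C/G2, (g+1)*C/G2). The depthwise conv
    is channel-wise, so it does not change connectivity.
    """
    in_size, mid_size, out_size = C // G1, C_int // G1, C // G2
    return [[sum(1 for k in range(C_int)
                 if k // mid_size == i // in_size and k % G2 == o // out_size)
             for i in range(C)] for o in range(C)]


counts = igcv3_path_counts(C=4, C_int=8, G1=2, G2=2)
# every entry is C_int / (G1 * G2) = 2: two paths per (input, output) pair
```

Because the expansion makes $C_{int} > C$, each input and output pair is linked through several intermediate channels instead of one, which is the situation that super-channels are introduced to handle.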

### Experiment

Comparison with the previous two versions; in IGCV3-D, $G_1$ and $G_2$ are 1 and 2, respectively.

Comparison with other networks on ImageNet.

The paper also experiments with the placement of ReLU, mainly with respect to how MobileNetV2 uses it.

Comparison of different numbers of groups.

### Conclusion

Building on IGCV2, IGCV3 incorporates the main structure of MobileNetV2 and uses more aggressive low-rank sparse group convolutions, so its overall structure is very close to MobileNetV2. The core of IGCV3 is still sparse group convolution plus permutation. Although it performs slightly better than MobileNetV2, its novelty is somewhat limited.

# Conclusion

The core of the IGC series is taking group convolution to the extreme. A regular convolution is decomposed into several group convolutions, which removes a large number of parameters, and the complementarity principle together with permutation operations keeps information flowing between groups with as few parameters as possible. Overall, however, although the IGC module reduces parameters and computation, it makes the network structure more cumbersome, which may make the network slower in practice.