Five ECCV 2020 papers on image recognition

Time: 2021-06-11

Today, I’d like to introduce five oral papers from ECCV 2020. ECCV, CVPR, and ICCV are regarded as the top three conferences in computer vision. ECCV 2020 received a total of 5,025 submissions and accepted 1,361 of them, an acceptance rate of about 27%.

ECCV 2020 accepted papers list:

https://eccv2020.eu/accepted-papers/

Paper 1: Adaptively learning network width and input resolution

Paper title: MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution

Download address: https://arxiv.org/abs/1909.12978

GitHub: https://github.com/taoyang1122/MutualNet

Introduction: Deep neural networks have achieved great success across a variety of perception tasks. However, they typically require substantial computing resources, which makes them difficult to deploy on mobile devices and embedded systems. The contributions of this paper are as follows:

  1. It highlights the importance of input resolution for designing efficient networks. Previous work either ignored input resolution or treated it separately from the network structure. In contrast, this paper embeds network width and input resolution in a unified mutual learning framework to learn MutualNet, which achieves an adaptive trade-off between accuracy and efficiency at runtime.

  2. Extensive experiments demonstrate that MutualNet clearly outperforms the same networks trained independently.

  3. A comprehensive ablation study analyzes the proposed mutual learning scheme. The paper further shows that the framework can serve as a plug-and-play strategy for boosting a single network, substantially outperforming popular performance-boosting methods such as data augmentation, SENet, and knowledge distillation.

  4. The proposed approach is a general, model-agnostic training scheme. It can be applied to any network without structural adjustment, which makes it compatible with other state-of-the-art techniques such as neural architecture search (NAS) and automatic data augmentation.

This paper proposes a method for training a network under dynamic resource constraints (such as a FLOPs budget). It introduces a mutual learning scheme over input resolution and network width, enabling an adaptive accuracy-efficiency trade-off at runtime. On image classification, object detection, and instance segmentation, the method improves network performance and significantly improves the accuracy-efficiency trade-off of lightweight networks. Its advantages are also verified on COCO object detection, instance segmentation, and transfer learning. Surprisingly, MutualNet's training strategy can also boost a single network, far outperforming strong automatic data augmentation methods in both efficiency (GPU search hours: 15,000 vs. 0) and accuracy (ImageNet: 77.6% vs. 78.6%).
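
To make the training scheme concrete, here is a minimal PyTorch-style sketch of one mutual learning step. It assumes a hypothetical slimmable model that accepts a `width` argument (as the networks in the official repository do); the width range, resolution list, and number of sampled sub-networks are illustrative, not the paper's exact settings.

```python
# Minimal sketch of a MutualNet-style mutual learning step (not the official
# implementation). `model(images, width=w)` is assumed to run a slimmable
# network at width ratio `w`; widths, resolutions, and the number of sampled
# sub-networks below are illustrative.
import random
import torch
import torch.nn.functional as F

WIDTH_RANGE = (0.25, 1.0)               # lower / upper width ratio
RESOLUTIONS = (224, 192, 160, 128)      # candidate input resolutions

def mutual_learning_step(model, optimizer, images, labels):
    optimizer.zero_grad()

    # The full-width network sees the full-resolution input and learns
    # from the ground-truth labels.
    full_logits = model(images, width=WIDTH_RANGE[1])
    loss = F.cross_entropy(full_logits, labels)

    # Its predictions become soft targets for the narrower sub-networks
    # (in-place distillation).
    soft_targets = F.softmax(full_logits.detach(), dim=1)

    # Randomly sampled widths are paired with randomly down-sampled inputs,
    # so width and resolution are learned mutually.
    for _ in range(2):
        w = random.uniform(*WIDTH_RANGE)
        r = random.choice(RESOLUTIONS)
        small = F.interpolate(images, size=(r, r), mode="bilinear",
                              align_corners=False)
        logits = model(small, width=w)
        loss = loss + F.kl_div(F.log_softmax(logits, dim=1),
                               soft_targets, reduction="batchmean")

    loss.backward()
    optimizer.step()
    return loss.item()
```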

The mutual learning scheme also proves to be an effective training strategy for improving the performance of a single network. The generality of the framework allows it to transfer well to other problem domains, and it can be extended to video inputs and 3D neural networks, where both spatial and temporal information can be exploited.

The overall framework is as follows:

Paper 2: Spatially adaptive inference with stochastic feature sampling and interpolation

Paper title: Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation

Download address: https://arxiv.org/pdf/2003.08866.pdf

GitHub: not found yet (contributions welcome)

Introduction: Feature maps in convolutional neural networks (CNNs) contain a large amount of spatial redundancy and repeated processing. To reduce this redundant computation, the paper proposes to compute features only at sparsely sampled locations, which are selected probabilistically based on activation responses, and then to densely reconstruct the feature map with an efficient interpolation procedure.

Thanks to the inherent sparsity and spatial redundancy of feature maps, expensive computation is avoided at locations where interpolation is sufficient.
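
As a rough illustration (not the paper's implementation), the sketch below samples a per-position mask from learned logits with a Gumbel-sigmoid relaxation, masks a densely computed convolution, and fills the unsampled positions from nearby sampled ones; the real savings in the paper come from a sparse convolution kernel, a sparsity loss, and a learned interpolation module.

```python
# Rough sketch of stochastic feature sampling + interpolation (PyTorch).
# Illustrative approximation only: the convolution is computed densely and
# masked here, whereas the paper uses a sparse convolution kernel, a
# sparsity loss, and a learned interpolation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticSampledConv(nn.Module):
    def __init__(self, in_ch, out_ch, temperature=1.0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.mask_head = nn.Conv2d(in_ch, 1, 1)   # per-position sampling logit
        self.temperature = temperature

    def forward(self, x):
        logits = self.mask_head(x)
        if self.training:
            # Binary-Concrete (Gumbel-sigmoid) relaxation of the sampling mask.
            u = torch.rand_like(logits)
            noise = torch.log(u) - torch.log(1 - u)
            mask = torch.sigmoid((logits + noise) / self.temperature)
        else:
            mask = (logits > 0).float()           # hard mask at inference

        features = self.conv(x)                   # dense here for simplicity
        sampled = features * mask                 # keep sampled positions only

        # Fill unsampled positions from nearby sampled ones (a crude stand-in
        # for the paper's interpolation module).
        num = F.avg_pool2d(sampled, 3, stride=1, padding=1)
        den = F.avg_pool2d(mask, 3, stride=1, padding=1).clamp(min=1e-6)
        return sampled + (1 - mask) * (num / den)
```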

Framework diagram:

Paper 3: Hybrid models for open set recognition

Paper title: Hybrid Models for Open Set Recognition

Download address: https://arxiv.org/abs/2003.12506

GitHub: not found yet (contributions welcome)

Introduction: Open set recognition requires a classifier to detect samples that do not belong to any class in its training set. Existing methods fit a probability distribution to the training samples in an embedding space and detect outliers with respect to that distribution. However, the embedding is learned only for classifying the known classes and may therefore be of limited use for distinguishing unknown classes. This paper argues that the representation space should be learned jointly by the inlier classifier and a density estimator (which acts as the outlier detector). In the proposed framework, an encoder maps the input into a joint embedding space, a classifier assigns samples to one of the inlier classes, and a flow-based density estimator detects whether a sample belongs to an unknown class. Extensive experiments show that the method achieves state-of-the-art results. A common problem with flow-based models is that they tend to assign high likelihoods to out-of-distribution samples; this issue is alleviated by learning the joint feature space, as observed across different datasets. The study also shows that joint training is another key factor in improving open set recognition performance.
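
The sketch below illustrates the hybrid structure with a toy normalizing flow over a shared embedding; the encoder, the coupling layers, and the rejection threshold are simplified assumptions rather than the paper's exact architecture.

```python
# Toy sketch of the hybrid open-set idea: one encoder feeds both a
# closed-set classifier and a flow-based density estimator over the shared
# embedding; a low log-likelihood marks a sample as "unknown".
import math
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """A single affine coupling layer over an even-dimensional embedding."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, dim))

    def forward(self, z):
        z1, z2 = z.chunk(2, dim=1)
        s, t = self.net(z1).chunk(2, dim=1)
        s = torch.tanh(s)                        # keep the scale bounded
        z2 = z2 * torch.exp(s) + t
        return torch.cat([z2, z1], dim=1), s.sum(dim=1)   # swap halves

class HybridOpenSet(nn.Module):
    def __init__(self, encoder, emb_dim, num_classes, n_flows=4):
        super().__init__()
        self.encoder = encoder                   # any backbone -> emb_dim
        self.classifier = nn.Linear(emb_dim, num_classes)
        self.flow = nn.ModuleList(AffineCoupling(emb_dim) for _ in range(n_flows))

    def forward(self, x):
        z = self.encoder(x)
        logits = self.classifier(z)              # closed-set classification
        # Flow-based density: log p(z) = log N(f(z); 0, I) + sum log|det J|.
        h, log_det = z, z.new_zeros(z.shape[0])
        for layer in self.flow:
            h, ld = layer(h)
            log_det = log_det + ld
        log_prob = (-0.5 * (h ** 2).sum(dim=1)
                    - 0.5 * h.shape[1] * math.log(2 * math.pi) + log_det)
        return logits, log_prob

# At test time: accept the arg-max class when log_prob >= threshold,
# otherwise reject the sample as belonging to an unknown class.
```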

Framework diagram:

Paper 4: Gradient Centralization (GC): a new optimization technique for deep neural networks

Paper title: Gradient Centralization: A New Optimization Technique for Deep Neural Networks

Download address: https://arxiv.org/abs/2004.01461

GitHub: https://github.com/Yonghongwei/Gradient-Centralization

Introduction: Optimization techniques are essential for effectively training deep neural networks (DNNs). Unlike existing methods that operate on activations or weights, this paper proposes a new optimization technique, Gradient Centralization (GC), which operates directly on gradients by centralizing the gradient vectors to zero mean. Comprehensive experiments show that GC regularizes both the weight space and the output feature space, can be readily applied to different tasks with different optimizers and network architectures, and improves both training efficiency and generalization performance.
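
Because GC is a single transform on the gradients, it is easy to sketch. The snippet below centralizes every multi-dimensional weight gradient just before the optimizer step; the training-loop names in the usage comment are placeholders.

```python
# Gradient Centralization: before the optimizer step, subtract from each
# weight gradient its mean taken over all dimensions except the first
# (output-channel) dimension.
import torch

def centralize_gradients(model):
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None or p.grad.dim() <= 1:
                continue  # GC applies to conv / fully-connected weights only
            dims = tuple(range(1, p.grad.dim()))
            p.grad -= p.grad.mean(dim=dims, keepdim=True)

# Usage inside a standard training loop (names are placeholders):
#   loss = loss_fn(model(x), y)
#   loss.backward()
#   centralize_gradients(model)   # GC just before the optimizer step
#   optimizer.step()
```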

Framework diagram:

Paper 5: Multi-task learning strengthens adversarial robustness

Paper title: Multitask Learning Strengthens Adversarial Robustness

Download address: https://arxiv.org/pdf/2007.07236.pdf

GitHub: not found yet (contributions welcome)

Introduction: Although deep networks achieve high accuracy on a range of computer vision benchmarks, they remain vulnerable to adversarial attacks, in which imperceptible perturbations of the input fool the network. This paper provides a theoretical and empirical analysis linking a model's robustness to the number of tasks it is trained on. Experiments on two datasets show that the difficulty of attacking a model increases with the number of target tasks. Moreover, the results show that models trained on multiple tasks simultaneously become more robust to adversarial attacks on any single task; in other words, when the number of training tasks is small, the model is more vulnerable to adversarial attacks. This work is the first to link such vulnerability to multi-task learning and suggests a new research direction for understanding and mitigating it.
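
For readers unfamiliar with the setting, here is a minimal sketch of the kind of shared-backbone multi-task model the paper studies; the particular heads (classification plus a toy regression task) and the equal loss weighting are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of a shared-backbone multi-task model; heads and loss
# weighting are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskModel(nn.Module):
    def __init__(self, backbone, feat_dim, num_classes):
        super().__init__()
        self.backbone = backbone                  # shared representation
        self.cls_head = nn.Linear(feat_dim, num_classes)
        self.reg_head = nn.Linear(feat_dim, 1)    # auxiliary regression task

    def forward(self, x):
        feats = self.backbone(x)
        return self.cls_head(feats), self.reg_head(feats)

def multitask_loss(model, x, cls_target, reg_target):
    cls_out, reg_out = model(x)
    # Training on several tasks at once is what the paper links to
    # increased robustness of each individual task against attacks.
    return F.cross_entropy(cls_out, cls_target) + F.mse_loss(reg_out, reg_target)
```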

Welcome to the PanChuang AI blog:
http://panchuang.net/

Sklearn machine learning official Chinese document:
http://sklearn123.com/

Welcome to the PanChuang blog resource hub:
http://docs.panchuang.net/