Attention-based deep learning model for multiple pedestrian attribute recognition

Time: 2021-10-23

Paper title: An Attention-Based Deep Learning Model for Multiple Pedestrian Attributes Recognition

Link: https://arxiv.org/abs/2004.01110

Affiliation: Tsinghua University

Year: 2020

WeChat official account: CVpython (updated in sync)

1. What problem does the paper solve?

Pedestrian attribute prediction is a multi-task learning problem. To share feature representations, traditional multi-task learning methods usually learn linear combinations of features or of feature subspaces. However, such combinations discard the complex interdependencies between channels, and the exchange of spatial information is rarely considered. This paper proposes a collaborative attention sharing (CAS) model that extracts discriminative channels and spatial regions so that features can be shared effectively in multi-task learning.

In plain words: previous multi-task methods are rather weak. Many simply add features together without considering the dependencies among feature channels or the interaction of spatial information.

2. How does the paper solve the problem?

For pedestrian attribute classification, the common network structures are shown in Figure 1:


  • Hard sharing structure: it is prone to negative transfer, that is, the prediction of one pedestrian attribute can easily be affected by other attributes.
  • Vanilla structure: two independent networks, each responsible for predicting a different set of attributes. Closely related attributes are put in the same group and handled by the same network. However, the two networks do not interact, so some useful related information may go unused.
  • Soft sharing structure: combines the advantages of the hard sharing and vanilla structures. At each layer, a module decides which features should be shared and which should not.
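The difference between the three structures is easiest to see on toy data. In this hedged NumPy sketch, a "layer" is just ReLU(x @ w), and the soft-sharing module is modeled as a fixed per-channel gate that decides how much of the other network's features each network takes in. Real soft-sharing modules learn this decision; all function names, gates and sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))  # toy batch: 4 samples, 8 input features

def layer(x, w):
    """Toy stand-in for a backbone layer: ReLU(x @ w)."""
    return np.maximum(x @ w, 0.0)

def hard_sharing(x, w):
    # One trunk; both attribute groups read exactly the same features.
    f = layer(x, w)
    return f, f

def vanilla(x, w_a, w_b):
    # Two independent trunks; nothing flows between them.
    return layer(x, w_a), layer(x, w_b)

def soft_sharing(x, w_a, w_b, gate_a, gate_b):
    # Two trunks plus a sharing module: each per-channel gate in [0, 1]
    # (hypothetical, fixed here instead of learned) decides how much of
    # the other network's features to take in.
    f_a, f_b = layer(x, w_a), layer(x, w_b)
    return f_a + gate_a * f_b, f_b + gate_b * f_a

w, w_a, w_b = (rng.standard_normal((8, 16)) for _ in range(3))
gate_a = np.zeros(16)
gate_a[:4] = 1.0          # A borrows only B's first 4 channels
gate_b = np.zeros(16)     # B borrows nothing from A

h_a, h_b = hard_sharing(x, w)
v_a, v_b = vanilla(x, w_a, w_b)
s_a, s_b = soft_sharing(x, w_a, w_b, gate_a, gate_b)
```

With all gates at zero, soft sharing reduces to the vanilla structure; opening a gate lets a network borrow selected channels from the other one, which is the per-layer "share or don't share" decision described above.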

In previous multi-task learning methods, such as the cross-stitch unit and the sluice network, features from different tasks interact only through simple element-wise addition, which ignores channel information. Moreover, pedestrian attributes are usually associated with different spatial locations. Therefore, the authors propose the collaborative attention sharing (CAS) model to extract discriminative channels and spatial locations so that features can be shared between networks.
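For comparison, a cross-stitch-style unit in its simplest form mixes the two networks' feature maps with a 2×2 matrix of scalars. The NumPy sketch below (function name and values hypothetical, weights fixed instead of learned) shows why this counts as plain weighted element-wise addition: every channel and every spatial position of one network's features is scaled by the same number before being added to the other's.

```python
import numpy as np

def cross_stitch(f_a, f_b, alpha):
    """Cross-stitch-style mixing of two task feature maps.

    f_a, f_b: arrays of shape (C, H, W) from networks A and B.
    alpha:    2x2 mixing matrix (learned in practice, fixed here).
    All channels and positions share the same scalar weights, so there
    is no channel- or location-specific selection.
    """
    out_a = alpha[0, 0] * f_a + alpha[0, 1] * f_b
    out_b = alpha[1, 0] * f_a + alpha[1, 1] * f_b
    return out_a, out_b

rng = np.random.default_rng(0)
f_a = rng.standard_normal((8, 5, 5))
f_b = rng.standard_normal((8, 5, 5))
alpha = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
out_a, out_b = cross_stitch(f_a, f_b, alpha)
```

Because `alpha` has no channel or spatial index, the unit cannot, for example, share only the channels relevant to "backpack" or only the region around the feet; this is the gap CAS targets.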

The CAS model proposed by the authors is shown in Figure 2:


This soft sharing structure consists of two networks and the interaction modules between them. The upper and lower networks have the same structure: the input feature $feat$ is passed through GAP (global average pooling) to obtain $v_g$, and $v_g$ is then fed into a fully connected layer to obtain the intermediate vector $v_m$.
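The per-network front end of the unit (GAP followed by a fully connected layer) can be sketched in NumPy as follows. This is a minimal illustration for a single (C, H, W) feature map; the sizes, the bottleneck width `C_mid`, and the absence of an activation after the FC layer are assumptions, not details taken from the paper.

```python
import numpy as np

def gap(feat):
    """Global average pooling: (C, H, W) -> (C,)."""
    return feat.mean(axis=(1, 2))

def fc(v, w, b):
    """Fully connected layer: (C_in,) -> (C_out,)."""
    return w @ v + b

rng = np.random.default_rng(0)
C, H, W, C_mid = 16, 6, 4, 4      # toy sizes; C_mid is a hypothetical width
feat = rng.standard_normal((C, H, W))
w_m = rng.standard_normal((C_mid, C)) * 0.1
b_m = np.zeros(C_mid)

v_g = gap(feat)              # one scalar summary per channel
v_m = fc(v_g, w_m, b_m)      # intermediate vector shared by the branches
```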

  • Collaborative branch: the input of this branch is $v_{sh}$, which each of the upper network $A$ and the lower network $B$ obtains by passing its intermediate vector $v_m$ through a fully connected layer. $v_{sh}$ is multiplied element-wise with $feat$ of the same layer, and the results are denoted $feat^A_{sh}$ and $feat^B_{sh}$ respectively. $feat^A_{sh}$ and $feat^B_{sh}$ are then concatenated along the channel dimension to obtain $feat_{cat}$. Next, $\mathrm{concat}(\mathrm{avg}(feat_{cat}), \max(feat_{cat}))$ is convolved, and the result is denoted $M$, where $\mathrm{avg}$ and $\max$ are the average and maximum taken over the channel dimension. Separately, $feat_{cat}$ is convolved to obtain $feat_{sym}$. The outputs of the collaborative branch are $M$ and $feat_{sym}$; $M$ is passed on to the attention branch.
  • Attention branch: the input of this branch is $v_a$, obtained by passing $v_m$ through a fully connected layer. $v_a$ is then multiplied element-wise with the output $M$ of the collaborative branch, and the result is denoted $a$.
  • Task-specific branch: the input of this branch is $v_t$, also obtained by passing $v_m$ through a fully connected layer. $v_t$ is then multiplied element-wise with $feat$ of this layer, and the result is denoted $feat_t$.
  • Branch aggregation: $feat$, $feat_{sym}$ and $feat_t$ are added element-wise, and the sum is multiplied element-wise by $a$, i.e. $a \odot (feat + feat_{sym} + feat_t)$. The result is fed into the next layer of the network.
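To make the data flow of the four branches concrete, here is a minimal NumPy sketch of one CAS unit's forward pass for a single image. It follows the description above, but several details are assumptions not stated in this summary: sigmoid activations on $v_{sh}$, $v_a$, $v_t$ and $M$; no biases; a 3×3 convolution for $M$ and a 1×1 convolution for $feat_{sym}$; and a single $M$ and $feat_{sym}$ shared by both networks. The names `cas_unit`, `init_params`, `conv2d_same` and all weight shapes are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, k):
    # Single-output-channel conv (cross-correlation), stride 1, zero "same" padding.
    # x: (C_in, H, W), k: (C_in, kh, kw) with odd kh, kw -> (H, W)
    _, h, w = x.shape
    _, kh, kw = k.shape
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[:, i:i + kh, j:j + kw] * k)
    return out

def init_params(rng, c, c_mid):
    # Per-network FC weights (hypothetical shapes, biases omitted).
    s = 0.1
    return {
        "w_m": rng.standard_normal((c_mid, c)) * s,   # v_g -> v_m
        "w_sh": rng.standard_normal((c, c_mid)) * s,  # v_m -> v_sh
        "w_a": rng.standard_normal((c, c_mid)) * s,   # v_m -> v_a
        "w_t": rng.standard_normal((c, c_mid)) * s,   # v_m -> v_t
    }

def cas_unit(feat_a, feat_b, p_a, p_b, k_spatial, w_sym):
    # Per-network gating vectors: GAP -> FC -> v_m -> FC -> v_sh, v_a, v_t.
    gates = []
    for feat, p in ((feat_a, p_a), (feat_b, p_b)):
        v_g = feat.mean(axis=(1, 2))                  # GAP: (C,)
        v_m = p["w_m"] @ v_g                          # intermediate vector
        gates.append({n: sigmoid(p["w_" + n] @ v_m) for n in ("sh", "a", "t")})
    g_a, g_b = gates

    # Collaborative branch: gate, concatenate, then derive M and feat_sym.
    feat_sh_a = g_a["sh"][:, None, None] * feat_a
    feat_sh_b = g_b["sh"][:, None, None] * feat_b
    feat_cat = np.concatenate([feat_sh_a, feat_sh_b], axis=0)          # (2C, H, W)
    pooled = np.stack([feat_cat.mean(axis=0), feat_cat.max(axis=0)])  # (2, H, W)
    m = sigmoid(conv2d_same(pooled, k_spatial))                       # M: (H, W)
    feat_sym = np.einsum("oc,chw->ohw", w_sym, feat_cat)              # 1x1 conv: (C, H, W)

    outputs = []
    for feat, g in ((feat_a, g_a), (feat_b, g_b)):
        a = g["a"][:, None, None] * m[None, :, :]       # attention branch
        feat_t = g["t"][:, None, None] * feat           # task-specific branch
        outputs.append(a * (feat + feat_sym + feat_t))  # branch aggregation
    return outputs[0], outputs[1]

rng = np.random.default_rng(0)
C, H, W, C_MID = 8, 6, 4, 2
feat_a = rng.standard_normal((C, H, W))
feat_b = rng.standard_normal((C, H, W))
p_a, p_b = init_params(rng, C, C_MID), init_params(rng, C, C_MID)
k_spatial = rng.standard_normal((2, 3, 3)) * 0.1   # hypothetical 3x3 kernel for M
w_sym = rng.standard_normal((C, 2 * C)) * 0.1      # hypothetical 1x1 conv weights
out_a, out_b = cas_unit(feat_a, feat_b, p_a, p_b, k_spatial, w_sym)
```

Because $M$ and $feat_{sym}$ are computed from the concatenation of both networks' features, changing `feat_b` changes `out_a`; this is the cross-network interaction that the vanilla structure lacks, and unlike the cross-stitch mix it is modulated per channel ($v_a$) and per position ($M$).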

3. What are the experimental results?

  • CAS outperforms the traditional sharing-unit methods and achieves better results than previous state-of-the-art (SOTA) methods.


4. What can we learn from this paper?

  • In multi-task learning, the soft sharing structure performs better than the hard sharing and vanilla structures.
  • Spatial information remains very important for pedestrian attribute recognition. Element-wise addition of features may not be able to extract spatial region information, whereas concatenation should still be useful.

