Without clustering: Hangzhou Dianzi University researchers propose unsupervised person re-identification based on softened similarity learning

Time:2021-10-19

Author | ass
Edit | CV
Report | I love computer vision (wechat ID: aicvml)

Unsupervised Person Re-identification via Softened Similarity Learning:

  • Paper link: https://arxiv.org/abs/2004.03547
  • Code link: https://github.com/ryanaleksa… (Unofficial)
  • First author: Yutian Lin (now an associate researcher at Wuhan University)
  • Author affiliations: Hangzhou Dianzi University (first author), Huawei Technologies Co., Ltd., Baidu Research, ReLER Lab at the University of Technology Sydney

01 Highlights

  • The image data is completely unlabeled;
  • Clustering is abandoned, and soft labels replace the hard quantization loss;
  • Part-level image information and cross-camera information are applied in the unsupervised setting;
  • The method achieves state-of-the-art (SOTA) results among unsupervised person re-identification methods.

The main highlights are as follows:

1. Abandon the clustering method and adopt softened classification

Disadvantages of clustering: clustering-based methods coarsely divide the images into clusters for training, which makes the model highly dependent on the clustering results. As shown in Fig. 1(b) of the paper, images of the same person can be assigned to different clusters, and training then continues with these incorrectly assigned pseudo labels. Since errors in unsupervised clustering are inevitable, learning with a hard quantization loss tends to fit the noisy labels that clustering produces.

Softened label classification: in clustering methods, each image belongs to exactly one category (a one-hot label). In contrast, this paper mines the relationships between unlabeled images as a mild constraint. The authors assign soft labels to the top k images most similar to the target, treat the label as a distribution, and encourage each image to be associated with several related categories. In the figure below, purple marks the target and yellow marks the k reliable images closest to it.

[Figure: a target image (purple) and its k closest reliable images (yellow)]

2. Some auxiliary information is introduced to help find similar images

The constraint imposed by soft labels is relatively weak, but compared with hard classification it also leaves the algorithm more room. Therefore, when measuring the similarity between images, the global and part features of each pedestrian image and the camera ID information are also taken into account.

02 Proposed method

[Figure: overall framework of the method]

The framework can be divided into three components (shown as three colored rectangles):

  1. A baseline classification network classifies each image into its own category and produces the feature representation;
  2. The similarity between unlabeled images is explored using feature embeddings and auxiliary information, and k reliable images are selected for each training image;
  3. The target label distribution is softened according to the k reliable images, and the network is fine-tuned with the softened labels so that each image moves closer to its k reliable images and away from all other images.

Next, I will introduce the specific implementation steps of each component.

1、 Baseline: initialization with hard labels


The red box and red arrows in the overall framework figure correspond to this baseline step.

Purpose:

Maximize the cosine distance between each image feature $v_i$ and the other entries $V_{j \neq i}$ of the lookup table, while minimizing the cosine distance between each image feature $v_i$ and its corresponding centroid feature $V_i$. By learning to recognize each unlabeled image, the baseline network obtains its initial discriminative ability.

Steps:

1. Label initialization: since there is no ground-truth label for any pedestrian image, the label of each image $x_i$ is defined by its index, so every image is treated as a separate class:

$$y_i = i, \quad 1 \le i \le N$$

where $N$ is the number of training images.

2. Nonparametric classifier:

Non-parametric classifier: as I understand it, the normalized image features are used directly for classification without passing through any further layers, which is why it is called a non-parametric classifier.

Here the authors use a lookup table $V$ to store the features of all training images, and the stored feature of each image serves as the weight vector of its class. Softmax is then used for multi-class classification.

① Data preprocessing: each feature is L2-normalized, $v_i = \phi(x_i) / \lVert \phi(x_i) \rVert$, where $\phi$ is the feature embedding function, so that $\lVert v_i \rVert = 1$.

② Classification: the probability that an image $x$ with normalized feature $v$ belongs to the $i$-th class is defined by softmax:

$$p(i \mid x, V) = \frac{\exp(V_i^{\top} v / \tau)}{\sum_{j} \exp(V_j^{\top} v / \tau)}$$

where $V_i$ is the $i$-th row of the lookup table $V$ and stores the weight parameters (i.e. the image feature) of that class, and the sum runs over all rows of $V$. $\tau$ is a temperature parameter that controls how soft the probability distribution over the categories is (i.e. how hard the label is): a higher temperature gives a softer distribution.
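As a rough illustration of this non-parametric classifier, the sketch below L2-normalizes a feature and applies a temperature-scaled softmax over the rows of a lookup table. It is written in plain Python for readability. The function names, the toy 2-D features and the value `tau=0.1` are assumptions made for this example, not values taken from the paper.

```python
import math

def l2_normalize(vec):
    """Scale a feature vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def class_probabilities(feature, lookup_table, tau=0.1):
    """Softmax over the similarities between one feature and every table row."""
    v = l2_normalize(feature)
    logits = [sum(a * b for a, b in zip(row, v)) / tau for row in lookup_table]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy lookup table (hypothetical): three training images, each its own class.
V = [l2_normalize(row) for row in ([1.0, 0.0], [0.8, 0.6], [0.0, 1.0])]
probs = class_probabilities([2.0, 0.1], V, tau=0.1)
```

Calling `class_probabilities` with a larger `tau` on the same input spreads the probability more evenly over the classes, which is the softening effect of the temperature described above.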

③ Loss and optimizer

Loss: cross-entropy loss

$$L = -\sum_{j} t(y_j) \log p(y_j \mid x, V)$$

where $t(y_j)$ is the conditional empirical distribution over the class labels. For the ground-truth class (here, the image's own index label) the probability is set to 1, and for all other classes it is set to 0.
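To make the hard-label loss concrete, here is a small sketch of the cross entropy between a one-hot target and a predicted distribution. The helper names and the example probabilities are made up for illustration.

```python
import math

def one_hot(index, num_classes):
    """Hard target: probability 1 for the image's own class, 0 elsewhere."""
    return [1.0 if j == index else 0.0 for j in range(num_classes)]

def cross_entropy(probs, target):
    """-sum_j t(j) * log p(j); classes with zero target weight contribute nothing."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

probs = [0.7, 0.2, 0.1]  # hypothetical softmax output for image 0
loss = cross_entropy(probs, one_hot(0, 3))
```

With a one-hot target the sum collapses to a single term, the negative log-probability of the image's own class.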

2、 Model learning with softened similarity


The green and blue parts of the overall framework figure correspond to this step.

Purpose:

Not only minimize the cosine distance between each image feature and its ground-truth feature in the lookup table, but also minimize the distance between each image feature and the features of its reliable images, while maximizing the cosine distance between each image feature and the features of all other categories.

Forcing images of the same person into different categories has a negative impact on the network. The authors therefore propose to give similar representations to images estimated to show the same pedestrian, which is the soft label method.

Steps:

1. Similarity calculation: for two images $x_i$ and $x_j$, the dissimilarity between them is defined as the distance between the two images. (See the next section for how the image distance is computed.)

2. Define labels: for an image $x_i$, the $k$ images closest to it are called its reliable images. This set is denoted $X^{i}_{reliable}$, and the labels of these images are denoted $Y^{i}_{reliable}$. The images in $X^{i}_{reliable}$ are regarded as likely to show the same person as $x_i$, and the classes in $Y^{i}_{reliable}$ are called reliable classes; they are not merged into the same class as $x_i$.
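The selection of reliable images amounts to a nearest-neighbour lookup. A minimal sketch, assuming a precomputed pairwise distance matrix (the matrix values and the function name are hypothetical):

```python
def reliable_indices(distances, i, k):
    """Return the indices of the k images closest to image i, excluding i itself."""
    others = [j for j in range(len(distances)) if j != i]
    others.sort(key=lambda j: distances[i][j])
    return others[:k]

# Hypothetical pairwise distances between four training images.
D = [
    [0.0, 0.2, 0.9, 0.4],
    [0.2, 0.0, 0.8, 0.5],
    [0.9, 0.8, 0.0, 0.7],
    [0.4, 0.5, 0.7, 0.0],
]
neighbors = reliable_indices(D, 0, k=2)
```

Because each image is its own class in the baseline, the returned indices double as the reliable class labels of image `i`.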

3. Redefine the target label: rather than training the $k$ reliable images as the same class, the authors propose a softened classification network that learns the similarity between identities in a smoother way (without hard labels). During training, the network should predict each image not only as its ground-truth class but also as its reliable classes. Therefore, a non-zero value is assigned to the reliable classes in the target label. The target label distribution of $x_i$ is written as:

$$\hat{t}(j) = \begin{cases} \lambda, & j = y_i \\ \dfrac{1 - \lambda}{k}, & j \in Y^{i}_{reliable} \\ 0, & \text{otherwise} \end{cases}$$

where $Y^{i}_{reliable}$ is the set of the $k$ reliable classes of $x_i$, and $\lambda$ is a hyperparameter that balances the ground-truth class against the reliable classes. When $\lambda = 1$, the target reduces to the 0/1 labels of the baseline network: the model learns to recognize the ground-truth label of each image but cannot learn the similarity and consistency between images of the same person. Conversely, when $\lambda$ is too small, the model may fail to predict the ground-truth label.
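A short sketch of how such a softened target could be built. It assumes the weight left over after the own class is shared equally among the reliable classes; the function name and the example numbers are illustrative only.

```python
def soft_target(own_class, reliable_classes, num_classes, lam):
    """lam on the image's own class, (1 - lam) / k on each reliable class, 0 elsewhere."""
    target = [0.0] * num_classes
    target[own_class] = lam
    share = (1.0 - lam) / len(reliable_classes)  # assumed equal split
    for j in reliable_classes:
        target[j] = share
    return target

target = soft_target(own_class=0, reliable_classes=[1, 3], num_classes=5, lam=0.6)
hard = soft_target(own_class=0, reliable_classes=[1, 3], num_classes=5, lam=1.0)
```

With `lam=1.0` the result is the one-hot label of the baseline, which matches the limiting case described above.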

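4. Loss: cross-entropy loss against the softened target

$$L_{soft} = -\lambda \log p(y_i \mid x_i, V) - \frac{1 - \lambda}{k} \sum_{j \in Y^{i}_{reliable}} \log p(j \mid x_i, V)$$

where $\lambda$ weights the ground-truth class and the remaining weight is shared by the $k$ reliable classes in $Y^{i}_{reliable}$.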
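The softened loss can be sketched in a few lines. As before, the equal split of the remaining weight across reliable classes is an assumption of this example, and the probabilities are invented.

```python
import math

def soft_label_loss(probs, own_class, reliable_classes, lam):
    """Cross entropy against a target with lam on the own class and the rest shared."""
    own_term = -lam * math.log(probs[own_class])
    share = (1.0 - lam) / len(reliable_classes)
    reliable_term = -share * sum(math.log(probs[j]) for j in reliable_classes)
    return own_term + reliable_term

probs = [0.5, 0.3, 0.1, 0.1]  # hypothetical prediction for image 0
hard = soft_label_loss(probs, 0, [1], lam=1.0)  # baseline: own class only
soft = soft_label_loss(probs, 0, [1], lam=0.6)  # also rewards the reliable class 1
```

Minimizing `soft` pushes probability toward both the own class and the reliable class, so the feature of image 0 is pulled toward the lookup-table entry of image 1 as well as its own.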

Summary:

Images are labeled with a soft label distribution (representing probabilities) rather than a one-hot label. The target is no longer the ground-truth class alone, but a distribution over the ground-truth class and the k reliable classes. By taking the reliable classes into account, the weight on the ground-truth class is reduced and the weight on the reliable classes is increased, which guides the network to learn the similarity between pedestrian images smoothly.

3、 Similarity estimation with auxiliary information


To achieve better results, the authors also bring in auxiliary information to help estimate similarity.

Part similarity exploration

After extracting the CNN feature map, the authors divide it horizontally into $P$ parts. Each part is average-pooled into a part feature representation. The average distance between the corresponding parts of two images is taken as the part distance between them:

$$d_p(x_i, x_j) = \frac{1}{P} \sum_{p=1}^{P} \lVert \phi_p(x_i) - \phi_p(x_j) \rVert$$

where $\phi_p$ is the feature embedding function for the $p$-th part of an image.
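The part distance can be sketched as follows. For simplicity the feature map is represented as an H x C list of rows (width already pooled away), H is assumed to be divisible by the number of parts, and Euclidean distance is assumed as the metric. All names and values are illustrative.

```python
import math

def part_features(feature_map, num_parts):
    """Split an H x C map into horizontal stripes and average-pool each stripe."""
    stripe = len(feature_map) // num_parts  # assumes H is divisible by num_parts
    parts = []
    for p in range(num_parts):
        rows = feature_map[p * stripe:(p + 1) * stripe]
        parts.append([sum(col) / len(rows) for col in zip(*rows)])
    return parts

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def part_distance(map_a, map_b, num_parts):
    """Average distance between corresponding part features of two images."""
    parts_a = part_features(map_a, num_parts)
    parts_b = part_features(map_b, num_parts)
    return sum(euclidean(a, b) for a, b in zip(parts_a, parts_b)) / num_parts

# Two hypothetical 4 x 2 feature maps that agree on the upper half only.
map_a = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
map_b = [[1.0, 0.0], [1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
d_part = part_distance(map_a, map_b, num_parts=2)
```

In this toy case the upper stripes match exactly and only the lower stripes differ, so the part distance reflects a local difference that a single global average would dilute.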

Cross-camera encouragement (CCE)

Purpose:

With the CCE term, the dissimilarity between images from the same camera is increased. CCE therefore helps include more reliable images from different cameras and exclude some negative images from the same camera.

Person re-identification performance is affected by the differing attributes of the cameras, and images taken by the same camera "naturally" share some similarity. The paper therefore proposes a cross-camera encouragement (CCE) term, which encourages images taken by different cameras to be selected as reliable images.

This has two benefits. First, by learning from cross-camera information, the network can predict similar features for a person across different camera views, which benefits the re-identification task. Second, many different pedestrians wearing similar clothes appear under the same camera; CCE helps find the cross-camera ground truth instead of these negative samples.

As shown in the figure below, without CCE, the query image and the image captured by Cam3 show the same person but differ considerably because of the camera gap, whereas a negative sample (the red example) has a small distance to the query simply because it comes from the same camera.

[Figure: retrieval example without CCE; the negative sample is marked in red]

The authors denote the camera ID of training sample $x_i$ as $c_i$. The CCE term between two images $x_i$ and $x_j$ is:

$$d_c(x_i, x_j) = \begin{cases} \lambda_c, & c_i = c_j \\ 0, & c_i \neq c_j \end{cases}$$

where $\lambda_c$ is a parameter that controls the influence of CCE.
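A toy example of how such a same-camera penalty can change which image is picked as the nearest one. The distances, camera IDs and the value 0.1 for the penalty are invented for illustration.

```python
def cce(cam_i, cam_j, lambda_c):
    """Penalty added to the distance when both images come from the same camera."""
    return lambda_c if cam_i == cam_j else 0.0

query_cam = 1
# Hypothetical candidates: a true match from another camera and a look-alike
# from the query's own camera that happens to be slightly closer in feature space.
candidates = {
    "same_person_cam3": {"cam": 3, "feature_distance": 0.35},
    "other_person_cam1": {"cam": 1, "feature_distance": 0.30},
}

def nearest(lambda_c):
    """Name of the candidate with the smallest penalized distance to the query."""
    def penalized(name):
        item = candidates[name]
        return item["feature_distance"] + cce(query_cam, item["cam"], lambda_c)
    return min(candidates, key=penalized)

without_cce = nearest(0.0)
with_cce = nearest(0.1)
```

Without the penalty the same-camera look-alike wins; with it, the cross-camera match becomes the nearest image, which is the behaviour the CCE term is meant to encourage.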

Overall dissimilarity

After adding the CCE term and the part similarity above, the overall distance $D(x_i, x_j)$ combines three terms: the global feature distance, the part distance $d_p(x_i, x_j)$, and the CCE term $d_c(x_i, x_j)$, with a weight $\lambda_p$ balancing the global and part terms.

Summary:

Here $\lambda_p$ balances the contribution of global and part similarity. As shown in the green part of the overall framework, the dissimilarity between two images includes the global distance, the part distance and the cross-camera encouragement. Computing both the global and part distances measures similarity in overall appearance and in local details, which makes the selection of reliable images more accurate.

By adding the CCE term, images from different cameras are more often selected as reliable images, which lets the network learn from more diverse images. Both kinds of auxiliary information improve the discriminative ability of the trained model.
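The three ingredients of the overall distance can be put together as in the sketch below. The exact formula is not reproduced in this article, so the form used here (a convex mix of global and part distance plus the same-camera penalty) is an assumption, as are the default parameter values and names.

```python
def overall_distance(d_global, d_part, same_camera, lambda_p=0.5, lambda_c=0.1):
    """Assumed form: (1 - lambda_p) * global + lambda_p * part + same-camera penalty."""
    penalty = lambda_c if same_camera else 0.0
    return (1.0 - lambda_p) * d_global + lambda_p * d_part + penalty

# Same feature distances, different camera relationship (hypothetical numbers).
cross_camera = overall_distance(0.4, 0.2, same_camera=False)
same_camera = overall_distance(0.4, 0.2, same_camera=True)
```

Whatever the exact weighting, the qualitative effect is the same: two images with identical feature distances rank differently depending on whether they share a camera.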

03 Experiments


Comparison with the State-of-the-Arts

Among the unsupervised methods compared, the authors' method achieves SOTA results on the two image datasets Market-1501 and DukeMTMC-reID.

Among the unsupervised methods compared, the authors' method also achieves SOTA results on the two video datasets MARS and DukeMTMC-VideoReID.

Diagnostic Studies

The authors study the hyperparameter λ, the number of reliable images k and other parameters on Market-1501.

Finally, ablation experiments on the part information and the CCE term are conducted on the Market-1501 and DukeMTMC-reID datasets, showing that both are necessary.