Open Access
Discriminative deep transfer metric learning for cross-scenario person re-identification
Tongguang Ni, Xiaoqing Gu, Hongyuan Wang, Zhongbao Zhang, Shoubing Chen, Cui Jin
Abstract
A discriminative deep transfer metric learning method called DDTML is proposed for cross-scenario person re-identification (Re-ID). To develop a Re-ID model in a new scenario, a large number of pairwise cross-camera-view person images are deemed necessary. However, collecting them is very expensive in both monetary cost and labeling time. To solve this problem, DDTML uses data transferred from other scenarios to help build a Re-ID model in a new scenario. Specifically, to measure the distribution difference across scenarios, a maximum mean discrepancy based on class distribution called MMDCD is proposed by embedding the discriminative information of the data into the concept of the maximum mean discrepancy. Unlike most metric learning methods, which usually learn a linear distance metric to project data into the feature space, DDTML uses a deep neural network to develop multilayer nonlinear transformations for learning a nonlinear distance metric, while transferring discriminative information from the source domain to the target domain. By embedding the MMDCD criterion, DDTML minimizes the distribution divergence between the source domain and the target domain. Experimental results on widely used Re-ID datasets show the effectiveness of the proposed method.

1.

Introduction

For the last decade, surveillance systems have been an active research topic in computer vision, since they have become ubiquitous in public places such as airports, railway stations, college campuses, and office buildings.1,2 Surveillance systems contain a large number of cameras and provide huge amounts of video data, and their analysis often requires the ability to track people across multiple cameras. Therefore, the person re-identification (Re-ID) problem is generating more and more interest.3–8 Re-ID has been widely treated as a recognition problem of matching different persons across disjoint cameras.9–11 In the past five years, a large number of models have been proposed for Re-ID. Current work can be categorized generally into two types: (1) designing discriminative, descriptive, and robust visual descriptors to characterize a person's appearance12–15 and (2) learning suitable distance metrics that maximize the chance of a correct correspondence.16–20 In this paper, we focus on the second type, i.e., we learn an optimal distance measure to give correct matches in Re-ID.

However, it is not easy to develop a deployable and efficient Re-ID model in a new scenario (e.g., moving from an indoor classroom to an outdoor square). First, due to differences in illumination, posture, and view angle, robust features obtained in one scenario may not perform well in another. Second, in order to obtain a robust Re-ID model, one must collect a large number of labeled person images from the new scenario for training, which is very expensive in both monetary cost and labeling time. Some unsupervised methods have been proposed to address this problem. For example, Ma et al.21 introduced a time shift dynamic time warping model for unsupervised person representation. Ye et al.22 proposed a dynamic graph matching method to mine intermediate estimated labels across disjoint cameras; with the estimated labels, the remaining steps can be treated as supervised learning. However, compared to supervised Re-ID methods, the matching performance of unsupervised methods is less effective when a person undergoes severe appearance changes.23

Recently, the transfer learning mechanism has been widely used in Re-ID. The principal goal of transfer learning is to help build a Re-ID model in a new scenario (target domain) by leveraging data collected from other scenarios (source domain).24 For example, in a crowded station, a large amount of data may already exist that was used to build Re-ID models for their own respective scopes. In order to build a Re-ID model for a new scenario, we may use these existing data in the source domain without collecting a lot of labeled data in the target domain. In Ref. 25, it is demonstrated that certain discriminative information or common variations (such as pose and resolution) shared across scenarios can lead to significant performance gains in a new scenario. Different from multitask learning, which aims to benefit all tasks on both the target and source domains, transfer learning for Re-ID mainly aims to benefit the target one.

In this work, we first propose a maximum mean discrepancy based on class distribution, called MMDCD, to measure the distribution difference across domains. MMDCD embeds the discriminative information of data taken from the source domain into the concept of the maximum mean discrepancy (MMD).26 Minimizing MMDCD minimizes the distribution difference across domains in a supervised way. We then propose a discriminative deep transfer metric learning method, called DDTML, for cross-scenario transfer Re-ID. Figure 1 shows the basic idea of the proposed method. Using a deep neural network, DDTML learns a set of multilayer nonlinear transformations to transfer discriminative information from the source domain to the target domain; meanwhile, DDTML reduces the distribution divergence between the source data and the target data by minimizing MMDCD at the top layer of the network.

Fig. 1

Framework of the proposed method DDTML.


The contribution of this work can be summarized in the following three aspects.

  • (1) Unlike MMD, which works in an unsupervised way, MMDCD works in a supervised way: it not only exploits the discriminative information of data taken from the source domain but also sets different coefficients for matched/mismatched pairs. Minimizing MMDCD enhances the discriminative ability of DDTML.

  • (2) By embedding MMDCD into a deep metric network, DDTML learns a set of multilayer nonlinear transformations to better exploit the discriminative information for cross-scenario Re-ID tasks.

  • (3) Extensive experiments on several Re-ID datasets are conducted and the experimental results demonstrate that the proposed method DDTML obtains better performance compared with several state-of-the-art methods.

2.

Related Work

According to the process of Re-ID, existing works can be generally divided into two categories, namely, methods that seek robust features and methods that learn an optimal distance. The goal of the former is to increase the representative capability of features. For example, Ma et al.27 proposed a BiCov descriptor based on Gabor filters and the covariance descriptor to track persons. Kviatkovsky et al.28 constructed an invariant intradistribution structure of color to adapt to a wide range of imaging conditions. Yang et al.29 developed a robust semantic salient color names-based color descriptor to handle photometric variance. However, descriptors of visual appearance are highly susceptible to cross-view variations and rely heavily on foreground segmentation, so it is difficult for them to achieve a balance between discriminative power and robustness.

As for similarity distance learning, the goal of metric learning methods is to find a distance or similarity function over features extracted from different persons' images that makes correct matching most likely. For example, Pedagadi et al.30 applied a two-stage method in a low-dimensional manifold learning framework, combining principal component analysis (PCA) with local Fisher discriminant analysis (LFDA). Köstinger et al.16 proposed the keep-it-simple-and-straightforward metric learning principle (KISSME) to learn a distance metric from equivalence constraints based on a statistical inference perspective. Hu et al.31 exploited discriminative information to propose discriminative deep metric learning (DDML), which is a major reference for this paper.

Note that cross-scenario transfer learning has been adopted in Re-ID methods in the hope that the target domain (new scenario) can exploit transferable discriminative information from the source domain (other scenarios) with labeled images. For example, Wang et al.25 proposed the constrained asymmetric multitask discriminative component analysis (cAMT-DCA) method to explore discriminative modeling in a shared latent space for cross-scenario transfer learning. Cheng et al.32 proposed a transfer metric learning method, OurTransD, to jointly learn both the commonalities and the particularities of data from different scenarios. Zhang et al.33 proposed a two-stage transfer metric learning (TSTML) method, which transfers generic knowledge from the source set in the first stage and then transfers the distance metric for each probe-specific person in the second stage. Table 1 summarizes seven Re-ID methods (LFDA, KISSME, DDML, TSTML, cAMT-DCA, OurTransD, and the DDTML proposed in this study) in terms of similarity function, optimization method, and whether each is a transfer learning or a deep learning method. Different from the other three transfer learning methods, our proposed DDTML uses a deep learning network to learn a set of multilayer nonlinear projections for cross-scenario transfer learning. In particular, MMDCD is proposed to measure the distribution difference across domains.

Table 1

LFDA, KISSME, DDML, TSTML, cAMT-DCA, and OurTransD versus DDTML.

Methods               Similarity function             Optimization method        Transfer learning   Deep learning
LFDA (Ref. 30)        Scatter (divergence) matrix     Eigenvalue decomposition   No                  No
KISSME (Ref. 16)      Mahalanobis distance            Closed form                No                  No
DDML (Ref. 31)        Nonlinear projection distance   Gradient descent           No                  Yes
TSTML (Ref. 33)       Mahalanobis distance            —                          Yes                 No
cAMT-DCA (Ref. 25)    Scatter (divergence) matrix     Eigenvalue decomposition   Yes                 No
OurTransD (Ref. 32)   Mahalanobis distance            —                          Yes                 No
DDTML (this study)    Nonlinear projection distance   Gradient descent           Yes                 Yes

3.

Proposed Methods

3.1.

Discriminative Deep Metric Learning

The DDML method was originally proposed for face verification in the wild. It uses a deep neural network to learn a nonlinear mapping that projects face samples into a feature space.

Assume DDML constructs a deep neural network with $M+1$ layers, where $p^{(m)}$ is the number of units in the $m$'th layer, $m = 1, 2, \ldots, M$. For a given person image sample $x \in \mathbb{R}^d$, $h^{(0)} = x$ is the original input of the network, and $h^{(1)} = \varphi[W^{(1)}x + b^{(1)}] \in \mathbb{R}^{p^{(1)}}$ is the output of the first layer, where $W^{(1)}$ and $b^{(1)}$ are the projection matrix and bias vector of the first layer, respectively, and $\varphi(\cdot)$ is a nonlinear activation function that operates componentwise, such as the widely used tanh or sigmoid functions. Then, using $h^{(1)}$ as the input of the second layer, we obtain its output $h^{(2)} = \varphi[W^{(2)}h^{(1)} + b^{(2)}] \in \mathbb{R}^{p^{(2)}}$. Proceeding in this way, we obtain the output of the topmost layer $f(x)$

Eq. (1)

$$f(x) = h^{(M)} = \varphi\bigl[W^{(M)} h^{(M-1)} + b^{(M)}\bigr] \in \mathbb{R}^{p^{(M)}},$$
where $f: \mathbb{R}^d \to \mathbb{R}^{p^{(M)}}$ is a parametric nonlinear function determined by the parameters $W^{(m)}$ and $b^{(m)}$, $m = 1, 2, \ldots, M$.
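To make the layer-wise mapping concrete, here is a minimal NumPy sketch of the forward computation in Eq. (1); the function and variable names are illustrative rather than taken from the authors' implementation, and the layer sizes mirror the 500-dimensional input and 200–200–100 network used in Sec. 4.

```python
import numpy as np

def forward(x, weights, biases, phi=np.tanh):
    """Compute f(x) = h^(M) of Eq. (1) by chaining the layer maps
    h^(m) = phi(W^(m) h^(m-1) + b^(m)), starting from h^(0) = x."""
    h = x
    for W, b in zip(weights, biases):   # weights[m-1] is W^(m)
        h = phi(W @ h + b)
    return h

# Example: a 500-d input mapped through a 200-200-100 network
rng = np.random.default_rng(0)
sizes = [500, 200, 200, 100]
weights = [rng.normal(0.0, 0.1, (o, i)) for i, o in zip(sizes, sizes[1:])]
biases = [np.zeros(o) for o in sizes[1:]]
f_x = forward(rng.normal(size=500), weights, biases)   # shape (100,)
```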

For two person images $x_i$ and $x_j$, they will finally be represented as $f(x_i) = h_i^{(M)}$ and $f(x_j) = h_j^{(M)}$ at the topmost layer of the network. Then, using the squared Euclidean distance, the distance between $x_i$ and $x_j$ at the top layer can be measured as

Eq. (2)

$$d_f^2(x_i, x_j) = \|f(x_i) - f(x_j)\|_2^2.$$

The optimization problem of DDML is designed as follows:

Eq. (3)

$$\arg\min_f J = \frac{1}{2}\sum_{i,j} g\Bigl\{1 - l_{ij}\bigl[\tau - d_f^2(x_i, x_j)\bigr]\Bigr\} + \frac{\lambda}{2}\sum_{m=1}^{M}\Bigl[\|W^{(m)}\|_F^2 + \|b^{(m)}\|_2^2\Bigr],$$
where the function $g(z) = \frac{1}{\beta}\log[1 + \exp(\beta z)]$ is a smoothed approximation of $[z]_+ = \max(z, 0)$, $\beta$ is a sharpness parameter, $\|A\|_F$ is the Frobenius norm, $\lambda$ is a regularization parameter, and $\tau$ is a threshold. The pairwise label $l_{ij}$ denotes the similarity of the pair $\{x_i, x_j\}$: $l_{ij} = 1$ means $x_i$ and $x_j$ are a matched image pair, and $l_{ij} = -1$ means they are a mismatched image pair. $l_{ij}$ can be determined as follows:

Eq. (4)

$$l_{ij} = \begin{cases} 1, & d_f^2(x_i, x_j) < \tau - 1 \\ -1, & d_f^2(x_i, x_j) > \tau + 1 \end{cases}.$$
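As a concrete reading of Eqs. (2)–(4), the following sketch computes one pairwise term of the DDML objective in Eq. (3). The sharpness value β used here is an arbitrary illustrative choice (the paper does not fix it), while τ = 3 follows the experimental setting in Sec. 4.

```python
import numpy as np

def g(z, beta=2.0):
    """Smoothed hinge g(z) = (1/beta) log(1 + exp(beta z)), a
    differentiable surrogate for [z]_+ = max(z, 0)."""
    return np.logaddexp(0.0, beta * z) / beta

def pair_loss(f_xi, f_xj, l_ij, tau=3.0, beta=2.0):
    """One pairwise term of Eq. (3): g(1 - l_ij * (tau - d_f^2)),
    with d_f^2 the squared Euclidean distance of Eq. (2)."""
    d2 = float(np.sum((f_xi - f_xj) ** 2))
    return g(1.0 - l_ij * (tau - d2), beta)
```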

From the optimization problem in Eq. (3), it can be seen that DDML needs enough training pairs from the scenario at hand; without enough training data in a new scenario, it cannot directly use data collected from different scenarios to help build the Re-ID model there. This is the key problem we aim to solve in this work.

3.2.

Discriminative Deep Transfer Metric Learning Method

Based on the deep neural network projection scheme above, we learn a set of multilayer nonlinear transformations to project the data in the source domain and target domain into the same transformed space. Therefore, we need to measure the distribution difference between the source domain and target domain in this transformed space. As a well-known criterion for estimating the distance between different distributions, the maximum mean discrepancy (MMD) is nonparametric and does not need an intermediate density estimate.26 Let $X_s = \{(x_{si}, y_{si}) \mid i = 1, 2, \ldots, N_s\}$ and $X_t = \{(x_{ti}, y_{ti}) \mid i = 1, 2, \ldots, N_t\}$ be the training sets in the source domain and target domain, respectively, where $x_{si}$ and $x_{ti}$ are samples of dimensionality $d$, $y_{si}$ and $y_{ti}$ are the labels of $x_{si}$ and $x_{ti}$, respectively, and $N_s$ and $N_t$ are the numbers of training samples in the source domain and target domain, respectively. The distance between the distributions of the two domains is equivalent to the distance between the means of the total-class data across domains, which can be written as follows:26

Eq. (5)

$$D_{ts}(X_t, X_s) = \Biggl\|\frac{1}{N_t}\sum_{i=1}^{N_t} f(x_{ti}) - \frac{1}{N_s}\sum_{i=1}^{N_s} f(x_{si})\Biggr\|_2^2.$$

However, MMD measures the distribution difference between two domains in an unsupervised way; that is, it ignores the label information of samples. In addition, in a practical transfer Re-ID task, there often exists an imbalance between matched (positive) image pairs and mismatched (negative) pairs. In order to carry out effective transfer learning, we propose MMDCD, which embeds the discriminative information of data taken from the source domain into the concept of the MMD by the following equation:

Eq. (6)

$$\mathrm{MMDCD}_{ts}(X_t, X_s) = \Biggl\|\frac{1}{N_t}\sum_{i=1}^{N_t} f(x_{ti}) - \frac{N_s^+}{(N_s^+)^2 + (N_s^-)^2}\sum_{i=1}^{N_s^+} f(x_{si}^+) - \frac{N_s^-}{(N_s^+)^2 + (N_s^-)^2}\sum_{i=1}^{N_s^-} f(x_{si}^-)\Biggr\|_2^2,$$
where $x_{si}^+$ and $x_{si}^-$ are the matched and mismatched image samples in the source domain, respectively, and $N_s^+$ and $N_s^-$ ($N_s^+ + N_s^- = N_s$) are the numbers of matched and mismatched image samples in the source domain, respectively. Following the deep network learning strategy of DDML,31 the nonlinear representation $f(x)$ can be computed using Eq. (1) at the topmost layer of the network. Obviously, in order to measure the distance between the means of the data across domains, MMDCD not only utilizes the label information of data taken from the source domain but also sets different coefficients representing the weights of matched and mismatched pairs according to their sizes.
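The following NumPy sketch evaluates Eq. (6) from top-layer representations. The array names are illustrative assumptions; the representations themselves would come from the network's forward pass in Eq. (1).

```python
import numpy as np

def mmdcd(f_target, f_source_pos, f_source_neg):
    """Evaluate Eq. (6) given top-layer outputs f(x).

    f_target:     (N_t, p)  representations of target-domain samples
    f_source_pos: (N_s+, p) representations of matched source samples
    f_source_neg: (N_s-, p) representations of mismatched source samples
    """
    n_pos, n_neg = len(f_source_pos), len(f_source_neg)
    denom = n_pos ** 2 + n_neg ** 2
    mean_t = f_target.mean(axis=0)
    weighted_s = (n_pos / denom) * f_source_pos.sum(axis=0) \
               + (n_neg / denom) * f_source_neg.sum(axis=0)
    diff = mean_t - weighted_s
    return float(diff @ diff)          # squared L2 norm of the difference
```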

As shown in Fig. 1, DDTML constructs a deep neural network to obtain the representations of data in the source domain and target domain through multiple layers of nonlinear transformations. Minimizing MMDCD at the top layer of the network, the optimization problem of DDTML is given as follows:

Eq. (7)

$$\arg\min_f J = \frac{1}{2}\sum_{i,j} g\Bigl\{1 - l_{ij}\bigl[\tau - d_f^2(x_i, x_j)\bigr]\Bigr\} + \alpha\,\mathrm{MMDCD}_{ts}^{(M)}(X_t, X_s) + \beta\sum_{m=1}^{M}\Bigl[\|W^{(m)}\|_F^2 + \|b^{(m)}\|_2^2\Bigr],$$
where $\mathrm{MMDCD}_{ts}^{(M)}(X_t, X_s)$ is the MMDCD at the $M$'th layer of the deep neural network, and $\alpha$ ($\alpha \ge 0$) and $\beta$ ($\beta \ge 0$) are regularization parameters.

To solve the optimization problem in Eq. (7), we use a stochastic subgradient descent scheme to obtain the parameters $W^{(m)}$ and $b^{(m)}$, $m = 1, \ldots, M$. The gradients of the objective function $J$ with respect to $W^{(m)}$ and $b^{(m)}$ can be computed as follows:

Eq. (8)

$$\frac{\partial J}{\partial W^{(m)}} = \sum_{i,j}\Bigl[\Delta_{ij}^{(m)} h_i^{(m-1)T} + \Delta_{ji}^{(m)} h_j^{(m-1)T}\Bigr] + 2\alpha\Biggl[\frac{1}{N_t}\sum_{i=1}^{N_t}\Delta_{ti}^{(m)} h_{ti}^{(m-1)T} + \frac{N_s^+}{(N_s^+)^2 + (N_s^-)^2}\sum_{i=1}^{N_s^+}\Delta_{si}^{+(m)} h_{si}^{+(m-1)T} + \frac{N_s^-}{(N_s^+)^2 + (N_s^-)^2}\sum_{i=1}^{N_s^-}\Delta_{si}^{-(m)} h_{si}^{-(m-1)T}\Biggr] + 2\beta W^{(m)},$$

Eq. (9)

$$\frac{\partial J}{\partial b^{(m)}} = \sum_{i,j}\Bigl[\Delta_{ij}^{(m)} + \Delta_{ji}^{(m)}\Bigr] + 2\alpha\Biggl[\frac{1}{N_t}\sum_{i=1}^{N_t}\Delta_{ti}^{(m)} + \frac{N_s^+}{(N_s^+)^2 + (N_s^-)^2}\sum_{i=1}^{N_s^+}\Delta_{si}^{+(m)} + \frac{N_s^-}{(N_s^+)^2 + (N_s^-)^2}\sum_{i=1}^{N_s^-}\Delta_{si}^{-(m)}\Biggr] + 2\beta b^{(m)},$$
where $h_i^{(0)} = x_i$ and $h_j^{(0)} = x_j$ are the original inputs.

For the M’th layer of our network, we can obtain the following updating equations:

Eq. (10)

$$\Delta_{ij}^{(M)} = g'(c)\, l_{ij}\bigl[h_i^{(M)} - h_j^{(M)}\bigr] \odot \varphi'\bigl[z_i^{(M)}\bigr],$$

Eq. (11)

$$\Delta_{ji}^{(M)} = g'(c)\, l_{ij}\bigl[h_j^{(M)} - h_i^{(M)}\bigr] \odot \varphi'\bigl[z_j^{(M)}\bigr],$$

Eq. (12)

$$\Delta_{ti}^{(M)} = \Biggl[\frac{1}{N_t}\sum_{j=1}^{N_t} h_{tj}^{(M)} - \frac{N_s^+}{(N_s^+)^2 + (N_s^-)^2}\sum_{j=1}^{N_s^+} h_{sj}^{+(M)} - \frac{N_s^-}{(N_s^+)^2 + (N_s^-)^2}\sum_{j=1}^{N_s^-} h_{sj}^{-(M)}\Biggr] \odot \varphi'\bigl[z_{ti}^{(M)}\bigr],$$

Eq. (13)

$$\Delta_{si}^{+(M)} = \Biggl[\frac{N_s^+}{(N_s^+)^2 + (N_s^-)^2}\sum_{j=1}^{N_s^+} h_{sj}^{+(M)} + \frac{N_s^-}{(N_s^+)^2 + (N_s^-)^2}\sum_{j=1}^{N_s^-} h_{sj}^{-(M)} - \frac{1}{N_t}\sum_{j=1}^{N_t} h_{tj}^{(M)}\Biggr] \odot \varphi'\bigl[z_{si}^{+(M)}\bigr],$$

Eq. (14)

$$\Delta_{si}^{-(M)} = \Biggl[\frac{N_s^+}{(N_s^+)^2 + (N_s^-)^2}\sum_{j=1}^{N_s^+} h_{sj}^{+(M)} + \frac{N_s^-}{(N_s^+)^2 + (N_s^-)^2}\sum_{j=1}^{N_s^-} h_{sj}^{-(M)} - \frac{1}{N_t}\sum_{j=1}^{N_t} h_{tj}^{(M)}\Biggr] \odot \varphi'\bigl[z_{si}^{-(M)}\bigr].$$

For the other layers $m = 1, 2, \ldots, M-1$ of our network, we obtain the following updating equations:

Eq. (15)

$$\Delta_{ij}^{(m)} = W^{(m+1)T}\Delta_{ij}^{(m+1)} \odot \varphi'\bigl[z_i^{(m)}\bigr],$$

Eq. (16)

$$\Delta_{ji}^{(m)} = W^{(m+1)T}\Delta_{ji}^{(m+1)} \odot \varphi'\bigl[z_j^{(m)}\bigr],$$

Eq. (17)

$$\Delta_{ti}^{(m)} = W^{(m+1)T}\Delta_{ti}^{(m+1)} \odot \varphi'\bigl[z_{ti}^{(m)}\bigr],$$

Eq. (18)

$$\Delta_{si}^{+(m)} = W^{(m+1)T}\Delta_{si}^{+(m+1)} \odot \varphi'\bigl[z_{si}^{+(m)}\bigr],$$

Eq. (19)

$$\Delta_{si}^{-(m)} = W^{(m+1)T}\Delta_{si}^{-(m+1)} \odot \varphi'\bigl[z_{si}^{-(m)}\bigr],$$
where $\odot$ denotes element-wise multiplication and $\varphi'(\cdot)$ is the derivative of the activation function. $c$ and $z_i^{(m)}$ ($m = 1, 2, \ldots, M$) are given as follows:

Eq. (20)

$$c = 1 - l_{ij}\bigl[\tau - d_f^2(x_i, x_j)\bigr],$$

Eq. (21)

$$z_i^{(m)} = W^{(m)} h_i^{(m-1)} + b^{(m)}.$$

Then $W^{(m)}$ and $b^{(m)}$ can be updated using the gradient descent algorithm until convergence as follows:

Eq. (22)

$$W^{(m)} = W^{(m)} - \lambda\,\frac{\partial J}{\partial W^{(m)}},$$

Eq. (23)

$$b^{(m)} = b^{(m)} - \lambda\,\frac{\partial J}{\partial b^{(m)}},$$
where $\lambda$ is the learning rate.
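To illustrate how these update rules combine into one stochastic step, here is a simplified NumPy sketch for a tanh network. It covers only the pairwise terms (Eqs. (10), (11), (15), (16), (20)–(23)) and the regularizer; the MMDCD deltas of Eqs. (12)–(14) and (17)–(19) are omitted for brevity, and the default hyperparameters are illustrative rather than the settings of Sec. 4.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pair_gradient_step(xi, xj, l_ij, Ws, bs, tau=3.0, beta=2.0,
                       reg=1e-3, lr=1e-2):
    """One stochastic update from a single pair (xi, xj)."""
    M = len(Ws)
    hi, hj = [xi], [xj]                      # h^(0) = x
    for W, b in zip(Ws, bs):                 # forward pass, Eq. (1)
        hi.append(np.tanh(W @ hi[-1] + b))
        hj.append(np.tanh(W @ hj[-1] + b))
    d2 = np.sum((hi[M] - hj[M]) ** 2)        # Eq. (2)
    c = 1.0 - l_ij * (tau - d2)              # Eq. (20)
    gp = sigmoid(beta * c)                   # g'(c) of the smoothed hinge
    # Top-layer deltas, Eqs. (10) and (11); tanh'(z) = 1 - tanh(z)^2
    d_ij = gp * l_ij * (hi[M] - hj[M]) * (1.0 - hi[M] ** 2)
    d_ji = gp * l_ij * (hj[M] - hi[M]) * (1.0 - hj[M] ** 2)
    for m in range(M - 1, -1, -1):           # backpropagate, top to bottom
        gW = np.outer(d_ij, hi[m]) + np.outer(d_ji, hj[m]) + 2 * reg * Ws[m]
        gb = d_ij + d_ji + 2 * reg * bs[m]   # pair parts of Eqs. (8), (9)
        if m > 0:                            # Eqs. (15) and (16)
            new_d_ij = (Ws[m].T @ d_ij) * (1.0 - hi[m] ** 2)
            new_d_ji = (Ws[m].T @ d_ji) * (1.0 - hj[m] ** 2)
        Ws[m] -= lr * gW                     # Eq. (22)
        bs[m] -= lr * gb                     # Eq. (23)
        if m > 0:
            d_ij, d_ji = new_d_ij, new_d_ji
```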

Based on the analysis above, we summarize the entire construction procedure of DDTML in Algorithm 1.

Algorithm 1

DDTML

Input: Training set: source domain data $D_s$ and target domain data $D_t$
Parameters: $\alpha$, $\beta$, $\tau$, $M$, learning rate $\lambda$, convergence error $\epsilon$, and total iteration number $T$
Output: Weights and biases $\{W^{(m)}, b^{(m)}\}_{m=1}^{M}$
Initialize: Initialize weights and biases
Optimization by backpropagation:
for $k = 1, 2, \ldots, T$ do
 Compute MMDCD by Eq. (6)
 Randomly select a sample pair
 // Forward propagation
 Compute $h_i^{(m)}$ and $h_j^{(m)}$, where $h^{(m)} = \varphi[W^{(m)} h^{(m-1)} + b^{(m)}]$, $m = 1, 2, \ldots, M$
 // Compute gradients
 Compute gradients by Eqs. (8) and (9)
 // Back propagation
 Update $W^{(m)}$ and $b^{(m)}$ by Eqs. (22) and (23), $m = 1, 2, \ldots, M$
 Compute $J_k$ by Eq. (7)
 if $k > 1$ and $|J_k - J_{k-1}| < \epsilon$, return
end
Return $\{W^{(m)}, b^{(m)}\}_{m=1}^{M}$
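A toy driver mirroring the loop of Algorithm 1 might look as follows. It reuses pair_gradient_step from the sketch above, stands in random vectors for real image pairs, and (like that sketch) exercises only the pairwise gradient terms, not the MMDCD contribution.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [500, 200, 200, 100]                 # input dim plus three layers
Ws = [rng.normal(0.0, 0.1, (o, i)) for i, o in zip(sizes, sizes[1:])]
bs = [np.zeros(o) for o in sizes[1:]]

# Placeholder pair set standing in for real matched/mismatched pairs
pairs = [(rng.normal(size=500), rng.normal(size=500), rng.choice([-1, 1]))
         for _ in range(200)]

for k in range(1000):                        # T iterations
    xi, xj, l_ij = pairs[rng.integers(len(pairs))]
    pair_gradient_step(xi, xj, l_ij, Ws, bs)
```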

4.

Experiments

4.1.

Datasets and Experimental Setting

In our experiments, four Re-ID datasets are adopted: 3DPeS,34 i-LIDS,35 CAVIAR,19 and VIPeR.36 The 3DPeS dataset is a collection of 1011 images of 192 individuals from eight different surveillance cameras on an academic campus. The i-LIDS dataset is a collection of 119 individuals captured in an airport, with an average of four images per person, so it consists of 476 images in total. The CAVIAR dataset is a collection of 1220 images of 72 individuals, with 10 to 20 images per person. The VIPeR dataset is a collection of 632 individuals captured by two different camera views, so it consists of 1264 images. In order to construct the transfer learning Re-ID model, we choose one dataset as the target dataset and one of the remaining three as the source dataset, following the same settings as Ref. 25. This yields 12 cross-scenario transfer learning tasks in total.

In our experiments, all person images from the above four datasets are scaled to 128×48 pixels for feature extraction. Following the same settings as Ref. 25, three kinds of feature descriptors (color, LBP, and HOG) are generated for each image. After extracting the feature vectors, we use PCA to compress them into 500-dimensional feature vectors.
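As an illustration of this preprocessing step, the following scikit-learn sketch performs the PCA compression. The raw descriptor dimensionality and the feature matrix are placeholders, since the exact color/LBP/HOG extraction details are not specified here.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# `raw_features` stands for the concatenated color/LBP/HOG descriptors,
# one row per scaled 128x48 image (dimensions here are placeholders).
raw_features = rng.random((1264, 2580))      # e.g., VIPeR: 1264 images
pca = PCA(n_components=500)                  # compress to 500 dimensions
features = pca.fit_transform(raw_features)   # shape (1264, 500)
```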

For comparison purposes, six state-of-the-art Re-ID methods are compared against our proposed DDTML. They can be grouped into two categories: (1) nontransfer learning methods, namely LFDA,30 KISSME,16 and DDML,31 and (2) transfer learning methods, namely geometry preserving large margin nearest neighbor (GPLMNN),37 OurTransD,32 and cAMT-DCA.25 Furthermore, in order to better observe the behavior of MMDCD, we develop another transfer learning Re-ID method, called DDTML-MMD, by replacing MMDCD in DDTML with the MMD criterion. We train a deep network with three layers for DDTML, with 200–200–100 neural nodes for all datasets. Based on our extensive experiments, the tanh function is used as the activation φ(·), and the parameters α, β, τ, and λ are set to 0.1, 10, 3, and 0.3, respectively.

In our experiments, we randomly split the target dataset into two equal partitions; one partition is used as the target training set and the other as the target testing set. For the five transfer learning methods, all person images in the source dataset and the target training set are used for training, and all images in the target testing set are used for testing. For the three nontransfer learning methods, all images in the source dataset are used for training. In particular, in order to observe how nontransfer learning methods behave on transfer datasets, LFDA and KISSME are trained in three settings: LFDA-S and KISSME-S use only the source dataset for training; LFDA-T and KISSME-T use only the target training set; and LFDA-Mix and KISSME-Mix use both the source and target training datasets.

Following Ref. 38, the performance of each method is evaluated in terms of the cumulative matching characteristic (CMC). The CMC represents the probability of finding the correct match within the top r ranked images, with r varying from 1 to 20. The CMC is usually used to measure performance on the closed-set Re-ID problem, which assumes the same person appears in both the probe set and the gallery set. In many real-world scenarios, e.g., scenarios with imposters, this assumption does not hold. In order to simulate such open-set scenarios, images of 40% of the gallery people are randomly removed. The receiver operating characteristic (ROC) curve with i-LIDS as the target dataset is used as the evaluation metric to compare DDTML with the other algorithms. To make the results fair, we repeat the aforementioned partition 10 times for each dataset and record the CMC and ROC curves over the 10 runs.
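A minimal implementation of the CMC computation might look as follows, assuming the single-shot case in which each probe has exactly one correct match in the gallery; the names and the probe-by-gallery distance-matrix convention are illustrative.

```python
import numpy as np

def cmc_curve(dist, probe_ids, gallery_ids, max_rank=20):
    """CMC from a probe-by-gallery distance matrix: CMC(r) is the
    fraction of probes whose correct match appears in the top-r ranking."""
    gallery_ids = np.asarray(gallery_ids)
    hits = np.zeros(max_rank)
    for i, pid in enumerate(probe_ids):
        order = np.argsort(dist[i])                   # ascending distance
        rank = np.flatnonzero(gallery_ids[order] == pid)[0]
        if rank < max_rank:
            hits[rank:] += 1                          # counts for all r >= rank
    return hits / len(probe_ids)
```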

4.2.

Results and Analysis

In this section, we examine the effectiveness of the proposed DDTML by comparing its performance with LFDA (LFDA-S, LFDA-T, and LFDA-Mix), KISSME (KISSME-S, KISSME-T, and KISSME-Mix), DDML, GPLMNN, DDTML-MMD, OurTransD, and cAMT-DCA on 12 cross-scenario transfer Re-ID tasks. The CMC results are shown in Tables 2–5; the best results are in boldface font. The ROC curves of eight methods (LFDA-Mix, KISSME-Mix, DDML, GPLMNN, DDTML-MMD, OurTransD, cAMT-DCA, and DDTML) with i-LIDS as the target dataset are shown in Fig. 2. Because LFDA-S and LFDA-T perform worse than LFDA-Mix, and KISSME-S and KISSME-T perform worse than KISSME-Mix, the ROC curves of these four methods are not shown in Fig. 2.

Table 2

Matching rate (%) on the VIPeR dataset as target dataset.

Methods      Source    r=1     r=5     r=10    r=20
LFDA-S       i-LIDS    8.31    21.36   32.15   45.02
             CAVIAR    8.41    20.98   30.21   42.15
             3DPeS     8.59    20.96   29.37   46.82
LFDA-T       i-LIDS    18.95   45.69   55.78   70.51
             CAVIAR    18.95   45.69   55.78   70.51
             3DPeS     18.95   45.69   55.78   70.51
LFDA-Mix     i-LIDS    16.88   43.33   59.09   64.25
             CAVIAR    16.28   39.66   54.22   64.91
             3DPeS     16.08   42.63   57.85   64.87
KISSME-S     i-LIDS    8.21    21.96   31.85   43.87
             CAVIAR    9.37    19.06   28.36   43.75
             3DPeS     9.77    19.05   28.31   40.25
KISSME-T     i-LIDS    19.38   47.36   55.99   70.58
             CAVIAR    19.38   47.36   55.99   70.58
             3DPeS     19.38   47.36   55.99   70.58
KISSME-Mix   i-LIDS    15.11   35.17   49.62   63.97
             CAVIAR    8.96    19.25   29.36   39.68
             3DPeS     12.35   30.58   44.12   57.11
DDML         i-LIDS    16.36   41.08   52.65   65.34
             CAVIAR    18.18   40.35   52.02   63.55
             3DPeS     18.99   46.63   52.32   65.56
GPLMNN       i-LIDS    20.98   47.98   65.96   72.33
             CAVIAR    21.35   48.16   65.20   72.21
             3DPeS     21.22   48.02   65.35   72.98
DDTML-MMD    i-LIDS    21.23   47.38   64.67   72.11
             CAVIAR    21.57   48.33   65.33   72.19
             3DPeS     21.86   48.46   65.70   72.56
cAMT-DCA     i-LIDS    22.09   47.77   65.53   73.11
             CAVIAR    21.57   48.88   65.67   73.26
             3DPeS     21.29   48.91   65.56   75.74
OurTransD    i-LIDS    22.45   47.98   65.59   73.11
             CAVIAR    22.09   47.85   65.87   73.23
             3DPeS     21.11   47.96   65.77   73.24
DDTML        i-LIDS    25.11   53.26   67.22   79.34
             CAVIAR    25.18   53.44   66.31   79.65
             3DPeS     24.63   53.71   66.59   79.48

Table 3

Matching rate (%) on the i-LIDS dataset as target dataset.

Methods      Source    r=1     r=5     r=10    r=20
LFDA-S       VIPeR     28.31   50.22   61.11   75.85
             CAVIAR    28.07   48.21   61.85   75.29
             3DPeS     30.92   52.13   65.49   75.21
LFDA-T       VIPeR     29.10   49.66   63.98   75.22
             CAVIAR    29.10   49.66   63.98   75.22
             3DPeS     29.10   49.66   63.98   75.22
LFDA-Mix     VIPeR     30.98   52.01   62.35   77.32
             CAVIAR    30.01   50.99   62.52   77.96
             3DPeS     30.25   48.93   62.31   78.26
KISSME-S     VIPeR     31.21   50.35   62.87   75.96
             CAVIAR    26.39   48.24   64.47   75.25
             3DPeS     29.36   53.18   67.45   75.21
KISSME-T     VIPeR     19.25   39.86   52.31   65.23
             CAVIAR    19.25   39.86   52.31   65.23
             3DPeS     19.25   39.86   52.31   65.23
KISSME-Mix   VIPeR     34.89   53.29   66.99   76.52
             CAVIAR    25.31   44.28   58.04   77.19
             3DPeS     27.03   43.69   56.19   77.11
DDML         VIPeR     28.78   50.79   59.33   77.15
             CAVIAR    29.45   46.58   61.87   76.66
             3DPeS     29.85   47.93   61.15   77.43
GPLMNN       VIPeR     32.98   55.91   67.11   78.69
             CAVIAR    33.15   55.39   67.19   79.25
             3DPeS     33.47   56.87   68.24   78.65
DDTML-MMD    VIPeR     32.80   55.76   66.97   78.45
             CAVIAR    33.09   55.23   66.65   78.67
             3DPeS     33.36   55.69   66.19   78.39
cAMT-DCA     VIPeR     32.56   53.52   66.78   78.99
             CAVIAR    33.17   54.91   67.21   77.87
             3DPeS     33.68   55.98   67.37   78.69
OurTransD    VIPeR     32.71   54.77   67.67   78.58
             CAVIAR    33.25   55.33   67.49   78.76
             3DPeS     33.67   55.87   67.14   79.25
DDTML        VIPeR     36.22   61.03   69.49   81.02
             CAVIAR    33.90   56.87   70.58   81.15
             3DPeS     34.51   57.89   69.27   81.23

Table 4

Matching rate (%) on the CAVIAR as target dataset.

Methods      Source    r=1     r=5     r=10    r=20
LFDA-S       VIPeR     27.03   47.93   60.29   80.22
             3DPeS     26.96   46.58   61.53   81.00
             i-LIDS    25.33   46.62   61.11   80.44
LFDA-T       VIPeR     24.95   43.97   58.34   78.88
             3DPeS     24.95   43.97   58.34   78.88
             i-LIDS    24.95   43.97   58.34   78.88
LFDA-Mix     VIPeR     30.91   50.16   61.25   83.33
             3DPeS     30.11   51.17   66.55   83.20
             i-LIDS    32.19   52.75   66.58   84.93
KISSME-S     VIPeR     18.32   50.55   62.22   77.55
             3DPeS     18.56   52.01   62.29   79.29
             i-LIDS    18.26   51.25   62.03   79.58
KISSME-T     VIPeR     29.38   35.02   51.26   76.53
             3DPeS     29.38   35.02   51.26   76.53
             i-LIDS    29.38   35.02   51.26   76.53
KISSME-Mix   VIPeR     29.33   50.22   66.85   83.26
             3DPeS     28.55   51.44   67.23   83.75
             i-LIDS    29.86   52.15   68.32   85.54
DDML         VIPeR     30.19   50.49   65.23   83.21
             3DPeS     28.59   51.68   65.96   82.74
             i-LIDS    29.19   50.29   63.81   84.77
GPLMNN       VIPeR     33.29   55.22   69.22   86.51
             3DPeS     34.52   54.89   69.87   87.55
             i-LIDS    33.88   54.68   70.41   87.96
DDTML-MMD    VIPeR     33.99   55.18   70.34   86.12
             3DPeS     34.87   54.98   70.78   87.13
             i-LIDS    34.69   54.77   70.86   87.59
cAMT-DCA     VIPeR     34.12   55.77   69.69   87.21
             3DPeS     34.25   55.29   70.13   87.02
             i-LIDS    34.11   54.65   70.67   87.39
OurTransD    VIPeR     34.67   55.79   70.35   89.93
             3DPeS     34.38   55.87   70.11   87.55
             i-LIDS    34.55   56.08   70.58   87.75
DDTML        VIPeR     35.19   59.55   72.36   89.57
             3DPeS     36.30   58.26   74.11   89.29
             i-LIDS    35.22   58.36   72.05   90.15

Table 5

Matching rate (%) on the 3DPeS dataset as target dataset.

Methods      Source    r=1     r=5     r=10    r=20
LFDA-S       VIPeR     29.39   49.32   62.22   73.25
             i-LIDS    30.99   51.11   62.34   74.18
             CAVIAR    29.33   51.28   62.17   74.81
LFDA-T       VIPeR     26.55   47.29   60.28   71.98
             i-LIDS    26.55   47.29   60.28   71.98
             CAVIAR    26.55   47.29   60.28   71.98
LFDA-Mix     VIPeR     26.39   48.32   57.21   68.32
             i-LIDS    26.78   48.77   59.86   70.21
             CAVIAR    23.41   42.59   53.26   65.98
KISSME-S     VIPeR     28.26   44.96   54.08   66.49
             i-LIDS    27.21   43.32   55.57   66.41
             CAVIAR    25.69   43.99   53.00   66.36
KISSME-T     VIPeR     12.56   29.63   43.75   57.93
             i-LIDS    12.56   29.63   43.75   57.93
             CAVIAR    12.56   29.63   43.75   57.93
KISSME-Mix   VIPeR     28.36   49.57   59.21   70.58
             i-LIDS    25.68   46.35   57.68   70.55
             CAVIAR    22.14   38.77   50.85   62.71
DDML         VIPeR     23.45   52.69   57.49   70.25
             i-LIDS    25.78   52.66   61.18   68.98
             CAVIAR    25.11   52.99   57.18   67.01
GPLMNN       VIPeR     31.08   53.74   64.17   75.29
             i-LIDS    31.42   53.17   63.50   75.68
             CAVIAR    31.05   54.28   63.59   75.49
DDTML-MMD    VIPeR     30.93   53.26   63.21   74.34
             i-LIDS    30.98   53.58   63.36   75.21
             CAVIAR    31.12   54.57   63.86   74.83
cAMT-DCA     VIPeR     31.44   53.87   64.31   75.19
             i-LIDS    31.29   55.51   63.67   75.32
             CAVIAR    31.35   54.33   63.87   75.03
OurTransD    VIPeR     31.36   53.89   64.17   75.11
             i-LIDS    31.54   53.58   63.50   75.75
             CAVIAR    30.57   54.53   63.59   75.46
DDTML        VIPeR     32.15   55.39   65.78   77.49
             i-LIDS    33.47   55.27   65.14   77.58
             CAVIAR    32.78   55.46   65.28   78.15

Fig. 2

Performance comparison using ROC curves on the i-LIDS dataset as target dataset. (a) 3DPeS, (b) CAVIAR, and (c) VIPeR as source dataset.


From Tables 2–5 and Fig. 2, we can draw the following conclusions:

  • (1) Although DDML, LFDA, and KISSME are popular discriminative distance learning methods, their performance is weaker than that of the transfer metric learning methods. In particular, LFDA-S and KISSME-S obtain better performance than LFDA-T and KISSME-T except when VIPeR is the target dataset. The reason is that there are many more intraperson pairs in the source dataset than in the target dataset, so the target dataset alone does not provide enough intraperson pairs to train a reliable metric learning model. In addition, because of cross-domain differences that LFDA-Mix and KISSME-Mix cannot account for, these two methods obtain lower performance than the five transfer learning methods.

  • (2) Compared with DDTML-MMD, DDTML performs better because the proposed MMDCD criterion in DDTML efficiently exploits the discriminative information of data in the source domain. Minimizing MMDCD can better help to minimize the distribution difference between the source domain and target domain.

  • (3) Compared with the other three transfer learning methods, GPLMNN, OurTransD, and cAMT-DCA, DDTML achieves satisfactory performance. In particular, it obtains the best average matching rate in 10 out of the 12 tasks. This is because DDTML uses a deep neural network to learn a set of multilayer nonlinear transformations, so more reliable representations of the data in the feature space can be exploited.

  • (4) From the aforementioned four tables, we can observe that when VIPeR is the target dataset, the performance of all methods is lower. This is because there are about 316 people in the test set for VIPeR, whereas there are on average about 60 in the other three datasets, and it is harder to find the correct match in a larger gallery. However, DDTML achieves the best performance on this dataset, which further indicates that DDTML can specifically account for the essential discrepancy across domains.

  • (5) Similar results are observed for the ROC curves with i-LIDS as the target dataset under the open-set setting, where DDTML again achieves satisfactory performance. It can be clearly seen that the proposed DDTML is well suited to transfer learning Re-ID tasks.

5.

Conclusion

In this paper, by integrating DDML with transfer learning, we propose DDTML, a method that learns a distance metric measuring the similarity between image pairs in Re-ID datasets. DDTML is not simply a transfer learning version of DDML: taking into account the discriminative information of the data and the inherent characteristics of Re-ID datasets, it also utilizes MMDCD to minimize the distribution divergence between source and target data. Extensive experimental results on the 3DPeS, i-LIDS, CAVIAR, and VIPeR datasets have shown that our method outperforms state-of-the-art methods on most cross-scenario transfer Re-ID tasks. Since the formulation of MMDCD is simple, how to take fuller advantage of the Re-ID data remains an interesting direction for future work.

Disclosures

This paper has been listed in the proceedings of 2018 SPIE Commercial + Scientific Sensing and Imaging (SI18C), volume DL10670.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61502058 and 61572085, and Jiangsu Joint Research Project of Industry, Education, and Research under Grant No. BY2016029-15.

References

1. S. Zhou et al., "Point to set similarity based deep feature learning for person re-identification," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (2017). https://doi.org/10.1109/CVPR.2017.534
2. Z. Zhong et al., "Re-ranking person re-identification with k-reciprocal encoding," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (2017). https://doi.org/10.1109/CVPR.2017.389
3. C. Su et al., "Attributes driven tracklet-to-tracklet person re-identification using latent prototypes space mapping," Pattern Recognit. 66, 4–15 (2017). https://doi.org/10.1016/j.patcog.2017.01.006
4. X. Liu et al., "Person re-identification by multiple instance metric learning with impostor rejection," Pattern Recognit. 67, 287–298 (2017). https://doi.org/10.1016/j.patcog.2017.02.015
5. L. Ren et al., "Multi-modal uniform deep learning for RGB-D person re-identification," Pattern Recognit. 72, 446–457 (2017). https://doi.org/10.1016/j.patcog.2017.06.037
6. X. Ma et al., "Person re-identification by unsupervised video matching," Pattern Recognit. 65, 197–210 (2017). https://doi.org/10.1016/j.patcog.2016.11.018
7. G. Watson and A. Bhalerao, "Person reidentification using deep foreground appearance modeling," J. Electron. Imaging 27(5), 051215 (2018). https://doi.org/10.1117/1.JEI.27.5.051215
8. L. Hou et al., "Normalized distance aggregation of discriminative features for person reidentification," J. Electron. Imaging 27(2), 023006 (2018). https://doi.org/10.1117/1.JEI.27.2.023006
9. Z. Cao et al., "Face recognition with learning-based descriptor," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2707–2714 (2010). https://doi.org/10.1109/CVPR.2010.5539992
10. X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proc. of the Int. Conf. on Artificial Intelligence and Statistics, 249–256 (2010).
11. M. Guillaumin, J. Verbeek, and C. Schmid, "Is that you? Metric learning approaches for face identification," in Proc. of the IEEE 12th Int. Conf. on Computer Vision, 498–505 (2009). https://doi.org/10.1109/ICCV.2009.5459197
12. S. Liao et al., "Person re-identification by local maximal occurrence representation and metric learning," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2197–2206 (2015). https://doi.org/10.1109/CVPR.2015.7298832
13. L. Bazzani, M. Cristani, and V. Murino, "Symmetry-driven accumulation of local features for human characterization and re-identification," Comput. Vision Image Understanding 117(2), 130–144 (2013). https://doi.org/10.1016/j.cviu.2012.10.008
14. M. Farenzena et al., "Person re-identification by symmetry-driven accumulation of local features," in Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, 2360–2367 (2010). https://doi.org/10.1109/CVPR.2010.5539926
15. L. An et al., "Person reidentification with reference descriptor," IEEE Trans. Circuits Syst. Video Technol. 26(4), 776–787 (2016). https://doi.org/10.1109/TCSVT.2015.2416561
16. M. Köstinger et al., "Large scale metric learning from equivalence constraints," in Proc. of IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, 2288–2295 (2012). https://doi.org/10.1109/CVPR.2012.6247939
17. P. M. Roth et al., Mahalanobis Distance Learning for Person Re-identification, pp. 247–267, Springer, Cambridge, UK (2014).
18. C. Loy, C. Liu, and S. Gong, "Person re-identification by manifold ranking," in Proc. of the 20th IEEE Int. Conf. on Image Processing, 3567–3571 (2013). https://doi.org/10.1109/ICIP.2013.6738736
19. W. S. Zheng, S. Gong, and T. Xiang, "Re-identification by relative distance comparison," IEEE Trans. Pattern Anal. Mach. Intell. 35(3), 653–668 (2013). https://doi.org/10.1109/TPAMI.2012.138
20. L. An, S. Yang, and B. Bhanu, "Person re-identification by robust canonical correlation analysis," IEEE Signal Process. Lett. 22(8), 1103–1107 (2015). https://doi.org/10.1109/LSP.2015.2390222
21. X. L. Ma et al., "Person re-identification by unsupervised video matching," Pattern Recognit. 65, 197–210 (2017). https://doi.org/10.1016/j.patcog.2016.11.018
22. M. Ye et al., "Dynamic label graph matching for unsupervised video re-identification," in Proc. of Int. Conf. on Computer Vision, 5152–5160 (2017). https://doi.org/10.1109/ICCV.2017.550
23. P. X. Peng et al., "Unsupervised cross-dataset transfer learning for person re-identification," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 1306–1315 (2016). https://doi.org/10.1109/CVPR.2016.146
24. J. L. Hu et al., "Cross-scenario transfer metric learning for person re-identification," IEEE Trans. Image Process. 25(12), 5576–5588 (2016). https://doi.org/10.1109/TIP.2016.2612827
25. X. Wang et al., "Cross-scenario transfer person re-identification," IEEE Trans. Circuits Syst. Video Technol. 26(8), 1447–1460 (2016). https://doi.org/10.1109/TCSVT.2015.2450331
26. S. J. Pan, J. T. Kwok, and Q. Yang, "Transfer learning via dimensionality reduction," in Proc. of the 23rd National Conf. on Artificial Intelligence (AAAI), 677–682 (2008).
27. B. Ma, Y. Su, and F. Jurie, "BiCov: a novel image representation for person reidentification and face verification," in Proc. of the 2012 British Machine Vision Conf., 1–11 (2012). https://doi.org/10.5244/C.26.57
28. I. Kviatkovsky, A. Adam, and E. Rivlin, "Color invariants for person re-identification," IEEE Trans. Pattern Anal. Mach. Intell. 35(7), 1622–1634 (2013). https://doi.org/10.1109/TPAMI.2012.246
29. Y. Yang et al., "Salient color names for person re-identification," in Proc. of European Conf. on Computer Vision, 536–551 (2014). https://doi.org/10.1007/978-3-319-10590-1_35
30. S. Pedagadi et al., "Local Fisher discriminant analysis for pedestrian re-identification," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, 3318–3325 (2013). https://doi.org/10.1109/CVPR.2013.426
31. J. Hu, J. Lu, and Y. P. Tan, "Discriminative deep metric learning for face verification in the wild," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, 1875–1882 (2014). https://doi.org/10.1109/CVPR.2014.242
32. D. Cheng et al., "Cross-scenario transfer metric learning for person re-identification," Pattern Recognit. Lett. (2018). https://doi.org/10.1016/j.patrec.2018.04.023
33. G. Zhang et al., "People re-identification using two-stage transfer metric learning," in Proc. of 14th IAPR Int. Conf. on Machine Vision Applications (MVA), 588–591 (2015). https://doi.org/10.1109/MVA.2015.7153260
34. W. Li and X. Wang, "Locally aligned feature transforms across views," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, 3594–3601 (2013). https://doi.org/10.1109/CVPR.2013.461
35. M. Sugiyama, "Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis," J. Mach. Learn. Res. 8, 1027–1061 (2007).
36. J. V. Davis et al., "Information-theoretic metric learning," in Proc. of the 24th Int. Conf. on Machine Learning, 209–216 (2007). https://doi.org/10.1145/1273496.1273523
37. P. Yang, K. Huang, and C. L. Liu, "Geometry preserving multi-task metric learning," Mach. Learn. 92(1), 133–175 (2013). https://doi.org/10.1007/s10994-013-5379-y
38. W. Zheng et al., "Partial person re-identification," in Proc. of IEEE Int. Conf. on Computer Vision (ICCV), 4678–4686 (2015). https://doi.org/10.1109/ICCV.2015.531

Biography

Tongguang Ni received his PhD from Jiangnan University in May 2015. He is a lecturer in the School of Information Science and Engineering, Changzhou University, Changzhou, China. His current research interests include pattern recognition, intelligent computation, and their applications.

Xiaoqing Gu received her PhD in light industry information technology and engineering from Jiangnan University, Wuxi, China, in 2017. She is a lecturer in the School of Information Science and Engineering, Changzhou University, Changzhou, China. She has published more than 10 papers in international/national journals, including the IEEE Transactions on Industrial Informatics and IEEE Transactions on Systems, Man and Cybernetics: Systems. Her current research interests include pattern recognition and machine learning.

Hongyuan Wang received his PhD in computer science from Nanjing University of Science and Technology. He is currently a professor at Changzhou University. His general research interests are in pattern recognition and intelligent systems. His current interest is in pedestrian trajectory discovery in intelligent video surveillance.

Biographies for the other authors are not available.

CC BY: © The Authors. Published by SPIE under a Creative Commons Attribution 4.0 Unported License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI.
Tongguang Ni, Xiaoqing Gu, Hongyuan Wang, Zhongbao Zhang, Shoubing Chen, and Cui Jin "Discriminative deep transfer metric learning for cross-scenario person re-identification," Journal of Electronic Imaging 27(4), 043026 (27 July 2018). https://doi.org/10.1117/1.JEI.27.4.043026
Received: 2 May 2018; Accepted: 5 July 2018; Published: 27 July 2018