Riemannian manifolds have attracted increasing attention for visual classification tasks, especially video or image set classification. Covariance matrices are the natural second-order statistics of image sets. However, nonsingular covariance matrices, known as symmetric positive definite (SPD) matrices, lie on a non-Euclidean Riemannian manifold (the SPD manifold). Covariance discriminative learning (CDL) is an effective discriminative learning method that operates on the Riemannian manifold through the SPD kernel space. In practice, however, the discriminative learning of CDL often suffers from poor generalization and overfitting caused by a finite number of training samples and noise corruption. Hence, we propose to address these problems by introducing eigenspectrum regularization and a graph-embedded framework. Discriminative learning on the SPD manifold is generalized by the graph-embedded framework, which is combined with eigenspectrum regularization in the SPD kernel space. Three local Laplacian graphs of the graph-embedded framework and two eigenspectrum regularization models are incorporated into the proposed method. A complete mathematical derivation of the proposed method is given using the “kernel trick.” Experimental results on set-based face recognition and object categorization tasks demonstrate the effectiveness of the proposed method.
1. Introduction
The development of intelligent video surveillance, social networks, and electronic commerce makes it possible to match a probe image set against all gallery image sets, which constitutes an image set classification task.1 Image sets can be extracted from videos or albums. Each probe image set and gallery image set contains multiple images that belong to the same class, which allows the extraction of considerably more discriminative information than is possible in the traditional single image classification task.2 Image set classification has achieved widespread success in face recognition3–6 and object categorization.7–11 Recently, many studies have indicated that numerous visual features lie on a Riemannian manifold.12 The subspaces of image sets form the Grassmann manifold, the symmetric positive definite (SPD) matrices form the SPD manifold,13 and two-dimensional shapes lie on Kendall shape spaces.14 Using Riemannian manifolds to model image sets and build the corresponding classifiers has become popular in recent years.15 Subspaces and covariance matrices are two typical representations for modeling image sets on a Riemannian manifold: the subspaces of image sets form the Grassmann manifold,8 and the nonsingular covariance matrices form the SPD manifold.10 The linear subspace is a popular choice for modeling image sets because it accommodates image variations well. Hence, the Grassmann manifold formed by subspaces is widely used for image set classification.
However, linear subspace-based modeling has the limitation that it incorporates only relatively weak information (such as the subspace angles) about the location and boundary of the samples in the input space.10 The second-order statistic of an image set, its (nonsingular) covariance matrix, lies on the SPD manifold and characterizes the set structure more faithfully.10 Many studies10,12,16 have shown the effectiveness of the SPD manifold for image set classification and that the covariance descriptor is robust to noise and illumination variations. Covariance discriminative learning (CDL)10 is one of the most representative methods that use the covariance descriptor for image set classification. The covariance descriptor provides a natural representation for an image set and makes no assumption about the set data distribution. Hence, the representation possesses stronger resistance to outliers.10 The SPD manifold formed by covariance matrices is mapped to a high-dimensional reproducing kernel Hilbert space (RKHS), where Euclidean geometry applies. Subsequently, linear discriminant analysis (LDA)17 is applied to perform discriminative learning with the “kernel trick,” which is known as kernel discriminant analysis.18 CDL has achieved considerable results on set-based face recognition and object categorization tasks. In this work, we focus on the discriminative learning problem of the SPD manifold on the mapped RKHS.
Due to the conventional problems of linear discriminative learning, such as the singularity of within-class scatter matrix and the instability of its inverse caused by the finite number of training samples,17 CDL may suffer from overfitting and poor generalization since conventional problems may also occur during discriminative learning in the kernel space.19 To address the conventional problems of LDA, numerous approaches, such as the Fisherface LDA,17 direct LDA,20 and null space LDA21 on linear Euclidean space, have been proposed. For the conventional problems in the kernel space, kernel methods of kernel Fisherface LDA,22 null space kernel LDA,19 and kernel direct-LDA23 in the nonlinear kernel space exist. However, these approaches usually discard a subspace (either the principal space or null space) to circumvent the singularity before discriminant learning, which causes a loss of discriminative information.24 Although dual-subspace LDA25 considers the contributions of both subspaces, the associated average scaling factor may not be a suitable choice for information in the principal subspace. To address these problems, the eigenfeature regularization and extraction (ERE)24 and complete discriminant evaluation and feature extraction (CDEFE)26 approaches were proposed to address these problems in a linear flat space and nonlinear kernel Euclidean space, respectively. ERE considers that the entire eigenspace of the within-class scatter matrix should be retained for discriminant analysis and regularized by the eigenspectrum regularization weighting function. The entire eigenspace is partitioned into three parts according to the median operation, and three different strategies according to the eigenspectrum of are devised for regularization.24 CDEFE tackles these problems in the kernel space by nonlinear mapping; it decomposes the kernel within-class variation matrix into principal and noise dominated subspaces. 
A weighting function that is based on the ratios of the successive eigenvalues of the eigenspectrum was proposed to circumvent the undue scaling of projection vectors.26 Discriminative vectors by applying predicted eigenvalues27 combined the eigenspectrum regularization models of ERE and CDEFE. Recently, regularized locality preserving discriminant embedding28 and locality regularization embedding (LRE)29 were proposed; these methods generalized the eigenfeature extraction of ERE by the graph-embedded framework to better preserve data locality. An adaptive locality preserving regulation model was devised for eigenspectrum regularization. The experimental results have demonstrated the effectiveness of these eigenspectrum regularization techniques.29 Inspired by eigenspectrum regularization, in this work, we aim to address the conventional problems of CDL in discriminative learning by exploiting the eigenspectrum regularization with the graph-embedded framework in the RKHS, which is mapped from the SPD manifold. We refer to the proposed method as regularized graph-embedded covariance discriminative learning (RGCDL). Figure 1 shows the conceptual illustration of the proposed method. The main contributions of this paper are presented as follows.
The rest of this paper is organized as follows. We present the works related to image set classification according to the image set representations in Sec. 2. The original CDL method and the architecture of eigenspectrum regularization are introduced in Sec. 3. Then, the RGCDL approach is presented in Sec. 4. Experimental evaluation and discussions are presented in Sec. 5. Finally, Sec. 6 concludes this paper.
2. Related Work
In this paper, we aim to use the proposed method to solve the image set classification task. The major issues of image set classification are how to represent an image set and how to measure the distance or similarity between two sets.5 Various techniques have been proposed to represent image sets, such as the statistical distribution,30,31 affine/convex hull model,4 sparse representation,5 subspace,7,32 and covariance matrix.10,12 The methods30,31 that model each image set by a statistical distribution are among the earliest approaches employed for image set classification. They measure the similarities between the distributions of two sets and achieve considerable results. However, if the set data have no strong statistical correlations for parameter estimation, these methods often fail to work.5 The most representative affine/convex hull-based methods are the affine/convex hull-based image set distances (AHISD/CHISD);4 AHISD/CHISD represents images as points in a linear or affine feature space and computes the distance between the convex geometric regions spanned by the feature points of each set. Hu et al.5 incorporated the sparse representation to regularize the affine hull model. Zhu et al.6 employed the collaborative representation technique to utilize the discrimination information between gallery sets. The affine/convex hull approaches essentially aim to find the synthetic nearest points between image sets.11 However, these hull models usually cannot handle the complex appearance variations caused by multiple views and extreme illumination.
Subspace is a popular and effective approach for modeling image sets. The mutual subspace method (MSM)32 is one of the earliest classic subspace-based methods for image set classification. MSM models all image sets by linear subspaces, and the similarity between pairs of subspaces is measured by canonical correlation analysis (CCA).33 Fukui and Yamaguchi34 and Fukui and Maki35 projected the linear subspaces to a “difference subspace,” which can extract the disparity between two subspaces. Kim et al.7 incorporated discriminative learning into subspace-based set classification according to canonical correlations (DCC). DCC attempts to obtain a linear transformation that maximizes the canonical correlations of within-class subspaces and minimizes the canonical correlations of between-class subspaces. Arandjelovic36 extended CCA to an extended version (ECCA) by extracting the most similar models of variability within two sets and exploited the discriminative learning architecture to train a classifier (DECCA). Subspaces can also be treated as points that lie on a special type of Riemannian manifold, which is known as the Grassmann manifold. The method in Ref. 3 represents an image set as multiple local linear subspaces and treats them as points on the Grassmann manifold; then, the manifold-to-manifold distance (MMD) is defined between the manifolds of two image sets. Manifold discriminant analysis37 was proposed to learn an embedding space by maximizing the manifold margin of the MMD. The Grassmann manifold can also be mapped to an RKHS, where Euclidean geometry applies; Grassmann discriminant analysis (GDA)8 implements LDA on the mapped RKHS by the Grassmannian kernel. GDA is generalized to kernel GDA (KGDA) using Gaussian kernel principal subspaces.38 Graph-embedding Grassmann discriminant analysis (GGDA)9 is another counterpart to the GDA method; it exploits the graph-embedded framework to implement discriminant analysis on the mapped RKHS.
Grassmann nearest points (GNP)11 finds the nearest Grassmann points on the mapped vector space using the affine hull. More recently, regularized Grassmann discriminant analysis (RGDA)2 was proposed to circumvent the conventional problems of LDA when the training sets are insufficient. However, as previously mentioned, the linear subspace-based methods have the limitation of using weak information to measure the similarity.10 Modeling visual features as covariance matrices for visual classification has become popular in recent years10,12,39 since the nonsingular covariance matrix (also known as an SPD matrix) forms a special Riemannian manifold, which is referred to as the SPD manifold.12 Previous studies employed covariance matrices to characterize local regions within an image, which is named the region covariance.39 Different from the region covariance descriptor, CDL is the key method that models the whole image set by the covariance descriptor to address image set classification on the SPD manifold. Huang et al.40 proposed log-Euclidean metric learning to learn a tangent mapping from the original tangent space of the SPD manifold to a new discriminative space. Tan and Gao16 proposed a patch-based principal covariance discriminative learning (PPCDL) method, in which the image set is partitioned into several local maximum linear patches by a hierarchical divisive clustering method, the local patches are modeled by covariance matrices, and the final discriminative learning is similar to CDL. Discriminant analysis on Riemannian manifold of Gaussian distributions (DARG)41 models the image set with a Gaussian mixture model (GMM) and derives a series of kernels for discriminative learning of Gaussians on the SPD manifold. Symmetric positive definite manifold learning12 learns an orthonormal projection from the high-dimensional SPD manifold to a low-dimensional, more discriminative manifold.
3. Preliminaries
In this section, we first review the theory of CDL10 and then present the architecture of eigenspectrum regularization according to LRE.29
3.1. Covariance Discriminative Learning
CDL uses a natural methodology to characterize image sets by the covariance descriptor. Let $S = [s_1, s_2, \ldots, s_n]$ denote the data matrix of an image set with $n$ image vectors, where $s_i \in \mathbb{R}^d$ lies in the $d$-dimensional vector space. The covariance descriptor can be expressed as
$$C = \frac{1}{n-1} \sum_{i=1}^{n} (s_i - \bar{s})(s_i - \bar{s})^T, \qquad (1)$$
where $\bar{s}$ denotes the mean of the image vectors in $S$. The covariance matrix $C$ of $S$ represents one image set and is simple to derive and compute. It is worth noting that, due to the high dimensionality of visual features and insufficient samples within a set, the covariance matrix of an image set is usually singular (when the number of image samples is less than the dimension of the vector space). A simple way to circumvent this problem is to introduce a small perturbation to the covariance matrix.10 This perturbation can be denoted as $\lambda I$, where $I$ is the identity matrix and $\lambda$ is a scaling parameter. Hence, the nonsingular covariance matrix becomes an SPD matrix $C^{*} = C + \lambda I$, which is an element of the Riemannian manifold. In the remainder of this paper, we still use $C$ to denote the nonsingular covariance matrix for simplicity. After modeling the image sets as multiple SPD matrices, CDL explores a Riemannian kernel that is induced by a Riemannian metric, such as the log-Euclidean distance (LED),42 to map the SPD manifold to a Euclidean space. The LED metric defines a true geodesic on the Riemannian manifold, as it is induced by a positive definite kernel,42 and the manifold structure can be preserved as much as possible. The LED metric is defined as
$$d_{\mathrm{LED}}(C_1, C_2) = \| \log(C_1) - \log(C_2) \|_F, \qquad (2)$$
where $\|\cdot\|_F$ is the matrix Frobenius norm and $\log(\cdot)$ denotes the principal matrix logarithm operation. Given the eigendecomposition of an SPD matrix, $C = U \Sigma U^T$, the principal matrix logarithm of $C$ can be computed as
$$\log(C) = U \log(\Sigma) U^T, \qquad (3)$$
where $\log(\Sigma)$ is easily calculated using the logarithms of the eigenvalues in the diagonal matrix $\Sigma$.
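The covariance descriptor, the perturbation step, the eigendecomposition-based matrix logarithm, and the LED described above can be illustrated with a minimal NumPy sketch. The function names, the default value of the perturbation scale `lam`, and the random toy data are assumptions made for illustration only; they are not taken from the CDL implementation.

```python
import numpy as np

def covariance_descriptor(S, lam=1e-3):
    """Covariance of a d x n image-set matrix S (one image vector per column).

    lam * I is added so the result stays SPD even when n <= d.
    The default lam is an illustrative assumption.
    """
    d, n = S.shape
    centered = S - S.mean(axis=1, keepdims=True)
    C = centered @ centered.T / (n - 1)
    return C + lam * np.eye(d)

def spd_log(C):
    """Principal matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, U = np.linalg.eigh(C)
    # U * log(eigvals) scales each eigenvector column by the log of its eigenvalue.
    return (U * np.log(eigvals)) @ U.T

def led(C1, C2):
    """Log-Euclidean distance: Frobenius norm of the difference of matrix logs."""
    return np.linalg.norm(spd_log(C1) - spd_log(C2), ord="fro")

# Toy image sets with fewer samples (3) than dimensions (5),
# so the unperturbed covariance would be singular.
rng = np.random.default_rng(0)
S1 = rng.standard_normal((5, 3))
S2 = rng.standard_normal((5, 3))
C1, C2 = covariance_descriptor(S1), covariance_descriptor(S2)
print(led(C1, C2))
```

Using `numpy.linalg.eigh` rather than a general eigensolver exploits the symmetry of the covariance matrix and returns real eigenvalues, which keeps the logarithm well defined once the perturbation has made all eigenvalues positive.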
CDL implements image set classification in an extrinsic manner by first mapping the Riemannian manifold to a Euclidean space. The mapping induced by the LED metric can be defined as $\phi: \mathcal{M} \to \mathcal{H}$, where $\mathcal{M}$ denotes the manifold spanned by the SPD matrices and the vector space $\mathcal{H}$ is the inner product space on the RKHS, which can be viewed as a Euclidean space. Subsequently, the kernel function induced by the LED metric, $k: \mathcal{M} \times \mathcal{M} \to \mathbb{R}$, is used to define the inner product on the RKHS. For two matrices $C_1$ and $C_2$, the LED Riemannian kernel function can be formulated as
$$k_{\log}(C_1, C_2) = \mathrm{tr}[\log(C_1) \log(C_2)]. \qquad (4)$$
The kernel function is shown to be an SPD kernel10,13 that obeys Mercer’s theorem.43 Therefore, the manifold structure can be preserved by the LED Riemannian kernel. The explicit kernel feature mapping allows application of any standard vector space learning algorithm. The discriminative learning of CDL is conducted by the kernel LDA18 with the kernel trick. The mapping of the Riemannian manifold to a Euclidean space is defined by the function $\phi$. Therefore, if points of the specified Riemannian manifold are spanned by the matrices $C_1, \ldots, C_N$, the mapped feature points in Euclidean space can be denoted as $\phi(C_1), \ldots, \phi(C_N)$. With the inner product $\langle \phi(C_i), \phi(C_j) \rangle = k(C_i, C_j)$, CDL seeks to solve the following optimization:10
$$\alpha_{\mathrm{opt}} = \arg\max_{\alpha} \frac{\alpha^T K W K \alpha}{\alpha^T K K \alpha}, \qquad (5)$$
where $\alpha = [\alpha_1, \ldots, \alpha_N]^T$, $K$ is the kernel Gram matrix with elements $K_{ij} = k(C_i, C_j)$, and $W$ is the connection matrix with element
$$W_{ij} = \begin{cases} 1/n_k, & \text{if } C_i, C_j \in c_k, \\ 0, & \text{otherwise}, \end{cases} \qquad (6)$$
where $n_k$ is the number of sets in the $k$’th class; we denote the $k$’th class as $c_k$ in this paper. Here, $C_i \in c_k$ indicates that the label of $C_i$ belongs to class $c_k$. The optimal projection matrix is given by the largest $c - 1$ ($c$ is the number of training classes) eigenvectors obtained by solving the eigenproblem $K W K \alpha = \lambda K K \alpha$, which is denoted as $A = [\alpha_1, \ldots, \alpha_{c-1}]$. Finally, for a given testing matrix $C_t$ in the input manifold space, the projected feature in the new discriminant Euclidean subspace can be obtained by
$$z_t = A^T K_t, \qquad (7)$$
where $K_t = [k(C_1, C_t), \ldots, k(C_N, C_t)]^T$.
3.2. Eigenspectrum Regularization Technique
Eigenspectrum regularization24,26,29 was originally proposed to address the conventional problems of LDA on the linear Euclidean space, namely the singularity of the within-class scatter matrix and the numerical instability of its inverse.
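As a hedged illustration of the kernel construction above, the following sketch computes an LED kernel Gram matrix and a class connection matrix. It assumes the kernel takes the trace form tr[log(C_i) log(C_j)] and that the connection weight equals the reciprocal of the class size for same-class pairs, which is the usual kernel LDA choice; the helper names and the random SPD test matrices are hypothetical.

```python
import numpy as np

def spd_log(C):
    """Principal matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, U = np.linalg.eigh(C)
    return (U * np.log(eigvals)) @ U.T

def led_gram(mats_a, mats_b):
    """Gram matrix with K[i, j] = tr(log(A_i) log(B_j)).

    Because matrix logs of SPD matrices are symmetric, the trace equals the
    inner product of the flattened logarithms.
    """
    A = np.stack([spd_log(C).ravel() for C in mats_a])
    B = np.stack([spd_log(C).ravel() for C in mats_b])
    return A @ B.T

def connection_matrix(labels):
    """W[i, j] = 1 / n_k if sets i and j share class k, else 0 (assumed form)."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    class_sizes = same.sum(axis=1)  # size of the class each set belongs to
    return same / class_sizes[None, :]

def random_spd(rng, d):
    """Hypothetical helper that draws a well-conditioned SPD matrix."""
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)

rng = np.random.default_rng(0)
mats = [random_spd(rng, 4) for _ in range(6)]
labels = [0, 0, 0, 1, 1, 2]
K = led_gram(mats, mats)
W = connection_matrix(labels)
```

Computing each matrix logarithm once and reusing it for every pair keeps the cost of building the Gram matrix at one eigendecomposition per set plus a single matrix product.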
In this section, we introduce LRE29 as an instance since it is the prototype of the proposed method in this paper. Consider samples of training data with . In LRE, intrinsic data structure is modeled to regularize the directions of data locality .29 The eigenspectrum and directions can be obtained by decomposing where is the local Laplacian matrix, which manifests the manifold through local geometry preservation,29 and is a diagonal matrix whose diagonal elements are the eigenvalues in descending order. The plot of the eigenvalues against the index is referred to as the eigenspectrum. contains the eigenvectors (directions) of the locality preserving matrix corresponding to . The locality preserving matrix has been shown to be exactly equal to the within-class scatter matrix with equal weights on the edges of adjacent data pairs of .28LRE decomposes the entire eigenspace into two subspaces: (1) the disparity subspace , which corresponds to lower locality preservation, and (2) the principal subspace for higher locality preservation. LRE indicates that the first few eigenvectors of the eigenspace correspond to large eigenvalues that provide lower locality preserving capability, whereas the eigenvectors that correspond to smaller eigenvalues provide higher locality preserving capability. Hence, larger weights are imposed on the subspace with higher locality preservation, whereas smaller weights are assigned to the subspace with lower locality preservation. A method is devised by determining “fences” to separate the disparity subspace and principal subspace, and then regularize these two subspace according to an adaptive eigenspectrum regularization model. The fences is defined by a split point on eigenspectrum , where is the third quartile (cutting off the highest 75% or lowest 25% of the sum of ), and is a parameter for adaptively scaling the separating value. The definition of is , where is the first quartile. 
This adaptive eigenspectrum regularization model finds the ’th split eigenvalue that satisfies . The piecewise regularization function of LRE is defined as
The regularization function is imposed on the corresponding eigenvectors to form a full-dimensional transformation matrix
Then, LRE can obtain a more localized feature by transforming the original training data
Note that no dimensional reduction occurs in this transformation, so the information of the original training data is preserved as much as possible. In the subsequent step of LRE, feature extraction and dimensional reduction are performed on the regularized and more compact data. To further preserve the within-locality and between-locality power, a similarity weight matrix is utilized. The element of is defined as
where is the number of samples in the ’th class. The within-locality graph edges are weighted with positive-valued coefficients that quantify the intraclass similarity, whereas the between-locality graph edges are weighted with negative-valued coefficients that characterize discriminative features among different class samples.29 The final objective function of LRE is defined as
This problem can be easily solved by converting it to a generalized eigenvalue problem . By retaining eigenvectors (), which correspond to the largest eigenvalues, the projection matrix is used for the final lower-dimensional eigenfeature extraction.
4. Proposed Method
In this section, the proposed RGCDL is presented. Because it incorporates eigenspectrum regularization and the graph-embedded framework with the SPD manifold in the kernel space, the RGCDL algorithm differs considerably from the original CDL algorithm. The RGCDL algorithm comprises two main steps. The first step is eigenspectrum regularization, and the second step is feature extraction and dimensional reduction.
4.1. Representation of SPD Manifold
According to Harandi et al.12 and Tan and Gao,16 the computational cost of the Riemannian kernel with a high-dimensional SPD matrix is quite high. Several strategies are available to lower the dimensionality of the SPD matrix and reduce the computational cost of constructing the Riemannian kernel matrix.12 Here, we combine all the training data in different training sets to collaboratively produce the dimensional reduction projection matrix by PCA. Consider training image sets , where each set contains images . We combine all images of all sets to build a sample data collection
The dimensional reduction projection matrix can be obtained by decomposing the following sample covariance matrix:
where is the sample mean. We select orthonormal eigenvectors that correspond to the largest eigenvalues of to form the dimensional reduction projection matrix . All images in each set are transformed to a low-dimensional feature space, and the ’th sample in the low-dimensional feature space is calculated as
This simple PCA, applied to all training sets, not only alleviates the high computational complexity of constructing the SPD kernel matrix but also better preserves the main variations in the set data used to build the covariance matrices, which form the SPD manifold. This operation of refining the high-dimensional SPD matrices can also be viewed as a transformation from the high-dimensional manifold to a low-dimensional manifold.16
4.2. Eigenspectrum Regularization with SPD Manifold
The ’th dimensional reduced set can be represented as . can be modeled by the covariance descriptor [Eq. (1)] and represented as . To ensure that is nonsingular to form the SPD manifold, a small perturbation is added to the covariance matrix . Hence, the perturbed is an SPD matrix . For image sets of classes, they can be denoted as a collection of matrices that form an SPD manifold.
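The collaborative PCA step of Sec. 4.1 can be sketched as follows, under stated assumptions: images are stored as columns, the projection is applied without subtracting the pooled mean (the covariance descriptor computed afterwards is unaffected by a constant shift), and the names `collaborative_pca` and `reduce_sets` are hypothetical.

```python
import numpy as np

def collaborative_pca(image_sets, d):
    """Learn one D x d projection from the images of all training sets pooled together."""
    X = np.hstack(image_sets)                  # D x N pooled sample collection
    Xc = X - X.mean(axis=1, keepdims=True)     # center by the pooled sample mean
    cov = Xc @ Xc.T / X.shape[1]               # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:d]      # indices of the d largest eigenvalues
    return eigvecs[:, order]

def reduce_sets(image_sets, P):
    """Project every image of every set to the d-dimensional feature space."""
    return [P.T @ S for S in image_sets]

# Toy data: three sets of 20-dimensional image vectors with different set sizes.
rng = np.random.default_rng(0)
image_sets = [rng.standard_normal((20, n)) for n in (8, 12, 10)]
P = collaborative_pca(image_sets, d=5)
reduced = reduce_sets(image_sets, P)
```

After this step each set yields a d x d covariance matrix, so the cost of the matrix logarithm in the Riemannian kernel depends on d rather than on the original image dimension.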
By defining the Riemannian mapping , we can obtain the samples on the RKHS , which is homeomorphic to Euclidean space. To further preserve the local structure, we incorporate the graph-embedded framework into our proposed method. The local Laplacian matrix is utilized to preserve the locality information, whereas the global Laplacian matrix is adjacent regardless of the class membership of all vertices.29 In this step, we aim to obtain the eigenspectrum and directions of the local structure in the SPD Riemannian kernel space. They can be implemented by decomposing the locality preserving matrix on the mapped space; we denote as for simplicity. Then, we have where constructs the kernel eigenspace of , and the eigenvalues in define the kernel eigenspectrum. The local Laplacian matrix can be specified into different local Laplacian graphs. In this work, we employ the binary local Laplacian , intraclass local Laplacian , and adjustable local Laplacian for instances. is a simple-minded Laplacian matrix in which intraclass vertices are adjacent with equal weight of each edge. is the Laplacian graph that satisfies where is the connection weight of the ’th and ’th sets, and it has the same definition as Eq. (6) in CDL.10 The locality preserving matrix can be proved to be equal to the kernel within-class scatter matrix with equal weights on the edges of adjacent data pairs of .29Unlike the edge weights in and , which are fixed in values, the edge weights in are variables that are based on different similarity definitions, such as the heat kernel in locality preserving projections44 and neighborhood reconstruction coefficients in neighborhood preserving embedding.45 is a Laplacian matrix that is computed by where is a diagonal matrix calculated by . In this paper, we compute the edge weights of in Eq. (19) based on the heat kernel, which is calculated by Gaussian distribution. The edge weight of can be computed as where is the kernel width parameter. 
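A minimal sketch of the adjustable local Laplacian with heat-kernel edge weights is given below. It relies on the standard identity that the squared distance between two mapped points equals K_ii - 2 K_ij + K_jj, so only the kernel Gram matrix is needed. The exact exponent scaling (here 2 sigma^2), the restriction of edges to same-class pairs, and the function names are assumptions made for illustration.

```python
import numpy as np

def rkhs_sq_dists(K):
    """Squared distances between mapped points, computed from Gram entries only."""
    diag = np.diag(K)
    # Clip tiny negative values produced by floating-point round-off.
    return np.maximum(diag[:, None] - 2.0 * K + diag[None, :], 0.0)

def heat_kernel_laplacian(K, labels, sigma):
    """Laplacian L = D - W with heat-kernel weights on intraclass edges (assumed form)."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    W = np.where(same, np.exp(-rkhs_sq_dists(K) / (2.0 * sigma ** 2)), 0.0)
    np.fill_diagonal(W, 0.0)                   # no self-loops
    D = np.diag(W.sum(axis=1))                 # degree matrix
    return D - W

# Toy check with a linear kernel on explicit features, so distances can be verified.
rng = np.random.default_rng(0)
F = rng.standard_normal((6, 3))
K = F @ F.T
labels = [0, 0, 0, 1, 1, 1]
L = heat_kernel_laplacian(K, labels, sigma=1.0)
```

In the RGCDL setting the Gram matrix would come from the Riemannian kernel rather than from a linear kernel; the linear kernel is used here only because the explicit features make the distance identity easy to check.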
The Euclidean distance of the mapped feature and can be easily transformed to , where can be calculated by the Riemannian kernel, such as Eq. (4).As known by linear algebra, the projection direction of in Eq. (17) can be represented as a linear combination of the eigenspace on the mapped space By substituting Eq. (21) into Eq. (17), we obtain . We use the Riemannian kernel function [e.g., Eq. (4)] to build the kernel Gram matrix , Eq. (17) can be rewritten as Equation (22) can be solved by the eigendecomposition of subject to . In the theory of eigenspectrum regularization, we need to regularize the whole feature space [see Eq. (21)] on the mapped space. Assume that, the regularization can be generalized to the weighting function, which is defined as where is the number of all training sets. The full-dimensional feature space contains vectors since the dimensions of matrix is ). Hence, the regularized eigenspace can be computed asAccording to Eq. (21), we have . By defining the regularized eigenspace of Eq. (24) can be rewritten asThe regularized eigenspace is known as a transformation matrix24 that can transform the original feature data to an intermediate feature vector space. It is worth noting that the transformation matrix is a full-dimensional matrix with size . Hence, the mapped data from SPD manifold can be transformed to the new feature vector space with no dimensional reduction, which can preserve information as much as possible. We denote as for simplicity. The transformation is depicted as Although is implicitly defined, the transformed feature can be explicitly expressed by the kernel trick. According to Eqs. (26) and (27), we can denote the transformed by . As , Eq. (27) can be rewritten as In this aspect, according to the previous mathematical deduction, by defining the important regularized eigenspace [see Eq. (25)], the regularization of eigenspace is turned into the regularization of the eigenspace . 
In other words, the effectiveness of eigenspectrum regularization model on the eigenspace is equivalent to the eigenspace . The selection of a suitable eigenspectrum regularization model is a critical aspect of the proposed method. The proper eigenspectrum regularization model ensures that the regularized data can be very close to the real population variances.46 The eigenspectrum regularization of LRE is an adaptive model that estimates the optimal parameter using training data. However, this process is usually time-consuming, and the performance decays quickly when the training data are insufficient. In this paper, we employ the data-independent eigenspectrum regularization models of ERE and CDEFE to regularize the eigenspace , which are more general and robust. The first model is the eigenspectrum regularization model of ERE.24 The heuristic theory of ERE for designing the eigenspectrum regularization model is the median operation. The weighting function applied to the eigenspace is defined as where is the ’th eigenvalue of in descending order, which satisfies where is the median value computed by . is a constant with a recommendation value of 1.24 is the rank of . The parameters of and are calculated asThe second regularization model is taken from CDEFE.26 The eigenspectrum regularization model of CDEFE regularizes the eigenspace in a Gaussian kernel space, which may have a special effect on the proposed RGCDL in the Riemannian kernel space. The second regularization model aims to find the minimum eigenratio from the eigenspectrum of , which is formed by the eigenvalues [see Eq. (17)] in descending order. Let denote the ratio of two adjacent eigenvalues and in the eigenspectrum, we have The minimum eigenratio can be formulated as where is the index of the minimum eigenratio, and is the rank of the locality matrix . The eigenspectrum is split by the point of the ’th eigenvalue, and is defined as . 
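The minimum-eigenratio split used by the second regularization model can be illustrated with a short sketch. It assumes that the eigenratio is the ratio of each eigenvalue to its predecessor in the descending eigenspectrum and that the rank is estimated with a relative tolerance; both choices, along with the function name and the toy eigenspectrum, are illustrative assumptions rather than the CDEFE definition.

```python
import numpy as np

def min_eigenratio_split(eigvals, tol=1e-10):
    """Return the 0-based index of the smallest adjacent eigenratio and all ratios.

    ratios[k] = lam[k + 1] / lam[k] for the nonzero part of the descending
    eigenspectrum (assumed definition of the eigenratio).
    """
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    if lam.size == 0 or lam[0] <= 0:
        raise ValueError("need at least one positive eigenvalue")
    r = int(np.sum(lam > tol * lam[0]))        # numerical rank
    if r < 2:
        raise ValueError("need at least two nonzero eigenvalues")
    ratios = lam[1:r] / lam[:r - 1]
    m = int(np.argmin(ratios))
    return m, ratios

# Toy eigenspectrum with a sharp drop after the third eigenvalue.
eigvals = [10.0, 9.0, 8.0, 0.5, 0.4, 0.0]
m, ratios = min_eigenratio_split(eigvals)
print(m, ratios)
```

With this convention the split index marks the last eigenvalue before the largest relative drop, which is where a principal subspace would be separated from a noise-dominated one.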
Thus, the final regularized weighting function can be defined as
4.3. Feature Extraction and Dimensional Reduction
The new feature vectors are compact and full dimensional; the eigenfeature of should therefore be decorrelated and reduced in dimension for classification. According to Jiang et al.,24 PCA is exploited to extract the final discriminative eigenfeatures since it is less sensitive to different training databases. However, the class affinity is not considered in Ref. 24, which may cause a loss of discriminative information. In this work, we employ a graph-embedded framework to extract the final discriminative features, which incorporates a similarity weight matrix to form the final scatter matrix. Although LRE extended the graph-embedded framework to eigenfeature extraction, our method is designed to address the problems in a Riemannian kernel space. According to Eq. (13), the eigenfeature extraction and dimensional reduction of RGCDL can be achieved by solving the following eigendecomposition problem on the mapped space:
where is the similarity weight matrix, and the affinity between set and set is defined as
where is the number of sets in the ’th class. This similarity weight matrix makes the intraclass samples more compact and the interclass samples more separated. Clearly, the problem of Eq. (35) can be solved by decomposing matrix . The projection matrix consists of the eigenvectors that correspond to eigenvalues in descending order. We retain the first eigenvectors , where , for the final dimension of the extracted feature. Hence, the final regularized projection matrix of RGCDL can be defined as
Obviously, does not have an explicit expression since , and is the vector space mapped by the Riemannian mapping, which is implicitly defined. However, an explicit expression can be provided by the kernel trick when calculated with the test samples.
For a given test nonsingular covariance matrix , which is an element of the SPD manifold, we use to denote the test feature vector that is mapped by the Riemannian mapping. Subsequently, we can extract the discriminative feature by the transformation
After substitution by Eq. (26) and computation of a kernel Gram matrix by the Riemannian kernel function [e.g., Eq. (4)], the final extracted eigenfeature can be rewritten as
Here, is constructed by feature vectors on the mapped space. Hence, various distance metrics and classification methods that are designed for Euclidean space, such as the nearest neighbor (NN) classifier, can be applied for classification.
4.4. Complete RGCDL Algorithm
The steps of the RGCDL algorithm are given in Algorithm 1.
Algorithm 1 RGCDL algorithm.
5. Experimental Results
Experiments were conducted on set-based face recognition and object categorization tasks. First, we compare the proposed RGCDL to the original CDL method when using different numbers of extracted features. Second, we show the advantages of RGCDL over the recent RGDA method. Last, we evaluate the recognition performance of our RGCDL and compare it to numerous image set-based classification methods.
5.1. Dataset and Parameter Settings
We employed the Extended Yale face database B (ExtYaleB)47 for the face recognition task and the RGB-D object database48 for the object categorization task. The ExtYaleB database is the extension of the Yale face database B; it contains 16,128 images of 28 human subjects with 64 illumination conditions and 9 poses for each subject. According to the 9 poses of each subject, we built 9 image sets (64 images per set, one per illumination condition), which correspond to the 9 poses for each subject. We utilized a cascaded face detector49 to collect faces from each image frame. The captured faces were then converted to grayscale and resized to . Some example images are shown in Fig. 2. We selected 2 to 5 image sets of 9 poses for discriminative training (103 sets) and employed the remaining sets for testing (149 sets). The experiments were repeated 10 times by randomly choosing the reference sets for training and the test sets for probe. The RGB-D object database is a large-scale dataset of 300 common household objects that are organized into 51 categories (classes). Each category contains 3 to 14 objects. For each object, 3 video sequences are recorded with a camera that is mounted at different heights so that the object is viewed from different angles with the horizon. The video sequences were captured by placing each object on a turntable for a whole rotation using a Kinect style three-dimensional camera. More than 100 images were extracted from each object’s video sequences; they involve RGB color channels and a depth channel.
We removed the depth images in this study to ensure fair comparisons. We built image sets according to each object, forming a total of 300 image sets (102 sets for training and 198 sets for testing). Grayscale images resized to a fixed resolution were adopted for the RGB-D dataset. Some example objects are shown in Fig. 3. To obtain more general results, we also conducted 10 cross-validation experiments by randomly choosing different combinations of training sets and test sets. The NN classifier was applied for all evaluations to ensure fair comparisons.
5.2. Stability of Extracted Features
In this section, we evaluate the stability of the features extracted by RGCDL. We show that, by applying eigenspectrum regularization, the features extracted by RGCDL are more stable than those extracted by the original CDL method. As described in Sec. 4, the proposed RGCDL aims to extract features from the whole regularized eigenfeature space. As the number of final extracted features increases [controlled through Eq. (37)], RGCDL can achieve a higher performance, whereas the original CDL algorithm cannot retain this characteristic. To confirm this assumption, we conducted experiments on the real face and object datasets. The collaborative dimensional reduction of each image set is set to 100 dimensions; that is, the dimension-reduced covariance matrix of each image set is 100 × 100. Owing to the covariance descriptor, the computational cost of constructing the Riemannian kernel matrices does not depend on the number of images within a set. The Riemannian kernel induced by the LED in Eq. (4) was employed for the RGCDL and CDL methods. We vary the final feature dimensions of RGCDL and CDL to perform a comprehensive comparison. The comparison results are shown in Figs. 4 and 5. Each figure plots the recognition rates against the number of final extracted features.
The recognition rates are the average results of 10 cross-validation experiments. Two eigenspectrum regularization models, ERE and CDEFE, were evaluated for the proposed RGCDL. As shown in Figs. 4 and 5, as the dimension of the final extracted features increases, the proposed RGCDL with either regularization model generally produces low error rates, whereas the original CDL degrades rapidly beyond a certain dimension. This dimension corresponds to the number of features at which LDA performs optimally.17 The degradation of CDL is caused by the incorrectly scaled null kernel space of the within-class scatter matrix,24 which causes overfitting and poor generalization. These results reveal that, with the eigenspectrum regularization models, the conventional problems caused by limited training samples in CDL (e.g., the singularity of the within-class scatter matrix) can be alleviated. Since the new feature space is properly scaled, the estimated eigenvalues obey the true variances of the population;24 hence, better generalization can be achieved. The final extracted features are learned from the regularized full-dimensional transformation matrix [Eq. (26)], which preserves as much discriminative information as possible. As a result, the recognition rates of RGCDL remain stable as the number of features increases.
5.3. Performance Evaluation against RGDA
RGDA is the preliminary work of this paper. However, RGDA was proposed to solve the overfitting and poor generalization problems of Grassmann discriminative learning, whereas our RGCDL solves these problems for CDL on the SPD manifold. Moreover, we further employed different local Laplacian graphs to analyze the locality preserving ability and improve the performance; locality preserving is evaluated in Sec. 5.4. We found that the SPD manifold with the eigenspectrum regularization techniques performs better than the Grassmann manifold.
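To make the effect of eigenspectrum regularization concrete, the following is a minimal sketch of the eigenvalue model associated with ERE,24 applied to a toy within-class eigenspectrum. It is an illustration under stated assumptions, not the kernel-space derivation of Sec. 4: the function name `ere_regularize`, the split indices `m` (start of the unreliable region) and `r` (rank), and the toy spectrum are hypothetical, and the rule used to select `m` in practice is not reproduced here. Eigenvalues before `m` are kept, those from `m` to `r` are replaced by the model α/(k + β), and the null space receives the constant α/(r + 1 + β).

```python
import numpy as np

def ere_regularize(eigvals, m, r):
    """Regularize a descending eigenspectrum with an ERE-style model.

    eigvals : eigenvalues sorted in descending order (length n)
    m       : 1-based index where the unreliable region starts (2 <= m <= r)
    r       : rank of the scatter matrix (eigenvalues after r are ~0)
    Returns the regularized eigenvalues and the whitening weights.
    """
    lam = np.asarray(eigvals, dtype=float)
    lam_1, lam_m = lam[0], lam[m - 1]
    # alpha and beta are fixed by requiring the model to pass through
    # the first and the m-th eigenvalue.
    alpha = lam_1 * lam_m * (m - 1) / (lam_1 - lam_m)
    beta = (m * lam_m - lam_1) / (lam_1 - lam_m)

    k = np.arange(1, lam.size + 1)
    model = alpha / (k + beta)
    reg = np.where(k < m, lam, model)      # reliable part kept, rest modeled
    reg[k > r] = alpha / (r + 1 + beta)    # constant value in the null space
    weights = 1.0 / np.sqrt(reg)           # bounded whitening weights
    return reg, weights

if __name__ == "__main__":
    # Toy spectrum (hypothetical): rank 5 with a two-dimensional null space.
    spectrum = [8.0, 4.0, 2.0, 1.0, 0.5, 0.0, 0.0]
    reg, weights = ere_regularize(spectrum, m=3, r=5)
    print(reg)
    print(weights)
```

With the raw eigenvalues, the whitening weights 1/√λ diverge in the null space, which is the incorrect scaling that Sec. 5.2 associates with the degradation of CDL; the regularized weights stay bounded and decay smoothly.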
We conducted two experiments to evaluate the advantages of the proposed method. First, we compared the classification ability of RGDA and RGCDL when different dimensions of the extracted features were applied. As shown in Figs. 6 and 7, for both eigenspectrum regularization models on both datasets, the error rates of RGCDL are always lower than those of RGDA, regardless of the number of features. Moreover, the error rate curves of RGCDL are smoother and steadier across different numbers of features, and RGCDL can achieve a lower error rate even with a small number of features, particularly on the ExtYaleB dataset. The RGDA method usually cannot achieve high performance with low-dimensional features. This finding demonstrates that the SPD manifold formed by the covariance matrices preserves discriminative information better than the Grassmann manifold formed by subspaces.
Subsequently, we conducted experiments on noisy set data to evaluate the robustness of the proposed method. Image sets may contain noisy data in real-world applications; for example, outliers from other categories or subjects may exist within a set, which may degrade the performance of classifiers. Here, we show that the SPD manifold-based RGCDL method is more robust than the Grassmann manifold-based RGDA method. We conducted experiments by systematically corrupting the training (gallery) sets or the test (probe) sets. The corruption is implemented by adding images from other classes. The data with no noise are denoted as “clean,” the data with noise in the gallery sets are denoted as “N_G,” and the data with noise in the probe sets are denoted as “N_P.” Experiments were evaluated on both face recognition and object categorization. The average classification rates of several cross-validations with different noise-corrupted datasets are shown in Figs. 8 and 9. The classification rates of our RGCDL are always higher than those of RGDA on both the clean and the corrupted data.
Especially on the gallery-corrupted data N_G, RGCDL-ERE and RGCDL-CDEFE exhibit clear advantages over RGDA-ERE and RGDA-CDEFE. Once again, these results demonstrate that the SPD manifold formed by the second-order statistic covariance matrices can account for noisy set data better than the Grassmann manifold formed by subspaces; this reveals the robustness of RGCDL when dealing with noisy set data.
5.4. Performance Comparison to Other Set-Based Classification Methods
We further compared the proposed RGCDL with other set-based classification methods. Multiple image set-based classification methods were evaluated for a comprehensive comparison. The compared methods include the subspace-based methods DCC7 and ECCA;36 the Grassmann manifold methods GDA,8 KGDA,38 GGDA,9 MMD,3 GNP,11 and RGDA;2 and the SPD Riemannian manifold methods CDL,10 PPCDL,16 and DARG.41 The parameter settings of the different methods are as follows. The final feature dimension of GDA, KGDA, GGDA, and RGDA was set as recommended in Ref. 2, and only the projection kernel8 was employed for Grassmannian mapping. The parameters of MMD (such as the nonlinearity score and the number of NNs of data points) were tuned on our datasets with the code provided by the authors. For CDL and PPCDL, the final feature dimension was set to the value recommended for these methods. The dimension of the input covariance matrices of DARG and PPCDL was set to 100 × 100, which is the same as in the proposed RGCDL for fairness. The Riemannian kernel induced by LED was applied for CDL, PPCDL, and our RGCDL. For DARG, we chose the kernel-based variant with the well-performing MD + LED41 distance for evaluation. We fixed the dimension to 150 for DCC by preapplying PCA to the data.7 The number of canonical correlations of DCC and ECCA was set to 20, which is the same as the Grassmannian dimension of the GDA, KGDA, GGDA, and RGDA methods.
For RGDA and our RGCDL, the eigenspectrum regularization models ERE and CDEFE were applied for comparison. Three local Laplacian matrices were evaluated in the proposed RGCDL. Experiments were evaluated on the ExtYaleB and RGB-D datasets. The experimental results are reported as the average classification rates and standard deviations over 10-fold cross-validations. As shown in Table 1, the proposed RGCDL with the regularization models ERE and CDEFE achieves the best classification results among all methods. The SPD manifold approaches based on covariance matrices (CDL, PPCDL, DARG, and our RGCDL) usually achieve better performance than the other methods on ExtYaleB, which shows that the second-order statistic of the covariance matrix accommodates illumination-varying face recognition better. The inferior performances of PPCDL and DARG on the RGB-D dataset may be caused by the conventional problems of discriminative learning and the improperly estimated GMM. Benefitting from the eigenspectrum regularization with the graph-embedded framework, the proposed RGCDL with different models outperforms all other methods. Regarding the different local Laplacian graphs, one of the three matrices achieves the best results with the ERE regularization model. However, the adjustable Laplacian matrix also performs well with the ERE model, and it achieves the best results with the CDEFE model. The adjustable Laplacian matrix performs stably across the regularization models, which reveals its good locality preserving ability. Evidently, the best-performing matrix under ERE is not the best local Laplacian matrix in general; a better affinity matrix can be designed according to suitable theories.
Table 1 Average classification rates and standard deviations (%) on ExtYaleB and RGB-D datasets.
Note: The bold values denote the proposed methods and the highest classification rates.
6. Conclusion
In this paper, we proposed a regularized graph-embedded CDL method, which is referred to as RGCDL. The eigenspectrum regularization and the graph-embedded framework are jointly employed to attenuate the overfitting and poor generalization problems of the original CDL method. A comprehensive mathematical deduction in the SPD manifold kernel space is given to show how these techniques are combined. The experiments with different numbers of extracted features show that the proposed method maintains stable and lower error rates across all dimensions of the extracted features. This result demonstrates the stability that eigenspectrum regularization brings to linear discriminative learning in the SPD manifold kernel space. The graph-embedded framework preserves compact within-class affinity relations and thereby achieves higher performance. Compared with the more recent RGDA method, our RGCDL achieves higher and steadier performance when different numbers of features are employed. Moreover, our RGCDL is more robust than RGDA when the gallery or probe sets are corrupted by noise. In the comparisons with other set-based classification methods, our RGCDL achieves the best classification rates. The local Laplacian matrix reflects the local intraclass structure; how to devise the similarity of intraclass vertex pairs so as to better preserve locality information is one direction of our future work.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61701126, 61802148, and 61802079, the Research Projects in Guangzhou University, China, under Grant No. RP2020123, and the Scientific Research Program of Guangzhou under Grant No. 201904010493.
References
H. Hu,
“Face recognition with image sets using locally Grassmannian discriminant analysis,” IEEE Trans. Circuits Syst. Video Technol., 24(9), 1461–1474 (2014). https://doi.org/10.1109/TCSVT.2014.2309834
H. Tan et al., “Eigenspectrum regularization on Grassmann discriminant analysis with image set classification,” IEEE Access, 7, 150792 (2019). https://doi.org/10.1109/ACCESS.2019.2947548
R. Wang et al., “Manifold–manifold distance and its application to face recognition with image sets,” IEEE Trans. Image Process., 21(10), 4466–4479 (2012). https://doi.org/10.1109/TIP.2012.2206039
H. Cevikalp and B. Triggs, “Face recognition based on image sets,” in IEEE Conf. Comput. Vision and Pattern Recognit., 2567–2573 (2010). https://doi.org/10.1109/CVPR.2010.5539965
Y. Q. Hu, A. S. Mian and R. Owens, “Face recognition using sparse approximated nearest points between image sets,” IEEE Trans. Pattern Anal. Mach. Intell., 34(10), 1992–2004 (2012). https://doi.org/10.1109/TPAMI.2011.283
P. F. Zhu et al., “Image set-based collaborative representation for face recognition,” IEEE Trans. Inf. Forensics Secur., 9(7), 1120–1132 (2014). https://doi.org/10.1109/TIFS.2014.2324277
T. K. Kim, J. Kittler and R. Cipolla, “Discriminative learning and recognition of image set classes using canonical correlations,” IEEE Trans. Pattern Anal. Mach. Intell., 29(6), 1005–1018 (2007). https://doi.org/10.1109/TPAMI.2007.1037
J. Hamm and D. D. Lee, “Grassmann discriminant analysis: a unifying view on subspace-based learning,” in Int. Conf. Mach. Learn., 376–383 (2008).
M. T. Harandi et al., “Graph embedding discriminant analysis on Grassmannian manifolds for improved image set matching,” in IEEE Conf. Comput. Vision and Pattern Recognit., 2705–2712 (2011). https://doi.org/10.1109/CVPR.2011.5995564
R. Wang et al., “Covariance discriminative learning: a natural and efficient approach to image set classification,” in IEEE Conf. Comput. Vision and Pattern Recognit., 2496–2503 (2012). https://doi.org/10.1109/CVPR.2012.6247965
H. L. Tan et al., “Grassmann manifold for nearest points image set classification,” Pattern Recognit. Lett., 68, 190–196 (2015). https://doi.org/10.1016/j.patrec.2015.09.008
M. Harandi, M. Salzmann and R. Hartley, “Dimensionality reduction on SPD manifolds: the emergence of geometry-aware methods,” IEEE Trans. Pattern Anal. Mach. Intell., 40(1), 48–62 (2018). https://doi.org/10.1109/TPAMI.2017.2655048
S. Jayasumana et al., “Kernel methods on the Riemannian manifold of symmetric positive definite matrices,” in IEEE Conf. Comput. Vision and Pattern Recognit., 73–80 (2013). https://doi.org/10.1109/CVPR.2013.17
D. G. Kendall, “Shape manifolds, procrustean metrics, and complex projective spaces,” Bull. London Math. Soc., 16(2), 81–121 (1984). https://doi.org/10.1112/blms/16.2.81
R. Vemulapalli, J. K. Pillai and R. Chellappa, “Kernel learning for extrinsic classification of manifold features,” in IEEE Conf. Comput. Vision and Pattern Recognit., 1782–1789 (2013). https://doi.org/10.1109/CVPR.2013.233
H. Tan and Y. Gao, “Patch-based principal covariance discriminative learning for image set classification,” IEEE Access, 5, 15001–15012 (2017). https://doi.org/10.1109/ACCESS.2017.2733718
P. N. Belhumeur, J. P. Hespanha and D. J. Kriegman, “Eigenfaces vs. Fisherfaces: recognition using class specific linear projection,” IEEE Trans. Pattern Anal. Mach. Intell., 19(7), 711–720 (1997). https://doi.org/10.1109/34.598228
G. Baudat and F. Anouar, “Generalized discriminant analysis using a kernel approach,” Neural Comput., 12(10), 2385–2404 (2000). https://doi.org/10.1162/089976600300014980
W. Liu et al., “Null space-based kernel Fisher discriminant analysis for face recognition,” in IEEE Int. Conf. Autom. Face and Gesture Recognit., 369–374 (2004). https://doi.org/10.1109/AFGR.2004.1301558
H. Yu and H. Yang, “A direct LDA algorithm for high-dimensional data—with application to face recognition,” Pattern Recognit., 34(10), 2067–2070 (2001). https://doi.org/10.1016/S0031-3203(00)00162-X
L. F. Chen et al., “A new LDA-based face recognition system which can solve the small sample size problem,” Pattern Recognit., 33(10), 1713–1726 (2000). https://doi.org/10.1016/S0031-3203(99)00139-9
M. H. Yang, “Kernel eigenfaces vs. kernel Fisherfaces: face recognition using kernel methods,” in IEEE Int. Conf. Autom. Face and Gesture Recognit. (2002). https://doi.org/10.1109/AFGR.2002.4527207
J. Lu, K. N. Plataniotis and A. N. Venetsanopoulos, “Face recognition using kernel direct discriminant analysis algorithms,” IEEE Trans. Neural Networks, 14, 117–126 (2003). https://doi.org/10.1109/TNN.2002.806629
X. D. Jiang, B. Mandal and A. Kot, “Eigenfeature regularization and extraction in face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., 30(3), 383–394 (2008). https://doi.org/10.1109/TPAMI.2007.70708
X. Wang and X. Tang, “Dual-space linear discriminant analysis for face recognition,” in IEEE Conf. Comput. Vision and Pattern Recognit. (2004). https://doi.org/10.1109/CVPR.2004.1315214
X. Jiang, B. Mandal and A. Kot, “Complete discriminant evaluation and feature extraction in kernel space for face recognition,” Mach. Vision Appl., 20(1), 35–46 (2009). https://doi.org/10.1007/s00138-007-0103-1
B. Mandal et al., “Prediction of eigenvalues and regularization of eigenfeatures for human face verification,” Pattern Recognit. Lett., 31(8), 717–724 (2010). https://doi.org/10.1016/j.patrec.2009.10.006
P. Y. Han, A. B. J. Teoh and F. S. Abas, “Regularized locality preserving discriminant embedding for face recognition,” Neurocomputing, 77(1), 156–166 (2012). https://doi.org/10.1016/j.neucom.2011.09.007
Y. H. Pang, A. B. J. Teoh and F. S. Hiew, “Locality regularization embedding for face verification,” Pattern Recognit., 48(1), 86–102 (2015). https://doi.org/10.1016/j.patcog.2014.07.010
X. Liu and T. Cheng, “Video-based face recognition using adaptive hidden Markov models,” in IEEE Conf. Comput. Vision and Pattern Recognit., 340–345 (2003). https://doi.org/10.1109/CVPR.2003.1211373
M. Kim et al., “Face tracking and recognition with visual constraints in real-world videos,” in IEEE Conf. Comput. Vision and Pattern Recognit., 1787–1794 (2008). https://doi.org/10.1109/CVPR.2008.4587572
O. Yamaguchi, K. Fukui and K. I. Maeda, “Face recognition using temporal image sequence,” in IEEE Int. Conf. Autom. Face and Gesture Recognit., 318–323 (1998). https://doi.org/10.1109/AFGR.1998.670968
H. Hotelling, “Relations between two sets of variates,” Biometrika, 28, 321–377 (1936). https://doi.org/10.1093/biomet/28.3-4.321
K. Fukui and O. Yamaguchi, “Face recognition using multi-viewpoint patterns for robot vision,” in Int. Symp. Rob. Res., 192–201 (2005).
K. Fukui and A. Maki, “Difference subspace and its generalization for subspace-based methods,” IEEE Trans. Pattern Anal. Mach. Intell., 37(11), 2164–2177 (2015). https://doi.org/10.1109/TPAMI.2015.2408358
O. Arandjelovic, “Discriminative extended canonical correlation analysis for pattern set matching,” Mach. Learn., 94(3), 353–370 (2014). https://doi.org/10.1007/s10994-013-5380-5
R. Wang and X. Chen, “Manifold discriminant analysis,” in IEEE Conf. Comput. Vision and Pattern Recognit., 429–436 (2009). https://doi.org/10.1109/CVPR.2009.5206850
T. S. Wang and P. F. Shi, “Kernel Grassmannian distances and discriminant analysis for face recognition from image sets,” Pattern Recognit. Lett., 30(13), 1161–1165 (2009). https://doi.org/10.1016/j.patrec.2009.06.002
O. Tuzel, F. Porikli and P. Meer, “Pedestrian detection via classification on Riemannian manifolds,” IEEE Trans. Pattern Anal. Mach. Intell., 30(10), 1713–1727 (2008). https://doi.org/10.1109/TPAMI.2008.75
Z. H. Huang et al., “Log-Euclidean metric learning on symmetric positive definite manifold with application to image set classification,” in Int. Conf. Mach. Learn., 720–729 (2015).
W. Wang et al., “Discriminant analysis on Riemannian manifold of Gaussian distributions for face recognition with image sets,” IEEE Trans. Image Process., 27, 151–163 (2018). https://doi.org/10.1109/TIP.2017.2746993
V. Arsigny et al., “Geometric means in a novel vector space structure on symmetric positive definite matrices,” SIAM J. Matrix Anal. Appl., 29(1), 328–347 (2007). https://doi.org/10.1137/050637996
B. Schölkopf and A. J. Smola, Learning with Kernels, MIT Press, Cambridge, Massachusetts (2002).
X. He and P. Niyogi, “Locality preserving projections,” in Adv. Neural Inf. Process. Syst., 234–241 (2003).
X. He et al., “Neighborhood preserving embedding,” in IEEE Int. Conf. Comput. Vision (2005). https://doi.org/10.1109/ICCV.2005.167
X. D. Jiang, “Linear subspace learning-based dimensionality reduction,” IEEE Signal Process. Mag., 28(2), 16–26 (2011). https://doi.org/10.1109/MSP.2010.939041
A. S. Georghiades, P. N. Belhumeur and D. J. Kriegman, “From few to many: illumination cone models for face recognition under variable lighting and pose,” IEEE Trans. Pattern Anal. Mach. Intell., 23(6), 643–660 (2001). https://doi.org/10.1109/34.927464
K. Lai et al., “A large-scale hierarchical multi-view RGB-D object dataset,” in IEEE Int. Conf. Rob. and Autom., 1817–1824 (2011). https://doi.org/10.1109/ICRA.2011.5980382
P. Viola and M. J. Jones, “Robust real-time face detection,” Int. J. Comput. Vision, 57(2), 137–154 (2004). https://doi.org/10.1023/B:VISI.0000013087.49260.fb
Biography
Hengliang Tan received his BE degree from Foshan University, Foshan, China, in 2006, and his ME and PhD degrees from Sun Yat-sen University, Guangzhou, China, in 2011 and 2016, respectively. He joined the School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, in 2016. His current research interests include machine learning, pattern recognition, and manifold learning.
Ying Gao received his PhD from the South China University of Technology, Guangzhou, China, in 2002. He is currently a professor at the School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou. His main research interests include intelligent optimization algorithms, pattern recognition, and signal processing.
Jiao Du received her MS and PhD degrees from the Chongqing University of Posts and Telecommunications in 2013 and 2017, respectively. She is currently a lecturer at the School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China. Her research interests include pattern recognition and image processing.
Shuo Yang received his master’s degree in software engineering from Dalian Jiaotong University, China, in 2013. He was awarded a doctorate degree in software engineering by the University of Macau in 2017. He is currently a lecturer at the School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China. His research interests include semantic interoperability and semantic inference with artificial intelligence technology, mainly applied to the fields of e-commerce, e-marketplace, and clinical area.