Learning a subspace for face image clustering via trace ratio criterion
Chenping Hou, Feiping Nie, Changshui Zhang, Yi Wu
Abstract
Face clustering is gaining ever-increasing attention due to its importance in optical image processing. Because traditional clustering methods do not account for the particular characteristics of face images, they are not suitable for face image clustering. We propose a novel approach that employs the trace ratio criterion and exploits the fact that face images are spatially smooth. The graph regularization technique is also applied to constrain nearby images to have similar cluster indicators. We alternately learn the optimal subspace and the clusters. Experimental results demonstrate that the proposed approach performs better than other learning methods for face image clustering.

1. Introduction

Human face clustering is an important research direction in the field of optical image processing and has been successfully used in many fields. For example, in face image retrieval, if we can automatically cluster different kinds of face images, it will be beneficial for discovering the latent similarities among various face images. The computational cost of searching for a face image of interest will also be reduced considerably.

Currently, a great deal of research has been devoted to face image processing. However, most of it focuses on learning a subspace for face classification (i.e., classifying face images in the learned subspace with a certain number of labeled samples). The most prominent approach is the Fisherface.1 It employs the supervised method linear discriminant analysis (LDA)2 for face classification.

There is little work dedicated to learning a subspace for face image clustering. One direct way is to employ traditional clustering techniques (e.g., K-means3). Nevertheless, it is difficult for K-means to achieve satisfactory accuracy because the vector representations of face images have very high dimensionality (commonly, the number of pixels of an image). One of the most famous follow-up methods, named PCA+K-means, employs the unsupervised principal component analysis (PCA)2 technique to compute the famous Eigenfaces1 and then applies K-means for clustering. Although it has achieved prominent performance in many applications, PCA+K-means seems unsuitable for face image clustering because it takes little account of the special characteristics and manifold structures of face images.4 Recently, Ding extended the traditional supervised LDA method for clustering.5 The method, called LDA-Km, combines LDA with K-means and learns the subspace and the clusters alternately. It has been reported that LDA-Km performs better than the K-means and PCA+K-means approaches.5 However, it also gives no consideration to the manifold structure and the spatially smooth character6 of face images. Additionally, LDA-Km applies the traditional ratio trace criterion rather than the more effective trace ratio criterion.7 Consequently, the accuracy of LDA-Km is not as high as expected for face image clustering.

In this paper, we propose a new method, named subspace clustering via trace ratio (SCTR), for face image clustering. First, it applies the trace ratio criterion, which has been shown to be more effective for discriminative subspace learning.7 Second, we employ a spatial smoothness regularizer, which captures a particular characteristic of images. Finally, we consider the manifold structure by adding a graph-smoothness constraint. By computing the optimal clustering and learning the subspace alternately, we derive the subspace and the clustering results simultaneously. Experiments show that SCTR performs better than state-of-the-art approaches.

2. Learning a Subspace for Face Image Clustering

2.1. Problem Formulation

Assume $X=[x_1,x_2,\ldots,x_l]$ are the vector representations of $l$ face images, all of resolution $D_1\times D_2$. Here $x_i\in\mathbb{R}^D$ for $i=1,2,\ldots,l$ and $D=D_1\times D_2$. The number of clusters is predefined as $C$. The purpose of our method is to find the subspace in which these face images can be optimally clustered. More concretely, under a given criterion, we aim to find the transformation matrix $W\in\mathbb{R}^{D\times d}$ and, simultaneously, the indicator matrix $F\in\mathbb{R}^{l\times C}$. Here, $d$ is the dimensionality of the subspace, $F_{ij}=1/\sqrt{l_j}$ if $x_i$ belongs to the $j$th cluster, and $F_{ij}=0$ otherwise, where $l_j$ is the number of samples in the $j$th cluster.

It has been shown7 that the trace ratio criterion is much more effective than the ratio trace criterion, which is widely used in the traditional LDA approach and its variants. We therefore employ the trace ratio criterion in our approach. Mathematically, assume $Y=[y_1,y_2,\ldots,y_l]$ are the embeddings of $X$ (i.e., $Y=W^TX$). The within-class scatter matrix $S_w$ and the total scatter matrix $S_t$ are defined as follows:

Eq. 1

$$S_w=\sum_{j=1}^{C}\sum_{i=1}^{l_j}(y_i-m_j)(y_i-m_j)^T=YL_wY^T,\qquad S_t=\sum_{i=1}^{l}(y_i-m)(y_i-m)^T=YL_tY^T,$$

where $m_j=(1/l_j)\sum_{i=1}^{l_j}y_i$ $(j=1,2,\ldots,C)$ is the mean of the samples in the $j$th cluster and $m=(1/l)\sum_{i=1}^{l}y_i$ is the mean of all samples. The corresponding $L_w$ and $L_t$ are

Eq. 2

$$L_w=I-FF^T,\qquad L_t=I-\frac{1}{l}ee^T,$$
where $e$ is an $l$-dimensional column vector of all ones. The trace ratio criterion proposed in Ref. 7 is
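To make this construction concrete, the following NumPy sketch builds the scaled indicator matrix $F$ from a label vector and then forms $L_w$ and $L_t$ of Eq. 2; the function names (indicator_matrix, lw_lt) are ours, not the paper's. Note that the scaling $1/\sqrt{l_j}$ is exactly what makes the constraint $F^TF=I$ hold.

```python
import numpy as np

def indicator_matrix(labels, C):
    """Scaled cluster indicator: F[i, j] = 1/sqrt(l_j) if sample i is in
    cluster j, else 0, so that F.T @ F = I."""
    l = len(labels)
    F = np.zeros((l, C))
    for j in range(C):
        members = np.where(labels == j)[0]
        F[members, j] = 1.0 / np.sqrt(len(members))
    return F

def lw_lt(F):
    """L_w = I - F F^T and L_t = I - (1/l) e e^T, as in Eq. 2."""
    l = F.shape[0]
    Lw = np.eye(l) - F @ F.T
    Lt = np.eye(l) - np.ones((l, l)) / l
    return Lw, Lt

# Example: six samples in C = 2 clusters.
labels = np.array([0, 0, 1, 1, 1, 0])
F = indicator_matrix(labels, 2)
assert np.allclose(F.T @ F, np.eye(2))  # the constraint F^T F = I holds
```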

Eq. 3

$$\arg\min_{W^TW=I}\ \frac{\mathrm{tr}(W^TS_wW)}{\mathrm{tr}(W^TS_tW)}.$$

We now consider the spatial character of images. Because an image represented in the plane is intrinsically a matrix, pixels that are spatially close to each other may be correlated. Although there are $D_1\times D_2$ pixels per image, this spatial correlation suggests that the true number of degrees of freedom is far smaller. We employ a spatial regularizer (i.e., the Laplacian penalty) to constrain the coefficients to be spatially smooth.6 In brief, we define the $D_j\times D_j$ $(j=1,2)$ matrices that yield the discrete approximation of the second-order derivative as follows:

Eq. 4

$$M_j=\frac{1}{h_j^2}\begin{pmatrix}1&-1&&&\\-1&2&-1&&\\&\ddots&\ddots&\ddots&\\&&-1&2&-1\\&&&-1&1\end{pmatrix}.$$

Here, $h_j=1/D_j$ for $j=1,2$. The discrete approximation of the two-dimensional Laplacian is the $D\times D$ matrix

Eq. 5

$$\Delta=M_1\otimes I_2+I_1\otimes M_2,$$
where $I_j$ is the $D_j\times D_j$ identity matrix for $j=1,2$, and $\otimes$ denotes the Kronecker product of two matrices.

For a vectorized $D_1\times D_2$ image $x$, $\|\Delta x\|^2$ is proportional to the sum of squared differences between nearby grid points in that image, so it can be used to measure the smoothness of the image on a $D_1\times D_2$ lattice. Because each column vector of $W$ can be regarded as a basis image, we combine $S_w$ with the penalty matrix $\Delta^T\Delta$, weighting them by $1-\alpha$ and $\alpha$, respectively, where $0<\alpha<1$ is a balance parameter that controls the smoothness of the estimator.
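As an illustration, here is a minimal NumPy sketch of Eqs. 4, 5; the helper names are ours, and the first and last rows of the matrix follow the boundary pattern displayed in Eq. 4.

```python
import numpy as np

def second_difference(D):
    """The D x D matrix M_j of Eq. 4, with h = 1/D."""
    M = 2.0 * np.eye(D) - np.eye(D, k=1) - np.eye(D, k=-1)
    M[0, 0] = M[-1, -1] = 1.0  # boundary rows (1, -1, ...) and (..., -1, 1)
    return M * D**2            # 1/h^2 = D^2

def laplacian_penalty(D1, D2):
    """Two-dimensional discrete Laplacian of Eq. 5: M1 (x) I2 + I1 (x) M2."""
    M1, M2 = second_difference(D1), second_difference(D2)
    return np.kron(M1, np.eye(D2)) + np.kron(np.eye(D1), M2)

Delta = laplacian_penalty(16, 16)          # for 16 x 16 images, D = 256
x = np.random.rand(256)                    # a vectorized image
smoothness = np.linalg.norm(Delta @ x)**2  # ||Delta x||^2: spatial roughness
```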

Finally, as in most learning-based face image processing approaches,1 we assume that face images belonging to the same cluster lie approximately on a low-dimensional manifold. The graph Laplacian4 $L$ is employed to represent this character. Intuitively, the cluster indicators of face images belonging to the same cluster should be identical. In other words, if we regard the cluster indicators as the output of a function defined on all face images, this intuition indicates that the function should be smooth on these manifolds. Mathematically, this is equivalent to minimizing $\mathrm{tr}(F^TLF)$.
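The paper does not detail how the graph underlying $L$ is built. A common choice, sketched below under that assumption, is a symmetrized k-nearest-neighbor graph with binary weights, whose unnormalized Laplacian is the degree matrix minus the adjacency matrix.

```python
import numpy as np

def knn_graph_laplacian(X, k=5):
    """Unnormalized Laplacian L = D_g - A of a symmetrized k-NN graph.
    X holds one sample per column (D x l), as in the paper."""
    Xt = X.T                                            # one sample per row
    dist = np.linalg.norm(Xt[:, None, :] - Xt[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                      # exclude self-matches
    A = np.zeros_like(dist)
    for i, nbrs in enumerate(np.argsort(dist, axis=1)[:, :k]):
        A[i, nbrs] = 1.0
    A = np.maximum(A, A.T)                              # symmetrize the graph
    return np.diag(A.sum(axis=1)) - A
```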

In summary, we define SCTR as the solution to the following problem:

Eq. 6

$$\arg\min_{W^TW=I,\,F^TF=I}\ (1-\beta)\,\frac{\mathrm{tr}\!\left[(1-\alpha)W^TS_wW+\alpha W^T\Delta^T\Delta W\right]}{\mathrm{tr}(W^TS_tW)}+\beta\,\mathrm{tr}(F^TLF)$$
$$=\arg\min_{W^TW=I,\,F^TF=I}\ (1-\beta)\,\frac{\mathrm{tr}\!\left\{W^T\left[(1-\alpha)X(I-FF^T)X^T+\alpha\Delta^T\Delta\right]W\right\}}{\mathrm{tr}(W^TXL_tX^TW)}+\beta\,\mathrm{tr}(F^TLF),$$
where $0<\beta<1$ is also a balance parameter.

2.2. Solution

There are two kinds of variables to be optimized (i.e., $W$ and $F$). It is difficult to compute them simultaneously, so we optimize them alternately.

2.2.1. Stage one: fixing $F$ and optimizing $W$

In this situation, the optimization problem in Eq. 6 becomes

Eq. 7

$$\arg\min_{W^TW=I}\ \frac{\mathrm{tr}\!\left\{W^T\left[(1-\alpha)X(I-FF^T)X^T+\alpha\Delta^T\Delta\right]W\right\}}{\mathrm{tr}(W^TXL_tX^TW)}.$$

This has the same form as the problem in Ref. 7, except for a slight difference in the numerator, so we can directly employ the same iterative technique. For our problem, in the $p$th step, we solve the trace difference problem

$$\arg\min_{W^TW=I}\ \mathrm{tr}\!\left\{W^T\left[(1-\alpha)X(I-FF^T)X^T+\alpha\Delta^T\Delta-\lambda_pXL_tX^T\right]W\right\},$$
where $\lambda_p$ is the trace ratio value calculated from the projection matrix $W_{p-1}$ of the previous step. Please see Ref. 7 for more details.
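For concreteness, a minimal sketch of this iteration follows, written for the generic problem of minimizing $\mathrm{tr}(W^TAW)/\mathrm{tr}(W^TBW)$ over $W^TW=I$; in stage one, $A=(1-\alpha)X(I-FF^T)X^T+\alpha\Delta^T\Delta$ and $B=XL_tX^T$. The function name update_W is ours.

```python
import numpy as np

def update_W(A, B, d, n_iter=20):
    """Trace ratio iteration in the style of Ref. 7: minimize
    tr(W^T A W) / tr(W^T B W) subject to W^T W = I."""
    W = np.linalg.qr(np.random.randn(A.shape[0], d))[0]  # orthonormal start
    for _ in range(n_iter):
        lam = np.trace(W.T @ A @ W) / np.trace(W.T @ B @ W)
        # Trace difference step: the d eigenvectors of (A - lam B) with the
        # smallest eigenvalues minimize tr[W^T (A - lam B) W].
        _, vecs = np.linalg.eigh(A - lam * B)
        W = vecs[:, :d]
    return W
```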

2.2.2. Stage two: fixing $W$ and optimizing $F$

In this case, because $L_t$ and $\Delta$ do not depend on $F$, the optimization problem in Eq. 6 is equivalent to

Eq. 8

$$\arg\max_{F^TF=I}\ \mathrm{tr}\!\left\{F^T\left[(1-\beta)(1-\alpha)X^TWW^TX-\beta\,\mathrm{tr}(W^TXL_tX^TW)L\right]F\right\}.$$

This problem can be solved effectively by the spectral decomposition technique. In fact, the optimal $F$ for the problem in Eq. 8 is formed by the eigenvectors corresponding to the $C$ largest eigenvalues of $(1-\beta)(1-\alpha)X^TWW^TX-\beta\,\mathrm{tr}(W^TXL_tX^TW)L$.
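A sketch of this stage, under the same notation, might read as follows; the function name update_F is ours, and the returned matrix is the relaxed (continuous) solution whose discretization is discussed in Sec. 2.3.

```python
import numpy as np

def update_F(X, W, L, Lt, alpha, beta, C):
    """Stage two of SCTR: the relaxed F maximizing Eq. 8 is formed by the
    eigenvectors of the C largest eigenvalues of
    (1-beta)(1-alpha) X^T W W^T X - beta tr(W^T X L_t X^T W) L."""
    XW = X.T @ W                                      # l x d
    scale = np.trace(W.T @ X @ Lt @ X.T @ W)
    M = (1 - beta) * (1 - alpha) * (XW @ XW.T) - beta * scale * L
    _, vecs = np.linalg.eigh(M)                       # ascending eigenvalues
    return vecs[:, -C:]                               # keep the C largest
```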

2.3. Discussion

There are two essential parameters in our method (i.e., $0<\alpha<1$ and $0<\beta<1$). They control the smoothness of the estimator and the balance between the two objectives. When $\alpha=0$, SCTR totally ignores the spatial character of the face image; when $\alpha\to 1$, SCTR tends to choose only the spatially smoothest basis and ignores the discriminative information. When $\beta=0$, we ignore the smoothness constraint on $F$; if $\beta=1$, SCTR is concerned only with the smoothness of $F$.

Parameter determination is an essential task in most learning algorithms.6 Among various methods, grid search is probably the simplest and most widely used for unsupervised learning. Because we have constrained $0<\alpha<1$ and $0<\beta<1$, we apply the grid search technique in this paper. More concretely, we set $\alpha$ and $\beta$ by searching the grid $\{0.1,0.2,\ldots,0.9\}\times\{0.1,0.2,\ldots,0.9\}$.
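A generic driver for this search might look as follows. This is a sketch, not the authors' code: it assumes a clustering routine cluster_fn(X, C, alpha, beta) returning predicted labels and a scoring routine score_fn (e.g., Acc against ground-truth labels, as in Sec. 3).

```python
import itertools

def grid_search(X, C, truth, cluster_fn, score_fn):
    """Search alpha, beta over {0.1, 0.2, ..., 0.9}^2 and return the
    pair with the best score."""
    grid = [round(0.1 * t, 1) for t in range(1, 10)]
    scores = {(a, b): score_fn(cluster_fn(X, C, a, b), truth)
              for a, b in itertools.product(grid, grid)}
    return max(scores, key=scores.get)
```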

Three other points should be noted here, and a sketch tying them together follows this paragraph. (i) Because SCTR is an iterative approach, a direct way to initialize the iteration is to first compute $F$ by K-means and then optimize the problems in Eqs. 7, 8 alternately. (ii) The optimal $F$ that maximizes Eq. 8 may not have the form of the indicator matrix specified earlier; we apply the discretization procedure of Ref. 8 to solve this problem. (iii) Once $W$ and $F$ have been derived, it is easy to compute the embedding and the cluster of a new image by using them directly.
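Putting the pieces together, one plausible realization of the whole alternation is sketched below. It reuses the helper functions sketched earlier (indicator_matrix, lw_lt, laplacian_penalty, knn_graph_laplacian, update_W, update_F) and, for brevity, replaces the discretization procedure of Ref. 8 with K-means on the rows of the relaxed $F$; this is our simplification, not the authors' exact step.

```python
import numpy as np
from sklearn.cluster import KMeans

def sctr(X, C, alpha, beta, D1, D2, d, n_outer=10):
    """Alternating SCTR sketch. X is D x l with D = D1 * D2."""
    l = X.shape[1]
    labels = KMeans(n_clusters=C, n_init=10).fit_predict(X.T)  # point (i)
    F = indicator_matrix(labels, C)
    Delta = laplacian_penalty(D1, D2)
    L = knn_graph_laplacian(X)
    _, Lt = lw_lt(F)                                  # L_t is independent of F
    for _ in range(n_outer):
        # Stage one (Eq. 7): fix F, learn the projection W.
        A = (1 - alpha) * (X @ (np.eye(l) - F @ F.T) @ X.T) \
            + alpha * (Delta.T @ Delta)
        W = update_W(A, X @ Lt @ X.T, d)
        # Stage two (Eq. 8): fix W, update the relaxed indicator F.
        F_relax = update_F(X, W, L, Lt, alpha, beta, C)
        labels = KMeans(n_clusters=C, n_init=10).fit_predict(F_relax)
        F = indicator_matrix(labels, C)
    return labels, W
```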

3. Experiments

We employ four face image data sets for illustration: the Yale data, the ORL data, the Umist data, and the first 20 classes of the Feret data. All images are rescaled to $16\times 16$ resolution.

We compare SCTR with K-means, PCA+K-means, and LDA-Km. Two standard measurements are used: accuracy (Acc) and normalized mutual information (NMI). The parameters $\alpha$ and $\beta$ are determined by grid search, and $d=C-1$. We randomly initialize K-means and repeat the experiment 100 times. The mean Acc values are listed in Table 1, and Fig. 1 shows the corresponding mean NMI values.

Fig. 1 Mean normalized mutual information (NMI) on the Yale, ORL, Umist, and Feret data sets.

Table 1 Mean accuracy (Acc) results of different methods on four face image data sets.

Data set   K-means   PCA+K-means   LDA-Km   SCTR
Yale       0.379     0.385         0.459    0.491
ORL        0.497     0.513         0.607    0.674
Umist      0.381     0.383         0.468    0.501
Feret      0.374     0.372         0.526    0.551

As seen from Table 1 and Fig. 1, SCTR performs the best: it achieves both the highest mean Acc and the largest mean NMI in all cases. Because all images are rescaled to low resolution and the dimensionality is not very high, K-means and PCA+K-means show similar performance on these data sets. LDA-Km performs better than both K-means and PCA+K-means.

We have also conducted experiments on the Yale data set with different parameters, set by grid search. The Acc results for different $\alpha$ and $\beta$ are shown in Fig. 2. It appears that, for Yale, smaller $\alpha$ and $\beta$ are more suitable; the best parameters are $\alpha=0.3$ and $\beta=0.1$.

Fig. 2 Acc versus $\alpha$ and $\beta$ on Yale by grid search.

4. Conclusions

We have proposed a new subspace learning method for face image clustering. It integrates the spatial characteristics and the manifold structure of face images. By alternately computing the subspace and the clusters, it performs the best among all compared approaches.

Acknowledgments

This work is partly supported by NSF China (Grants No. 60673090 and No. 60721003), the Hunan Provincial Innovation Foundation for Postgraduates, and the Graduate School of the National University of Defense Technology.

References

1. P. Belhumeur, J. Hespanha, and D. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 711–720 (1997). https://doi.org/10.1109/34.598228

2. R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, Hoboken, NJ (2000).

3. L. Parsons, E. Haque, and H. Liu, "Subspace clustering for high dimensional data: a review," SIGKDD Explorations 6, 90–105 (2004). https://doi.org/10.1145/1007730.1007731

4. M. Belkin, P. Niyogi, and V. Sindhwani, "Manifold regularization: a geometric framework for learning from labeled and unlabeled examples," J. Mach. Learn. Res. 7, 2399–2434 (2006).

5. C. Ding and T. Li, "Adaptive dimension reduction using discriminant analysis and K-means clustering," Proc. 24th Int. Conf. on Machine Learning, 521–528 (2007). https://doi.org/10.1145/1273496.1273562

6. T. Hastie, A. Buja, and R. Tibshirani, "Penalized discriminant analysis," Ann. Stat. 23, 73–102 (1995). https://doi.org/10.1214/aos/1176324456

7. H. Wang, S. Yan, D. Xu, X. Tang, and T. Huang, "Trace ratio vs. ratio trace for dimensionality reduction," Proc. IEEE Conf. on Computer Vision and Pattern Recognition (2007).

8. S. Yu and J. Shi, "Multiclass spectral clustering," Proc. IEEE Int. Conf. on Computer Vision, 313–319 (2003).
© 2009 Society of Photo-Optical Instrumentation Engineers (SPIE)

Chenping Hou, Feiping Nie, Changshui Zhang, and Yi Wu, "Learning a subspace for face image clustering via trace ratio criterion," Optical Engineering 48(6), 060501 (1 June 2009). https://doi.org/10.1117/1.3149850