Proceedings of 2010 IEEE 17th International Conference on Image Processing
September 26-29, 2010, Hong Kong
NPDA/CS: IMPROVED NON-PARAMETRIC DISCRIMINANT ANALYSIS WITH CS DECOMPOSITION AND ITS APPLICATION TO FACE RECOGNITION

Qingsong Zeng, Changdong Wang
School of Information Science and Technology
Sun Yat-sen University, Guangzhou, 510006, China
[email protected], [email protected]

(This project was partially supported by the NSF-Guangdong (U083500).)

ABSTRACT

Fisher's Linear Discriminant Analysis (FLDA) uses a parametric form of the scatter matrices based on the Gaussian distribution assumption, and it requires the scatter matrices to be nonsingular, a condition that cannot always be satisfied. To overcome this problem, many scholars have recently proposed Non-parametric Discriminant Analysis (NPDA), which addresses the non-Gaussian aspects of sample distributions. In this paper, a new formulation of the scatter matrices, derived from a nearest-neighborhood perspective, is presented to improve NPDA by simultaneously emphasizing the boundary information and the local structure contained in the training set. CS decomposition is then incorporated to further improve its performance. Experimental results on 4 databases demonstrate the effectiveness of the improved method.

Index Terms— LDA, boundary information, local structure, non-parametric discriminant analysis, CS decomposition

1. INTRODUCTION

Recently, discriminant feature extraction has received great attention in many pattern recognition applications, such as face recognition and action recognition. Many linear feature extraction methods have been proposed; among them, Fisher's Linear Discriminant Analysis (FLDA) is the most popular. It uses a parametric form of the scatter matrices based on the Gaussian distribution assumption, so it can also be regarded as a Parametric Discriminant Analysis (PDA). Moreover, FLDA requires the within-class scatter matrix S_w to be nonsingular. Unfortunately, this condition cannot always be satisfied: in many small sample size problem (SSSP) applications, all scatter matrices may be singular. So far, at least three main approaches have been proposed to overcome this problem. The first applies PCA to reduce the dimension of the original data before classical LDA is performed [1]. The second, Regularized LDA [2], incorporates a regularization mechanism to
deal with the singularity of S_w. The last is Uncorrelated LDA (ULDA) [3], which extracts feature vectors with uncorrelated attributes; uncorrelated features are desirable in many applications, since they contain minimum redundancy.

In deriving the FLDA formulation, there is an assumption that the empirical class mean equals its expectation. However, this assumption may not hold in practice. FLDA draws data of the same class close to their corresponding class means, but since the number of samples per class is always limited, the estimates of the class means are not accurate, which degrades the effectiveness of the Fisher criterion. Hence, Fukunaga [4] presented Non-parametric Discriminant Analysis (NPDA) to overcome this problem by introducing a new definition of the between-class scatter matrix that explicitly emphasizes the samples near the boundary. Bressan et al. [5] and Zhifeng Li et al. [6] then improved NPDA, proposing a new formulation of the scatter matrices that extends the two-class NPDA to multi-class cases. Building on their research and observing NPDA from a nearest-neighborhood perspective, we introduce a modification of the original algorithm called Non-parametric Discriminant Analysis with Cosine-sine Decomposition (NPDA/CS). In this paper, we first propose a new formulation of the between-class, within-class, and total scatter matrices that emphasizes the boundary information and local structure contained in the training set, in which the three half scatter matrices take non-parametric form. We then investigate the idea of simultaneously diagonalizing the matrices S_b and S_w by the Cosine-sine decomposition (CSD) [7, 8], so as to handle the problems from which LDA suffers.

The rest of this paper is organized as follows. Section 2 introduces the related work. The improved NPDA/CS algorithm for dimensionality reduction is described in Section 3. Experimental results are presented in Section 4, and Section 5 draws the conclusions of the work.

A few words about our notation: lower-case letters such as i, j, k, l, and c represent numbers or indices; capital letters such as A, G, and X represent matrices; lower-case bold letters such as x, y represent vectors (samples); script letters such as C, X represent sets.
2. RELATED WORK

2.1. Fisher's LDA

Consider the problem of training a classifier with c classes. Suppose the data space is a compact vector space of dimension d, and a training set X = {x_i : i = 1, ..., n} consists of n samples, with each point x_i already assigned to some class, say x_i ∈ C_k. It can then be further denoted x^k_j, meaning that it is the j-th sample of class C_k. Let n_k denote the number of samples in class C_k, that is, n_k = |C_k|. In FLDA, the between-class, within-class, and total scatter matrices are defined respectively as

S_b = \frac{1}{n} \sum_{k=1}^{c} n_k (\mu_k - \mu)(\mu_k - \mu)^T, \quad
S_w = \frac{1}{n} \sum_{k=1}^{c} \sum_{i=1}^{n_k} (x^k_i - \mu_k)(x^k_i - \mu_k)^T, \quad
S_t = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^T,

where \mu_k and \mu denote the mean of class C_k and of all samples, respectively. The goal of FLDA is to compute the optimal transformation matrix G that finds the most discriminative features by maximizing the ratio of the between-class scatter to the within-class scatter:

G = \arg\max_G \operatorname{tr}\big( (G^T S_w G)^{-1} (G^T S_b G) \big).   (1)

The optimal transformation can be readily computed by finding all the eigenvectors w that satisfy

S_b w = \lambda S_w w, \quad \lambda \neq 0.   (2)
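To make the criterion concrete, here is a minimal NumPy/SciPy sketch of FLDA via the generalized eigenproblem of Eq. (2). The function name flda, the row-major sample layout, and the small ridge added to keep S_w positive definite are our own illustrative choices, not part of the original formulation.

```python
import numpy as np
from scipy.linalg import eigh

def flda(X, y, r):
    # X: (n, d) array, rows are samples; y: (n,) integer labels; r: target dim.
    n, d = X.shape
    mu = X.mean(axis=0)
    Sb, Sw = np.zeros((d, d)), np.zeros((d, d))
    for k in np.unique(y):
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        Sb += len(Xk) * np.outer(mk - mu, mk - mu)   # n_k (mu_k-mu)(mu_k-mu)^T
        Sw += (Xk - mk).T @ (Xk - mk)                # sum (x-mu_k)(x-mu_k)^T
    Sb, Sw = Sb / n, Sw / n
    # Generalized eigenproblem S_b w = lambda S_w w of Eq. (2); the small
    # ridge keeps S_w positive definite, which eigh requires (our choice).
    vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(d))
    return vecs[:, np.argsort(vals)[::-1][:r]]       # top-r discriminant axes
```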
2.2. Alternative Expression of Scatter Matrices

It should be pointed out that the FLDA scatter matrices described above admit an alternative expression. First stack the samples of the data set X into a partitioned matrix according to their class labels, that is, X = [X_1, ..., X_c] with X_k ∈ R^{d×n_k} being the data matrix consisting of all samples from C_k. The half between-class scatter matrix H_b, half within-class scatter matrix H_w, and half total scatter matrix H_t are then defined respectively as

H_w = \frac{1}{\sqrt{n}} [\tilde{X}_1, \dots, \tilde{X}_c], \quad
H_b = \frac{1}{\sqrt{n}} [\sqrt{n_1}(\mu_1 - \mu), \dots, \sqrt{n_c}(\mu_c - \mu)], \quad
H_t = \frac{1}{\sqrt{n}} (X - \mu e^T),

where \tilde{X}_k = X_k - \mu_k e_k^T, e_k = [1, \dots, 1]^T ∈ R^{n_k}, and e = [1, \dots, 1]^T ∈ R^n. Thus the three scatter matrices can be expressed as S_b = H_b H_b^T, S_w = H_w H_w^T, and S_t = H_t H_t^T.
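As a quick illustration, the following NumPy sketch (our own construction; the helper name half_scatter and the column-major data layout are assumptions) builds H_b, H_w, and H_t from a labeled data matrix, so that S_b = H_b H_b^T and its siblings can be checked numerically.

```python
import numpy as np

def half_scatter(X, y):
    # X: (d, n) partitioned data matrix, columns are samples; y: (n,) labels.
    n = X.shape[1]
    mu = X.mean(axis=1, keepdims=True)
    Hb_cols, Hw_blocks = [], []
    for k in np.unique(y):
        Xk = X[:, y == k]
        mu_k = Xk.mean(axis=1, keepdims=True)
        Hb_cols.append(np.sqrt(Xk.shape[1]) * (mu_k - mu))  # sqrt(n_k)(mu_k - mu)
        Hw_blocks.append(Xk - mu_k)                         # X_k - mu_k e_k^T
    Hb = np.hstack(Hb_cols) / np.sqrt(n)
    Hw = np.hstack(Hw_blocks) / np.sqrt(n)
    Ht = (X - mu) / np.sqrt(n)                              # (X - mu e^T)/sqrt(n)
    return Hb, Hw, Ht  # S_b = Hb @ Hb.T, S_w = Hw @ Hw.T, S_t = Ht @ Ht.T
```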
2.3. Non-parametric Discriminant Analysis

Non-parametric Discriminant Analysis (NPDA) addresses the non-Gaussian aspects of sample distributions by introducing a non-parametric between-class scatter matrix that measures between-class scatter on a local basis, in the neighborhood of the decision boundary. Fukunaga [4] presented the two-class NPDA, whose new definition of the between-class scatter matrix explicitly emphasizes the samples near the boundary, and Zhifeng Li et al. [6] extended the two-class NPDA to multi-class cases by proposing a new formulation of the scatter matrices.

Let N_p(x^k_i, l) denote the subset consisting of the p nearest neighbors of x^k_i from class C_l. We define the local nearest neighbor mean of x^k_i with respect to class C_l as

\mu(x^k_i, l, p) = \sum_{x^l_j \in N_p(x^k_i, l)} \beta(x^k_i, x^l_j) \, x^l_j,

where \beta(x^k_i, x^l_j) is a weight function between x^k_i and x^l_j satisfying \sum_{x^l_j \in N_p(x^k_i, l)} \beta(x^k_i, x^l_j) = 1.

The non-parametric between-class scatter matrix for the multi-class problem is defined as follows [6]:

S_b^N = \sum_{k=1}^{c-1} \sum_{l=k+1}^{c} \sum_{i=1}^{n_k} w(x^k_i, l, p) \, (\mu(x^k_i, k, p) - \mu(x^k_i, l, p)) (\mu(x^k_i, k, p) - \mu(x^k_i, l, p))^T,   (3)

where w(x^k_i, l, p) is a weighting function defined as

w(x^k_i, l, p) = \frac{\min\{ d^\alpha(x^k_i, \mu(x^k_i, k, p)), \, d^\alpha(x^k_i, \mu(x^k_i, l, p)) \}}{ d^\alpha(x^k_i, \mu(x^k_i, k, p)) + d^\alpha(x^k_i, \mu(x^k_i, l, p)) },   (4)

with \alpha \in (0, \infty) controlling the changing speed of the weight with respect to the distance ratio, and d(u, v) being the Euclidean distance between u and v. For samples near the classification boundary the weight approaches 0.5; for samples far away from the classification boundary it drops off to zero.
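The two non-parametric ingredients above are straightforward to compute. The sketch below (hypothetical helper names; it uses the uniform choice β = 1/p, one valid weighting that sums to 1) evaluates the local nearest neighbor mean and the weight of Eq. (4).

```python
import numpy as np

def local_nn_mean(x, Xl, p):
    # mu(x, l, p): mean of the p nearest neighbors of x among the rows of Xl,
    # using the uniform choice beta = 1/p (one valid weighting summing to 1).
    dist = np.linalg.norm(Xl - x, axis=1)
    return Xl[np.argsort(dist)[:p]].mean(axis=0)

def weight(x, mu_own, mu_other, alpha=1.0):
    # w(x, l, p) of Eq. (4): about 0.5 near the boundary, near 0 far from it.
    da = np.linalg.norm(x - mu_own) ** alpha
    db = np.linalg.norm(x - mu_other) ** alpha
    return min(da, db) / (da + db)
```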
3. NON-PARAMETRIC DISCRIMINANT ANALYSIS WITH COSINE-SINE DECOMPOSITION

In this section, a new formulation of the scatter matrices, derived from the nearest-neighborhood perspective, is presented to improve NPDA by simultaneously emphasizing the boundary information and local structure contained in the training set. CS decomposition is then incorporated to simultaneously diagonalize the matrices S_b and S_w, so as to handle the problems from which LDA suffers.

3.1. Construction of the scatter matrices

The main idea of the proposed scatter matrices is that if only vectors near the classification boundary are selected, the resulting scatter matrix specifies the subspace in which the boundary information is embedded; samples far away from the boundary may otherwise exert a considerable influence on the scatter matrix and distort the boundary structure information. Let N_p(x^k_i) denote the subset consisting of the p nearest neighbors of x^k_i from any class. We define the local nearest neighbor mean of x^k_i as

\mu(x^k_i, p) = \sum_{x \in N_p(x^k_i)} \beta(x^k_i, x) \, x.   (5)
Fig. 1. Non-parametric between-class scatter matrix. v: the local nearest neighbor mean connection vector between \mu(x^1_i, 1, p) and \mu(x^1_i, 2, p).

If we define two partitioned matrices A = [A_1, ..., A_c] and B = [B_1, ..., B_c] with A_k(:, i) = x^k_i - \mu(x^k_i, k, p) and B_k(:, i) = x^k_i - \mu(x^k_i, p), then the generalized non-parametric half between-class, half within-class, and half total scatter matrices are defined as

H_b(:, k) = \frac{n_k}{n} \sum_{i=1}^{n_k} \sum_{l=1}^{c} w(x^k_i, l, p) \, (\mu(x^k_i, k, p) - \mu(x^k_i, l, p)),   (6)

H_w = \frac{1}{\sqrt{n}} [A_1, \dots, A_c],   (7)

H_t = \frac{1}{\sqrt{n}} [B_1, \dots, B_c],   (8)

where w(x^k_i, l, p) is defined in Eq. (4).
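Putting Eqs. (5)-(8) together, a rough sketch of the construction might look as follows. It reuses the local_nn_mean and weight helpers above; the n_k/n scaling follows our reading of Eq. (6), and the neighbor sets here include the query point itself, which the paper leaves unspecified.

```python
import numpy as np  # reuses local_nn_mean and weight from the sketch above

def nonparametric_half_matrices(X, y, p, alpha=1.0):
    # X: (n, d) array, rows are samples; y: (n,) labels.
    classes = np.unique(y)
    n, d = X.shape
    Hb = np.zeros((d, len(classes)))
    A_cols, B_cols = [], []
    for j, k in enumerate(classes):
        Xk = X[y == k]
        col = np.zeros(d)
        for xi in Xk:
            mu_own = local_nn_mean(xi, Xk, p)
            A_cols.append(xi - mu_own)                   # A_k(:, i)
            B_cols.append(xi - local_nn_mean(xi, X, p))  # B_k(:, i), any class
            for l in classes:
                if l != k:
                    mu_other = local_nn_mean(xi, X[y == l], p)
                    col += weight(xi, mu_own, mu_other, alpha) * (mu_own - mu_other)
        Hb[:, j] = (len(Xk) / n) * col                   # Eq. (6), n_k/n scaling
    Hw = np.column_stack(A_cols) / np.sqrt(n)            # Eq. (7)
    Ht = np.column_stack(B_cols) / np.sqrt(n)            # Eq. (8)
    return Hb, Hw, Ht
```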
As illustrated in Fig. 1, the new design has two advantages. First, the non-parametric between-class scatter matrix spans a subspace in which the local structure is embedded. Second, the new weighting function helps emphasize the samples near the boundary of two classes and can thus capture the boundary structure information more effectively.

3.2. Non-parametric Discriminant Analysis with Cosine-sine Decomposition

With the help of the generalized non-parametric half scatter matrices computed above, we can derive the non-parametric discriminant analysis algorithm summarized in Algorithm 1. The improved NPDA/CS algorithm is essentially an application of the CS decomposition to the half scatter matrices.
Algorithm 1 Non-parametric discriminant analysis with CSD
Input: partitioned data matrix X = [X_1, ..., X_c].
Output: transformation matrix G.
1. Construct H_b, H_w, and H_t.
2. Compute the SVD of H_t, that is, H_t = U_1 D_1 V_1^T; let U_pca = U_1(:, 1 : rank(H_t)) and save U_pca.
3. Project H_b and H_w onto the PCA subspace: H_b := U_pca^T H_b, H_w := U_pca^T H_w.
4. Let F = [H_b^T; H_w^T] and apply the QL decomposition F = QL; save Q and L.
5. Apply the CS decomposition to Q to obtain the matrix W.
6. Let Y = L^T W, and compute the orthogonal matrix Φ by the QL decomposition of Y, that is, Y = ΦL_1; we then get Z = Φ(L_1^T)^{-1}.
7. Let q = rank(H_b).
8. Let G* = [Z_1, Z_2, ..., Z_q], where Z_i is the i-th column of Z.
9. Output the transformation matrix G = U_pca G*.
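Below is a minimal sketch of the numerical core of Algorithm 1 under NumPy conventions: the ql helper obtains a QL factorization from a QR factorization of the row- and column-reversed matrix, and the function covers steps 2-4 of the listing. The rank tolerance is our choice; for step 5, SciPy's scipy.linalg.cossin can be applied when the orthogonal factor is square.

```python
import numpy as np

def ql(F):
    # QL factorization F = Q @ L (L lower triangular), obtained from the QR
    # factorization of the row- and column-reversed matrix.
    Qr, Rr = np.linalg.qr(F[::-1, ::-1])
    return Qr[::-1, ::-1], Rr[::-1, ::-1]

def npda_cs_projection(Hb, Hw, Ht, tol=1e-10):
    # Step 2: SVD of Ht; Upca spans the range of Ht (tol is our rank cutoff).
    U1, s, _ = np.linalg.svd(Ht, full_matrices=False)
    Upca = U1[:, : int(np.sum(s > tol))]
    # Step 3: project the half scatter matrices onto the PCA subspace.
    Hb_p, Hw_p = Upca.T @ Hb, Upca.T @ Hw
    # Step 4: stack F = [Hb^T; Hw^T] and apply the QL decomposition.
    Q, L = ql(np.vstack([Hb_p.T, Hw_p.T]))
    # Step 5 would apply the CS decomposition to the partitioned Q;
    # scipy.linalg.cossin covers the square orthogonal case.
    return Upca, Q, L
```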
4. EXPERIMENTAL RESULTS

In this section, experimental results are presented that compare the improved NPDA method with 3 algorithms over 4 databases. The three compared methods are PCA+LDA [1], OLDA [9], and ULDA [10]. The 4 databases used are ORL, Yale, YaleB, and CMU PIE. The experimental results demonstrate the effectiveness of our improved method.

4.1. Methodology

In all experiments, the original images were normalized (in scale and orientation) by fixing the locations of the two eyes, and the faces were then cropped to produce the final images for matching. Each cropped image was 32 × 32 pixels with 256 gray levels per pixel, so each image can be represented by a 1024-dimensional vector in image space. A simple k-nearest neighbor (k-NN) classifier with k = 1 was used for all algorithms, and the number of nearest neighbors used to construct the half scatter matrices in NPDA was 5. On the ORL and Yale databases, 7 face images per person were used for training and the remaining 3 for testing. On YaleB, 10 face images per person were used for training and the remainder for testing. Since the PIE database contains more face images than the others, we conducted two experiments on it, namely PIE1 and PIE2: in PIE1, 20 face images per person were used for training, while in PIE2 the number of training images was 60.
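For reference, the recognition-rate evaluation in this protocol reduces to a 1-NN classifier on the projected features. A minimal sketch using scikit-learn follows (an assumption on our part; any 1-NN implementation would do, and the function name evaluate is hypothetical):

```python
from sklearn.neighbors import KNeighborsClassifier

def evaluate(G, X_train, y_train, X_test, y_test):
    # Project the 1024-dimensional image vectors with the learned
    # transformation G (d x q), then classify with 1-NN as in the protocol.
    clf = KNeighborsClassifier(n_neighbors=1)
    clf.fit(X_train @ G, y_train)
    return clf.score(X_test @ G, y_test)  # recognition rate on the test set
```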
4.2. Results

All experiments were repeated 20 times. The means and standard deviations of the 4 algorithms on all databases are shown in Table 1. The improved NPDA/CS algorithm clearly achieves the best performance among the compared algorithms on all databases. This may be due to the advantages of NPDA/CS in emphasizing the boundary information and local structure contained in the training set.
Table 1. The means and standard deviations of the 4 algorithms over 20 runs on all databases.

Methods      | Orl          | Yale         | YaleB        | PIE1         | PIE2
PCA+LDA [1]  | 95.67 ± 1.79 | 80.25 ± 4.72 | 73.34 ± 6.73 | 78.47 ± 0.66 | 94.53 ± 0.23
OLDA [9]     | 97.50 ± 1.60 | 82.50 ± 4.17 | 79.02 ± 1.51 | 81.17 ± 0.64 | 94.80 ± 0.21
ULDA [10]    | 95.88 ± 1.94 | 80.75 ± 5.45 | 69.14 ± 9.53 | 77.38 ± 0.70 | 93.57 ± 0.26
NPDA/CS      | 97.63 ± 1.51 | 82.67 ± 3.99 | 79.31 ± 1.45 | 83.94 ± 0.64 | 96.17 ± 0.22
[Fig. 2: recognition rate vs. number of discriminant vectors for OLDA, ULDA, LDA, and NPDA/CS on (a) the ORL database, (b) the Yale database, (c) the YaleB database, and (d) the PIE database.]

Fig. 2. Performance of the 4 algorithms on all databases with different numbers of discriminant features used.
We further investigated the performance of the compared algorithms when different numbers of discriminant features were used. Fig. 2 plots the comparison results on all databases. The improved NPDA/CS evidently achieves the best results for every number of discriminant features used.
5. CONCLUSIONS

In this paper, we have presented a new algorithm that uses the Cosine-sine decomposition to improve the Non-parametric Discriminant Analysis algorithm for discriminant feature extraction, and applied it to face recognition. The improved method explicitly emphasizes the boundary information and local structure contained in the training set. Moreover, the CS decomposition of the half scatter matrices is adopted to improve the discriminant effectiveness. Compared with 3 state-of-the-art algorithms on 4 databases, the experimental results of NPDA/CS are encouraging.
6. REFERENCES

[1] J. Yang and J.-Y. Yang, "Why can LDA be performed in PCA transformed space?," Pattern Recognition, vol. 36, pp. 563-566, 2003.

[2] D.-Q. Dai and P. C. Yuen, "Regularized discriminant analysis and its application to face recognition," Pattern Recognition, vol. 36, pp. 845-847, 2003.

[3] J. Ye, R. Janardan, C. H. Park, and H. Park, "An optimization criterion for generalized discriminant analysis on undersampled problems," IEEE Trans. PAMI, vol. 26, pp. 982-994, 2004.

[4] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, Boston, 1990.

[5] M. Bressan and J. Vitrià, "Nonparametric discriminant analysis and nearest neighbor classification," Pattern Recognition Letters, vol. 24, pp. 2743-2749, 2003.

[6] Z. Li, D. Lin, and X. Tang, "Nonparametric discriminant analysis for face recognition," IEEE Trans. PAMI, vol. 31, pp. 755-761, 2009.

[7] C. C. Paige and M. A. Saunders, "Towards a generalized singular value decomposition," SIAM Journal on Numerical Analysis, vol. 18, pp. 398-405, 1981.

[8] G. Golub and C. Van Loan, Matrix Computations, Johns Hopkins University Press, 1996.

[9] J. Ye, "Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems," Journal of Machine Learning Research, vol. 6, pp. 483-502, 2005.

[10] J. Ye, T. Li, and T. Xiong, "Using uncorrelated discriminant analysis for tissue classification with gene expression data," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 1, pp. 181-190, 2004.