Proceedings of 2010 IEEE 17th International Conference on Image Processing

September 26-29, 2010, Hong Kong

NPDA/CS: IMPROVED NON-PARAMETRIC DISCRIMINANT ANALYSIS WITH CS DECOMPOSITION AND ITS APPLICATION TO FACE RECOGNITION

Qingsong Zeng, Changdong Wang
School of Information Science and Technology, Sun Yat-sen University, Guangzhou, 510006, China
[email protected], [email protected]
(This project was partially supported by the NSF-Guangdong (U083500).)

ABSTRACT

Fisher's Linear Discriminant Analysis (FLDA) uses the parametric form of the scatter matrices, which is based on a Gaussian distribution assumption, and requires the scatter matrices to be nonsingular, a condition that cannot always be satisfied. To overcome these problems, Non-parametric Discriminant Analysis (NPDA) has been proposed, addressing the non-Gaussian aspects of sample distributions. In this paper, starting from a nearest-neighborhood perspective, a new formulation of the scatter matrices is presented to improve NPDA, simultaneously emphasizing the boundary information and the local structure contained in the training set. CS decomposition is then incorporated to further improve its performance. Experimental results on 4 databases demonstrate the effectiveness of the improved method.

Index Terms— LDA, boundary information, local structure, non-parametric discriminant analysis, CS decomposition

1. INTRODUCTION

Recently, discriminant feature extraction has received great attention in many pattern recognition applications, such as face recognition and action recognition, and many linear feature extraction methods have been proposed. Among these algorithms, Fisher's Linear Discriminant Analysis (FLDA) is the most popular. It uses the parametric form of the scatter matrices based on a Gaussian distribution assumption, so it can also be regarded as a form of Parametric Discriminant Analysis (PDA). Moreover, FLDA requires the within-class scatter matrix $S_w$ to be nonsingular. Unfortunately, this condition cannot always be satisfied, since in many small sample size problem (SSSP) applications all scatter matrices may be singular. So far, at least three main approaches have been proposed to overcome this problem. The first is to apply PCA to reduce the dimension of the original data before classical LDA is performed [1]. The second is Regularized LDA [2], which incorporates a regularization mechanism to deal with the singularity of $S_w$.



The last one is Uncorrelated LDA (ULDA) [3], which extracts feature vectors with uncorrelated attributes; uncorrelated features are desirable in many applications, since they contain minimum redundancy.

In deriving the FLDA formulation, there is an assumption that the empirical class mean equals its expectation. However, this assumption may not hold in practice. FLDA makes data of the same class close to their corresponding class means; since the number of samples per class is always limited, the estimates of the class means are not accurate, and this degrades the effectiveness of the Fisher criterion. Hence, Fukunaga [4] presented Non-parametric Discriminant Analysis (NPDA) to overcome this problem by introducing a new definition of the between-class scatter matrix that explicitly emphasizes the samples near the boundary. Bressan et al. [5] and Zhifeng Li et al. [6] then improved NPDA by proposing a new formulation of the scatter matrices that extends two-class NPDA to the multi-class case. Building on their work, and observing NPDA from a nearest-neighborhood perspective, we introduce a modification of the original algorithm called Non-parametric Discriminant Analysis with Cosine-sine Decomposition (NPDA/CS).

In this paper, we first propose a new formulation of the between-class, within-class, and total scatter matrices that emphasizes the boundary information and the local structure contained in the training set, in which the three half scatter matrices take a non-parametric form. We then investigate the idea of simultaneously diagonalizing the matrices $S_b$ and $S_w$ by Cosine-sine decomposition (CSD) [7, 8], so as to handle the problems that LDA suffers from.

The rest of this paper is organized as follows. Section 2 introduces the related work underlying our algorithm. The improved NPDA/CS algorithm for dimensionality reduction is described in Section 3. The experimental results are presented in Section 4, and Section 5 draws the conclusions of the work.

Some words about our notation. Lower-case letters such as i, j, k, l, and c represent numbers or indices. Capital letters such as A, G, and X represent matrices. Lower-case bold letters such as x, y represent vectors (samples). Script letters such as C, X represent sets.


2. RELATED WORK

2.1. Fisher's LDA

Consider the problem of training a classifier with $c$ classes. Suppose the data space is a compact vector space of dimension $d$, and a training set $\mathcal{X} = \{x_i : i = 1, \ldots, n\}$ consists of $n$ samples, with each point $x_i$ already assigned to some class, say $x_i \in \mathcal{C}_k$. It can thus also be written as $x_j^k$, meaning the $j$-th sample from class $\mathcal{C}_k$. Let $n_k = |\mathcal{C}_k|$ denote the number of samples in class $\mathcal{C}_k$. In FLDA, the between-class, within-class, and total scatter matrices are defined as

$S_b = \frac{1}{n}\sum_{k=1}^{c} n_k (\mu_k - \mu)(\mu_k - \mu)^T, \quad S_w = \frac{1}{n}\sum_{k=1}^{c}\sum_{i=1}^{n_k} (x_i^k - \mu_k)(x_i^k - \mu_k)^T, \quad S_t = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^T,$

where $\mu_k$ and $\mu$ denote the mean of class $\mathcal{C}_k$ and the mean of all samples, respectively. The goal of FLDA is to compute the optimal transformation matrix $G$ that finds the most discriminative features by maximizing the ratio of the between-class scatter to the within-class scatter:

$G = \arg\max_G \operatorname{tr}\big((G^T S_w G)^{-1} (G^T S_b G)\big). \quad (1)$

The optimal transformation can be readily computed by finding all the eigenvectors that satisfy

$S_b w = \lambda S_w w, \quad \lambda \neq 0. \quad (2)$
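As a concrete illustration of Eqs. (1)-(2), the following minimal NumPy/SciPy sketch (our own illustration, not code from the paper) builds $S_b$ and $S_w$ from labeled data, stored one sample per row rather than per column, and solves the generalized eigenproblem with scipy.linalg.eigh. It assumes $S_w$ is nonsingular, which is exactly the condition that fails in small sample size problems.

```python
import numpy as np
from scipy.linalg import eigh

def flda(X, y, n_components):
    """Classical FLDA: columns of the returned G span the discriminant subspace.

    X : (n, d) data matrix, one sample per row; y : (n,) integer class labels.
    Assumes S_w is nonsingular (fails for small sample size problems)."""
    n, d = X.shape
    mu = X.mean(axis=0)
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for k in np.unique(y):
        Xk = X[y == k]
        mu_k = Xk.mean(axis=0)
        Sb += len(Xk) * np.outer(mu_k - mu, mu_k - mu)
        Sw += (Xk - mu_k).T @ (Xk - mu_k)
    Sb /= n
    Sw /= n
    # Generalized symmetric eigenproblem S_b w = lambda S_w w (Eq. (2)).
    evals, evecs = eigh(Sb, Sw)
    order = np.argsort(evals)[::-1]          # largest eigenvalues first
    return evecs[:, order[:n_components]]    # transformation matrix G
```

Since $\operatorname{rank}(S_b) \le c - 1$, n_components is typically set to at most $c - 1$.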

2.2. Alternative Expression of the Scatter Matrices

The FLDA scatter matrices described above admit an alternative expression. First stack the samples of the data set $\mathcal{X}$ into a partitioned matrix according to their class labels, i.e., $X = [X_1, \ldots, X_c]$ with $X_k \in \mathbb{R}^{d \times n_k}$ the data matrix containing all samples from $\mathcal{C}_k$. The half between-class scatter matrix $H_b$, half within-class scatter matrix $H_w$, and half total scatter matrix $H_t$ are defined respectively as

$H_w = \frac{1}{\sqrt{n}}\big[\tilde{X}_1, \ldots, \tilde{X}_c\big], \quad H_b = \frac{1}{\sqrt{n}}\big[\sqrt{n_1}(\mu_1 - \mu), \ldots, \sqrt{n_c}(\mu_c - \mu)\big], \quad H_t = \frac{1}{\sqrt{n}}(X - \mu e^T),$

where $\tilde{X}_k = X_k - \mu_k (e^k)^T$, $e^k = [1, \ldots, 1]^T \in \mathbb{R}^{n_k}$, and $e = [1, \ldots, 1]^T \in \mathbb{R}^{n}$. The three scatter matrices can then be expressed as $S_b = H_b H_b^T$, $S_w = H_w H_w^T$, and $S_t = H_t H_t^T$.

2.3. Non-parametric Discriminant Analysis

Non-parametric Discriminant Analysis (NPDA) addresses the non-Gaussian aspects of sample distributions by introducing a non-parametric between-class scatter matrix that measures between-class scatter on a local basis in the neighborhood of the decision boundary. Fukunaga [4] presented the two-class NPDA, whose new definition of the between-class scatter matrix explicitly emphasizes the samples near the boundary, and Zhifeng Li et al. [6] extended the two-class NPDA to the multi-class case by proposing a new formulation of the scatter matrices.

Let $N_p(x_i^k, l)$ denote the subset consisting of the $p$ nearest neighbors of $x_i^k$ taken from class $\mathcal{C}_l$. The local nearest neighbor mean of $x_i^k$ with respect to class $\mathcal{C}_l$ is defined as

$\mu(x_i^k, l, p) = \sum_{x_j^l \in N_p(x_i^k, l)} \beta(x_i^k, x_j^l)\, x_j^l,$

where $\beta(x_i^k, x_j^l)$ is a weight function between $x_i^k$ and $x_j^l$ satisfying $\sum_{x_j^l \in N_p(x_i^k, l)} \beta(x_i^k, x_j^l) = 1$.

The non-parametric between-class scatter matrix for the multi-class problem is defined as [6]

$S_b^N = \sum_{k=1}^{c-1} \sum_{l=k+1}^{c} \sum_{i=1}^{n_k} w(x_i^k, l, p)\,\big(\mu(x_i^k, k, p) - \mu(x_i^k, l, p)\big)\big(\mu(x_i^k, k, p) - \mu(x_i^k, l, p)\big)^T, \quad (3)$

where $w(x_i^k, l, p)$ is a weighting function defined as

$w(x_i^k, l, p) = \frac{\min\{d^{\alpha}(x_i^k, \mu(x_i^k, k, p)),\, d^{\alpha}(x_i^k, \mu(x_i^k, l, p))\}}{d^{\alpha}(x_i^k, \mu(x_i^k, k, p)) + d^{\alpha}(x_i^k, \mu(x_i^k, l, p))}, \quad (4)$

with $\alpha \in (0, \infty)$ controlling how quickly the weight changes with the distance ratio, and $d(u, v)$ the Euclidean distance between $u$ and $v$. For samples near the classification boundary the weight approaches 0.5; for samples far away from the classification boundary the weight drops off to zero.
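The following NumPy sketch is illustrative only: the paper does not specify $\beta$, so uniform weights $\beta = 1/p$ are assumed, and samples are stored one per row. It computes the class-conditional local nearest neighbor mean and the boundary weight of Eq. (4).

```python
import numpy as np

def local_class_mean(x, X_l, p):
    """Local nearest neighbor mean mu(x, l, p): uniformly weighted (beta = 1/p)
    mean of the p nearest neighbors of x drawn from class l (rows of X_l)."""
    d = np.linalg.norm(X_l - x, axis=1)      # Euclidean distances to class-l samples
    nn = np.argsort(d)[:p]                   # indices of the p nearest neighbors
    return X_l[nn].mean(axis=0)

def boundary_weight(x, X_own, X_other, p, alpha=1.0):
    """Weighting function w of Eq. (4): close to 0.5 near the class boundary,
    dropping towards 0 far away from it. X_own should not contain x itself."""
    d_own = np.linalg.norm(x - local_class_mean(x, X_own, p)) ** alpha
    d_other = np.linalg.norm(x - local_class_mean(x, X_other, p)) ** alpha
    return min(d_own, d_other) / (d_own + d_other)
```

Summing the weighted outer products of the mean differences over all sample/class pairs, as in Eq. (3), then yields $S_b^N$.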


3. NON-PARAMETRIC DISCRIMINANT ANALYSIS WITH COSINE-SINE DECOMPOSITION

In this section, starting from a nearest-neighborhood perspective, a new formulation of the scatter matrices is presented to improve NPDA, simultaneously emphasizing the boundary information and the local structure contained in the training set. CS decomposition is then incorporated to simultaneously diagonalize the matrices $S_b$ and $S_w$, so as to handle the problems that LDA suffers from.

3.1. Construction of the scatter matrices

The main idea behind the proposed scatter matrices is that, if only vectors near the classification boundary are selected, the resulting scatter matrix specifies the subspace in which the boundary information is embedded; samples far away from the boundary would otherwise exert a considerable influence on the scatter matrix and distort the boundary structure information. Let $N_p(x_i^k)$ denote the subset consisting of the $p$ nearest neighbors of $x_i^k$ taken from any class. The local nearest neighbor mean of $x_i^k$ is defined as

$\mu(x_i^k, p) = \sum_{x \in N_p(x_i^k)} \beta(x_i^k, x)\, x. \quad (5)$
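Unlike $\mu(x_i^k, l, p)$ of Section 2.3, Eq. (5) pools neighbors from all classes. A small NumPy sketch (again assuming uniform weights $\beta = 1/p$ and excluding the sample itself from its own neighborhood; both are our assumptions):

```python
import numpy as np

def local_mean_any_class(x, X_all, p):
    """mu(x, p) of Eq. (5): uniformly weighted mean of the p nearest neighbors
    of x drawn from the whole training set, regardless of class label.
    Assumes x itself is a row of X_all and skips it (distance 0)."""
    d = np.linalg.norm(X_all - x, axis=1)
    nn = np.argsort(d)[1:p + 1]
    return X_all[nn].mean(axis=0)
```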

If we define two partitioned matrices $A = [A_1, \ldots, A_c]$ and $B = [B_1, \ldots, B_c]$ with $A_k(:, i) = x_i^k - \mu(x_i^k, k, p)$ and $B_k(:, i) = x_i^k - \mu(x_i^k, p)$, then the generalized non-parametric half between-class and half within-class scatter matrices and the half total scatter matrix are defined as

$H_b(:, k) = \frac{\sqrt{n_k}}{n} \sum_{i=1}^{n_k} \sum_{l=1}^{c} w(x_i^k, l, p)\,\big(\mu(x_i^k, k, p) - \mu(x_i^k, l, p)\big), \quad (6)$

$H_w = \frac{1}{\sqrt{n}}[A_1, \ldots, A_c], \quad (7)$

$H_t = \frac{1}{\sqrt{n}}[B_1, \ldots, B_c], \quad (8)$

where $w(x_i^k, l, p)$ is defined in Eq. (4).

Fig. 1. Non-parametric between-class scatter matrix. v: the local nearest neighbor mean connection vector between $\mu(x_i^1, 1, p)$ and $\mu(x_i^1, 2, p)$.

As illustrated in Fig. 1, the new design has two advantages. First, the non-parametric between-class scatter matrix spans a subspace in which the local structure is embedded. Second, the new weighting function helps emphasize the samples near the boundary of two classes, and thus captures the boundary structure information more effectively.

3.2. Non-parametric Discriminant Analysis with Cosine-sine Decomposition

With the generalized non-parametric half scatter matrices computed above, we can derive the non-parametric discriminant analysis algorithm summarized in Algorithm 1. The improved NPDA/CS algorithm is essentially an application of the CS decomposition of the half scatter matrices.
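Before stating the algorithm, here is an illustrative NumPy construction of the half scatter matrices of Eqs. (6)-(8), i.e., step 1 of Algorithm 1 below. This is our own sketch, not the authors' code: uniform $\beta$ weights, samples stored one per row, and the $\sqrt{n_k}/n$ scaling read from Eq. (6) are assumptions.

```python
import numpy as np

def nn_mean(x, cand, p):
    """Uniformly weighted mean of the p nearest neighbors of x among cand (rows)."""
    idx = np.argsort(np.linalg.norm(cand - x, axis=1))[:p]
    return cand[idx].mean(axis=0)

def nonparametric_half_scatter(X, y, p=5, alpha=1.0):
    """Construct H_b, H_w, H_t of Eqs. (6)-(8). X: (n, d), one sample per row.
    For simplicity a sample is not excluded from its own neighborhood."""
    n, d = X.shape
    classes = np.unique(y)
    Hb = np.zeros((d, len(classes)))
    A_cols, B_cols = [], []
    for kidx, k in enumerate(classes):
        Xk = X[y == k]
        col = np.zeros(d)
        for x in Xk:
            mu_own = nn_mean(x, Xk, p)                 # mu(x, k, p)
            A_cols.append(x - mu_own)                  # columns of A_k
            B_cols.append(x - nn_mean(x, X, p))        # columns of B_k, via Eq. (5)
            for l in classes:
                if l == k:
                    continue                           # the l = k term vanishes in Eq. (6)
                mu_other = nn_mean(x, X[y == l], p)    # mu(x, l, p)
                d_own = np.linalg.norm(x - mu_own) ** alpha
                d_oth = np.linalg.norm(x - mu_other) ** alpha
                w = min(d_own, d_oth) / (d_own + d_oth + 1e-12)   # Eq. (4)
                col += w * (mu_own - mu_other)
        Hb[:, kidx] = np.sqrt(len(Xk)) / n * col       # Eq. (6)
    Hw = np.column_stack(A_cols) / np.sqrt(n)          # Eq. (7)
    Ht = np.column_stack(B_cols) / np.sqrt(n)          # Eq. (8)
    return Hb, Hw, Ht
```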

Algorithm 1 Non-parametric discriminant analysis with CSD
Input: partitioned data matrix $X = [X_1, \ldots, X_c]$.
Output: transformation matrix $G$.
1. Construct $H_b$, $H_w$, and $H_t$.
2. Compute the SVD of $H_t$, i.e., $H_t = U_1 D_1 V_1^T$, and let $U_{pca} = U_1(:, 1:\operatorname{rank}(H_t))$; save $U_{pca}$.
3. Apply the PCA stage to $H_b$ and $H_w$: $\tilde{H}_b = U_{pca}^T H_b$, $\tilde{H}_w = U_{pca}^T H_w$.
4. Let $F = [\tilde{H}_b^T; \tilde{H}_w^T]$ and apply the QL decomposition $F = QL$; save $Q$ and $L$.
5. Apply the CS decomposition to $Q$ so as to obtain the matrix $W$.
6. Let $Y = L^T W$ and compute an orthogonal matrix $\Phi$ by the QL decomposition of $Y$, i.e., $Y = \Phi L_1$; then set $Z = \Phi (L_1^T)^{-1}$.
7. Let $q = \operatorname{rank}(H_b)$.
8. Let $G^* = [Z_1, Z_2, \ldots, Z_q]$, where $Z_i$ is the $i$-th column of $Z$.
9. Output the transformation matrix $G = U_{pca} G^*$.
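Step 5 is where the CS decomposition does the work: for an orthogonal matrix $Q$ split into two row blocks, CSD produces block-diagonal orthogonal factors that diagonalize both blocks at once, which is what simultaneously diagonalizes $G^T S_b G$ and $G^T S_w G$. Below is a minimal numerical illustration using scipy.linalg.cossin (available in SciPy 1.5+); it is a generic demo of the factorization on a random orthogonal matrix, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import cossin, qr

rng = np.random.default_rng(0)

# Random orthogonal matrix, split into two equal row blocks, mimicking the
# stacked factor Q = [Q_b; Q_w] from the QL factorization in step 4.
m, p, q = 8, 4, 4                      # Q is m x m; upper-left block is p x q
Q, _ = qr(rng.standard_normal((m, m)))

# CS decomposition (step 5): Q = U @ D @ Vh with U, Vh block-diagonal
# orthogonal and D = [[C, -S], [S, C]], where C and S are diagonal.
U, D, Vh = cossin(Q, p=p, q=q)

print(np.allclose(Q, U @ D @ Vh))      # True: exact reconstruction
print(np.round(D[:p, :q], 3))          # diagonal cosine block
print(np.round(D[p:, :q], 3))          # diagonal sine block
```

Because both row blocks of $Q$ share the same right factor, the two projected scatter matrices are diagonalized by one common transformation, without ever requiring $S_w$ to be nonsingular.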

4. EXPERIMENTAL RESULTS

In this section, experimental results are presented that compare the improved NPDA method with 3 algorithms over 4 databases. The three compared methods are PCA+LDA [1], OLDA [9], and ULDA [10]. The 4 databases used are ORL, Yale, YaleB, and CMU PIE. The experimental results demonstrate the effectiveness of our improved method.

4.1. Methodology

In all experiments, the original images were normalized (in scale and orientation) by fixing the locations of the two eyes. The facial regions were then cropped to produce the final images used for matching. The size of each cropped image was 32 x 32 pixels, with 256 gray levels per pixel, so each image can be represented by a 1024-dimensional vector in image space. A simple k-nearest neighbor (k-NN) classifier with k = 1 was used for all algorithms. The number of nearest neighbors used to construct the half scatter matrices in NPDA is 5. On the ORL and Yale databases, 7 face images per person were used for training and the remaining 3 face images for testing. On YaleB, 10 face images per person were used for training and the remaining face images for testing. Since the number of face images in the PIE database is larger than in the others, we conducted two experiments on it, namely PIE1 and PIE2: in PIE1, 20 face images per person were used for training, while in PIE2 the number of training images was 60.

4.2. Results

All experiments were repeated 20 times. The means and standard deviations of the 4 algorithms on all databases are shown in Table 1. The improved NPDA/CS algorithm clearly achieves the best performance among the compared algorithms on all databases. This may be due to the advantages of NPDA/CS in emphasizing the boundary information and local structure contained in the training set.


Table 1. The means and standard deviations of the 4 algorithms over 20 runs on all databases.

Methods       | ORL          | Yale         | YaleB        | PIE1         | PIE2
PCA+LDA [1]   | 95.67 ± 1.79 | 80.25 ± 4.72 | 73.34 ± 6.73 | 78.47 ± 0.66 | 94.53 ± 0.23
OLDA [9]      | 97.50 ± 1.60 | 82.50 ± 4.17 | 79.02 ± 1.51 | 81.17 ± 0.64 | 94.80 ± 0.21
ULDA [10]     | 95.88 ± 1.94 | 80.75 ± 5.45 | 69.14 ± 9.53 | 77.38 ± 0.70 | 93.57 ± 0.26
NPDA/CS       | 97.63 ± 1.51 | 82.67 ± 3.99 | 79.31 ± 1.45 | 83.94 ± 0.64 | 96.17 ± 0.22
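A minimal sketch of the evaluation protocol of Section 4.1 (our own illustration, assuming a transformation matrix G has already been learned on the training images, e.g., by Algorithm 1): project both sets and classify each test vector by its nearest training neighbor.

```python
import numpy as np

def evaluate_1nn(G, X_train, y_train, X_test, y_test):
    """Recognition rate with a 1-NN classifier in the reduced space.

    G: (d, q) transformation matrix; X_*: (n, d) image vectors, one per row."""
    Z_train = X_train @ G                     # project onto the discriminant subspace
    Z_test = X_test @ G
    correct = 0
    for z, label in zip(Z_test, y_test):
        d = np.linalg.norm(Z_train - z, axis=1)
        correct += (y_train[np.argmin(d)] == label)
    return correct / len(y_test)
```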

Fig. 2. Performance of the 4 algorithms on all databases with different numbers of discriminant features used. (Four panels: (a) ORL, (b) Yale, (c) YaleB, (d) PIE; each plots recognition rate against the number of discriminant vectors for OLDA, ULDA, LDA, and NPDA/CS.)

We made a further investigation into the performance of the compared algorithms when different numbers of discriminant features are used. Fig. 2 plots the comparison on all databases. It is clear that the improved NPDA/CS achieves the best results for every number of discriminant features used.

5. CONCLUSIONS

In this paper, we have presented a new algorithm that uses the Cosine-sine decomposition to improve the Non-parametric Discriminant Analysis algorithm for discriminant feature extraction, and applied it to face recognition. The improved method explicitly emphasizes the boundary information and local structure contained in the training set. Moreover, the CS decomposition of the half scatter matrices is adopted to improve the discriminant effectiveness. Compared with 3 state-of-the-art algorithms on 4 databases, the experimental results of NPDA/CS are encouraging.

6. REFERENCES

[1] Yang J. and Yang J.-Y., "Why can LDA be performed in PCA transformed space?," Pattern Recognition, vol. 36, pp. 563–566, 2003.

[2] Dai D.-Q. and Yuen P. C., "Regularized discriminant analysis and its application to face recognition," Pattern Recognition, vol. 36, pp. 845–847, 2003.

[3] Ye J., Janardan R., and Park C. H., "An optimization criterion for generalized discriminant analysis on undersampled problems," IEEE Trans. PAMI, vol. 26, pp. 982–994, 2004.

[4] Fukunaga K., Introduction to Statistical Pattern Recognition, Boston: Academic Press, 1990.

[5] Bressan M. and Vitria J., "Nonparametric discriminant analysis and nearest neighbor classification," Pattern Recognition Letters, vol. 24, pp. 2743–2749, 2003.

[6] Li Z., Lin D., and Tang X., "Nonparametric discriminant analysis for face recognition," IEEE Trans. PAMI, vol. 31, pp. 755–761, 2009.

[7] Paige C. C. and Saunders M. A., "Towards a generalized singular value decomposition," SIAM Journal on Numerical Analysis, vol. 18, pp. 398–405, 1981.

[8] Golub G. and Van Loan C., Matrix Computations, Johns Hopkins University Press, 1996.

[9] Ye J., "Characterization of a family of algorithms for generalized discriminant analysis on undersampled problems," J. Mach. Learn. Res., vol. 6, pp. 483–502, 2005.

[10] Ye J., Li T., and Xiong T., "Using uncorrelated discriminant analysis for tissue classification with gene expression data," IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 1, pp. 181–190, 2004.
