Biometric Score Fusion through Discriminative Training

Vivek Tyagi
IBM Research - India
Plot 4, Block-C, Vasant Kunj
New Delhi, India
[email protected]

Nalini Ratha
IBM T J Watson Research Center
Hawthorne, NY 10532
[email protected]

Abstract

In multibiometric systems, the scores of the various matchers/modalities are fused together to provide better performance than the individual matcher scores. In [1] the authors proposed a likelihood ratio test (LRT) based fusion technique for the biometric verification task that outperformed several other classifiers. They model the genuine and the imposter densities by finite Gaussian mixture models (GMM, a generative model) whose parameters are estimated using the maximum likelihood (ML) criterion. Lately, discriminative training methods and models have been shown to provide additional accuracy gains over generative models in multiple applications such as speech recognition, verification and text analytics [5, 7]. These gains stem from the fact that discriminative models are able to partially compensate for the unavoidable mismatch, which is always present, between the specified statistical model (a GMM in this case) and the true distribution of the data, which is unknown. In this paper, we propose to use a discriminative method to estimate the GMM density parameters using the maximum accept and reject (MARS) criterion [8]. Test results using the proposed method on the NIST-BSSR1 multimodal dataset indicate improved verification performance over the very competitive maximum likelihood (ML) trained system proposed in [1].

1. Introduction

Multimodal biometric systems fuse the information from several biometric modalities to obtain better verification/identification performance. Typically, the fusion can be performed at the feature level, the match score level or the decision level, and significant research continues to be performed on the fusion problem at these various levels [1, 2, 3]. In this work, we study the fusion of the matcher scores. In [1] the authors used a likelihood ratio test, widely used in hypothesis testing, for the verification task. They achieve fusion by learning the joint distributions of the K matcher scores/modalities for both the genuine and the imposter classes, followed by the likelihood ratio test (LRT). They also note that, as per the Neyman-Pearson theorem, the LRT is the optimal test in the sense that it maximizes the true accept rate (TAR) for a given fixed false accept rate (FAR). However, this optimality is guaranteed only when the "true" underlying probability densities of the genuine and the imposter classes are known [1]. In practice, the "true" densities are never known; they are estimated from the training data. Typically, maximum likelihood (ML) methods are used for the density estimation in pattern recognition applications, and they perform reasonably well. In [1], the authors model the imposter and the genuine probability distributions by Gaussian mixture models (GMM, a generative model) and use the ML criterion for the parameter estimation. Further, they show that the ML likelihood ratio test (LRT) outperformed several other score fusion techniques such as those based on SVMs and neural networks.

In the past decade, significant research has been done on developing and analyzing discriminative training techniques/models and on comparing them with generative models [5, 7, 8, 9]. ML-estimated generative models such as GMMs are the major workhorse in various pattern recognition applications, including speech recognition, text analytics, computer vision and biometric recognition, and they remain the state-of-the-art in these diverse applications, achieving very good results. However, lately it has been found that discriminative models, or discriminatively trained generative models (for example, GMMs), can provide additional performance gains over the pure generative models [5, 6, 7, 8, 9]. This is due to the fact that discriminative training can compensate for the unavoidable mismatch between the specified model (a GMM in this case) and the true underlying distribution of the data, which is never known. The most popular and widely used generative model is the Gaussian mixture model (GMM). Its parameters are usually estimated using the maximum likelihood (ML) technique [10], where the optimization function is the likelihood of the data given the correct class labels. For the discussion, we assume that the training set has N genuine score vectors (x_i) and M imposter score vectors (y_j). Let us further denote their distributions (which we will estimate from the training data) by p_gen(.) and p_imp(.), and let θ denote the GMM parameter set. The ML criterion seeks the parameter set θ that maximizes the following optimization function [10]:

$$F_{ML}(\theta) = \log \prod_{i=1}^{N} p_{gen}(x_i) \prod_{j=1}^{M} p_{imp}(y_j) \qquad (1)$$
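For concreteness, the ML-trained GMM fusion baseline of [1] can be sketched in a few lines. The snippet below is our own illustration (the paper provides no code); it uses scikit-learn's GaussianMixture for the EM-based ML fit and score_samples for the log-densities.

```python
# Illustrative sketch (ours) of the ML-trained GMM + LRT fusion baseline of [1].
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_ml_densities(genuine, imposter, n_components=20):
    """Fit GMMs by ML (EM) to genuine (N x K) and imposter (M x K) score vectors."""
    p_gen = GaussianMixture(n_components=n_components).fit(genuine)
    p_imp = GaussianMixture(n_components=n_components).fit(imposter)
    return p_gen, p_imp

def llrt(p_gen, p_imp, scores):
    """Log likelihood ratio for each K-dimensional test score vector."""
    return p_gen.score_samples(scores) - p_imp.score_samples(scores)
```

A test vector is then accepted as genuine when its log likelihood ratio exceeds a chosen threshold.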

However, it is well known that increasing the likelihood does not necessarily result in increased recognition/verification accuracies [5, 7]. Therefore, new optimization functions such as maximum mutual information (MMI/MPE) [7], large margin hidden Markov models (HMM) [6] and maximum accept and reject (MARS) [8] were introduced, each of which optimizes a different discriminative criterion. These techniques have provided consistent additional gains over ML-trained GMMs in speech recognition systems. In this work, we have used the maximum accept and reject (MARS) [8] criterion to discriminatively train the GMM parameters of the imposter and the genuine densities, followed by the likelihood-ratio-based score fusion for the verification task. In the next few sections we explain the MARS technique for GMM parameter estimation, followed by experiments comparing it to the ML-trained GMM system. In both techniques, the same log likelihood ratio test (LLRT) is used for the verification task.

2. MARS Training for the Likelihood Ratio Based Fusion

As in [1], we estimate the joint distributions of the K matcher scores/modalities of both the genuine and the imposter classes. A test vector consisting of the K matcher scores is then assigned to the genuine or the imposter class based on the likelihood ratio test. The genuine and the imposter distributions are modeled by GMMs whose parameters are estimated using a discriminative training criterion, maximum accept and reject (MARS), instead of the ML criterion that was used in [1]. Let us consider the MARS [8] optimization function, which is defined as

$$F_{MARS}(\theta) = \log \prod_{i=1}^{N} \frac{p_{gen}(x_i)}{\left(p_{imp}(x_i)\right)^{\nu}} \prod_{j=1}^{M} \frac{p_{imp}(y_j)}{\left(p_{gen}(y_j)\right)^{\kappa}} \qquad (2)$$

The denominator terms in (2) are the new discriminative components. Instead of taking into account only the accept (emission) likelihood p_gen(x_i), the criterion also considers the reciprocal of the incorrect class likelihood (rejection), p_imp(x_i). In (2), ν and κ are empirical factors that control the influence of the rejection likelihoods. Comparing (1) and (2), we find that the only difference between the two objective functions is the denominator terms, which form the rejection likelihoods from the incorrect classes. We note that it is these terms which bring in the discriminative nature of the model parameter estimation.
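The difference between (1) and (2) is easy to see numerically. The sketch below is our own illustration, assuming log_p_gen and log_p_imp are vectorized log-density functions (for example, GaussianMixture.score_samples):

```python
# Illustrative (ours): numerical evaluation of the ML objective (1)
# and the MARS objective (2) for given density estimates.
def f_ml(log_p_gen, log_p_imp, x_gen, y_imp):
    # (1): accept log likelihoods under the correct-class densities
    return log_p_gen(x_gen).sum() + log_p_imp(y_imp).sum()

def f_mars(log_p_gen, log_p_imp, x_gen, y_imp, kappa, nu):
    # (2): accept log likelihoods minus the weighted rejection log likelihoods
    accept = log_p_gen(x_gen).sum() + log_p_imp(y_imp).sum()
    reject = nu * log_p_imp(x_gen).sum() + kappa * log_p_gen(y_imp).sum()
    return accept - reject
```

With kappa = nu = 0 the two objectives coincide, mirroring the equivalence of the mean re-estimation formulas noted below.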

2.1. MARS parameter estimation

Our next task is to estimate the parameters of the GMM models that maximize the MARS criterion in (2). For the sake of clarity, let us model the densities by a single-component GMM (a single Gaussian), i.e. p_gen(x) = N(x|μ_gen, σ_gen) and p_imp(x) = N(x|μ_imp, σ_imp). This results in

$$F_{MARS}(\theta) = \log \prod_{i=1}^{N} \frac{\mathcal{N}(x_i|\mu_{gen},\sigma_{gen})}{\left(\mathcal{N}(x_i|\mu_{imp},\sigma_{imp})\right)^{\nu}} \prod_{j=1}^{M} \frac{\mathcal{N}(y_j|\mu_{imp},\sigma_{imp})}{\left(\mathcal{N}(y_j|\mu_{gen},\sigma_{gen})\right)^{\kappa}} \qquad (3)$$

Taking the derivative of F_MARS(θ) with respect to μ_gen and setting it to zero, we get

$$\sum_{i=1}^{N} \frac{(x_i - \mu_{gen})}{(\sigma_{gen})^2} - \kappa \sum_{j=1}^{M} \frac{(y_j - \mu_{gen})}{(\sigma_{gen})^2} = 0$$

which gives

$$\mu_{gen} = \frac{\sum_{i=1}^{N} x_i - \kappa \sum_{j=1}^{M} y_j}{N - \kappa M} \qquad (4)$$

and similarly we get

$$\mu_{imp} = \frac{\sum_{j=1}^{M} y_j - \nu \sum_{i=1}^{N} x_i}{M - \nu N}$$

whereas the ML mean re-estimation formulas are

$$\mu_{gen} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \mu_{imp} = \frac{1}{M}\sum_{j=1}^{M} y_j \qquad (5)$$

Let us discuss these formulas a bit intuitively. Consider a case where x_i is in fact a genuine class sample but its genuine likelihood is less than its imposter likelihood, in other words p_gen(x_i) < p_imp(x_i). In that case x_i would be an error in the training set. However, the ML re-estimation formulas in (5) have no mechanism to "correct" these kinds of errors in training, which would then lead to more errors in the testing phase. The MARS technique collects the statistics of such errors in training and then corrects the re-estimation formulas to nullify the effect of such errors to the extent possible. The negative terms in the numerators of (4) are the rejection moments, or correction terms, which subtract these errors from the ML moments. Intuitively, this correction shifts the Gaussian probability clouds a bit away from their ML means so as to reduce the classification error in the training phase; this in turn generalizes to the test set and reduces the errors in testing too, to the extent possible. The empirical parameters κ and ν control the influence of the correction terms. If κ and ν are set to zero, the MARS mean formulas (4) become equivalent to the ML mean formulas (5). This is not surprising, as with κ = ν = 0 the MARS optimization function in (2) also becomes equivalent to the ML optimization function in (1). Further, we use κ = κ₀ and ν = ν₀ where κ₀ ∈ (0, 0.5) and ν₀ ∈ (0, 0.5). These values ensure that the "rejection" moment terms never dominate the usual ML moment terms.
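As a quick illustration (ours, not the authors'), the single-Gaussian updates (4) and (5) can be written directly in numpy; setting kappa = nu = 0 recovers the ML means:

```python
# Illustration (ours) of the ML means (5) vs. the MARS means (4) for the
# single-Gaussian case; x holds genuine scores, y holds imposter scores.
import numpy as np

def ml_means(x, y):
    return x.mean(axis=0), y.mean(axis=0)

def mars_means(x, y, kappa, nu):
    n, m = len(x), len(y)
    mu_gen = (x.sum(axis=0) - kappa * y.sum(axis=0)) / (n - kappa * m)
    mu_imp = (y.sum(axis=0) - nu * x.sum(axis=0)) / (m - nu * n)
    return mu_gen, mu_imp  # equals ml_means(x, y) when kappa = nu = 0
```

Note that, as the pseudocode below shows, the rejection sums are in practice accumulated only over the training samples that are actually misclassified.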

For further illustration, we present the pseudocode for estimating μ_gen as per the MARS criterion. Recall that the x_i, i = 1..N, belong to the genuine class and the y_j, j = 1..M, belong to the imposter class.

1: S = Σ_{i=1}^{N} x_i ;  S_rej = 0 ;  n = 0
2: for all j ∈ (1, M) do
3:   if p_imp(y_j) < p_gen(y_j) then
4:     (AN ERROR IN TRAINING OCCURRED)
5:     S_rej = S_rej + y_j
6:     n = n + 1
7:   end if
8: end for
9: μ_gen = (S − κ · S_rej) / (N − κ · n)

The rejection moment term (S_rej) is subtracted from the usual moment (Σ_{i=1}^{N} x_i) of the genuine class in order to shift the genuine mean (μ_gen) away from the samples which are in error (y_j belongs to the imposter class but its imposter class likelihood is less than its genuine class likelihood, i.e. p_imp(y_j) < p_gen(y_j)). We note that this is an important condition used in collecting the rejection moments. Finally, the rejection moments are used as correction terms, as in (4), to estimate the genuine and the imposter densities' mean vectors through the MARS criterion.
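A runnable transcription of the pseudocode above (ours; log_p_gen and log_p_imp are assumed to be vectorized log-density functions, and the comparison in step 3 is equivalent under logs):

```python
# Our transcription of the pseudocode; works for K-dimensional score vectors.
import numpy as np

def mars_genuine_mean(x, y, log_p_gen, log_p_imp, kappa=0.45):
    s = x.sum(axis=0)                      # step 1: usual ML moment
    errors = log_p_imp(y) < log_p_gen(y)   # step 3: imposter samples in error
    s_rej = y[errors].sum(axis=0)          # step 5: rejection moment
    n_err = errors.sum()                   # step 6: rejection count
    return (s - kappa * s_rej) / (len(x) - kappa * n_err)  # step 9
```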

Similarly, we have derived update rules for the means of GMMs with more than one Gaussian mixture component. We model the genuine and imposter densities by GMMs of C mixture components, i.e. p_gen(x) = Σ_{k=1}^{C} α_k N(x|μ_k^{gen}, σ_k^{gen}) and p_imp(x) = Σ_{k=1}^{C} β_k N(x|μ_k^{imp}, σ_k^{imp}), where the mixture prior probabilities α_k, β_k, the mean μ_k and the variance σ_k² parametrize the kth Gaussian mixture component, and so forth. Consider the lth Gaussian component of the genuine and the imposter densities. Let us denote the posterior probability that the genuine density's lth Gaussian mixture component "emitted" the vector x_i by p_gen(l|x_i), and that the genuine density's lth Gaussian mixture component "emitted" the vector y_j by p_gen(l|y_j). Similarly, we define p_imp(l|x_i) and p_imp(l|y_j). Then the update formulas for the lth component's mean vector take the following form:

$$\mu_l^{gen} = \frac{\sum_{i=1}^{N} x_i\, p_{gen}(l|x_i) - \kappa \sum_{j=1}^{M} y_j\, p_{gen}(l|y_j)}{\sum_{i=1}^{N} p_{gen}(l|x_i) - \kappa \sum_{j=1}^{M} p_{gen}(l|y_j)}$$

$$\mu_l^{imp} = \frac{\sum_{j=1}^{M} y_j\, p_{imp}(l|y_j) - \nu \sum_{i=1}^{N} x_i\, p_{imp}(l|x_i)}{\sum_{j=1}^{M} p_{imp}(l|y_j) - \nu \sum_{i=1}^{N} p_{imp}(l|x_i)} \qquad (6)$$

The above formulas are analogous to the single-Gaussian mean formulas in (4), and we have used them to update the mean vectors of the genuine and the imposter densities' GMMs in our experiments.
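The component-wise update (6) vectorizes naturally. In the sketch below (our illustration), r_x and r_y hold the responsibilities p_gen(l|x_i) and p_gen(l|y_j), with one column per mixture component:

```python
# Our sketch of update (6) for the genuine GMM component means.
# x: (N, K) genuine scores; y: (M, K) imposter scores;
# r_x: (N, C) posteriors p_gen(l|x_i); r_y: (M, C) posteriors p_gen(l|y_j).
import numpy as np

def mars_component_means_gen(x, y, r_x, r_y, kappa=0.45):
    num = r_x.T @ x - kappa * (r_y.T @ y)            # rejection-corrected moments, (C, K)
    den = r_x.sum(axis=0) - kappa * r_y.sum(axis=0)  # corrected occupancy counts, (C,)
    return num / den[:, None]                        # one mean vector per component l
```

The imposter update is symmetric, with the roles of x and y swapped and ν in place of κ.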

3. Experiments

We have evaluated the performance of the MARS-trained likelihood ratio test (LRT) on the NIST-BSSR1 database [11]. It consists of three partitions: (a) NIST-Multimodal with 517 samples and 4 modalities (2 fingerprints, 2 faces), (b) Fingerprint with 6000 samples and 2 modalities (2 fingerprints), and (c) Face with 3000 samples and 2 modalities (2 faces). In order to have a reasonably large train-set, we took the first 3000 x 2 fingerprint scores from (b) and the 3000 x 2 face scores from (c) to create a 3000 x 4 multimodal (2 faces, 2 fingerprints) train-set, which we denote by "NIST-3000". We have used the existing 517 x 4 dataset, i.e. partition (a), as our test-set. From the description of the NIST-BSSR1 database [11], we were not able to figure out whether subset (a) is completely disjoint from subsets (b) and (c). We assume them to be disjoint; even if there is a certain degree of overlap between them, it will benefit our ML-LLRT baseline, the min-max normalized sum-rule baseline and the proposed MARS-LLRT verification system in a similar way. We note that the authors of [1] used the two halves of (a), with 258 x 4 samples each, for training and testing, along with 20 repetitions with replacement. However, making 20 draws of 258 x 4 score vectors to form the train/test subsets from a super-set of size 517 x 4 is always going to produce a huge overlap of the train/test score vectors across the repetitions of the experiment; therefore, averaging the results over the repetitions does not necessarily lead to a more statistically significant result. In these experiments we decided to take an alternative approach, using the largest possible independent training set (3000 x 4) and the largest possible independent test set (517 x 4) within the limitations of the NIST-BSSR1 dataset. This way we hope to effectively evaluate the performance of the MARS discriminative training criterion, and hence we formed the "NIST-3000" train-set as described above.

We have modeled the genuine and the imposter distributions using 20-component GMMs which were trained using the ML criterion and seeded from single-component Gaussian densities; the number of mixture components was progressively increased from 1 to 20. We decided to use 20 mixture components as this leads to approximately 3000/20 ≈ 150 samples per Gaussian component of the "genuine" score density, a reasonable number of samples; for the "imposter" density, the number of samples per component is still larger. The authors in [1] also used a similarly trained 20-mixture-component GMM, using the ML criterion, for the LRT. Finally, we apply 3 iterations of MARS training to the ML-trained 20-component genuine and imposter GMMs, re-estimating the mean vectors in each iteration using (6). We note that the number of genuine samples (N) in the train set is far smaller than the number of imposter samples (M) (N = 3000, M = 3000 x 2999), which leads to a peculiar situation: the rejection (correction) term for the genuine density is based on the errors coming from the imposter samples, and vice versa for the imposter rejection term. We have therefore empirically set κ = 0.45 and ν = 0.00 in view of these proportions. Throughout the experiments, the factor ν has been set to zero because there were too few samples (sometimes as low as just 1 or 2 samples for a Gaussian mixture component) in the imposter rejection term, making the corresponding rejection moment in (6) very unreliable.

3.1. Using ML variance estimates

In these experiments we only update the Gaussian mean vectors according to the MARS criterion; we continue to use the ML estimates for the Gaussian variance parameters. This can be explained as follows. Typically, in the discriminative training of Gaussian mixture models, it has been observed that discriminative estimation of the means of the Gaussian densities provides the maximum improvement [6, 7]. Discriminative estimation of the means leads to a shift of the Gaussian clouds in the feature space such that the overlap between the two classes (genuine and imposter) is reduced, thereby reducing the errors. However, discriminative estimation of the variance parameters changes the shape of the Gaussian cloud, and it should be done only when one has sufficient training samples in error (which form the subset used for the discriminative training); otherwise it generally leads to a degradation, and hence the ML estimates of the variance parameters are used instead. This is attributed to the following reason: discriminative training uses only those training samples which are in error, and this subset is typically quite small compared to the entire training set on which the ML estimates are trained. For example, in our experiments we only have about 5000 imposter samples in error out of roughly 9 million imposter training samples, and only about 30 genuine samples in error out of 3000 genuine training samples.

Finally, the MARS mean-updated GMM densities are used in the log likelihood ratio test (LLRT). For a given test vector t, we compute its imposter and genuine likelihoods from the corresponding densities and then compare their ratio to a threshold δ. If the ratio is greater than δ, t is assigned to the genuine class; if it is less than δ, it is assigned to the imposter class:

$$t: \text{Genuine, if } \frac{p_{gen}(t)}{p_{imp}(t)} \geq \delta; \qquad t: \text{Imposter, if } \frac{p_{gen}(t)}{p_{imp}(t)} < \delta$$

where δ is the decision threshold. As we vary δ, we obtain the receiver operating characteristic (ROC) curve with different values of the true accept rate (TAR) and false accept rate (FAR). For comparison, we also trained the maximum likelihood (ML) LLRT [1] on the same NIST-3000 training set. In [4], it was shown that simple sum-rule-based multibiometric fusion after min-max score normalization achieved the best accuracy among a range of other fusion techniques. Therefore, we also compared against the sum rule, where each modality's score was normalized using min-max normalization to lie in the range (0, 1) and the normalized scores were then combined with equal weights. Testing for all three fusion techniques was done on the same 517 x 4 NIST-BSSR1 multimodal dataset. Finally, we compare the performance of the MARS system with the ML-trained system and the sum-rule-based fusion in Fig. 1. As can be seen from the figure, the MARS system has lower false accept rates (FAR) over a broad range of values than the ML-trained system, for the same true accept rates (TAR). The sum-rule performance is also quite good and comparable to the ML LLRT; similar observations were reported in [4].
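For reference, the min-max sum-rule baseline and the TAR-at-fixed-FAR evaluation can be sketched as follows (our illustration; the min/max statistics would come from the training scores):

```python
# Our sketch of the min-max sum-rule fusion and the TAR/FAR operating point.
import numpy as np

def minmax_sum_rule(scores, lo, hi):
    """scores: (n, K) matcher scores; lo/hi: per-modality train-set min/max."""
    normalized = (scores - lo) / (hi - lo)   # each modality mapped to (0, 1)
    return normalized.mean(axis=1)           # equal-weight combination

def tar_at_far(genuine_llr, imposter_llr, far_target=1e-4):
    """TAR at the threshold delta giving the target FAR (1e-4 = 0.01%)."""
    delta = np.quantile(imposter_llr, 1.0 - far_target)
    return (genuine_llr >= delta).mean()
```

Sweeping delta over the range of fused scores traces out the full ROC curve.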

Table 1. True accept rate (TAR, %) at a FAR of 0.01% using the MARS LLRT, the ML LLRT and the sum-rule-based fusion.

MARS LLRT    ML LLRT    Sum Rule Fusion
99.42        -          99.00

We present the true accept rate (TAR) of the proposed MARS-trained LLRT, the ML LLRT and the sum-rule-based fusion at a false accept rate (FAR) of 0.01% in Table 1. From the table, we note that the proposed MARS LLRT compares favorably with the other results. We also note that the ML LLRT is a very competitive fusion system, as reported in [1], and any additional gain over this competitive system is a useful result.

[Figure 1. ROC curves (true accept rate % vs. false accept rate %, log scale) of the MARS LLRT, the ML LLRT and the mean score (sum rule after score normalization) based fusion techniques.]

4. Conclusions

A Gaussian mixture model (GMM) and likelihood ratio test (LRT) based fusion technique was proposed in [1] and shown to outperform several other techniques for the verification task. We have extended that work by proposing a discriminative training technique (MARS) to estimate the GMM mean parameters. The experimental results on the NIST-BSSR1 multimodal dataset indicate that the discriminatively trained (MARS) GMM-based LLRT can provide additional gains over a very competitive maximum likelihood (ML) trained GMM-based LLRT. Our future work will focus on comparing the MARS-LLRT and the ML-LLRT based verification systems on a much larger training/test set than what was available with the NIST-BSSR1 dataset.

References

[1] K. Nandakumar, Y. Chen, S. C. Dass, and A. K. Jain, "Likelihood ratio-based biometric score fusion", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 30, No. 2, Feb 2008.
[2] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On Combining Classifiers", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 20, No. 3, pp. 226-239, March 1998.
[3] S. Prabhakar and A. K. Jain, "Decision level fusion in fingerprint verification", Pattern Recognition, Vol. 35, No. 4, pp. 861-874, April 2002.
[4] R. Snelick, U. Uludag, A. Mink, M. Indovina, and A. Jain, "Large-scale evaluation of multimodal biometric authentication using state-of-the-art systems," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 27, No. 3, March 2005.
[5] C. M. Bishop and J. Lasserre, "Generative or Discriminative? Getting the best of both worlds", in Bayesian Statistics 8, pp. 3-24, J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith, and M. West (Eds.), Oxford University Press, 2007.
[6] H. Jiang, X. Li, and C. Liu, "Large Margin Hidden Markov Models for speech recognition", IEEE Trans. on Audio, Speech and Language Processing, Vol. 14, No. 5, 2006.
[7] D. Povey and P. C. Woodland, "Minimum phone error and I-smoothing for improved discriminative training", in Proc. of IEEE ICASSP, 2002.
[8] V. Tyagi, "Maximum accept and reject (MARS) training of the HMM-GMM speech recognition systems," in Proc. of Interspeech, Brisbane, 2008.
[9] A. Y. Ng and M. I. Jordan, "On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes," in Proc. of NIPS 14, 2001.
[10] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm", Journal of the Royal Statistical Society (B), Vol. 39, No. 1, pp. 1-38, 1977.
[11] National Institute of Standards and Technology, NIST Biometric Scores Set - Release 1, http://www.itl.nist.gov/iad/894.03/biometricscores, 2004.
