Minimum Hypothesis Phone Error as a Decoding Method for Speech Recognition

Haihua Xu, Daniel Povey

August 28, 2009

Maximum a Posteriori (MAP)

The standard decoding formula normally used in speech recognition is Maximum A Posteriori (MAP), as follows:

\[ W^* = \arg\max_W P(W \mid O) = \arg\max_W P(W)\, p(O \mid W) \]

Known limitation

This criterion selects the hypothesis that minimizes the expected sentence error; however, recognition systems are usually evaluated on Word Error Rate (WER), not sentence error. The Minimum Bayes Risk (MBR) criterion is a natural way to address this mismatch.

Minimum Bayes Risk (MBR)

\[ W^* = \arg\min_{W_i} \sum_{j=1}^{N} P(W_j \mid O)\, E(W_i \mid W_j) \]

where $E(W_i \mid W_j)$ is the number of errors (Levenshtein distance) given $W_i$ as a reference.
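The MBR rule can be illustrated on a small N-best list. The following is a minimal sketch, not the authors' implementation: the function names `levenshtein` and `mbr_decode`, the toy hypotheses, and their posteriors are all assumptions made for illustration, and the posteriors are assumed to be already normalized over the list.

```python
from typing import List, Sequence, Tuple


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance (substitutions, insertions, deletions) between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def mbr_decode(nbest: List[Tuple[List[str], float]]) -> List[str]:
    """Return the hypothesis W_i with the lowest expected error sum_j P(W_j|O) E(W_i|W_j).

    nbest holds (word sequence, posterior) pairs; posteriors are assumed to sum to 1.
    """
    def risk(w_i: List[str]) -> float:
        return sum(p * levenshtein(w_i, w_j) for w_j, p in nbest)

    return min((w for w, _ in nbest), key=risk)


# Hypothetical 3-best list: the MAP choice is "a b c" (posterior 0.4),
# but "a b d" has the lowest expected number of word errors.
nbest = [("a b c".split(), 0.4), ("a b d".split(), 0.3), ("a e d".split(), 0.3)]
print(mbr_decode(nbest))
```

In this toy list the expected errors are 0.9, 0.7 and 1.1 respectively, so the MBR choice differs from the MAP choice even though only three hypotheses are involved.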

Problems

Computing this criterion directly over a subspace of W (generally represented as a word graph or lattice) is prohibitively expensive, so many approximate strategies have been attempted.

Known approaches to WER minimization

• N-best sentence list based decoding scheme, proposed by A. Stolcke et al.
• Consensus network, proposed by L. Mangu et al.
• Time-frame word error, proposed by F. Wessel et al.

MPE/MWE as a criterion for lattice rescoring

We approximate MBR by using the Minimum Phone Error (MPE) discriminative training criterion as the decoding criterion:

\[ W^* = \arg\max_W \sum_{W'} P^{\kappa}(W' \mid O)\, \mathrm{Acc}(W' \mid W) \]

where the $\arg\max_W$ is taken over an N-best list that we derive from the decoding lattice, and so is the sum over $W'$. In other words, we find the hypothesis $W^*$ that maximizes the objective criterion.
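The rescoring rule can be sketched over an N-best list as follows. This is a hedged illustration only: the slides do not define how Acc or the scaled posterior is computed, so the sketch assumes Acc(W'|W) = len(W) minus the edit distance between W and W', and assumes the scaled posterior is a softmax over combined log scores multiplied by κ. The names `accuracy`, `scaled_posteriors`, `mhpe_rescore` and the toy scores are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token (for example phone) sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def accuracy(w_prime: Sequence[str], w: Sequence[str]) -> float:
    """Assumed stand-in for Acc(W'|W): tokens of reference W minus edit errors in W'."""
    return len(w) - edit_distance(w, w_prime)


def scaled_posteriors(log_scores: List[float], kappa: float) -> List[float]:
    """Assumed form of P^kappa(W'|O): softmax of kappa-scaled log scores over the list."""
    scaled = [kappa * s for s in log_scores]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mhpe_rescore(nbest: List[Tuple[List[str], float]], kappa: float) -> List[str]:
    """Pick W maximizing sum over W' of P^kappa(W'|O) * Acc(W'|W), both over the N-best list."""
    hyps = [h for h, _ in nbest]
    post = scaled_posteriors([s for _, s in nbest], kappa)

    def objective(w: List[str]) -> float:
        return sum(p * accuracy(w_prime, w) for w_prime, p in zip(hyps, post))

    return max(hyps, key=objective)


# Hypothetical N-best list with combined log scores.
nbest = [("a b c".split(), math.log(0.4)),
         ("a b d".split(), math.log(0.3)),
         ("a e d".split(), math.log(0.3))]
print(mhpe_rescore(nbest, kappa=1.0))
print(mhpe_rescore(nbest, kappa=10.0))
```

With a small κ the posterior mass stays spread across the list and the hypothesis closest to the others wins; with a large κ the mass concentrates on the top-scoring hypothesis and the result falls back to the MAP choice.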

Advantages of the proposed method

• Explicitly optimizes the objective criterion, the correctness of which has been shown by MPE/MWE discriminative training.
• Conceptually simple and clear, as a simplified forward-backward algorithm is performed on the decoding lattice.
• Very flexible in how the accuracy criterion Acc(W′|W) is calculated: phone error, time-frame phone error, time-frame word error, and word error criteria can all be implemented under the same framework.

How large N for a desired W∗? (1)

Figure: WER versus N on MSRA data (trained with MPE). [Plot not reproduced. Vertical axis: WER (%) of the hypothetical reference, about 25.8 to 26.4. Horizontal axis: N-best sentence number (N), 0 to 100. Curves: MAP and MHPE.]

How large N for a desired W∗? (2)

Figure: WER versus N on broadcast data (trained with MAP+MPE). [Plot not reproduced. Vertical axis: WER (%) of the hypothetical reference, about 32.5 to 33.5. Horizontal axis: N-best sentence number (N), 0 to 100. Curves: MAP and MHPE.]

How large N for a desired W ∗ ? (3)

As the figures illustrate, the desired WER is reached when N is between 20 and 40. Therefore, with a very limited N, we can approximate WER minimization. Similar experiments have also been performed on English test data, and we reach the same conclusions.

Experimental results

Recognition system (A) trained with the MLE criterion

Table: Baseline vs. MHPE on MSRA test data.

MSRA (MLE, N=40)

Methods   #ins   #sub   #del   SER      WER
Base      24     2333   171    95.40%   26.41%
MHPE      51     2319   107    95.40%   25.88%
∆         +27    -14    -64    0.00%    -0.53%
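The WER column follows the standard definition: insertions plus substitutions plus deletions, divided by the number of reference words. The sketch below is only a worked check of the arithmetic on the Base and MHPE error counts from the MSRA (MLE) table. The reference word count is not given in the slides, so `wer` takes it as a parameter, and the function names are hypothetical.

```python
from typing import Tuple

Counts = Tuple[int, int, int]  # (#ins, #sub, #del)


def total_errors(counts: Counts) -> int:
    """Total word errors: insertions + substitutions + deletions."""
    ins, sub, dele = counts
    return ins + sub + dele


def wer(counts: Counts, n_ref_words: int) -> float:
    """Word error rate given the number of reference words (not stated in the slides)."""
    return total_errors(counts) / n_ref_words


def relative_error_reduction(base: Counts, new: Counts) -> float:
    """Relative reduction in total errors; independent of the reference word count."""
    return (total_errors(base) - total_errors(new)) / total_errors(base)


# Error counts taken from the MSRA (MLE, N=40) table.
base = (24, 2333, 171)
mhpe = (51, 2319, 107)

print(total_errors(base), total_errors(mhpe))
print(f"{100 * relative_error_reduction(base, mhpe):.2f}% relative reduction")
```

The counts also show where the gain comes from: MHPE adds insertions but removes more deletions, so the total number of errors falls.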

Experimental results

Recognition system (B) trained with the MAP criterion

Table: Baseline vs. MHPE on BDC test data.

BDC (MAP, N=40)

Methods   #ins   #sub   #del   SER      WER
Base      114    7065   963    98.02%   33.83%
MHPE      187    7014   651    98.18%   32.62%
∆         +73    -51    -312   +0.16%   -1.21%

Experimental results

Recognition system (C) trained with the MPE criterion

Table: Baseline versus MHPE on MSRA test data.

MSRA (MPE, N=40)

Methods   #ins   #sub   #del   SER      WER
Base      26     2074   183    94.60%   23.85%
MHPE      47     2035   108    93.40%   22.90%
∆         +21    -39    -75    -1.20%   -0.95%

Experimental results

Recognition system (D) trained with MAP+MPE criterion

Table: Baseline versus MHPE on BDC test data.

BDC (MPE, N=40)

Methods   #ins   #sub   #del   SER      WER
Base      109    6296   959    96.45%   30.60%
MHPE      178    6254   568    96.20%   29.08%
∆         +69    -42    -391   -0.25%   -1.52%

Experimental results

MHPE versus Consensus Network on lattice decoding

Table: MHPE versus CN

Test sets    Base     CN       MHPE
MSRA(MLE)    26.41%   25.92%   25.88%
BDC(MAP)     33.83%   32.80%   32.62%
MSRA(MPE)    23.85%   23.42%   22.90%
BDC(MPE)     30.60%   29.41%   29.08%

Conclusions and future work

We have introduced a new decoding method for lattice rescoring that aims to get closer to the Minimum Bayes Risk decision rule with respect to the Word Error Rate. Future work will focus on:
• A full comparison of other criteria for implementing Acc(Wi, W).
• More sophisticated approaches to replace the N-best sentence list.
• New system combination schemes based on the proposed criterion, not just Confusion Network Combination.

Thanks!
