STRUCTURED LANGUAGE MODELING FOR SPEECH RECOGNITION ERRATA†

Ciprian Chelba and Frederick Jelinek

Abstract

We present revised Wall Street Journal (WSJ) lattice rescoring experiments using the structured language model (SLM).

1 Experiments

We repeated the WSJ lattice rescoring experiments reported in [1] in a standard setup. We chose to work on the DARPA'93 evaluation HUB1 test set: 213 utterances, 3446 words. The 20kwds open vocabulary and the baseline 3-gram model are the standard ones provided by NIST.

As a first step we evaluated the perplexity performance of the SLM relative to that of a deleted interpolation 3-gram model trained under the same conditions: training data size of 20Mwds (a subset of the training data used for the baseline 3-gram model) and the standard HUB1 open vocabulary of size 20kwds; both the training data and the vocabulary were re-tokenized to conform to the UPenn Treebank tokenization. We linearly interpolated the SLM with the above 3-gram model:

P(\cdot) = \lambda \cdot P_{3gram}(\cdot) + (1 - \lambda) \cdot P_{SLM}(\cdot)

The interpolated model yields a 10% relative reduction in perplexity over the 3-gram model. The results are presented in Table 1. The SLM parameter reestimation procedure¹ reduces the PPL by 5% (about 2% after interpolation with the 3-gram model). The main reduction in PPL comes, however, from the interpolation with the 3-gram model, showing that although the two models overlap, they successfully complement each other. The interpolation weight was determined on a held-out set to be λ = 0.4. Both language models operate in the UPenn Treebank text tokenization.
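For concreteness, the interpolation and the perplexity computation can be sketched as follows. This is only an illustrative sketch, not the code used in the experiments: p_3gram and p_slm are hypothetical placeholders (here uniform distributions over the 20kwds vocabulary) standing in for the baseline 3-gram model and the SLM.

import math

# Toy stand-ins for the two language models; each returns P(word | history).
# The actual baseline 3-gram and SLM are not reproduced here -- these uniform
# placeholders only make the interpolation and the PPL computation runnable.
VOCAB_SIZE = 20000  # standard HUB1 open vocabulary (20kwds)

def p_3gram(word, history):
    return 1.0 / VOCAB_SIZE  # placeholder for the deleted interpolation 3-gram

def p_slm(word, history):
    return 1.0 / VOCAB_SIZE  # placeholder for the SLM

def p_interp(word, history, lam=0.4):
    # P(w | h) = lambda * P_3gram(w | h) + (1 - lambda) * P_SLM(w | h)
    return lam * p_3gram(word, history) + (1.0 - lam) * p_slm(word, history)

def perplexity(sentences, lam=0.4):
    # PPL = exp(-(1/N) * sum_i log P(w_i | h_i)), counting </s> as a word
    log_prob, n_words = 0.0, 0
    for sentence in sentences:
        history = []
        for word in sentence + ["</s>"]:
            log_prob += math.log(p_interp(word, history, lam))
            n_words += 1
            history.append(word)
    return math.exp(-log_prob / n_words)

if __name__ == "__main__":
    # With the uniform placeholders the PPL is simply the vocabulary size.
    print(perplexity([["the", "market", "fell", "today"]], lam=0.4))

Setting lam to 0.0 or 1.0 recovers the pure SLM or the pure 3-gram model, respectively, corresponding to the first and last columns of Table 1.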


Trigram(20Mwds) + SLM                 λ = 0.0   λ = 0.4   λ = 1.0
PPL, initial SLM, iteration 0             152       136       148
PPL, reestimated SLM, iteration 1         144       133       148

Table 1: Test Set Perplexity Results

A second batch of experiments evaluated the performance of the SLM for 3-gram² lattice decoding. The lattices were generated using the standard baseline 3-gram language model trained on 40Mwds and the standard 20kwds open vocabulary. The best achievable (oracle) WER on these lattices was measured to be 3.3%, leaving a large margin for improvement over the 13.7% baseline WER. For the lattice rescoring experiments we adjusted the operation of the SLM so that it assigns probability to word sequences in the CSR tokenization, which makes the interpolation between the SLM and the baseline 3-gram model valid. The results are presented in Table 2; a simplified sketch of the rescoring step follows the table. The SLM achieved an absolute improvement in WER of 0.8% (5% relative) over the baseline, despite using half the amount of training data used by the baseline 3-gram model. Reestimating the SLM (iteration 1) does not yield an improvement in WER when interpolating with the 3-gram model, although it improves the performance of the SLM by itself.

Lattice Trigram(40Mwds) + SLM         λ = 0.0   λ = 0.4   λ = 1.0
WER, initial SLM, iteration 0            14.5      12.9      13.7
WER, reestimated SLM, iteration 1        14.2      13.2      13.7

Table 2: Test Set Word Error Rate Results
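The rescoring and evaluation steps can be illustrated with the sketch below. It is a simplification under stated assumptions: it rescores an N-best list of hypotheses rather than searching the lattice directly, the language model weight lm_weight = 16.0 is an illustrative value rather than the setting used in the experiments, and the uniform placeholder models play the role of the baseline 3-gram and the SLM, as in the earlier sketch.

import math

def interp_logprob(words, p_3gram, p_slm, lam=0.4):
    # Sentence log-probability under the word-level interpolation
    # P(w | h) = lambda * P_3gram(w | h) + (1 - lambda) * P_SLM(w | h).
    logp, history = 0.0, []
    for w in words + ["</s>"]:
        logp += math.log(lam * p_3gram(w, history) + (1.0 - lam) * p_slm(w, history))
        history.append(w)
    return logp

def rescore_nbest(nbest, p_3gram, p_slm, lm_weight=16.0, lam=0.4):
    # Each hypothesis is (words, acoustic_logprob); return the word sequence
    # maximizing the combined acoustic + weighted language model score.
    def total_score(hyp):
        words, am_logprob = hyp
        return am_logprob + lm_weight * interp_logprob(words, p_3gram, p_slm, lam)
    return max(nbest, key=total_score)[0]

def wer(hyp, ref):
    # Word error rate of a hypothesis against a reference transcription:
    # edit distance (substitutions + deletions + insertions) divided by |ref|.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),
                          d[i - 1][j] + 1,
                          d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

if __name__ == "__main__":
    uniform = lambda word, history: 1.0 / 20000   # placeholder model, as before
    ref = ["the", "market", "fell", "today"]
    nbest = [(["the", "market", "fell", "to", "day"], -120.0),
             (["the", "market", "fell", "today"], -121.0)]
    print(wer(rescore_nbest(nbest, uniform, uniform), ref))

The corpus-level figures in Table 2 correspond to the total number of errors divided by the total number of reference words over the 213 test utterances, rather than an average of per-utterance error rates.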

References

[1] Ciprian Chelba and Frederick Jelinek. Structured language modeling for speech recognition. In Proceedings of NLDB'99, Klagenfurt, Austria, 1999.

Footnotes

† This work was funded by the NSF grant IRI-9618874 (STIMULATE).
¹ Because the parameter reestimation procedure for the SLM is computationally expensive, we ran only a single iteration.
² In the previous experiments reported on WSJ we had accidentally used bigram lattices.
