$$\hat{W} = \arg\max_{W} P(W \mid X) = \arg\max_{W} \frac{P(W)\,P(X \mid W)}{P(X)}$$
Cost of Developing a New Language
• Transcribed audio data
  ▫ Subspace acoustic models (UBM’s) need less data
• Text data for language modeling
  ▫ Obtain from the web if possible
• Pronunciation Lexicon
  ▫ Qualified phoneticians are expensive
  ▫ Phoneticians may make mistakes
  ▫ Conversational (Callhome) English has a 4.6% OOV rate for a 5K lexicon and 0.4% for a 62K lexicon
  ▫ Try to guess pronunciation given a limited lexicon and audio
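The OOV figures above can be illustrated with a minimal sketch. It assumes the rate is counted over running words (tokens) rather than distinct word types, which the slide does not state; the function name `oov_rate` and the toy sentence and lexicon are made up for illustration.

```python
def oov_rate(tokens, lexicon):
    """Fraction of running-word tokens that are missing from the lexicon."""
    if not tokens:
        return 0.0
    known = set(lexicon)
    missing = sum(1 for token in tokens if token not in known)
    return missing / len(tokens)


# Toy data, not taken from Callhome.
tokens = "i mean you know the uh recognizer thing".split()
lexicon = {"i", "mean", "you", "know", "the"}
print(f"OOV rate: {100 * oov_rate(tokens, lexicon):.1f}%")
```

Counting over word types instead of tokens would give a different number, so the convention should be fixed before comparing lexicon sizes.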
Estimating Pronunciations
• Ideally, we would simply estimate, for each word, the pronunciations that maximize the likelihood of the audio:
$$\widehat{\mathrm{Prn}} = \arg\max_{\mathrm{Prn}} P(X \mid \mathrm{Prn})$$
• Some words have no spoken audio available but still need to exist in the recognizer.
• Multiple pronunciations have not yet significantly improved performance.
• This objective function needs a lot of regularization.
Estimating Pronunciation from Graphemes
• One way is to guess the pronunciation from the orthography of the word (e.g. Bisani & Ney):
$$\widehat{\mathrm{Prn}} = \arg\max_{\mathrm{Prn}} P(W, \mathrm{Prn})$$
• Iterative process based on grapheme/phoneme alignment
  ▫ Start with an initial set of graphone probabilities.
  ▫ Use the probabilities to realign graphones with phones on training data.
  ▫ Re-estimate graphone probabilities from the alignments.
[Figure: graphone alignment of the letters "a t t e n t i o n" with the phones "x t e n S x n"]
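The align-and-re-estimate loop can be sketched as follows. This is a simplified stand-in, not the Bisani & Ney model: it uses context-free (unigram) graphones and hard Viterbi alignments instead of full EM over n-gram graphone models. The allowed graphone shapes, the initial scores in `INITIAL`, and the `FLOOR` value are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Assumed graphone shapes (letters consumed, phones consumed) with made-up
# initial log scores that favor one-letter/one-phone units.
INITIAL = {(1, 1): -1.0, (2, 1): -3.0, (1, 0): -3.0, (0, 1): -3.0}
FLOOR = -20.0  # assumed log score for graphones unseen in the previous pass


def best_alignment(letters, phones, logp=None):
    """Return the highest-scoring graphone segmentation of one word."""
    n, m = len(letters), len(phones)
    score = {(0, 0): 0.0}
    back = {}
    for i in range(n + 1):
        for j in range(m + 1):
            if (i, j) not in score:
                continue
            for (dl, dp), init in INITIAL.items():
                ni, nj = i + dl, j + dp
                if ni > n or nj > m:
                    continue
                unit = (tuple(letters[i:ni]), tuple(phones[j:nj]))
                step = init if logp is None else logp.get(unit, FLOOR)
                total = score[(i, j)] + step
                if total > score.get((ni, nj), -math.inf):
                    score[(ni, nj)] = total
                    back[(ni, nj)] = ((i, j), unit)
    # Trace back from the end of the word to recover the graphone sequence.
    units, node = [], (n, m)
    while node != (0, 0):
        node, unit = back[node]
        units.append(unit)
    return units[::-1]


def train_graphones(lexicon, iterations=5):
    """Alternate realignment and relative-frequency re-estimation."""
    logp = None  # first pass uses the assumed INITIAL scores
    for _ in range(iterations):
        counts = Counter()
        for word, phones in lexicon:
            counts.update(best_alignment(list(word), phones, logp))
        total = sum(counts.values())
        logp = {unit: math.log(c / total) for unit, c in counts.items()}
    return logp


toy_lexicon = [("at", ["a", "t"]), ("tat", ["t", "a", "t"])]
model = train_graphones(toy_lexicon)
print(best_alignment(list("attention"), ["x", "t", "e", "n", "S", "x", "n"]))
```

A real system would keep soft counts and condition each graphone on its predecessors; the hard-alignment version is only meant to make the two alternating steps concrete.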
Training a Pronunciation Dictionary
[Diagram. Training: Initial Pronunciation Dictionary; G2P Training; Model for Predicting P from G. Prediction: Out of Vocabulary Words; G2P; Predicted Phoneme Sequence]
G2P Plot for English
[Figure: % error (string errors and symbol errors) versus model context size, 1 to 6]
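The two error measures in the plot can be computed as in the sketch below. The slides do not define them, so this assumes the usual conventions: string error is the share of words whose predicted pronunciation differs from the reference at all, and symbol error is the Levenshtein distance summed over words and divided by the number of reference phonemes. The function names and the toy data are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def g2p_error_rates(references, hypotheses):
    """Return (% string errors, % symbol errors) over paired pronunciations."""
    pairs = list(zip(references, hypotheses))
    wrong_words = sum(ref != hyp for ref, hyp in pairs)
    wrong_symbols = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_symbols = sum(len(ref) for ref in references)
    return (100.0 * wrong_words / len(references),
            100.0 * wrong_symbols / total_symbols)


# Toy data: one correct word, one word with a substitution and an insertion.
refs = [["k", "a", "t"], ["d", "o", "g"]]
hyps = [["k", "a", "t"], ["d", "a", "g", "z"]]
string_err, symbol_err = g2p_error_rates(refs, hyps)
print(string_err, symbol_err)
```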
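Estimating Pronunciations…
• If the audio recording is also available, it can be used to augment the estimates:
$$\widehat{\mathrm{Prn}} = \arg\max_{\mathrm{Prn}} P(X \mid \mathrm{Prn})\, P(\mathrm{Prn} \mid W)$$
• We use an approximation to the above:
$$\widehat{\mathrm{Prn}} = \arg\max_{\mathrm{Prn} \in \{\text{Top 5 Prn}\}} P(X \mid \mathrm{Prn})$$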
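The top-5 approximation amounts to a two-stage selection, sketched here under assumptions: the g2p model supplies scored candidates, and a callable (here the hypothetical `acoustic_loglik`) returns the acoustic log-likelihood of the audio given a candidate, for instance from forced alignment. The candidate pronunciations and all scores are invented.

```python
def pick_pronunciation(candidates, acoustic_loglik, top_n=5):
    """Pick the acoustically best pronunciation among the top-N g2p candidates.

    candidates: list of (pronunciation, g2p_score) pairs, higher score is better.
    acoustic_loglik: callable mapping a pronunciation to log P(X | Prn).
    """
    shortlist = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_n]
    return max((prn for prn, _ in shortlist), key=acoustic_loglik)


# Invented scores for illustration only.
candidates = [
    (("t", "ah", "m", "ey", "t", "ow"), -1.0),
    (("t", "ah", "m", "aa", "t", "ow"), -1.5),
    (("t", "m", "t"), -9.0),
]
fake_loglik = {
    ("t", "ah", "m", "ey", "t", "ow"): -120.0,
    ("t", "ah", "m", "aa", "t", "ow"): -110.0,
    ("t", "m", "t"): -50.0,
}
best = pick_pronunciation(candidates, fake_loglik.get, top_n=2)
print(best)
```

Restricting the search to the g2p shortlist acts as a crude form of the regularization the objective needs: an acoustically attractive but orthographically implausible candidate, like the third one here, never gets considered.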
Estimating Pronunciations…
Start with a handmade phone set and a dictionary
Train g2p with dictionary
Train acoustic models
Force align the training data with multiple pronunciations
Create new dictionary with selected pronunciation
Estimating Pronunciations…
Start with a handmade phone set and a dictionary
Pick pronunciations
Match with word level transcripts
Free Phonetic Recognition
Train g2p with dictionary
Train acoustic models
Force align the training data with multiple pronunciations
Create new dictionary with selected pronunciation
Estimating Pronunciations…
Pick pronunciations
Match with word level transcripts
Free Phonetic Recognition
Start with a handmade phone set and a dictionary
Train g2p with dictionary
Train acoustic models
Force align the training data with multiple pronunciations
Create new dictionary with selected pronunciation
Introduce new pronunciation from unsupervised learning
Force align and create pronunciations
Pick words with high confidence
Create lattices on similar acoustic datasets
Training Procedure – Bootstrapping
[Diagram boxes: 1000 most frequently occurring words of training data; Remaining training data; Words used for building LM which covers the test data; Train g2p; Trained g2p model; Multiple pronunciation dictionary; Dictionary]
• Callhome training lexicon size – 5 K
• LM vocabulary size – 62 K
• Training acoustic data without partial words – 6 hrs
• Complete training data – 15 hrs
Training Procedure – Bootstrapping
[Diagram boxes: 1000 most frequently occurring words; Training data; Words used for building LM which covers the test data; Test data; Train g2p; Trained g2p model; Train acoustic model; Trained acoustic model; Multiple pronunciation dictionary; Dictionary; Recognition]
Training Procedure – Building Up
[Diagram boxes: Multiple pronunciation dictionary; Train data; Acoustic models from previous iteration; Force alignment; Best pronunciation for training words]
Training Procedure – Building Up
[Diagram boxes: Multiple pronunciation dictionary; Train data; Acoustic models from previous iteration; Force alignment; Best pronunciation for training words; Words used for building the LM which covers the test data; Train g2p; Dictionary; Test data; Acoustic model; Recognition]
Training Procedure – Building Up
Start with a handmade phone set and a dictionary
Train g2p with dictionary
Train acoustic models
Force align the training data with multiple pronunciations
Create new dictionary with selected pronunciation
Results

| Results | % |
|---|---|
| Accuracy with full dictionary available | 44.35 |
| Accuracy if 5K manual lexicon is available | 40.53 |
| Accuracy with 1000 words available | 37.58 |
| After retraining acoustic models | 39.37 |
| 2nd iteration of g2p & acoustic re-train | 41.60 |
| 3rd iteration of g2p & acoustic re-train | 42.11 |
| After increasing the amount of data to 15 hrs | 43.56 |
Unsupervised Learning
Start with a handmade phone set and a dictionary
Train g2p with dictionary
Train acoustic models
Force align the training data with multiple pronunciations
Create new dictionary with selected pronunciation
Introduce new pronunciation from unsupervised learning
Force align and create pronunciations
Pick words with high confidence
Create lattices on similar acoustic datasets
Unsupervised Lexicon Learning Results

| Training data | Baseline accuracy | After Unsupervised Learning |
|---|---|---|
| 6 Hrs of training data | 42.11 | 42.33 |
| 15 Hrs of training data | 43.56 | 43.44 |
WER dilemma for Spanish Callhome
• Spanish pronunciation is very graphemic
• Accuracy for Spanish is about 31.13% (about 13% absolute lower than Callhome English)
• Phone recognition accuracy is better than for Callhome English (English: 45.13%, Spanish: 53.77%)
• LM perplexity is not too bad: 127
• Can learning alternate pronunciations of reduced words help?
Possible lexicon training paths…
Pick pronunciations
Match with word level transcripts
Free Phonetic Recognition
Start with a handmade phone set and a dictionary
Train g2p with dictionary
Train acoustic models
Force align the training data with multiple pronunciations
Create new dictionary with selected pronunciation
Introduce new pronunciation from unsupervised learning
Force align and create pronunciations
Pick words with high confidence
Create lattices on similar acoustic datasets
Lexicon Enhancement for Spanish
[Figure: G2P accuracies after augmenting with phone recognition based pronunciations; axis: % string errors]
English and Spanish results with the unconstrained phonetic recognition approach:

| | Baseline | After adding pronunciations |
|---|---|---|
| Spanish | 31.13 | 30.71 |
| English | 43.54 | 42.71 |

• Log likelihood of training data increases with the new lexicon.
Lexicon Enhancement
• Keep the manual lexicon but augment it with the most likely pronunciation in the training data
• Affected about 250 pronunciations
• Accuracy improved from 44.33 to 45.01%
• Multiple pronunciations had no significant impact: 45.02%
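One way to read the augmentation step is sketched below. It assumes that "most likely pronunciation in the training data" means the variant chosen most often for a word during forced alignment, which the slide does not spell out. The function name `augment_lexicon` and the toy entries are hypothetical.

```python
from collections import Counter


def augment_lexicon(manual, aligned):
    """Add each word's most frequent aligned pronunciation to a manual lexicon.

    manual: dict mapping word -> list of pronunciations (tuples of phones).
    aligned: iterable of (word, pronunciation) pairs picked by forced alignment.
    Returns the augmented lexicon and the number of entries that were added.
    """
    counts = {}
    for word, prn in aligned:
        counts.setdefault(word, Counter())[prn] += 1
    augmented = {word: list(prns) for word, prns in manual.items()}
    added = 0
    for word, word_counts in counts.items():
        best = word_counts.most_common(1)[0][0]
        if best not in augmented.setdefault(word, []):
            augmented[word].append(best)
            added += 1
    return augmented, added


# Toy data: the reduced form of "and" wins in the alignments.
manual = {"and": [("ae", "n", "d")]}
aligned = [("and", ("ae", "n")), ("and", ("ae", "n")), ("and", ("ae", "n", "d"))]
augmented, added = augment_lexicon(manual, aligned)
print(augmented, added)
```

The manual entries are never removed, which matches the slide's "keep the manual lexicon" wording.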
Summary
• The g2p-based lexicon retraining method helps achieve accuracies close to those of handmade lexicons
• It can also help improve an existing lexicon
• The unsupervised lexicon learning approach and the phonetic-recognition-based lexicon learning approach hold promise and need to be explored with a wider variety of smoothing and pronunciation extraction scenarios
Training Procedure
• Train g2p to generate pronunciations using your best baseline lexicon
• Generate multiple pronunciations using the g2p
• Use the training data to select the best pronunciation out of these multiple choices
• Retrain the acoustic models and iterate over the above process
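The four steps above can be sketched as a loop. Everything concrete here is an assumption: the function `retrain_lexicon`, its callback parameters, and the toy stand-ins are hypothetical placeholders for a real g2p trainer, acoustic trainer, and forced aligner, and the sketch trains an initial acoustic model on the starting lexicon before the first pass.

```python
def retrain_lexicon(lexicon, vocabulary, train_g2p, generate, train_acoustic,
                    select_best, iterations=3, n_best=5):
    """Iterate g2p training, candidate generation, selection, and retraining."""
    acoustic = train_acoustic(lexicon)
    for _ in range(iterations):
        g2p = train_g2p(lexicon)                      # step 1
        lexicon = {
            word: select_best(acoustic, word,         # step 3
                              generate(g2p, word, n_best))  # step 2
            for word in vocabulary
        }
        acoustic = train_acoustic(lexicon)            # step 4
    return lexicon, acoustic


# Toy stand-ins so the sketch runs; none of them model real g2p or acoustics.
def toy_train_g2p(lexicon):
    return dict(lexicon)


def toy_generate(g2p, word, n_best):
    spelled = tuple(word)
    return [g2p.get(word, spelled), spelled[::-1]][:n_best]


def toy_train_acoustic(lexicon):
    return len(lexicon)


def toy_select_best(acoustic, word, candidates):
    return candidates[0]


lexicon, acoustic = retrain_lexicon(
    {"cat": ("k", "ae", "t")}, ["cat", "dog"],
    toy_train_g2p, toy_generate, toy_train_acoustic, toy_select_best)
print(lexicon, acoustic)
```

Passing the components in as callables keeps the loop independent of any particular toolkit; in practice `select_best` would be the forced-alignment step from the slides.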