Low Cost Lexicon

Nagendra Kumar Goel, Samuel Thomas, Pinar Akyazi

Speech Recognition

[Diagram: Speech → Feature Extraction → Features x → arg max p(W) p(x | W). The search draws on the Acoustic Model, the Lexicon or Pronunciation Dictionary (e.g. cat → k a t, ran → R a n), and the Language Model P(w_n | w_{n-1}, w_{n-2}).]

$$\hat{W} = \arg\max_{W} P(W \mid X) = \arg\max_{W} \frac{P(W)\, P(X \mid W)}{P(X)}$$
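The decision rule can be sketched in a few lines. This is only an illustration, not the recognizer's actual search: `decode` is a hypothetical helper, and the candidate list and all scores are invented. Since P(X) is the same for every candidate W, it drops out of the arg max.

```python
import math

def decode(candidates):
    """Return the hypothesis W that maximizes log P(W) + log P(X | W).

    `candidates` maps each hypothesis to a (lm_logprob, am_logprob) pair.
    P(X) is constant across hypotheses, so it is left out.
    """
    return max(candidates, key=lambda w: sum(candidates[w]))

# Invented scores for two competing hypotheses.
hypotheses = {
    "cat ran": (math.log(0.02), math.log(0.30)),
    "cat rang": (math.log(0.01), math.log(0.35)),
}
print(decode(hypotheses))
```

Working in the log domain avoids underflow when many small probabilities are multiplied.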

Cost of Developing a New Language
• Transcribed audio data
  ▫ Subspace acoustic models (UBMs) need less data
• Text data for language modeling
  ▫ Obtain from the web if possible
• Pronunciation lexicon
  ▫ Qualified phoneticians are expensive
  ▫ Phoneticians may make mistakes
  ▫ Conversational (Callhome) English has a 4.6% OOV rate for a 5K lexicon and 0.4% for a 62K lexicon
  ▫ Try to guess pronunciations given a limited lexicon and audio
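For reference, an OOV rate of the kind quoted above can be computed as follows. This is a minimal sketch that assumes the rate is measured over running words (tokens), which the slide does not state; `oov_rate` and the toy data are illustrative only.

```python
def oov_rate(tokens, lexicon):
    """Percentage of running-word tokens that are missing from the lexicon."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in lexicon)
    return 100.0 * missing / len(tokens)

# Toy example: two of the six tokens are not in the lexicon.
lexicon = {"the", "cat", "ran"}
tokens = "the cat ran to the vet".split()
print(f"{oov_rate(tokens, lexicon):.1f}% OOV")
```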

Estimating Pronunciations
• The ideal situation would be to simply estimate, for each word, the pronunciations that maximize the likelihood of the audio

$$\widehat{Prn} = \arg\max_{Prn} P(X \mid Prn)$$

• There are words for which spoken audio is not available, but they still need to exist in the recognizer
• Multiple pronunciations have not yet significantly improved performance
• This objective function needs a lot of regularization

Estimating Pronunciation from Graphemes
• One way is to guess the pronunciation from the orthography of the word (e.g. Bisani & Ney)

$$\widehat{Prn} = \arg\max_{Prn} P(W, Prn)$$

• Iterative process based on grapheme/phoneme alignment
  ▫ Start with an initial set of graphone probabilities.
  ▫ Use the probabilities to realign graphones with phones on training data.
  ▫ Re-estimate graphone probabilities from the alignments.

[Diagram: the graphemes of "attention" (a t t e n t I o n) aligned with the phones x t e n S x n]
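The align-then-re-estimate loop can be illustrated with a heavily simplified sketch. The assumptions are mine, not the slides': each grapheme emits exactly one phone or none, the pair probabilities are unigram, and re-estimation uses the single best alignment (hard EM) with a crude probability floor. Real joint-sequence models allow richer graphone units and use longer context. `align` and `reestimate` are hypothetical names.

```python
import math
from collections import Counter

EPS = ""  # marks a grapheme that emits no phone

def align(graphemes, phones, logp):
    """Viterbi alignment in which each grapheme emits one phone or none."""
    n, m = len(graphemes), len(phones)
    neg_inf = float("-inf")
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        g = graphemes[i - 1]
        for j in range(m + 1):
            # Option 1: g emits nothing, so the phone index stays at j.
            options = [(best[i - 1][j] + logp(g, EPS), j, EPS)]
            # Option 2: g emits phone number j.
            if j > 0:
                p = phones[j - 1]
                options.append((best[i - 1][j - 1] + logp(g, p), j - 1, p))
            best[i][j], prev_j, p = max(options, key=lambda o: o[0])
            back[i][j] = (prev_j, p)
    if best[n][m] == neg_inf:
        raise ValueError("this sketch cannot align more phones than graphemes")
    pairs, j = [], m
    for i in range(n, 0, -1):  # follow back-pointers from the end
        j_prev, p = back[i][j]
        pairs.append((graphemes[i - 1], p))
        j = j_prev
    return pairs[::-1]

def reestimate(lexicon, logp, floor=1e-6):
    """One hard-EM step: align every entry, then recount the pairs."""
    counts = Counter()
    for word, phones in lexicon:
        counts.update(align(list(word), phones, logp))
    total = sum(counts.values())
    probs = {pair: c / total for pair, c in counts.items()}
    # Unseen pairs get a small floor probability (an assumption).
    return lambda g, p: math.log(probs.get((g, p), floor))

lexicon = [("attention", "x t e n S x n".split()), ("cat", "k a t".split())]
logp = lambda g, p: 0.0  # flat start: every pair equally likely
for _ in range(3):
    logp = reestimate(lexicon, logp)
print(align(list("cat"), "k a t".split(), logp))
```

Two entries are far too few to learn sensible alignments; the point is only the shape of the loop.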

Training a Pronunciation Dictionary

[Diagram, two paths.
Training: Initial Pronunciation Dictionary → G2P Training → Model for Predicting P from G.
Prediction: Out of Vocabulary Words → G2P → Predicted Phoneme Sequence.]

G2P Plot for English

[Figure: % error (string errors and symbol errors) versus model context size, 1 to 6; vertical axis 0 to 100.]
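The two error measures in the plot can be computed as sketched below. The definitions are the usual ones and are assumed here, since the slide does not spell them out: a string error is a word whose predicted pronunciation differs from the reference in any way, and symbol errors are phone-level edit operations divided by the number of reference phones.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]

def g2p_error_rates(pairs):
    """Return (% string errors, % symbol errors) for (reference, hypothesis) pairs."""
    wrong_strings = sum(1 for ref, hyp in pairs if ref != hyp)
    symbol_errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    ref_symbols = sum(len(ref) for ref, _ in pairs)
    return 100.0 * wrong_strings / len(pairs), 100.0 * symbol_errors / ref_symbols

# Toy data: one exact match, one word with a single wrong phone.
pairs = [("k a t".split(), "k a t".split()),
         ("R a n".split(), "R e n".split())]
string_err, symbol_err = g2p_error_rates(pairs)
print(f"string errors {string_err:.1f}%, symbol errors {symbol_err:.1f}%")
```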

Estimating Pronunciations…
• If the audio recording is also available, it can be used to augment the estimates

$$\widehat{Prn} = \arg\max_{Prn} P(X \mid Prn)\, P(Prn \mid W)$$

• We use an approximation to the above

$$\widehat{Prn} = \arg\max_{Prn \in \{\text{Top 5 } Prn\}} P(X \mid Prn)$$
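A sketch of this approximation: shortlist the five best candidates, then let the acoustic likelihood alone choose among them. It assumes the shortlist comes from ranking candidates by their g2p score, and that log P(X | Prn) is available from some external scorer such as forced alignment. `select_pronunciation` and every number below are illustrative.

```python
def select_pronunciation(g2p_nbest, acoustic_loglik, top_n=5):
    """Pick the shortlisted pronunciation with the highest acoustic log-likelihood.

    g2p_nbest: list of (pronunciation, g2p_logprob) pairs, in any order.
    acoustic_loglik: function mapping a pronunciation to log P(X | Prn).
    """
    shortlist = sorted(g2p_nbest, key=lambda c: c[1], reverse=True)[:top_n]
    return max((prn for prn, _ in shortlist), key=acoustic_loglik)

# Invented candidates and scores for the word "cat".
nbest = [("k a t", -0.2), ("k e t", -1.9), ("k a", -3.0)]
acoustic = {"k a t": -120.0, "k e t": -118.5, "k a": -140.0}
print(select_pronunciation(nbest, acoustic.get))
```

The g2p score is used only to form the shortlist; within it, P(Prn | W) is treated as uniform.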

Estimating Pronunciations…

[Flowchart, built up over three slides]
Main loop: Start with a handmade phone set and a dictionary → Train g2p with dictionary → Train acoustic models → Force align the training data with multiple pronunciations → Create new dictionary with selected pronunciation
Phonetic recognition path (boxes): Pick pronunciations; Match with word level transcripts; Free Phonetic Recognition
Unsupervised path (boxes): Introduce new pronunciation from unsupervised learning; Force align and create pronunciations; Pick words with high confidence; Create lattices on similar acoustic datasets

Training Procedure – Bootstrapping

[Diagram, built up over two slides. Boxes: 1000 most frequently occurring words of training data; Remaining training data; Words used for building the LM which covers the test data; Train g2p; Trained g2p model; Multiple pronunciation dictionary; Dictionary; Train acoustic model; Trained acoustic model; Test data; Recognition.]

• Callhome training lexicon size: 5K
• LM vocabulary size: 62K
• Training acoustic data without partial words: 6 hrs
• Complete training data: 15 hrs
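Selecting the seed words for bootstrapping might look like the sketch below. It assumes whitespace-tokenized transcripts and a dict-style hand-made lexicon; `seed_lexicon` is a hypothetical helper, and the slides do not say how ties or words missing from the manual lexicon were handled.

```python
from collections import Counter

def seed_lexicon(transcripts, manual_lexicon, n_words=1000):
    """Keep hand-made pronunciations only for the n most frequent training words."""
    counts = Counter(word for line in transcripts for word in line.split())
    frequent = [word for word, _ in counts.most_common(n_words)]
    # Words without a manual entry are skipped (an assumption).
    return {w: manual_lexicon[w] for w in frequent if w in manual_lexicon}

# Toy example with n_words=2 instead of 1000.
transcripts = ["the cat ran", "the cat sat", "the dog"]
manual = {"the": "D x", "cat": "k a t", "ran": "R a n", "dog": "d o g"}
print(seed_lexicon(transcripts, manual, n_words=2))
```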

Training Procedure – Building Up

[Diagram, built up over two slides. Boxes: Multiple pronunciation dictionary; Train data; Acoustic models from previous iteration; Force alignment; Best pronunciation for training words; Words used for building the LM which covers the test data; Train g2p; Dictionary; Acoustic model; Test data; Recognition.]

Start with a handmade phone set and a dictionary → Train g2p with dictionary → Train acoustic models → Force align the training data with multiple pronunciations → Create new dictionary with selected pronunciation

Results

Condition                                        Accuracy (%)
Accuracy with full dictionary available          44.35
Accuracy if 5K manual lexicon is available       40.53
Accuracy with 1000 words available               37.58
After retraining acoustic models                 39.37
2nd iteration of g2p & acoustic re-train         41.60
3rd iteration of g2p & acoustic re-train         42.11
After increasing the amount of data to 15 hrs    43.56
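One way to read these numbers is as the share of the gap between the 1000-word starting point and the full dictionary that each step recovers. The short calculation below uses only the accuracies listed above; `gap_recovered` is an illustrative helper, not a metric used in the slides.

```python
def gap_recovered(start, current, ceiling):
    """Fraction of the gap between `start` and `ceiling` closed at `current`."""
    return (current - start) / (ceiling - start)

full, seed = 44.35, 37.58  # full dictionary vs. 1000 words available
steps = [("after retraining acoustic models", 39.37),
         ("2nd iteration", 41.60),
         ("3rd iteration", 42.11),
         ("15 hrs of data", 43.56)]
for label, accuracy in steps:
    share = 100 * gap_recovered(seed, accuracy, full)
    print(f"{label}: {share:.0f}% of the gap recovered")
```

With 15 hrs of data, roughly 88% of the gap to the full dictionary is closed.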

Unsupervised Learning

[Flowchart]
Main loop: Start with a handmade phone set and a dictionary → Train g2p with dictionary → Train acoustic models → Force align the training data with multiple pronunciations → Create new dictionary with selected pronunciation
Unsupervised path (boxes): Introduce new pronunciation from unsupervised learning; Force align and create pronunciations; Pick words with high confidence; Create lattices on similar acoustic datasets

Unsupervised Lexicon Learning Results

Training data    Baseline accuracy    After unsupervised learning
6 hrs            42.11                42.33
15 hrs           43.56                43.44

WER dilemma for Spanish Callhome
• Spanish pronunciation is very graphemic
• Accuracy for Spanish is about 31.13% (about 13% lower than Callhome English)
• Phone recognition accuracy is better than for Callhome English
  ▫ English: 45.13%
  ▫ Spanish: 53.77%
• LM perplexity is not too bad: 127
• Can learning alternate pronunciations of reduced words help?

Possible lexicon training paths…

[Flowchart]
Main loop: Start with a handmade phone set and a dictionary → Train g2p with dictionary → Train acoustic models → Force align the training data with multiple pronunciations → Create new dictionary with selected pronunciation
Phonetic recognition path (boxes): Pick pronunciations; Match with word level transcripts; Free Phonetic Recognition
Unsupervised path (boxes): Introduce new pronunciation from unsupervised learning; Force align and create pronunciations; Pick words with high confidence; Create lattices on similar acoustic datasets

Lexicon Enhancement for Spanish
G2P accuracies after augmenting with phone recognition based pronunciations

[Figure: % string errors on Dev and Eval versus iteration (model) #, 1 to 4; vertical axis 0 to 60.]

Lexicon Enhancement for Spanish

[Figure: G2P Plot for Spanish. % string errors on Dev and Eval versus iteration (model) #, 2 to 4; vertical axis 0 to 4.5.]

English and Spanish Results with unconstrained phonetic recognition approach

Language    Baseline    After adding pronunciations
Spanish     31.13       30.71
English     43.54       42.71

• Log likelihood of training data increases with the new lexicon.

Lexicon Enhancement
• Keep the manual lexicon but augment it with the most likely pronunciation in the training data
• Affected about 250 pronunciations
• Accuracy improved from 44.33% to 45.01%
• Multiple pronunciations had no significant impact: 45.02%

Summary
• The g2p-based lexicon retraining method helps in achieving accuracies close to those of hand-made lexicons
• It can also help in improving an existing lexicon
• The unsupervised and phonetic recognition based lexicon learning approaches hold promise and need to be explored with a wider variety of smoothing and pronunciation extraction scenarios

Training Procedure
• Train g2p to generate pronunciations using your best baseline lexicon
• Generate multiple pronunciations using the g2p
• Use the training data to select the best pronunciation out of these multiple choices
• Retrain the acoustic models and iterate over the above process
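The procedure can be summarized as a loop. The skeleton below is a sketch only: the three callables stand in for real tools (a g2p trainer, an acoustic model trainer, and a forced aligner) and their interfaces are invented here. It also assumes that on the first pass, before any acoustic model exists, the top g2p candidate is used, and that seed words are re-predicted along with all the others.

```python
def retrain_lexicon(seed, words, train_g2p, train_acoustic, pick_by_alignment,
                    iterations=3, n_best=5):
    """Alternate g2p training, pronunciation selection, and acoustic retraining.

    Placeholder interfaces (assumptions):
      train_g2p(lexicon) -> function: word -> ranked candidate pronunciations
      train_acoustic(lexicon) -> acoustic model
      pick_by_alignment(model, word, candidates) -> chosen pronunciation
    """
    lexicon, model = dict(seed), None
    for _ in range(iterations):
        g2p = train_g2p(lexicon)
        candidates = {w: g2p(w)[:n_best] for w in words}
        if model is None:
            # No acoustic model yet: fall back to the top g2p candidate.
            lexicon = {w: c[0] for w, c in candidates.items()}
        else:
            lexicon = {w: pick_by_alignment(model, w, c)
                       for w, c in candidates.items()}
        model = train_acoustic(lexicon)
    return lexicon, model

# Toy stand-ins so the skeleton runs end to end; they do no real training.
toy_g2p = lambda lexicon: (lambda w: [" ".join(w), " ".join(reversed(w))])
toy_acoustic = lambda lexicon: len(lexicon)
toy_pick = lambda model, word, cands: cands[-1]

lexicon, model = retrain_lexicon({"cat": "k a t"}, ["cat", "ran"],
                                 toy_g2p, toy_acoustic, toy_pick)
print(lexicon, model)
```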
