Challenges in Automatic Speech Recognition
2010-2020: Speech Technology for the Next Decade - Visions from Academia and Industry
Ciprian Chelba, Michiel Bacchiani, Johan Schalkwyk {ciprianchelba,michiel,johans}


09/29/2010 Ciprian Chelba et al., Challenges in ASR – p. 1

Case Study: Google Search by Voice

Carries 25% of USA Google mobile search queries!

What contributed to success:
- user expectations clearly set by the existing text app
- excellent language model built from the query stream
- clean speech:
  - users are motivated to articulate clearly
  - phones do high-quality speech capture
  - speech is transferred error-free to the server over IP

Challenges:
- Making and measuring progress: manual transcription of the data has about the same word error rate as the system (15%)
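Word error rate, the metric quoted for both the system and the human transcribers, is conventionally the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal sketch of that computation, assuming whitespace tokenization and no text normalization; the function name `word_error_rate` and the example strings are illustrative and do not come from the talk.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: both strings are tokenized on whitespace, with no case or
    punctuation normalization (real scoring pipelines usually normalize).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative query: one deleted word and one substituted word out of six.
example_wer = word_error_rate(
    "pictures of the golden gate bridge",
    "pictures of golden gate bridges",
)
```

Here `example_wer` is 2/6. When the manual reference transcripts themselves disagree with the truth at roughly the same rate as the system, a score computed this way can no longer separate recognizer errors from transcriber errors, which is the measurement problem the slide points to.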

Case Study: Google Labs GAudi Demo

This was the study for the YouTube feature that is now launched for all and integrated with translation.

Main challenge: lack of coverage due to ASR limitations:
- noise robustness
- speaker/accent/channel variability
- language model mismatches
- the web is multi-lingual

ASR for Retrieval and Ranking

On large document collections, search is truly about Precision@N. There is seldom a good reason to replace a result in the top N with one that has hits in the (noisy) ASR transcript.

Future directions:
- improve retrieval for "hard queries", which return very few documents based strictly on keyword hits in the text metadata
- speech-rich sub-domains such as lectures/talks in English recorded in a controlled setup, where current ASR capabilities are adequate after manual tuning to the sub-domain
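Precision@N is the fraction of the top N returned results that are relevant. A minimal sketch follows, assuming relevance judgments are available as a set of document IDs and that a result list shorter than N is still divided by N (conventions differ on that point); the function name `precision_at_n` and the document IDs are hypothetical.

```python
from typing import Hashable, Sequence, Set


def precision_at_n(ranked_ids: Sequence[Hashable],
                   relevant_ids: Set[Hashable],
                   n: int) -> float:
    """Fraction of the top-n ranked results that are judged relevant.

    Assumption: the denominator is always n, so a result list with fewer
    than n entries is penalized for the missing positions.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    hits = sum(1 for doc_id in ranked_ids[:n] if doc_id in relevant_ids)
    return hits / n


# Hypothetical ranking: two of the top three results are relevant.
example_precision = precision_at_n(
    ["doc_a", "doc_b", "doc_c", "doc_d"],
    {"doc_a", "doc_c", "doc_x"},
    n=3,
)
```

Under this measure, swapping a top-N result for one matched only through a noisy ASR transcript helps only if the new document is more likely to be relevant than the one it displaces, which is why the slide treats such replacements with caution.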

Core Technology

Current state:
- automatic speech recognition is incredibly complex
- the problem is fundamentally unsolved
- data availability and computing have changed significantly since the mid-nineties

Challenges and directions:
- re-visit (simplify!) modeling choices made on corpora of modest size; 2-3 orders of magnitude more data is available
- multi-linguality built in from the start
- noise robustness and speaker/channel variability
