Challenges in Automatic Speech Recognition
2010-2020: Speech Technology for the Next Decade - Visions from Academia and Industry
Ciprian Chelba, Michiel Bacchiani, Johan Schalkwyk
{ciprianchelba,michiel,johans}@google.com
Google
09/29/2010 Ciprian Chelba et al., Challenges in ASR – p. 1
Case Study: Google Search by Voice
Carries 25% of USA Google mobile search queries!

What contributed to success:
- user expectations clearly set by the existing text app
- excellent language model built from the query stream
- clean speech:
  - users are motivated to articulate clearly
  - phones do high-quality speech capture
  - speech is transferred error-free to the server over IP

Challenges:
- making and measuring progress: manual transcription of the data has about the same word error rate as the system (15%)
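The 15% figure above is a word error rate (WER). The slides do not say how it was computed; the sketch below assumes the commonly used definition, in which WER is the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name and the whitespace tokenization are illustrative assumptions, not part of the original deck.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no text normalization), which real scoring tools usually refine.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("weather in new york", "weather in york"))
```

Under this definition, the same function can score a human transcript against a second human transcript, which is how a transcriber disagreement rate comparable to the system's error rate would show up.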
Case Study: Google Labs GAudi Demo
This was the study for the YouTube feature that is now launched for all and integrated with translation.

Main challenge: lack of coverage due to ASR limitations:
- noise-robustness
- speaker/accent/channel variability
- language model mismatches
- web is multi-lingual
ASR for Retrieval and Ranking
On large document collections search is truly about Precision@N. There is seldom a good reason to replace a result in the top-N with one that has hits in the (noisy) ASR transcript.

Future directions:
- improve retrieval for "hard queries", which return very few documents based strictly on keyword hits in the text metadata
- speech-rich sub-domains such as lectures/talks in English recorded in a controlled setup, where current ASR capabilities are adequate after manual tuning to the sub-domain
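For readers unfamiliar with the metric, Precision@N is the fraction of the top N returned results that are relevant. A minimal sketch follows; the function name, the document identifiers, and the convention of dividing by N even when fewer than N results are returned are assumptions made for illustration.

```python
from typing import Iterable, Sequence


def precision_at_n(ranked_ids: Sequence[str],
                   relevant_ids: Iterable[str],
                   n: int) -> float:
    """Fraction of the top-n ranked results that are relevant.

    Assumption: the denominator is always n, so a result list shorter
    than n is penalized for the missing slots.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in ranked_ids[:n] if doc_id in relevant)
    return hits / n


if __name__ == "__main__":
    ranking = ["doc_a", "doc_b", "doc_c", "doc_d"]  # hypothetical ranking
    relevant = {"doc_a", "doc_c"}                   # hypothetical judgments
    print(precision_at_n(ranking, relevant, 2))
```

Because only the top N slots count, swapping a relevant text-metadata hit for a result matched solely in a noisy transcript can lower the score, which is the concern the slide raises.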
Core Technology
Current state:
- automatic speech recognition is incredibly complex
- the problem is fundamentally unsolved
- data availability and computing have changed significantly since the mid-nineties

Challenges and Directions:
- re-visit (simplify!) modeling choices made on corpora of modest size; 2-3 orders of magnitude more data is available
- multi-linguality built in from the start
- noise-robustness and speaker/channel variability