Paolo Baggia Director of International Standards March 6th, 2009
Google TechTalk
Google TechTalk – Mar 6th, 2009
Paolo Baggia
Overview
A Bit of History
W3C Speech Interaction Framework Today
  ASR/DTMF
  TTS
  Lexicons
  Voice Dialog and Call Control
  Voice Platforms and Next Evolutions
W3C Multimodal Interaction Today
  MMI Architecture
  EMMA and InkML
  A language for Emotions
Next Future
Company Profile
Privately held company (fully owned by Telecom Italia), founded in 2001 as a spin-off from Telecom Italia Labs, capitalizing on 30 years of experience and expertise in voice processing.
Global company, leader in Europe and South America for award-winning, high-quality voice technologies (synthesis, recognition, authentication and identification) available in 26 languages and 62 voices.
Multilingual, proprietary technologies protected by over 100 patents worldwide.
Financially robust: break-even reached in 2004, revenues and earnings growing year on year.
Growth-plan investment approved for the evolution of products and services.
Headquarters in Torino; offices in New York; local representative sales offices in Rome, Madrid, Paris, London and Munich.
Flexible: about 100 employees, plus a vibrant ecosystem of local freelancers.
International Awards
“2008 Frost & Sullivan European Telematics and Infotainment Emerging Company of the Year” Award
Winner of “Market leader-Best Speech Engine” Speech Industry Award 2007 and 2008
Loquendo MRCP Server: Winner of 2008 IP Contact Center Technology Pioneer Award
“Best Innovation in Automotive Speech Synthesis” Prize, AVIOS-SpeechTEK West 2007
“Best Innovation in Expressive Speech Synthesis” Prize, AVIOS-SpeechTEK West 2006
“Best Innovation in Multi-Lingual Speech Synthesis” Prize, AVIOS-SpeechTEK West 2005
A Bit of History
Standard Bodies
Two main standard bodies:
W3C – World Wide Web Consortium
  Founded in 1994 by Tim Berners-Lee, with a mission to lead the Web to its full potential.
  Staff based at MIT (USA), ERCIM (France), Keio Univ. (Japan).
  400 members all over the world; 50 Working, Interest and Coordination Groups.
  W3C is where the framework of today’s Web is developed (HTML, CSS, XML, DOM, SOAP, RDF, OWL, VoiceXML, SVG, XSLT, P3P, Internationalization, Web Accessibility, Device Independence).
IETF – Internet Engineering Task Force
  Founded in 1986; growth from 1991 under the Internet Society.
  1300 members.
  HTTP, SIP, RTP and many other protocols.
  Media Resource Control Protocol (MRCP) is very relevant for speech platforms.
Two industrial forums:
VoiceXML Forum (www.voicexml.org)
  Inventors of VoiceXML 1.0, then submitted to W3C for standardization.
  Current goal is to promote, disseminate and support VoiceXML and related standards.
SALT Forum (www.saltforum.org)
  Supported by Microsoft to define a lightweight markup for telephony and multimodal applications.
Other relevant bodies: 3GPP, OMA, ETSI, NIST
The (r)evolution of VoiceXML 1998 - 2004
[Timeline figure. Labels recovered from the slide; year marks visible: 1998, 1999, 2000]
  W3C Voice Browser Workshop
  VoiceXML Forum birth, by AT&T, IBM, Lucent, Motorola
  W3C charters Voice Browser WG
  SALT Forum birth, by Cisco, Comverse, Intel, Microsoft, Philips, SpeechWorks
  W3C charters Multimodal Interaction WG
  EMMA 1.0 W3C Rec
[Photo] Preparing to announce VoiceXML 1.0, Friday Feb. 25th, 2000, Lucent, Naperville, Illinois. Left to right: Gerald Karam (AT&T), Linda Boyer (IBM), Ken Rehor (Lucent), Bruce Lucas (IBM), Pete Danielsen (Lucent), Jim Ferrans (Motorola), Dave Ladd (Motorola).
Speech Interface Framework in 2000 (by Jim Larson)
[Block diagram. Components: User, Telephone System, ASR, DTMF Tone Recognizer, Language Understanding, Context Interpretation, Dialog Manager, Media Planning, Language Generation, TTS, Pre-recorded Audio Player, Reusable Components, World Wide Web.]
[Specifications shown: Speech Recognition Grammar Spec. (SRGS), N-gram Grammar ML, Semantic Interpretation for Speech Recognition (SISR), EMMA (Natural Language Semantics ML), VoiceXML 2.0, VoiceXML 2.1, Pronunciation Lexicon Specification (PLS), Speech Synthesis Markup Language (SSML), Call Control XML (CCXML).]
Speech Interface Framework - Today (by Jim Larson)
[Block diagram. Components: User, Telephone System, ASR, DTMF Tone Recognizer, Language Understanding, Context Interpretation, Dialog Manager, Media Planning, Language Generation, TTS, Pre-recorded Audio Player, Reusable Components, World Wide Web.]
[Specifications shown: Speech Recognition Grammar Spec. (SRGS), N-gram Grammar ML, Semantic Interpretation for Speech Recognition (SISR), EMMA 1.0 (Natural Language Semantics ML), VoiceXML 2.0, VoiceXML 2.1, Pronunciation Lexicon Specification (PLS), Speech Synthesis Markup Language (SSML), Call Control XML (CCXML).]
Speech Interface Framework - End of 2009 (by Jim Larson)
[Block diagram. Components: User, Telephone System, ASR, DTMF Tone Recognizer, Language Understanding, Context Interpretation, Dialog Manager, Media Planning, Language Generation, TTS, Pre-recorded Audio Player, Reusable Components, World Wide Web.]
[Specifications shown: Speech Recognition Grammar Spec. (SRGS), N-gram Grammar ML, Semantic Interpretation for Speech Recognition (SISR), EMMA 1.0 (Natural Language Semantics ML), VoiceXML 2.0, VoiceXML 2.1, Pronunciation Lexicon Specification (PLS), Speech Synthesis Markup Language (SSML), Call Control XML (CCXML).]
W3C Process
Architectural Changes
Traditional (proprietary) architecture
[Diagram: the User talks to ASR / DTMF and TTS / Audio, which are driven by a Speech Applic. built with a proprietary SCE; everything sits inside a proprietary platform.]
VoiceXML architecture
[Diagram: the User talks to ASR / DTMF (.grxml/.gram, .pls) and TTS / Audio (.ssml, .wav/.mp3, .pls), which are driven by a VoiceXML Browser inside a VoiceXML platform; the browser fetches .vxml over HTTP from the Web Applic.]
The VoiceXML Impact
VoiceXML changed the landscape of IVRs and speech application creation
From proprietary to standard-based speech applications
Before
• Proprietary platforms (HW & SW)
• Proprietary applications (by proprietary SCE)
• Mainly DTMF and pre-recorded prompts
• First attempts to add speech into IVR
After
• Standard VoiceXML platforms
• Standards for Speech Technologies
• Standard tools for VoiceXML applications
• Integration of DTMF and ASR
• Still predominance of DTMF, but more and more speech applications
Standards for ASR and DTMF SRGS 1.0, SISR 1.0
W3C Standards for Speech/DTMF Grammars
SYNTAX: SRGS (speech grammar)
  Defines constraints on admissible sentences for a specific recognition turn
  Forms: ABNF, XML
  Modes: voice, dtmf
  http://www.w3.org/TR/speech-grammar/
SEMANTICS: SISR
  Describes how to produce results after an utterance is recognized
  Tag formats: literal, script
  http://www.w3.org/TR/semantic-interpretation/
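SRGS/SISR Grammars for “Torino”
SRGS XML, SISR literal
  [XML markup lost in extraction; surviving content: item text "Torino", tag content "10100"]
SRGS ABNF, SISR literal
  #ABNF 1.0 iso-8859-1;
  mode voice;
  tag-format <semantics/1.0-literals>;
  public $main = Torino {10100} ;
SRGS XML, SISR script
  [XML markup lost in extraction; surviving content: header tag "var unused=7;", item text "Torino", tag content "out="10100";"]
SRGS ABNF, SISR script
  #ABNF 1.0 iso-8859-1;
  mode voice;
  tag-format <semantics/1.0>;
  {var unused=7;};
  public $main = Torino {out="10100";} ;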
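The literal variant of the "Torino" grammar can be illustrated with a small Python sketch. The XML below is a hypothetical reconstruction in SRGS XML form (the slide's own markup did not survive), the `xml:lang` value is an assumption, and `literal_result` is an illustrative helper, not part of any SRGS/SISR toolkit. It only handles flat items with a literal tag.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Hypothetical reconstruction of the XML/literal "Torino" grammar.
# The xml:lang value is an assumption made for illustration.
GRAMMAR_XML = """<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="it-IT" mode="voice" root="main"
         tag-format="semantics/1.0-literals">
  <rule id="main" scope="public">
    <item>Torino<tag>10100</tag></item>
  </rule>
</grammar>"""


def literal_result(grammar_xml, utterance):
    """Return the literal tag of the root-rule item matching `utterance`, or None."""
    root = ET.fromstring(grammar_xml)
    if root.get("tag-format") != "semantics/1.0-literals":
        raise ValueError("this sketch only handles literal semantics")
    root_rule = root.get("root")
    for rule in root.findall(f"{{{SRGS_NS}}}rule"):
        if rule.get("id") != root_rule:
            continue
        for item in rule.iter(f"{{{SRGS_NS}}}item"):
            words = (item.text or "").strip()
            tag = item.find(f"{{{SRGS_NS}}}tag")
            # Case-insensitive match on the item's words; a real recognizer
            # matches audio against the grammar, not strings.
            if tag is not None and words.lower() == utterance.strip().lower():
                return (tag.text or "").strip()
    return None
```

Calling `literal_result(GRAMMAR_XML, "Torino")` returns the string attached to the matched item, which is what a literal tag-format promises: the tag content is passed through as-is, with no script evaluation.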
SRGS/SISR Standards – Pros
Powerful syntax (CFG) and very powerful semantics (ECMAScript)
DTMF and voice input are transparent to the application
Wide and consistent adoption among technology vendors
Two syntaxes, XML and ABNF, are great! Developers can choose (XML validation vs. compact format)
Transformations are possible
  XML → ABNF (easy, simple XSLT)
  ABNF → XML (requires an ABNF parser)
Open Source tools might be created to:
  Validate grammar syntax
  Transform grammars
  Debug grammars on written input
  Coverage tests: explode covered sentences, GenSem, SemTester, etc.
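As a rough illustration of the XML → ABNF direction, here is a minimal Python sketch (the slide suggests XSLT; Python is swapped in here only for brevity). It covers a small subset of SRGS: `<rule>`, `<item>`, `<one-of>`, `<tag>` and local `<ruleref>`. Repeats, weights, external references and header tags are ignored. The function names and the sample grammar, including the second city and its code, are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2001/06/grammar}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def expansion_to_abnf(elem):
    """Render the content of a <rule> or <item> as an ABNF expansion."""
    parts = []
    if elem.text and elem.text.strip():
        parts.append(elem.text.strip())
    for child in elem:
        if child.tag == NS + "item":
            parts.append(expansion_to_abnf(child))
        elif child.tag == NS + "one-of":
            alts = [expansion_to_abnf(alt) for alt in child.findall(NS + "item")]
            parts.append("(" + " | ".join(alts) + ")")
        elif child.tag == NS + "ruleref":
            # Local references only: uri="#city" becomes $city.
            parts.append("$" + child.get("uri", "").lstrip("#"))
        elif child.tag == NS + "tag":
            parts.append("{" + (child.text or "").strip() + "}")
        if child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return " ".join(parts)


def srgs_xml_to_abnf(grammar_xml):
    """Convert a small SRGS XML grammar into ABNF text (subset only)."""
    root = ET.fromstring(grammar_xml)
    lines = ["#ABNF 1.0 UTF-8;"]
    if root.get(XML_LANG):
        lines.append(f"language {root.get(XML_LANG)};")
    lines.append(f"mode {root.get('mode', 'voice')};")
    if root.get("root"):
        lines.append(f"root ${root.get('root')};")
    if root.get("tag-format"):
        lines.append(f"tag-format <{root.get('tag-format')}>;")
    for rule in root.findall(NS + "rule"):
        scope = "public " if rule.get("scope") == "public" else ""
        lines.append(f"{scope}${rule.get('id')} = {expansion_to_abnf(rule)};")
    return "\n".join(lines)


# Hypothetical sample grammar used only to exercise the converter.
SAMPLE = """<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="it-IT" mode="voice" root="main"
         tag-format="semantics/1.0-literals">
  <rule id="main" scope="public">
    <one-of>
      <item>Torino<tag>10100</tag></item>
      <item>Milano<tag>20100</tag></item>
    </one-of>
  </rule>
</grammar>"""

ABNF_TEXT = srgs_xml_to_abnf(SAMPLE)
```

The reverse direction is harder for the reason the slide gives: ABNF has no ready-made parser, whereas any XML library can read the XML form.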
SRGS/SISR Standards – Small Issues
Semantics declaration: tag-format attribute
  If value “semantics/1.0”: mandates SISR Script semantics inside semantic tags
  If value “semantics/1.0-literals”: mandates SISR Literal semantics inside semantic tags
  If missing: unclear! Risk of interoperability troubles
SISR Script Semantics
  Clumsy default assignment: returns last referenced rule only
  Developer must explicitly propagate results up to the parent rule
  Be careful when redefining “out”
  Assigning a scalar value might result in errors
SISR Literal Semantics
  Only useful for very simple word-list rules
  No support for encapsulating rules
  SISR Literal grammars as external references ONLY!
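The tag-format ambiguity can be made explicit in tooling. The sketch below, in Python, classifies a grammar's tag-format using the two identifiers defined by SISR 1.0 (`semantics/1.0` and `semantics/1.0-literals`) and reports the missing case separately instead of guessing. The function name and the returned labels are hypothetical.

```python
def semantics_mode(tag_format):
    """Classify an SRGS tag-format value for SISR processing.

    Returns "script", "literal", "unspecified" (attribute missing or empty)
    or "unknown" (some other, possibly vendor-specific, value).
    """
    if tag_format is None or not tag_format.strip():
        # The interoperability risk: platforms may default differently.
        return "unspecified"
    value = tag_format.strip()
    if value == "semantics/1.0":
        return "script"
    if value == "semantics/1.0-literals":
        return "literal"
    return "unknown"
```

A grammar checker built on this could warn on "unspecified" rather than silently picking a default, which is where cross-platform behavior tends to diverge.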
SRGS/SISR – Encapsulated Grammars
[Diagram of grammars referencing one another while mixing tag formats: Gr1.grxml (Script), Gr2.gram (Literal), Gr3.grxml (Script), Gr41.grxml (Literal), Gr42.gram (Script).]
SRGS/SISR Standards – Rich XML Results
Section 7 of SISR 1.0 specification: http://www.w3.org/TR/semantic-interpretation/#SI7
Serialization rules from SISR ECMAScript results into XML
Edge cases:
  Arrays
  Special variables “_attributes” and “_value”
  Creation of namespaces and prefixes
ECMAScript result:
  {
    drink: {
      _nsdecl: { _prefix:"n1", _name:"http://www.example.com/n1" },
      _nsprefix:"n1",
      liquid: {
        _nsdecl: { _prefix:"n2", _name:"http://www.example.com/n2" },
        _attributes: { color: { _nsprefix:"n2", _value:"black" } },
        _value:"coke"
      },
      size:"medium"
    }
  }
XML serialization: [markup lost in extraction; surviving text content: "coke", "medium"]
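To make the role of the special properties concrete, here is a simplified Python sketch of such a serializer. It is not the normative algorithm of SISR 1.0 Section 7: arrays are not handled, and the exact output layout is an assumption. The helper names are hypothetical; the input mirrors the slide's "drink" result.

```python
from xml.sax.saxutils import escape, quoteattr

RESERVED = {"_nsdecl", "_nsprefix", "_attributes", "_value"}


def qname(prefix, name):
    """Build a qualified name, with or without a namespace prefix."""
    return f"{prefix}:{name}" if prefix else name


def serialize(name, value):
    """Serialize one property of a result object as an XML element (simplified)."""
    if not isinstance(value, dict):
        return f"<{name}>{escape(str(value))}</{name}>"
    tag = qname(value.get("_nsprefix"), name)
    attrs = []
    decl = value.get("_nsdecl")
    if decl:
        # _nsdecl declares a namespace on this element.
        attrs.append(f"xmlns:{decl['_prefix']}={quoteattr(decl['_name'])}")
    for attr_name, attr in value.get("_attributes", {}).items():
        if isinstance(attr, dict):
            prefix, text = attr.get("_nsprefix"), attr.get("_value", "")
        else:
            prefix, text = None, attr
        attrs.append(f"{qname(prefix, attr_name)}={quoteattr(str(text))}")
    # _value becomes text content; every other property becomes a child element.
    body = escape(str(value.get("_value", "")))
    for key, child in value.items():
        if key not in RESERVED:
            body += serialize(key, child)
    attr_text = "".join(" " + a for a in attrs)
    return f"<{tag}{attr_text}>{body}</{tag}>"


RESULT = {
    "drink": {
        "_nsdecl": {"_prefix": "n1", "_name": "http://www.example.com/n1"},
        "_nsprefix": "n1",
        "liquid": {
            "_nsdecl": {"_prefix": "n2", "_name": "http://www.example.com/n2"},
            "_attributes": {"color": {"_nsprefix": "n2", "_value": "black"}},
            "_value": "coke",
        },
        "size": "medium",
    }
}

XML_TEXT = serialize("drink", RESULT["drink"])
```

The sketch shows why these are called edge cases: a plain property such as `size` maps directly to an element, while attributes, text content and namespaces each need a reserved property name to be expressible at all.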
SRGS/SISR Standards – Next Steps
Adoption of the PLS 1.0 lexicon
  Clear entry point into PLS lexicons, [element name lost in extraction] element
  Missing role attribute in [element name lost in extraction] to allow homograph disambiguation
Next extensions via Errata
  XML 1.1 support and IR
  Update normative references
No major extensions are needed!
Speech Synthesis SSML 1.0/1.1
TTS – Functional Architecture and Markup/Non-Markup support
Structure Analysis
  Markup support: <p>, <s>
  Non-Markup support: infer the structure by automatic text analysis
Text Normalization
  Markup support: <say-as> for date, time, phone number, numbers; <sub> for acronyms and transliterations
  Non-Markup support: automatically identify and convert constructs
Text-to-Phoneme Conversion
  Markup support: <phoneme>, <lexicon>
  Non-Markup support: look up in pronunciation dictionary
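The markup side of these three stages can be sketched by building a small SSML 1.0 document in Python. The sentence text, the `interpret-as`/`format` values (which SSML 1.0 itself leaves unstandardized) and the IPA transcription are illustrative assumptions; `build_ssml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_ssml():
    """Build an SSML string that uses markup at each TTS processing stage."""
    ET.register_namespace("", SSML_NS)  # serialize SSML as the default namespace

    def q(name):
        return f"{{{SSML_NS}}}{name}"

    speak = ET.Element(q("speak"), {"version": "1.0", XML_LANG: "en-US"})

    # Structure analysis: explicit paragraph and sentence boundaries.
    p = ET.SubElement(speak, q("p"))
    s = ET.SubElement(p, q("s"))
    s.text = "The talk was given on "

    # Text normalization: tell the engine how to read a date and an acronym.
    date = ET.SubElement(s, q("say-as"), {"interpret-as": "date", "format": "mdy"})
    date.text = "03/06/2009"
    date.tail = " and covered "
    sub = ET.SubElement(s, q("sub"), {"alias": "World Wide Web Consortium"})
    sub.text = "W3C"
    sub.tail = " standards, presented by a speaker from "

    # Text-to-phoneme conversion: give an explicit pronunciation.
    ph = ET.SubElement(s, q("phoneme"), {"alphabet": "ipa", "ph": "toˈriːno"})
    ph.text = "Torino"
    ph.tail = "."

    return ET.tostring(speak, encoding="unicode")
```

Without any of this markup the engine falls back to the non-markup path for each stage: it guesses sentence boundaries, detects the date format itself, and looks "Torino" up in its own dictionary.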