Abstract
George Saon, Daniel Povey and Geoffrey Zweig
[Figure: lattice generation example — word traces ("THE CAT", "A CAT", "ONE CAT", "THE CAT ATE") and tokens propagated across frames t-1, t, t+1; legend: word trace, token.]

Lattice link density as a function of the N-best degree:

  N-best degree   Lattice link density
  2               29.4
  5               451.0
  10              1709.7

Word error rates:

  Test set   Speaker-adapted decoding   LM rescoring + consensus
  RT03       17.4%                      16.1%
  DEV04      14.5%                      13.0%
  RT04       16.4%                      15.2%
[Figure: word error rate (%) vs. real-time factor (CPU time/audio time, 0.1–0.8) for four likelihood computation strategies — on-demand, hierarchical, decoupled and hierarchical on-demand; WER ranges from about 26.5% to 32%.]
Decoding graph statistics:

  System  Phonetic context  Number of leaves  Number of words  Number of n-grams  Number of states  Number of arcs
  SI      2                 7.9K              32.9K            3.9M               18.5M             44.5M
  SA      3                 21.5K             32.9K            4.2M               26.7M             68.7M

Search statistics:

  System  Word error rate  Search errors  Run-time factor  Likelihood/search ratio  Avg. Gaussians/frame  Max. states/frame
  SI      28.7%            2.2%           0.14xRT          60/40                    7.5K                  5.0K
  SA      19.0%            0.3%           0.55xRT          55/45                    43.5K                 15.0K
Experimental setup (1xRT system)
EARS 2004 evaluation submission in the one times real-time (or 1xRT) category. Two-pass decoding scheme with three adaptation passes in between (VTLN, FMLLR, MLLR).
IBM T.J. Watson Research Center
phone: (914) 945-2985, email: [email protected]
Viterbi search speed-ups
– Graph memory layout: the graph is stored as a linear array of arcs sorted by origin state
– Successor look-up table: maps static to dynamic state indices
– Running beam pruning: pruning based on the current maximum score estimate
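The three speed-ups above can be sketched together. The following is an illustrative Python sketch, not the authors' implementation: arcs live in one flat array sorted by origin state with a per-state offset index (so a state's successors are one contiguous slice), and frame expansion prunes against the best score seen so far in the frame (the "running" estimate of the maximum) before a final pass against the true frame maximum. All names (`FlatGraph`, `expand`) are invented for this sketch.

```python
class FlatGraph:
    """Decoding graph stored as a linear array of arcs sorted by origin state."""

    def __init__(self, num_states, arcs):
        # arcs: (origin, dest, label, weight) tuples; sort by origin state so
        # that the outgoing arcs of state s form one contiguous slice.
        self.arcs = sorted(arcs, key=lambda a: a[0])
        # offsets[s]..offsets[s+1] delimit the arcs leaving state s.
        self.offsets = [0] * (num_states + 1)
        for origin, *_ in self.arcs:
            self.offsets[origin + 1] += 1
        for s in range(num_states):
            self.offsets[s + 1] += self.offsets[s]

    def out_arcs(self, state):
        return self.arcs[self.offsets[state]:self.offsets[state + 1]]


def expand(graph, active, beam):
    """One frame of Viterbi expansion with running beam pruning.

    active: dict mapping dynamic state index -> path score (this dict plays
    the role of the successor look-up table: static graph states are mapped
    to the dynamic states touched in this frame)."""
    best = float("-inf")          # running estimate of the frame maximum
    new_scores = {}
    for state, score in active.items():
        if score < best - beam:   # prune against the current max estimate
            continue
        for origin, dest, label, weight in graph.out_arcs(state):
            s = score + weight
            if s < best - beam:
                continue
            if s > new_scores.get(dest, float("-inf")):
                new_scores[dest] = s
                if s > best:
                    best = s
    # Final pruning pass against the true frame maximum.
    return {st: sc for st, sc in new_scores.items() if sc >= best - beam}
```

With a wide beam every reachable successor survives; with a tight beam, hypotheses far below the running maximum are dropped before they are ever stored, which is the point of pruning on the current estimate rather than waiting for the end of the frame.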
Lattice generation
Keep track of the N-best distinct word sequences arriving at every state.

[Figure: word traces ("A CAT", "A DOG", "ONE CAT", "THE CAT") with their token counts arriving at a state over time.]
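The rule of keeping the N-best distinct word sequences at a state can be sketched as follows (an illustrative sketch, not the authors' code; `merge_traces` is an invented name): among all traces arriving at a state, duplicates of the same word sequence are merged keeping the best score, and only the top N distinct sequences survive.

```python
def merge_traces(incoming, n_best):
    """Keep the n_best highest-scoring *distinct* word sequences among all
    word traces arriving at a state.

    incoming: list of (word_sequence_tuple, score) pairs."""
    best = {}
    for words, score in incoming:
        # Distinct sequences only: a repeated sequence keeps its best score.
        if words not in best or score > best[words]:
            best[words] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n_best]
```

Raising the N-best degree makes more alternatives survive each merge, which is consistent with the table above: the lattice link density grows quickly with N.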
Likelihood computation
– Hierarchical
– Decoupled
– On-demand
– Hierarchical on-demand
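A hierarchical likelihood computation can be sketched as follows. This is a generic illustration of the technique, not the authors' implementation, and all names (`log_gauss`, `hierarchical_eval`, the cluster dictionary fields) are invented: each cluster of Gaussians is first scored by a single center Gaussian, and only the Gaussians belonging to the top-scoring clusters are evaluated in full.

```python
import math

def log_gauss(x, mean, var):
    # Log-likelihood of vector x under a diagonal-covariance Gaussian.
    return -0.5 * sum((xi - m) ** 2 / v + math.log(2 * math.pi * v)
                      for xi, m, v in zip(x, mean, var))

def hierarchical_eval(x, clusters, top_k):
    """Hierarchical Gaussian evaluation sketch: rank clusters by their
    center Gaussian, then fully evaluate only the members of the top_k
    best-scoring clusters."""
    ranked = sorted(
        clusters,
        key=lambda c: log_gauss(x, c["center_mean"], c["center_var"]),
        reverse=True,
    )
    likelihoods = {}
    for c in ranked[:top_k]:
        for gid, (mean, var) in c["members"].items():
            likelihoods[gid] = log_gauss(x, mean, var)
    return likelihoods
```

"On-demand" variants would additionally restrict evaluation to the distributions requested by states that are active in the search, which is why the combined hierarchical on-demand strategy gives the best speed/accuracy trade-off in the figure above.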
Anatomy of an extremely fast LVCSR decoder
We describe in detail the decoding strategy that we used for the past two DARPA Rich Transcription evaluations (RT'03 and RT'04), which is based on finite-state automata (FSA). We discuss the format of the static decoding graphs, the particulars of our Viterbi implementation, lattice generation and likelihood evaluation. Experimental results are given on the EARS database (English conversational telephone speech), with emphasis on our faster-than-real-time system.
They are acceptors (instead of transducers). Arcs in the graph have three different types of labels:
– leaf labels (context-dependent output distributions),
– word labels and
– epsilon labels (e.g. due to LM back-off states).
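The arc representation described above can be sketched as a small data structure. This is an illustrative sketch only — the actual on-disk format is the flat sorted arc array discussed elsewhere on this poster, and the names here (`LabelType`, `make_arc`, `is_emitting`) are invented:

```python
from enum import Enum

class LabelType(Enum):
    LEAF = 0      # context-dependent output distribution
    WORD = 1      # word identity
    EPSILON = 2   # e.g. LM back-off transitions

def make_arc(origin, dest, label_type, label, weight):
    # An acceptor arc carries a single label (no separate input/output
    # labels as in a transducer); its type tells the decoder how to use it.
    assert isinstance(label_type, LabelType)
    return {"origin": origin, "dest": dest,
            "type": label_type, "label": label, "weight": weight}

def is_emitting(arc):
    # Only leaf-labeled arcs consume an acoustic frame; word- and
    # epsilon-labeled arcs lead into null states.
    return arc["type"] is LabelType.LEAF
```

Keeping a single label per arc is what makes the graph an acceptor: word identities are simply a third kind of label rather than a separate output tape.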
Two different types of states:
– emitting states, for which all incoming arcs are labeled by the same leaf, and
– null states, which have incoming arcs labeled by words or epsilon.

[Figure: decoding-graph fragment for the words "DOG", "CAT", "ATE" and "AGED", built from phone arcs (D AW/AO G, K AE T, EY T, EY JH D), with emitting states and null states marked.]
Static decoding graphs