Phonotactics in Word Recognition

Robert Daland
(1st Annual?) SCULC
4/10/2010


Goal of Phonology

•  Classical conception of grammar
   –  From the enormous space of all logically possible combinations, picks out the well-formed items
•  Categorical: toma > xkcd
   –  Typically derives the surface form from the underlying form
•  Issues
   –  Gradience: poik, shmammer ⇒ probabilistic
   –  Perception: phonetic categories, word recognition


The Argument

1.  All current theories of word recognition are probabilistic
2.  Open-vocabulary word recognition requires a component that assigns probabilities to new forms
3.  There is independent evidence for the necessity of a scalar/gradient phonotactic theory
4.  Scalar phonotactic theories effectively assign probabilities already

⇒  We already have what we need!


Word Recognition – Bayesian

•  (H)ypothesis: a sequence of words (ω)
   –  Hypothesis space: all possible word sequences Ω*
   –  The speaker intended only one of them
•  Observed (D)ata: a sequence of phonemes (φ)
•  Bayes' Theorem: p(ω | φ) = p(ω) p(φ | ω) / p(φ)
   –  p(ω): the language model (prior). How likely is the speaker to produce sentence ω?
   –  p(φ | ω): the pronunciation model (likelihood). How well does the hypothesis that the speaker said ω explain the observed data φ?
   –  p(D) is constant w.r.t. H, so just ignore it
•  Rational listener: pick the most likely hypothesis
   –  Posterior: ωbest = arg max{ω∈Ω*} p(ω) p(φ | ω)
   –  The maximum-posterior hypothesis is intrinsically likely and explains the data well.
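
The rational-listener rule above can be sketched in a few lines. This is a minimal illustration, not the talk's implementation; the prior values are toy numbers and the likelihood is stipulated to be 1 for both hypotheses.

```python
def posterior_decode(candidates, prior, likelihood, phones):
    # Rational listener: choose the hypothesis with the largest unnormalized
    # posterior p(omega) * p(phones | omega); p(phones) is constant across
    # hypotheses, so it can be ignored.
    return max(candidates, key=lambda omega: prior[omega] * likelihood(phones, omega))

# Toy priors for two parses of the same phone string (illustrative values only).
prior = {
    ("I", "like", "to", "plant"): 4e-10,
    ("I", "liked", "a", "plant"): 2e-11,
}

# Both hypotheses can be pronounced as the observed string, so p(phi | omega) = 1.
likelihood = lambda phones, omega: 1.0

best = posterior_decode(list(prior), prior, likelihood, "ɑɪlɑɪktəplænt")
# best is ("I", "like", "to", "plant"): the higher-prior parse wins.
```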


Example: [ɑɪlɑɪktəplænt]

Three candidate parses ω, each with its concatenated pronunciation Φ(ω):

   ω1 → φ: [ɑɪ | lɑɪk | tə | plænt]
   ω2 → φ: [ɑɪ | lɑɪkt | ə | plænt]
   ω3 → φ: [ɑɪl | ɑɪk | tə | plænt]
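
Parses like ω1–ω3 can be enumerated mechanically: recursively split the phone string at every prefix that is a known wordform. This sketch is not from the talk; the mini-lexicon is an assumption chosen to reproduce the three parses above.

```python
def segmentations(phones, lexicon):
    """Yield every way of covering `phones` with wordforms from `lexicon`."""
    if not phones:
        yield []  # empty string: one trivial parse
        return
    for i in range(1, len(phones) + 1):
        prefix = phones[:i]
        if prefix in lexicon:
            # Keep this wordform and recursively parse the remainder.
            for rest in segmentations(phones[i:], lexicon):
                yield [prefix] + rest

# Illustrative mini-lexicon (an assumption, not the talk's dictionary).
lexicon = {"ɑɪ", "ɑɪl", "ɑɪk", "lɑɪk", "lɑɪkt", "ə", "tə", "plænt"}
parses = list(segmentations("ɑɪlɑɪktəplænt", lexicon))
# Yields exactly the three segmentations shown above.
```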


Whence Probabilities? Language Model (Unigram)

•  Frequency: p(ωi) = freq(ωi)/F
   –  The probability of a word is proportional to its frequency
   –  F is the total frequency in the training corpus

•  Independence: p(ω1ω2…ωn) = p(ω1)p(ω2)…p(ωn)
   –  The probability of a sequence is the product of the probabilities of its words
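
The two assumptions above (frequency estimation plus independence) make the whole language model a one-liner. A minimal sketch with toy counts, not the talk's corpus:

```python
from math import prod

def unigram_prob(sentence, freq):
    # p(w) = freq(w) / F, and by independence the sentence probability
    # is the product of the per-word probabilities.
    F = sum(freq.values())  # total frequency in the training corpus
    return prod(freq[w] / F for w in sentence)

# Toy frequency table (illustrative values only).
freq = {"I": 80, "like": 10, "to": 60, "plant": 2, "a": 40, "liked": 8}
p = unigram_prob(["I", "like", "to", "plant"], freq)
# (80/200)(10/200)(60/200)(2/200) = 6e-05
```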


Whence Probabilities? Pronunciation Model

•  Canonical pronunciation: Φ(ωj) = φj1φj2…φjm
   –  Each word has a single, canonical pronunciation

•  Concatenation: Φ(ω1ω2…ωn) = Φ(ω1)Φ(ω2)…Φ(ωn)
   –  A sentence is pronounced by concatenating the pronunciations of each of its words.

•  Explains observed: p(φ | ω) = (Φ(ω) == φ)
   –  0 if ω cannot be pronounced like φ, 1 if it can
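
This deterministic pronunciation model is easy to make concrete: a dictionary of canonical forms, string concatenation, and an indicator likelihood. The mini-dictionary below is an illustrative assumption.

```python
# Canonical pronunciations (illustrative mini-dictionary).
PRON = {"I": "ɑɪ", "like": "lɑɪk", "liked": "lɑɪkt",
        "to": "tə", "a": "ə", "plant": "plænt"}

def pronounce(sentence):
    # Concatenation: Φ(ω1 ω2 ... ωn) = Φ(ω1)Φ(ω2)...Φ(ωn)
    return "".join(PRON[w] for w in sentence)

def likelihood(phones, sentence):
    # Indicator likelihood: p(φ | ω) = (Φ(ω) == φ), i.e. 1 if the sentence
    # can be pronounced as the observed phone string, else 0.
    return 1.0 if pronounce(sentence) == phones else 0.0

likelihood("ɑɪlɑɪktəplænt", ["I", "like", "to", "plant"])  # 1.0
likelihood("ɑɪlɑɪktəplænt", ["I", "like", "a", "plant"])   # 0.0
```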


Example: [ɑɪlɑɪktəplænt]

For ω1: [ɑɪ | lɑɪk | tə | plænt]

Language model:
   p(ω1) = p(ɑɪ) p(lɑɪk) p(tə) p(plænt)
         = .01288 × .003806 × .02037 × .000399
         = 3.977 × 10⁻¹⁰ ♪

Pronunciation model:
   p([ɑɪlɑɪktəplænt] | ω1) = (Φ(ω1) == [ɑɪlɑɪktəplænt])
                           = ([ɑɪ][lɑɪk][tə][plænt] == [ɑɪlɑɪktəplænt])
                           = 1

♪ Probabilities estimated from Google hits circa Jan. 28, 2010


Example: [ɑɪlɑɪktəplænt]

   ω1 → φ: [ɑɪ | lɑɪk | tə | plænt]
   ω2 → φ: [ɑɪ | lɑɪkt | ə | plænt]
   ω3 → φ: [ɑɪl | ɑɪk | tə | plænt]

p(ω | φ) ∝ p(ω) p(φ | ω); e.g., for ω1: 3.977 × 10⁻¹⁰ × 1 = 3.977 × 10⁻¹⁰

   Hyp.        ω1       ω2       ω3
   p(ω)        4e-10    2e-11    1e-14
   p(φ | ω)    1        1        1
   p(ω | φ)    .9505    .0493    .0002
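
Turning the unnormalized scores p(ω)p(φ | ω) into the posterior row is just normalization. A sketch using the slide's rounded p(ω) values; because they are rounded, the results only approximate the slide's .9505/.0493/.0002:

```python
def normalize(scores):
    # Divide each unnormalized posterior by p(phi) = sum over hypotheses.
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}

# Unnormalized posteriors p(omega) * p(phi | omega) from the table.
scores = {"ω1": 4e-10 * 1, "ω2": 2e-11 * 1, "ω3": 1e-14 * 1}
posterior = normalize(scores)
# posterior["ω1"] dominates, as in the table.
```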


Open-Vocabulary: [ɑɪlɑɪktətoʊv]

   ω1 → φ: [ɑɪ | lɑɪk | tə | toʊv]
   ω2 → φ: [ɑɪ | lɑɪk | tətoʊv]

•  Problem 1: freq(toʊv) = 0, so the input is unparsable
   –  Solution: set aside probability mass for new words
•  Problem 2: we don't know a new word's pronunciation
   –  To compute p(ω1), we need p(Φ(ωnew) == [toʊv]); to compute p(ω2), we need p(Φ(ωnew) == [tətoʊv])
   –  i.e., the likelihood that a new word would take the form [toʊv] vs. [tətoʊv]
•  Needed component: given a candidate wordform, give the probability that a new word would have this form
   –  Entailed by probabilistic open-vocabulary word recognition
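
The "set aside probability mass" solution can be sketched as follows: hold out a fixed OOV mass λ, discount known words by (1 − λ), and distribute λ over novel wordforms according to a phonotactic probability. The length-based scorer below is a stand-in assumption purely for illustration; the talk's point is that a gradient phonotactic theory supplies exactly this component.

```python
def word_prob(w, freq, F, oov_mass, phonotactic_prob):
    if w in freq:
        # Known word: frequency estimate, discounted by the reserved OOV mass.
        return (1 - oov_mass) * freq[w] / F
    # New word: reserved mass, split by phonotactic probability of the form.
    return oov_mass * phonotactic_prob(w)

# Toy frequency table (illustrative values only).
freq = {"ɑɪ": 2576, "lɑɪk": 761, "tə": 4074}
F = sum(freq.values())

# Stand-in phonotactic scorer: shorter novel forms get more probability.
# (An assumption; a real gradient phonotactic model would go here.)
phonotactic_prob = lambda w: 0.5 ** len(w)

p_tov = word_prob("toʊv", freq, F, oov_mass=0.01, phonotactic_prob=phonotactic_prob)
p_tatov = word_prob("tətoʊv", freq, F, oov_mass=0.01, phonotactic_prob=phonotactic_prob)
# Under this toy scorer, [toʊv] receives more of the reserved mass than [tətoʊv].
```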

