Phonotactics in Word Recognition
Robert Daland
(1st Annual?) SCULC
4/10/2010
Goal of Phonology
• Classical conception of grammar
  – From the enormous space of all logically possible combinations, picks out the well-formed items
  – Categorical: toma > xkcd
  – Typically derives the surface form from an underlying form
• Issues
  – Gradience: poik, shmammer ⇒ probabilistic
  – Perception: phonetic categories, word recognition
The Argument
1. All current theories of word recognition are probabilistic
2. Open-vocabulary word recognition requires a component that assigns probabilities to new forms
3. There is independent evidence for the necessity of a scalar/gradient phonotactic theory
4. Scalar phonotactic theories effectively assign probabilities already
⇒ We already have what we need!
Word Recognition – Bayesian
• (H)ypothesis: a sequence of words (ω)
• Observed (D)ata: a sequence of phonemes (φ)
• Bayes' Theorem: p(ω | φ) = p(ω)·p(φ | ω) / p(φ)
  – p(ω) is the language model (prior): how likely is the speaker to produce sentence ω?
  – p(φ | ω) is the pronunciation model (likelihood model): how well does the hypothesis that the speaker said ω explain the observed data φ?
  – Hypothesis space: all possible word sequences Ω*
  – The speaker intended only one sentence ω
  – p(D) is constant w.r.t. H, so we can just ignore it
• Rational listener: pick the hypothesis with the highest posterior
  ωbest = arg max{ω∈Ω*} p(ω)·p(φ | ω)
  – The winning hypothesis is both intrinsically likely and explains the data well.
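As a concrete sketch of the rational-listener rule (the function names here are illustrative, not from the talk): take the argmax over hypotheses of prior times likelihood, dropping the constant p(D).

```python
# Sketch of the rational listener: pick the hypothesis ω maximizing
# p(ω) * p(φ | ω). The normalizer p(φ) is constant across hypotheses,
# so it can be ignored when only the argmax is needed.

def best_hypothesis(hypotheses, prior, likelihood, phi):
    """Return the hypothesis maximizing p(ω) * p(φ | ω)."""
    return max(hypotheses, key=lambda w: prior(w) * likelihood(phi, w))

# Toy illustration with made-up numbers (not the talk's figures):
priors = {"w1": 0.7, "w2": 0.3}
likelihoods = {("d", "w1"): 0.1, ("d", "w2"): 0.9}
best = best_hypothesis(
    priors, lambda w: priors[w], lambda d, w: likelihoods[(d, w)], "d"
)
```

Here the prior favors w1, but w2 explains the data far better, so w2 wins the posterior comparison.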
Example: [ɑɪlɑɪktəplænt]
(Figure: three candidate parses ω1–ω3, each linked phoneme-by-phoneme via Φ to the input φ)
• ω1: Φ(ω1) = [ɑɪ lɑɪk tə plænt]
• ω2: Φ(ω2) = [ɑɪ lɑɪkt ə plænt]
• ω3: Φ(ω3) = [ɑɪl ɑɪk tə plænt]
Whence probabilities? The language model (unigram)
• Frequency: p(ωi) = freq(ωi)/F
  – The probability of a word is proportional to its frequency
  – F is the total frequency in the training corpus
• Independence: p(ω1ω2…ωn) = p(ω1)p(ω2)…p(ωn)
  – The probability of a sequence is the product of the probabilities of its words
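A minimal sketch of this unigram model (the function names are hypothetical): word probabilities are relative frequencies, and a sequence's probability is the product of its words' probabilities.

```python
from math import prod

def unigram_model(freqs):
    """Build p(ω_i) = freq(ω_i)/F from a frequency table; F is the total."""
    F = sum(freqs.values())
    p_word = {w: f / F for w, f in freqs.items()}

    # Independence: p(ω1 ω2 … ωn) = p(ω1) p(ω2) … p(ωn)
    def p_sequence(words):
        return prod(p_word[w] for w in words)

    return p_word, p_sequence

# Toy frequency table (illustrative counts, not the talk's corpus):
p_word, p_seq = unigram_model({"I": 4, "like": 2, "to": 2, "plant": 2})
```

With these toy counts, p(I) = 4/10 = 0.4 and p(I like) = 0.4 × 0.2 = 0.08.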
Whence probabilities? The pronunciation model
• Canonical pronunciation: Φ(ωj) = φj1φj2…φjm
  – Each word has a single, canonical pronunciation
• Concatenation: Φ(ω1ω2…ωn) = Φ(ω1)Φ(ω2)…Φ(ωn)
  – A sentence is pronounced by concatenating the pronunciations of each of its words.
• Explains the observed data: p(φ | ω) = (Φ(ω) == φ)
  – 0 if ω cannot be pronounced like φ, 1 if it can
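A sketch of this pronunciation model (the mini-lexicon and names are hypothetical): each word maps to one canonical phoneme string, sentences concatenate, and the likelihood is a 0/1 indicator.

```python
# Canonical pronunciation: each word has exactly one phoneme string.
LEXICON = {"I": "ɑɪ", "like": "lɑɪk", "liked": "lɑɪkt",
           "a": "ə", "to": "tə", "plant": "plænt"}

def pronounce(words):
    """Φ(ω1 ω2 … ωn) = Φ(ω1)Φ(ω2)…Φ(ωn): concatenate canonical forms."""
    return "".join(LEXICON[w] for w in words)

def likelihood(phi, words):
    """p(φ | ω) = 1 if Φ(ω) == φ, else 0."""
    return 1.0 if pronounce(words) == phi else 0.0
```

Note that both "I like to plant" and "I liked a plant" concatenate to the same string [ɑɪlɑɪktəplænt], which is exactly why the input is ambiguous.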
Example: [ɑɪlɑɪktəplænt]
ω1 = "I like to plant", Φ(ω1) = [ɑɪ lɑɪk tə plænt]

Language model:
p(ω1) = p(I like to plant)
      = p(I)·p(like)·p(to)·p(plant)
      = .01288 × .003806 × .02037 × .000399
      = 3.977×10⁻¹⁰ ♪

Pronunciation model:
p(φ | ω1) = (Φ(I)Φ(like)Φ(to)Φ(plant) == [ɑɪlɑɪktəplænt])
          = ([ɑɪ][lɑɪk][tə][plænt] == [ɑɪlɑɪktəplænt])
          = 1

♪ Probabilities estimated from Google hits circa Jan. 28, 2010
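The arithmetic above can be checked directly; the product of the rounded unigram probabilities printed on the slide comes out within 1% of the quoted 3.977×10⁻¹⁰ (the exact figure presumably used unrounded estimates).

```python
from math import prod

# Unigram probabilities for "I like to plant" as given on the slide
# (estimated from Google hits, Jan. 28, 2010):
unigrams = {"I": 0.01288, "like": 0.003806, "to": 0.02037, "plant": 0.000399}

p_w1 = prod(unigrams.values())   # language-model prior p(ω1)
p_phi_given_w1 = 1.0             # Φ(ω1) matches [ɑɪlɑɪktəplænt] exactly
score = p_w1 * p_phi_given_w1    # unnormalized posterior p(ω1)·p(φ | ω1)
```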
Example: [ɑɪlɑɪktəplænt]
• ω1: [ɑɪ lɑɪk tə plænt]   • ω2: [ɑɪ lɑɪkt ə plænt]   • ω3: [ɑɪl ɑɪk tə plænt]

For ω1: p(ω | φ) ∝ p(ω)·p(φ | ω) = 3.977×10⁻¹⁰ × 1 = 3.977×10⁻¹⁰

Hyp.       ω1        ω2        ω3
p(ω)       4e‑10     2e‑11     1e‑14
p(φ | ω)   1         1         1
p(ω | φ)   .9505     .0493     .0002
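Normalizing the unnormalized scores p(ω)·p(φ|ω) by their sum (which is p(φ)) yields the posterior row. A sketch with the table's one-significant-figure priors (the slide's third decimals presumably come from unrounded values, so they differ slightly):

```python
def posteriors(scores):
    """Normalize unnormalized scores p(ω)·p(φ|ω) by their sum (= p(φ))."""
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}

# Rounded priors from the table; all three parses pronounce to the
# input string, so every p(φ | ω) = 1.
post = posteriors({"w1": 4e-10 * 1, "w2": 2e-11 * 1, "w3": 1e-14 * 1})
```

The qualitative result is the same as the table: ω1 ("I like to plant") takes nearly all the posterior mass.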
Open vocabulary: [ɑɪlɑɪktətoʊv]
• ω1: Φ(ω1) = [ɑɪ lɑɪk tə toʊv]   • ω2: Φ(ω2) = [ɑɪ lɑɪk tətoʊv]
• Problem 1: freq(toʊv) = 0, so the input is unparsable
  – Solution: set aside probability mass for new words
• Problem 2: we don't know the pronunciation of a new word
  – To compute p(ω1), we need p(Φ(new word) == [toʊv])
  – To compute p(ω2), we need p(Φ(new word) == [tətoʊv])
• We need to know the likelihood that a new word would take the form [toʊv] vs. [tətoʊv]
• Required component: given a candidate wordform, give the probability that a new word would have this form
  – Entailed by probabilistic open-vocabulary word recognition
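One way to sketch the required component (hypothetical names; the talk does not specify an implementation): hold out probability mass for unseen words, and let a phonotactic model decide how that reserved mass is spread over candidate forms.

```python
def open_vocab_prior(freqs, new_word_mass, phonotactic_p):
    """Unigram prior with reserved probability mass for new words.

    `phonotactic_p(form)` is the phonotactic model's probability that a
    new word would have that phoneme string: exactly the component the
    argument says open-vocabulary recognition requires.
    """
    F = sum(freqs.values())
    seen = {w: (1 - new_word_mass) * f / F for w, f in freqs.items()}

    def p(word, form=None):
        if word in seen:
            return seen[word]
        # Unseen word: reserved mass times phonotactic probability of its form.
        return new_word_mass * phonotactic_p(form)

    return p

# Toy phonotactic model: shorter forms more probable (pure placeholder,
# standing in for the scalar phonotactic theory of points 3-4).
p = open_vocab_prior({"I": 4, "like": 2, "to": 2, "plant": 2}, 0.1,
                     lambda form: 0.5 ** len(form))
```

Seen words keep 90% of the mass in proportion to frequency; a novel form like [toʊv] gets a nonzero prior from the phonotactic placeholder, so the parse is no longer impossible.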