Structured Composition of Semantic Vectors

Stephen Wu
Division of Biomedical Statistics and Informatics, Mayo Clinic

January 13, 2011 | IWCS

Outline

1. Introduction: Overview; Related Work
2. Structured Vectorial Semantics: Vector Composition; Semantically-annotated Parsing; Distributed Semantics in SVS
3. Evaluation: Model Fit; Parsing Speed; Performance


Big Picture

Distributed semantic vector composition + (syntactic) parsing = Structured Vectorial Semantics (SVS).

Running example: "the engineers pulled off ..."

    Vector composition alone: each word gets a vector (e.g. [.5 .1 .2] and [.1 .1 .1] for "the" and "engineers") and the vectors are combined without syntactic structure.

    Syntactic parsing alone: S → NP ...; NP → DT NN; DT → "the"; NN → "engineers" — structure without distributed semantics.

    SVS: every constituent in the parse carries a semantic vector, e.g. NP = [.1 .2 .1], DT("the") = [.1 .2 .1], NN("engineers") = [.5 .1 .1].

Weaknesses of Distributed Semantic Models

1. No compositionality.
   Ex 1: "Patient is a 48-year old male with no significant past medical history complaining of abdominal pain."

2. Bag-of-words independence assumption.
   Ex 2a: "Significant improvement of health outcomes followed the drastic overhaul of surgical pre-operation procedure."
   Ex 2b: "Significant overhaul of surgical pre-operation procedure followed the drastic improvement of health outcomes."
   A bag-of-words model cannot distinguish 2a from 2b (see the sketch below).

⇒ Structured Vectorial Semantics (SVS)
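To make Ex 2 concrete, a minimal Python sketch, with random stand-in word vectors (an assumption — the deck specifies none), showing that order-insensitive additive composition assigns identical vectors to 2a and 2b:

    import numpy as np

    # Additive bag-of-words composition cannot distinguish sentences that
    # use the same words in a different order. Word vectors are random
    # stand-ins, not trained embeddings.
    rng = np.random.default_rng(0)
    words = ("significant improvement of health outcomes followed the "
             "drastic overhaul surgical pre-operation procedure").split()
    vec = {w: rng.random(4) for w in words}

    s2a = ("significant improvement of health outcomes followed the "
           "drastic overhaul of surgical pre-operation procedure")
    s2b = ("significant overhaul of surgical pre-operation procedure "
           "followed the drastic improvement of health outcomes")

    v2a = sum(vec[w] for w in s2a.split())
    v2b = sum(vec[w] for w in s2b.split())
    print(np.allclose(v2a, v2b))   # True: the two sentences are indistinguishable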


Vector Composition Background

General definition (Mitchell & Lapata '08):

    e_γ = f(e_α, e_β, M, L)

where e_γ is the target vector, e_α and e_β are the source vectors, M is syntactic information, and L is background knowledge. Two simple instantiations (sketched below):

    Add:  e_γ[i] = e_α[i] + e_β[i]
    Mult: e_γ[i] = e_α[i] · e_β[i]

Related approaches use syntactic context and predicate–argument structure (Kintsch '01), selectional preferences (Erk & Padó '08), language models (Mitchell & Lapata '09), and matrices (Rudolph & Giesbrecht '10).
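A minimal sketch of the Add and Mult instantiations, using the deck's illustrative 3-dimensional vectors for "the" and "engineers":

    import numpy as np

    # Mitchell & Lapata ('08) style composition, simplest instantiations.
    e_alpha = np.array([0.1, 0.2, 0.1])   # "the" (values from the deck's example)
    e_beta  = np.array([0.5, 0.1, 0.1])   # "engineers"

    add_comp  = e_alpha + e_beta          # Add:  e_gamma[i] = e_alpha[i] + e_beta[i]
    mult_comp = e_alpha * e_beta          # Mult: e_gamma[i] = e_alpha[i] * e_beta[i]
    print(add_comp, mult_comp)            # [0.6 0.3 0.2] [0.05 0.02 0.01]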


Semantically-annotated Parsing

Three existing ways to annotate a parse with semantics, each suggesting an SVS instantiation, illustrated on "the engineers pulled off an engineering trick":

Headword lexicalization (Charniak '97): one-word semantics plus subcategorization; each node carries its head's word (a percolation sketch follows). ⇒ headword-lexicalized SVS

    (S:i_pulled (NP:i_engineers (DT:i_the the) (NN:i_engineers engineers))
                (VP:i_pulled (VBD:i_pulled (VBD:i_pulled pulled) (PRT:i_off off))
                             (NP:i_trick (DT:i_an an) (NN:i_trick (NN:i_engineering engineering) (NN:i_trick trick)))))

Latent annotations (Matsuzaki et al. '05): learned subcategories, clustered semantics; each category carries a latent annotation — S[e], NP[e], DT[e], NN[e], VP[e], VBD[e], PRT[e], ... ⇒ relationally-clustered SVS

Semantic parsing with logical forms ⇒ logical-interpretation SVS

    S: pulled(egr, trick(egrng))    NP: egr           DT: −    NN: eng
    VP: pulled(x, trick(egrng))     VBD: pulled(x,y)  PRT: −
    NP: trick(egrng)                DT: −             NN: trick(egrng)    NN: egrng    NN: trick(z)
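A minimal sketch of headword percolation in the Charniak-style lexicalization above; the tree encoding and head-child table are simplifying assumptions, not the paper's head rules:

    # Percolate each head child's headword up the tree.
    HEAD_CHILD = {"S": 1, "NP": -1, "VP": 0, "VBD": 0}   # hypothetical head positions

    def lexicalize(tree):
        """tree = (label, word) at leaves or (label, [children]); returns
        (label, headword, annotated children/word)."""
        label, rest = tree
        if isinstance(rest, str):                  # preterminal: headword = the word
            return (label, rest, rest)
        kids = [lexicalize(child) for child in rest]
        head = kids[HEAD_CHILD.get(label, -1)][1]  # headword of the head child
        return (label, head, kids)

    np_tree = ("NP", [("DT", "the"), ("NN", "engineers")])
    print(lexicalize(np_tree)[1])                  # 'engineers': NP is headed by its NN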


SVS Composition Components

Illustrative concept dimensions: iᵖ 'people', iᵏ 'known', iᵘ 'unknown'; vectors are indexed [iᵘ, iᵏ, iᵖ].

Word vectors in context (e), with e[i] = P(x | lci):

    e_α = [.1, .2, .1] for "the"
    e_β = [.5, .1, .1] for "engineers"

Relation matrices (L), with L_{γ×ι}(l_ι)[i_γ, i_ι] = P(i_ι | i_γ, l_ι):

    L_{γ×α}(l_MOD) = [ .6 .2 .2 ; .2 .5 .3 ; .1 .2 .7 ]
    L_{γ×β}(l_ID)  = [ 1 0 0 ; 0 1 0 ; 0 0 1 ]   (identity)

Syntactic vector (m) — not purely syntactic, with m[i_γ] = P(lci_γ → lc_α lc_β):

    m(l_MOD NP → l_MOD DT  l_ID NN) = [.2, .3, .4]

These are the ingredients of the composition e_γ = f(e_α, e_β, M, L).


SVS Composition Equation

Composing "the engineers ...": the NP node e_γ (label l_MOD NP) is built from e_α at (l_MOD)DT over "the" and e_β at (l_ID)NN over "engineers", beneath the sentence node e_ε at (l_MOD)S covering "... pulled off ...".

    e_γ = f(e_α, e_β, M, L)
        = m ⊙ (L_{γ×α} e_α) ⊙ (L_{γ×β} e_β)

    [0.0120]   [.2]     [.6 .2 .2] [.1]     [1 0 0] [.5]
    [0.0042] = [.3]  ⊙  [.2 .5 .3] [.2]  ⊙  [0 1 0] [.1]
    [0.0048]   [.4]     [.1 .2 .7] [.1]     [0 0 1] [.1]
      e_γ       m      L_{γ×α}(l_MOD) e_α  L_{γ×β}(l_ID) e_β

But what context should the vectors assume, and how do we choose between alternatives? ⇒ Consider the dual problem of parsing!
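The same computation as a short NumPy sketch (⊙ as element-wise product), using the slide's numbers; the middle entry comes out 0.0045 here while the slide reports 0.0042, presumably because the slide's inputs are themselves rounded:

    import numpy as np

    # e_gamma = m ⊙ (L_alpha · e_alpha) ⊙ (L_beta · e_beta), slide values.
    m       = np.array([0.2, 0.3, 0.4])
    L_alpha = np.array([[0.6, 0.2, 0.2],
                        [0.2, 0.5, 0.3],
                        [0.1, 0.2, 0.7]])   # L_{gamma x alpha}(l_MOD)
    L_beta  = np.eye(3)                     # L_{gamma x beta}(l_ID), identity
    e_alpha = np.array([0.1, 0.2, 0.1])     # "the"
    e_beta  = np.array([0.5, 0.1, 0.1])     # "engineers"

    e_gamma = m * (L_alpha @ e_alpha) * (L_beta @ e_beta)
    print(e_gamma)                          # [0.012  0.0045 0.0048]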


Dual Problem: Parsing

Expanding the composition in probabilities (semantic labels l, concepts i):

    e_γ[i_γ] = (m ⊙ (L_{γ×α} e_α) ⊙ (L_{γ×β} e_β))[i_γ]
             = P(lci_γ → lc_α lc_β) · Σ_{i_α} P(i_α | i_γ, l_α) · P(x_α | lci_α) · Σ_{i_β} P(i_β | i_γ, l_β) · P(x_β | lci_β)
             = Σ_{i_α} Σ_{i_β} P(lci_γ → lc_α lc_β) · P(i_α | i_γ, l_α) · P(x_α | lci_α) · P(i_β | i_γ, l_β) · P(x_β | lci_β)
             = Σ_{i_α} Σ_{i_β} P(lci_γ → lc_α lc_β) · P(i_α | i_γ, l_α) · P(i_β | i_γ, l_β) · P(x_α | lci_α) · P(x_β | lci_β)
             = Σ_{i_α} Σ_{i_β} P(lci_γ → lci_α lci_β) · P(x_α | lci_α) · P(x_β | lci_β)

With the semantic labels and concepts folded into the nonterminals, these are the standard equations of probabilistic parsing (a chart-parsing sketch follows).
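A toy chart-parsing sketch of this duality: inside vectors are built bottom-up, with the SVS composition applied at every binary rule. The one-rule grammar and two-word lexicon are invented for illustration:

    import numpy as np

    # CKY-style chart where each cell holds per-category inside *vectors*,
    # combined with e = m ⊙ (L_a · e_a) ⊙ (L_b · e_b). Toy grammar/lexicon.
    rules = {("NP", "DT", "NN"): (
        np.array([0.2, 0.3, 0.4]),                       # m
        np.array([[0.6, 0.2, 0.2],
                  [0.2, 0.5, 0.3],
                  [0.1, 0.2, 0.7]]),                     # L_{gamma x alpha}
        np.eye(3))}                                      # L_{gamma x beta}
    lexicon = {"the": ("DT", np.array([0.1, 0.2, 0.1])),
               "engineers": ("NN", np.array([0.5, 0.1, 0.1]))}

    words = ["the", "engineers"]
    n = len(words)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                        # seed preterminals
        cat, e = lexicon[w]
        chart[i][i + 1][cat] = e

    for span in range(2, n + 1):                         # longer spans from shorter
        for i in range(n - span + 1):
            for k in range(i + 1, i + span):             # split point
                for (g, a, b), (m, La, Lb) in rules.items():
                    if a in chart[i][k] and b in chart[k][i + span]:
                        chart[i][i + span][g] = (m * (La @ chart[i][k][a])
                                                   * (Lb @ chart[k][i + span][b]))

    print(chart[0][2]["NP"])                             # inside vector, "the engineers"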


Most Likely Tree

Compare vectors using a constituent-prior vector aᵀ:

    P(x_γ, lc_γ) = Σ_{i_γ} P(lci_γ) · P(x_γ | lci_γ) = a_γᵀ · e_γ

Best vector:

    P_{θVit(G)}(x_γ | lcê_γ) ≝ ⟦ ê_γ = argmax_{lce_ι} a_ιᵀ e_ι ⟧

The chosen vectors imply a best tree; the same comparison applies at the root (see the sketch below).
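A minimal sketch of the selection rule: score each candidate constituent vector by aᵀe and keep the argmax. The prior a and the candidate vectors are invented numbers:

    import numpy as np

    # Pick the best analysis by the inner product a^T e.
    a = np.array([0.5, 0.3, 0.2])                       # constituent prior, P(lci)
    candidates = {"analysis 1": np.array([0.0120, 0.0042, 0.0048]),
                  "analysis 2": np.array([0.0020, 0.0100, 0.0010])}

    best = max(candidates, key=lambda k: a @ candidates[k])
    print(best, a @ candidates[best])                   # analysis 1 0.00822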


SVS Probability Models

    Syntactic model:     m(lc_γ → lc_α lc_β)[i_γ, i_γ] = P_{θM}(lci_γ → lc_α lc_β)
    Semantic model:      L_{γ×ι}(l_ι)[i_γ, i_ι] = P_{θL}(i_ι | i_γ, l_ι)
    Preterminal model:   e_γ[i_γ] = P_{θP-Vit(G)}(x_γ | lci_γ), for preterminal γ
    Root const. model:   a_εᵀ[i_ε] = P_{πGε}(lci_ε)
    Any const. model:    a_γᵀ[i_γ] = P_{πG}(lci_γ)

Different instantiations of these five models yield different SVS variants.


Relationally-clustered Headwords

Headword lexicalization: e is a one-hot indicator over headwords (i_aardvark, ..., i_engineers, ..., i_zygote), e.g. e = [0, 1, ..., 0]ᵀ for "engineers".

Relational clusters: e = [p_1, ..., p_|e|]ᵀ is a distribution over clusters (i_cluster1, ..., i_cluster|e|).

On "the engineers": headword annotations i_the at (l_MOD)DT and i_engineers at (l_ID)NN and (l_MOD)NP become cluster annotations i_cluster2 at (l_MOD)DT, i_cluster3 at (l_ID)NN, and i_cluster1 at (l_MOD)NP, learned with the Inside–Outside algorithm (EM); see the sketch below.

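A minimal sketch of the move from one-hot headwords to relational clusters. The P(cluster | headword) table is invented for illustration; in the paper these distributions are learned with Inside–Outside (EM):

    import numpy as np

    # From one-hot headwords to a distribution over relational clusters.
    vocab = ["aardvark", "engineers", "zygote"]
    one_hot = np.zeros(len(vocab))
    one_hot[vocab.index("engineers")] = 1.0

    cluster_given_word = np.array([[0.7, 0.2, 0.1],     # aardvark
                                   [0.1, 0.8, 0.1],     # engineers
                                   [0.3, 0.3, 0.4]])    # zygote

    e_cluster = one_hot @ cluster_given_word
    print(e_cluster)   # [0.1 0.8 0.1]: "engineers" mass mostly on cluster 2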


Model Fit Evaluation

Setup: WSJ sections 02–21 for training, section 23 for testing; binarized, subcategorized grammar; non-syntactic information added.

Quantitative fit via perplexity — how well the models explain the language (Sec. 23, with 'unk'+'num' handling; the metric is sketched below):

    Model                        Perplexity
    syntax-only baseline         428.94
    rel'n clust. 1k hw → 005e    371.76
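For reference, a minimal sketch of the perplexity metric in the table: 2 to the average negative log₂ probability per word. The per-word probabilities here are invented:

    import math

    # Perplexity = 2 ** (cross-entropy in bits per word).
    word_probs = [0.010, 0.003, 0.020]       # model's per-word probabilities
    bits = [-math.log2(p) for p in word_probs]
    perplexity = 2 ** (sum(bits) / len(bits))
    print(perplexity)                        # ~118.6 for these toy numbers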

EM-learned Relational Clusters

Clusters in syntactic context (plural nouns):

    Cluster i0 'money':      unk 0.431, cents 0.135, shares 0.084, yen 0.036, sales 0.025, points 0.023, marks 0.018, francs 0.018, tons 0.013, people 0.012
    Cluster i1 'people':     officials 0.145, unk 0.141, years 0.132, shares 0.093, prices 0.061, people 0.050, stocks 0.032, sales 0.027, executives 0.024, analysts 0.018
    Cluster i2 'companies':  unk 0.248, markets 0.056, companies 0.036, issues 0.035, firms 0.033, banks 0.030, loans 0.025, investors 0.024, contracts 0.022, stocks 0.021
    Cluster i5 'time':       years 0.25, months 0.19, unk 0.18, days 0.12, weeks 0.06, points 0.03, companies 0.02, hours 0.02, people 0.01, units 0.01

Clusters in syntactic context (past-tense verbs):

    Cluster i1 'announcement':       unk 0.362, was 0.173, reported 0.097, posted 0.036, earned 0.029, filed 0.024, were 0.022, had 0.020, told 0.013, approved 0.013
    Cluster i5 'change in value':    rose 0.137, fell 0.124, unk 0.116, gained 0.063, dropped 0.051, attributed 0.051, jumped 0.046, added 0.041, lost 0.039, advanced 0.022
    Cluster i7 'change possession':  unk 0.381, had 0.065, was 0.062, took 0.036, bought 0.027, completed 0.025, received 0.024, were 0.023, got 0.018, made 0.018, acquired 0.016

WSJ Parsing Accuracy and Relational Clusters

Are distributed semantics better? (Sec. 23, length < 40 words)

    Model                         LR      LP      F
    syntax-only baseline          83.32   83.83   83.57
    headword-lex., 10 hw          83.10   83.61   83.35
    headword-lex., 50 hw          83.09   83.40   83.24
    rel'n clust., 50 hw, 10 cl.   83.67   84.13   83.90

Are more clusters better? (Sec. 23, length < 40 words)

    Model                         LR      LP      F
    baseline, 1 cluster           83.34   83.90   83.62
    1000 hw, 5 clusters (avg)     83.85   84.23   84.04
    1000 hw, 10 clusters (avg)    84.04   84.40   84.21
    1000 hw, 15 clusters (avg)    84.15   84.38   84.26
    1000 hw, 20 clusters (avg)    84.21   84.42   84.31


Parsing Speed with Vectors

Extra vector operations make parsing slower, but vectorizing them recovers the speed. Runtime is O(n³) in sentence length either way; only the coefficients differ (see the one-line check below):

    un-vectorized: 0.66505
    vectorized:    0.00267

[Figure: average parsing time (s) vs. sentence length (0–40 words), non-vectorized vs. vectorized; the vectorized curve grows far more slowly.]

Efficient operations keep the approach tractable.
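Since both curves are cubic, the coefficients above imply a constant speedup at every sentence length (units as in the deck's fit, an assumption):

    # Both parsers scale as c * n^3; the ratio of coefficients is the speedup.
    c_unvec, c_vec = 0.66505, 0.00267
    print(c_unvec / c_vec)   # ~249x: vectorization pays off at any length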

Conclusion: Structured Vectorial Semantics

Addressing the weaknesses of distributed semantics:

    No compositionality ← phrasal semantics
    Bag-of-words ← context

Relational-clustering SVS: distributed semantics + latent-annotation parsing, with broad coverage.

Evaluation: perplexity reduction; qualitatively coherent clusters; mild parsing gains; tractability.


Thank you!

[email protected]

Appendix: Inside–Outside Algorithm (EM)

E-step — estimate posteriors over latent annotations for each rule, then weight against the real data:

    P̂(i_γ, i_α, i_β | lc_γ, lc_α, lc_β) = P̂_{θOut}(lci_γ, lch_ε − lch_γ) · P̂_{θIns}(lch_γ | lci_γ) / P̂(lch_ε)

    P̃(lci_γ, lci_α, lci_β) = P̂(i_γ, i_α, i_β | lc_γ, lc_α, lc_β) · P̃(lc_γ, lc_α, lc_β)

M-step — estimate grammar rules by (fractional) frequency counts over the imagined latent annotations (a normalization sketch follows):

    P_{θM}(lci_η → lc_η0 lc_η1) ← Σ_{i_η0, i_η1} P̃(lci_η, lci_η0, lci_η1) / Σ_{lci_η0, lci_η1} P̃(lci_η, lci_η0, lci_η1)

    P_{θL}(i_η0 | i_η, l_η0) ← Σ_{lc_η, c_η0, lci_η1} P̃(lci_η, lci_η0, lci_η1) / Σ_{lc_η, ci_η0, lci_η1} P̃(lci_η, lci_η0, lci_η1)

    P_{θH}(h_η | lci_η) ← P̃(lci_η, −, −) / Σ_{h_η} P̃(lci_η, −, −)
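A minimal sketch of the M-step's "frequency count over imagined annotations": expected fractional rule counts from the E-step are normalized into probabilities. The counts are invented:

    from collections import defaultdict

    # Normalize E-step expected counts into rule probabilities (theta_M).
    expected = {("NP1", "DT1", "NN2"): 3.2,    # fractional counts over latent
                ("NP1", "DT1", "NN1"): 0.8,    # annotations, from the E-step
                ("NP2", "DT1", "NN2"): 1.0}

    totals = defaultdict(float)
    for (parent, _, _), count in expected.items():
        totals[parent] += count

    P_theta_M = {rule: count / totals[rule[0]] for rule, count in expected.items()}
    print(P_theta_M[("NP1", "DT1", "NN2")])    # 0.8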


Appendix: Relational Clustering SVS

Five SVS models to train:

    Syntactic model      P_{θM}(lci_γ → lc_α lc_β)    estimated in EM
    Semantic model       P_{θL}(i_ι | i_γ, l_ι)       estimated in EM
    Preterminal model    P_{θP-Vit(G)}(x_γ | lci_γ)   backed off from EM
    Root const. model    P_{πGε}(lci_ε)               byproduct of EM
    Any const. model     P_{πG}(lci_γ)                byproduct of EM

Preterminal back-off (H the set of known headwords; sketched below):

    P_{θP-Vit(G)}(x_η | lci_η) = P̂_{θH}(x_η | lci_η)                              if x_η ∈ H
                                = P̂_{θP-Vit(G)}(x_η | c_η) · P̂_{θH}(unk | lci_η)  if x_η ∉ H

Constituent models as EM byproducts:

    P_{πGε}(lci_ε) ≝ P̂_{θOut}(lci_ε, lch_ε − lch_ε)
    P_{πG}(lci_η) ≝ Σ_{lci_η0, lci_η1} P̃(lci_η, lci_η0, lci_η1)
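A minimal sketch of the preterminal back-off above, with invented probability tables (H is assumed to be the set of headwords seen in training):

    # Use the headword model for known words; otherwise back off to a
    # generic word model scaled by the probability of 'unk'.
    H = {"engineers": 0.02, "trick": 0.01}     # P_thetaH(x | lci), known words
    P_unk_given_lci = 0.05                     # P_thetaH(unk | lci)
    backoff = {"zyzzyva": 0.001}               # P(x | c): class-based word model

    def p_preterminal(x):
        if x in H:                                          # x in H
            return H[x]
        return backoff.get(x, 1e-6) * P_unk_given_lci       # x not in H

    print(p_preterminal("engineers"), p_preterminal("zyzzyva"))   # 0.02 5e-05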

