Complexity Metrics in an Incremental, Right-corner Parser
Stephen Wu (University of Minnesota), Asaf Bachrach (INSERM-CEA), Carlos Cardenas (MIT), William Schuler (University of Minnesota and The Ohio State University)
July 14, 2010 | ACL 2010

Outline: Introduction · Background · Complexity Metrics · Evaluation · Conclusion
What makes a parser cognitively plausible?

Memory limits (Miller '56; Cowan '01): center-embedded structures are harder to understand (Miller & Chomsky '63; Gibson '98).
Ex: The drug [the intern [the nurse supervised] administered] cured the patient.

Incremental processing (Tanenhaus et al. '95): predictions are lexical, syntactic, and context-dependent.
Ex: The left striker kicked the penalty.
Ex: The left striker kicked the bucket.

Right-corner HHMM parser: PCFG, transformed into a right-corner grammar (bounded memory), run as an HHMM (incremental processing) in place of a chart parser.
Does this match human sentence processing?

Corpus coverage by HHMM depth limit (Schuler et al. '08):

depth limit          sentences    coverage
no memory                  127       0.32%
1 memory element         3,496       8.78%
2 memory elements       25,909      65.05%
3 memory elements       38,902      97.67%
4 memory elements       39,816      99.96%
5 memory elements       39,832     100.00%
TOTAL                   39,832     100.00%

Parsing accuracy with punctuation, sentences ≤ 40 words (Schuler et al. '10):

system                       LR      LP      F      sentence failure
KM'03: unmodified, devset     −       −     72.6         0
KM'03: par+sib, devset        −       −     77.4         0
CKY: binarized, devset       80.3    79.9   80.1         0.8
HHMM: par+sib, devset        84.1    83.5   83.8         0.5
CKY: binarized, sect 23      78.8    79.4   79.1         0.1
HHMM: par+sib, sect 23       83.4    83.7   83.5         0.1
Error reduction: 17.5% / 18.6% / 21.1%

Predictions of reading difficulty? (this paper)
Reading Times (Bachrach et al.)

[Figure: self-paced reading-time data, modeled below with linear mixed-effects regression]
Right-corner transform

[Figure: the phrase-structure tree for "the engineers pulled off an engineering trick" and its right-corner transformed equivalent, in which the right spine becomes a left-branching trunk of incomplete constituents: S/VP, S/NP, S/NN, NP/NN, VBD/PRT]

Incremental by nature (active/awaited constituents)
Flatter structure
Trunks and memory
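To make the transform concrete, here is a minimal sketch in Python of a right-corner transform over strictly binary trees with preterminal leaves (an assumed simplification; the paper's transform also handles unary and flatter productions, and this is not the authors' code). Trees are (label, children) tuples, with (POS, word) leaves.

```python
def is_leaf(t):
    # leaves are (POS, word) pairs
    return isinstance(t[1], str)

def rct(t):
    """Right-corner transform: fold the right spine of a binary tree
    into a left-branching trunk of A/B "incomplete constituent"
    categories (A is being built, B is still awaited)."""
    if is_leaf(t):
        return t
    A = t[0]
    # walk the right spine, collecting each left sibling and the
    # category still awaited at each step
    lefts, awaited, node = [], [], t
    while not is_leaf(node):
        left, right = node[1]
        lefts.append(left)
        awaited.append(right[0])
        node = right
    corner = node  # the deepest right corner stays in place
    # build the trunk bottom-up, recursively transforming left siblings
    trunk = (f"{A}/{awaited[0]}", [rct(lefts[0])])
    for left, b in zip(lefts[1:], awaited[1:]):
        trunk = (f"{A}/{b}", [trunk, rct(left)])
    return (A, [trunk, corner])

# the slide's example; the particle "off" is folded into the verb leaf
# here just to keep the tree binary
tree = ("S", [("NP", [("DT", "the"), ("NN", "engineers")]),
              ("VP", [("VBD", "pulled off"),
                      ("NP", [("DT", "an"),
                              ("NN", [("NN", "engineering"),
                                      ("NN", "trick")])])])])
print(rct(tree))
# -> the trunk S/VP, S/NP, S/NN, S/NN under S, with NP/NN inside the subject
```

Running this reproduces the slide's transformed shape: the sentence's right spine becomes one left-branching trunk, so the parser can extend it one word at a time.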
Parsing with a Hierarchic Hidden Markov Model

[Figure: HHMM trellis over time steps t = 1..6 for "the engineers pulled off an engineering trick"; depth d = 1 carries the main trunk of incomplete constituents (S/NP, S/VP, S/NN, ...), depth d = 2 carries embedded ones (NP/NN, VBD/PRT)]

One center-embedding depth (memory element) per trunk
One word per time step
Many hypotheses (partial trees) maintained in parallel
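A hypothesis at time t can be pictured as a bounded stack of incomplete constituents, one per center-embedding depth. The sketch below is illustrative Python, not the paper's code; the class and the exact bound are assumptions grounded in the coverage table above.

```python
from dataclasses import dataclass
from typing import Tuple

D = 4  # depth bound; 3-4 memory elements cover ~98-100% of WSJ sentences

@dataclass(frozen=True)
class HHMMState:
    """One hypothesis: a stack of active/awaited categories, one per depth."""
    stack: Tuple[str, ...]  # e.g. ("S/NN", "NP/NN") = trunks at d=1 and d=2

    def __post_init__(self):
        assert len(self.stack) <= D, "bounded short-term memory"

    def depth(self) -> int:
        return len(self.stack)  # frontier depth d(q_t), used by the metrics below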
The Trellis in HHMM Inference

[Figure: trellis of hypotheses over time; each time slice holds beam entries, each a stack of incomplete constituents at depths d = 1..3]

Incomplete trees pursued in parallel
Ranked by P(q_{1..t}, o_{1..t})
Beam B_t keeps the most probable hypotheses
Viterbi decoding (follow backpointers)
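A single trellis step can be sketched as beam search. This is illustrative Python under stated assumptions: `transitions` and `emit` are hypothetical stand-ins for the HHMM's depth-specific transition and observation models, and hypotheses are (log-probability, state-sequence) pairs.

```python
import heapq

def advance_beam(beam, word, transitions, emit, beam_size=500):
    """Extend every hypothesis in B_{t-1} with each reachable next
    state, score the joint log-probability, and keep the top K as B_t.
    beam: list of (log2_prob, states); transitions(state) yields
    (next_state, log2_P) pairs; emit(state, word) returns log2_P."""
    candidates = []
    for logp, states in beam:
        for nxt, t_logp in transitions(states[-1]):
            score = logp + t_logp + emit(nxt, word)
            candidates.append((score, states + [nxt]))
    # beam B_t: hypotheses ranked by P(q_{1..t}, o_{1..t})
    return heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
```

The complexity metrics that follow are all computed from these beams, so they come essentially for free during parsing.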
Surprisal (Hale '01; Levy '07; Demberg & Keller '08; Boston et al. '08; Roark et al. '09)

How much each word narrows the space of possible interpretations. The prefix probability sums the trellis probabilities of all hypotheses at time t:

    Pre(o_{1..t}) = \sum_{q_{1..t}} P(q_{1..t}, o_{1..t})

    Surprisal(t) = \log_2 Pre(o_{1..t-1}) - \log_2 Pre(o_{1..t})
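Computed over the beam, this takes only a few lines (same illustrative beam representation as above; using the beam in place of the full trellis sum is an assumption of the sketch).

```python
import math

def prefix_log2prob(beam):
    """log2 Pre(o_{1..t}): log of the summed joint probabilities of
    all surviving hypotheses (the beam approximates the trellis sum)."""
    return math.log2(sum(2.0 ** lp for lp, _ in beam))

def surprisal(prev_beam, cur_beam):
    """Drop in prefix log-probability caused by word t (in bits)."""
    return prefix_log2prob(prev_beam) - prefix_log2prob(cur_beam)
```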
Embedding Difference

An explicit representation of center-embedding, a probabilistic "memory cost" (Gibson '98, '00): the frontier depth d(q_t), averaged over the beam and weighted by each hypothesis's share of the trellis probability:

    \mu_{EMB}(o_{1..t}) = \sum_{q_t \in B_t} d(q_t) \cdot P(o_{1..t}, q_{1..t}) \Big/ \sum_{q'_t \in B_t} P(o_{1..t}, q'_{1..t})

    EmbDiff(o_{1..t}) = \mu_{EMB}(o_{1..t}) - \mu_{EMB}(o_{1..t-1})
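A beam-based sketch, reusing the conventions above (the `depth` accessor corresponds to d(q_t) and is an assumed helper, e.g. the `HHMMState.depth` shown earlier):

```python
def mu_emb(beam, depth):
    """Probability-weighted average center-embedding depth over the beam."""
    probs = [2.0 ** lp for lp, _ in beam]
    total = sum(probs)
    return sum(p * depth(states) for p, (_, states) in zip(probs, beam)) / total

def embedding_difference(prev_beam, cur_beam, depth):
    """Change in average embedding depth caused by word t."""
    return mu_emb(cur_beam, depth) - mu_emb(prev_beam, depth)
```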
Entropy Reduction (Hale '03, '06)

How much information is in the probability distribution over analyses:

    H_t = -\sum_{q_{1..t}} P(q_{1..t} \mid o_{1..t}) \log_2 P(q_{1..t} \mid o_{1..t})

    ER(o_t) = \max(0, H_{t-1} - H_t)
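Over the beam this becomes the sketch below; normalizing the joint probabilities into a distribution over surviving hypotheses is an assumption of the sketch.

```python
import math

def entropy_bits(beam):
    """Shannon entropy of the normalized distribution over beam hypotheses."""
    probs = [2.0 ** lp for lp, _ in beam]
    total = sum(probs)
    return -sum((p / total) * math.log2(p / total) for p in probs)

def entropy_reduction(prev_beam, cur_beam):
    """ER(o_t): entropy lost at word t, floored at zero."""
    return max(0.0, entropy_bits(prev_beam) - entropy_bits(cur_beam))
```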
Linear Mixed-Effects Regression

reading time = fixed effects (e.g., surprisal, bigram) + random effects (e.g., by word, by subject) + ...
Each fixed effect gets a slope (coefficient) and a confidence measure (t-value).

FULL DATA    Coefficient    Std. Err.     t-value
(Intcpt)     -9.340·10^-3   5.347·10^-2    -0.175
order        -3.746·10^-5   7.808·10^-6    -4.797*
rlength      -2.002·10^-2   1.635·10^-2    -1.225
unigrm       -8.090·10^-2   3.690·10^-1    -0.219
bigrm        -2.074·10^+0   8.132·10^-1    -2.551*
embdiff       9.390·10^-3   3.268·10^-3     2.873*
etrpyrd       2.753·10^-2   6.792·10^-3     4.052*
srprsl        3.950·10^-3   3.452·10^-4    11.442*

HHMM surprisal & entropy reduction: significant predictors
Embedding difference: independent contribution
Conclusion

Simple complexity metrics fall out of the HHMM parser. Significant predictors of reading time:
  Surprisal → the standout
  Entropy reduction
  Embedding difference

∴ Modeling bounded memory in a parser is quantitative and broad-coverage (↑ short-term memory use, ↑ linguistic complexity) ... while not hurting the parse.
Thank you!

William Schuler, Tim Miller, Brian Roark, Mark Holland

I have moved to Mayo Clinic: [email protected]
Appendix: Results

             FULL DATA                                  OPEN          CLOSED
             Coefficient    Std. Err.     t-value    ±  t-value    ±  t-value
(Intcpt)     -9.340·10^-3   5.347·10^-2    -0.175    -   -0.237    -   -0.794
order        -3.746·10^-5   7.808·10^-6    -4.797*   -   -4.621*   -   -4.232*
rlength      -2.002·10^-2   1.635·10^-2    -1.225    +    0.554    -   -0.865
unigrm       -8.090·10^-2   3.690·10^-1    -0.219    -   -0.391    -   -0.644
bigrm        -2.074·10^+0   8.132·10^-1    -2.551*   -   -3.248*   -   -2.645*
embdiff       9.390·10^-3   3.268·10^-3     2.873*   +    0.539    +    3.082*
etrpyrd       2.753·10^-2   6.792·10^-3     4.052*   +    0.063    +    4.857*
srprsl        3.950·10^-3   3.452·10^-4    11.442*   +    6.285*   +    9.286*

Fixed-effect correlations:
          (Intr)   order   rlngth   ungrm   bigrm   emdiff  entrpy
order      .000
rlngth    -.006    -.003
ungrm      .049     .000   -.479
bigrm      .001     .005   -.006   -.073
emdiff     .000     .009   -.049   -.089    .095
entrpy     .000     .003    .016   -.014    .020   -.010
srprsl     .000    -.008   -.033   -.079    .107    .362    .171