Word Context Entropy
Kenneth Heafield, Google Inc

January 16, 2008

Example code from Hadoop 0.13.1 used under the Apache License Version 2.0 and modified for presentation. Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.


Outline

1. Problem
   Context
   Entropy
2. Implementation
   Streaming Entropy
   Reducer Sorting


Problem: Word Weighting

Idea: Measure how specific a word is.

Applications
- Query refinement
- Automatic tagging

Example
  Specific        Generic
  airplane  5.0   a     9.8
  whistle   4.2   is    9.6
  purple    5.3   any   8.7
  pangolin  1.6   from  9.6


Problem: Context

Neighbors
Idea: Non-specific words appear in random contexts.

Example
  A bug in the code is worth two in the documentation.
  A complex system that works is invariably found to have evolved from a simple system that works.
  A computer scientist is someone who fixes things that aren't broken.
  I'm still waiting for the advent of the computer science groupie.
  If I'd known computer science was going to be like this, I'd never have given up being a rock 'n' roll star.

Neighbors of "A": bug, complex, from, simple, computer, being, rock
Neighbors of "Computer": A, scientist, the, science, known, science

Quotes copied from the fortune "computers" file.


Problem: Context

Context Distribution
[Figure: bar charts of context distributions over the neighbors Boston, City, England, MA, University.
Cambridge is ambiguous: its distribution is closer to equal.
Attleboro is just a city: its distribution is spiked.]

Problem: Entropy

Definition
Measures how uncertain a random variable N is:

  Entropy(N) = -\sum_n p(N = n) \log_2 p(N = n)

Properties
- Minimized at 0 when only one outcome is possible.
- Maximized at \log_2 k when k outcomes are equally probable.
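As a concrete illustration (a minimal sketch, not from the original slides), the definition in Python; the function name is ours, while the two example distributions are the Cambridge and Attleboro distributions from the worked example on the next slide:

```python
import math

def entropy(probabilities):
    """Entropy(N) = -sum_n p(N=n) * log2 p(N=n); 0 * log2(0) is taken as 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Distributions over Boston, City, England, MA, University from the slides.
print(entropy([0.1, 0.2, 0.3, 0.2, 0.2]))  # Cambridge: ~2.246 (closer to equal)
print(entropy([0.2, 0.3, 0.1, 0.3, 0.1]))  # Attleboro: ~2.171 (spiked)
```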


Problem: Entropy

Context Distribution Entropy

Cambridge
  n           Boston  City   England  MA     University  | Entropy
  p           0.1     0.2    0.3      0.2    0.2         |
  -p log2 p   0.332   0.464  0.521    0.464  0.464       | 2.246

Attleboro
  n           Boston  City   England  MA     University  | Entropy
  p           0.2     0.3    0.1      0.3    0.1         |
  -p log2 p   0.464   0.521  0.332    0.521  0.332       | 2.171


Problem: Summary

Goal: Measure how specific a word is.

Approach
1. Count the surrounding words.
2. Normalize to make a probability distribution.
3. Evaluate entropy.


Implementation: All At Once

Mapper outputs key word and value neighbor.
Reducer
1. Counts each neighbor using a hash table.
2. Normalizes counts.
3. Computes entropy and outputs key word, value entropy.

Example
  Reduce Values  City, Boston, City, MA, England, City, England
  Hash Table     City -> 3, Boston -> 1, MA -> 1, England -> 2
  Normalize      City -> 3/7, Boston -> 1/7, MA -> 1/7, England -> 2/7
  -p log2 p      City -> .523, Boston -> .401, MA -> .401, England -> .516
  Entropy        1.842

Issues
- Too many neighbors of "the" to fit in memory.
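The slides do not reproduce the reducer code itself; as an illustrative sketch only (plain Python rather than the talk's Hadoop 0.13.1 Java, with a made-up word key for the usage example), the all-at-once reducer might look like this:

```python
import math
from collections import Counter

def reduce_all_at_once(word, neighbors):
    """All-at-once reducer sketch: count, normalize, compute entropy.

    Holds every distinct neighbor in a hash table, which is exactly the
    memory problem noted above for words like "the".
    """
    counts = Counter(neighbors)           # 1. count each neighbor
    total = sum(counts.values())
    entropy = 0.0
    for c in counts.values():
        p = c / total                     # 2. normalize
        entropy -= p * math.log2(p)       # 3. accumulate entropy
    return word, entropy

# The reduce values from the slide's example; the key "Cambridge" is a
# hypothetical choice for illustration.  Prints entropy ~1.842.
print(reduce_all_at_once("Cambridge",
    ["City", "Boston", "City", "MA", "England", "City", "England"]))
```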


Implementation: Two Phases

1. Count
   Mapper outputs key (word, neighbor) and empty value.
   Reducer counts values, then outputs key word and value count.
2. Entropy
   Mapper is the identity. All counts for a word go to one Reducer.
   Reducer buffers counts, normalizes, and computes entropy.

Issues
+ Entropy Reducer needs only counts in memory.
- There can still be a lot of counts.
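A hedged sketch of the two phases (again Python pseudocode standing in for the original Java; `emit` is our stand-in for the framework's output collector):

```python
import math

# Phase 1: count occurrences of each (word, neighbor) pair.
def count_mapper(word, neighbor, emit):
    emit((word, neighbor), None)           # empty value

def count_reducer(key, values, emit):
    word, neighbor = key
    emit(word, sum(1 for _ in values))     # key word, value count

# Phase 2: identity mapper; the reducer sees all counts for one word.
def entropy_reducer(word, counts, emit):
    counts = list(counts)                  # buffers all counts in memory
    total = sum(counts)
    emit(word, -sum(c / total * math.log2(c / total) for c in counts))
```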


Implementation: Streaming Entropy

Observation: Normalization and entropy can be computed simultaneously.
With counts c(n) and total t = \sum_n c(n):

  Entropy(N) = -\sum_n p(N = n) \log_2 p(N = n)                   (1)
             = -\sum_n (c(n)/t) (\log_2 c(n) - \log_2 t)          (2)
             = \log_2 t - \sum_n (c(n)/t) \log_2 c(n)             (3)
             = \log_2 t - (1/t) \sum_n c(n) \log_2 c(n)           (4)

Moral: Provided counts c(n), the Reducer need only remember the total t and a sum.
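A minimal sketch of this identity in Python (illustrative, not the talk's code): the reducer folds each count into two scalars and finishes with equation (4).

```python
import math

def streaming_entropy(counts):
    """Entropy from a stream of counts using only two scalars:
    t (running total) and s (running sum of c * log2 c)."""
    t, s = 0, 0.0
    for c in counts:
        t += c
        s += c * math.log2(c)
    return math.log2(t) - s / t   # equation (4)

# Counts 3, 1, 1, 2 from the earlier example give the same ~1.842.
print(streaming_entropy([3, 1, 1, 2]))
```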


Implementation: Two Phases with Streaming

1. Count
   Mapper outputs key (word, neighbor) and empty value.
   Reducer counts values, then outputs key word and value count.
2. Entropy
   Mapper is the identity. All counts for a word go to one Reducer.
   Reducer computes streaming entropy.

Issues
+ Constant-memory Reducer.
- Not enough disk to store the counts thrice on HDFS.


Implementation: Reducer Sorting

Input pairs (word, neighbor):

  Word   Neighbor
  A      Plane
  Foo    Bar
  A      Bird
  A      Plane
  A      The
  Alice  Bob
  The    Problem
  A      Plane
  A      The

After sorting, equal keys are adjacent, and each Reduce call sees one (word, neighbor) group:

  Word   Neighbor   Reducer Output
  A      Bird       -> Reduce -> (A, 1)
  A      Plane
  A      Plane
  A      Plane      -> Reduce -> (A, 3)
  A      The
  A      The        -> Reduce -> (A, 2)
  Alice  Bob        -> Reduce -> (Alice, 1)
  Foo    Bar        -> Reduce -> (Foo, 1)
  The    Problem    -> Reduce -> (The, 1)

Because the Reduce calls arrive in sorted order, a Reducer can carry state across calls: buffer the counts for the current word, emit them when the word changes, and flush the last word in close(). A sketch of such a Reducer follows the trace below.

  Call                   State          Output
  Reduce(A, Bird)        (A; 1)
  Reduce(A, Plane)       (A; 1, 3)
  Reduce(A, The)         (A; 1, 3, 2)
  Reduce(Alice, Bob)     (Alice; 1)     (A; 1, 3, 2)
  Reduce(Foo, Bar)       (Foo; 1)      (Alice; 1)
  Reduce(The, Problem)   (The; 1)      (Foo; 1)
  close()                               (The; 1)
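A sketch of this stateful Reducer (illustrative Python; the talk used Hadoop's Java Reducer with its close() hook, and the class and method names here are our own):

```python
class SortingReducer:
    """Carries state across reduce() calls, relying on sorted key order.

    Buffers the counts for the current word and flushes them when the
    word changes; close() flushes the final word.
    """
    def __init__(self, emit):
        self.emit = emit
        self.word = None
        self.counts = []

    def reduce(self, key, values):
        word, neighbor = key
        if word != self.word:
            self.flush()
            self.word = word
        self.counts.append(sum(1 for _ in values))  # count this group

    def flush(self):
        if self.word is not None:
            self.emit(self.word, self.counts)
        self.counts = []

    def close(self):
        self.flush()
```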

Implementation: Reducer Sorting

Recall Streaming Entropy
Observation: Normalization and entropy can be computed simultaneously.

  Entropy(N) = -\sum_n p(N = n) \log_2 p(N = n)                   (5)
             = -\sum_n (c(n)/t) (\log_2 c(n) - \log_2 t)          (6)
             = \log_2 t - \sum_n (c(n)/t) \log_2 c(n)             (7)
             = \log_2 t - (1/t) \sum_n c(n) \log_2 c(n)           (8)

Moral: Computing t and the sum can be done in parallel.


Implementation: Reducer Sorting

Combining the two, the Reducer's state shrinks to (word, t, s), where s = \sum_n c(n) \log_2 c(n):

  Call                   State           Output
  Reduce(A, Bird)        (A, 1, 0)
  Reduce(A, Plane)       (A, 4, 4.7)
  Reduce(A, The)         (A, 6, 6.7)
  Reduce(Alice, Bob)     (Alice, 1, 0)   (A, (6, 6.7))
  Reduce(Foo, Bar)       (Foo, 1, 0)     (Alice, (1, 0))
  Reduce(The, Problem)   (The, 1, 0)     (Foo, (1, 0))
  close()                                (The, (1, 0))

Each emitted (word, (t, s)) yields Entropy = \log_2 t - s/t by equation (8).
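Putting it together, a sketch of the constant-memory Reducer (again illustrative Python mirroring the SortingReducer sketch above, not the talk's Java):

```python
import math

class StreamingEntropyReducer:
    """Keeps only (word, t, s) across reduce() calls and emits
    entropy = log2(t) - s / t when the word changes or at close()."""
    def __init__(self, emit):
        self.emit = emit
        self.word, self.t, self.s = None, 0, 0.0

    def reduce(self, key, values):
        word, neighbor = key
        if word != self.word:
            self.flush()
            self.word = word
        c = sum(1 for _ in values)       # count of this (word, neighbor)
        self.t += c
        self.s += c * math.log2(c)

    def flush(self):
        if self.word is not None:
            self.emit(self.word, math.log2(self.t) - self.s / self.t)
        self.t, self.s = 0, 0.0

    def close(self):
        self.flush()
```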
