Word Context Entropy
Kenneth Heafield
Google Inc
January 16, 2008
Example code from Hadoop 0.13.1 used under the Apache License Version 2.0 and modified for presentation. Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
Outline
1. Problem: Context, Entropy
2. Implementation: Streaming Entropy, Reducer Sorting
Problem
Word Weighting

Idea: Measure how specific a word is.
Applications: Query refinement, automatic tagging.

Example:
  Specific            Generic
  airplane   5.0      a      9.8
  whistle    4.2      is     9.6
  purple     5.3      any    8.7
  pangolin   1.6      from   9.6
Problem
Context

Neighbors
Idea: Non-specific words appear in random contexts.

Example:
  A bug in the code is worth two in the documentation.
  A complex system that works is invariably found to have evolved from a simple system that works.
  A computer scientist is someone who fixes things that aren’t broken.
  I’m still waiting for the advent of the computer science groupie.
  If I’d known computer science was going to be like this, I’d never have given up being a rock ’n’ roll star.

Neighbors of “A”: bug, complex, from, simple, computer, being, rock
Neighbors of “computer”: A, scientist, the, science, known, science

Quotes copied from the fortune computers file.
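To make “neighbor” concrete, here is a minimal sketch, not from the original deck, that collects each word’s immediate neighbors. It assumes whitespace tokenization and a window of one word on each side; the class and method names are mine.

import java.util.ArrayList;
import java.util.List;

public class Neighbors {
  // Collect (word, neighbor) pairs: each word pairs with the token
  // immediately before it and the token immediately after it.
  static List<String[]> pairs(String sentence) {
    String[] tokens = sentence.split("\\s+");
    List<String[]> result = new ArrayList<String[]>();
    for (int i = 0; i < tokens.length; i++) {
      if (i > 0) result.add(new String[] {tokens[i], tokens[i - 1]});
      if (i + 1 < tokens.length) result.add(new String[] {tokens[i], tokens[i + 1]});
    }
    return result;
  }
}

Running this over the quotes above yields exactly the neighbor lists shown for “A” and “computer”.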
Problem
Context

Context Distribution
[Figure: two bar charts of context distributions over Boston, City, England, MA, University (y-axis 0 to 0.4). “Cambridge” is ambiguous, so its distribution is closer to equal; “Attleboro” is just a city, so its distribution is spiked.]
Problem
Entropy

Entropy
Definition: Measures how uncertain a random variable N is:

  \mathrm{Entropy}(N) = -\sum_n p(N = n) \log_2 p(N = n)

Properties:
  Minimized at 0 when only one outcome is possible.
  Maximized at \log_2 k when k outcomes are equally probable.
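As a quick check of both properties (a worked derivation added here for clarity, not on the original slide):

\[
\text{One outcome: } p(n_0) = 1 \;\Rightarrow\; \mathrm{Entropy}(N) = -1 \cdot \log_2 1 = 0.
\]
\[
k \text{ equal outcomes: } p(n) = \tfrac{1}{k} \;\Rightarrow\; \mathrm{Entropy}(N) = -\sum_{n=1}^{k} \tfrac{1}{k} \log_2 \tfrac{1}{k} = \log_2 k.
\]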
Problem
Entropy

Context Distribution Entropy

Cambridge:
  n           Boston   City    England   MA      University  | Entropy
  p           0.1      0.2     0.3       0.2     0.2         |
  −p log₂ p   0.332    0.464   0.521     0.464   0.464       | 2.246

Attleboro:
  n           Boston   City    England   MA      University  | Entropy
  p           0.2      0.3     0.1       0.3     0.1         |
  −p log₂ p   0.464    0.521   0.332     0.521   0.332       | 2.171
Problem
Summary

Summary
Goal: Measure how specific a word is.
Approach (a minimal in-memory sketch follows):
1. Count the surrounding words.
2. Normalize to make a probability distribution.
3. Evaluate entropy.
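A minimal in-memory sketch of these three steps, assuming the neighbors of one word have already been collected; the class and method names are mine, not from the deck.

import java.util.HashMap;
import java.util.Map;

public class ContextEntropy {
  static double entropy(Iterable<String> neighbors) {
    // 1. Count the surrounding words.
    Map<String, Integer> counts = new HashMap<String, Integer>();
    int total = 0;
    for (String n : neighbors) {
      Integer c = counts.get(n);
      counts.put(n, c == null ? 1 : c + 1);
      total++;
    }
    // 2-3. Normalize each count and accumulate -p log2 p.
    double entropy = 0.0;
    for (int c : counts.values()) {
      double p = (double) c / total;
      entropy -= p * Math.log(p) / Math.log(2);
    }
    return entropy;
  }

  public static void main(String[] args) {
    // The reduce values from the deck's later example; prints ≈ 1.842.
    System.out.println(entropy(java.util.Arrays.asList(
        "City", "Boston", "City", "MA", "England", "City", "England")));
  }
}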
Implementation
All At Once

Implementation
Mapper outputs key word and value neighbor.
Reducer:
1. Counts each neighbor using a hash table.
2. Normalizes counts.
3. Computes entropy and outputs key word, value entropy.

Example:
  Reduce values:  City, Boston, City, MA, England, City, England
  Hash table:     City→3, Boston→1, MA→1, England→2
  Normalize:      City→3/7, Boston→1/7, MA→1/7, England→2/7
  −p log₂ p:      City→0.524, Boston→0.401, MA→0.401, England→0.516
  Entropy:        1.842

Issues
− Too many neighbors of “the” to fit in memory.
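A hedged sketch of this Reducer, written against the pre-generics org.apache.hadoop.mapred interfaces of the 0.13 era; the class name and float output type are my choices, not the deck’s.

import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class AllAtOnceReducer extends MapReduceBase implements Reducer {
  public void reduce(WritableComparable word, Iterator values,
                     OutputCollector output, Reporter reporter)
      throws IOException {
    // 1. Count each neighbor in a hash table: this is the memory problem,
    // since "the" has far too many distinct neighbors.
    Map<String, Integer> counts = new HashMap<String, Integer>();
    int total = 0;
    while (values.hasNext()) {
      String neighbor = values.next().toString();
      Integer c = counts.get(neighbor);
      counts.put(neighbor, c == null ? 1 : c + 1);
      total++;
    }
    // 2-3. Normalize and accumulate -p log2 p.
    double entropy = 0.0;
    for (int c : counts.values()) {
      double p = (double) c / total;
      entropy -= p * Math.log(p) / Math.log(2);
    }
    output.collect(word, new FloatWritable((float) entropy));
  }
}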
Implementation
Two Phases

Implementation
1. Count
   Mapper outputs key (word, neighbor) and empty value.
   Reducer counts values, then outputs key word and value count.
2. Entropy
   Mapper is Identity. All counts for a word go to one Reducer.
   Reducer buffers counts, normalizes, and computes entropy.

Issues
+ Entropy Reducer needs only counts in memory.
− There can still be a lot of counts.
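A hedged sketch of the phase-1 count Reducer, assuming the Mapper packed the (word, neighbor) pair into a tab-separated Text key; again the names and key encoding are my illustration, not the deck’s.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class CountReducer extends MapReduceBase implements Reducer {
  public void reduce(WritableComparable key, Iterator values,
                     OutputCollector output, Reporter reporter)
      throws IOException {
    // The number of empty values for this (word, neighbor) key is c(n).
    long count = 0;
    while (values.hasNext()) {
      values.next();
      count++;
    }
    // Strip the neighbor: phase 2 only needs (word, count).
    String word = key.toString().split("\t", 2)[0];
    output.collect(new Text(word), new LongWritable(count));
  }
}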
Implementation
Streaming Entropy

Streaming Entropy
Observation: Normalization and entropy can be computed simultaneously. With counts c(n) and total t = \sum_n c(n):

\begin{align}
\mathrm{Entropy}(N) &= -\sum_n p(N = n) \log_2 p(N = n) && (1) \\
 &= -\sum_n \frac{c(n)}{t} \left( \log_2 c(n) - \log_2 t \right) && (2) \\
 &= \log_2 t - \sum_n \frac{c(n)}{t} \log_2 c(n) && (3) \\
 &= \log_2 t - \frac{1}{t} \sum_n c(n) \log_2 c(n) && (4)
\end{align}

Moral: Provided counts c(n), the Reducer need only remember the total t and the sum \sum_n c(n) \log_2 c(n).
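Equation (4) in code: a sketch, added here for illustration, that makes one pass over the counts while keeping only the two running values.

import java.util.Iterator;

public class StreamingEntropy {
  static double entropy(Iterator<Long> counts) {
    long t = 0;        // running total of all counts
    double sum = 0.0;  // running sum of c(n) log2 c(n)
    while (counts.hasNext()) {
      long c = counts.next();
      t += c;
      sum += c * Math.log(c) / Math.log(2);
    }
    // Equation (4): log2 t - (1/t) * sum_n c(n) log2 c(n).
    return Math.log(t) / Math.log(2) - sum / t;
  }

  public static void main(String[] args) {
    // Counts {3, 1, 1, 2} from the earlier example; prints ≈ 1.842,
    // matching the hash-table computation.
    System.out.println(entropy(
        java.util.Arrays.asList(3L, 1L, 1L, 2L).iterator()));
  }
}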
Implementation
Streaming Entropy

Two Phases with Streaming
Implementation
1. Count
   Mapper outputs key (word, neighbor) and empty value.
   Reducer counts values, then outputs key word and value count.
2. Entropy
   Mapper is Identity. All counts for a word go to one Reducer.
   Reducer computes streaming entropy.

Issues
+ Constant memory Reducer.
− Not enough disk to store counts thrice on HDFS.
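A hedged sketch of the phase-2 Reducer: since all counts for a word arrive in a single reduce() call, constant state (t, sum) replaces the buffered count list. It assumes phase 1 wrote LongWritable counts; names are mine.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class StreamingEntropyReducer extends MapReduceBase implements Reducer {
  public void reduce(WritableComparable word, Iterator values,
                     OutputCollector output, Reporter reporter)
      throws IOException {
    long t = 0;
    double sum = 0.0;
    while (values.hasNext()) {
      long c = ((LongWritable) values.next()).get();
      t += c;
      sum += c * Math.log(c) / Math.log(2);
    }
    double entropy = Math.log(t) / Math.log(2) - sum / t;
    output.collect(word, new FloatWritable((float) entropy));
  }
}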
Implementation
Reducer Sorting

Hadoop sorts keys before reducing, so all (word, neighbor) pairs for a word reach the same Reducer consecutively:

  Word    Neighbor              Word    Neighbor
  A       Plane                 A       Bird
  Foo     Bar                   A       Plane
  A       Bird                  A       Plane
  A       Plane       Sort      A       Plane
  A       The          →        A       The
  Alice   Bob                   A       The
  The     Problem               Alice   Bob
  A       Plane                 Foo     Bar
  A       The                   The     Problem

Reducing each (word, neighbor) key yields the counts grouped by word:
  (A, Bird)       → Reduce → A, 1
  (A, Plane)      → Reduce → A, 3
  (A, The)        → Reduce → A, 2
  (Alice, Bob)    → Reduce → Alice, 1
  (Foo, Bar)      → Reduce → Foo, 1
  (The, Problem)  → Reduce → The, 1
Implementation
Reducer Sorting

Because words arrive in sorted order, one Reducer can buffer the counts for the current word across reduce() calls, emit them when the word changes, and flush the last word in close(). A sketch of such a Reducer follows.

  Call                     State ↓ / Output →
  Reduce(A, Bird)          ↓ (A; 1)
  Reduce(A, Plane)         ↓ (A; 1, 3)
  Reduce(A, The)           ↓ (A; 1, 3, 2)
  Reduce(Alice, Bob)       → (A; 1, 3, 2)    ↓ (Alice; 1)
  Reduce(Foo, Bar)         → (Alice; 1)      ↓ (Foo; 1)
  Reduce(The, Problem)     → (Foo; 1)        ↓ (The; 1)
  close()                  → (The; 1)
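A hedged sketch of a Reducer that exploits key sorting this way. Saving the OutputCollector for use in close() is an old-API idiom; the class name, key encoding, and count-list output format are my illustration, not the deck’s.

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class SortingReducer extends MapReduceBase implements Reducer {
  private String currentWord = null;
  private List<Long> counts = new ArrayList<Long>();
  private OutputCollector out;  // saved so close() can emit the last word

  public void reduce(WritableComparable key, Iterator values,
                     OutputCollector output, Reporter reporter)
      throws IOException {
    out = output;
    String word = key.toString().split("\t", 2)[0];
    if (!word.equals(currentWord)) {
      flush();  // keys are sorted, so the previous word is complete
      currentWord = word;
    }
    long c = 0;  // count the empty values for this (word, neighbor) key
    while (values.hasNext()) { values.next(); c++; }
    counts.add(c);
  }

  public void close() throws IOException {
    flush();  // emit the final word's counts
  }

  private void flush() throws IOException {
    if (currentWord == null) return;
    out.collect(new Text(currentWord), new Text(counts.toString()));
    counts.clear();
  }
}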
Implementation
Reducer Sorting

Recall Streaming Entropy
Observation: Normalization and entropy can be computed simultaneously:

\begin{align}
\mathrm{Entropy}(N) &= -\sum_n p(N = n) \log_2 p(N = n) && (5) \\
 &= -\sum_n \frac{c(n)}{t} \left( \log_2 c(n) - \log_2 t \right) && (6) \\
 &= \log_2 t - \sum_n \frac{c(n)}{t} \log_2 c(n) && (7) \\
 &= \log_2 t - \frac{1}{t} \sum_n c(n) \log_2 c(n) && (8)
\end{align}

Moral: Computing t and the sum can be done in parallel with counting.
Implementation
Reducer Sorting

With streaming entropy, the buffered count list shrinks to a constant-size pair: the running total t and the running sum of c(n) log₂ c(n):

  Call                     State ↓ / Output →
  Reduce(A, Bird)          ↓ (A, 1, 0)
  Reduce(A, Plane)         ↓ (A, 4, 4.7)
  Reduce(A, The)           ↓ (A, 6, 6.7)
  Reduce(Alice, Bob)       → (A, (6, 6.7))    ↓ (Alice, 1, 0)
  Reduce(Foo, Bar)         → (Alice, (1, 0))  ↓ (Foo, 1, 0)
  Reduce(The, Problem)     → (Foo, (1, 0))    ↓ (The, 1, 0)
  close()                  → (The, (1, 0))
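Finishing the computation for “A” (a step added here for clarity, not on the original slide): its final state has t = 6 and sum = 1·log₂1 + 3·log₂3 + 2·log₂2 ≈ 6.75 (the slide’s 6.7), so by equation (8)

\[
\mathrm{Entropy}(A) = \log_2 6 - \frac{6.75}{6} \approx 2.585 - 1.125 = 1.46 \text{ bits}.
\]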