Embedding Probabilistic Logic for Machine Reading
Sebastian Riedel (University College London)
Overview
- Machine Reading & Reasoning …
- … with Probabilistic Logics and Embeddings
- Challenges: injecting explanations, extracting explanations
Machine Reading

"Who works in London and is interested in NLP?"

[Diagram: the question is parsed into the query
  interest(x, NLP), worksFor(x, y), in(y, London)
and answered against a relational DB holding in(UCL, London), topic(Seb, NLP), and
worksFor(Seb, UCL). A statistical NLP pipeline (syntax, coreference, semantics)
populates the DB from text such as "Sebastian Riedel works in the area of NLP and is
now Lecturer at UCL", using a narrow domain-specific schema.]
[Kwiatkowski et al., 2013; Mintz et al., 2009]
Machine Reading [Riedel et al., 2013]

"Who works in London and is interested in NLP?"

[Diagram: the same pipeline, but with a wide universal schema: surface relations such
as works-in-area-of(Seb, NLP) and lecturer-at(Seb, UCL) go straight into the DB,
alongside in(UCL, London), and are queried with
  interest(x, NLP), worksFor(x, y), in(y, London).]
Semantics as Reasoning [Riedel et al., 2013]

[Diagram: a statistical relational learner and reasoner now sits between the
universal-schema DB and the query, with (weighted) rules such as
  worksFor(x, y) :- faculty-at(x, y)
  faculty-at(x, y) :- lecturer-at(x, y)
  interest(x, y) :- works-in-area-of(x, y) [0.9]
connecting surface relations like lecturer-at(Seb, UCL) and
works-in-area-of(Seb, NLP) to the query predicates.]
Benefit: Transitive Reasoning

[Diagram: the same setup; chaining lecturer-at(Seb, UCL) through
faculty-at(x, y) :- lecturer-at(x, y) and worksFor(x, y) :- faculty-at(x, y)
derives worksFor(Seb, UCL), so Seb answers the query even though worksFor is never
stated in the text.]
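To make the chaining concrete, here is a minimal forward-chaining sketch in Python. It
is illustrative only (not from the talk): the facts and single-premise rules mirror the
diagram, rule weights are ignored, and the `derive` helper is hypothetical.

```python
# Facts extracted from the sentence plus background knowledge (from the diagram).
facts = {("lecturer-at", "Seb", "UCL"),
         ("works-in-area-of", "Seb", "NLP"),
         ("in", "UCL", "London")}

# Single-premise Horn rules from the slides: head :- body (weights ignored here).
rules = [("faculty-at", "lecturer-at"),
         ("worksFor", "faculty-at"),
         ("interest", "works-in-area-of")]

def derive(facts, rules):
    """Naive forward chaining: apply rules until a fixpoint is reached."""
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for rel, x, y in list(closed):
                if rel == body and (head, x, y) not in closed:
                    closed.add((head, x, y))
                    changed = True
    return closed

closed = derive(facts, rules)

# Query: interest(x, NLP), worksFor(x, y), in(y, London).
# worksFor(Seb, UCL) is derived via lecturer-at -> faculty-at -> worksFor.
answers = {x
           for (r1, x, t) in closed if r1 == "interest" and t == "NLP"
           for (r2, x2, y) in closed if r2 == "worksFor" and x2 == x
           if ("in", y, "London") in closed}
print(answers)  # {'Seb'}
```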
Benefit: More Coverage

"Who is faculty in London and interested in NLP?"

[Diagram: the same rules and facts answer this question variant too, since
faculty-at(Seb, UCL) is derivable from lecturer-at(Seb, UCL).]
Benefit: Code Reuse

"Who lives in London and is interested in NLP?"

[Diagram: learned inference rules can be reused across queries, e.g.
  livesIn(x, z) :- worksFor(x, y), locatedIn(y, z) [0.6]
in the spirit of path-based rule learning.]
[Lao et al., 2011]
Reasoner and Learner

What should the statistical relational learner and reasoner be?
Probabilistic Logics

Use (weighted) logics to define graphical models over ground atoms of relations such
as lecturer-at, prof-at, and works-for.

Examples:
- Markov Logic [Richardson and Domingos, 2006]
- Bayesian Logic Programs [Kersting, 2007]
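To unpack "weighted logics define graphical models", here is a toy sketch in the
spirit of Markov Logic (a hand-rolled illustration, not Alchemy or any real MLN
implementation): one weighted clause is grounded over a two-entity domain, and each
grounding contributes a factor exp(weight × [clause satisfied]) to an unnormalized
distribution over possible worlds.

```python
import itertools
import math

entities = ["Seb", "UCL"]
# Ground atoms: two predicates over all ordered entity pairs.
atoms = [(rel, x, y) for rel in ("lecturer-at", "worksFor")
         for x, y in itertools.product(entities, repeat=2)]

W = 1.5  # weight of the clause  lecturer-at(x, y) => worksFor(x, y)

def satisfied(world, x, y):
    # An implication is violated only if the premise holds and the head does not.
    return not (world[("lecturer-at", x, y)] and not world[("worksFor", x, y)])

def score(world):
    """Markov-Logic-style unnormalized score: exp(W * #satisfied groundings)."""
    n_sat = sum(satisfied(world, x, y)
                for x, y in itertools.product(entities, repeat=2))
    return math.exp(W * n_sat)

# Enumerate all 2^8 possible worlds -- feasible only for toy domains, which is
# exactly the inference problem flagged on the next slide.
worlds = [dict(zip(atoms, values))
          for values in itertools.product([False, True], repeat=len(atoms))]
Z = sum(score(w) for w in worlds)

# Marginal probability of worksFor(Seb, UCL) under the soft rule alone.
p = sum(score(w) for w in worlds if w[("worksFor", "Seb", "UCL")]) / Z
print(round(p, 3))
```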
Probabilistic Logics

Use (weighted) logics to define graphical models.

Problems:
- Inference
- Rule learning
Matrix Factorization

Think of the database as a matrix (or tensor): one row per entity pair, one column per
relation (lecturer-at, prof-at, works-for, …), with observed facts as 1-cells.
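Concretely, a tiny sketch of the representation (the pairs, relations, and facts are
made-up examples in the spirit of the running example):

```python
import numpy as np

pairs = [("Seb", "UCL"), ("Seb", "NLP"), ("UCL", "London")]
relations = ["lecturer-at", "prof-at", "works-for", "in"]
facts = [(("Seb", "UCL"), "lecturer-at"), (("UCL", "London"), "in")]

# One row per entity pair, one column per relation; observed facts become 1-cells,
# everything else stays 0 (unobserved, not necessarily false).
F = np.zeros((len(pairs), len(relations)))
for pair, rel in facts:
    F[pairs.index(pair), relations.index(rel)] = 1.0

print(F)
```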
Matrix Factorization

Embed entity (pairs) in low-dimensional vector spaces.

[Diagram: each entity-pair row of the fact matrix is assigned an unknown latent vector.]
Matrix Factorization

Embed relations in low-dimensional vector spaces.

[Diagram: each relation column (lecturer-at, prof-at, works-for) is likewise assigned
an unknown latent vector.]
Matrix Factorization

Find a matrix-matrix product that approximates the observed DB.

[Diagram: fact matrix ≈ entity-pair embedding matrix × relation embedding matrix.]
Matrix Factorization

Or a non-linear function of this product.

[Diagram: fact matrix ≈ sigmoid(entity-pair embeddings × relation embeddings).]
Matrix Factorization

Low rank forces some 0 cells to become non-zero => prediction.

[Diagram: the reconstruction fills unobserved cells with probabilities such as 0.9.]
[Nickel, Bordes, …]
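A minimal logistic matrix factorization sketch (illustrative only: the toy matrix,
rank, and learning rate are made up, and models like the one in Riedel et al. 2013
train with ranking objectives over sampled negatives rather than full-batch descent):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fact matrix: rows = entity pairs, columns = relations
# (say lecturer-at, faculty-at, works-for). The bottom-right cell is an
# unobserved works-for fact we hope the model fills in.
F = np.array([[1., 1., 1.],
              [1., 1., 1.],
              [1., 1., 0.]])

k = 1                                        # embedding dimension ("low rank")
A = rng.normal(0.0, 0.1, (F.shape[0], k))    # entity-pair embeddings
R = rng.normal(0.0, 0.1, (k, F.shape[1]))    # relation embeddings

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for _ in range(2000):
    P = sigmoid(A @ R)                       # predicted fact probabilities
    G = (P - F) / F.size                     # logistic-loss gradient w.r.t. logits
    A, R = A - lr * (G @ R.T), R - lr * (A.T @ G)

# Rank 1 ties the unobserved cell to the surrounding 1s, so its predicted
# probability is typically pushed well above 0 -- the "prediction" effect.
print(np.round(sigmoid(A @ R), 2))
```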
Results for Relation Extraction [Riedel et al. 2013, NAACL]

[Plot: averaged 11-point precision/recall curves (precision vs. recall, both on a 0-1
scale) comparing the systems SU12, N, F, NF, and NFE.]
Challenge 1: Injecting Symbolic Rules

"lecturers are employees!"

[Figure (caption truncated in the source): Injecting Logic into Matrix Factorization.
Given entity pairs P and predicates/relations R, matrix factorization learns
embeddings that approximate the observed matrix; first-order formulae over entities
and relations additionally constrain the learned embeddings. Example formulae:
  ∀x, y: #2-unit-of-#1(x, y) ⇒ organi… (e.g. "Boeing and the Sikorsky Aircraf…")
  ∀x, y: #1-city-in-#2(x, y) ⇒ locati… (e.g. "With 900,000 people, San Jose…")]
Challenge 1: Injecting Symbolic Rules

"a liquid turns into a solid when its temperature is lowered below its freezing point"

[Figure: the same matrix factorization diagram; the challenge is injecting rules
stated in natural language, like the one above.]
Some Experiments

"Zero-shot" learning
- Given: a lot of relational data, but none for worksFor
- Goal: given a few worksFor rules, learn to predict worksFor

Results (MAP, averaged over several relations):
- Only rules: 0.23
- Apply rules after factorization: 0.34
- Apply rules before factorization: 0.43
- Incorporate rules into the training objective: 0.52 (sketched below)

[Rocktaeschel et al. 2014, SP14]
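For a feel of the winning variant, here is a highly simplified sketch of putting one
implication into the training objective. This is not the loss from Rocktaeschel et
al. 2014 (they use differentiable formula probabilities); it is a hinge penalty that
pushes every pair's worksFor score above its faculty-at score, one crude way to encode
faculty-at(x, y) ⇒ worksFor(x, y). All names and hyperparameters are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pairs, k = 50, 10
A = rng.normal(0.0, 0.1, (n_pairs, k))                 # entity-pair embeddings
rel = {r: rng.normal(0.0, 0.1, k) for r in ("faculty-at", "worksFor")}

def rule_term(A, premise, conclusion):
    """Hinge loss for premise(x, y) => conclusion(x, y): penalize groundings
    whose premise score exceeds their conclusion score."""
    margin = A @ rel[premise] - A @ rel[conclusion]    # violation per pair
    active = margin > 0
    grad = A[active].sum(axis=0) / len(A)              # d(loss)/d(rel[premise])
    return margin[active].sum() / len(A), grad

lr = 0.1
for _ in range(200):
    loss, grad = rule_term(A, "faculty-at", "worksFor")
    # In the full model this term is added to the factorization loss over the
    # observed facts; here only the rule term's gradient step is shown.
    rel["faculty-at"] -= lr * grad
    rel["worksFor"] += lr * grad
```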
Challenge 2: Extracting Explanations

[Figure: the same matrix factorization diagram as above, still showing "lecturers are
employees!"; the challenge now runs in the opposite direction: extracting such
formulae from the learned embeddings.]
Challenge 2: Extracting Explanations

"I returned Sebastian because we know he is a lecturer at UCL, which is in London, so
he most likely lives in London …"

[Figure: the same diagram; the goal is answer justifications like the quote above.]
[Thrun 1995, NIPS; Craven 1996, NIPS]
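Extraction is posed here as an open challenge, so the following is only a speculative
sketch of one possible direction (not an established method from the talk): mine
candidate implications from a trained model by checking, on the predicted fact matrix,
how often pairs predicted for one relation are also predicted for another, in the
style of classic rule-mining confidence. The `mine_implications` helper is
hypothetical.

```python
import numpy as np

def mine_implications(A, R, names, threshold=0.9):
    """Return candidate rules premise => conclusion whose rule-mining-style
    confidence on the *predicted* facts exceeds the threshold.
    A: entity-pair embeddings, R: relation embeddings (k x #relations)."""
    P = 1.0 / (1.0 + np.exp(-(A @ R)))      # predicted fact probabilities
    hard = P > 0.5                          # hard predictions
    rules = []
    for i, premise in enumerate(names):
        for j, conclusion in enumerate(names):
            if i != j and hard[:, i].any():
                conf = (hard[:, i] & hard[:, j]).sum() / hard[:, i].sum()
                if conf >= threshold:
                    rules.append((premise, conclusion, conf))
    return rules  # list of (premise, conclusion, confidence) triples
```

Rules mined this way could then back a justification like the quoted one, e.g.
chaining a mined lecturer-at ⇒ worksFor rule with location facts.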
Summary

- Do semantics in a probabilistic relational reasoner
- Reasoner: matrix/tensor factorization (or other latent-variable models)
- Challenges: inject explanations, extract explanations

Do this for deeper downstream tasks such as question answering, fact checking, and
machine comprehension.

We are hiring (thanks to the Paul G. Allen Foundation).
Thanks