Embedding Probabilistic Logic for Machine Reading Sebastian Riedel (University College London)


Overview
Machine Reading & Reasoning …
… with Probabilistic Logics and Embeddings
Challenges
Injecting Explanations
Extracting Explanations


Machine Reading “Who works in London and is interested in NLP?”


interest(x,NLP), worksFor(x,y),

Relational DB

topic(Seb,NLP) worksFor(Seb,UCL)

[Kwiatkowski et al., 2013]

Narrow domain-specific schema

[Mintz et al., 2009]

Semantics Statistical NLP Syntax


“Sebastian Riedel works in the area of NLP and is now Lecturer at UCL”

Machine Reading [Riedel et al., 2013] in(UCL,London)

“Who works in London and is interested in NLP?”

works-in-area-of(Seb,NLP) lecturer-at(Seb,UCL)

Relational DB

interest(x,NLP), worksFor(x,y), in(y,London)

Semantics Wide universal schema Syntax


Statistical NLP

“Sebastian Riedel works in the area of NLP and is now Lecturer at UCL”

Semantics as Reasoning [Riedel et al., 2013] in(UCL,London)

“Who works in London and is interested in NLP?”
interest(x,NLP), worksFor(x,y), in(y,London)

works-in-area-of(Seb,NLP) lecturer-at(Seb,UCL)
worksFor(x,y): faculty-at(x,y)
interest(x,y): works-in-area-of(x,y) [0.9]

Statistical Relational Learner and Reasoner

faculty-at(x,y): lecturer-at(x,y)

Wide universal schema Syntax


Statistical NLP

“Sebastian Riedel works in the area of NLP and is now Lecturer at UCL”
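The rules on this slide can be read as Horn clauses that derive schema relations from surface-form relations. The following is a minimal Python sketch of that reading, not the reasoner from the talk. It assumes deterministic rules (the 0.9 weight is ignored), and the helper names `forward_chain` and `answer` are hypothetical.

```python
# Facts from the slide: surface-form relations plus in(UCL,London).
facts = {
    ("works-in-area-of", "Seb", "NLP"),
    ("lecturer-at", "Seb", "UCL"),
    ("in", "UCL", "London"),
}

# Rules as (head, body) pairs over the same argument pair, e.g.
# worksFor(x,y): faculty-at(x,y). Weights are ignored in this sketch.
rules = [
    ("faculty-at", "lecturer-at"),
    ("worksFor", "faculty-at"),
    ("interest", "works-in-area-of"),
]


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for rel, x, y in list(derived):
                if rel == body and (head, x, y) not in derived:
                    derived.add((head, x, y))
                    changed = True
    return derived


def answer(db):
    """Evaluate the query interest(x,NLP), worksFor(x,y), in(y,London)."""
    return sorted({
        x
        for (r1, x, topic) in db if r1 == "interest" and topic == "NLP"
        for (r2, x2, y) in db if r2 == "worksFor" and x2 == x
        if ("in", y, "London") in db
    })


db = forward_chain(facts, rules)
print(answer(db))
```

Without the rules the query has no answer, because the text only mentions works-in-area-of and lecturer-at. The chain lecturer-at, faculty-at, worksFor is what links the sentence to the query.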

Benefit: Transitive Reasoning in(UCL,London)

“Who works in London and is interested in NLP?”
interest(x,NLP), worksFor(x,y), in(y,London)

works-in-area-of(Seb,NLP) lecturer-at(Seb,UCL)
worksFor(x,y): faculty-at(x,y)
interest(x,y): works-in-area-of(x,y) [0.9]

Statistical Relational Learner and Reasoner

faculty-at(x,y): lecturer-at(x,y)

Wide universal schema Syntax


Statistical NLP

“Sebastian Riedel works in the area of NLP and is now Lecturer at UCL”

Benefit: More Coverage in(UCL,London)

“Who is faculty in London and interested in NLP?”
interest(x,NLP), worksFor(x,y), in(y,London)

works-in-area-of(Seb,NLP) lecturer-at(Seb,UCL)
worksFor(x,y): faculty-at(x,y)
interest(x,y): works-in-area-of(x,y) [0.9]

Statistical Relational Learner and Reasoner

faculty-at(x,y): lecturer-at(x,y)

Wide universal schema Syntax


Statistical NLP

“Sebastian Riedel works in the area of NLP and is now Lecturer at UCL”

Benefit: Code Reuse in(UCL,London)

“Who lives in London and is interested in NLP?”
interest(x,NLP), worksFor(x,y), in(y,London)

works-in-area-of(Seb,NLP) lecturer-at(Seb,UCL)
worksFor(x,y): faculty-at(x,y)
interest(x,y): works-in-area-of(x,y) [0.9]
livesIn(x,z): worksFor(x,y), locatedIn(y,z) [0.6]

Statistical Relational Learner and Reasoner

[Lao et al., 2011]

Wide universal schema Syntax


Statistical NLP

“Sebastian Riedel works in the area of NLP and is now Lecturer at UCL”

Reasoner and Learner Statistical Relational Learner and Reasoner

?

Probabilistic Logics Use (weighted) logics to define graphical models

Examples
Markov Logic [Richardson and Domingos, 2006]
Bayesian Logic Programs [Kersting, 2007]
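As a reminder of how weighted formulae define a graphical model, Markov Logic assigns each possible world x a log-linear probability. The symbols below follow the usual presentation of Markov Logic and do not appear on the slide: w_i is the weight of formula i, n_i(x) is the number of its true groundings in x, and Z is the normalizing constant.

```latex
P(X = x) \;=\; \frac{1}{Z} \exp\Big( \sum_{i} w_i \, n_i(x) \Big),
\qquad
Z \;=\; \sum_{x'} \exp\Big( \sum_{i} w_i \, n_i(x') \Big)
```

A weighted rule such as interest(x,y): works-in-area-of(x,y) [0.9] would contribute one such term, so worlds that violate more groundings of the rule become less probable rather than impossible.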


Probabilistic Logics Use (weighted) logics to define graphical models

Problems
Inference
Rule Learning


Matrix Factorization Think of database as a matrix or tensor

[Figure: database matrix with a lecturer-at column; observed cells are marked 1]

Matrix Factorization Embed entity (pairs) in low dimensional vector spaces

Matrix Factorization Embed relations in low dimensional vector spaces

Matrix Factorization Find a matrix-matrix product that approximates observed DB

Matrix Factorization Or a non-linear function of this product

Matrix Factorization Low rank forces some 0 cells to become non-zero => prediction

[Figure: reconstructed matrix in which observed cells stay at 1 and some 0 cells receive scores such as .9]

[Nickel, Bordes, …]
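The idea of approximating the database by a (non-linear function of a) low-rank product can be illustrated with a small NumPy sketch. This is a toy under stated assumptions, not the model from the cited papers: it uses a logistic link, treats every unobserved cell as 0, trains by plain gradient descent, and the matrix, the rank and the hyperparameters are invented for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def factorize(Y, rank=1, steps=5000, lr=0.1, l2=0.05, seed=0):
    """Fit sigmoid(P @ R.T) to Y with L2-regularized logistic loss."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((Y.shape[0], rank))  # entity-pair embeddings
    R = 0.1 * rng.standard_normal((Y.shape[1], rank))  # relation embeddings
    for _ in range(steps):
        err = sigmoid(P @ R.T) - Y   # derivative of the loss w.r.t. the scores
        grad_P = err @ R + l2 * P
        grad_R = err.T @ P + l2 * R
        P -= lr * grad_P
        R -= lr * grad_R
    return P, R


# Rows: entity pairs. Columns: lecturer-at, faculty-at, worksFor.
Y = np.array([
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 0],   # worksFor is not observed for this pair
    [0, 0, 0],
    [0, 0, 0],
], dtype=float)

P, R = factorize(Y)
pred = sigmoid(P @ R.T)
print(np.round(pred, 2))
```

With rank 1 the model cannot reproduce the isolated 0 in the third row without also lowering the two observed 1s next to it, so that cell is scored higher than the cells of the all-zero rows. This is the sense in which the low-rank constraint turns some 0 cells into predictions.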

Results for Relation Extraction [Riedel et al. 2013, NAACL]

[Figure: averaged 11-point precision/recall plot comparing SU12, N, F, NF and NFE; axis from 0 to 1]
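For readers unfamiliar with the metric, here is a small Python sketch of 11-point interpolated average precision for a single ranked list. It assumes the standard information-retrieval definition (interpolated precision at recall levels 0.0, 0.1, …, 1.0). How the results on the slide were averaged across relations is not stated, and the function name is hypothetical.

```python
def eleven_point_average_precision(ranked_labels):
    """ranked_labels: 0/1 relevance labels, highest-scored prediction first."""
    total_pos = sum(ranked_labels)
    if total_pos == 0:
        return 0.0

    # (recall, precision) after each rank position.
    points = []
    hits = 0
    for k, label in enumerate(ranked_labels, start=1):
        hits += label
        points.append((hits / total_pos, hits / k))

    # Interpolated precision at level r: best precision at any recall >= r.
    levels = [i / 10 for i in range(11)]
    interpolated = [
        max((p for rec, p in points if rec >= r), default=0.0)
        for r in levels
    ]
    return sum(interpolated) / len(levels)


print(eleven_point_average_precision([1, 1, 0, 0]))
```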










Challenge 1: Injecting Symbolic Rules

“lecturers are employees!”

First-order Formulae
∀x, y : #2-unit-of-#1(x, y) ⇒ organi…
Example: “Boeing and the Sikorsky Aircraf…
∀x, y : #1-city-in-#2(x, y) ⇒ locati…
Example: “With 900,000 people, San Jose#1…

Figure 1: Injecting Logic into Matrix Factorization: G… entity-pairs P and predicates/relations R, matrix factori… embeddings that approximate the observed matrix. In… entities and relations to learn the embeddings such that th…

Challenge 1: Injecting Symbolic Rules

“a liquid turns into a solid when its temperature is lowered below its freezing point”




Some Experiments
“Zero-shot” learning
Given: a lot of relational data, but none for worksFor
Goal: given a few rules for worksFor, learn to predict worksFor

Results (in MAP for several relations)
Only rules: 0.23
Apply rules after factorization: 0.34
Apply rules before factorization: 0.43
Incorporate rules into training objective: 0.52

[Rocktaeschel et al. 2014, SP14]
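The slide does not spell out what "incorporate rules into training objective" looks like. One possible formulation, shown here purely as an assumption, scores each fact with a sigmoid of an embedding dot product, relaxes an implication body ⇒ head with the product t-norm, and adds the negative log of that score as a loss term. The function names and the toy vectors below are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def implication_score(p_body, p_head):
    """Product t-norm relaxation of body => head: 1 - p_body * (1 - p_head)."""
    return p_body * (p_head - 1.0) + 1.0


def rule_loss(pair_vecs, body_vec, head_vec):
    """Negative log-score of the rule, summed over all entity pairs."""
    p_body = sigmoid(pair_vecs @ body_vec)
    p_head = sigmoid(pair_vecs @ head_vec)
    return -np.sum(np.log(implication_score(p_body, p_head)))


# Toy embeddings (assumed values): one entity pair and two relations.
pairs = np.array([[2.0, 0.0]])
lecturer_at = np.array([2.0, 0.0])      # body is very likely true for the pair
works_for_bad = np.array([-2.0, 0.0])   # head embedding that violates the rule
works_for_good = np.array([2.0, 0.0])   # head embedding that satisfies the rule

print(rule_loss(pairs, lecturer_at, works_for_bad))
print(rule_loss(pairs, lecturer_at, works_for_good))
```

In such a setup the rule term would be added to the usual factorization loss, so gradient steps move the worksFor embedding toward satisfying "lecturers are employees" even when no worksFor facts are observed.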









Challenge 2: Extracting Explanations

“I returned Sebastian because we know he is a lecturer at UCL, which is in London, so he most likely lives in London …”

[Thrun 1995, NIPS, Craven 1996, NIPS]

First-order Formulae
∀x, y : #2-unit-of-#1(x, y) ⇒ organi…
Example: “Boeing and the Sikorsky Aircraf…
∀x, y : #1-city-in-#2(x, y) ⇒ locati…
Example: “With 900,000 people, San Jose#1…

Figure 1: Injecting Logic into Matrix Factorization: G… entity-pairs P and predicates/relations R, matrix factori… embeddings that approximate the observed matrix. In… entities and relations to learn the embeddings such that th…

Summary
Do semantics in a probabilistic relational reasoner
Reasoner: matrix/tensor factorization (or other latent-variable models)
Challenges: inject explanations, extract explanations

Do this for: deeper downstream tasks such as question answering, fact checking, machine comprehension
We are hiring (thanks to the Paul G. Allen Foundation)



NIPS Learning Semantics 2014
