MARKOV LOGIC
NASSLLI 2010
Mathias Niepert

MARKOV LOGIC: INTUITION
- A logical KB is a set of hard constraints on the set of possible worlds
- Let's make some of them soft constraints: when a world violates a formula, it becomes less probable, not impossible
- Give each formula a weight (higher weight = stronger constraint)

P(world) ∝ exp( Σ weights of formulas it satisfies )

MARKOV LOGIC: DEFINITION 

A Markov Logic Network (MLN) is a set of pairs (Fi, wi) where
- Fi is a formula in first-order logic
- wi is a real-valued weight

Together with a finite set of constants C, it defines a Markov network with:
- One binary node for each grounding of each predicate in the MLN. The value of the node is 1 if the ground atom is true, and 0 otherwise.
- One feature for each grounding of each formula Fi in the MLN, with the corresponding weight wi
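The grounding step above can be sketched in a few lines of Python. This is a minimal illustration, not Alchemy's implementation; the predicate names and string encoding of atoms are my own choices:

```python
from itertools import product

# Predicates with their arities (the Friends & Smokers example used later)
predicates = {"Smokes": 1, "Cancer": 1, "Friends": 2}
constants = ["A", "B"]

def ground_atoms(predicates, constants):
    """One binary node per grounding of each predicate."""
    atoms = []
    for name, arity in predicates.items():
        for args in product(constants, repeat=arity):
            atoms.append(f"{name}({','.join(args)})")
    return atoms

atoms = ground_atoms(predicates, constants)
print(atoms)   # 2 + 2 + 4 = 8 ground atoms for two constants
```

With |C| constants, a predicate of arity k contributes |C|^k ground atoms, which is why typed variables (mentioned later) matter for keeping the ground network small.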

LOG-LINEAR MODELS 

A distribution is a log-linear model over a Markov network H if it is associated with:
- A set of features F = {f1(D1), …, fk(Dk)}, where each Di is a complete subgraph (clique) of H
- A set of weights w1, …, wk such that

P(X1, …, Xn) = (1/Z) exp( Σ_{i=1}^{k} wi fi(Di) )
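The definition above can be made concrete with a toy model. The following sketch (my own example: two binary variables and a single agreement feature, not from the slides) computes the partition function Z by full enumeration:

```python
import math
from itertools import product

# A minimal log-linear model: two binary variables X1, X2 and
# one pairwise feature f1(X1, X2) = 1 iff X1 == X2, with weight w1.
w1 = 1.5
f1 = lambda x1, x2: 1.0 if x1 == x2 else 0.0

# Unnormalized measure and partition function Z (by full enumeration).
score = {x: math.exp(w1 * f1(*x)) for x in product([0, 1], repeat=2)}
Z = sum(score.values())
P = {x: s / Z for x, s in score.items()}

print(P[(1, 1)])   # agreeing states are exp(1.5) times more likely than disagreeing ones
```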

ASSUMPTIONS
1. Unique names. Different constants refer to different objects. (Genesereth & Nilsson, 1987)
2. Domain closure. The only objects in the domain are those representable using the constant and function symbols. (Genesereth & Nilsson, 1987)
3. Known functions. For each function, the value of that function applied to every possible tuple of arguments is known, and is an element of C.

EXAMPLE: FRIENDS & SMOKERS
Smoking causes cancer. Friends have similar smoking habits.

1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

Two constants: Anna (A) and Bob (B)

EXAMPLE: FRIENDS & SMOKERS
1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

Two constants: Anna (A) and Bob (B)

[Figure: the ground Markov network over the atoms Smokes(A), Smokes(B), Cancer(A), Cancer(B), Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B); edges connect atoms that appear together in some ground formula]

MARKOV LOGIC NETWORKS
- An MLN is a template for ground Markov networks
- Probability of a world x:

P(x) = (1/Z) exp( Σ_i wi ni(x) )

where wi is the weight of formula i and ni(x) is the number of true groundings of formula i in x

- Typed variables and constants greatly reduce the size of the ground Markov network
- Functions, existential quantifiers, etc. are supported
- Infinite and continuous domains are possible

EXAMPLE: FRIENDS & SMOKERS
1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

Two constants: Anna (A) and Bob (B)

1.5 : Smokes(A) ⇒ Cancer(A)
1.5 : Smokes(B) ⇒ Cancer(B)
1.1 : Friends(A,A) ⇒ (Smokes(A) ⇔ Smokes(A))
1.1 : Friends(A,B) ⇒ (Smokes(A) ⇔ Smokes(B))
1.1 : Friends(B,A) ⇒ (Smokes(B) ⇔ Smokes(A))
1.1 : Friends(B,B) ⇒ (Smokes(B) ⇔ Smokes(B))

P(S(A)=T, S(B)=F, F(A,A)=T, F(A,B)=T, F(B,A)=F, F(B,B)=T, C(A)=F, C(B)=T)
  = (1/Z) exp( Σ_i wi ni(x) ) = (1/Z) exp(1.5·1 + 1.1·3)
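The worked example can be checked mechanically. The sketch below (the atom encoding is my own) counts the true groundings of each formula for the world above and computes Z by enumerating all 2^8 worlds:

```python
import math
from itertools import product

# The Friends & Smokers MLN over constants A, B; a world assigns
# True/False to the 8 ground atoms.
ATOMS = ["S_A", "S_B", "F_AA", "F_AB", "F_BA", "F_BB", "C_A", "C_B"]

def n_true_groundings(w):
    """Return (n1, n2): true groundings of each formula in world w (a dict)."""
    implies = lambda p, q: (not p) or q
    iff = lambda p, q: p == q
    n1 = sum(implies(w[f"S_{x}"], w[f"C_{x}"]) for x in "AB")
    n2 = sum(implies(w[f"F_{x}{y}"], iff(w[f"S_{x}"], w[f"S_{y}"]))
             for x in "AB" for y in "AB")
    return n1, n2

def weight(w):
    n1, n2 = n_true_groundings(w)
    return math.exp(1.5 * n1 + 1.1 * n2)

# The world from the slide: S(A)=T, S(B)=F, F(A,A)=T, F(A,B)=T,
# F(B,A)=F, F(B,B)=T, C(A)=F, C(B)=T.
world = dict(zip(ATOMS, [True, False, True, True, False, True, False, True]))
print(n_true_groundings(world))   # (1, 3), matching exp(1.5*1 + 1.1*3)

# Z sums the unnormalized weight over all 2^8 worlds.
Z = sum(weight(dict(zip(ATOMS, vals)))
        for vals in product([False, True], repeat=8))
print(weight(world) / Z)          # P(world)
```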

RELATION TO FIRST-ORDER LOGIC
- Infinite weights ⇒ first-order logic
- Satisfiable KB, positive weights ⇒ satisfying assignments = modes of the distribution
- Markov logic allows inconsistencies (contradictions between formulas)

MAP INFERENCE IN MARKOV LOGIC NETWORKS 

Problem: Find the most likely state of the world y (the query) given evidence e

argmax_y P(y | e)

MAP INFERENCE 

Problem: Find the most likely state of the world y given evidence e

argmax_y (1/Z_e) exp( Σ_i wi ni(y, e) )

- ni is the feature corresponding to formula Fi
- wi is the weight corresponding to formula Fi

MAP INFERENCE 

Problem: Find the most likely state of the world y given evidence e

argmax_y Σ_i wi ni(y, e)

(the normalizer Z_e is constant in y and exp is monotone, so both can be dropped from the objective)

- ni is the feature corresponding to formula Fi
- wi is the weight corresponding to formula Fi

MAP INFERENCE
1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

Two constants: Anna (A) and Bob (B)
Evidence: Friends(A,B), Friends(B,A), Smokes(B) are true

[Figure: ground network with Friends(A,B), Friends(B,A), Smokes(B) fixed to true; Friends(A,A), Friends(B,B), Smokes(A), Cancer(A), Cancer(B) still unknown (?)]

MAP INFERENCE
1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

Two constants: Anna (A) and Bob (B)
Evidence: Friends(A,B), Friends(B,A), Smokes(B) are true

[Figure: a MAP state — Smokes(A), Cancer(A), Cancer(B) true; Friends(A,A), Friends(B,B) false]

MAP INFERENCE 

Problem: Find the most likely state of the world given evidence e

argmax_y Σ_i wi ni(y, e)

- This is the weighted MAX-SAT problem
- Use a weighted MAX-SAT solver (e.g. MaxWalkSAT)
- Better: integer linear programming

MAP INFERENCE
1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

Two constants: Anna (A) and Bob (B)
Evidence: Friends(A,B), Friends(B,A), Smokes(B)

¬Smokes(A) ∨ Cancer(A)                      1.5
¬Smokes(B) ∨ Cancer(B)                      1.5
¬Friends(A,B) ∨ ¬Smokes(A) ∨ Smokes(B)      0.55
¬Friends(A,B) ∨ ¬Smokes(B) ∨ Smokes(A)      0.55
…

(each biconditional splits into two clauses, so its weight 1.1 is divided evenly: 0.55 each)

RELATION TO STATISTICAL MODELS
Special cases:
- Markov networks
- Bayesian networks
- Log-linear models
- Exponential models
- Max. entropy models
- Hidden Markov models
- …

Obtained by making all predicates zero-arity.

Every probability distribution over discrete or finite-precision numeric variables can be represented as a Markov logic network.

MARKOV LOGIC 

Declarative language with several challenges:
- Inference
- Weight learning
- Structure learning

Many ways to perform probabilistic inference:
- Conditional probability queries
- MAP queries

There is a large body of work on probabilistic inference in graphical models. We'll talk about some of these methods and how they can be put to work in Markov logic networks.

INFERENCE IN GRAPHICAL MODELS 

Conditional probability queries P(Y | E=e), where Y ⊆ X and E ⊆ X are disjoint:

P(Y | E=e) = P(Y, e) / P(e)

Let W = X − Y − E be the variables that are neither query nor evidence; then

P(y, e) = Σ_w P(y, e, w)

P(e) can be computed by reusing the previous computation:

P(e) = Σ_y P(y, e)
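The two summations above can be illustrated on a tiny joint distribution. The distribution below is an arbitrary made-up example, just to show the mechanics of summing out W and then y:

```python
from itertools import product

# A tiny joint P(X1, X2, X3); query P(X1 | X3 = 1), so W = {X2}.
joint = {x: 1.0 for x in product([0, 1], repeat=3)}
joint[(1, 0, 1)] = 3.0                     # make one world more likely
total = sum(joint.values())
joint = {x: p / total for x, p in joint.items()}

# P(y, e) = sum over the remaining variables W (here X2)
def p_y_e(y, e):
    return sum(joint[(y, w, e)] for w in [0, 1])

# P(e) = sum over y of P(y, e)
p_e = sum(p_y_e(y, 1) for y in [0, 1])
print(p_y_e(1, 1) / p_e)                   # P(X1=1 | X3=1)
```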

COMPLEXITY OF INFERENCE
- The process of "summing out" the joint is not satisfactory, as it leads to the exponential blowup that the graphical model representation was supposed to avoid
- Problem: exponential blow-up in the worst case is unavoidable
- Worse: approximate inference is also NP-hard
- But: we really care about the cases we encounter in practice, not the worst case

COMPLEXITY OF INFERENCE
- Theoretical analysis can focus on Bayesian networks, as they can be converted to Markov networks with no increase in representation size
- First question: how do we encode a BN? DAG structure + worst-case representation of each CPD as a table of size |Val({Xi} ∪ Pa_Xi)|

Decision problem BN-Pr-DP:
Given a BN B over X, a variable X ∈ X, and a value x ∈ Val(X), decide whether P_B(X=x) > 0.

BN-Pr-DP is NP-complete.

COMPLEXITY OF INFERENCE 

BN-Pr-DP is in NP:
- We guess an assignment ξ to the network variables
- We check whether X=x holds in ξ and whether P(ξ) > 0
- The latter can be accomplished in linear time using the chain rule of BNs

BN-Pr-DP is NP-hard:
- Reduction from 3-SAT
- Given any 3-SAT formula φ we can create a Bayesian network B with a distinguished binary variable X such that φ is satisfiable if and only if P_B(X=x¹) > 0
- The BN has to be constructible in polynomial time

COMPLEXITY OF INFERENCE

- For each propositional variable q_i, one root node Q_i with P(q_i¹) = 0.5
- For each clause c_j, one node C_j with an edge from Q_i to C_j if q_i or ¬q_i occurs in the clause c_j

COMPLEXITY OF INFERENCE

Example CPD for the clause c1 = q1 ∨ ¬q3:

                c1=0   c1=1
q1=0, q3=0       0      1
q1=1, q3=0       0      1
q1=0, q3=1       1      0
q1=1, q3=1       0      1

COMPLEXITY OF INFERENCE

- We cannot connect the C_j's (j=1,…,m) directly to the variable X, as the CPD for X would be exponentially large
- Instead we introduce m−2 AND nodes that aggregate the clause nodes pairwise

COMPLEXITY OF INFERENCE

- Now, X has value 1 if and only if all of the clauses are satisfied
- All nodes have at most three parents and, therefore, the size of the BN is polynomial in the size of φ

COMPLEXITY OF INFERENCE

- The prior probability of each possible assignment is 1/2ⁿ
- P(X=x¹) = the number of satisfying assignments of φ divided by 2ⁿ
- φ has a satisfying assignment iff P(x¹) > 0
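Because the roots are uniform and every other node in the constructed BN is deterministic, P(X=1) reduces to a simple count. The sketch below checks this identity for a small formula of my own choosing (it does not build the network explicitly, only its deterministic semantics):

```python
from itertools import product

# φ = (q1 ∨ ¬q2 ∨ q3) ∧ (¬q1 ∨ q2 ∨ q3) ∧ (¬q1 ∨ ¬q2 ∨ ¬q3)
# A literal is (variable, sign): (2, False) means ¬q2.
clauses = [[(1, True), (2, False), (3, True)],
           [(1, False), (2, True), (3, True)],
           [(1, False), (2, False), (3, False)]]
n = 3

def satisfies(assign, clause):
    return any(assign[var] == sign for var, sign in clause)

# In the reduction, X=1 iff all clause nodes fire (the AND chain).
def x_value(assign):
    return all(satisfies(assign, c) for c in clauses)

worlds = [dict(zip(range(1, n + 1), vals))
          for vals in product([False, True], repeat=n)]
p_x1 = sum(x_value(w) for w in worlds) / 2 ** n
print(p_x1)   # 0.625 here: 5 of the 8 assignments satisfy φ, so φ is satisfiable
```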

COMPLEXITY OF INFERENCE
- Probabilistic inference is a numerical problem, not a decision problem
- We can use a similar construction to show that the following problem is #P-complete:

Given a BN B over X, a variable X ∈ X, and a value x ∈ Val(X), compute P_B(X=x).

- We have to do a weighted count of instantiations
- #P is the set of counting problems associated with the decision problems in NP
- A #P problem must be at least as hard as the corresponding NP problem

COMPLEXITY OF APPROXIMATE INFERENCE
- Goal is to compute P(Y|e)
- An estimate ρ has relative error ε for P(y|e) if:

ρ / (1+ε) ≤ P(y|e) ≤ ρ (1+ε)

- We can use a similar construction again to show that the following problem is NP-hard:

Given a BN B over X, a variable X ∈ X, and a value x ∈ Val(X), find a number ρ that has relative error ε for P_B(X=x).

COMPLEXITY OF APPROXIMATE INFERENCE
- Goal is to compute P(Y|e)
- An estimate ρ has absolute error ε for P(y|e) if:

|P(y|e) − ρ| ≤ ε

- Computing P(X=x) up to some absolute error ε has a randomized polynomial-time algorithm
- However, when evidence is introduced, we're back to NP-hardness
- The following problem is NP-hard for any ε ∈ (0, 1/2):

Given a BN B over X, a variable X ∈ X, a value x ∈ Val(X), and an observation E=e, find a number ρ that has absolute error ε for P_B(X=x | e).

MONTE CARLO PRINCIPLE
- Consider the game of solitaire: what's the probability of winning a game?
- Hard to compute analytically, because winning or losing depends on a complex procedure of reorganizing cards
- Let's play a few hands and see empirically how many we in fact win
- Idea: approximate a probability distribution using only samples from that distribution

Lose, Lose, Lose, Win, Lose — chance of winning is 1 in 5!
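The same principle in code: estimate a probability as the fraction of random samples in which the event occurs. The event below (two dice summing to at least 10) is my own toy stand-in for the solitaire game, chosen because the exact answer, 6/36, is easy to check:

```python
import random

# Monte Carlo estimate of P(two dice sum >= 10); exact value is 6/36.
random.seed(0)
M = 100_000
hits = sum(random.randint(1, 6) + random.randint(1, 6) >= 10 for _ in range(M))
print(hits / M)   # close to 1/6 ≈ 0.1667
```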

SAMPLING FROM A BAYESIAN NETWORK
- Generate samples (particles) from a Bayesian network using a random number generator

[Figure: the student network — Difficulty (d⁰ easy, d¹ difficult), Intelligence (i⁰ normal, i¹ high), Grade (g¹ A, g² B, g³ C), SAT (s⁰ low, s¹ high), Letter (l⁰ weak, l¹ strong) — with one variable sampled per step in topological order]


SAMPLING
- One sample can be computed in linear time
- The sampling process generates a set of particles D = {x[1],…,x[M]}
- When computing P(y), the estimate is simply the fraction of particles in which y "was seen":

P̂_D(y) = (1/M) Σ_{m=1}^{M} 1{y[m] = y}

with 1 the indicator function and y[m] the assignment to Y in particle x[m]

EXAMPLE: BAYESIAN NETWORK INFERENCE
- Suppose we have a Bayesian network with variables X
- Our state space is the set of all possible assignments of values to variables
- We can draw a sample in time that is linear in the size of the network
- Draw N samples and use them to approximate the joint

1st sample: D=d⁰, I=i¹, G=g², S=s⁰, L=l¹
2nd sample: D=d¹, I=i¹, G=g¹, S=s¹, L=l¹
…

REJECTION SAMPLING 







Suppose we have a Bayesian network with variables X

We wish to condition on some evidence E=e and compute the posterior over Y=X-E

Draw samples and reject them when not compatible with evidence e Inefficient if the evidence is itself improbable  we must reject a large number of samples

1st sample: D=d0,I=i1,G=g2,S=s0, L=l1 2nd sample: D=d1,I=i1,G=g1,S=s1, L=l1 …

reject accept
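Rejection sampling is a one-line change to the forward sampler: discard particles that contradict the evidence. Reusing the made-up two-node network A → B from before, we can estimate P(A=1 | B=1), whose exact value is 0.24/0.38 ≈ 0.632:

```python
import random

# Two-node BN A -> B (made-up CPDs): P(A=1)=0.3, P(B=1|A=1)=0.8, P(B=1|A=0)=0.2.
# Query: P(A=1 | B=1) by rejection sampling.
random.seed(0)

def sample_particle():
    a = random.random() < 0.3
    b = random.random() < (0.8 if a else 0.2)
    return a, b

accepted = []
for _ in range(100_000):
    a, b = sample_particle()
    if b:                      # keep only particles compatible with evidence B=1
        accepted.append(a)

post = sum(accepted) / len(accepted)
print(post)   # close to the exact posterior 0.24/0.38 ≈ 0.632
```

Note that roughly 62% of the particles are rejected here; with rarer evidence the waste grows accordingly, which is exactly the inefficiency the slide warns about.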

SAMPLING IN MARKOV LOGIC NETWORKS
- Sampling is performed on the ground Markov logic network
- Alchemy uses a variant of MCMC (Markov chain Monte Carlo)
- Can answer arbitrary queries of the form P(Fi | M_{L,C}), the ground network defined by MLN L and constants C
- Example: P(Cancer(Alice) | M_{L,C})

MAP INFERENCE IN GRAPHICAL MODELS 

The following problem is NP-complete:

Given a BN B over X and a number t, decide whether there exists an assignment x to X such that P(x) > t.

There exist several algorithms for MAP inference with reasonable performance on most practical problems.

MAP INFERENCE IN MARKOV LOGIC NETWORKS 

Problem: Find the most likely state of the world y given evidence e

argmax_y Σ_i wi ni(y, e)

- ni is the feature corresponding to formula Fi
- wi is the weight corresponding to formula Fi

MAP INFERENCE IN MARKOV LOGIC NETWORKS 

Problem: Find the most likely state of the world given evidence e

argmax_y Σ_i wi ni(y, e)

- This is the weighted MAX-SAT problem
- Use a weighted MAX-SAT solver (e.g. MaxWalkSAT)
- Better: integer linear programming

THE MAXWALKSAT ALGORITHM

for i ← 1 to max-tries do
    solution ← random truth assignment
    for j ← 1 to max-flips do
        if Σ weights(sat. clauses) > threshold then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip the variable in c that maximizes Σ weights(sat. clauses)
return failure, best solution found
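The pseudocode above can be turned into a compact, runnable sketch. The clause representation and the two-variable example are my own, not Alchemy's:

```python
import random

# A clause is (weight, [literals]); a literal (v, True) means v, (v, False) means ¬v.
def sat_weight(clauses, assign):
    return sum(w for w, lits in clauses
               if any(assign[v] == sign for v, sign in lits))

def flip_gain(clauses, assign, v):
    assign[v] = not assign[v]
    w = sat_weight(clauses, assign)
    assign[v] = not assign[v]
    return w

def maxwalksat(clauses, n_vars, max_tries=10, max_flips=1000, p=0.5, threshold=None):
    if threshold is None:
        threshold = sum(w for w, _ in clauses)       # demand full satisfaction
    best, best_w = None, -1.0
    for _ in range(max_tries):
        assign = [random.random() < 0.5 for _ in range(n_vars)]
        for _ in range(max_flips):
            w = sat_weight(clauses, assign)
            if w > best_w:
                best, best_w = assign[:], w
            if w >= threshold:
                return assign
            unsat = [lits for cw, lits in clauses
                     if not any(assign[v] == sign for v, sign in lits)]
            if not unsat:
                break
            lits = random.choice(unsat)              # random unsatisfied clause
            if random.random() < p:
                v, _ = random.choice(lits)           # random-walk step
            else:                                    # greedy step
                v = max((vv for vv, _ in lits),
                        key=lambda vv: flip_gain(clauses, assign, vv))
            assign[v] = not assign[v]
    return best                                      # best solution found

# Tiny example: ¬x0 ∨ x1 (weight 1.5) and x0 (weight 2.0);
# only [True, True] satisfies both clauses.
clauses = [(1.5, [(0, False), (1, True)]), (2.0, [(0, True)])]
print(maxwalksat(clauses, 2))
```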

MAP INFERENCE IN MARKOV LOGIC NETWORKS
- We've tried Alchemy (and MaxWalkSAT) and the results were poor
- Better results with integer linear programming (ILP)
- ILP performs exact inference
- Works very well on the problems we are concerned with
- Originated in the field of operations research

LINEAR PROGRAMMING
- A linear programming problem is the problem of maximizing (or minimizing) a linear function subject to a finite number of linear constraints
- Standard form of linear programming:

maximize    Σ_{j=1}^{n} c_j x_j
subject to  Σ_{j=1}^{n} a_ij x_j ≤ b_i    (i = 1, 2, …, m)
            x_j ≥ 0                        (j = 1, 2, …, n)

INTEGER LINEAR PROGRAMMING
- An integer linear programming problem is the problem of maximizing (or minimizing) a linear function subject to a finite number of linear constraints
- Difference to LP: variables are only allowed to take integer values

maximize    Σ_{j=1}^{n} c_j x_j
subject to  Σ_{j=1}^{n} a_ij x_j ≤ b_i    (i = 1, 2, …, m)
            x_j ≥ 0                        (j = 1, 2, …, n)
            x_j ∈ {…, −1, 0, 1, …}

MAP INFERENCE
1.5  ∀x Smokes(x) ⇒ Cancer(x)
1.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

Two constants: Anna (A) and Bob (B)
Evidence: Friends(A,B), Friends(B,A), Smokes(B)

¬Smokes(A) ∨ Cancer(A)                      1.5
¬Smokes(B) ∨ Cancer(B)                      1.5
¬Friends(A,B) ∨ ¬Smokes(A) ∨ Smokes(B)      0.55
¬Friends(A,B) ∨ ¬Smokes(B) ∨ Smokes(A)      0.55
…

MAP INFERENCE - EXAMPLE

¬Smokes(A) ∨ Cancer(A)    1.5

- Introduce a new variable for each ground atom: s_a, c_a
- Introduce a new variable for each ground formula: x_j
- Add the following three constraints:
  s_a + x_j ≥ 1
  −c_a + x_j ≥ 0
  −x_j − s_a + c_a ≥ −1
- Add 1.5 x_j to the objective function

maximize    Σ_{j=1}^{n} c_j x_j
subject to  Σ_{j=1}^{n} a_ij x_j ≤ b_i    (i = 1, 2, …, m)
            x_j ∈ {0, 1}

TOMORROW
- Ontology matching with Markov logic
- Weight learning
- Experiments
