CENTRIA Departamento de Informática Faculdade de Ciências e Tecnologia Universidade Nova de Lisboa

Causal Modelling and Probabilistic Causation in Acorda European Master's Program in Computational Logic (EMCL) Project Report

Carroline Dewi Puspa Kencana Ramli

author

Prof. Luís Moniz Pereira

advisor

Lisbon

19th October 2008

Causal Modelling and Probabilistic Causation in Acorda

European Master's Program in Computational Logic (EMCL) Project Report

Carroline Dewi Puspa Kencana Ramli

CENTRIA Departamento de Informática Faculdade de Ciências e Tecnologia Universidade Nova de Lisboa

Prof. Luís Moniz Pereira

Lisbon, 19th October 2008

Abstract

Humans reason essentially in terms of cause and effect, in combination with probabilistic information. Cause and effect alone are not enough to draw conclusions, due to the problem of imperfect regularities. The probabilistic theory of causation can help solve these problems; the resulting theory is called Causal Bayes Nets. Translating human reasoning based on causal models and Bayes Nets into a computational framework is possible using logic programming. Thus, in this work we adopt a logic programming framework and methodology to model our functional description of causal models and Bayes Nets, building on its many strengths and advantages to derive both a consistent definition of its semantics and a working implementation with which to conduct relevant experiments. Acorda is a prospective logic programming language which simulates how the human brain can reason multiple steps into the future. Acorda by itself is not equipped to deal with either utility functions or probabilistic theory. On the other hand, P-log is a declarative logic programming language that can be used to reason with probabilistic models. Combined with P-log, Acorda is ready to deal with the uncertain problems that we face on a daily basis. In this project, we show how the integration between Acorda and P-log works, and then we present several daily-life examples where Acorda can help people reason.

Keywords: P-log, Acorda, Prospective Logic Programming, Human Reasoning, Causal Models, Bayes Networks


Acknowledgement

I would like to thank my supervisor Prof. Luís Moniz Pereira for his patience during our discussions. He gave me a lot of advice, in both academic and non-academic life. I am also very grateful to Martin Slota for his help in discussing, revising and editing this report. My thanks go to Gonçalo Lopes for his help with fixing Acorda and for giving more detailed explanations of it. I am indebted to Han The Anh for the discussions about P-log, and to Michael Gelfond and Weijin Zhu for the discussion and bug fixing of the original P-log. Many thanks go also to the secretariat personnel, Mrs Sandra, Mrs Filipa and Mrs Anabella, for their support during my study. I would like to thank my fellow EMCL classmates Mishiko, Rasha, Clemens, João, Luca and Anh for the nice time we had together during our studies. Last but definitely not least, I would like to thank my parents and my brother for their spiritual support. My study was supported by the Erasmus Mundus Scholarship.


Contents

1 Introduction
  1.1 Causal Models of Human Reasoning
  1.2 Logic Programming

2 Background
  2.1 Prospective Logic Programming
  2.2 P-log
      2.2.1 Syntax
      2.2.2 Semantics
  2.3 XSB-XASP Interface

3 Implementation
  3.1 Implementation of Prospective Logic Programming
  3.2 Implementation of P-log
  3.3 Integration of Prospective Logic Programming with P-log
      3.3.1 Overview
      3.3.2 Expected Abducibles
      3.3.3 Utility Function
      3.3.4 A Priori Preference
      3.3.5 A Posteriori Preference
      3.3.6 Multiple-step Prospection

4 Further Examples
  4.1 Risk Analysis
  4.2 Counterfactual
  4.3 Law Application
      4.3.1 Jury Observation Fallacy

5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work

Bibliography

Chapter 1

Introduction

Studying how the human brain works [1] is one of the most fascinating areas of research. It draws from studies of intelligence by psychologists, but even more from ethology, evolutionary biology, linguistics, and the neurosciences [2, 3, 4, 5, 6]. The computational aspects of the human brain are among the most attractive research areas of computer science. Nowadays, a lot of research in the Artificial Intelligence (AI) community tries to mimic how humans reason.

1.1 Causal Models of Human Reasoning

Humans reason essentially in terms of cause and effect. David Hume described causes as objects regularly followed by their effects [7]: "We may define a cause to be an object, followed by another, and where all the objects similar to the first, are followed by objects similar to the second." Hume attempted to analyse causation in terms of invariable patterns of succession, an analysis referred to as a regularity theory of causation. There are a number of well-known difficulties with regularity theories, and these may be used to motivate probabilistic approaches to causation. The difficult part for regularity theories is that most causes are not invariably followed by their effects. For example, it is widely accepted that smoking is a cause of lung cancer, yet not all smokers develop lung cancer. By contrast, the central idea behind probabilistic theories of causation is that causes raise the probability of their effects; an effect may still occur in the absence of a cause, or fail to occur in its presence. The probabilistic theory of causation thus helps in defining a pattern in problems with imperfect regularities [8].
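The probability-raising idea can be stated compactly. In the standard formulation from the literature on probabilistic causation (our own rendering, not a formula taken from this report), C is a prima facie cause of E when

    P(E | C) > P(E | ¬C),

which is perfectly compatible with P(E | C) < 1 and P(E | ¬C) > 0 — exactly how imperfect regularities such as the smoking example are accommodated.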

If A causes B then, typically, B will not also cause A. Smoking causes lung cancer, but lung cancer does not cause someone to smoke. In other words, causation is usually asymmetric: cause and effect cannot be commuted. In addition, the asymmetry of the causal relation is unrelated to the asymmetry of causal implication and its contraposition [9]. For example, if A stands for the statement "Peter smokes" and B stands for the statement "Peter has lung cancer", then A implies B (smoking causes lung cancer), but ¬B does not imply ¬A (the absence of lung cancer does not cause Peter not to smoke). This may pose a problem for regularity theories. It would be nice if a theory of causation could provide some explanation of the directionality of causation, rather than merely stipulate it [10].

For a few decades, statisticians, computer scientists and philosophers have worked on developing a theory about how to represent causal relations and how causal claims connect with probabilities. Those representations show how information about some features of the world may be used to compute probabilities for other features, how partial causal knowledge may be used to compute the effects of actions, and how causal relations can be reliably learned, at least by computers. The resulting theory is called Causal Bayes Nets.

1.2 Logic Programming

The translation of human reasoning using the causal models and Bayes Nets described previously into a computational framework is possible using logic programming. The main argument is that humans reason using logic. Logic itself can be implemented on top of a symbol processing system such as a computer. On the other hand, there is an obvious human capacity for understanding logical reasoning, one that might even be said to have developed throughout our evolution. Its most powerful expression today is science itself, and the knowledge amassed from numerous disciplines, each with its own logic. From state laws to quantum physics, logic has become the foundation on which human knowledge is built and improved.

A part of the Artificial Intelligence community has struggled, for some time now, to turn logic into an effective programming language, allowing it to be used as a system and application specification language which is not only executable, but on top of which one can demonstrate properties and proofs of correctness that validate the very self-describing programs which are produced. At the same time, AI has developed logic beyond the confines of monotonic cumulativity and into the non-monotonic realms that are typical of the real world of incomplete, contradictory, arguable, revised, distributed and evolving knowledge. Over the years, an enormous amount of work and results has been achieved on separate topics in logic programming (LP), language semantics, revision, preferences, and evolving programs with updates [11, 12, 13]. Computational logic has shown itself capable of evolving to meet the demands of the difficult descriptions it is trying to address.

Thus, in this work we adopt a logic programming framework and methodology to model our functional description of causal models and Bayes Nets, building on its many strengths and advantages to derive both a consistent definition of its semantics and a working implementation with which to conduct relevant experiments. The use of the logic paradigm also allows us to present the discussion of our system at a sufficiently high level of abstraction and generality to allow for productive interdisciplinary discussions, both about its specification and about the derived properties. The language of logic is universally used by both the natural sciences and the humanities, and more generally lies at the core of any source of human-derived knowledge, so it provides us with a common ground on which to reason about our theory. Since the field of cognitive science is essentially a joint effort on the part of many different kinds of knowledge fields, we believe such language and vocabulary unification efforts are not only useful but mandatory.


Chapter 2

Background

2.1 Prospective Logic Programming

Modelling programs that mimic how the human brain works, and which are capable of non-deterministic self-evolution through self-updating, is not so difficult. This possibility is already at hand, since there are working implementations of a logic programming language which was specifically designed to model such program evolutions. EVOLP [13] is a non-deterministic LP language with a well-defined semantics that accounts for self-updates, and we intend to use it to model autonomous agents capable of evolving by making proactive commitments concerning their imagined prospective futures. Such futures can also be prospections of actions, either from the outside environment or originating in the agent itself. This implies the existence of partial internal and external models, which can already be modelled and codified with logic programming. As we now have the real possibility of modelling programs that are capable of non-deterministic self-evolution through self-updating, we are confronted with the problem of having several different possible futures for a single starting program. It is desirable that such a system be able to somehow look ahead into such possible futures to determine, at any moment, the best paths of evolution from its present.

Prospective logic programming enables an evolving program to look ahead prospectively into its possible future states and to prefer amongst them to satisfy goals [14]. This paradigm is particularly beneficial to the agents community, since it can be used to predict an agent's future by employing the methodologies from abductive logic programming [15, 16] in order to synthesize and maintain abductive hypotheses. Prospective logic programming is an instance of an architecture for causal models, which implies a notion of simulating causes and effects in order to solve the choice problem for the alternative futures. This entails that the program is capable of conjuring up hypothetical what-if scenaria and formulating abductive explanations for both external and internal observations. Since we have multiple possible scenaria to choose from, we need some form of preference specification, which can be either a priori or a posteriori. A priori preferences are embedded in the program's own knowledge representation theory and can be used to produce the most relevant hypothetical abductions for a given state and observations, in order to conjecture possible future states. A posteriori preferences represent choice mechanisms, which enable the program to commit to one of the hypothetical scenaria engendered by the relevant abductive theories. These mechanisms may trigger additional simulations, by means of the functional connectivity, in order to posit which new information to acquire, so that more informed choices can be enacted, in particular by restricting and committing to some of the abductive explanations along the way.

Acorda¹ is a system that implements prospective logic programming and is based on the above architecture. Acorda is implemented on top of the implementation of EVOLP [13] and is further developed on top of XSB Prolog. In order to compute abductive stable models [11], Acorda also benefits from the XSB-XASP interface to Smodels.

¹ The Acorda implementation can be downloaded from http://centria.di.fct.unl.pt/~lmp/publications/talks.html

Language

Let L be a first order language. A domain literal in L is a domain atom A or its default negation not A, the latter expressing that the atom is false by default (CWA). A domain rule in L is a rule of the form:

    A ← L1, ..., Lt   (t ≥ 0)

where A is a domain atom and L1, ..., Lt are domain literals. An integrity constraint in L is a rule of the form:

    ⊥ ← L1, ..., Lt   (t > 0)

where ⊥ is a domain atom denoting falsity, and L1, ..., Lt are domain literals. A (logic) program P over L is a set of domain rules and integrity constraints, standing for all their ground instances.

Abducibles

Every program P is associated with a set of abducibles A ⊆ L. Abducibles can be seen as hypotheses that provide hypothetical solutions or possible explanations for given queries. An abducible A can be assumed only if it is a considered one, i.e. it is expected in the given situation, and moreover there is no expectation to the contrary [11]:

    consider(A) ← expect(A), not expect_not(A)

The rules about expectations are domain-specific knowledge contained in the theory of the program, and effectively constrain the hypotheses which are available.

Preferring Abducibles

To express preference criteria amongst abducibles, we introduce the language L*. A relevance atom is one of the form a ◁ b, where a and b are abducibles; a ◁ b means that the abducible a is more relevant than the abducible b. A relevance rule is one of the form:

    a ◁ b ← L1, ..., Lt   (t ≥ 0)

where a ◁ b is a relevance atom and every Li (1 ≤ i ≤ t) is a domain literal or a relevance literal. L* is the language consisting of domain rules and relevance rules.

Example 2.1 (Tea 1). Consider a situation where an agent, Claire, drinks either tea or coffee (but not both). She does not expect to have coffee if she has high blood pressure. Also suppose that Claire prefers coffee over tea when she is sleepy. This situation can be represented by a program Q over L with set of abducibles AQ = {tea, coffee}:

    falsum <- not drink.
    drink <- consider(tea).
    drink <- consider(coffee).
    constrain(1, [tea, coffee], 1).

    expect(tea).
    expect(coffee).
    expect_not(coffee) <- blood_pressure_high.

    coffee < tea <- sleepy.

The query is triggered by the integrity constraint coded with falsum/0. Acorda has to fulfil the integrity constraint by trying to find all the explanations of the atom drink. We codified the preference using the < predicate; the solutions are discussed below.


Having the notion of expectation allows one to express the preconditions for an expectation, or otherwise, about an assumption a, and to express which possible expectations are confirmed (or assumed) in a given situation. If the preconditions do not hold, then a cannot be considered, and therefore a will never be assumed. By means of expect_not one can express situations where one does not expect something. In this case, when blood pressure is high, coffee will not be considered or assumed, because a contrary expectation arises as well (and therefore tea will be assumed).

Abducibles Sets

In many situations it is desirable not only to include rules about the expectations for single abducibles, but also to express contextual information constraining the powerset of abducibles. For instance, in the previous example we expressed that abducing tea or coffee was mutually exclusive (i.e. only one of them could be abduced), but it is easy to imagine similar choice situations where it would be possible, indeed even desirable, to abduce both, or neither. The behaviour of abducibles over different sets is highly context-dependent and, as such, should also be embedded in rules of the theory. Overall, the problem is analogous to the ones addressed by cardinality and weight constraint rules for the Stable Model semantics, and below we present how one can nicely import these results to work with abduction of sets, and also hierarchies of sets.

Example 2.2. Consider a situation where Claire is deciding what to have for a meal from a limited buffet. The menu has appetizers (which Claire doesn't mind skipping, unless she's very hungry), three main dishes, from which one can select a maximum of two, and drinks, from which she will have a single one. The situation, with all possible choices, can be modelled by the following program P over L with the set of abducibles AP = {bread, salad, cheese, fish, meat, veggie, wine, juice, water, appetizers, main_dishes, drinks}:

    constrain(0, [bread, salad, cheese], 3) <- appetizers.
    constrain(1, [fish, meat, veggie], 2) <- main_dishes.
    constrain(1, [wine, juice, water], 1) <- drinks.
    constrain(2, [appetizers, main_dishes, drinks], 3).

    main_dishes < appetizers.
    drinks < appetizers.
    appetizers <- very_hungry.

In this situation we model appetizers as the least preferred set from those available for the meal. This shows how we can condition sets of abducibles based on the generation of literals from other cardinality constraints, along with preferences amongst such literals.


A Posteriori Preference

Once each possible scenario is actually obtained, there are a number of different strategies which can be used to choose which of the scenaria lead to more favourable consequences. A possible way to achieve this is to use numeric functions to generate a quantitative measure of utility for each possible action. We allow for the application of this strategy by making a priori assignments of probability values to uncertain literals and of utilities to relevant consequences of abducibles. We can then obtain, a posteriori, the overall utility of a model by weighing the utility of its consequences by the probability of its uncertain literals. It is then possible to use this numerical assessment to establish a preorder amongst the remaining models.

Both qualitative and quantitative evaluations of the scenaria can be greatly improved by acquiring additional information before making a final decision. We next consider the mechanism that our agents use to question external systems, be they other agents, actuators, sensors or other procedures. Each of these serves the purpose of an oracle, which the agent can probe through observations of its own.

Having computed the possible scenaria, represented by abductive stable models, the more favourable scenaria can be preferred amongst them a posteriori. Typically, a posteriori preferences are enacted by evaluating the consequences of the abducibles in the abductive stable models. The evaluation can be done quantitatively (for instance by utility functions) or qualitatively (for instance by enforcing some rules to hold). When the currently available knowledge is insufficient to prefer amongst abductive stable models, additional information can be gathered, e.g. by performing experiments or consulting an oracle. To realize a posteriori preferences, Acorda provides the predicate select/2, which can be defined by users following some domain-specific mechanism for selecting favoured abductive stable models. The use of this predicate to perform a posteriori preferences is discussed in a subsequent section.
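As a rough schematic (our own notation, not a formula from the report), this quantitative strategy amounts to an expected-utility evaluation of each scenario:

    U(M) = Σ_{c ∈ Cons(M)} P(c) · u(c)

where Cons(M) is the set of relevant consequences of the abducibles in model M, P(c) the probability of the uncertain literals supporting consequence c, and u(c) the utility assigned to c; the values U(M) then induce the preorder used to compare the remaining models.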

2.2 P-log

Probabilistic logic (P-log) was first introduced by Chitta Baral et al. [17, 18]. P-log is a declarative language that combines logical and probabilistic reasoning. P-log uses Answer Set Programming (ASP) as its logical foundation and Causal Bayes Nets [19] as its probabilistic foundation.

The original P-log² [17, 18] uses Answer Set Programming (ASP) as a tool for computing all stable models of the logical part of P-log. Although ASP has proved to be a useful paradigm for solving a variety of combinatorial problems, its non-relevance property [20] sometimes makes the P-log system computationally redundant. Newer developments of P-log³ [21] use the XASP package of XSB Prolog for interfacing with an answer set solver. The power of ASP allows both classical and default negation to be represented easily in P-log. Moreover, P-log(XSB) uses XSB as the underlying processing platform, allowing arbitrary Prolog code for recursive definitions. P-log can also represent a mechanism for updating or reassessing probability [22]. By using XASP, we can easily integrate it with the prospective logic programming system Acorda [14] explained in Section 2.1.

² The original P-log can be accessed at http://www.cs.ttu.edu/~wezhu/
³ The newer P-log(XSB) can be accessed at http://sites.google.com/site/plogxsb/Home

2.2.1 Syntax

In general, a P-log program Π consists of a sorted signature, declarations, a regular part, a set of random selection rules, a probabilistic information part, and a set of observations and actions. This P-log syntax is based on P-log(XSB) [21].

Sorted signature and Declaration

The sorted signature Σ of Π contains a set of constant symbols and term-building function symbols, which are used to form terms in the usual way. Additionally, the signature contains a collection of special function symbols called attributes. Attribute terms are expressions of the form a(t̄), where a is an attribute and t̄ is a vector of terms of the sorts required by a. A literal is an atomic statement, p, or its explicit negation, neg_p. Literals p and neg_p are called contrary. The expressions p and not p, where not is the default negation of ASP, are called extended literals.

The declaration part of a P-log program can be defined as a collection of sorts and sort declarations of attributes. A sort c can be defined by listing all its elements, c = {x1, ..., xn}, by specifying a range of values, c = {L..U}, where L and U are the integer lower and upper bounds of the sort c, or even by specifying a range of values of its members, c = {h(L..U)}, where h/1 is any unary predicate. We are also able to define a sort by arbitrarily mixing the previous constructions, e.g. c = {x1, ..., xn, L..U, h(M..N)}. In addition, in extended P-log it is allowed to declare unions as well as intersections of sorts. A union sort is represented by c = union(c1, ..., cn), while an intersection sort by c = intersection(c1, ..., cn), where ci, 1 ≤ i ≤ n, are declared sorts.

An attribute a with domain c1 × ... × cn and range c0 is declared as follows:

    a : c1 × ... × cn --> c0

If attribute a has no domain parameter, we simply write a : c0. The range of attribute a is denoted by range(a).

Regular part

The regular part of a P-log program consists of a collection of XSB Prolog rules, facts and integrity constraints (IC) formed using literals of Σ. An integrity constraint is encoded as an XSB rule with the false literal in the head.
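As a small illustration (our own example, not taken from the report), such an integrity constraint in the dice domain introduced in Example 2.3 below could forbid both dice showing a six at once:

    % illustrative IC in the regular part: d1 and d2 never both roll a 6
    false :- roll(d1, 6), roll(d2, 6).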

Random Selection Rule

A random selection rule for attribute a has the form:

    random(RandomName, a(t̄), DynamicRange) :- Condition.

This means that the attribute instance a(t̄) is random if the conditions in Condition are satisfied. The DynamicRange allows the default range of a random attribute to be restricted. The RandomName is a syntactic mechanism used to link random attributes to the corresponding probabilities. If there is no precondition, we simply write:

    random(RandomName, a(t̄), DynamicRange).

The constant full is used in DynamicRange to signal that the dynamic domain is equal to range(a).

Probabilistic Information

Information about the probability of a random attribute instance a(t̄) taking a particular value y is given by probability atoms (or simply pa-atoms), which have the following form:

    pa(RandomName, a(t̄, y), d(A, B)) :- Condition.

This means that if Condition were true and the value of a(t̄) were selected by the rule named RandomName, then Condition would cause a(t̄) = y with probability A/B.

Example 2.3 (Dice). There are two dice, d1 and d2, belonging to Mike and John, respectively. Each die has scores from 1 through 6, and will be rolled once. The die owned by Mike is biased to 6 with probability 1/4 [17]. This scenario can be coded with the following P-log(XSB) program Πdice:

    1. score = {1..6}.
    2. dice = {d1, d2}.
    3. owns(d1, mike). owns(d2, john).
    4. roll : dice --> score.
    5. random(r(D), roll(D), full).
    6. pa(r(D), roll(D, 6), d(1, 4)) :- owns(D, mike).

Two sorts, score and dice, of the signature of Πdice are declared in lines 1-2. The regular part contains two facts in line 3 which say that die d1 belongs to Mike and die d2 belongs to John. Line 4 is the declaration of the attribute roll, which maps each die to a score. The fact that the distribution of the attribute roll is random is expressed by the random selection rule in line 5. Line 6 belongs to the probabilistic information part, saying that the die owned by Mike is biased to 6 with probability 1/4.

Example 2.4 (Dice 2). Suppose there is a third die which only shows even numbers. It is codified below:

    7. dice = {d3}.
    8. random(r(D), roll(D), even) :- D == d3.
    9. even(X) :- 0 is X mod 2.

    % update the previous random rule
    5. random(r(D), roll(D), full) :- D \= d3.

We add more information about the third die in lines 7-9. Die d3 will be randomly distributed over the even numbers. After adding the new information, we must replace the previous random rule in line 5, so that all dice except d3 remain randomly distributed over the scores from 1 to 6.

Observations and Actions

Observations and actions are, respectively, statements of the forms obs(l) and do(l), where l is a literal. Observations are used to record the outcomes of random events, i.e. random attributes and attributes dependent on them. The dice domain may, for instance, contain obs(roll(d1, 4)) to record the outcome of rolling die d1. The statement do(a(t, y)) indicates that a(t) = y is made true as the result of a deliberate (non-random) action. For instance, do(roll(d1, 4)) may indicate that d1 was simply put on the table in the described position.

2.2.2 Semantics

The semantics is defined in two stages. First, it defines a mapping τ(Π) of the logical part of Π into its XASP counterpart. The answer sets of τ(Π) will play the role of possible worlds of Π. Next, the probabilistic part of Π is used to define a measure over the possible worlds, as well as the probability of (complex) formulas.

The logical part of a P-log program Π is transformed into its XASP counterpart τ(Π) by the following five steps:

1. Sort declaration:
   • for every sort declaration c = {x1, ..., xn} of Π, τ(Π) contains c(xi) for each 1 ≤ i ≤ n.
   • for every sort declaration c = {L..U} with integers L ≤ U, τ(Π) contains c(i) for L ≤ i ≤ U.
   • for every sort declaration c = {h(L..U)} of Π, with integers L ≤ U, τ(Π) contains c(h(i)) for L ≤ i ≤ U.
   • for every sort declaration c = union(c1, ..., cn), τ(Π) contains the rules c(X) :- ci(X) for each 1 ≤ i ≤ n.
   • for every sort declaration c = intersection(c1, ..., cn), τ(Π) contains the rule c(X) :- c1(X), ..., cn(X).

2. Regular part: for each attribute term a(t̄), τ(Π) contains the rules:
   • false :- a(t̄, Y1), a(t̄, Y2), Y1 \= Y2.
     which guarantees that in each answer set a(t̄) has at most one value;
   • a(t̄, y) :- do(a(t̄, y)).
     which guarantees that the atoms made true by a deliberate action are indeed true.

3. Random selection:
   • For attribute a, τ(Π) contains the rule:
     intervene(a(t̄)) :- do(a(t̄, Y)).
   • Each random selection rule
     random(RandomName, a(t̄), DynamicRange) :- Condition.
     is translated into:
     a(t̄, Y) :- tnot(intervene(a(t̄))), tnot(neg_a(t̄, Y)), Condition.
     neg_a(t̄, Y) :- tnot(intervene(a(t̄))), tnot(a(t̄, Y)), Condition.
     atLeastOne(t̄) :- a(t̄, Y).
     false :- tnot(atLeastOne(t̄)).
     pd(RandomName, a(t̄, Y)) :- tnot(intervene(a(t̄))), DynamicRange(Y), Condition.
   • If DynamicRange is not full, τ(Π) also contains:
     false :- Condition, a(t̄, Y), tnot(DynamicRange(Y)), tnot(intervene(a(t̄))).

4. Observation and action: τ(Π) contains obs(l) and do(l), where l is a literal. In addition, for each literal l, τ(Π) contains the rule:
     false :- obs(l), tnot(l).

5. Probabilistic information: for each probabilistic information rule, τ(Π) contains exactly the same rule, without any change.

Notice that, similarly to ASP, in the body of each XASP rule additional domain predicates are necessary for grounding the variables that appear in non-domain predicates (see Example 2.5 for further understanding). In the transformation, the XSB default table negation operator tnot/1 is used. In the transformation of random selection, the predicate pd/2 is used to define default probabilities. The execution of a deliberate action on a(t̄) makes the corresponding intervene(a(t̄)) true, thereby blocking the generation of random alternatives for the attribute a(t̄). Also notice that our semantics is equivalent to the semantics defined in [17] for the original P-log syntax. In fact, we reformulated the transformation from the original paper to adapt it to the XASP syntax. For example, the cardinality expression of the Smodels language used in the original paper is replaced by an even loop to generate stable models, together with rules for determining upper and lower bounds. The rationale for the transformation can be found in [17].

Example 2.5. For a better understanding of the transformation, we provide here the resulting transformed program τ(Πdice) of the dice program described in Examples 2.3 and 2.4:

    1.  score(1). score(2). score(3). score(4). score(5). score(6).
    2.  dice(d1). dice(d2). dice(d3).
    3.  owns(d1, mike). owns(d2, john).
    4.  false :- score(X), score(Y), dice(D), roll(D, X), roll(D, Y), X \= Y.
    5.  roll(D, X) :- dice(D), score(X), do(roll(D, X)).
        intervene(roll(D)) :- dice(D), score(X), do(roll(D, X)).
    6.  false :- roll(D, X), dice(D), D == d3, score(X), tnot(even(X)),
                 tnot(intervene(roll(D))).
    7.  roll(D, X) :- dice(D), score(X), tnot(intervene(roll(D))),
                      tnot(neg_roll(D, X)).
    8.  neg_roll(D, X) :- dice(D), score(X), tnot(intervene(roll(D))),
                          tnot(roll(D, X)).
    9.  atLeastOne(D) :- dice(D), score(X), roll(D, X).
    10. false :- dice(D), tnot(atLeastOne(D)).
    11. pd(r(D), roll(D, X)) :- dice(D), D \= d3, score(X),
                                tnot(intervene(roll(D))).
    12. pd(r(D), roll(D, X)) :- dice(D), D == d3, score(X),
                                tnot(intervene(roll(D))), even(X).
    13. pa(r(D), roll(D, 6), d_(1, 4)) :- owns(D, mike).

Lines 1-2 are the transformation of the sort declarations. Lines 3-4 are the resulting code of the transformation for the attribute roll (regular part). Lines 5-12 are the result of the transformation of the random selection part in line 5 and line 8 of the original program Πdice. Line 13 is the probabilistic information part, which is kept unchanged from the original program (line 6 of Πdice). Notice that the domain predicates score/1 and dice/1 were added in some rules (lines 3-12) for the purpose of grounding the variables.

Definition 2.6 (Possible Worlds). An answer set of τ(Π) is called a possible world of Π. The set of all possible worlds of Π will be denoted by Ω(Π).

There are several meta-conditions which guarantee that the possible worlds provide reasonable assignments of probability to attributes. These are discussed in the original paper [17].

Assigning Measures of Probability

The difficult part of the semantics is the assignment of probability measures to attributes and then to answer sets (i.e. possible worlds). The following definition captures which attributes will be considered for the assignment of probability. The presentation is adapted from [17].

Definition 2.7 (Possible Outcomes). Let W be a consistent set of literals of Σ, Π be a P-log program, a be an attribute, and y belong to the range of a. We say that the atom a(t̄) = y is possible in W with respect to Π if Π contains a random selection rule r for a(t̄), where r is of the form defined in Section 2.2.1, such that the DynamicRange ∈ W and W satisfies Condition. We say that y is a possible outcome of a(t̄) in W with respect to Π via rule r, and that r is a generating rule for the atom a(t̄) = y.

For every W ∈ Ω(Π) and every atom a(t̄) = y possible in W we will define the corresponding causal probability P(W, a(t̄) = y). Whenever possible, the probability of an atom a(t̄) = y will be directly assigned by pa-atoms of the program and denoted by PA(W, a(t̄) = y). To define the probabilities of the remaining atoms we assume that, by default, all values of a given attribute which are not assigned a probability are equally likely. Their probabilities will be denoted by PD(W, a(t̄) = y). (PA stands for assigned probability and PD stands for default probability.) For each atom a(t̄) = y possible in W:

1. Assigned probability: if Π contains pa(r, a(t̄, y), v) :- B, where r is the generating rule of a(t̄) = y, B ⊆ W, and W does not contain intervene(a(t̄)), then PA(W, a(t̄) = y) = v.

2. Default probability: for any set S, let |S| denote the cardinality of S. Let A_a(t̄)(W) = {y | PA(W, a(t̄) = y) is defined}, and let a(t̄) = y be possible in W such that y ∉ A_a(t̄)(W). Then let

    α_a(t̄)(W) = Σ_{y ∈ A_a(t̄)(W)} PA(W, a(t̄) = y)
    β_a(t̄)(W) = |{y : a(t̄) = y is possible in W and y ∉ A_a(t̄)(W)}|

and

    PD(W, a(t̄) = y) = (1 − α_a(t̄)(W)) / β_a(t̄)(W)

3. Causal probability: the causal probability P(W, a(t̄) = y) of a(t̄) = y in W is defined by:

    P(W, a(t̄) = y) = PA(W, a(t̄) = y)   if y ∈ A_a(t̄)(W)
    P(W, a(t̄) = y) = PD(W, a(t̄) = y)   otherwise

Case 1 captures the assigned probabilities obtained in the possible world. Case 2 provides a uniform distribution for the unassigned values of an attribute: the non-assigned probability mass is distributed equally over the remaining values. The last case combines both in order to be able to define a proper probability measure:

Definition 2.8 (Measure).

1. Let W be a possible world of Π. The unnormalized probability, µ̂_Π(W), of a possible world W induced by Π is

    µ̂_Π(W) = ∏_{a(t̄, y) ∈ W} P(W, a(t̄) = y)

where the product is taken over the atoms for which P(W, a(t̄) = y) is defined.

2. Suppose Π is a P-log program having at least one possible world with a nonzero unnormalized probability. The measure, µ_Π(W), of a possible world W induced by Π is the unnormalized probability of W divided by the sum of the unnormalized probabilities of all possible worlds of Π, i.e.

    µ_Π(W) = µ̂_Π(W) / Σ_{Wi ∈ Ω(Π)} µ̂_Π(Wi)

Definition 2.9 (Probability). The probability, P_Π(E), of a set E of possible worlds of program Π is the sum of the measures of the possible worlds from E, i.e.

    P_Π(E) = Σ_{W ∈ E} µ_Π(W)

Definition 2.10 (Probability of Formulas). The probability of an (ASP) formula A with respect to program Π, P_Π(A), is the sum of the measures of the possible worlds of Π in which A is true, i.e.

    P_Π(A) = Σ_{W ⊨ A} µ_Π(W)

The syntax and semantics of ASP formulas can be found in [17].

Definition 2.11 (Conditional Probability in P-log). For any consistent P-log program T, formula A, and set of Σ-literals B such that P_T(B) ≠ 0,

    P_{T ∪ obs(B)}(A) = P_T(A ∧ B) / P_T(B)

in other words, P_T(A | B) = P_{T ∪ obs(B)}(A).

PT (B) 6= 0,
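As a small worked illustration of Definitions 2.8-2.10 (our own computation, over the two-dice program of Example 2.3, ignoring the third die of Example 2.4): in every possible world, P(W, roll(d1) = 6) = 1/4 is assigned by the pa-atom, each of the five remaining scores of d1 gets the default probability (1 − 1/4)/5 = 3/20, and each score of d2 gets 1/6. The measure of a world is the product of these factors, so each of the 36 worlds with roll(d1) = 6 and a fixed score for d2 has measure (1/4)(1/6) = 1/24, and summing over the six such worlds gives P_Πdice(roll(d1, 6)) = 6 · 1/24 = 1/4, as expected.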

2.3 XSB-XASP Interface

The Prolog language has been, for quite some time, one of the most accepted means to codify and execute logic programs, and as such has become a useful tool for research and application development in logic programming. Several stable implementations have been developed and refined over the years, with plenty of working solutions to pragmatic issues ranging from efficiency and portability to explorations of language extensions. The XSB Prolog system is one of the most sophisticated, powerful, efficient and versatile among these implementations, with a focus on execution efficiency and interaction with external systems, implementing program evaluation following the Well-Founded Semantics (WFS) for normal logic programs. Two of its hallmark characteristics make it a particularly useful system on top of which to implement Acorda and many of its supporting subsystems. First of all, the tabling mechanism [23], in which the results of particular queries are stored for later reuse, can provide not only an enormous decrease in time complexity, but also allows for solutions to well-known problems in the LP community, such as query loop detection. Secondly, its aim of interacting with external systems eventually resulted in the development of an interface to Smodels [24], one of the most successful implementations of the Stable Models semantics over normal logic programs, also known as the Answer Set semantics. The SM semantics has become the cornerstone for the definition of some of the most important results in logic programming of the past decade, providing an increase in logic program declarativity and a new paradigm for program evaluation. Many of the Acorda subsystems are defined on top of the Stable Models (SM) semantics and, as such, this integration proves extremely useful, and even accounts for new and desirable computational properties that neither of the systems could provide on its own.

The XASP interface [20] (standing for XSB Answer Set Programming) provides two distinct methods of accessing Smodels⁴. The first one uses Smodels to obtain the stable models of the so-called residual program, the one that results from a query evaluated in XSB using tabling. This residual program is represented by delay lists, that is, the sets of undefined literals for which the program could not find a complete proof, due to mutual dependencies or loops over default negation for that set of literals, which are detected by the XSB tabling mechanism. This method allows us to obtain any two-valued semantics as a completion of the three-valued semantics the XSB system provides. The second method is to build up a clause store, adding rules and facts to compose a generalized logic program that is then parsed and sent to Smodels for evaluation, thereafter providing access to the computed stable models back in the XSB system.

This kind of integration allows one to maintain the relevance property for queries over our programs, something that the Stable Models semantics does not originally enjoy. In Stable Models, by the very definition of the semantics, it is necessary to compute all the models for the whole program. In our system, we sidestep this issue, using XASP to compute the relevant residual program on demand, usually after some degree of transformation. Only the resulting program is then sent to Smodels for computation of possible futures. We believe that such system integrations are crucial in order to extend the applicability of the more refined and declarative semantics that have been developed in the field of AI.

⁴ The XSB Logic Programming system and Smodels are freely available at http://xsb.sourceforge.net and http://www.tcs.hut.fi/Software/smodels


Chapter 3

Implementation

3.1 Implementation of Prospective Logic Programming

Figure 3.1 illustrates the architecture of a prospective logic agent. Each agent is equipped with a knowledge base and Bayes Nets as its initial theory. The problem of prospection is then that of finding abductive extensions to this initial theory which are both relevant (under the agent's current goals) and preferred (w.r.t. the preference rules in its initial theory). The first step is to select the goals that the agent will possibly attend to during the prospective cycle. Integrity constraints are also considered here, to ensure the agent always performs transitions into valid evolution states. Once the set of active goals for the current state is known, the next step is to find out which are the relevant abductive hypotheses. This step may include the application of a priori preferences, in the form of contextual preference rules, amongst the available hypotheses to generate possible abductive scenaria. Forward reasoning can then be applied to the abducibles in those scenaria to obtain relevant consequences, which can in turn be used to enact a posteriori preferences. These preferences can be enforced by employing utility theory. In case additional information is needed to enact preferences, the agent may consult external oracles. This greatly benefits agents by giving them the ability to probe the outside environment, thus providing better informed choices, including the making of experiments. Each oracle mechanism may have certain conditions specifying whether it is available for questioning. Whenever the agent acquires additional information, it is possible that ensuing side-effects affect its original search, e.g. some already considered abducibles may now be disconfirmed and some new abducibles may be triggered. To account for all possible side-effects, a second round of prospection takes place.


Figure 3.1: Prospective Logic Agent Architecture

3.2 Implementation of P-log

The implementation of P-log(XSB) consists of two main modules. The first module performs the transformation defined in Section 2.2.2, and the second module processes the probabilistic information. The transformation module transforms the original file in P-log syntax into an XASP file, which is the input for the second module, called the probabilistic information module. This module uses the XASP file to compute all the possible worlds (stable models) of the program, and then computes the probabilistic information for the given query according to the computed stable models. The structure is shown in Figure 3.2.

Figure 3.2: P-log Architecture

The transformation module (file interpreter.P, implemented in XSB Prolog) reads the P-log file and transforms it, according to the five transformation steps described in Section 2.2.2, into a new file in XASP syntax. This file can then be consulted from the probabilistic information module to derive all the stable models with the necessary information for processing the query: the atoms for random attributes and the probabilistic information, which have been coded using the predicates pd/2 and pa/3. The predicate pd/2 is added for each random attribute: its first argument keeps the name of the rule and its second argument keeps the attribute name. The predicate pa/3 serves to encode the probabilistic information in the program.

Having obtained the stable models with the relevant information, the system is ready to answer queries about the probabilistic information coded inside the P-log program. Whenever there is a query (a formula in ASP syntax), the probabilistic processing module searches the computed stable models for the ones that satisfy the query. In this step, the pd/2 and pa/3 predicates are taken into account to obtain the unnormalized probabilities for each possible world. These unnormalized probabilities are then summed up to obtain the probability of the query.

This approach, using XASP as the interface with Smodels together with the tabling mechanism of XSB, is very efficient for implementing P-log. In particular, we do not need to compute all the stable models of the whole program, as in answer set programming: using XASP to contact Smodels, we only have to compute the minimal amount of relevant information. Furthermore, with XASP we can compute the stable models and then comfortably process them as lists.
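As a usage sketch (our own illustration; it assumes the pr/2 query interface relied upon by the integration examples of Section 3.3, e.g. prolog(pr(cab(C), PR)) in Example 3.1), querying the transformed dice program of Example 2.5 might look like:

    ?- pr(roll(d1, 6), P).
    % expected: P = 0.25, i.e. the 1/4 assigned in line 6 of Example 2.3

    ?- pr((roll(d1, 6) '|' roll(d2, 6)), P).
    % conditional query, using the (Formula '|' Condition) form also used
    % in Example 3.3, following Definition 2.11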


3.3 Integration of Prospective Logic Programming with P-log

3.3.1 Overview

Acorda is implemented based on the prospective logic programming architecture defined in Section 3.1. Acorda is the main component of the system, with P-log used as probabilistic support in the background¹. Computation in each component is done independently, but they can cooperate in providing the information needed. The presented architecture exposes this separation in a very explicit way: there is indeed a necessary separation between producer and consumer of information, and Acorda and P-log can each act as both producer and consumer. The interfaces between the various components of the integration of Acorda with P-log are made explicit in Figure 3.3 (please note that the dashed lines represent communication between Acorda and P-log).

At the beginning, Acorda contains all kinds of knowledge, including the Bayes Nets. Acorda sends the Bayes Nets information to P-log, and P-log then translates all the information sent by Acorda and keeps it for future computation. After Acorda computes all the relevant abducible hypotheses, P-log uses these hypotheses for computing probabilistic information. In this case Acorda acts as the producer of information and P-log consumes it; this role also holds when Acorda finishes computing the abductive scenaria. On the other hand, Acorda sometimes needs probabilistic information in order to apply the a priori or a posteriori preferences; in that case, Acorda consumes the probabilistic information from P-log. Notice that P-log can also use information indirectly from an oracle, through Acorda.

¹ The integration of Acorda with P-log can be accessed at http://sites.google.com/site/acordaplog/Home

3.3.2 Expected Abducibles

Each abducible first needs to be expected (i.e. made available) by a model of the current knowledge state. This is achieved via the expect/1 and expect_not/1 clauses, which indicate the conditions under which an expectable abducible is indeed expected for an observation, given the current knowledge state. Sometimes we are not really sure about the relevant expected abducibles; what we do is to make an approximation of the expectation of some abducibles. We expect only those abducibles for which we reach a certain degree of belief. Acorda handles this by using probabilistic information as a pre-condition that must be satisfied before continuing the computation. This mechanism removes all unnecessary abducibles (i.e. all abducibles with a degree of belief below some particular value).

Figure 3.3: Integration Architecture


Example 3.1 (Cab 1). There was an accident on the street that involved a cab. Two cab companies operate in the city: Blue and Green. Suppose the probability of a Green cab being involved is 70%. Now the police try to catch the driver. Who are the suspects?

    1.  falsum <- not catch_person.
    2.  catch_person <- consider(suspect(P)).
    3.  expect(suspect(P)) <- driver(P, C), cab_company_involved(C, PR),
                              prolog(PR > 0.5).
    4.  expect_not(suspect(P)) <- have_alibi(person(P)).
    5.  driver(antonio, blue). driver(berry, blue). driver(charlie, blue).
    6.  driver(peter, green). driver(robert, green). driver(john, green).
    7.  have_alibi(person(P)) <- observe(prog, alibi_oracle(P), 'Street ABC', true).
    8.  observe(prog, alibi_oracle(P), Q, R) <- oracle,
            prolog(oracleQuery(was_not_at_street(P, Q), R)).
    9.  cab_company_involved(C, PR) <- cab(C), prolog(pr(cab(C), PR)).
    10. cab(green). cab(blue).
    11. beginPr.
    12.   color = {green, blue}.
    13.   cab : color.
    14.   random(rc, cab, full).
    15.   pa(rc, cab(green), d_(70, 100)).
    16. endPr.

The abducible literals for this case are Abs = {suspect(antonio), suspect(berry), suspect(charlie), suspect(john), suspect(peter), suspect(robert)}. There are six possible suspects, and hence 63 non-empty subsets of the set of all possible suspects. In line 3 we state that someone is expected to be a suspect if he or she is one of the drivers of a cab company which is suspected of being involved in the accident with a degree of belief higher than 50%. At this stage, Acorda calls P-log to find out the probability of each company being involved in the accident. Since the results are cab_company_involved(blue, 0.30) and cab_company_involved(green, 0.70), the only expected suspects are Peter, John and Robert, since they work for the Green company (line 6). Hence from 63 possibilities only 7 remain, which makes the computation much more efficient. In the next step, the police can question the remaining possible suspects about their alibi, which is modelled as an external oracle. Based on the information provided, the set of expected suspects is finally identified.


3.3.3 Utility Function

Abduction can also be seen as a mechanism to enable the generation of the possible futures of an agent, with each abductive stable model representing a possibly reachable scenario of interest. Preferring over abducibles in this case means enacting preferences over the imagined futures of the agent. In this particular domain, it is unavoidable to deal with uncertainty, a problem that decision theory is ready to address using probability theory coupled with utility functions [25].

Example 3.2 (Cab 2). (Continuation of Example 3.1.) After finding the suspects and interrogating them, the police make a calculation and weigh who has the highest chance of being the culprit, given the fact that the accident happened at night.

    2.  catch_person <- suspect(P, U).
    20. suspect(P, U) <- consider(suspect(P)), prolog(utilityValue(P, U)),
                         prolog(U < 0).
    21. beginProlog.
    22.   utilityValue(P, U) :-
              (shift(P, night) -> Rate is 1 ; Rate is 0),
    23.       historyRecord(P, HistoryRecord),
    24.       alcoholRate(P, AlcoholRate),
    25.       yearsOfExperience(P, YearsOfExperience),
    26.       U is HistoryRecord - (Rate * AlcoholRate * 1 / YearsOfExperience).
    27.   shift(antonio, day).   shift(berry, night).   shift(charlie, day).
    28.   shift(peter, night).   shift(robert, night).  shift(john, day).
    29.   historyRecord(antonio, 0.6).   historyRecord(berry, 0.65).
    30.   historyRecord(charlie, 0.7).   historyRecord(peter, 0.55).
    31.   historyRecord(robert, 0.6).    historyRecord(john, 0.8).
    33.   alcoholRate(antonio, 2). alcoholRate(berry, 2).  alcoholRate(charlie, 2).
    34.   alcoholRate(peter, 2).   alcoholRate(robert, 3). alcoholRate(john, 10).
    35.   yearsOfExperience(antonio, 5). yearsOfExperience(berry, 5).
    36.   yearsOfExperience(charlie, 5).
    37.   yearsOfExperience(peter, 5).   yearsOfExperience(robert, 2).
    38.   yearsOfExperience(john, 10).
    39. endProlog.

Someone is considered a suspect based on the utility value assigned to him. The police compute the utility value of each driver based on the time of his shift, his history record, the amount of alcohol in his blood and his number of years of experience. We assume that all the drivers are honest and answer all of the questions truthfully; it is then impossible that somebody who had the day shift is still a suspect. The history record is also important, in order to see whether the driver is a good person or not. Further, more experience means he is a better driver, which lowers the probability of him being a suspect. After the computation, the police get the result suspect_person(john, 0.8), suspect_person(peter, 0.15), suspect_person(robert, −0.9), and find that Robert is clearly the best candidate for a suspect.
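For concreteness, these values follow directly from the utilityValue/2 definition above (our own unfolding of the arithmetic, using the facts in the program):

    utilityValue(john)   = 0.8  - (0 * 10 * 1/10) = 0.8    % day shift, so Rate = 0
    utilityValue(peter)  = 0.55 - (1 * 2  * 1/5)  = 0.15
    utilityValue(robert) = 0.6  - (1 * 3  * 1/2)  = -0.9

Only Robert's value is negative, so only he satisfies the U < 0 test in the suspect(P, U) rule.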

3.3.4 A Priori Preference

Once the set of relevant considered abducibles is determined from the program's current knowledge state, all that remains is to determine the active a priori preferences that are relevant for that set. This is merely a query for all preference literals whose heads indicate a preference between two abducibles that belong to the set, and whose body is true in the Well-Founded Model of the current knowledge state. Now we can also define the preferences using probabilistic information.

Example 3.3 (Wetgrass). Suppose that agent Boby wants some refreshment during the holidays. There are two options: going to the beach or going to the cinema. Agent Boby prefers to go to the beach if the probability of rain is quite low (less than 40%). Agent Boby also does not want to go to the cinema if the movies are boring. What should agent Boby decide for his refreshment time, given the knowledge of the weather forecast and the information that today the grass is wet?

    1.  falsum <- not refreshing.
    2.  refreshing <- beach.
    3.  refreshing <- cinema.
    4.  expect(beach).
    5.  expect(cinema).
    6.  beach <- consider(beach).
    7.  cinema <- consider(cinema).
    8.  beach < cinema <- raining(PR), prolog(PR < 0.4).
    9.  raining(PR) <- wetgrass(X), prolog(pr((rain(t) '|' wetgrass(X)), PR)).
    10. wetgrass(t) <- observe(prog, wetgrass_condition, today, true).
    11. wetgrass(f) <- observe(prog, wetgrass_condition, today, false).
    12. observe(prog, wetgrass_condition, Q, R) <- oracle,
            prolog(oracleQuery(wetgrass_condition(Q), R)).
    13. expect_not(cinema) <- boring_movie.
    14. boring_movie <- observe(prog, movie_reference, today, true).
    15. observe(prog, movie_reference, Q, R) <- oracle,
            prolog(oracleQuery(boring_movie(Q), R)).
    16. beginPr.
    17.   bool = {t, f}.
    18.   cloudy : bool.
    19.   rain : bool.
    20.   sprinkler : bool.
    21.   wetgrass : bool.
    22.   random(rc, cloudy, full).
    23.   random(rr, rain, full).
    24.   random(rs, sprinkler, full).
    25.   random(rw, wetgrass, full).
    26.   pa(rc, cloudy(t), d_(1, 2)).
    27.   pa(rc, cloudy(f), d_(1, 2)).
    28.   pa(rs, sprinkler(t), d_(1, 2)) :- cloudy(f).
    29.   pa(rs, sprinkler(t), d_(1, 10)) :- cloudy(t).
    30.   pa(rr, rain(t), d_(2, 10)) :- cloudy(f).
    31.   pa(rr, rain(t), d_(8, 10)) :- cloudy(t).
    32.   pa(rw, wetgrass(t), d_(0, 1)) :- sprinkler(f), rain(f).
    33.   pa(rw, wetgrass(t), d_(9, 10)) :- sprinkler(t), rain(f).
    34.   pa(rw, wetgrass(t), d_(9, 10)) :- sprinkler(f), rain(t).
    35.   pa(rw, wetgrass(t), d_(99, 100)) :- sprinkler(t), rain(t).
    36. endPr.

Agent Boby's preference is based on the probability of rain given the condition of the grass (lines 8-9). Acorda calls the oracle to acquire the condition of the grass; P-log then computes the conditional probability of rain(t) given wetgrass(X). All lines between beginPr and endPr are reserved for P-log code. The Bayes Net for this problem is represented in Figure 3.4 and coded in lines 17-35.

Figure 3.4: Bayes Nets for Wetgrass Condition


3.3.5

A Posteriori Preference

If everything goes well and only a single model emerges from the computation of the abductive stable models, the Acorda cycle terminates and the resulting abducibles are added to the next state of the knowledge base. In most cases, however, we cannot guarantee the emergence of a single model, since the active preferences may not be sufficient to defeat enough abducibles. In these situations, the Acorda system has to resort to additional information for making further choices. A given abducible can be defeated in either of two ways: by satisfaction of an expect_not/1 clause for that abducible, or by satisfaction of a preference rule that prefers another abducible instead. However, the current knowledge state may be insufficient to satisfy any of these cases for all abducibles except one, or else a single model would have already been abduced. It is then necessary that the system obtain the answers it needs from somewhere else, namely by making experiments on the environment or by querying an outside entity. Acorda consequently activates its a posteriori choice mechanisms by attempting to satisfy additional selecting preferences. There are two steps: first, Acorda computes the utility value for each abducible, as described later on; then it selects amongst them based on the preference function defined. These steps are coded below in the meta predicate select/2.

select(Model, NewModel) :-
    select1(Model, Model1),
    select2(Model1, NewModel).

select1(Model, Model1) :-
    addUtilityValue(Model, Model1).

select2(Model1, NewModel) :-
    % use a preference function to select the model.

Example 3.4 (Tea 2). (Continuation of Example 2.1.) Consider a situation where agent Claire usually drinks either coffee or tea when thirsty (but not both). She prefers coffee to tea when sleepy, but should not drink coffee if she finds that her blood pressure is high. Usually, she prefers tea to coffee. The availability of tea is around 60%. What do we suggest for agent Claire's beverage?

1.  drink <- consider(tea).
2.  drink <- consider(coffee).
3.  expect(tea).
4.  expect(coffee).
5.  expect_not(coffee) <- blood_pressure_high.
6.  blood_pressure_high <- observe(prog, blood_oracle, blood_pressure_high, true).
7.  observe(prog, blood_oracle, Q, R) <- oracle, prolog(oracleQuery(check_blood_pressure(Q), R)).
8.  coffee < tea <- sleepy.
9.  sleepy <- observe(prog, newest_condition_oracle, sleepy, true).
10. observe(prog, newest_condition_oracle, Q, R) <- oracle, prolog(oracleQuery(newest_condition(Q), R)).
11. constrain(1, [tea, coffee], 1).
12. falsum <- thirsty, not drink.
13. thirsty.
14. beginPr.
15.   beverage = {tea, coffee}.
16.   available : beverage.
17.   random(rd, available, full).
18.   pa(rd, available(tea), d_(3, 5)).
19. endPr.
20. beginProlog.
21.   :- import member/2, length/2, flatten/2 from basics.
22.   initialize :- initialize([utilityRate(tea, 0.8), utilityRate(coffee, 0.7)]).
23.   select(M, Mnew) :-
24.       initialize,
          % add a utility value to each model based on the utility function
25.       select1(M, M2),
          % a posteriori preference amongst models
26.       select2(M2, 0, [], Mnew).
27.   select1([], []).
28.   select1([X|Xs], [Y|Ys]) :-
29.       addUtilityValue(X, Y),
30.       select1(Xs, Ys).
31.   select2([], _, M, M).
32.   select2([M|Ms], Acc, OldM, NewM) :-
33.       member(utilityModel(U), M),
34.       (U > Acc -> select2(Ms, U, M, NewM) ; select2(Ms, Acc, OldM, NewM)).
35.   addUtilityValue([X], [utilityModel(UModel)|[X]]) :-
36.       holds utilityRate(X, R),
37.       pr(available(X), P),
38.       UModel is R * P.
39.   addUtilityValue(_, []).
40. endProlog.

First, Acorda launches oracles to acquire information about Claire's condition, i.e. whether she is sleepy and whether she has high blood pressure. If there is nothing contrary to either expected beverage, i.e. agent Claire is not sleepy and also does not have high blood pressure, we will have two different models:

M1 = {coffee}, M2 = {tea}.

Next, Acorda does the a posteriori selection. Our selecting preference amongst models is codified on lines 22–40. First we initialize the utility rates for both beverages. In the addUtilityValue/2 predicate, we define our utility function. In this example, we define the utility value of each beverage as the probability of availability of that beverage times its utility rate. After the computation we get the result:

M1 = {utilityModel(0.2800), coffee}, M2 = {utilityModel(0.4800), tea}.

The predicate select2/4 will select the model with the highest utility, so the final result is M2 = {utilityModel(0.4800), tea}.
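As a quick check of the arithmetic (ours, not in the report): with utility rates 0.8 for tea and 0.7 for coffee, and availabilities pr(available(tea)) = 0.6 and pr(available(coffee)) = 0.4, the utility values are 0.8 × 0.6 = 0.48 for tea and 0.7 × 0.4 = 0.28 for coffee, matching the utilityModel values above.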

Based on this a posteriori preference using our utility function, agent Claire is thus suggested to have tea. The next example is more complicated, combining a utility function with a posteriori preference among models.

Example 3.5 (Holiday). Suppose it is a summer holiday now. Agent Albert is looking for a trip. There is a summer offer for Jamaica, Croatia and Algarve:

– Jamaica: 14 days, nature holiday, and the cost is 725 Euros.
– Croatia: 7 days, beach holiday, and the cost is 333 Euros.
– Algarve: 4 days, beach holiday, and the cost is 123 Euros.

Suppose that agent Albert prefers nature to beach, and that he prefers a cheaper holiday in terms of daily cost. Agent Albert can only select one of the offers. What is the best suggestion for agent Albert?

1.  falsum <- not summer_offer.
2.  summer_offer <- jamaica.
3.  summer_offer <- croatia.
4.  summer_offer <- algarve.
5.  expect(jamaica(P, D, T)) <- price(jamaica, P), duration(jamaica, D), type(jamaica, T).
6.  expect(croatia(P, D, T)) <- price(croatia, P), duration(croatia, D), type(croatia, T).
7.  expect(algarve(P, D, T)) <- price(algarve, P), duration(algarve, D), type(algarve, T).
8.  constrain(1, [jamaica(725, 14, nature), croatia(333, 7, beach), algarve(123, 4, beach)], 1).
9.  jamaica <- consider(jamaica(P, D, T)).
10. croatia <- consider(croatia(P, D, T)).
11. algarve <- consider(algarve(P, D, T)).
12. price(jamaica, 725). price(croatia, 333). price(algarve, 123).
13. duration(jamaica, 14). duration(croatia, 7). duration(algarve, 4).
14. type(jamaica, nature). type(croatia, beach). type(algarve, beach).

15. beginProlog.
16.   :- import member/2, length/2 from basics.
17.   select(M, NewM) :-
18.       initialize([utilityRate(beach, 0.5), utilityRate(nature, 1)]),
19.       select1(M, M1),
20.       select2(M1, 1000, [], NewM).
21.   select1([], []).
22.   select1([X|Xs], [Y|Ys]) :-
23.       addUtilityValue(X, Y),
24.       select1(Xs, Ys).
25.   select2([], _, M, M).
26.   select2([M|Ms], Acc, OldM, NewM) :-
27.       member(utilityModel(U), M),
28.       (U < Acc -> select2(Ms, U, M, NewM) ; select2(Ms, Acc, OldM, NewM)).
29.   addUtilityValue(M, [utilityModel(UModel)|M]) :-
30.       member(X, M), !,
31.       X =.. [Head|[P, D, T]],
32.       Q = (holds utilityRate(T, R)), Q,
          % calculate the price per day times the utility rate
33.       UModel is R * (P / D).
34.   addUtilityValue(_, []).
35.   initialize(L) :-
36.       Q = (asserts(L), events([])), Q.
37.   update_utilityRate(X, New) :-
38.       Q1 = (holds utilityRate(X, Old)), Q1,
39.       Q2 = (asserts([not utilityRate(X, Old), utilityRate(X, New)]), events([])), Q2.
40. endProlog.

In this example, the summer offer is a goal that is launched through the integrity constraint (line 1).

In lines 2–4, we define the types of summer offers: Jamaica, Croatia and Algarve. We expect each summer offer, with the constraint that we can only have one offer (lines 5–8). We can have one of the summer offers (i.e. Jamaica or Croatia or Algarve) if we considered it (lines 9–11). We put the information about the duration, the type of holiday and the price of each summer offer in lines 12–14.

First we define the a posteriori selection predicate select/2. We initialize the utility rates based on agent Albert's preference (line 18). Next we augment each model with its utility value (line 19) by using the select1/2 predicate, and select the model with the minimum value (line 20) by using the select2/4 predicate. Later, if we need to update the utility rates, we can use update_utilityRate/2 as a meta predicate for updating a utility rate. In this example, we only define utility rates for the type of summer holiday, either nature or beach.


All the abducibles that we have are

Abd = {algarve(123, 4, beach), croatia(333, 7, beach), jamaica(725, 14, nature)}

We can only have one of these abducibles because of the constraint we defined previously (line 8). Because of that, we need a further computation using a utility function. Our utility function is described as the daily cost that agent Albert would spend: we define the utility value in this example as the utility rate of the type of holiday times the price per day. We call this result the utility model. After the computation, each model has its utility model:

M1 = {utilityModel(15.3750), algarve(123, 4, beach)},
M2 = {utilityModel(23.7857), croatia(333, 7, beach)},
M3 = {utilityModel(36.2500), jamaica(725, 14, nature)}

and finally we choose the smallest value, i.e. the model M1.
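For instance (our own check of the arithmetic), for the Algarve offer the utility model is utilityRate(beach) × price/duration = 0.5 × 123/4 = 15.375, and for Croatia it is 0.5 × 333/7 ≈ 23.7857, matching the values of M1 and M2 above.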

Even though agent Albert previously preferred nature to beach, based on the utility values we obtained we suggest that agent Albert should choose the Algarve offer.

3.3.6 Multiple-step Prospection

Each cycle ends with the commitment of the agent to an abductive solution that satisfies the current active goals. Sometimes we also want to look ahead several steps, based on previously abduced variables in combination with the probability and utility functions.

Example 3.6 (Tea 3). (Continuation of Example 3.4.) Suppose that, after having tea or coffee two times, agent Claire no longer prefers that beverage so much and would rather try another type of beverage (i.e. if she drank coffee two or more times, she prefers to have tea in the next round, and similarly for tea). Her preference decreases by 40% for coffee and by 30% for tea. Every five rounds she resets her preferences. We use the code of Example 3.4, changing only the definition of the selection preference function into the following one:

select(M, Mnew) :-
    initialize([utilityRate(tea, 2), utilityRate(coffee, 1)]),
    findall(Lits, time_stamp(_, Lits), L),
    flatten(L, Ln),
    compute(Ln, tea, 0, NumTea),
    compute(Ln, coffee, 0, NumCoffee),
    (NumTea > 2 -> (update_utilityRate(tea, 0.3)) ; true),
    (NumCoffee > 2 -> (update_utilityRate(coffee, 0.4)) ; true),
    length(L, N),
    (modulo5(N) ->
        (update_utilityRate(tea, 2), update_utilityRate(coffee, 1)) ; true),
    select1(M, M2),
    select2(M2, 0, [], Mnew).

Acorda calls acordaSimulation(NumberOfSteps, Abducibles, Options) for the simulation. The result of a prospection ten steps ahead is:

Models for step 1: [utilityModel(0.2800), coffee], [utilityModel(0.3771), coffee, tea], [utilityModel(0.4800), tea]
Chosen model for step 1: [utilityModel(0.4800), tea]
Models for step 2: [utilityModel(0.2800), coffee], [utilityModel(0.3771), coffee, tea], [utilityModel(0.4800), tea]
Chosen model for step 2: [utilityModel(0.4800), tea]
Models for step 3: [utilityModel(0.2800), coffee], [utilityModel(0.3771), coffee, tea], [utilityModel(0.4800), tea]
Chosen model for step 3: [utilityModel(0.4800), tea]
Models for step 4: [utilityModel(0.2800), coffee], [utilityModel(0.2288), coffee, tea], [utilityModel(0.1800), tea]
Chosen model for step 4: [utilityModel(0.2800), coffee]
Models for step 5: [utilityModel(0.2800), coffee], [utilityModel(0.2288), coffee, tea], [utilityModel(0.1800), tea]
Chosen model for step 5: [utilityModel(0.2800), coffee]
Models for step 6: [utilityModel(0.2800), coffee], [utilityModel(0.3771), coffee, tea], [utilityModel(0.4800), tea]
Chosen model for step 6: [utilityModel(0.4800), tea]
Models for step 7: [utilityModel(0.2800), coffee], [utilityModel(0.2288), coffee, tea], [utilityModel(0.1800), tea]
Chosen model for step 7: [utilityModel(0.2800), coffee]
Models for step 8: [utilityModel(0.1600), coffee], [utilityModel(0.1694), coffee, tea], [utilityModel(0.1800), tea]
Chosen model for step 8: [utilityModel(0.1800), tea]
Models for step 9: [utilityModel(0.1600), coffee], [utilityModel(0.1694), coffee, tea], [utilityModel(0.1800), tea]
Chosen model for step 9: [utilityModel(0.1800), tea]
Models for step 10: [utilityModel(0.1600), coffee], [utilityModel(0.1694), coffee, tea], [utilityModel(0.1800), tea]
Chosen model for step 10: [utilityModel(0.1800), tea]

When looking towards the future, an agent is confronted with several scenaria. Later, when more information is available (possibly given by an oracle), the future prediction can be repeated.
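The selection function above relies on two helper predicates whose definitions are not shown in the report, compute/4 and modulo5/1. A minimal sketch of what they might look like (our assumption, inferred only from the calls compute(Ln, tea, 0, NumTea) and modulo5(N)):

% compute(History, Item, Acc, N): count the occurrences of Item in the history list
compute([], _, Acc, Acc).
compute([X|Xs], X, Acc, N) :- !, Acc1 is Acc + 1, compute(Xs, X, Acc1, N).
compute([_|Xs], X, Acc, N) :- compute(Xs, X, Acc, N).

% modulo5(N): true when N is a multiple of five (used to reset the preferences)
modulo5(N) :- 0 is N mod 5.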

Example 3.7 (Blue Cab Problem). A cab was involved in a hit-and-run accident at night. Two cab companies, the Green and the Blue, operate in the city. Imagine you are given the following information: 85% of the cabs in the city are Green and 15% are Blue. A witness identified the cab as a Blue cab. The court tested his ability to identify cabs under the appropriate visibility conditions. When presented with a sample of cabs (half of which were Blue and half of which were Green), the witness made correct identifications in 80% of the cases and erred in 20% of the cases. What is the probability that the cab involved in this accident was Blue?

1st: Find the cab suspect. There has been an accident, so we should find the suspect.

accident.
falsum <- accident, not find_cab_company_involved.
find_cab_company_involved <- find_cab_company_involved(_, _).
find_cab_company_involved(X, P) <- consider(suspect(X, P)).
expect(suspect(X, P)) <- cab(X), prolog(pr(cab(X) '|' witness(blue), P)).
cab(green). cab(blue).
constrain(2, [suspect(blue, PB), suspect(green, PG)], 2).
beginPr.
color = {green, blue}.
witness : color.
random(r1, witness, full).
cab : color.
random(r2, cab, full).
pa(r2, cab(blue), d_(15, 100)) :- night(f).
pa(r2, cab(green), d_(85, 100)) :- night(f).
pa(r1, witness(blue), d_(80, 100)) :- cab(blue), normal.
pa(r1, witness(blue), d_(20, 100)) :- cab(green), normal.
night(t) :- holds night.
night(f) :- holds not_night.
normal :- holds normal.
endPr.

2nd: Judge the guilty. From the suspects, which one is guilty?

guilty(X) <- find_cab_company_involved(X, P), prolog(P > 0.5).
guilty <- guilty(_).
falsum <- accident, not observe(guilty).

3rd: Defence. The defendant claims that the accident happened at night. The numbers of cabs at night differ from the numbers during the day: the Green company's cabs are reduced to one fifth at night, while the Blue company's cabs are reduced to one third.

not_night <- not night.
beginPr.
pa(r2, cab(blue), d_(X, 100)) :- night(t), compute(blue, X).
pa(r2, cab(green), d_(X, 100)) :- night(t), compute(green, X).
compute(X, Val) :-
    N1 is 85/5 + 15/3,
    (X = blue -> Val is 5/N1*100 ; Val is 17/N1*100).
endPr.

4th: Prosecutor. The prosecutor says that the reliability of the witness at night is not so good.

beginPr.
pa(r1, witness(blue), d_(60, 100)) :- cab(blue), night(t), check_witness_reliability.
pa(r1, witness(blue), d_(40, 100)) :- cab(green), night(t), check_witness_reliability.
check_witness_reliability :- holds check_witness_reliability.
endPr.

5th: Police. The police are trying to find the suspect.

catch_person <- suspect_person(P, U).
expect(suspect(P)) <- driver(P, C), find_cab_company_involved(C, PR), prolog(PR > 0.5).
suspect_person(P, U) <- consider(suspect(P)), prolog(utilityValue(P, U)), prolog(U > 0).
driver(antonio, blue). driver(berry, blue). driver(charlie, blue).
driver(peter, green). driver(robert, green). driver(john, green).
person(P) <- driver(P, _).
shift(antonio, day). shift(berry, night). shift(charlie, day).
shift(peter, night). shift(robert, night). shift(john, day).
beginProlog.
utilityValue(Person, 1) :- holds shift(Person, night).
utilityValue(Person, 0) :- holds shift(Person, day).
endProlog.

6th: Interrogation, asking for an alibi.

find_alibi <- have_alibi(suspect(P)).
have_alibi(person(P)) <- observe(prog, alibi_oracle(P), 'Street ABC', true).
observe(prog, alibi_oracle(P), Q, R) <- oracle, prolog(oracleQuery(was_not_at_street(P, Q), R)).

This case can be simulated as shown below.

Models for step 1: [suspect(blue, 0.4138), suspect(green, 0.5862)]
Continue?(yes/no)yes.
Update knowledge?(yes/no)yes.
falsum <- accident, not observe(guilty).
finish.
Models for step 2: [suspect(blue, 0.4138), suspect(green, 0.5862), guilty(green)]
Continue?(yes/no)yes.
Update knowledge?(yes/no)yes.
night.
finish.
Models for step 3: [suspect(blue, 0.5405), suspect(green, 0.4595), guilty(blue)]
Continue?(yes/no)yes.
Update knowledge?(yes/no)yes.
check_witness_reliability.
finish.
Models for step 4: [suspect(blue, 0.3061), suspect(green, 0.6939), guilty(green)]
Continue?(yes/no)yes.
Update knowledge?(yes/no)yes.
falsum <- not catch_person.

finish.
Models for step 5: [suspect(peter), suspect(robert), suspect(blue, 0.3061), suspect(green, 0.6939), guilty(green)],
    [suspect(peter), suspect(blue, 0.3061), suspect(green, 0.6939), guilty(green)],
    [suspect(robert), suspect(blue, 0.3061), suspect(green, 0.6939), guilty(green)]
Continue?(yes/no)yes.
Update knowledge?(yes/no)yes.
expect_not(suspect(P)) <- have_alibi(person(P)).
finish.
Step: 6
Confirm observation: was_not_at_street(john, Street ABC) (true or false)? false.
Confirm observation: was_not_at_street(robert, Street ABC) (true or false)? false.
Confirm observation: was_not_at_street(peter, Street ABC) (true or false)? true.
Models for step 6: [suspect(robert), suspect(blue, 0.3061), suspect(green, 0.6939), guilty(green)]
Continue?(yes/no)no.
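The probabilities in the trace can be checked by hand with Bayes' rule (this check is ours, not part of the report). In step 1, with the daytime proportions, P(cab(blue) | witness(blue)) = (0.80 × 0.15) / (0.80 × 0.15 + 0.20 × 0.85) = 0.12 / 0.29 ≈ 0.4138, which is exactly the value attached to suspect(blue, 0.4138). After the defence's night-time update, the Blue proportion becomes 5/22 ≈ 0.227 and the same calculation gives (0.80 × 0.227) / (0.80 × 0.227 + 0.20 × 0.773) ≈ 0.5405, again matching the trace at step 3.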


Chapter 4

Further Examples

4.1 Risk Analysis

The economics of risk has been a fascinating area of inquiry for at least two reasons [26]. First, there is hardly any situation where economic decisions are made with perfect certainty. The sources of uncertainty are multiple and pervasive. They include price risk, income risk, weather risk, health risk, etc. As a result, both private and public decisions under risk are of considerable interest. This is true in positive analysis (where we want to understand human behavior), as well as in normative analysis (where we want to make recommendations about particular management or policy decisions). Second, over the last few decades, significant progress has been made in understanding human behavior under uncertainty. As a result, we now have a somewhat refined framework to analyse decision-making under risk. In a sense, the economics of risk is a difficult subject; it involves understanding human decisions in the absence of perfect information. How do we make decisions when we do not know some of the events affecting us? The complexities of our uncertain world certainly make this difficult. In addition, we do not understand well how the human brain processes information. As a result, proposing an analytical framework to represent what we do not know seems to be an impossible task. In spite of these difficulties, much progress has been made. First, probability theory is the cornerstone of risk assessment. This allows us to measure risk in a fashion that can be communicated amongst decision makers or researchers. Second, risk preferences are better understood. This provides useful insights into the economic rationality of decision-making under uncertainty. Third, over the last decades, good insights have been developed about the value of information. This helps us to better understand the role of information and risk in private as well as public decision-making.


The Measurement of Risk

We define risk as representing any situation where some events are not known with certainty. This means that one cannot influence the prospects for the risk. Risk can relate to weather outcomes (e.g. whether it will rain tomorrow), health outcomes (e.g. whether you will catch a cold tomorrow), time allocation outcomes (e.g. whether you will get a new job next year), market outcomes (e.g. whether the price of wheat will rise next week), or monetary outcomes (e.g. whether you will win the lottery tomorrow). It can also relate to events that are relatively rare (e.g. whether an earthquake will occur next month in a particular location, or whether a volcano will erupt next year). The list of risky events is thus extremely long. First, this creates a significant challenge in measuring risky events. Indeed, how can we measure what we do not know for sure? Second, given that the number of risky events is very large, is it realistic to think that risk can be measured? We will present a simple example about decision-making in a company in which risk is taken into account.

Example 4.1 (Company risk). A construction company does subcontracting on government contracts. The construction company's utility function is approximately represented by U(X) = 2X − 0.01X², (X ≤ 100), X being its income (in thousands of dollars). Suppose the company is considering bidding on a contract. Preparation of a bid would cost 8,000, and this would be lost if the bid failed. If the bid succeeded, the company would make a gain of 40,000. The company judges the chance of a successful bid to be 0.3. What should it do?

1.  expect(decide(bid)).
2.  expect(decide(not_bid)).
3.  constrain(1, [decide(bid), decide(not_bid)], 1).
4.  decision <- consider(decide(X)).
5.  decide(bid) < decide(not_bid) <- not_take_risk, prolog(pr(goal(success), PS)), prolog(pr(goal(failure), PF)), prolog(PS > PF).
6.  decide(not_bid) < decide(bid) <- not_take_risk, prolog(pr(goal(success), PS)), prolog(pr(goal(failure), PF)), prolog(PF > PS).
7.  not_take_risk <- not take_risk.
8.  falsum <- not decision.
9.  beginPr.
10.   domain = {success, failure}.
11.   goal : domain.
12.   random(r, goal, full).
13.   pa(r, goal(success), d_(3, 10)).
14. endPr.


15. beginProlog.
16.   :- import addUtilityValue/2 from myFile2.
17.   :- import member/2, length/2 from basics.
18.   select(M, Mnew) :-
19.       consult('../examples/myExamples/myUtilityFile.P'),
20.       select1(M, M1),
21.       select2(M1, 0, [], Mnew).
22. endProlog.

When we only reason based on the causal model and do not take the risk into account, we end up with the decision not_bid, because the probability of a successful bid is too low. When we do take the risk into account, using the utility function that we defined in a separate file (myUtilityFile.P), the best decision is bid, with the result:

M = {utilityModel(4.4800), expectedProfit(4.0000), decide(bid)}
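These figures can be reproduced by hand (the derivation is ours): the expected profit of bidding is 0.3 × 40 − 8 = 4 (thousand dollars), and the expected utility is 0.3 × U(32) + 0.7 × U(−8) = 0.3 × (2·32 − 0.01·32²) + 0.7 × (2·(−8) − 0.01·(−8)²) = 0.3 × 53.76 + 0.7 × (−16.64) = 4.48, matching expectedProfit(4.0000) and utilityModel(4.4800) above.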

Example 4.2 (My utility function file). My utility function for the company risk problem is coded below:

:- export addUtilityValue/2.
addUtilityValue([X], [utilityModel(UModel)|[X]]) :-
    holds utilityRate(X, R),
    pr(drinking(X), P),
    UModel is R * P.
addUtilityValue([X, Y], [utilityModel(UModel)|[X, Y]]) :-
    Q = (holds utilityRate(X, RX), holds utilityRate(Y, RY)), Q,
    pr(drinking(X), PX),
    pr(drinking(Y), PY),
    UModel is ((RX * PX) + (RY * PY)) / 2.
addUtilityValue(_, []).

A utility function can be coded separately in a different file. Acorda takes the file as Prolog code and compiles it together with the Acorda code. Using the power of Prolog, we can define advanced utility functions; we can even use constraint logic programming.

Example 4.3 (Mamamia). The owner of the Restaurant Mamamia wants to offer a new menu. Before launching the new menu, he did some research. Based on his research, 75% of teenagers prefer the new menu, while 80% of adults prefer the original menu. Around 40% of his customers are teenagers. If he offers the new menu, each new menu will cost 5 EUR. The basic cost he would have to spend for 100 new menus is 200 Euros. What is your suggestion for the owner of the Restaurant Mamamia? The owner's utility function is approximately represented by U(X) = 2X − 0.01X², (X ≤ 100), X being his income.

1.  expect(decide(launch_new_menu)).
2.  expect(decide(not_launch_new_menu)).
3.  constrain(1, [decide(launch_new_menu), decide(not_launch_new_menu)], 1).
4.  decision <- consider(decide(X)).
5.  decide(launch_new_menu) < decide(not_launch_new_menu) <- not_take_risk, prolog(pr(menu(new), PN)), prolog(pr(menu(original), PO)), prolog(PN > PO).
6.  decide(not_launch_new_menu) < decide(launch_new_menu) <- not_take_risk, prolog(pr(menu(original), PO)), prolog(pr(menu(new), PN)), prolog(PO > PN).
7.  not_take_risk <- not take_risk.
8.  falsum <- not decision.
9.  beginPr.
10.   age = {teenager, adult}.
11.   offer = {original, new}.
12.   customer : age.
13.   random(rc, customer, full).
14.   pa(rc, customer(teenager), d_(60, 100)).
15.   menu : offer.
16.   random(ro, menu, full).
17.   pa(ro, menu(new), d_(75, 100)) :- customer(teenager).
18.   pa(ro, menu(original), d_(80, 100)) :- customer(adult).
19. endPr.
20. beginProlog.
21.   :- import member/2, length/2 from basics.
22.   select(M, Mnew) :-
23.       select1(M, Mnew),
24.       select2(_, 0, [], _).
25.   select1([], []).
26.   select1([X|Xs], [Y|Ys]) :-
27.       addUtilityValue(X, Y),
28.       select1(Xs, Ys).
29.   select2([], _, M, M).
30.   select2([M|Ms], Acc, OldM, NewM) :-
31.       member(utilityModel(U), M),
32.       (U > Acc -> select2(Ms, U, M, NewM) ; select2(Ms, Acc, OldM, NewM)).
33.   addUtilityValue([decide(X)], [utilityModel(EU), expectedProfit(Profit), decide(X)]) :-
34.       expectedProfit(X, Profit),
35.       expectedUtility(X, EU).
36.   return(launch_new_menu, 5).
37.   return(not_launch_new_menu, 0).
38.   cost(launch_new_menu, 2).
39.   cost(not_launch_new_menu, 0).
40.   expectedProfit(Action, P) :-
41.       return(Action, R),
42.       cost(Action, C),
43.       P is R - C.

44.   expectedUtility(Action, EU) :-
45.       pr(menu(new), PrN),
46.       expectedProfit(Action, PA),
47.       EU is (2*PA*PrN - 0.01*PA*PA*PrN).
48. endProlog.

In Example 4.3, the expected return depends on the probability of new menu offers. Given probabilistic information about the age of the customers and their behaviour in choosing a menu, we compute the expected probability of new menu offers. In this case, the probability of a new menu offer is 0.42 and the probability of an original menu offer is 0.58; the probability of an original menu offer is thus higher than that of a new menu offer. Furthermore, if we compute the expected return more comprehensively, we get the result:

M1 = {utilityModel(3.1323), expectedProfit(3), decide(launch_new_menu)},
M2 = {utilityModel(0.0000), expectedProfit(0), decide(not_launch_new_menu)}

Based on the computation, it is better to launch the new menu even though it is risky.
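In terms of the code (our reading of it): lines 36–43 give expectedProfit(launch_new_menu) = 5 − 2 = 3, which is the expectedProfit(3) term of M1, while lines 44–47 weight the utility of that profit, 2·PA − 0.01·PA², by PrN = pr(menu(new)) as computed by P-log.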

4.2 Counterfactual

Reflections on what might have been are termed counterfactual thoughts. People engage in counterfactual thinking all the time. These thoughts are sometimes painful, as when they lead to the emotion of regret, for example "I should have tried harder...". The first question that pops up is "Why are we concerned with what didn't happen?". We ask counterfactual questions constantly in our daily lives. It seems we cannot resist imagining the alternative scenaria: what might have happened, if only we had or had not... [27]. We picture ourselves avoiding past blunders, or committing blunders we narrowly avoided. Such thoughts are mere day-dreams. We know perfectly well that we cannot travel back in time and do these things differently. But the business of imagining such counterfactuals is a vital part of the way in which we learn. Because decisions about the future are usually based on weighing up the potential consequences of alternative courses of action, it makes sense to compare the actual outcomes of what we did in the past with the conceivable outcomes of what we might have done.

There is a double rationale for counterfactual analysis. From the point of view of a logician, it is a logical necessity when asking questions about causation to pose "but for" questions, and to try to imagine what would have happened if our supposed cause had been absent. For this reason, we are obliged to construct plausible alternative pasts on the basis of judgements about probability. These can be made only on the basis of historical evidence. From the point of view of a historian [28], doing counterfactual reasoning is a historical necessity when attempting to understand how the past actually was. We must attach equal importance to all the possibilities which contemporaries contemplated before the fact, and greater importance to these than to an outcome which they did not anticipate. In causal models, "A caused B" means that events A and B both occurred, but if event A had not occurred, B would not have occurred either. The basic idea of counterfactual theories of causation is that the meaning of causal claims can be explained in terms of counterfactual conditionals of the form "If A had not occurred, B would not have occurred" [29]. The best known counterfactual analysis of causation is David Lewis's theory [10]. To illustrate this, we present a simple counterfactual example, Betty–Charlie, taken from [29].

Example 4.4 (Betty–Charlie). Betty throws a rock at a bottle. Charlie throws a rock at the same bottle. Both of them have excellent aim and can break the bottle almost every time. Betty's rock gets to the bottle first and breaks it. Charlie's rock would have broken it, but it got there too late. Did Betty's rock cause the bottle to break? Of course. Did Charlie's? Of course not. Under the counterfactual test, if Betty hadn't thrown her rock, the bottle still would have broken, just as the bottle would likely have broken even if Charlie hadn't thrown his rock.

1. broken_bottle <- consider(hit(_)).
2. expect(hit(X)) <- throw_rock(X).
3. expect_not(hit(betty)) <- first_hit(charlie).
4. expect_not(hit(charlie)) <- first_hit(betty).
5. throw_rock(betty).
6. throw_rock(charlie).
7. constrain(1, [hit(betty), hit(charlie)], 1).
8. falsum <- not broken_bottle.
9. first_hit(betty) <- throw_rock(betty).

Using Acorda, we modelled the Betty–Charlie example easily. At first, the only model is M = {hit(betty)}. Under a counterfactual test, we assert that Betty is not the first one to hit the bottle by using the Acorda predicate asserts/1: asserts([not first_hit(betty)]). Betty's hit is no longer the first and hence expect_not(hit(charlie)) is no longer true. Now the models are M1 = {hit(betty)} and M2 = {hit(charlie)}, because both Betty and Charlie have a chance to hit the bottle. Next, under the counterfactual test, if Betty hadn't thrown her rock, what would have happened? We assert asserts([not throw_rock(betty)]). The result is that the bottle still would have broken, with M2 = {hit(charlie)}, because Charlie had still thrown his rock. If we go back to the beginning, under the counterfactual test, if Charlie hadn't thrown his rock, the bottle still would have broken. We codify this condition by asserting asserts([not throw_rock(charlie)]), and the result is the model M1 = {hit(betty)}, even in the case where Betty was not the first one to hit the bottle.
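For reference, the three counterfactual updates discussed above can be issued as the following asserts/1 calls between Acorda cycles (a minimal sketch; the exact interaction with the top-level shell is not shown in the report):

asserts([not first_hit(betty)]).      % Betty no longer hit the bottle first
asserts([not throw_rock(betty)]).     % suppose Betty had not thrown her rock
asserts([not throw_rock(charlie)]).   % suppose Charlie had not thrown his rock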

4.3 Law Application

Probability theory, especially the theory of Bayes Nets, is widely misunderstood by the general public. Lawyers are no different from ordinary members of the public in falling victim to arguments that have been known to mathematicians for decades to be fallacies. The so-called prosecutor's fallacy, the defendant's fallacy and the jury's fallacy [30, 31] are well-known examples that arise from a basic misunderstanding of conditional probability and Bayes' Theorem.

Example 4.5 (Court's case). Suppose a crime has been committed. Blood is found at the scene for which there is no innocent explanation. It is of a type which is present in 1% of the population. The prosecutor's fallacy is the assertion: "There is a 1% chance that the defendant would have the crime blood type if he were innocent. Thus, there is a 99% chance that he is guilty." The defendant's fallacy is the assertion: "This crime occurred in a city of 800,000 people. This blood type would be found in approximately 8,000 people. The evidence has provided a probability of 1 in 8,000 that the defendant is guilty and thus has no relevance."

The prosecutor's fallacy is to assume that P(A|B) (the conditional probability of A given B) is the same as P(B|A), where A represents the event "defendant innocent" and B represents the event "defendant has the matching blood type". The defendant's fallacy is to ignore the large change in the odds in favour of the defendant's guilt. Bayesian probability theory gives an accurate way of calculating the correct odds. However, the courts have ruled that such complex mathematics should not be presented to juries, as it could lead to miscarriages of justice for other reasons. The unresolved problem is how to stop juries and lawyers making a range of errors of reasoning like the prosecutor's fallacy if they do not understand the mathematics. If the Bayes Nets approach were adopted by the courts, it could help prevent innocent people being jailed.
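A minimal illustration of the correct calculation (ours, not in the original report): in odds form, Bayes' theorem says that the posterior odds of guilt equal the prior odds times the likelihood ratio P(evidence | guilty) / P(evidence | innocent). Here the blood match multiplies the odds of guilt by roughly 1 / 0.01 = 100. It therefore neither establishes a 99% chance of guilt on its own (the prosecutor's fallacy) nor is it irrelevant (the defendant's fallacy); the final probability still depends on the prior odds given by the other evidence.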

4.3.1 Jury Observation Fallacy

Example 4.6 (Jury Observation). Consider the following situation, which we shall refer to subsequently as the questionable verdict scenario: the jury, in a serious crime case, has found the defendant not guilty. It is subsequently revealed that the defendant had a previous conviction for a similar crime. We now pose a question that we shall refer to subsequently as the external observer's question: does the subsequent evidence of a previous similar conviction (in the questionable verdict scenario) make you less confident that the jury were correct in their verdict? Suppose we know that a person has been charged and tried. By an external observer here we mean that the perspective is that of somebody simply observing that a trial has taken place, a not guilty verdict has been delivered and subsequent previous conviction information has been revealed.

The external observer is assumed not to have been a party to the courtroom proceedings. The fallacy, which we shall refer to as the jury observation fallacy, is that most people answer yes to the observer's question irrespective of a range of underlying assumptions. The Bayes Net showing the causal structure of this problem can be seen in Figure 4.1. Associated with each node in a Bayes Net is a probability table. Except for nodes that have no parents, this table provides the probability distribution for the node values conditional on each combination of parent values. For example, the probability table for the node Verdict (which is conditional on the nodes Charged and Hard Evidence) might look like the ones shown in Table 4.1, Table 4.2 and Table 4.3. For nodes without parents (such as Guilty) the table represents the prior probability distribution. For example, if the crime were committed in a city of 10,000 people, then the probability table would look like the one shown in Table 4.4. Using Acorda, we can now easily codify this problem.

% facts
jury_decided(guilty(f)).
falsum <- not do_observation.
do_observation <- consider(jury_decision(X)).
expect(jury_decision(correct)).
expect(jury_decision(incorrect)).


Figure 4.1: Causal Structure of Jury Observation


Table 4.1: Verdict Probability Table

                   Charged = Yes          Charged = No
Hard Evidence      Yes       No           Yes     No
Guilty             0.99      0.01         0       0
Innocent           0.01      0.99         0       0
No Trial           0         0            1       1

Table 4.2: Charged Probability Table

                              Hard Evidence = Yes     Hard Evidence = No
Previous Similar Conviction   Yes        No           Yes      No
Charged = Yes                 0.9999     0.99         0.02     0.00001
Charged = No                  0.0001     0.01         0.98     0.99999

Table 4.3: Hard Evidence Probability Table

                    Guilty = Yes    Guilty = No
Hard Evidence Yes   0.95            0.000001
Hard Evidence No    0.05            0.999999

Table 4.4: Guilty Probability Table

Yes       No
0.0001    0.9999

Table 4.5: Previous Similar Conviction Probability Table

                          Guilty = Yes    Guilty = No
Previous Conviction Yes   0.1             0.0001
Previous Conviction No    0.9             0.9999

jury_decision(correct) < jury_decision(incorrect) <-
    prolog(pr(guilty(f) '|' do(verdict(innocent) & charge(t) & previous_similar_conviction(t)), P1)),
    prolog(pr(guilty(t) '|' do(verdict(innocent) & charge(t) & previous_similar_conviction(t)), P2)),
    prolog(P1 > P2).
jury_decision(incorrect) < jury_decision(correct) <-
    prolog(pr(guilty(f) '|' do(verdict(innocent) & charge(t) & previous_similar_conviction(t)), P1)),
    prolog(pr(guilty(t) '|' do(verdict(innocent) & charge(t) & previous_similar_conviction(t)), P2)),
    prolog(P2 > P1).
opposite(guilty(f), guilty(t)).

beginPr.
bool = {t, f}.
type = {guilty, innocent, no_trial}.
guilty : bool.
random(rg, guilty, full).
pa(rg, guilty(t), d_(1, 10000)).
previous_similar_conviction : bool.
random(rp, previous_similar_conviction, full).
pa(rp, previous_similar_conviction(t), d_(1, 10)) :- guilty(t).
pa(rp, previous_similar_conviction(t), d_(1, 10000)) :- guilty(f).
hard_evidence : bool.
random(ph, hard_evidence, full).
pa(ph, hard_evidence(t), d_(95, 100)) :- guilty(t).
pa(ph, hard_evidence(t), d_(1, 1000000)) :- guilty(f).
charge : bool.
random(pc, charge, full).
pa(pc, charge(t), d_(9999, 10000)) :- hard_evidence(t), previous_similar_conviction(t).
pa(pc, charge(t), d_(99, 100)) :- hard_evidence(t), previous_similar_conviction(f).
pa(pc, charge(t), d_(2, 100)) :- hard_evidence(f), previous_similar_conviction(t).
pa(pc, charge(t), d_(1, 100000)) :- hard_evidence(f), previous_similar_conviction(f).
verdict : type.
random(pv, verdict, full).
pa(pv, verdict(guilty), d_(99, 100)) :- charge(t), hard_evidence(t).
pa(pv, verdict(guilty), d_(1, 100)) :- charge(t), hard_evidence(f).
pa(pv, verdict(innocent), d_(1, 100)) :- charge(t), hard_evidence(t).
pa(pv, verdict(innocent), d_(99, 100)) :- charge(t), hard_evidence(f).
pa(pv, verdict(no_trial), d_(1, 1)) :- charge(f), hard_evidence(t).
pa(pv, verdict(no_trial), d_(1, 1)) :- charge(f), hard_evidence(f).
endPr.

We know that the verdict of the jury is innocent. Starting from that, we can decide whether the jury's decision is correct or not by evaluating the probability that the defendant is guilty. There are two cases of preference expectation:



– we prefer our expectation that the jury's decision is correct to an incorrect one if the probability of being innocent, given that the verdict is innocent, the person was charged and previously convicted of a similar crime, is higher than the probability of being guilty given all the previous conditions;

– we prefer our expectation that the jury's decision is incorrect to a correct one if the probability of being innocent, given that the verdict is innocent, the person was charged and previously convicted of a similar crime, is lower than the probability of being guilty given all the previous conditions.

From beginPr to endPr we define the conditional probability tables given in the previous tables. The do/1 in pr(guilty(t) '|' do(verdict(innocent) & charge(t) & previous_similar_conviction(t)), P1) means that we know exactly that the conditions on the verdict, the charge and the previous similar conviction are made true as the result of a deliberate action (see Section 2.2.1 for more explanation). The result of the computation is:

pr(guilty(t) '|' do(verdict(innocent) & charge(t) & previous_similar_conviction(t)), P1).
pr(guilty(f) '|' do(verdict(innocent) & charge(t) & previous_similar_conviction(t)), P2).

where P1 = 0.0503 and P2 = 0.9497.

The jury's decision should be correct, because the probability of the defendant being guilty is really low, only about 5%. This is reasonable: finding out that the defendant had a previous similar conviction should actually make the observer more convinced of the correctness of the verdict of innocence.

Chapter 5

Conclusion and Future Work

5.1 Conclusion

Basically, humans reason using cause and effect models. The imperfect regularities problem in those models can be overcome by using probabilistic models such as Bayes Nets. Unfortunately, the theory of Bayes Nets does not provide tools for generating scenarios, which are needed for generating several possible worlds. These are important for simulating human reasoning not only about the present but also about the future. Acorda is a prospective logic programming language that is able to prospect possible future worlds based on abduction, preference and expectation theory.

But Acorda itself cannot handle probabilistic information.

To deal with this problem, we integrated Acorda with P-log, a declarative logic programming language based on probability theory and Causal Bayes Nets.

Now, using our new system, it is easy to create models using both

causal models and Bayes Nets. The resulting system can benefit from the capabilities of the original Acorda for generating scenaria, and is equipped with probabilistic theory as a medium to handle uncertain events.

Using

probability theory and utility functions, the new Acorda is now more powerful in managing quantitative as well as qualitative a priori and a posteriori preferences. Often we face situations where new information is available that can be added to our model in order to perform our reasoning better.

Our new

Acorda also makes it possible to perform a simulation and afterwards add more information directly to the agent. It can also perform several steps of prospection in order to predict the future better. For better understanding, we presented several daily life examples that the system can help people to reason about.


5.2 Future Work

When looking ahead a number of steps into the future, the agent is confronted with the problem of having several different possible courses of evolution. It needs to be able to prefer amongst them to determine the best courses of evolution from its present state (and from any state in general). The (local) preferences, such as the a priori and a posteriori ones presented above, are not appropriate enough anymore. The agent should be able to prefer amongst evolutions by their available historical information as well as by quantitatively or qualitatively evaluating their consequences.

Sato proposes the notion of logic programs with distribution semantics, which he refers to as PRISM, short for PRogramming In Statistical Modeling [32]. P-log and PRISM share a substantial number of common features. Both are declarative languages capable of representing and reasoning with logical and probabilistic knowledge. In both cases the logical part of the language is rooted in logic programming. There are also substantial differences. PRISM allows infinite possible worlds and has statistical parameter learning embedded in its inference mechanism, but limits its logical power to Horn logic programs. The goal of the P-log designers was to develop a knowledge representation language allowing natural, elaboration tolerant representation of commonsense knowledge involving logic and probabilities. Infinite possible worlds and algorithms for statistical learning were not a priority. Instead the emphasis was on the greater logical power provided by Answer Set Programming, on the causal interpretation of probability, and on the ability to perform and differentiate between various types of updates. We can use PRISM ideas to expand the semantics of P-log to allow infinite possible worlds.

In Section 4.1 we showed the analysis of risk behaviour under general risk preferences using the expected utility model. However, applying this approach to decision-making under uncertainty requires having good information about the measurement of the probability distribution of x and the risk preferences of the decision-maker as represented by the utility function U(x). It is easier to obtain sample information about the probability distribution of x than about individual risk preferences. It is possible to conduct risk analysis without precise information about risk preferences by using stochastic dominance. Stochastic dominance provides a framework to rank choices among alternative risky strategies when preferences are not precisely known [33]. It allows the elimination of inferior choices without strong a priori information about risk preferences. For future work, we can extend P-log to handle stochastic processes.


Bibliography

[1] William H. Calvin. How Brains Think: Evolving Intelligence, Then And Now. Basic Books, 1996.
[2] R. A. Wilson and F. C. Keil, editors. The MIT Encyclopedia of the Cognitive Sciences. The MIT Press, 1999.
[3] António Damásio. Looking for Spinoza: Joy, Sorrow and the Feeling Brain. Harcourt, 2003.
[4] Merlin Donald. A Mind So Rare: The Evolution of Human Consciousness. W. W. Norton & Company, London, 2001.
[5] Daniel C. Dennett. Sweet Dreams: Philosophical Obstacles to a Science of Consciousness. The MIT Press, 2005.
[6] Drew V. McDermott. Mind and Mechanism. The MIT Press, 2001.
[7] David Hume. An Enquiry Concerning Human Understanding: A Critical Edition. Oxford Philosophical Texts, 1748.
[8] John Mackie. The Cement of the Universe. Oxford: Clarendon Press, 1974.
[9] Herbert A. Simon and Nicholas Rescher. Cause and counterfactual. Philosophy of Science, Vol. 33, No. 4, pages 323–340, December 1966.
[10] David Lewis. Causation and postscripts to causation. Philosophical Papers, Vol. II. Oxford: Oxford University Press, pages 172–213, 1986.
[11] P. Dell'Acqua and L. M. Pereira. Preferential theory revision (extended version). J. Applied Logic, 2007.
[12] L. M. Pereira and A. M. Pinto. Revised stable models – a semantics for logic programs. In 12th Portuguese Intl. Conf. on Artificial Intelligence (EPIA'05), LNAI 3808, pages 29–42, Covilhã, December 2005. Springer.
[13] J. J. Alferes, A. Brogi, J. A. Leite, and L. M. Pereira. Evolving logic programs. In S. Flesca, S. Greco, N. Leone, and G. Ianni, editors, Procs. of the 8th European Conf. on Logics in Artificial Intelligence (JELIA'02), LNCS 2424, pages 50–61, Cosenza, Italy, September 2002. Springer.
[14] L. M. Pereira and G. Lopes. Prospective logic agents. Procs. 13th Portuguese Intl. Conf. on Artificial Intelligence, pages 73–86, December 2007.
[15] A. Kakas, R. Kowalski, and F. Toni. The role of abduction in logic programming. Handbook of Logic in Artificial Intelligence and Logic Programming, volume 5, pages 235–324, 1998.
[16] R. Kowalski. The logical way to be artificially intelligent. Procs. of CLIMA VI, LNAI, pages 1–22, 2006.
[17] Chitta Baral, Michael Gelfond, and Nelson Rushton. Probabilistic reasoning with answer sets. In LPNMR7, pages 21–33, 2004.
[18] Chitta Baral, Michael Gelfond, and Nelson Rushton. Probabilistic reasoning with answer sets. Journal draft in Theory and Practice of Logic Programming, 2005.
[19] Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.
[20] L. Castro, T. Swift, and D. S. Warren. XASP: Answer set programming with XSB and Smodels. Accessed at http://xsb.sourceforge.net/packages/xasp.pdf.
[21] Han The Anh, Carroline D. P. Kencana Ramli, and Carlos Viegas Damasio. An implementation of extended P-log using XASP. In Proceedings of the 24th International Conference on Logic Programming, 2008.
[22] M. Gelfond, N. Rushton, and W. Zhu. Combining logical and probabilistic reasoning. In AAAI Spring Symposium, 2006.
[23] Terrance Swift. Tabling for non-monotonic programming. Annals of Mathematics and Artificial Intelligence, 25(3-4):201–240, 1999.
[24] Ilkka Niemelä and Patrik Simons. Smodels – an implementation of the stable model and well-founded semantics for normal LP. In LPNMR '97: Proceedings of the 4th International Conference on Logic Programming and Nonmonotonic Reasoning, pages 421–430, 1997.
[25] Jonathan Baron. Thinking and Deciding. Cambridge University Press, 2000.
[26] Jean-Paul Chavas. Risk Analysis in Theory and Practice. Academic Press, 2004.
[27] Ruth M. J. Byrne. The Rational Imagination: How People Create Alternatives to Reality. The MIT Press, 2005.
[28] Niall Ferguson. Virtual History: Alternatives and Counterfactuals. Basic Books, 2000.
[29] Steven Sloman. Causal Models: How People Think About the World and Its Alternatives. OUP USA, 2005.
[30] C. Aitken. Lies, damned lies and expert witnesses. In Mathematics Today, 32(5/6), pages 76–80, 1996.
[31] Norman Fenton and Martin Neil. The jury observation fallacy and the use of Bayesian networks to present probabilistic legal arguments. In Mathematics Today, 36(6), pages 180–187, 2000.
[32] Taisuke Sato. A statistical learning method for logic programs with distribution semantics. In Proceedings of the 12th International Conference on Logic Programming (ICLP'95), pages 715–729. MIT Press, 1995.
[33] G. A. Whitmore and M. C. Findlay. Stochastic Dominance: An Approach to Decision-Making Under Risk. Lexington Books, D.C. Heath and Co, Lexington, MA, 1978.
