Integrating Learning into a BDI Agent for Environments with Changing Dynamics

Dhirendra Singh (1), Sebastian Sardina (1), Lin Padgham (1), Geoff James (2)

(1) RMIT University, Melbourne, Australia
(2) CSIRO Energy Technology, Sydney, Australia

International Joint Conference on Artificial Intelligence, July 2011, Barcelona, Spain

Belief-Desire-Intention (BDI) Agent Architecture

[Architecture diagram: the Environment is perceived through Sensors, generating Events that join the Pending Events queue. The BDI Engine resolves events against dynamic Beliefs and a static Plan Library, building Intention Stacks whose Actions are performed through Actuators.]

A plan is a programmed recipe for achieving a goal in some situation. We wish to improve plan selection in particular situations, based on actual experience.

Singh et al. (RMIT & CSIRO) · BDI Learning in Changing Environments · IJCAI 2011 · 1 / 12


Plan Selection in a BDI Goal-Plan Hierarchy

[Goal-plan tree: top-level plan P1 posts goals G2, G3, and G4; each goal has alternative plans (P2-P6), some posting further subgoals (G5, G6), with some choices succeeding (√) and others failing (×).]

For plan P1 to succeed, several correct choices must be made.
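The crosses and ticks above reflect failure recovery: when a plan fails, the BDI engine retries the goal with its remaining alternative plans. The sketch below illustrates this mechanism; it is not the engine used in the paper, and the goal and action names in the usage example are hypothetical.

```python
def achieve(goal, library, act):
    """Resolve `goal` by trying each of its plans in order (learned
    selection would reorder or weight these choices); a plan succeeds
    only if every step in its body succeeds."""
    for body in library.get(goal, []):
        ok = True
        for step in body:
            # A step is either a subgoal (present in the library) or a
            # primitive action handled by `act`.
            ok = achieve(step, library, act) if step in library else act(step)
            if not ok:
                break  # this plan failed; recover by trying the next one
        if ok:
            return True
    return False  # all alternatives exhausted: the goal fails
```

For example, with `library = {"G1": [["a_fail"], ["G2"]], "G2": [["a_ok"]]}` and an `act` that only succeeds on `"a_ok"`, the first plan for G1 fails and recovery succeeds via G2.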

The Learning Framework

Augment each plan with a decision tree. The state representation includes world features, event parameters, and context variables.

[Cycle diagram: select plan probabilistically → execute and record outcome → update the plan's decision tree]

1. For every plan whose programmed applicability condition holds, calculate a selection weight based on the perceived likelihood of success.
2. Select a plan probabilistically using the selection weights.
3. Execute the plan (hierarchy) and record the outcome(s).
4. Update the plan's decision tree.
5. Repeat.
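The loop above can be sketched minimally as follows, with a flat per-world success rate standing in for the per-plan decision tree (the paper induces proper decision trees over world features; the `Plan` class, `select_plan`, and the exploration floor here are illustrative assumptions, not the authors' implementation):

```python
import random

class Plan:
    def __init__(self, name, applicable):
        self.name = name
        self.applicable = applicable   # programmed applicability condition
        self.records = []              # (world_state, succeeded) outcomes

    def p_success(self, world):
        # Stand-in for the per-plan decision tree: the success rate over
        # outcomes recorded in this exact world state.
        hits = [ok for (w, ok) in self.records if w == world]
        return (sum(hits) / len(hits)) if hits else 0.5  # optimistic prior

def select_plan(plans, world):
    """Steps 1-2: weight every applicable plan by its perceived
    likelihood of success, then pick one probabilistically."""
    applicable = [p for p in plans if p.applicable(world)]
    weights = [max(p.p_success(world), 0.05) for p in applicable]  # keep exploring
    return random.choices(applicable, weights=weights, k=1)[0]

def learning_step(plans, world, execute):
    plan = select_plan(plans, world)          # steps 1-2
    succeeded = execute(plan, world)          # step 3: run and observe
    plan.records.append((world, succeeded))   # step 4: update the learner
    return plan, succeeded
```

The small floor on the weights keeps failed plans occasionally selected, which matters once the environment can change.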


The Learning Framework

[Per-plan decision tree: splits on world features (wa, wb, ...) classify recorded execution outcomes as successes (√) or failures (×).]

Learning in this setting must cope with:
• Incomplete data
• Inconsistent data
• Non-deterministic actions
• Non-deterministic hierarchies
• Dealing with failure recovery
• Changing environment dynamics


We therefore need a quantitative measure of confidence in the current learning.

A Dynamic Confidence Measure

[Goal-plan tree: plan P posts goals G1 and G2, each with alternative plans (Pa-Pe), some succeeding (√) and others failing (×).]

We build confidence from the observed performance of a plan using:

• how well-informed the recent decisions were (the stability-based measure)
• how well we know the worlds we are witnessing (the world-based measure)
• an averaging window n and a preference bias α

The plan selection weight, which dictates exploration, is then calculated from the predicted likelihood of success together with the confidence measure.
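The two components and their combination can be sketched as below. The exact formulas here are assumptions chosen to match the description (windowed averages combined with bias α, and a weight that flattens toward 0.5 when confidence is low), not the paper's definitions:

```python
from collections import deque

class Confidence:
    """Windowed confidence over the last n executions."""
    def __init__(self, n=5, alpha=0.5):
        self.alpha = alpha
        self.informed = deque(maxlen=n)  # 1.0 if the decision was well-informed
        self.known = deque(maxlen=n)     # 1.0 if the witnessed world was known

    def update(self, decision_informed, world_known):
        self.informed.append(1.0 if decision_informed else 0.0)
        self.known.append(1.0 if world_known else 0.0)

    def value(self):
        if not self.informed:
            return 0.0  # no experience yet: no confidence
        stability = sum(self.informed) / len(self.informed)
        world = sum(self.known) / len(self.known)
        return self.alpha * stability + (1 - self.alpha) * world

def selection_weight(p_success, confidence):
    # Low confidence flattens the weight toward 0.5, encouraging
    # exploration; high confidence lets the learned estimate dominate.
    return 0.5 + confidence * (p_success - 0.5)
```

Because the window is bounded, a run of poorly-informed decisions or unfamiliar worlds pushes the confidence (and hence the weights) back toward uniform selection.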


Example: Dynamic Confidence Measure

[Plot: confidence (0 to 1) against executions, with α = 0.5, n = 5.]

A solution is found at E=10, and full confidence is reached at E=15.

What if the environment changes after we have learnt the solution? Say that after execution 15, plan Pc no longer works for resolving goal G2, but plan Pe does.

[Plot: confidence (0 to 1) against executions, now extended to 25 executions, with α = 0.5, n = 5.]

After E=15, Pc starts to fail. The confidence drops, promoting new exploration and re-learning.

A Battery Storage Application

[Plot: building power consumption over time between t1 and t2, showing demand, the battery response, and grid supply relative to the threshold ph.]

Given net building demand, calculate an appropriate battery response in order to maintain grid power consumption within the range [0, ph].

Design: A Battery Storage Application

[Goal-plan hierarchy: goal G(k) is resolved by plans that set module k to Charge, Discharge, or Disconnect and then post subgoal G(k-1); at the base, the battery is operated and the result evaluated.]

Aim: learn appropriate plan selection to achieve a desired battery response rate, given the current battery state. The state space for a battery with five modules is ≈13 million.
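The recursion G(k) → set module k's mode, then resolve G(k-1) ultimately assigns one of three modes to every module. The sketch below enumerates the mode combinations exhaustively just to show the choice space (3^5 = 243 combinations per step; the ≈13 million states additionally track module charge levels), whereas the learned agent picks modes plan-by-plan from the battery state. The function name, the unit response rate, and the evaluation rule are illustrative assumptions:

```python
from itertools import product

MODES = (+1, 0, -1)  # charge / disconnect / discharge

def respond(k, target, rate=1):
    """Resolve goal G(k) by brute force: choose per-module modes whose
    combined response best matches the target rate."""
    best = min(product(MODES, repeat=k),
               key=lambda modes: abs(sum(m * rate for m in modes) - target))
    return best, sum(m * rate for m in best)
```

Even this toy version makes the control problem's shape clear: for an infeasible target, the best achievable response saturates at k × rate, which is exactly the situation the agent faces when module capacities deteriorate.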

Experiment: Capacity Deterioration

[Plot: success rate (0.75 to 1) over 35k episodes.]

Recovery from deterioration in module capacities at 5k episodes.

Experiment: Partial Failure with Restoration

[Plot: success rate (0.6 to 1) over 40k episodes.]

Recovery from temporary module failures during the [0, 20k] and [20k, 40k] episode intervals.

Experiment: Complete Failure with Restoration

[Plot: success rate (0 to 1) over 20k episodes.]

Recovery from complete system failure during the [0, 5k] episode interval.

Limitations and Future Work

• We cannot account for inter-dependence between the subgoals of a higher-level plan.
• Maintaining full training data is not practical, and it becomes obsolete when the environment changes. We tried an arbitrary scheme to filter "old" data in the battery agent and reduced the data set by 75% with no loss in performance.
• The onus is on the designer to select an appropriate state representation and learning parameters. Some of this could be extracted automatically from the BDI program.


Summary

• Learning BDI plan selection is desirable when plan applicability cannot be easily programmed, or when the environment changes over time.
• We present a learning framework that additionally determines a plan's applicability conditions using decision trees. Plans are selected probabilistically based on their predicted likelihood of success and our perceived confidence in the learning.
• We evaluate the framework empirically in a storage domain where a battery controller must continually adapt to changes that cause previous solutions to become ineffective.


References

D. Singh, S. Sardina, and L. Padgham. Extending BDI plan selection to incorporate learning from experience. Journal of Robotics and Autonomous Systems (RAS), 2010.

D. Singh, S. Sardina, L. Padgham, and S. Airiau. Learning context conditions for BDI plan selection. Proceedings of Autonomous Agents and Multi-Agent Systems (AAMAS), 2010.

S. Airiau, L. Padgham, S. Sardina, and S. Sen. Enhancing adaptation in BDI agents using learning techniques. International Journal of Agent Technologies and Systems (IJATS), 2009.

Poster today in Rooms 133–134 at 16:40–17:20.
