Integrating Learning into a BDI Agent for Environments with Changing Dynamics

Dhirendra Singh (1), Sebastian Sardina (1), Lin Padgham (1), Geoff James (2)

(1) RMIT University, Melbourne, Australia
(2) CSIRO Energy Technology, Sydney, Australia

International Joint Conference on Artificial Intelligence, July 2011, Barcelona, Spain

Belief-Desire-Intention (BDI) Agent Architecture

[Architecture diagram: the Environment is perceived through Sensors, generating Events that join the Pending Events queue. The BDI Engine resolves events against dynamic Beliefs and a static Plan Library, building Intention Stacks whose Actions are performed through Actuators.]

A plan is a programmed recipe for achieving a goal in some situation. We wish to improve plan selection in particular situations, based on actual experience.

Singh et al. (RMIT & CSIRO) · BDI Learning in Changing Environments · IJCAI 2011 · 1 / 12


Plan Selection in a BDI Goal-Plan Hierarchy

[Goal-plan tree: top-level plan P1 posts goals G2, G3, and G4; each goal has alternative plans (P2-P6), some posting further subgoals (G5, G6), with some choices succeeding (√) and others failing (×).]

For plan P1 to succeed, several correct choices must be made.
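The crosses and ticks above reflect failure recovery: when a plan fails, the BDI engine retries the goal with its remaining alternative plans. The sketch below illustrates this mechanism; it is not the engine used in the paper, and the goal and action names in the usage example are hypothetical.

```python
def achieve(goal, library, act):
    """Resolve `goal` by trying each of its plans in order (learned
    selection would reorder or weight these choices); a plan succeeds
    only if every step in its body succeeds."""
    for body in library.get(goal, []):
        ok = True
        for step in body:
            # A step is either a subgoal (present in the library) or a
            # primitive action handled by `act`.
            ok = achieve(step, library, act) if step in library else act(step)
            if not ok:
                break  # this plan failed; recover by trying the next one
        if ok:
            return True
    return False  # all alternatives exhausted: the goal fails
```

For example, with `library = {"G1": [["a_fail"], ["G2"]], "G2": [["a_ok"]]}` and an `act` that only succeeds on `"a_ok"`, the first plan for G1 fails and recovery succeeds via G2.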

The Learning Framework

Augment each plan with a decision tree. The state representation includes world features, event parameters, and context variables.

[Cycle diagram: select plan probabilistically → execute and record outcome → update the plan's decision tree]

1. For every plan whose programmed applicability condition holds, calculate a selection weight based on the perceived likelihood of success.
2. Select a plan probabilistically using the selection weights.
3. Execute the plan (hierarchy) and record the outcome(s).
4. Update the plan's decision tree.
5. Repeat.
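The loop above can be sketched minimally as follows, with a flat per-world success rate standing in for the per-plan decision tree (the paper induces proper decision trees over world features; the `Plan` class, `select_plan`, and the exploration floor here are illustrative assumptions, not the authors' implementation):

```python
import random

class Plan:
    def __init__(self, name, applicable):
        self.name = name
        self.applicable = applicable   # programmed applicability condition
        self.records = []              # (world_state, succeeded) outcomes

    def p_success(self, world):
        # Stand-in for the per-plan decision tree: the success rate over
        # outcomes recorded in this exact world state.
        hits = [ok for (w, ok) in self.records if w == world]
        return (sum(hits) / len(hits)) if hits else 0.5  # optimistic prior

def select_plan(plans, world):
    """Steps 1-2: weight every applicable plan by its perceived
    likelihood of success, then pick one probabilistically."""
    applicable = [p for p in plans if p.applicable(world)]
    weights = [max(p.p_success(world), 0.05) for p in applicable]  # keep exploring
    return random.choices(applicable, weights=weights, k=1)[0]

def learning_step(plans, world, execute):
    plan = select_plan(plans, world)          # steps 1-2
    succeeded = execute(plan, world)          # step 3: run and observe
    plan.records.append((world, succeeded))   # step 4: update the learner
    return plan, succeeded
```

The small floor on the weights keeps failed plans occasionally selected, which matters once the environment can change.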


The Learning Framework

[Per-plan decision tree: splits on world features (wa, wb, ...) classify recorded execution outcomes as successes (√) or failures (×).]

Learning in this setting must cope with:
• Incomplete data
• Inconsistent data
• Non-deterministic actions
• Non-deterministic hierarchies
• Dealing with failure recovery
• Changing environment dynamics


We therefore need a quantitative measure of confidence in the current learning.

A Dynamic Confidence Measure

[Goal-plan tree: plan P posts goals G1 and G2, each with alternative plans (Pa-Pe), some succeeding (√) and others failing (×).]

We build confidence from the observed performance of a plan using:

• how well-informed the recent decisions were (the stability-based measure)
• how well we know the worlds we are witnessing (the world-based measure)
• an averaging window n and a preference bias α

The plan selection weight, which dictates exploration, is then calculated from the predicted likelihood of success together with the confidence measure.
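The two components and their combination can be sketched as below. The exact formulas here are assumptions chosen to match the description (windowed averages combined with bias α, and a weight that flattens toward 0.5 when confidence is low), not the paper's definitions:

```python
from collections import deque

class Confidence:
    """Windowed confidence over the last n executions."""
    def __init__(self, n=5, alpha=0.5):
        self.alpha = alpha
        self.informed = deque(maxlen=n)  # 1.0 if the decision was well-informed
        self.known = deque(maxlen=n)     # 1.0 if the witnessed world was known

    def update(self, decision_informed, world_known):
        self.informed.append(1.0 if decision_informed else 0.0)
        self.known.append(1.0 if world_known else 0.0)

    def value(self):
        if not self.informed:
            return 0.0  # no experience yet: no confidence
        stability = sum(self.informed) / len(self.informed)
        world = sum(self.known) / len(self.known)
        return self.alpha * stability + (1 - self.alpha) * world

def selection_weight(p_success, confidence):
    # Low confidence flattens the weight toward 0.5, encouraging
    # exploration; high confidence lets the learned estimate dominate.
    return 0.5 + confidence * (p_success - 0.5)
```

Because the window is bounded, a run of poorly-informed decisions or unfamiliar worlds pushes the confidence (and hence the weights) back toward uniform selection.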


Example: Dynamic Confidence Measure

[Plot: confidence (0 to 1) against executions, with α = 0.5, n = 5.]

A solution is found at E=10, and full confidence is reached at E=15.

What if the environment changes after we have learnt the solution? Say that after execution 15, plan Pc no longer works for resolving goal G2, but plan Pe does.

[Plot: confidence (0 to 1) against executions, now extended to 25 executions, with α = 0.5, n = 5.]

After E=15, Pc starts to fail. The confidence drops, promoting new exploration and re-learning.

A Battery Storage Application

[Plot: building power consumption over time between t1 and t2, showing demand, the battery response, and grid supply relative to the threshold ph.]

Given net building demand, calculate an appropriate battery response in order to maintain grid power consumption within the range [0, ph].

Design: A Battery Storage Application

[Goal-plan hierarchy: goal G(k) is resolved by plans that set module k to Charge, Discharge, or Disconnect and then post subgoal G(k-1); at the base, the battery is operated and the result evaluated.]

Aim: learn appropriate plan selection to achieve a desired battery response rate, given the current battery state. The state space for a battery with five modules is ≈13 million.
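The recursion G(k) → set module k's mode, then resolve G(k-1) ultimately assigns one of three modes to every module. The sketch below enumerates the mode combinations exhaustively just to show the choice space (3^5 = 243 combinations per step; the ≈13 million states additionally track module charge levels), whereas the learned agent picks modes plan-by-plan from the battery state. The function name, the unit response rate, and the evaluation rule are illustrative assumptions:

```python
from itertools import product

MODES = (+1, 0, -1)  # charge / disconnect / discharge

def respond(k, target, rate=1):
    """Resolve goal G(k) by brute force: choose per-module modes whose
    combined response best matches the target rate."""
    best = min(product(MODES, repeat=k),
               key=lambda modes: abs(sum(m * rate for m in modes) - target))
    return best, sum(m * rate for m in best)
```

Even this toy version makes the control problem's shape clear: for an infeasible target, the best achievable response saturates at k × rate, which is exactly the situation the agent faces when module capacities deteriorate.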

Experiment: Capacity Deterioration

[Plot: success rate (0.75 to 1) over 35k episodes.]

Recovery from deterioration in module capacities at 5k episodes.

Experiment: Partial Failure with Restoration

[Plot: success rate (0.6 to 1) over 40k episodes.]

Recovery from temporary module failures during the [0, 20k] and [20k, 40k] episode intervals.

Experiment: Complete Failure with Restoration

[Plot: success rate (0 to 1) over 20k episodes.]

Recovery from complete system failure during the [0, 5k] episode interval.

Limitations and Future Work

• We cannot account for inter-dependence between the subgoals of a higher-level plan.
• Maintaining full training data is not practical, and it becomes obsolete when the environment changes. We tried an arbitrary scheme to filter "old" data in the battery agent and reduced the data set by 75% with no loss in performance.
• The onus is on the designer to select an appropriate state representation and learning parameters. Some of this could be extracted automatically from the BDI program.


Summary

• Learning BDI plan selection is desirable when plan applicability cannot be easily programmed, or when the environment changes over time.
• We present a learning framework that additionally determines a plan's applicability conditions using decision trees. Plans are selected probabilistically based on their predicted likelihood of success and our perceived confidence in the learning.
• We evaluate the framework empirically in a storage domain where a battery controller must continually adapt to changes that cause previous solutions to become ineffective.


References

D. Singh, S. Sardina, and L. Padgham. Extending BDI plan selection to incorporate learning from experience. Journal of Robotics and Autonomous Systems (RAS), 2010.

D. Singh, S. Sardina, L. Padgham, and S. Airiau. Learning context conditions for BDI plan selection. Proceedings of Autonomous Agents and Multi-Agent Systems (AAMAS), 2010.

S. Airiau, L. Padgham, S. Sardina, and S. Sen. Enhancing adaptation in BDI agents using learning techniques. International Journal of Agent Technologies and Systems (IJATS), 2009.

Poster today in Rooms 133–134 at 16:40–17:20.
