Integrating Learning into a BDI Agent for Environments with Changing Dynamics

Dhirendra Singh¹, Sebastian Sardina¹, Lin Padgham¹, Geoff James²

¹RMIT University, Melbourne, Australia  ²CSIRO Energy Technology, Sydney, Australia
Summary

This paper extends our earlier work on integrating learning to improve plan selection in the popular Belief, Desire, Intentions (BDI) agent paradigm. Our learning framework augments plans' context conditions with decision trees, allowing plan applicability to be learnt from experience. Here we address the problem that learning in deployed agents must be continuous rather than a one-off process. Using a probabilistic plan selection function, the agent balances exploration and exploitation of plans. Our main contribution is a novel confidence measure that allows the agent to adjust its reliance on the learning dynamically, facilitating in principle infinitely many (re)learning phases.

We demonstrate the benefits of the approach in an example battery controller for energy management: a building with local generation and loads must restrict its power consumption to a set range, using a modular battery system that can be charged or discharged as needed.

BDI Learning Framework

- Record outcomes for chosen plans to train decision trees.
- Probabilistically select plans based on ongoing learning.

Acting and learning are interleaved in an online manner, i.e., current learning influences ongoing choices that impact subsequent learning.
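The first step, recording execution outcomes for chosen plans, can be sketched as follows. For illustration, a simple per-world success table stands in for the decision tree; the class name `PlanRecord` and its interface are our own, not part of the framework.

```python
class PlanRecord:
    """Records execution outcomes of one plan per world state and serves
    success estimates.  A lookup table stands in here for the decision
    tree used in the framework; the idea (record outcomes, then estimate
    applicability) is the same in miniature."""

    def __init__(self):
        self.outcomes = {}  # world -> (successes, attempts)

    def record(self, world, success):
        """Log one execution of the plan in the given world state."""
        s, n = self.outcomes.get(world, (0, 0))
        self.outcomes[world] = (s + int(success), n + 1)

    def p_success(self, world):
        """Estimated probability that the plan succeeds in this world.
        Unseen worlds get the uninformed prior of 0.5."""
        s, n = self.outcomes.get(world, (0, 0))
        return s / n if n else 0.5
```

In the full framework, these recorded samples are the training set for the decision tree attached to each plan's context condition.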
BDI Architecture

A plan is a rule e : ψ ← δ, where program δ is a strategy for resolving goal e whenever the context condition ψ holds. Plans may perform primitive actions or post subgoals that are handled in a hierarchical manner.

Confidence in Learning
We build confidence from the observed performance of a plan by evaluating how well-informed the recent decisions were (the stability-based measure) and how well we know the worlds we are witnessing (the world-based measure). The plan selection weight, which dictates exploration, is then calculated from the predicted likelihood of success and the dynamic confidence measure.
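A minimal sketch of these ideas follows. The weight formula and the sliding-window familiarity measure are illustrative reconstructions of the description above, not necessarily the exact definitions used in the paper; all names here are our own.

```python
import random
from collections import deque


def selection_weight(p_success, confidence):
    """Bias the learnt success estimate toward the uninformed prior
    (0.5) when confidence in the learning is low.  With zero confidence
    every plan looks equally attractive, which encourages exploration;
    with full confidence the learnt estimate is used as-is.
    NOTE: a plausible sketch, not the paper's exact formula."""
    return 0.5 + confidence * (p_success - 0.5)


def select_plan(applicable, rng=random):
    """Probabilistic (roulette-wheel) selection among applicable plans,
    each given as (plan_name, estimated_success, confidence)."""
    weights = [selection_weight(p, c) for _, p, c in applicable]
    r = rng.random() * sum(weights)
    acc = 0.0
    for (name, _, _), w in zip(applicable, weights):
        acc += w
        if r <= acc:
            return name
    return applicable[-1][0]


class WorldConfidence:
    """World-based component of confidence: how familiar are the world
    states witnessed lately?  A sliding window records, per selection,
    whether the current world had been seen before; confidence is the
    fraction of familiar worlds in the window.  Illustrative only."""

    def __init__(self, window=10):
        self.seen = set()
        self.recent = deque(maxlen=window)

    def observe(self, world):
        self.recent.append(world in self.seen)
        self.seen.add(world)

    def value(self):
        return sum(self.recent) / len(self.recent) if self.recent else 0.0
```

With confidence at zero all weights collapse to 0.5, so selection is uniform and the agent (re)explores; as confidence recovers, the learnt estimates increasingly dominate.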
Modular Battery Controller
A programmed solution is not ideal since battery performance is susceptible to change over time. We design a learning BDI controller that works to the initial specification but also adapts to ongoing changes in the battery system.

[BDI architecture figure: pending events are handled by the BDI engine, which selects plans (Pa–Pe) for goals (G1, G2) from the plan library against current beliefs, forming intention stacks that issue actions.]

Traditionally, BDI agents have no learning ability, and cannot adjust to changes that cause previously successful approaches to fail.

Example: Say plan Pc no longer works for resolving goal G2 after execution 15, and plan Pe does instead. As plan Pc starts to fail, the perceived confidence (y-axis) drops, promoting new exploration and (re)learning.

Scenario 1: Recovery from deterioration in module capacities at 5k episodes.

Scenario 2: Recovery from individual module failures during the [0, 20k] and [20k, 40k] episodes.

Scenario 3: Recovery from complete system failure during the [0, 5k] episodes.
The experiments above plot the average success in configuring the battery correctly (y-axis) against the number of episodes (x-axis), for various changes in the environment dynamics.
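As a concrete, hypothetical illustration of the control task, a non-learning baseline might simply choose how many identical modules to discharge (positive) or charge (negative) so that grid consumption lands in the target range. The function below is our own simplification, not the poster's controller; the learning BDI controller instead selects among plans whose applicability adapts as module behaviour changes.

```python
def set_modules(net_load, module_power, n_modules, low, high):
    """Pick a module count k in [-n_modules, n_modules] so that grid
    consumption net_load - k * module_power falls within [low, high].
    Hypothetical baseline: identical modules, one power level each,
    no account of state of charge or degradation.  Returns None when
    no configuration satisfies the range."""
    for k in range(-n_modules, n_modules + 1):
        grid = net_load - k * module_power
        if low <= grid <= high:
            return k
    return None
```

A fixed rule like this breaks when module capacities deteriorate or modules fail, which is exactly the kind of change the learning controller recovers from in the scenarios above.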
D. Singh, S. Sardina, L. Padgham, and G. James. Integrating Learning into a BDI Agent for Environments with Changing Dynamics. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011.