Integrating Learning into a BDI Agent for Environments with Changing Dynamics

Dhirendra Singh¹, Sebastian Sardina¹, Lin Padgham¹, Geoff James²

¹RMIT University, Melbourne, Australia  ²CSIRO Energy Technology, Sydney, Australia
Summary

This paper extends our earlier work on integrating learning to improve plan selection in the popular Belief, Desire, Intentions (BDI) agent paradigm. Our learning framework augments plans' context conditions with decision trees, allowing plan applicability to be learnt from experience. Here we address the problem that learning in deployed agents must be continuous rather than a one-off process. Using a probabilistic plan selection function, the agent balances exploration and exploitation of plans. Our main contribution is a novel confidence measure that allows the agent to adjust its reliance on the learning dynamically, facilitating in principle infinitely many (re)learning phases.

We demonstrate the benefits of the approach in an example battery controller for energy management: a building with local generation and loads must restrict its power consumption to a set range, using a modular battery system that can be charged or discharged as needed.

BDI Learning Framework

- Record outcomes for chosen plans to train decision trees.
- Probabilistically select plans based on ongoing learning.

Acting and learning are interleaved in an online manner, i.e., current learning influences ongoing choices that impact subsequent learning.
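The first step, recording execution outcomes for chosen plans, can be sketched as follows. For illustration, a simple per-world success table stands in for the decision tree; the class name `PlanRecord` and its interface are our own, not part of the framework.

```python
class PlanRecord:
    """Records execution outcomes of one plan per world state and serves
    success estimates.  A lookup table stands in here for the decision
    tree used in the framework; the idea (record outcomes, then estimate
    applicability) is the same in miniature."""

    def __init__(self):
        self.outcomes = {}  # world -> (successes, attempts)

    def record(self, world, success):
        """Log one execution of the plan in the given world state."""
        s, n = self.outcomes.get(world, (0, 0))
        self.outcomes[world] = (s + int(success), n + 1)

    def p_success(self, world):
        """Estimated probability that the plan succeeds in this world.
        Unseen worlds get the uninformed prior of 0.5."""
        s, n = self.outcomes.get(world, (0, 0))
        return s / n if n else 0.5
```

In the full framework, these recorded samples are the training set for the decision tree attached to each plan's context condition.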
BDI Architecture

A plan is a rule e : ψ ← δ, where program δ is a strategy for resolving goal e whenever the context condition ψ holds. Plans may perform primitive actions or post subgoals that are handled in a hierarchical manner.

Confidence in Learning
We build confidence from the observed performance of a plan by evaluating how well-informed the recent decisions were (the stability-based measure) and how well we know the worlds we are witnessing (the world-based measure). The plan selection weight, which dictates exploration, is then calculated from the predicted likelihood of success and the dynamic confidence measure.
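A minimal sketch of these ideas follows. The weight formula and the sliding-window familiarity measure are illustrative reconstructions of the description above, not necessarily the exact definitions used in the paper; all names here are our own.

```python
import random
from collections import deque


def selection_weight(p_success, confidence):
    """Bias the learnt success estimate toward the uninformed prior
    (0.5) when confidence in the learning is low.  With zero confidence
    every plan looks equally attractive, which encourages exploration;
    with full confidence the learnt estimate is used as-is.
    NOTE: a plausible sketch, not the paper's exact formula."""
    return 0.5 + confidence * (p_success - 0.5)


def select_plan(applicable, rng=random):
    """Probabilistic (roulette-wheel) selection among applicable plans,
    each given as (plan_name, estimated_success, confidence)."""
    weights = [selection_weight(p, c) for _, p, c in applicable]
    r = rng.random() * sum(weights)
    acc = 0.0
    for (name, _, _), w in zip(applicable, weights):
        acc += w
        if r <= acc:
            return name
    return applicable[-1][0]


class WorldConfidence:
    """World-based component of confidence: how familiar are the world
    states witnessed lately?  A sliding window records, per selection,
    whether the current world had been seen before; confidence is the
    fraction of familiar worlds in the window.  Illustrative only."""

    def __init__(self, window=10):
        self.seen = set()
        self.recent = deque(maxlen=window)

    def observe(self, world):
        self.recent.append(world in self.seen)
        self.seen.add(world)

    def value(self):
        return sum(self.recent) / len(self.recent) if self.recent else 0.0
```

With confidence at zero all weights collapse to 0.5, so selection is uniform and the agent (re)explores; as confidence recovers, the learnt estimates increasingly dominate.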
Modular Battery Controller
A programmed solution is not ideal since battery performance is susceptible to change over time. We design a learning BDI controller that works to the initial specification but also adapts to ongoing changes in the battery system.

[BDI architecture figure: pending events are handled by the BDI engine, which selects plans (Pa–Pe) for goals (G1, G2) from the plan library against current beliefs, forming intention stacks that issue actions.]

Traditionally, BDI agents have no learning ability, and cannot adjust to changes that cause previously successful approaches to fail.

Example: Say plan Pc no longer works for resolving goal G2 after execution 15, and plan Pe does instead. As plan Pc starts to fail, the perceived confidence (y-axis) drops, promoting new exploration and (re)learning.

Scenario 1: Recovery from deterioration in module capacities at 5k episodes.

Scenario 2: Recovery from individual module failures during the [0, 20k] and [20k, 40k] episodes.

Scenario 3: Recovery from complete system failure during the [0, 5k] episodes.
The experiments above plot the average success in configuring the battery correctly (y-axis) against the number of episodes (x-axis), for various changes in the environment dynamics.
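As a concrete, hypothetical illustration of the control task, a non-learning baseline might simply choose how many identical modules to discharge (positive) or charge (negative) so that grid consumption lands in the target range. The function below is our own simplification, not the poster's controller; the learning BDI controller instead selects among plans whose applicability adapts as module behaviour changes.

```python
def set_modules(net_load, module_power, n_modules, low, high):
    """Pick a module count k in [-n_modules, n_modules] so that grid
    consumption net_load - k * module_power falls within [low, high].
    Hypothetical baseline: identical modules, one power level each,
    no account of state of charge or degradation.  Returns None when
    no configuration satisfies the range."""
    for k in range(-n_modules, n_modules + 1):
        grid = net_load - k * module_power
        if low <= grid <= high:
            return k
    return None
```

A fixed rule like this breaks when module capacities deteriorate or modules fail, which is exactly the kind of change the learning controller recovers from in the scenarios above.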
D. Singh, S. Sardina, L. Padgham, and G. James. Integrating Learning into a BDI Agent for Environments with Changing Dynamics. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain, 2011.