Learning Context Conditions for BDI Plan Selection
Dhirendra Singh¹, Sebastian Sardina¹, Lin Padgham¹, Stéphane Airiau²
¹School of Computer Science & Information Technology, RMIT University, Australia
²Institute for Logic, Language and Computation, University of Amsterdam, The Netherlands
Summary
We address the plan selection problem in Belief, Desire, Intention (BDI) agent systems.
Context conditions of plans determine their applicability in given situations, and must normally be specified upfront. However, new environments often require changes to these selection conditions. Easing this constraint would allow conditions to be refined after deployment, improving adaptability. Our learning framework augments each plan's context condition with a decision tree, allowing plan applicability to be learnt from experience. Using a probabilistic plan selection function, the agent balances exploration and exploitation of plans while learning online.
Experimentation
We study the impact of goal-plan structures on learning performance, using synthetic hierarchies that model some features of real BDI programs.
Learning Task
[Figure: example goal-plan hierarchy. A top-level goal G has relevant plans P1 … Pi … Pn; plan Pi posts subgoals GA and GB, each handled by plans (PA, PB, …) whose subtrees end in success (√) or failure (×) leaves.]
How to record the training set: we compare two approaches, a conservative one (BUL) that records a failure only when all plan choices below are considered well-informed, and an aggressive one (ACL) that records all outcomes.
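The two recording policies can be sketched as a single predicate over an executed plan node. This is a minimal illustration, not the paper's implementation: `PlanNode`, `confidence`, and the threshold are assumed names standing in for the paper's notion of a "well-informed" choice.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PlanNode:
    """One executed plan choice in the goal-plan hierarchy (hypothetical)."""
    succeeded: bool
    confidence: float = 1.0                 # how well-informed this choice was
    choices_below: List["PlanNode"] = field(default_factory=list)

def should_record(node: PlanNode, policy: str, threshold: float = 0.9) -> bool:
    """Decide whether this plan's outcome becomes a training example."""
    if node.succeeded:
        return True                          # successes are always recorded
    if policy == "ACL":
        return True                          # aggressive: record every failure
    # BUL (conservative): trust a failure only when all plan choices made
    # below this node were themselves considered well-informed.
    return all(c.confidence >= threshold for c in node.choices_below)
```

Under BUL, a failure caused by a still-uninformed choice deeper in the tree is discarded rather than blamed on the plan above it.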
1. The imposed BDI hierarchy means that a high-level plan may fail not because it was a poor choice for the situation, but due to poor choices further down.
2. Learning is performed online while acting in the environment, so care must be taken in how much confidence to place in each decision tree on an ongoing basis.
BDI Architecture
A plan is a rule e : ψ ← δ: program δ is a strategy for resolving goal event e whenever condition ψ holds. The burden on the programmer is to design the logical formula ψ perfectly upfront.
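The rule e : ψ ← δ can be illustrated with a minimal sketch, assuming a toy representation (the `Plan` class, belief dictionary, and the `open_door` example are all hypothetical, not from the paper):

```python
class Plan:
    """A BDI plan rule  e : psi <- delta  (toy representation)."""
    def __init__(self, event, context, body):
        self.event = event        # e: the goal event this plan is relevant for
        self.context = context    # psi: predicate over the agent's beliefs
        self.body = body          # delta: steps (primitive actions / subgoals)

    def applicable(self, beliefs):
        """The plan may be selected only when psi holds in the beliefs."""
        return self.context(beliefs)

# Example: a plan for "open_door", applicable only when the door is unlocked.
open_door = Plan("open_door",
                 context=lambda b: not b["locked"],
                 body=["turn_handle", "push"])
```

Hand-writing `context` correctly for every situation is exactly the burden the learning framework aims to ease.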
BDI Learning Framework
Each plan's logical context condition is augmented with a decision tree. A probabilistic plan selection function balances exploitation of the ongoing decision tree learning against further exploration of the state space.
[Figure: the BDI engine operates over pending events, beliefs, the (static) plan library, and the (dynamic) intention stacks, producing actions. Outcomes of chosen plans are recorded to train the decision trees; plans are selected probabilistically based on the ongoing learning.]
Acting and learning are interleaved: ongoing learning affects the choice of future actions, which in turn affects subsequent learning and whether a good solution is eventually found.
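One act-and-learn cycle of this interleaving can be sketched as follows. This is an assumed outline, not the framework's actual code; `weight`, `execute`, and the training-set list are hypothetical stand-ins for the decision-tree-based selection weights and the environment.

```python
import random

def run_episode(state, plans, weight, execute, training_set):
    """One interleaved act/learn cycle: select by weight, act, record."""
    weights = [weight(p, state) for p in plans]          # learning-informed
    plan = random.choices(plans, weights=weights)[0]     # probabilistic choice
    succeeded = execute(plan, state)                     # act in the world
    training_set.append((plan, state, succeeded))        # feeds the plan's tree
    return succeeded
```

Because the recorded outcome retrains the tree that produced `weight`, each decision shapes the data available for the next one.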
How to use decision trees: a confidence measure is applied to the decision tree prediction to calculate plan selection weights. Confidence is related to the coverage of paths below a plan.
Plans perform primitive actions or post subgoals to be handled in a hierarchical manner.
[Plots: success rate (0.0–1.0) vs. iterations for goal-plan structures T1, T2, T3, and T2 at 20%.]
Results comparing ACL+coverage (crosses) and BUL (circles) for various goal-plan hierarchies.
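The confidence-weighted selection weight can be sketched as below. This is one plausible form consistent with the poster's description (the exact formula and the coverage measure in the paper may differ): the tree's success estimate is trusted in proportion to a coverage-based confidence, falling back to a neutral default when confidence is low.

```python
def coverage_confidence(paths_tried_below, total_paths_below):
    """Confidence grows with the fraction of choice paths below the plan
    that have been explored (a simple coverage proxy; assumed form)."""
    if total_paths_below == 0:
        return 1.0
    return paths_tried_below / total_paths_below

def selection_weight(tree_success_prob, confidence, default=0.5):
    """Blend the decision tree's estimate with a neutral default,
    weighted by how much the tree can be trusted."""
    return (1.0 - confidence) * default + confidence * tree_success_prob
```

With zero confidence every plan gets the default weight (pure exploration); as coverage grows, selection increasingly exploits the learnt estimates.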
D. Singh, S. Sardina, L. Padgham, S. Airiau. Learning context conditions for BDI plan selection. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010.