Learning Context Conditions for BDI Plan Selection
Dhirendra Singh¹, Sebastian Sardina¹, Lin Padgham¹, Stéphane Airiau²
¹School of Computer Science & Information Technology, RMIT University, Australia
²Institute for Logic, Language and Computation, University of Amsterdam, The Netherlands
Summary
We address the plan selection problem in Belief, Desire, Intention (BDI) agent systems.
Context conditions of plans determine their applicability in given situations, and must normally be specified upfront. However, new environments often require changes to these selection conditions. Easing this constraint would allow conditions to be refined after deployment, improving adaptability. Our learning framework augments each plan's context condition with a decision tree, allowing plan applicability to be learnt from experience. Using a probabilistic plan selection function, the agent balances exploration and exploitation of plans while learning online.
Experimentation
We study the impact of goal-plan structures on learning performance, using synthetic hierarchies that model some features of real BDI programs.
Learning Task
[Figure: example goal-plan hierarchy. A top-level goal G has relevant plans P1 … Pi … Pn; plan Pi posts subgoals GA and GB, each handled by plans (PA, PB, …) whose subtrees end in success (√) or failure (×) leaves.]
How to record the training set: we compare two approaches, a conservative one (BUL) that records a failure only when all plan choices below are considered well-informed, and an aggressive one (ACL) that records all outcomes.
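The two recording policies can be sketched as a single predicate over an executed plan node. This is a minimal illustration, not the paper's implementation: `PlanNode`, `confidence`, and the threshold are assumed names standing in for the paper's notion of a "well-informed" choice.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PlanNode:
    """One executed plan choice in the goal-plan hierarchy (hypothetical)."""
    succeeded: bool
    confidence: float = 1.0                 # how well-informed this choice was
    choices_below: List["PlanNode"] = field(default_factory=list)

def should_record(node: PlanNode, policy: str, threshold: float = 0.9) -> bool:
    """Decide whether this plan's outcome becomes a training example."""
    if node.succeeded:
        return True                          # successes are always recorded
    if policy == "ACL":
        return True                          # aggressive: record every failure
    # BUL (conservative): trust a failure only when all plan choices made
    # below this node were themselves considered well-informed.
    return all(c.confidence >= threshold for c in node.choices_below)
```

Under BUL, a failure caused by a still-uninformed choice deeper in the tree is discarded rather than blamed on the plan above it.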
1. The imposed BDI hierarchy means that a high-level plan may fail not because it was a poor choice for the situation, but due to poor choices further down.
2. Learning is performed online while acting in the environment, so care must be taken in how much confidence to place in each decision tree on an ongoing basis.
BDI Architecture
A plan is a rule e : ψ ← δ: program δ is a strategy for resolving goal event e whenever condition ψ holds. The burden on the programmer is to design the logical formula ψ perfectly upfront.
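The rule e : ψ ← δ can be illustrated with a minimal sketch, assuming a toy representation (the `Plan` class, belief dictionary, and the `open_door` example are all hypothetical, not from the paper):

```python
class Plan:
    """A BDI plan rule  e : psi <- delta  (toy representation)."""
    def __init__(self, event, context, body):
        self.event = event        # e: the goal event this plan is relevant for
        self.context = context    # psi: predicate over the agent's beliefs
        self.body = body          # delta: steps (primitive actions / subgoals)

    def applicable(self, beliefs):
        """The plan may be selected only when psi holds in the beliefs."""
        return self.context(beliefs)

# Example: a plan for "open_door", applicable only when the door is unlocked.
open_door = Plan("open_door",
                 context=lambda b: not b["locked"],
                 body=["turn_handle", "push"])
```

Hand-writing `context` correctly for every situation is exactly the burden the learning framework aims to ease.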
BDI Learning Framework
Each plan's logical context condition is augmented with a decision tree. A probabilistic plan selection function balances exploitation of the ongoing decision tree learning against further exploration of the state space.
[Figure: the BDI engine operates over pending events, beliefs, the (static) plan library, and the (dynamic) intention stacks, producing actions. Outcomes of chosen plans are recorded to train the decision trees; plans are selected probabilistically based on the ongoing learning.]
Acting and learning are interleaved: ongoing learning affects the choice of future actions, which in turn affects subsequent learning and whether a good solution is eventually found.
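One act-and-learn cycle of this interleaving can be sketched as follows. This is an assumed outline, not the framework's actual code; `weight`, `execute`, and the training-set list are hypothetical stand-ins for the decision-tree-based selection weights and the environment.

```python
import random

def run_episode(state, plans, weight, execute, training_set):
    """One interleaved act/learn cycle: select by weight, act, record."""
    weights = [weight(p, state) for p in plans]          # learning-informed
    plan = random.choices(plans, weights=weights)[0]     # probabilistic choice
    succeeded = execute(plan, state)                     # act in the world
    training_set.append((plan, state, succeeded))        # feeds the plan's tree
    return succeeded
```

Because the recorded outcome retrains the tree that produced `weight`, each decision shapes the data available for the next one.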
How to use decision trees: a confidence measure is applied to the decision tree prediction to calculate plan selection weights. Confidence is related to the coverage of paths below a plan.
Plans perform primitive actions or post subgoals to be handled in a hierarchical manner.
[Plots: success rate (0.0–1.0) vs. iterations for goal-plan structures T1, T2, T3, and T2 at 20%.]
Results comparing ACL+coverage (crosses) and BUL (circles) for various goal-plan hierarchies.
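The confidence-weighted selection weight can be sketched as below. This is one plausible form consistent with the poster's description (the exact formula and the coverage measure in the paper may differ): the tree's success estimate is trusted in proportion to a coverage-based confidence, falling back to a neutral default when confidence is low.

```python
def coverage_confidence(paths_tried_below, total_paths_below):
    """Confidence grows with the fraction of choice paths below the plan
    that have been explored (a simple coverage proxy; assumed form)."""
    if total_paths_below == 0:
        return 1.0
    return paths_tried_below / total_paths_below

def selection_weight(tree_success_prob, confidence, default=0.5):
    """Blend the decision tree's estimate with a neutral default,
    weighted by how much the tree can be trusted."""
    return (1.0 - confidence) * default + confidence * tree_success_prob
```

With zero confidence every plan gets the default weight (pure exploration); as coverage grows, selection increasingly exploits the learnt estimates.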
D. Singh, S. Sardina, L. Padgham, S. Airiau. Learning context conditions for BDI plan selection. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, 2010.