Learning Context Conditions for BDI Plan Selection

Dhirendra Singh¹, Sebastian Sardina¹, Stéphane Airiau², Lin Padgham¹

¹ School of Computer Science & Information Technology, RMIT University, Australia
² Institute for Logic, Language and Computation, University of Amsterdam, The Netherlands

Autonomous Agents and Multiagent Systems (AAMAS), May 2010
Learning BDI Plan Selection

[Figure: BDI architecture — sensors feed events into a pending-events queue; the BDI engine selects plans from a static plan library against dynamic beliefs, maintains intention stacks, and issues actions to the environment through actuators.]
A plan δ is a strategy for resolving an event e whenever its context condition ψ holds. Our focus is the plan selection problem, i.e. learning ψ.
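To fix ideas, here is a minimal sketch (our own illustration, not code from the paper) of the abstraction involved: a plan is tied to an event type and carries a context condition over world states, and it is this condition that we set out to learn.

```python
# Illustrative only: a bare-bones plan abstraction. WorldState, Plan, and
# the example event name are our assumptions, not artefacts of the paper.
from dataclasses import dataclass
from typing import Callable, Dict

WorldState = Dict[str, bool]  # a valuation of boolean beliefs

@dataclass
class Plan:
    event: str                              # the event e this plan resolves
    context: Callable[[WorldState], bool]   # context condition ψ
    body: Callable[[WorldState], bool]      # execution; returns True on success

# Example: a plan applicable only when the battery is charged.
clean_up = Plan(event="dirt_found",
                context=lambda w: w.get("battery_ok", False),
                body=lambda w: True)
```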
Motivation for Learning

The Belief-Desire-Intention (BDI) model of agency:
• Is robust and well suited to dynamic environments.
• Has inspired several development platforms (PRS, AgentSpeak(L), JACK, JASON, SPARK, 3APL, and others).
• Has been deployed in practical systems such as UAVs.

Nonetheless:
• Behaviours (plans) and the situations in which they apply (context conditions) are fixed at design time.
• For complex domains, it is difficult to specify complete context conditions upfront.
• Once deployed, the agent has no means to adapt to changes in the environment.
Learning From Plan Choices

[Figure: goal-plan tree for goal G with top-level plans P1…Pn. Plan Pi posts subgoals GA and GB, handled by plans PA and PB, which post further subgoals (GA1, GA2, GB1, GB2) whose plans either succeed (√) or fail (×); numbered arrows mark the order of choices along a highlighted trace.]

Execution trace for a successful resolution of goal G given world state w: success means that all correct choices were made.
[Figure: the same goal-plan tree, now tracing an execution in which goal G is not resolved for w; question marks flag choice points whose role in the failure is uncertain.]

A possible execution trace where goal G is not resolved for w. Should non-leaf plans consider this failure meaningful?
Learning Considerations

1. Collecting training data for learning (a sketch of both recording schemes follows below)
• ACL: an aggressive approach that treats all failures as meaningful.
• BUL: a conservative approach that records a failure only when the choices made below are considered well informed.
• Successes are always recorded under both approaches.

2. Using ongoing learning for plan selection
• Obtain a numeric measure of confidence in the ongoing learning output (i.e. in a plan's estimated likelihood of success in the current situation).
• Use this confidence measure to adjust selection weights during probabilistic plan selection.
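A minimal sketch (our own, not the authors' implementation) of the difference between the two schemes; the `well_informed` flag stands in for BUL's test that the choices made below a failed plan were themselves based on reliable learning.

```python
# Sketch contrasting the ACL and BUL recording schemes. The well_informed
# attribute on each sub-choice is an assumed stand-in for BUL's stability
# test on the decisions made below a failed plan.

def record_outcome(scheme, world, plan, succeeded, subchoices, training_set):
    """Turn one execution outcome into zero or one training example."""
    if succeeded:
        # Successes are always recorded under both schemes.
        training_set.append((world, plan, True))
    elif scheme == "ACL":
        # Aggressive: every failure counts as a meaningful example.
        training_set.append((world, plan, False))
    elif scheme == "BUL":
        # Conservative: record the failure only if all choices made below
        # this plan were themselves well informed.
        if all(c.well_informed for c in subchoices):
            training_set.append((world, plan, False))
```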
BDI Learning Framework

Previous work (Airiau et al. 2009):
• Augment the static logical context conditions of plans with dynamic decision trees (see the sketch after this slide).
• Select plans probabilistically based on their ongoing expectation of success.
• Learn context conditions over time by training the decision trees on success/failure outcomes across situations.

Contributions of this paper:
• A more principled analysis of the work in (Airiau et al. 2009).
• Learning with applicability filtering: using thresholds to filter out plans that do not apply in a given situation.
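As a rough sketch of the per-plan learner (assuming scikit-learn; the original work uses an incrementally retrained decision tree over world-state attributes, so refitting from scratch here is a simplification):

```python
# Each plan keeps a decision tree trained on (world state, outcome) pairs
# and exposes its estimated probability of success p_T(w).
from sklearn.tree import DecisionTreeClassifier

class PlanLearner:
    def __init__(self):
        self.X, self.y = [], []      # world-state feature vectors, outcomes
        self.tree = None

    def record(self, features, succeeded):
        """Add one (world state, outcome) example and retrain."""
        self.X.append(features)
        self.y.append(1 if succeeded else 0)
        self.tree = DecisionTreeClassifier().fit(self.X, self.y)

    def p_success(self, features):
        """p_T(w): estimated success probability in this world state."""
        if self.tree is None:
            return 0.5               # no data yet: uninformed default
        if len(set(self.y)) == 1:    # only one outcome class seen so far
            return float(self.y[0])
        return self.tree.predict_proba([features])[0][1]
```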
Assumptions

The aim is to understand the nuances of learning under different goal-plan hierarchies, so a simplified setting is used:
• Recursive/parameterised events and relational belief sets are not addressed.
• The BDI failure-recovery mechanism is disabled during learning.
• A synthetic plan library with empty initial context conditions is used.
• Non-determinism is modelled simply: actions that would otherwise succeed fail with 10% probability.

Ongoing work aims to relax these constraints towards a more practical system.
Results: Does Selective Recording Matter?

[Figure: goal-plan tree (structure T3) for goal G with plans P1…P4; plan Pi posts subgoals GA and GB, each subgoal's plans offering a working option (√) alongside several failing ones (×3).]

A structure in which both recording schemes show comparable performance.
Results: Does Selective Recording Matter? (cont.)

[Plot: success rate (0.0–1.0) vs. iterations (up to 4000). Performance of ACL (crosses) vs. BUL (circles); the dashed line shows optimal performance.]
Results: Learning with Applicability Filtering

Plan execution is generally not cost-free, so the agent may fail a goal without even trying it when success is judged unlikely (a sketch of the filtering step follows below).

[Plot: success rate vs. iterations (up to 2500). Performance of ACL (crosses) vs. BUL (circles) under applicability filtering; the dashed line shows optimal performance.]
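A sketch of the filtering step (the threshold value and function names are our assumptions): plans whose selection weight falls below the threshold are treated as not applicable, and a goal with no applicable plan fails without being tried.

```python
# Sketch of applicability filtering over probabilistic plan selection.
# `weight(plan, world)` stands for the plan's selection weight (e.g. the
# confidence-adjusted Ω′_T(w) defined on the next slides); the 0.2
# threshold is purely illustrative.
import random

def select_plan(plans, world, weight, threshold=0.2):
    applicable = [p for p in plans if weight(p, world) >= threshold]
    if not applicable:
        return None  # fail the goal without trying any plan
    weights = [weight(p, world) for p in applicable]
    return random.choices(applicable, weights=weights, k=1)[0]
```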
Improving Plan Selection

Coverage-based confidence measure: the idea is that confidence in a plan's decision tree increases as more of the choices below the plan have been covered.

[Figure: goal-plan tree below plan Pi with subgoals GA and GB; the highlighted path marks 1 of the 9 possible choice combinations under Pi.]
Improving Plan Selection (cont.)

How confidence influences plan selection:
• When the plan has not been tried before (zero coverage), we bias towards the default weight of 0.5.
• As more options are tried (approaching full coverage), we progressively bias towards the decision tree probability p_T(w).

Plan selection weight:

    Ω′_T(w) = 0.5 + c_T(w) · (p_T(w) − 0.5)
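In code, the weight is a simple interpolation between the uninformed default and the tree's estimate (a sketch; how coverage is counted per world state is our illustrative assumption):

```python
# Sketch of the coverage-based confidence c_T(w) and the selection weight
# Ω′_T(w) = 0.5 + c_T(w) · (p_T(w) − 0.5).

def coverage(paths_tried, total_paths):
    """c_T(w): fraction of choice paths below the plan tried so far in w."""
    return paths_tried / total_paths if total_paths else 0.0

def selection_weight(p_tree, c):
    """Ω′_T(w): interpolate between the default 0.5 and the tree estimate."""
    return 0.5 + c * (p_tree - 0.5)

# An untried plan keeps the default weight ...
assert selection_weight(p_tree=0.9, c=0.0) == 0.5
# ... while full coverage defers entirely to the decision tree estimate.
assert selection_weight(p_tree=0.9, c=1.0) == 0.9
```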
Results: Goal-Plan Hierarchy B

[Plot: success rate (0.0–1.0) vs. iterations (up to 2500). Performance of ACL+Ω′_T (red crosses) vs. previous results, in a structure that suits the conservative BUL approach; the dashed line shows optimal performance.]
Results: Learning with Applicability Filtering (cont.)

[Plot: success rate vs. iterations (up to 2500). Performance of ACL+Ω′_T (red crosses) vs. the previous applicability-filtering results; the dashed line shows optimal performance.]
Learning Context Conditions for BDI Plan Selection

• Learning BDI plan selection is desirable because designing exact context conditions for practical systems is non-trivial.
• Our approach uses decision trees to learn the context conditions of plans.
• We suggest that an aggressive sampling scheme combined with a coverage-based confidence measure is a good candidate approach for the general hierarchical setting.
References
M. Bratman, D. Israel, and M. Pollack. Plans and resource-bounded practical reasoning. Computational Intelligence, 4(4):349–355, 1988.

A. S. Rao. AgentSpeak(L): BDI agents speak out in a logical computable language. Lecture Notes in Computer Science, 1038:42–55, 1996.

S. Airiau, L. Padgham, S. Sardina, and S. Sen. Enhancing adaptation in BDI agents using learning techniques. International Journal of Agent Technologies and Systems, 2009.
Goal-Plan Structure T1

[Figure: goal-plan tree for goal G with many top-level plan options (×17 failing alternatives alongside Pi and Pi′); the subtrees below (GiA, GiB, Gi′) contain further failing options (×3, ×8) with a solution path (√) inside one of the complex options.]

A structure in which one of many complex options has a solution. This configuration suits the aggressive ACL approach.
Results: Goal-Plan Structure T1

[Plot: success rate (0.0–1.0) vs. iterations (up to 1500). Performance of ACL (crosses) vs. BUL (circles); the dashed line shows optimal performance.]
Goal-Plan Structure T2

[Figure: goal-plan tree for goal G with plan options P and Pi′; failing options (×, ×2, ×3) branch at every level, and the solution path (√) lies within one complex option.]

A structure whose solution lies in one complex option. This configuration suits the conservative BUL approach.
Results: Goal-Plan Structure T2

[Plot: success rate (0.0–1.0) vs. iterations (up to 2500). Performance of ACL (crosses) vs. BUL (circles); the dashed line shows optimal performance.]
Goal-Plan Structure T3

[Figure: goal-plan tree for goal G with plans P1…P4; plan Pi posts subgoals GA and GB, each subgoal's plans offering a working option (√) alongside several failing ones (×3).]

A structure in which both recording schemes show comparable performance.
Results: Goal-Plan Structure T3

[Plot: success rate (0.0–1.0) vs. iterations (up to 4000). Performance of ACL (crosses) vs. BUL (circles); the dashed line shows optimal performance.]