Learning Context Conditions for BDI Plan Selection

Dhirendra Singh¹, Sebastian Sardina¹, Stéphane Airiau², Lin Padgham¹

¹ School of Computer Science & Information Technology, RMIT University, Australia
² Institute for Logic, Language and Computation, University of Amsterdam, The Netherlands

Autonomous Agents and Multiagent Systems (AAMAS), May 2010
Learning BDI Plan Selection

[Figure: BDI architecture — sensors feed events into a pending-events queue; the BDI engine selects plans from a static plan library against dynamic beliefs, maintains intention stacks, and issues actions to the environment through actuators.]
A plan δ is a strategy for resolving an event e whenever its context condition ψ holds. Our focus is the plan selection problem, i.e. learning ψ.
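To fix ideas, here is a minimal sketch (our own illustration, not code from the paper) of the abstraction involved: a plan is tied to an event type and carries a context condition over world states, and it is this condition that we set out to learn.

```python
# Illustrative only: a bare-bones plan abstraction. WorldState, Plan, and
# the example event name are our assumptions, not artefacts of the paper.
from dataclasses import dataclass
from typing import Callable, Dict

WorldState = Dict[str, bool]  # a valuation of boolean beliefs

@dataclass
class Plan:
    event: str                              # the event e this plan resolves
    context: Callable[[WorldState], bool]   # context condition ψ
    body: Callable[[WorldState], bool]      # execution; returns True on success

# Example: a plan applicable only when the battery is charged.
clean_up = Plan(event="dirt_found",
                context=lambda w: w.get("battery_ok", False),
                body=lambda w: True)
```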
Motivation for Learning

The Belief-Desire-Intention (BDI) model of agency:
• Is robust and well suited to dynamic environments.
• Has inspired several development platforms (PRS, AgentSpeak(L), JACK, JASON, SPARK, 3APL, and others).
• Has been deployed in practical systems such as UAVs.

Nonetheless:
• Behaviours (plans) and the situations in which they apply (context conditions) are fixed at design time.
• For complex domains, it is difficult to specify complete context conditions upfront.
• Once deployed, the agent has no means to adapt to changes in the environment.
Learning From Plan Choices

[Figure: goal-plan tree for goal G with top-level plans P1…Pn. Plan Pi posts subgoals GA and GB, handled by plans PA and PB, which post further subgoals (GA1, GA2, GB1, GB2) whose plans either succeed (√) or fail (×); numbered arrows mark the order of choices along a highlighted trace.]

Execution trace for a successful resolution of goal G given world state w: success means that all correct choices were made.
[Figure: the same goal-plan tree, now tracing an execution in which goal G is not resolved for w; question marks flag choice points whose role in the failure is uncertain.]

A possible execution trace where goal G is not resolved for w. Should non-leaf plans consider this failure meaningful?
Learning Considerations

1. Collecting training data for learning (a sketch of both recording schemes follows below)
• ACL: an aggressive approach that treats all failures as meaningful.
• BUL: a conservative approach that records a failure only when the choices made below are considered well informed.
• Successes are always recorded under both approaches.

2. Using ongoing learning for plan selection
• Obtain a numeric measure of confidence in the ongoing learning output (i.e. in a plan's estimated likelihood of success in the current situation).
• Use this confidence measure to adjust selection weights during probabilistic plan selection.
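A minimal sketch (our own, not the authors' implementation) of the difference between the two schemes; the `well_informed` flag stands in for BUL's test that the choices made below a failed plan were themselves based on reliable learning.

```python
# Sketch contrasting the ACL and BUL recording schemes. The well_informed
# attribute on each sub-choice is an assumed stand-in for BUL's stability
# test on the decisions made below a failed plan.

def record_outcome(scheme, world, plan, succeeded, subchoices, training_set):
    """Turn one execution outcome into zero or one training example."""
    if succeeded:
        # Successes are always recorded under both schemes.
        training_set.append((world, plan, True))
    elif scheme == "ACL":
        # Aggressive: every failure counts as a meaningful example.
        training_set.append((world, plan, False))
    elif scheme == "BUL":
        # Conservative: record the failure only if all choices made below
        # this plan were themselves well informed.
        if all(c.well_informed for c in subchoices):
            training_set.append((world, plan, False))
```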
BDI Learning Framework

Previous work (Airiau et al. 2009):
• Augment the static logical context conditions of plans with dynamic decision trees (see the sketch after this slide).
• Select plans probabilistically based on their ongoing expectation of success.
• Learn context conditions over time by training the decision trees on success/failure outcomes across situations.

Contributions of this paper:
• A more principled analysis of the work in (Airiau et al. 2009).
• Learning with applicability filtering: using thresholds to filter out plans that do not apply in a given situation.
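As a rough sketch of the per-plan learner (assuming scikit-learn; the original work uses an incrementally retrained decision tree over world-state attributes, so refitting from scratch here is a simplification):

```python
# Each plan keeps a decision tree trained on (world state, outcome) pairs
# and exposes its estimated probability of success p_T(w).
from sklearn.tree import DecisionTreeClassifier

class PlanLearner:
    def __init__(self):
        self.X, self.y = [], []      # world-state feature vectors, outcomes
        self.tree = None

    def record(self, features, succeeded):
        """Add one (world state, outcome) example and retrain."""
        self.X.append(features)
        self.y.append(1 if succeeded else 0)
        self.tree = DecisionTreeClassifier().fit(self.X, self.y)

    def p_success(self, features):
        """p_T(w): estimated success probability in this world state."""
        if self.tree is None:
            return 0.5               # no data yet: uninformed default
        if len(set(self.y)) == 1:    # only one outcome class seen so far
            return float(self.y[0])
        return self.tree.predict_proba([features])[0][1]
```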
Assumptions

The aim is to understand the nuances of learning under different goal-plan hierarchies, so a simplified setting is used:
• Recursive/parameterised events and relational belief sets are not addressed.
• The BDI failure-recovery mechanism is disabled during learning.
• A synthetic plan library with empty initial context conditions is used.
• Non-determinism is modelled simply: actions that would otherwise succeed fail with 10% probability.

Ongoing work aims to relax these constraints towards a more practical system.
Results: Does Selective Recording Matter?

[Figure: goal-plan tree (structure T3) for goal G with plans P1…P4; plan Pi posts subgoals GA and GB, each subgoal's plans offering a working option (√) alongside several failing ones (×3).]

A structure in which both recording schemes show comparable performance.
Results: Does Selective Recording Matter? (cont.)

[Plot: success rate (0.0–1.0) vs. iterations (up to 4000). Performance of ACL (crosses) vs. BUL (circles); the dashed line shows optimal performance.]
Results: Learning with Applicability Filtering

Plan execution is generally not cost-free, so the agent may fail a goal without even trying it when success is judged unlikely (a sketch of the filtering step follows below).

[Plot: success rate vs. iterations (up to 2500). Performance of ACL (crosses) vs. BUL (circles) under applicability filtering; the dashed line shows optimal performance.]
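A sketch of the filtering step (the threshold value and function names are our assumptions): plans whose selection weight falls below the threshold are treated as not applicable, and a goal with no applicable plan fails without being tried.

```python
# Sketch of applicability filtering over probabilistic plan selection.
# `weight(plan, world)` stands for the plan's selection weight (e.g. the
# confidence-adjusted Ω′_T(w) defined on the next slides); the 0.2
# threshold is purely illustrative.
import random

def select_plan(plans, world, weight, threshold=0.2):
    applicable = [p for p in plans if weight(p, world) >= threshold]
    if not applicable:
        return None  # fail the goal without trying any plan
    weights = [weight(p, world) for p in applicable]
    return random.choices(applicable, weights=weights, k=1)[0]
```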
Improving Plan Selection

Coverage-based confidence measure: the idea is that confidence in a plan's decision tree increases as more of the choices below the plan have been covered.

[Figure: goal-plan tree below plan Pi with subgoals GA and GB; the highlighted path marks 1 of the 9 possible choice combinations under Pi.]
Improving Plan Selection (cont.)

How confidence influences plan selection:
• When the plan has not been tried before (zero coverage), we bias towards the default weight of 0.5.
• As more options are tried (approaching full coverage), we progressively bias towards the decision tree probability p_T(w).

Plan selection weight:

    Ω′_T(w) = 0.5 + c_T(w) · (p_T(w) − 0.5)
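In code, the weight is a simple interpolation between the uninformed default and the tree's estimate (a sketch; how coverage is counted per world state is our illustrative assumption):

```python
# Sketch of the coverage-based confidence c_T(w) and the selection weight
# Ω′_T(w) = 0.5 + c_T(w) · (p_T(w) − 0.5).

def coverage(paths_tried, total_paths):
    """c_T(w): fraction of choice paths below the plan tried so far in w."""
    return paths_tried / total_paths if total_paths else 0.0

def selection_weight(p_tree, c):
    """Ω′_T(w): interpolate between the default 0.5 and the tree estimate."""
    return 0.5 + c * (p_tree - 0.5)

# An untried plan keeps the default weight ...
assert selection_weight(p_tree=0.9, c=0.0) == 0.5
# ... while full coverage defers entirely to the decision tree estimate.
assert selection_weight(p_tree=0.9, c=1.0) == 0.9
```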
Results: Goal-Plan Hierarchy B

[Plot: success rate (0.0–1.0) vs. iterations (up to 2500). Performance of ACL+Ω′_T (red crosses) vs. previous results, in a structure that suits the conservative BUL approach; the dashed line shows optimal performance.]
Results: Learning with Applicability Filtering (cont.)

[Plot: success rate vs. iterations (up to 2500). Performance of ACL+Ω′_T (red crosses) vs. the previous applicability-filtering results; the dashed line shows optimal performance.]
Learning Context Conditions for BDI Plan Selection

• Learning BDI plan selection is desirable because designing exact context conditions for practical systems is non-trivial.
• Our approach uses decision trees to learn the context conditions of plans.
• We suggest that an aggressive sampling scheme combined with a coverage-based confidence measure is a good candidate approach for the general hierarchical setting.
References
M. Bratman, D. Israel, and M. Pollack. Plans and resource-bounded practical reasoning. Computational Intelligence, 4(4):349–355, 1988.

A. S. Rao. AgentSpeak(L): BDI agents speak out in a logical computable language. Lecture Notes in Computer Science, 1038:42–55, 1996.

S. Airiau, L. Padgham, S. Sardina, and S. Sen. Enhancing adaptation in BDI agents using learning techniques. International Journal of Agent Technologies and Systems, 2009.
Goal-Plan Structure T1

[Figure: goal-plan tree for goal G with many top-level plan options (×17 failing alternatives alongside Pi and Pi′); the subtrees below (GiA, GiB, Gi′) contain further failing options (×3, ×8) with a solution path (√) inside one of the complex options.]

A structure in which one of many complex options has a solution. This configuration suits the aggressive ACL approach.
Results: Goal-Plan Structure T1

[Plot: success rate (0.0–1.0) vs. iterations (up to 1500). Performance of ACL (crosses) vs. BUL (circles); the dashed line shows optimal performance.]
Goal-Plan Structure T2

[Figure: goal-plan tree for goal G with plan options P and Pi′; failing options (×, ×2, ×3) branch at every level, and the solution path (√) lies within one complex option.]

A structure whose solution lies in one complex option. This configuration suits the conservative BUL approach.
Results: Goal-Plan Structure T2

[Plot: success rate (0.0–1.0) vs. iterations (up to 2500). Performance of ACL (crosses) vs. BUL (circles); the dashed line shows optimal performance.]
Goal-Plan Structure T3

[Figure: goal-plan tree for goal G with plans P1…P4; plan Pi posts subgoals GA and GB, each subgoal's plans offering a working option (√) alongside several failing ones (×3).]

A structure in which both recording schemes show comparable performance.
Results: Goal-Plan Structure T3

[Plot: success rate (0.0–1.0) vs. iterations (up to 4000). Performance of ACL (crosses) vs. BUL (circles); the dashed line shows optimal performance.]