if the following conditions are satisfied:
• P satisfies all the preconditions of a
• M satisfies all the resource conditions of a
When an action a is applied to a state S, P′ and M′ in the resulting state S′ are found as follows:
• P′ = P + Add-effects(a) − Delete-effects(a)
• M′ is obtained by updating the value of each resource R in M that is used by action a:
  val(R′) = F(val(R)), if R is used by action a
  val(R′) = val(R), if R is not used by a
where F is the function action a uses to update the value of the resource R it uses. While updating P and M, both the instantaneous and the delayed effects are applied. The resulting state S′ is the state at the end of action a.
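As an illustration, the applicability test and state-update rules above can be sketched in Python. The `Action` and `State` structures and the example resource-update function are hypothetical representations chosen here, not the paper's implementation:

```python
from dataclasses import dataclass

@dataclass
class Action:
    preconditions: set    # propositions that must hold in P
    add_effects: set      # Add-effects(a)
    delete_effects: set   # Delete-effects(a)
    resources_used: dict  # resource name -> update function F

@dataclass
class State:
    P: set   # propositions true in the state
    M: dict  # resource name -> value

def applicable(a: Action, s: State) -> bool:
    # a is applicable iff P satisfies its preconditions and M satisfies
    # its resource conditions (here simplified to: the resource exists).
    return a.preconditions <= s.P and all(r in s.M for r in a.resources_used)

def apply(s: State, a: Action) -> State:
    # P' = P + Add-effects(a) - Delete-effects(a)
    P2 = (s.P | a.add_effects) - a.delete_effects
    # val(R') = F(val(R)) if R is used by a; val(R') = val(R) otherwise.
    M2 = {r: (a.resources_used[r](v) if r in a.resources_used else v)
          for r, v in s.M.items()}
    return State(P2, M2)
```

For example, a move action consuming 3 units of fuel would carry `{"fuel": lambda v: v - 3}` as its resource-update function.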
3.1 Plan Trimmer
This section describes the function of the plan trimmer of TPA: removing the inapplicable actions from the faulty plan TPold. Figure 2 shows the plan trimmer algorithm. The inputs to the plan trimmer are the initial state of the new problem and an existing plan of a similar problem. TPold is the existing faulty plan that is to be adapted to the new problem. Initially, the initial state of the planning problem is the current state, and TPpt, the plan to be generated by the plan trimmer, is empty. The first action is removed from TPold and checked for applicability to the current state. If the first action a removed from TPold is applicable to the current state:
• Action a is applied to the current state and the resulting state becomes the current state.
• Action a is added to TPpt.
This process is repeated for all actions in TPold, and TPpt is returned as the output of the plan trimmer.

Algorithm PlanTrimmer (Sinit, TPold)
Inputs: Initial state Sinit of new problem and faulty plan TPold
Output: Plan TPpt
begin
  Scur = Sinit
  TPpt = φ
  while TPold ≠ φ
  begin
    a = remove-first-action (TPold)
    if a is applicable to Scur
    begin
      Scur = apply (Scur, a)
      add a to TPpt
    end
  end
  return (TPpt)
end
Figure 2 Algorithm for Plan Trimmer
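The loop of Figure 2 can be sketched directly in Python; the `applicable` and `apply` helpers are assumed to be supplied by the planner and are passed in as parameters here:

```python
def plan_trimmer(s_init, tp_old, applicable, apply):
    """Sketch of Figure 2: drop the actions of the faulty plan that are
    not applicable, keeping the applicable ones in their original order."""
    s_cur = s_init
    tp_pt = []
    for a in tp_old:                 # remove-first-action, repeated
        if applicable(a, s_cur):
            s_cur = apply(s_cur, a)  # the resulting state becomes current
            tp_pt.append(a)
    return tp_pt
```

With actions encoded as (preconditions, add-effects, delete-effects) triples over sets of propositions, an inapplicable middle action is simply skipped while the rest of the plan is retained.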
Algorithm PlanDurationAdjustor (TPpt)
Input: Plan TPpt given by plan trimmer
Output: Plan TPpda
begin
  for each action ai in TPpt
  begin
    newdur = get-new-duration (ai)
    if newdur ≠ dur(ai)
    begin
      dur(ai) = newdur
      adjust-start-time (TPpt, i)
    end
  end
  TPpda = TPpt
  return (TPpda)
end
Figure 3 Algorithm for Plan Duration Adjustor
3.2 Plan Duration Adjustor Since the durations of the actions in TPpt might be different in the new problem, the durations of all such actions should be changed to the new durations, and the start times of the dependent actions should also be modified accordingly. This is done by the plan duration adjustor (PDA). Figure 3 shows the plan duration adjustor algorithm. Each action ai in TPpt is checked for a new duration. The new duration of an action is obtained from the problem file of the new problem. If the new duration is different from its duration in TPpt, the duration of ai is changed and the start times of the actions ordered after ai are also changed using the function adjust-start-time. Figure 4 shows the recursive algorithm for adjusting the start times of the actions dependent on the action whose duration or start time has been changed. Let pos be the position of the action in the temporal plan AL whose duration or start time has been modified. For each action aj that occurs after the action apos in AL, a set TPprev is constructed with the actions that are ordered immediately before aj. Let end-timemax be the end time of the action ending last in TPprev. If the start time of action aj is different from end-timemax, the start time of aj is changed to end-timemax and the start times of the actions ordered after aj are changed by a recursive call to the adjust-start-time function.
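A minimal sketch of the plan duration adjustor and the recursive adjust-start-time function, assuming each action is represented as a record with a start time, a duration, and the indices of the actions ordered immediately before it (a representation chosen here for illustration, not taken from the paper):

```python
def adjust_start_time(al, pos):
    """Sketch of Figure 4: recursively shift the start times of actions
    after position pos whose predecessors' end times have changed."""
    for j in range(pos + 1, len(al)):
        aj = al[j]
        prev = aj["prev"]  # indices of actions ordered immediately before aj
        end_max = max((al[k]["start"] + al[k]["dur"] for k in prev), default=0)
        if aj["start"] != end_max:
            aj["start"] = end_max
            adjust_start_time(al, j)  # propagate to dependents of aj

def plan_duration_adjustor(tp_pt, get_new_duration):
    """Sketch of Figure 3: update each action's duration and re-adjust
    the start times of the actions ordered after it."""
    for i, a in enumerate(tp_pt):
        newdur = get_new_duration(a)
        if newdur != a["dur"]:
            a["dur"] = newdur
            adjust_start_time(tp_pt, i)
    return tp_pt
```

For instance, lengthening the first action of a three-action chain pushes the start times of both downstream actions forward through the recursive calls.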
Algorithm Adjust-Start-Time (AL, pos)
Input: A plan AL and position pos of the action in AL whose duration or start time has been changed
begin
  for each action aj in AL starting from apos+1
  begin
    TPprev = set of actions ordered immediately before aj
    end-timemax = max (end-time (ak)) ∀ ak ∈ TPprev
    if end-timemax ≠ start-time (aj)
    begin
      start-time (aj) = end-timemax
      adjust-start-time (AL, j)
    end
  end
end
Figure 4 Recursive algorithm for adjusting the start times of the actions that depend on the action whose duration and/or start time has been changed

3.3 Constructing the Remaining Plan
PDA generates the plan TPpda after adjusting the durations of actions in TPpt to the new durations of actions in the new problem. This section explains the remaining steps of the TPA algorithm, i.e., the functions of HBPT and the merger. The algorithm is shown in Figure 5. All actions in TPpda (the plan given by PDA) are applied by HBPT one after the other, starting from the initial state of the new problem. As each action is applied, it is removed from TPpda and added to the plan TPhbpt. A state list SL is formed with the initial state of the new problem and all the states obtained after the application of each action in TPpda. The heuristic values of all the states in SL are calculated using the Relaxed Temporal Planning Graph (RTPG), which is briefly described in Section 4. The state with the lowest heuristic value, Smin, in SL is chosen. HBPT formulates a planning problem with Smin as the initial state and Sgoal as the goal state. If Smin satisfies the goal state, no planning problem is formulated. The actions that were used to arrive at the state Smin are retained in the plan TPhbpt and the rest are removed from TPhbpt by HBPT.

Algorithm HBPTM (TPpda, Sinit, Sgoal)
Input: Plan TPpda, initial state Sinit and goal state Sgoal of the new problem
Output: Plan TPnew for the new planning problem
begin
  Scur = Sinit; TPhbpt = φ; State List: SL = {Scur}
  while TPpda ≠ φ
  begin
    a = remove-first-action (TPpda)
    Scur = apply (Scur, a)
    add (Scur, SL)
    add (a, TPhbpt)
  end
  Smin = Sk ∈ SL such that ∀ Si ∈ SL, heuristic (Sk) ≤ heuristic (Si)
  remove all actions from TPhbpt that are ordered after ak−1
  if Smin does not satisfy Sgoal
    TPsapa = plan (Smin, Sgoal)
    TPnew = append (TPsapa, TPhbpt)
  else
    TPnew = TPhbpt
  adjust-start-time (TPnew, 1)
  return (TPnew)
end
Figure 5 Algorithm for Heuristic-Based Plan Trimmer and Merger
The merger merges the plans produced by HBPT and Sapa. Sapa generates a plan for the planning problem formulated by HBPT; let this plan be TPsapa. If no planning problem has been formulated by HBPT, TPsapa is null. The merger appends the actions in TPsapa to the actions in TPhbpt, the plan produced by HBPT, resulting in TPnew. The recursive function adjust-start-time is called with TPnew and 1, the position of the first action in TPnew, as the parameters, to adjust the start times of the new actions from TPsapa that were appended to TPhbpt. The set of actions in TPnew is the plan for the new problem. The Sapa planner [Do and Kambhampati, 2003] is used to plan from Smin, the state with the lowest heuristic value in HBPT, to Sgoal, the goal state of the new problem. Sapa uses a forward-chaining search algorithm to find a plan. The search is conducted through a space of time-stamped states. The search proceeds by applying all applicable actions to the current state and queuing up the resulting states in a sorted state queue. The current state is the one taken from the top of the state queue. Initially, the state queue contains the initial state alone. The state queue is sorted according to the heuristic values of the states. The search continues until either the goal state is reached or the state queue becomes empty, in which case there is no solution.
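The HBPT and merger steps can be sketched together as follows. Here `plan(s, g)` stands in for a call to Sapa, and the apply, heuristic and goal-satisfaction functions are passed in as parameters; the final adjust-start-time call of Figure 5 is omitted for brevity:

```python
def hbptm(tp_pda, s_init, s_goal, apply, heuristic, plan, satisfies):
    """Sketch of Figure 5: apply all actions of TPpda, keep the prefix
    leading to the lowest-heuristic state, and plan for the rest."""
    s_cur, tp_hbpt, sl = s_init, [], [s_init]
    for a in tp_pda:
        s_cur = apply(s_cur, a)
        sl.append(s_cur)
        tp_hbpt.append(a)
    # Smin: the state in SL with the lowest heuristic value.
    k = min(range(len(sl)), key=lambda i: heuristic(sl[i]))
    s_min = sl[k]
    tp_hbpt = tp_hbpt[:k]  # retain only the actions used to reach Smin
    if not satisfies(s_min, s_goal):
        # The merger appends the plan from Smin to Sgoal (Sapa's role).
        tp_hbpt = tp_hbpt + plan(s_min, s_goal)
    return tp_hbpt
```

In the toy run below, propositions are set elements and each "action" simply adds its own name to the state; the stand-in planner returns the missing goal propositions as actions.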
4 Relaxed Temporal Planning Graph
The Heuristic-Based Plan Trimmer in the Temporal Plan Adaptor uses the Relaxed Temporal Planning Graph described in [Do and Kambhampati, 2003] to compute the heuristic values of the states. This section briefly describes how the heuristic value of a given state is estimated using the RTPG. The RTPG is a temporal planning graph in which the delete effects and resource effects of the actions are ignored. A temporal planning graph [Smith and Weld, 1999] is a bi-level graph with one level containing all the facts and the other level containing all the actions in the problem. Each fact is linked to all the actions supporting it, and each action is linked to all the facts belonging to its preconditions and effects. An event queue is maintained for the RTPG to support the delayed effects of the actions. Each action a has an execution cost Cexec(a) associated with it. Let C(a, t) be the cost incurred to make possible the execution of action a at time t. Similarly, for each fact f, C(f, t) is the cost incurred to achieve f at time t. C(a, t) and C(f, t) are calculated as follows:
C(a, t) = Σ C(f, t), ∀ f ∈ preconditions(a)
C(f, t) = C(a, t) + Cexec(a), where f is one of the effects of a
As the graph is being built, the costs C(f, t) and C(a, t) are updated whenever a better cost is found. The change in cost is propagated to all other facts and actions. The RTPG is built until there are no events in the event queue that can decrease the cost of any fact. A relaxed plan is extracted from the RTPG, and the total execution cost of the actions in this relaxed plan gives the heuristic value.
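The cost propagation can be sketched as a fixpoint computation. For brevity this sketch drops the time index t and the event queue, propagating costs over the relaxed (delete-free) problem until no fact cost improves:

```python
def rtpg_costs(actions, init_facts):
    """Relaxed cost propagation: C(a) = sum of its precondition costs,
    C(f) = C(a) + Cexec(a) for each effect f, kept whenever a better
    (lower) cost is found, until a fixpoint is reached.
    `actions` maps a name to (preconditions, effects, exec_cost)."""
    cost = {f: 0.0 for f in init_facts}  # facts in the state cost nothing
    changed = True
    while changed:
        changed = False
        for pre, eff, c_exec in actions.values():
            if all(f in cost for f in pre):      # action is reachable
                c_a = sum(cost[f] for f in pre)  # C(a) = sum C(f)
                for f in eff:
                    c_f = c_a + c_exec           # C(f) = C(a) + Cexec(a)
                    if f not in cost or c_f < cost[f]:
                        cost[f] = c_f            # better cost found
                        changed = True
    return cost
```

For example, with a1: p→q (cost 2), a2: q→r (cost 1) and a3: p→r (cost 5), fact r is assigned cost 3 via the cheaper chain a1, a2 rather than cost 5 via a3.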
5 Experimental Results
The experimental setup to evaluate the performance of TPA is shown in Figure 6. Sapa generates a plan p for a given problem pbm in a particular domain. The Similar Problems Generator generates a set of problems, sim-pbm, similar to pbm. The problems in sim-pbm are solved by both TPA and Sapa. The original plan p of pbm is given as input to TPA for adaptation. Let Psapa and Ptpa be the sets of plans generated by Sapa and TPA respectively. The time taken to generate the plans in Psapa is compared with the time taken to generate the plans in Ptpa. In addition, the makespans of the plans are compared. The makespan of a plan is its overall duration, i.e., the time taken to complete the execution of the plan.
Figure 6 Experimental Setup
Similar problems are generated by applying one or more of the following methods:
• By changing the values of the functions in the planning problem that represent the resource values and the durations of the actions.
• By adding new objects and goals to the planning problem.
• By removing existing objects and goals.
The second and third methods also involve modifying existing goals. Changing the value of a function changes the duration of an action either directly or indirectly. For example, in the Satellite domain the slew-time function represents the time taken by the satellite to turn from one direction to another; changing the value of this function directly changes the duration of the action turn_to. In contrast, changing the value of the distance function in the Zeno Travel domain indirectly changes the duration of the action fly, i.e., the time taken to fly an aircraft from one city to another, because the duration of fly is found by dividing the distance between the two cities by the aircraft's speed.
The experimental results are shown in Figure 7 for three domains: Depots, Satellite and Zeno Travel. The graphs on the left compare the time taken by TPA and Sapa to generate plans. The graphs on the right compare the makespans of the plans generated by TPA and Sapa. The Depots domain consists of pallets, crates, trucks, depots, distributors and hoists. Depots and distributors are locations. Pallets hold the crates at any location. Typical goals involve transporting crates from one location to another. The Satellite domain consists of satellites, each equipped with different instruments. Goals involve taking images of different directions with these instruments. Images need to be taken in different modes, and certain instruments support certain modes of imaging. The Zeno Travel domain consists of persons, aircraft and cities. Goals involve transporting persons from one city to another using the aircraft.
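The indirect duration change described above for the fly action can be sketched as follows; the city names, distance and speed values are illustrative, not taken from the benchmark problem files:

```python
# Hypothetical Zeno Travel values: changing the distance function
# indirectly changes the duration of fly, since
#   dur(fly) = distance(c1, c2) / speed(aircraft)
distance = {("city0", "city1"): 600.0}
speed = {"plane1": 300.0}

def fly_duration(aircraft, c1, c2):
    # Duration is derived from the functions, not stated directly.
    return distance[(c1, c2)] / speed[aircraft]
```

Doubling the distance entry for a city pair doubles the duration of every fly action between those cities, which is what the plan duration adjustor must then propagate through the plan.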
The experimental results show that the time taken by TPA to generate plans is much less than the time taken by Sapa. The black line highlights those problems on which Sapa timed out, i.e., Sapa did not find a solution within the given time limit of 3600 seconds. Sapa timed out on 2, 1 and 11 problems in the Depots, Satellite and Zeno Travel domains respectively. In most of the problems, the makespans of the TPA plans are almost the same as the makespans of the Sapa plans. In some cases, the makespans of the Sapa plans are much lower than the makespans of the TPA plans.
[Figure 7: six plots comparing SAPA and TPA on problems similar to depots-pbm, sat-pbm and zeno-pbm; left column: time taken in seconds (log scale), right column: makespan.]
Figure 7 Experimental results in Depots, Satellite and Zeno Travel domains. The plots on the left show planning times. Plots on the right show makespans of the plans generated.
6 Conclusions
Including temporal and resource constraints has taken planners one step closer to real-world problems. This paper explored the process of plan adaptation for domains with durative actions and resource constraints. Plan adaptation is crucial for reusing retrieved plans and for repairing plans in a changing environment. We described the Temporal Plan Adaptor, a domain-independent system to adapt plans with durative actions. The approach is to salvage as much of the older plan as possible and to fill in the rest by planning from first principles. The experimental results show that generating a temporal plan by adapting an existing temporal plan using TPA can be much faster than generating a temporal plan from scratch. TPA also adapts plans with resource constraints in addition to temporal constraints. Although there is no special step for resources during the adaptation process, they are handled when checking the applicability of actions to a state. Future work may involve improving the efficiency of the algorithm by including a step that handles the resource constraints during the adaptation phase.
References
[Au et al., 2002] Tsz-Chiu Au, Hector Muñoz-Avila and Dana S. Nau. On the complexity of plan adaptation by derivational analogy in a universal classical planning framework. In Proceedings of ECCBR, pages 13-27, 2002.
[Blum and Furst, 1997] Avrim L. Blum and Merrick L. Furst. Fast planning through planning graph analysis. Artificial Intelligence, 90:281-300, 1997.
[Bylander, 1994] Tom Bylander. The computational complexity of propositional STRIPS planning. Artificial Intelligence, 69:161-204, 1994.
[Do and Kambhampati, 2003] Minh B. Do and Subbarao Kambhampati. Sapa: a scalable multi-objective metric temporal planner. JAIR, 20:155-194, 2003.
[Fikes and Nilsson, 1971] Richard E. Fikes and Nils J. Nilsson. STRIPS: a new approach to the application of theorem proving to problem solving. Artificial Intelligence, 2:189-208, 1971.
[Fox and Long, 2003] Maria Fox and Derek Long. PDDL2.1: an extension to PDDL for expressing temporal planning domains. JAIR, 20:61-124, 2003.
[Francis and Ram, 1995] Anthony G. Francis, Jr. and Ashwin Ram. A domain-independent algorithm for multi-plan adaptation and merging in least-commitment planners. In D. Aha and A. Ram (Eds.), AAAI Fall Symposium: Adaptation of Knowledge for Reuse, Menlo Park, CA. AAAI Press, 1995.
[Gerevini and Serina, 2000] Alfonso Gerevini and Ivan Serina. Fast plan adaptation through planning graphs: local and systematic search techniques. In Proceedings of the 5th International Conference on Artificial Intelligence Planning Systems (AIPS-00), AAAI Press, 2000.
[Hanks and Weld, 1995] Steve Hanks and Daniel S. Weld. A domain-independent algorithm for plan adaptation. JAIR, 2:319-360, 1995.
[Hoffmann and Nebel, 2001] Jörg Hoffmann and Bernhard Nebel. The FF planning system: fast plan generation through heuristic search. JAIR, 14:253-302, 2001.
[Kumashi and Khemani, 2002] Praveen K. Kumashi and Deepak Khemani. State space regression planning using forward heuristic construction mechanism. In Proceedings of the International Conference on Knowledge Based Computer Systems, pages 489-499, 2002.
[Leake, 1996] David Leake. Case-Based Reasoning: Experiences, Lessons and Future Directions. AAAI Press/The MIT Press, 1996.
[Smith and Weld, 1999] David E. Smith and Daniel S. Weld. Temporal planning with mutual exclusion reasoning. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 1999.