if the following conditions are satisfied:
• P satisfies all the preconditions of a
• M satisfies all the resource conditions of a
When an action a is applied to a state S, P′ and M′ in the resulting state S′ are found as follows:
• P′ = P + Add-effects(a) − Delete-effects(a)
• M′ is obtained by updating the value of each resource R in M that is used by action a:
  val(R′) = F(val(R)), if R is used by action a
  val(R′) = val(R), if R is not used by a
where F is the function action a uses to update the value of the resource R it uses. While updating P and M, both the instantaneous and the delayed effects are applied. The resulting state S′ is the state at the end of action a.
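As an illustration, the applicability test and state-update rules above can be sketched in Python. The `Action` and `State` structures and the example resource-update function are hypothetical representations chosen here, not the paper's implementation:

```python
from dataclasses import dataclass

@dataclass
class Action:
    preconditions: set    # propositions that must hold in P
    add_effects: set      # Add-effects(a)
    delete_effects: set   # Delete-effects(a)
    resources_used: dict  # resource name -> update function F

@dataclass
class State:
    P: set   # propositions true in the state
    M: dict  # resource name -> value

def applicable(a: Action, s: State) -> bool:
    # a is applicable iff P satisfies its preconditions and M satisfies
    # its resource conditions (here simplified to: the resource exists).
    return a.preconditions <= s.P and all(r in s.M for r in a.resources_used)

def apply(s: State, a: Action) -> State:
    # P' = P + Add-effects(a) - Delete-effects(a)
    P2 = (s.P | a.add_effects) - a.delete_effects
    # val(R') = F(val(R)) if R is used by a; val(R') = val(R) otherwise.
    M2 = {r: (a.resources_used[r](v) if r in a.resources_used else v)
          for r, v in s.M.items()}
    return State(P2, M2)
```

For example, a move action consuming 3 units of fuel would carry `{"fuel": lambda v: v - 3}` as its resource-update function.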
3.1 Plan Trimmer
This section describes the function of the plan trimmer of TPA: removing the inapplicable actions from the faulty plan TPold. Figure 2 shows the plan trimmer algorithm. The inputs to the plan trimmer are the initial state of the new problem and an existing plan of a similar problem. TPold is the existing faulty plan that is to be adapted to the new problem. Initially, the initial state of the planning problem is the current state, and TPpt, the plan to be generated by the plan trimmer, is empty. The first action is removed from TPold and checked for applicability to the current state. If the first action a removed from TPold is applicable to the current state:
• Action a is applied to the current state and the resulting state becomes the current state.
• Action a is added to TPpt.
This process is repeated for all actions in TPold, and TPpt is returned as the output of the plan trimmer.

Algorithm PlanTrimmer (Sinit, TPold)
Inputs: Initial state Sinit of new problem and faulty plan TPold
Output: Plan TPpt
begin
  Scur = Sinit
  TPpt = φ
  while TPold ≠ φ
  begin
    a = remove-first-action (TPold)
    if a is applicable to Scur
    begin
      Scur = apply (Scur, a)
      add a to TPpt
    end
  end
  return (TPpt)
end
Figure 2 Algorithm for Plan Trimmer
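The loop of Figure 2 can be sketched directly in Python; the `applicable` and `apply` helpers are assumed to be supplied by the planner and are passed in as parameters here:

```python
def plan_trimmer(s_init, tp_old, applicable, apply):
    """Sketch of Figure 2: drop the actions of the faulty plan that are
    not applicable, keeping the applicable ones in their original order."""
    s_cur = s_init
    tp_pt = []
    for a in tp_old:                 # remove-first-action, repeated
        if applicable(a, s_cur):
            s_cur = apply(s_cur, a)  # the resulting state becomes current
            tp_pt.append(a)
    return tp_pt
```

With actions encoded as (preconditions, add-effects, delete-effects) triples over sets of propositions, an inapplicable middle action is simply skipped while the rest of the plan is retained.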
Algorithm PlanDurationAdjustor (TPpt)
Input: Plan TPpt given by plan trimmer
Output: Plan TPpda
begin
  for each action ai in TPpt
  begin
    newdur = get-new-duration (ai)
    if newdur ≠ dur(ai)
    begin
      dur(ai) = newdur
      adjust-start-time (TPpt, i)
    end
  end
  TPpda = TPpt
  return (TPpda)
end
Figure 3 Algorithm for Plan Duration Adjustor
3.2 Plan Duration Adjustor Since the durations of the actions in TPpt might be different in the new problem, the durations of all such actions should be changed to the new durations, and the start times of the dependent actions should also be modified accordingly. This is done by the plan duration adjustor (PDA). Figure 3 shows the plan duration adjustor algorithm. Each action ai in TPpt is checked for a new duration. The new duration of an action is obtained from the problem file of the new problem. If the new duration is different from its duration in TPpt, the duration of ai is changed and the start times of the actions ordered after ai are also changed using the function adjust-start-time. Figure 4 shows the recursive algorithm for adjusting the start times of the actions dependent on the action whose duration or start time has been changed. Let pos be the position of the action in the temporal plan AL whose duration or start time has been modified. For each action aj that occurs after the action apos in AL, a set TPprev is constructed with the actions that are ordered immediately before aj. Let end-timemax be the end time of the action ending last in TPprev. If the start time of action aj is different from end-timemax, the start time of aj is changed to end-timemax and the start times of the actions ordered after aj are changed by a recursive call to the adjust-start-time function.
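A minimal sketch of the plan duration adjustor and the recursive adjust-start-time function, assuming each action is represented as a record with a start time, a duration, and the indices of the actions ordered immediately before it (a representation chosen here for illustration, not taken from the paper):

```python
def adjust_start_time(al, pos):
    """Sketch of Figure 4: recursively shift the start times of actions
    after position pos whose predecessors' end times have changed."""
    for j in range(pos + 1, len(al)):
        aj = al[j]
        prev = aj["prev"]  # indices of actions ordered immediately before aj
        end_max = max((al[k]["start"] + al[k]["dur"] for k in prev), default=0)
        if aj["start"] != end_max:
            aj["start"] = end_max
            adjust_start_time(al, j)  # propagate to dependents of aj

def plan_duration_adjustor(tp_pt, get_new_duration):
    """Sketch of Figure 3: update each action's duration and re-adjust
    the start times of the actions ordered after it."""
    for i, a in enumerate(tp_pt):
        newdur = get_new_duration(a)
        if newdur != a["dur"]:
            a["dur"] = newdur
            adjust_start_time(tp_pt, i)
    return tp_pt
```

For instance, lengthening the first action of a three-action chain pushes the start times of both downstream actions forward through the recursive calls.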
Algorithm Adjust-Start-Time (AL, pos)
Input: A plan AL and position pos of the action in AL whose duration or start time has been changed
begin
  for each action aj in AL starting from apos+1
  begin
    TPprev = set of actions ordered immediately before aj
    end-timemax = max (end-time (ak)) ∀ ak ∈ TPprev
    if end-timemax ≠ start-time (aj)
    begin
      start-time (aj) = end-timemax
      adjust-start-time (AL, j)
    end
  end
end
Figure 4 Recursive algorithm for adjusting the start times of the actions that depend on the action whose duration and/or start time has been changed

3.3 Constructing the Remaining Plan
PDA generates the plan TPpda after adjusting the durations of actions in TPpt to the new durations of actions in the new problem. This section explains the remaining steps of the TPA algorithm, i.e., the functions of HBPT and the merger. The algorithm is shown in Figure 5. All actions in TPpda (the plan given by PDA) are applied by HBPT one after the other, starting from the initial state of the new problem. As each action is applied, it is removed from TPpda and added to the plan TPhbpt. A state list SL is formed with the initial state of the new problem and all the states obtained after the application of each action in TPpda. The heuristic values of all the states in SL are calculated using the Relaxed Temporal Planning Graph (RTPG), which is briefly described in Section 4. The state with the lowest heuristic value, Smin, in SL is chosen. HBPT formulates a planning problem with Smin as the initial state and Sgoal as the goal state. If Smin satisfies the goal state, no planning problem is formulated. The actions that were used to arrive at the state Smin are retained in the plan TPhbpt and the rest are removed from TPhbpt by HBPT.

Algorithm HBPTM (TPpda, Sinit, Sgoal)
Input: Plan TPpda, initial state Sinit and goal state Sgoal of the new problem
Output: Plan TPnew for the new planning problem
begin
  Scur = Sinit; TPhbpt = φ; State List: SL = {Scur}
  while TPpda ≠ φ
  begin
    a = remove-first-action (TPpda)
    Scur = apply (Scur, a)
    add (Scur, SL)
    add (a, TPhbpt)
  end
  Smin = Sk ∈ SL such that ∀ Si ∈ SL, heuristic (Sk) ≤ heuristic (Si)
  remove all actions from TPhbpt that are ordered after ak−1
  if Smin does not satisfy Sgoal
    TPsapa = plan (Smin, Sgoal)
    TPnew = append (TPsapa, TPhbpt)
  else
    TPnew = TPhbpt
  adjust-start-time (TPnew, 1)
  return (TPnew)
end
Figure 5 Algorithm for Heuristic-Based Plan Trimmer and Merger
The merger merges the plans produced by HBPT and Sapa. Sapa generates a plan for the planning problem formulated by HBPT; let this plan be TPsapa. If no planning problem has been formulated by HBPT, TPsapa is null. The merger appends the actions in TPsapa to the actions in TPhbpt, the plan produced by HBPT, resulting in TPnew. The recursive function adjust-start-time is called with TPnew and 1, the position of the first action in TPnew, as the parameters, to adjust the start times of the new actions from TPsapa that were appended to TPhbpt. The set of actions in TPnew is the plan for the new problem. The Sapa planner [Do and Kambhampati, 2003] is used to plan from Smin, the state with the lowest heuristic value in HBPT, to Sgoal, the goal state of the new problem. Sapa uses a forward-chaining search algorithm to find a plan. The search is conducted through a space of time-stamped states. The search proceeds by applying all applicable actions to the current state and queuing up the resulting states in a sorted state queue. The current state is the one taken from the top of the state queue. Initially, the state queue contains the initial state alone. The state queue is sorted according to the heuristic values of the states. The search continues until either the goal state is reached or the state queue becomes empty, in which case there is no solution.
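The HBPT and merger steps can be sketched together as follows. Here `plan(s, g)` stands in for a call to Sapa, and the apply, heuristic and goal-satisfaction functions are passed in as parameters; the final adjust-start-time call of Figure 5 is omitted for brevity:

```python
def hbptm(tp_pda, s_init, s_goal, apply, heuristic, plan, satisfies):
    """Sketch of Figure 5: apply all actions of TPpda, keep the prefix
    leading to the lowest-heuristic state, and plan for the rest."""
    s_cur, tp_hbpt, sl = s_init, [], [s_init]
    for a in tp_pda:
        s_cur = apply(s_cur, a)
        sl.append(s_cur)
        tp_hbpt.append(a)
    # Smin: the state in SL with the lowest heuristic value.
    k = min(range(len(sl)), key=lambda i: heuristic(sl[i]))
    s_min = sl[k]
    tp_hbpt = tp_hbpt[:k]  # retain only the actions used to reach Smin
    if not satisfies(s_min, s_goal):
        # The merger appends the plan from Smin to Sgoal (Sapa's role).
        tp_hbpt = tp_hbpt + plan(s_min, s_goal)
    return tp_hbpt
```

In the toy run below, propositions are set elements and each "action" simply adds its own name to the state; the stand-in planner returns the missing goal propositions as actions.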
4 Relaxed Temporal Planning Graph
The Heuristic-Based Plan Trimmer in the Temporal Plan Adaptor uses the Relaxed Temporal Planning Graph described in [Do and Kambhampati, 2003] to compute the heuristic values of the states. This section briefly describes how the heuristic value of a given state is estimated using the RTPG. The RTPG is a temporal planning graph in which the delete effects and resource effects of the actions are ignored. A temporal planning graph [Smith and Weld, 1999] is a bi-level graph with one level containing all the facts and the other level containing all the actions in the problem. Each fact is linked to all the actions supporting it, and each action is linked to all the facts belonging to its preconditions and effects. An event queue is maintained for the RTPG to support the delayed effects of the actions. Each action a has an execution cost Cexec(a) associated with it. Let C(a, t) be the cost incurred to make possible the execution of action a at time t. Similarly, for each fact f, C(f, t) is the cost incurred to achieve f at time t. C(a, t) and C(f, t) are calculated as follows:
C(a, t) = Σ C(f, t), ∀ f ∈ preconditions(a)
C(f, t) = C(a, t) + Cexec(a), where f is one of the effects of a
As the graph is being built, the costs C(f, t) and C(a, t) are updated whenever a better cost is found. The change in cost is propagated to all other facts and actions. The RTPG is built until there are no events in the event queue that can decrease the cost of any fact. A relaxed plan is extracted from the RTPG, and the total execution cost of the actions in this relaxed plan gives the heuristic value.
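The cost propagation can be sketched as a fixpoint computation. For brevity this sketch drops the time index t and the event queue, propagating costs over the relaxed (delete-free) problem until no fact cost improves:

```python
def rtpg_costs(actions, init_facts):
    """Relaxed cost propagation: C(a) = sum of its precondition costs,
    C(f) = C(a) + Cexec(a) for each effect f, kept whenever a better
    (lower) cost is found, until a fixpoint is reached.
    `actions` maps a name to (preconditions, effects, exec_cost)."""
    cost = {f: 0.0 for f in init_facts}  # facts in the state cost nothing
    changed = True
    while changed:
        changed = False
        for pre, eff, c_exec in actions.values():
            if all(f in cost for f in pre):      # action is reachable
                c_a = sum(cost[f] for f in pre)  # C(a) = sum C(f)
                for f in eff:
                    c_f = c_a + c_exec           # C(f) = C(a) + Cexec(a)
                    if f not in cost or c_f < cost[f]:
                        cost[f] = c_f            # better cost found
                        changed = True
    return cost
```

For example, with a1: p→q (cost 2), a2: q→r (cost 1) and a3: p→r (cost 5), fact r is assigned cost 3 via the cheaper chain a1, a2 rather than cost 5 via a3.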
5 Experimental Results
The experimental setup to evaluate the performance of TPA is shown in Figure 6. Sapa generates a plan p for a given problem pbm in a particular domain. The Similar Problems Generator generates a set of problems, sim-pbm, similar to pbm. The problems in sim-pbm are solved by both TPA and Sapa. The original plan p of pbm is given as input to TPA for adaptation. Let Psapa and Ptpa be the sets of plans generated by Sapa and TPA respectively. The time taken to generate the plans in Psapa is compared with the time taken to generate the plans in Ptpa. In addition, the makespans of the plans are compared. The makespan of a plan is its overall duration, i.e., the time taken to complete the execution of the plan.
Figure 6 Experimental Setup
Similar problems are generated by applying one or more of the following methods:
• By changing the values of the functions in the planning problem that represent the resource values and the durations of the actions.
• By adding new objects and goals to the planning problem.
• By removing existing objects and goals.
The second and third methods also involve modifying existing goals. Changing the value of a function changes the duration of an action either directly or indirectly. For example, in the Satellite domain the slew-time function represents the time taken by the satellite to turn from one direction to another; changing the value of this function directly changes the duration of the action turn_to. In contrast, changing the value of the distance function in the Zeno Travel domain indirectly changes the duration of the action fly, i.e., the time taken to fly an aircraft from one city to another, because the duration of fly is found by dividing the distance between the two cities by the aircraft's speed.
The experimental results are shown in Figure 7 for three domains: Depots, Satellite and Zeno Travel. The graphs on the left compare the time taken by TPA and Sapa to generate plans. The graphs on the right compare the makespans of the plans generated by TPA and Sapa. The Depots domain consists of pallets, crates, trucks, depots, distributors and hoists. Depots and distributors are locations. Pallets hold the crates at any location. Typical goals involve transporting crates from one location to another. The Satellite domain consists of satellites, each equipped with different instruments. Goals involve taking images of different directions with these instruments. Images need to be taken in different modes, and certain instruments support certain modes of imaging. The Zeno Travel domain consists of persons, aircraft and cities. Goals involve transporting persons from one city to another using the aircraft.
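The indirect duration change described above for the fly action can be sketched as follows; the city names, distance and speed values are illustrative, not taken from the benchmark problem files:

```python
# Hypothetical Zeno Travel values: changing the distance function
# indirectly changes the duration of fly, since
#   dur(fly) = distance(c1, c2) / speed(aircraft)
distance = {("city0", "city1"): 600.0}
speed = {"plane1": 300.0}

def fly_duration(aircraft, c1, c2):
    # Duration is derived from the functions, not stated directly.
    return distance[(c1, c2)] / speed[aircraft]
```

Doubling the distance entry for a city pair doubles the duration of every fly action between those cities, which is what the plan duration adjustor must then propagate through the plan.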
The experimental results show that the time taken by TPA to generate plans is much less than the time taken by Sapa. The black line highlights those problems on which Sapa timed out, i.e., Sapa did not find a solution within the given time limit of 3600 seconds. Sapa timed out on 2, 1 and 11 problems in the Depots, Satellite and Zeno Travel domains respectively. In most of the problems, the makespans of the TPA plans are almost the same as the makespans of the Sapa plans. In some cases, the makespans of the Sapa plans are much lower than the makespans of the TPA plans.
[Figure 7: six plots comparing SAPA and TPA on problems similar to depots-pbm, sat-pbm and zeno-pbm; left column: time taken in seconds (log scale), right column: makespan.]
Figure 7 Experimental results in Depots, Satellite and Zeno Travel domains. The plots on the left show planning times. Plots on the right show makespans of the plans generated.
6 Conclusions
Including temporal and resource constraints has taken planners one step closer to real-world problems. This paper explored the process of plan adaptation for domains with durative actions and resource constraints. Plan adaptation is crucial for reusing retrieved plans and for repairing plans in a changing environment. We described the Temporal Plan Adaptor, a domain-independent system to adapt plans with durative actions. The approach is to salvage as much of the older plan as possible and to fill in the rest by planning from first principles. The experimental results show that generating a temporal plan by adapting an existing temporal plan using TPA can be much faster than generating a temporal plan from scratch. TPA also adapts plans with resource constraints in addition to temporal constraints. Although there is no special step for resources during the adaptation process, they are handled when checking the applicability of actions to a state. Future work may involve improving the efficiency of the algorithm by including a step that handles the resource constraints during the adaptation phase.
References
[Au et al., 2002] Tsz-Chiu Au, Hector Muñoz-Avila and Dana S. Nau. On the complexity of plan adaptation by derivational analogy in a universal classical planning framework. In Proceedings of ECCBR, pages 13-27, 2002.
[Blum and Furst, 1997] Avrim L. Blum and Merrick L. Furst. Fast planning through planning graph analysis. Artificial Intelligence, 90:281-300, 1997.
[Bylander, 1994] Tom Bylander. The computational complexity of propositional STRIPS planning. Artificial Intelligence, 69:161-204, 1994.
[Do and Kambhampati, 2003] Minh B. Do and Subbarao Kambhampati. Sapa: a scalable multi-objective metric temporal planner. JAIR, 20:155-194, 2003.
[Fikes and Nilsson, 1971] Richard E. Fikes and Nils J. Nilsson. STRIPS: a new approach to the application of theorem proving to problem solving. Artificial Intelligence, 2:189-208, 1971.
[Fox and Long, 2003] Maria Fox and Derek Long. PDDL2.1: an extension to PDDL for expressing temporal planning domains. JAIR, 20:61-124, 2003.
[Francis and Ram, 1995] Anthony G. Francis, Jr. and Ashwin Ram. A domain-independent algorithm for multi-plan adaptation and merging in least-commitment planners. In D. Aha and A. Ram (Eds.), AAAI Fall Symposium: Adaptation of Knowledge for Reuse, Menlo Park, CA. AAAI Press, 1995.
[Gerevini and Serina, 2000] Alfonso Gerevini and Ivan Serina. Fast plan adaptation through planning graphs: local and systematic search techniques. In Proceedings of the 5th International Conference on Artificial Intelligence Planning Systems (AIPS-00), AAAI Press, 2000.
[Hanks and Weld, 1995] Steve Hanks and Daniel S. Weld. A domain-independent algorithm for plan adaptation. JAIR, 2:319-360, 1995.
[Hoffmann and Nebel, 2001] Jörg Hoffmann and Bernhard Nebel. The FF planning system: fast plan generation through heuristic search. JAIR, 14:253-302, 2001.
[Kumashi and Khemani, 2002] Praveen K. Kumashi and Deepak Khemani. State space regression planning using forward heuristic construction mechanism. In Proceedings of the International Conference on Knowledge Based Computer Systems, pages 489-499, 2002.
[Leake, 1996] David Leake. Case-Based Reasoning: Experiences, Lessons and Future Directions. AAAI Press/The MIT Press, 1996.
[Smith and Weld, 1999] David E. Smith and Daniel S. Weld. Temporal planning with mutual exclusion reasoning. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 1999.