2009 11th IEEE International Conference on High Performance Computing and Communications

A Software Framework to Support Adaptive Applications in Distributed/Parallel Computing

Hao Liu, Amril Nazir, Søren-Aksel Sørensen
Department of Computer Science, University College London, United Kingdom
{h.liu, a.nazir, s.sorensen}@cs.ucl.ac.uk

Abstract— For traditional distributed/parallel applications, resource allocation is performed statically before execution is launched. This limitation can cause long resource waiting times when multiple resources must be co-allocated. In contrast, Adaptive Distributed/Parallel Applications (ADAs), which allow resources to be added and released during execution, can adapt to the dynamic nature of common distributed computing environments (e.g. the Grid). The goal of this work is to create a tool that allows users to easily develop and run ADAs without dealing with the underlying distributed resource environment. We introduce a novel software package, the Application Agent (AA), to support the execution of ADAs, including automatic resource allocation, dynamic process deployment, and wide-area process communication. An AA-enabled application can be started on any Internet-connected machine, and the AA will dynamically configure a virtual machine, from the local machine to remote available machines, to satisfy the execution. The AA is composed of two parts. The first part is a library of AA interface routines, which contains user-callable functions for developers to integrate their applications with the AA. The second part is the daemons, which dynamically collect computational resources on the Internet to create a wide-area virtual machine to execute an application.

Index Terms— user-oriented software, Adaptive Distributed/Parallel Applications (ADA), resource allocation, process deployment, agents, wide-area distributed computing.

I. INTRODUCTION

As a result of decreasing workstation costs and the advent of high-speed networks, it has become feasible to use multiple geographically distributed compute resources to solve complex computational problems; this is called distributed/parallel computing. A distributed/parallel application is normally composed of a number of processes that run simultaneously on multiple computational resources with more or less communication among them. To execute such applications, many Distributed Resource Management systems (DRMs) have been developed to take charge of resource discovery and allocation (e.g. Condor [1], SGE [2], and Globus [3]) (see footnote 1). These middleware systems allocate resources for applications according to job descriptions provided in advance (e.g. ClassAd [1], the Job Submission Description Language (JSDL) [4]), deploy application processes in collaboration with communication mechanisms, and start applications on behalf of users.

Footnote 1: A DRM typically manages a cluster of resources and is responsible for allocating jobs to those resources using a resource allocation/scheduling policy that may take into account resource availability, user priorities, job execution time, etc.

A large fraction of parallel applications today are loosely synchronous, i.e. computations on one process continually depend on data produced by other processes. For those applications, resource allocation is normally performed statically, prior to mapping data and launching the execution; the allocated resource environment cannot be changed during the execution. Previous research developed various technologies, including resource co-allocation [5], [6] and Advanced Reservation [5], [7], to ensure that all processes are deployed on qualified resources before the application execution proceeds. One problem with this approach is that the application has to wait for all the required resources to become available; otherwise it cannot start execution. Although Advanced Reservation somewhat reduces the wasted time, it is still not a promising approach in common distributed environments like the Grid, where resources are heterogeneous and dynamically changing [8].

A. Adaptive Distributed/Parallel Applications (ADA)

With adaptive applications, resource allocation can be performed dynamically during the execution rather than statically prior to it. For example, using the batch processing approach, the jobs into which an application is broken down can be scheduled individually: some jobs can start immediately and others later, according to the local DRM's global scheduling. The Master-Worker paradigm [9] is one paradigm that provides such dynamic resource allocation. In Master-Worker, a master program dispatches pieces of a task to several worker programs, which compute the tasks, send the results to the master, and request new pieces of work (and so on). The number of workers can dynamically adapt to the resources discovered by DRMs: the computation can expand, with more workers added when resources become available, or shrink, with workers dismissed when some resources fail. The batch processing and Master-Worker paradigms are suitable for embarrassingly parallel problems that have no or low task dependence. However, for most parallel applications, resource re-allocation during the execution requires load balancing delivered by process checkpointing, communication preservation, and process migration [10], [11]. In particular, the stop/migration process can involve large data transfers, and restarting the application can incur expensive startup costs. This is perhaps the reason why dynamic resource allocation is not commonly accepted for applications with heavy communication among processes.

Fig. 1. Distributed automata graph: automata objects are distributed into "herder" processes.

Another paradigm, distributed automata graphs [12], [13] (Figure 1), is a flexible extension of standard cellular automata (CA) that provides an easy and efficient way to migrate computation loads among computing resources for such applications. In this object-oriented paradigm, users simply create an object interface and connect it to the object through a pipeline that passes a representation of the interface call to the object. Each object performs a computational task for the application. This simple mechanism allows users to distribute the automata objects across any collection of resources. An automaton calls the interface of the object it wants to communicate with in the normal fashion, and the call takes the form of a message passed to the target object. If the two objects are part of the same process, the transfer is internal to the process; otherwise it uses some form of message passing service. The main advantage is that communication between objects uses reference addresses only, not physical addresses. By this means, the automata objects can easily be moved among the "herder" processes residing on resources without addressing kernel-level process checkpointing/migration problems. This allows a resource to be added to the application during the execution: the application simply spawns another herder process on the added resource and then moves some automata objects onto the empty process for load balancing (see the sketch at the end of this subsection). Papers [14], [15] introduce similar ideas to support dynamic application reconfiguration based on object migration.

We generalize the above applications as Adaptive Distributed/Parallel Applications (ADAs). An ADA is defined as a distributed/parallel application that is able to add or release processes/resources at any time during the execution, and that autonomously balances the computational load over its processes to adapt to the current resource configuration (i.e. it is reconfigurable). An ADA can be implemented by various approaches, such as batch processing (with jobs scheduled by a control program rather than the DRM), Master-Worker, distributed automata graphs, etc. An ADA is highly flexible about its execution: it can tolerate inadequate resources and allows additional resources to be added on-the-fly to meet certain performance requirements. It perfectly suits the dynamic nature of distributed computing environments, where resources become available and unavailable frequently. Furthermore, an ADA can start immediately as long as there is one resource available. ADAs are especially suitable for interactive applications whose results must be seen instantly by users.
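To make the paradigm concrete, here is a minimal C++ sketch of reference-addressed communication between herder processes. All names here (Automaton, Herder, MessageService) are our own illustration, not part of any framework discussed in this paper.

    #include <map>
    #include <string>

    // Hypothetical automaton object, addressed by a reference id only.
    struct Automaton {
        int id;
        virtual void OnMessage(const std::string& payload) { /* compute */ }
        virtual ~Automaton() {}
    };

    // Hypothetical wide-area transport (some message passing service).
    struct MessageService {
        void Forward(int targetId, const std::string& payload) { /* remote send */ }
    };

    // A "herder" process hosts automata and routes calls by reference.
    class Herder {
        std::map<int, Automaton*> local_;   // objects currently hosted here
        MessageService net_;
    public:
        void Adopt(Automaton* a) { local_[a->id] = a; }   // object migrates in
        void Evict(int id) { local_.erase(id); }          // object migrates out
        // A call on a reference is internal if the target is local; otherwise
        // it becomes a message to whichever herder hosts the target.
        void Call(int targetId, const std::string& payload) {
            std::map<int, Automaton*>::iterator it = local_.find(targetId);
            if (it != local_.end()) it->second->OnMessage(payload);
            else net_.Forward(targetId, payload);
        }
    };

Because senders hold only reference addresses, moving an automaton between herders updates hosting tables rather than the callers, which is what makes on-the-fly resource addition cheap.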

B. Objectives

Providing dynamic resource allocation support for ADAs is a challenge for current distributed computing systems. To our knowledge, there is no well-defined interface that lets ADAs communicate with the resource infrastructure to add/release resources at run-time. In addition, most DRMs only provide global scheduling; they do not support dynamic application-level resource co-allocation and deployment. Once execution has started, they treat later-added processes as independent jobs and do not deploy them so that they can communicate with the previously added processes of an application. Current related research provides very limited functionality to support ADAs in specific environments. PVM [16] is a software package that permits a heterogeneous collection of computers hooked together by a network to be used as a single virtual machine. It provides features to support flexible process management and dynamic virtual machine configuration. Applications developed with PVM can dynamically add hosts from dedicated host tables, which must be configured by users. The ePVM [17] architecture is a PVM extension that enables multi-domain computing for PVM applications; however, it also requires manual resource configuration in advance and does not work with common DRMs. Condor-PVM [1] provides dynamic resource management for PVM applications executing in a Condor cluster. Whenever a PVM program asks for nodes, the request is re-mapped to Condor, which then finds a machine in the Condor pool via the usual mechanisms and adds it to the PVM virtual machine. Condor provides this support only to PVM applications based on the Master-Worker programming paradigm with low task dependence. It is limited to single-domain computing and is restricted to using Condor as the job submitter and scheduler.

To overcome these limitations, what we need is user-oriented software that makes the development and execution of ADAs as easy as possible. Application developers should focus only on application functionality rather than dealing explicitly with the underlying resource management issues. The execution should be immediately available to end users on any machine, as long as the machine is connected to the Internet: when the application is invoked, the software acts like an agent, automatically finding resources for the application and deploying the processes on its behalf. All the application needs to do is balance the computational load (i.e., objects) according to the current resource configuration given by the software. The execution can start on the user's desktop and expand gradually to remote computational resources as usage grows, spreading out to a wide-area distributed environment composed of several clusters/domains to meet the performance requirements. The biggest advantage of this computing model is that all the computing and resource issues are totally transparent to users, who can just sit in front of their desktops to monitor and view their results.

To achieve the above objectives, we developed a software framework, the Application Agent (AA), to support ADA development and execution.



The AA is composed of two parts. The first part is a library of AA interface routines (the API), which enables applications to be integrated with the AA. The second part is the daemons that reside on the machines making up the execution environments for ADAs.

II. APPLICATION PROGRAMMING INTERFACE

As introduced, an ADA is composed of a number of processes that are added on-the-fly. We simply make each process run on one resource, so a resource addition means adding a new resource and spawning a new process on it, with the possibility of later computation migration. To make resources transparent to applications, we let applications deal only with their process management: the application sends requests to add processes, and the AA discovers resources, adds the processes on the allocated resources, and subsequently establishes communication links with the previously added processes. The application continues to run throughout. Figure 2 shows an overview of how the AA works for an application, in collaboration with DRMs. The AA resides between applications and resource infrastructures to support such transparent addition/release of resources. Each AA takes charge of one application.

Application developers must build applications on the programming interface provided by the AA, and each process's binary has to be compiled with the AA library. Developers can perform three basic operations through the AA API: add processes, stop processes, and exchange messages among processes. This simple feature set makes the AA very easy to use. The AA allows application developers to start a named process at any time during the execution. The process request is performed with AddProcess( Name of executable ), which is a non-blocking function: as soon as it is invoked, the AA requests a resource from the external DRMs, and the application can keep running. Since in a non-dedicated environment the time it takes for a resource to be allocated is not bounded, developers must embed GetNotification() with the PROCESS ADDED tag into the execution iteration to keep checking whether the process addition has completed. If a process has been successfully added on a new resource, GetNotification() returns true, and the notification message can be unpacked with Upk*(); its first datum is the Unique Process ID (UPID), which is used to identify the added process for communication.
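As an illustration, the request/notification pattern just described might be used as follows. This is a sketch only: the aa handle and the header are assumed, and the notification tag is written as in the text (the full program in Section V uses PROCESS_DEPLOYED_NOTIFICATION).

    // Sketch only: the aa handle, header name, and tag constant are assumed,
    // following the call names used in the text.
    #include "aa.h"   // hypothetical AA library header

    void grow_by_one(AA* aa) {
        aa->AddProcess("Slave");             // non-blocking resource/process request
        while (true) {
            // Poll inside the normal execution loop; allocation time is unbounded.
            if (aa->GetNotification(PROCESS_ADDED) > 0) {
                int upid = aa->UpkInteger(); // first datum: the new process's UPID
                // ... record upid and rebalance load onto the new process ...
                break;
            }
            // ... keep computing while waiting ...
        }
    }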


Fig. 2. An overview of the AA: the AA performs resource allocation, process deployment, and process communication on behalf of an application.


Fig. 3. AA daemons build a wide-area virtual machine containing 9 hosts from different environments. The numbers indicate how the virtual machine grows.

Applications can stop a process at any time by calling StopProcess( upid ). This causes the process with the given UPID to be removed from the application and the related host to be disassociated. The host may still be kept in the AA's Resource Buffer and returned to the application later, when a similar resource request is placed [13]. To send a message, a send buffer must first be initialized by a call to NewSendBuffer(). The message must then be "packed" into this buffer using any number and combination of Pk*() functions. The completed message is sent to another process by calling Send(upid). A message is received by calling either a blocking or a non-blocking receive routine and then unpacking each of the packed items from the receive buffer with Upk*(). Section V introduces a simple ADA developed on the AA API.
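The messaging interface can be used as in the sketch below. It is illustrative only: the aa handle and the PkInteger()/PkDouble()/UpkDouble() variants are assumptions extrapolated from the Pk*()/Upk*() families named above, while NReceiveFrom() and WILDCARD are taken from the program in Section V.

    // Sender side: initialize a buffer, pack items, send to a UPID.
    void send_work(AA* aa, int upid, int block_id, double magnification) {
        aa->NewSendBuffer();
        aa->PkInteger(block_id);        // assumed Pk*() variant for integers
        aa->PkDouble(magnification);    // assumed Pk*() variant for doubles
        aa->Send(upid);                 // routed via LComm or PMWComm by domain ID
    }

    // Receiver side: non-blocking receive from any process, unpack in order.
    bool poll_result(AA* aa) {
        if (aa->NReceiveFrom(WILDCARD) > 0) {   // WILDCARD as used in Section V
            int block_id = aa->UpkInteger();    // unpack in the order packed
            double value = aa->UpkDouble();     // assumed Upk*() variant
            // ... consume block_id and value ...
            return true;
        }
        return false;
    }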

III. ARCHITECTURE

An ADA built with the AA library can easily be started on any machine on the Internet, and the AA virtual machine (see footnote 2) automatically grows from 1 host (the computer where the ADA is invoked) to whatever size is necessary for the application's function. This is achieved by the AA daemons. The AA has three types of daemons, LComm, PMWComm, and the Resource Discoverer (RD), which respectively take charge of local communication, process management and wide-area communication, and resource acquisition. Figure 3 depicts a wide-area virtual machine created by AA daemons. In the following we introduce these daemons in detail.

Footnote 2: The virtual machine is the execution environment for the ADA: a software abstraction of a distributed computing platform consisting of a set of programs which together supply the services required to execute an application as if it were on one computer.

A. LComm and UPID

Each running process is associated with an LComm daemon, whose main function is message passing inside a domain. There are many existing message passing mechanisms that work within a local domain, such as PVM and MPI [18].

Such mechanisms use process identifiers to address processes for communication; an MPI system, for example, assigns each process an integer rank beginning at 0. In the AA system, we use the Unique Process ID (UPID) to address processes (see footnote 3). The UPID is made to fit into the largest integer data type (32 bits) available on a wide range of machines, and its design follows the requirements of wide-area, cross-domain communication. It contains two important pieces of information about a process: the domain ID, which locates the domain the process belongs to, and the native process ID, which is meaningful within that local domain. When an application performs Send(upid), the AA uses only LComm for the message passing if the sender and the receiver have the same domain ID; otherwise it uses PMWComm to route the message to the destination domain. The LComm daemon can borrow the communication functions provided by PVM or MPI: the AA library first translates a UPID into a PVM/MPI-recognizable identifier (mapping the UPID to the related tid or rank) and then uses the PVM/MPI library to accomplish the communication. The LComm daemon for the first process is normally started with the invocation of that process, when the user starts the application; the other LComm daemons are started by PMWComm daemons when the related processes are deployed.

Footnote 3: The UPID is so called because it is globally unique: no two processes share the same ID even if they belong to different applications. This is implemented by adding an application ID segment. The feature enables intercommunication between applications, which is part of our future work.
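The paper does not give the exact bit layout of the UPID, so the sketch below assumes one: the high 8 bits hold the domain ID and the low 24 bits hold the native process ID. The field widths are illustrative, not the AA's actual encoding.

    #include <cstdint>

    // Assumed layout: high 8 bits = domain ID, low 24 bits = native process ID.
    const uint32_t kDomainBits = 8;
    const uint32_t kNativeBits = 32 - kDomainBits;
    const uint32_t kNativeMask = (1u << kNativeBits) - 1;

    uint32_t MakeUpid(uint32_t domainId, uint32_t nativeId) {
        return (domainId << kNativeBits) | (nativeId & kNativeMask);
    }
    uint32_t DomainOf(uint32_t upid) { return upid >> kNativeBits; }
    uint32_t NativeOf(uint32_t upid) { return upid & kNativeMask; }

    // Routing decision as described in the text: same domain ID -> LComm,
    // different domain ID -> PMWComm.
    bool IsLocal(uint32_t senderUpid, uint32_t receiverUpid) {
        return DomainOf(senderUpid) == DomainOf(receiverUpid);
    }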

B. PMWComm

A domain is usually configured with only one IP-visible frontend machine that hides all internal hosts from the external world to prevent unauthorized access [19], [20]. To leverage hosts on the Internet to build wide-area virtual machines, we must collect machines from different domains. We therefore introduce the PMWComm, a daemon that resides on the frontend of a domain and acts like a proxy in charge of cross-domain computation. Each involved domain is associated with one PMWComm daemon. It controls all the process management inside the domain and also routes messages and AA-embedded requests to enable cross-domain communication. Process management includes deploying new processes and killing unwanted processes on behalf of applications. When a process of an ADA requests to add a new process via AddProcess(), the AA library first interprets this request as a resource request and passes it to the RD daemon, which looks for a resource. Since the found resource may belong to a domain different from the requestor's, the process deployment may involve more than one PMWComm. Once a qualified resource is found, the RD sends the resource information, including its IP address, together with the original process addition request, to the PMWComm associated with that resource. That PMWComm first pulls the process's binary to the destination resource and starts the intended process remotely. Once the process has started successfully, the PMWComm assigns the new process a UPID that includes the domain ID.

Fig. 4. A new process is added in a different domain for process P1, performed by two PMWComms. 1: request a resource. 2: delegate the process addition request to the PMWComm associated with the allocated resource. 3: remote process start. 4: pass the new process's information to the PMWComm associated with the requestor P1. 5: notify P1 that the new process has been added.

Finally, the PMWComm reports the new process's information to the requestor. It does so by sending the information to the requestor's associated PMWComm, which passes it back to the requestor when GetNotification() is activated (Figure 4). When an application requests to kill a process, the AA library delegates the request to the PMWComm associated with the destination process, which finally accomplishes the process release.

PMWComms are spawned dynamically by the RD daemon when it discovers resources in the related domains. Each PMWComm is assigned a domain ID, starting from 0. When a new PMWComm is spawned, its information (a mapping from domain ID to contact details, e.g. an IP address) is added to the domain table, which is synchronized across all the AA daemons. If a communication is cross-domain (the sender and the receiver have different domain IDs), the sender first extracts the domain ID from the receiver's UPID and then, according to the domain table, contacts the receiver's associated PMWComm and asks it to route the message to the receiver's LComm, which finally accomplishes the message passing.
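The routing decision just described can be sketched as follows, with a std::map standing in for the synchronized domain table; the PMWCommProxy type and its Route() call are hypothetical stand-ins for the XML-RPC interactions described in Section IV.

    #include <cstdint>
    #include <map>
    #include <string>

    // Assumed layout from the earlier UPID sketch: high 8 bits = domain ID.
    static uint32_t DomainOf(uint32_t upid) { return upid >> 24; }

    // Hypothetical stand-in for a remote PMWComm (XML-RPC in the real system).
    struct PMWCommProxy {
        std::string address;                      // frontend contact details
        void Route(uint32_t upid, const std::string& msg) { /* forward remotely */ }
    };

    // Synchronized across all AA daemons: domain ID -> PMWComm contact.
    typedef std::map<uint32_t, PMWCommProxy> DomainTable;

    void Deliver(DomainTable& table, uint32_t myDomain,
                 uint32_t receiverUpid, const std::string& msg) {
        uint32_t target = DomainOf(receiverUpid);
        if (target == myDomain) {
            // Same domain: hand the message to the local LComm (PVM/MPI).
        } else {
            // Cross-domain: ask the receiver's PMWComm to route it to the
            // receiver's LComm, per the domain table.
            table[target].Route(receiverUpid, msg);
        }
    }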

C. Multiple PMWComms and Domain Tree

Since some domains contain a number of subdomains, each of which has a separate firewall, multiple PMWComms may need to be placed to reach the computing hosts. For example, the HPC cluster in the Computer Science department of UCL is configured behind network address translation (NAT); to access the cluster hosts from a machine outside the CS department, PMWComms have to be placed on the frontend of the CS department and also on the frontend of the HPC cluster. One of our objectives is to enable an application to start on any machine on the Internet with the resource environment configured automatically. This requires the AA to find the path to the actual resource domain and place PMWComms on the relevant frontends, regardless of where the application starts. We therefore introduce the Domain Tree (DT) (Figure 5), which stores the topology of the domain connections. Each node in the DT is a frontend address representing a domain. The root node is PUBLIC, which represents the public Internet domain that everybody is authorized to access. A node may have one or more child nodes, meaning that the related domain has one or more subdomains.

Fig. 5. A Domain Tree. The root node PUBLIC has children amy.cs.ucl.ac.uk (the UCL CS domain) and gateway.ic.ac.uk (the IC domain); amy.cs.ucl.ac.uk in turn has children morecambe.cs.ucl.ac.uk (the HPC domain) and condor.cs.ucl.ac.uk (the Condor domain).

Based on the DT, a simple algorithm (Algorithm 1) is introduced to find the route connecting hosts in any two domains.

Algorithm 1: Finding the route.
Assume: firewalls only restrict incoming traffic.
Let: the start host belong to domain A with frontend α, and the destination host belong to domain B with frontend β.
Aim: find the path ρ from the start host to the destination host.
1. In the DT, find the mutual ancestor node γ of α and β.
2. Compute δ = β.depth − γ.depth.
3. ρ = {β.parent^(δ−1), β.parent^(δ−2), ..., β.parent, β}.
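Algorithm 1 translates directly into code. The sketch below assumes a parent-pointer tree node; walking both nodes up to equal depth to find the mutual ancestor is a standard technique consistent with the algorithm's use of depths.

    #include <string>
    #include <vector>

    struct DTNode {
        std::string frontend;   // frontend address representing this domain
        DTNode* parent;         // NULL for the PUBLIC root
        int depth;              // PUBLIC has depth 0
    };

    // Returns the frontends on the path rho from the start domain to the
    // destination domain, per Algorithm 1 (only incoming traffic is firewalled).
    std::vector<std::string> FindRoute(DTNode* alpha, DTNode* beta) {
        // Step 1: mutual ancestor gamma of alpha and beta.
        DTNode* a = alpha;
        DTNode* b = beta;
        while (a->depth > b->depth) a = a->parent;
        while (b->depth > a->depth) b = b->parent;
        while (a != b) { a = a->parent; b = b->parent; }
        DTNode* gamma = a;
        // Step 2: delta = beta.depth - gamma.depth.
        int delta = beta->depth - gamma->depth;
        // Step 3: rho = {beta.parent^(delta-1), ..., beta.parent, beta}.
        std::vector<std::string> rho(delta);
        DTNode* n = beta;
        for (int i = delta - 1; i >= 0; --i) {
            rho[i] = n->frontend;
            n = n->parent;
        }
        return rho;
    }

For the Figure 5 example discussed next (IC domain to HPC domain), FindRoute() returns {amy.cs.ucl.ac.uk, morecambe.cs.ucl.ac.uk}.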

Taking the DT in Figure 5 as an example, suppose we start an application in the IC domain and the AA discovers a resource in the HPC domain. According to the algorithm, the AA finds the path {amy.cs.ucl.ac.uk, morecambe.cs.ucl.ac.uk} for connecting to the hosts from the IC domain. It therefore places two PMWComms on the frontends in this path. Message passing from the IC domain to the HPC domain is then achieved by requestor → PMWComm on amy → PMWComm on morecambe → receiver. Note that message passing from the HPC domain back to the IC domain needs the involvement of other PMWComms. The DT has to be configured by resource providers when they register resource pools with the AA.

D. Resource Discoverer (RD)

The RD daemon takes charge of resource discovery and allocation for the AA virtual machine. It listens for resource demands from the other daemons and contacts available DRMs to request hosts. A resource request can be made on-demand, after a process addition request has been placed, or intelligently in advance, to hide the resource allocation delay [13]. The RD does not submit the actual application processes to DRMs.

Instead, it only requires permission to use certain hosts (the actual process startup is performed by the PMWComm, as introduced). Current DRMs (e.g. Condor, SGE) perform resource allocation together with job deployment but do not issue resource permits to third-party software. We therefore propose submitting a special "probe" job to DRMs to simulate the permit acquisition. A probe is a small agent that performs no computation for the application. It is submitted by the RD to a DRM, together with the resource requirements (e.g. processor architecture, CPU load, memory size) of an actual application process, to request a resource. Once the probe is started on a host, it contacts the RD to issue the AA a permit to use this host, along with the host's information. The probe keeps running until the AA kills it, in order to hold the host (in Condor). In this manner, DRMs act only as resource matchmakers and do not perform deployment. In some situations, a process spawning can fail even though a host has been granted, because the granted host becomes unavailable due to a sudden change of resource state (e.g. machine shutdown); this happens when the host information in the AA is outdated by the time the host is returned and the AA attempts to use it. We reduce the possibility of this failure by making each probe continuously send "HEARTBEAT" messages to indicate that the host is still alive. The load of the host is also synchronized with each "HEARTBEAT". The RD is started on the host where the user invokes the application. Sometimes the RD may need PMWComms to communicate with DRMs that stay in local domains; in this case PMWComms are spawned on-demand on the frontends along the connection path, according to the DT algorithm.
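A probe's lifetime can be sketched as follows. The message tags, report format, and heartbeat interval are assumptions; reading /proc reflects the implementation description in Section IV.

    #include <fstream>
    #include <string>
    #include <unistd.h>

    // Hypothetical channel back to the RD daemon (XML-RPC in the real system).
    void SendToRD(const std::string& tag, const std::string& body) { /* ... */ }

    // Read the 1-minute load average from /proc, as the implementation does.
    std::string ReadLoad() {
        std::ifstream f("/proc/loadavg");
        std::string load;
        f >> load;
        return load;
    }

    int main() {
        char host[256];
        gethostname(host, sizeof(host));
        // On startup: issue the AA a permit for this host, with its information.
        SendToRD("PERMIT", std::string(host) + " " + ReadLoad());
        // Then hold the host and keep it observable until the AA kills the probe.
        while (true) {
            sleep(30);                        // interval is an assumption
            SendToRD("HEARTBEAT", ReadLoad());
        }
        return 0;
    }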

IV. IMPLEMENTATION

The current version of the AA is implemented with XML-RPC and PVM. The LComm makes use of the PVM pvmd; the AA library wraps the PVM library to start up and contact pvmd. After application invocation, the AA library starts a local PVM system with pvm_start_pvmd(), and local-domain communication among processes is actually performed by the PVM communication mechanism. The PMWComm is implemented as an XML-RPC server and also as a PVM task in its local domain. A PMWComm talks to external daemons via XML-RPC, and to its internal LComms via the LComm communication protocol, which here is PVM. To add a process in the local domain, the PMWComm first adds a granted host to the local virtual machine with pvm_addhost(), and then spawns the process with pvm_spawn(). To add a process in an external domain, the AA places PMWComms via SSH to connect to the external domain; the PMWComm associated with the destination domain is then contacted to spawn the process on the allocated host with pvm_spawn(), as sketched below.
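The PVM side of process addition can be sketched with the standard PVM 3 calls named above; the host and executable names are placeholders.

    #include <pvm3.h>
    #include <stdio.h>

    /* Add a granted host to the local virtual machine and spawn one process
       on it, as the PMWComm does with pvm_addhost() and pvm_spawn(). */
    int add_process_on(char* granted_host, char* executable) {
        int info, tid;
        char* hosts[1];
        hosts[0] = granted_host;
        if (pvm_addhost(hosts, 1, &info) < 0) {   /* grow the virtual machine */
            fprintf(stderr, "addhost failed for %s\n", granted_host);
            return -1;
        }
        /* Spawn exactly on the granted host (PvmTaskHost). */
        if (pvm_spawn(executable, NULL, PvmTaskHost, granted_host, 1, &tid) != 1) {
            fprintf(stderr, "spawn failed for %s\n", executable);
            return -1;
        }
        return tid;                               /* PVM task id of new process */
    }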


Cross-domain communication is based on XML-RPC and PVM. To send a message to a process belonging to a different domain, the message is packed into a message buffer, which is sent to the related PMWComm through the XML-RPC protocol. The PMWComm then packs the message into a PVM buffer and sends it to the destination process with pvm_send(). The RD is also an XML-RPC server and talks to the other daemons via XML-RPC. It is spawned, via a fork-exec call, on the local machine where the user invokes the application. The RD contacts a DRM through a special program, the RAS (Resource Acquisition Server), which is pre-placed on the submission machines of the DRM (because most DRMs do not support remote submission, e.g. via Web Services). A RAS uses system calls to submit probes for resource allocation: when it receives a request from an RD, it generates a corresponding submission file (e.g. a Condor ClassAd) and uses system() to execute the job submission command, which finally submits the job. Once a host is allocated, the probe gathers the host information (host name, operating system, CPU architecture, and CPU load) by reading the Linux /proc/ directory and sends the information back to the RD daemon. Communication security and file transfers are provided by the standard UNIX environment, including the SSH and RSH protocols.
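A RAS request handler might look roughly like the following. The submit-file keywords and the condor_submit command are standard Condor usage, but the handler itself, its file paths, and the request format are assumptions.

    #include <cstdlib>
    #include <fstream>
    #include <string>

    // Generate a minimal Condor submit file for a probe and submit it with
    // system(), as the RAS does. Paths and requirements are placeholders.
    bool submit_probe(const std::string& requirements) {
        std::ofstream f("probe.submit");
        f << "executable   = probe\n"
          << "universe     = vanilla\n"
          << "requirements = " << requirements << "\n"  // e.g. Arch == "X86_64"
          << "queue 1\n";
        f.close();
        return std::system("condor_submit probe.submit") == 0;
    }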

V. EXPERIMENT

The experiment demonstrates that an AA-enabled ADA is easy to develop, easy to run, and supports dynamic resource addition and wide-area computing. In the experiment, the AA has authority to access resources from two clusters in the Computer Science department of University College London. The department has a firewall and configures only one publicly IP-visible frontend, named amy.cs.ucl.ac.uk, for external machines to access. The first cluster, called the Condor cluster, is composed of 57 hosts, each equipped with an Intel(R) Celeron(R) 2.80GHz CPU and 502MB of RAM, running Red Hat Enterprise Linux. This cluster is a sub-domain of the department; it has a local firewall and a frontend named condor.cs.ucl.ac.uk, and its resource scheduling is managed by Condor. The AA is configured as a normal user in this cluster, which means it has to compete for resources with other users. The second cluster, called the HPC cluster, is composed of 198 hosts, each equipped with a number of Intel(R) Xeon(R) 2.66GHz CPUs (1, 4, or 8) and from 2GB to 30GB of RAM, running GNU Linux. This cluster is configured behind network address translation (NAT) and has a frontend host (morecambe.cs.ucl.ac.uk) that interfaces the cluster to the outside. Its resource scheduling is managed by SGE; the AA uses the default queue all.q to submit resource requests in this cluster.

A. The ADA

We created an ADA, based on the AA library, that draws a Mandelbrot set on a 3000 × 3000 dot canvas with magnification = 1.0. The benchmark iteration count is 500,000. We partition the data into 3000 blocks, each containing 3000 dots to compute. The ADA is composed of a control program (Man_control.cpp), which is started on a desktop, and a number of slave processes, which are started on the remote resources to perform the actual computation. The control program distributes the 3000 blocks to the first process as soon as it is added. More processes can be added during the execution, and the data is autonomously rebalanced to under-loaded processes by a local load-balancing policy embedded in each slave process: each process periodically communicates with its two neighbor processes for load balancing. The maximum number of slave processes is restricted to 20. The control program Man_control.cpp is shown below:

    // Request to add 20 processes.
    for (int i = 0; i < 20; i++) {
        aa->AddProcess("Slave");
    }
    // Distribute blocks to the first added process.
    while (true) {
        if ((aa->GetNotification(PROCESS_DEPLOYED_NOTIFICATION)) > 0) {
            // Get the process's UPID.
            int upid = aa->UpkInteger();
            // Distribute 3000 blocks to the first process.
            ...
            break;
        }
        ...
    }
    ...
    while (true) {
        // More processes are being added for the computation.
        if ((aa->GetNotification(PROCESS_DEPLOYED_NOTIFICATION)) > 0) {
            int upid = aa->UpkInteger();
            // Assign neighbor processes' UPIDs to the added process
            // for local load balancing.
            ...
        }
        // Receive results and draw the Mandelbrot set.
        if (aa->NReceiveFrom(WILDCARD) > 0) {
            ...

B. The ADA's Execution

The ADA was started on a laptop connected to the Internet through a BT home router; the invocation was simply ./Man_control. When the ADA started, the AA automatically placed PMWComms on amy.cs.ucl.ac.uk, condor.cs.ucl.ac.uk, and morecambe.cs.ucl.ac.uk to connect to the resources and add processes for the application. Figure 6 depicts how the AA finds the path to allocate resources and deploy processes in the two clusters. Processes were added dynamically, little by little, due to resource competition from other users. The whole execution eventually leveraged 15 hosts, including 5 from the Condor cluster and 10 from the HPC cluster, and took about 40 minutes. The dynamic resource configuration and execution are shown in Figure 7. The first process was added from the HPC cluster immediately at the beginning; its load at that moment was all 3000 blocks.



Fig. 6. The AA deploys application processes in the remote Condor and HPC clusters.


Fig. 7. Dynamic process addition for the Mandelbrot ADA. "Pink" processes were added in the Condor cluster, and "green" processes were added in the HPC cluster. Load (left blocks) was redistributed dynamically by the local load-balancing policy when more processes were added.

The execution continued with only that one process computing. After about 4 minutes, 3 hosts in the SGE cluster and 1 host in the Condor cluster became available (probably because other users finished their work on them), and the AA deployed 4 processes on them. The load on the first process was immediately redistributed across the 4 processes. During the execution, further processes were gradually added to the computation, and the processes continuously communicated with their neighbors to perform load balancing. The control program drew the Mandelbrot set on-the-fly as it received finished dots, and the drawing speed changed as more processes became involved in the computation. The execution started as soon as we invoked the application. Traditional applications require all the processes to be placed on resources before the execution can proceed (e.g. gmandel, http://gmandel.sourceforge.net/). For example, if users require 10 hosts to run, in this situation they would have to wait at least 25 minutes to get all the hosts synchronized. In contrast, the ADA fully used the resources as they appeared, and users could see the output of the application immediately.

VI. RELATED WORK

There are many programming frameworks that aim to ease the development and execution of parallel/distributed applications. Most of them are intended to hide the complexity of accessing distributed resources from users; however, few of them are based on the concept of an ADA. MW [21] is a usability-focused framework which aims to enable a larger community of users to easily build master-worker applications in Grid environments. MW uses the Condor scheduling system as the resource management tool, and uses either Condor-PVM or MW-File, a file-based remote I/O scheme, for message passing. The Grid Application Framework for Java (GAF4J) [22] abstracts grid semantics from the application logic and provides a simpler programming model for developing Java grid applications. It abstracts the details of interfacing with a Globus-enabled grid infrastructure (GT 2), resulting in a simple programming model that helps the quicker development of maintainable grid client applications. Both MW and GAF4J make the resource issues transparent to application developers. The Grid Application Development Software (GrADS) project [23] aims to develop an ambitious alternative: replacing the discrete, user-controlled stages of preparing and executing a grid application with an end-to-end software-controlled process. The project seeks to provide tools that enable the user to focus only on high-level application design without sacrificing application performance; this is achieved by incorporating application characteristics and requirements into decisions about the application's grid execution. The GrADS architecture incorporates user problem-solving environments, grid compilers, schedulers, performance contracts, performance monitors, and re-schedulers into a seamless tool for application development.

Charm++ [24], [14] is one of the few existing middleware systems that support the concept of an ADA (based on the notion of distributed automata graphs). It is designed with the goal of enhancing programmer productivity by providing a high-level abstraction of a parallel program while at the same time delivering good performance. Programs written in Charm++ are decomposed into a number of cooperating message-driven objects called chares. When a programmer invokes a method on an object, the Charm++ runtime system sends a message to the invoked object, which may reside on the local processor or on a remote processor in a parallel computation (the physical object address is transparent to users). The chares in a program are mapped to physical processors by an adaptive runtime system, and the mapping can be changed dynamically according to the processors' behavior during program execution. Amber [15] is another object-oriented parallel-distributed programming framework supporting application reconfiguration to adapt to changes in the resource environment.


Applications written in Amber are decomposed into objects and threads distributed among logical nodes that span a set of physical nodes. A running parallel-distributed program can respond to three types of node reconfiguration: the node set can shrink in size, stay the same size but change membership, or grow. The Amber runtime system provides the mechanisms for object virtual addressing and object migration, while the application takes charge of the object load-balancing policies that resolve load imbalances caused by growth and shrinkage of the node set.

In contrast to these systems, the AA concentrates on the resource issues and the process-level support for ADAs. It links the application reconfiguration frameworks built for dedicated resources (Charm++ and Amber) with common distributed computing environments (e.g. the Grid). Charm++/Amber and the AA can complement each other's functionality and could be integrated: developers create an ADA based on Charm++/Amber's object migration services and let the AA take care of the runtime resource discovery and deployment issues.

VII. CONCLUSION AND FUTURE WORK

In this paper we introduced a software framework, the AA, that dynamically configures the execution environment for an ADA, which in turn autonomously re-organizes its computational load according to the current resource configuration to optimize performance. At present we simply let the AA provide resources whenever the ADA demands them (via AddProcess()), which is not particularly intelligent. In future work, we propose that the AA should itself decide which resources make up the virtual machine in order to satisfy the performance requirements, while the application only controls the distribution of its computing objects. The AA would then be allowed to add, remove, and replace resources without the application's requests; it would inform the application that resource X will be removed or replaced with resource Y, or that resource Z has been added. This proposal can be realized with simple control theory: the application continuously reports a performance value to the AA, which relies on the reported value to decide how to reconfigure the execution environment so as to meet a performance benchmark required by users (e.g. rendering N frames per second, or running N iterations per second). The application then autonomously transforms its object distribution to adapt to the resource reconfiguration. Due to the flexible nature of ADAs, the AA could try different resource configurations and, learning from the historical reactions of the execution to these configurations, finally tune the execution performance to an optimal level. By this means, the resource selection issue is taken entirely out of the application's hands: the application only takes charge of computing and load balancing, while the AA does everything related to resource management on its behalf. This enables end users to sit in front of their computers and monitor the results with no knowledge of computing and resources. This approach is very different from traditional resource selection approaches (e.g. Condor matchmaking [25] and JSDL [4]), which require users to explicitly provide exact resource requirements, e.g. machine processing ability, the number of machines needed, etc.

REFERENCES

[1] Condor, "Condor online manual, version 7.0," http://www.cs.wisc.edu/condor/manual/v7.0/.
[2] "N1 Grid Engine 6 administration guide," Sun Microsystems, Inc., Tech. Rep.
[3] I. Foster and C. Kesselman, "The Globus toolkit," pp. 259–278, 1999.
[4] A. Anjomshoaa, F. Brisard, M. Drescher et al., "Job Submission Description Language (JSDL) specification, version 1.0," Global Grid Forum, Tech. Rep., 2005.
[5] K. Czajkowski, I. T. Foster, and C. Kesselman, "Resource co-allocation in computational grids," in HPDC, 1999. [Online]. Available: citeseer.ist.psu.edu/czajkowski99resource.html
[6] N. T. Karonis et al., "MPICH-G2: A grid-enabled implementation of the Message Passing Interface." [Online]. Available: citeseer.ist.psu.edu/619632.html
[7] I. Foster, C. Kesselman, C. Lee, B. Lindell, K. Nahrstedt, and A. Roy, "A distributed resource management architecture that supports advance reservations and co-allocation," in Quality of Service, 1999 (IWQoS '99), Seventh International Workshop on, 1999, pp. 27–36.
[8] A. Nazir, H. Liu, and S.-A. Sørensen, "On-demand resource allocation policies for computational steering support in grids," in International Conference on High Performance Computing, Networking and Communication Systems, Orlando, USA, 2007.
[9] J. Goux, J. Linderoth, and M. Yoder, "Metacomputing and the master-worker paradigm," 1999. [Online]. Available: citeseer.ist.psu.edu/goux00metacomputing.html
[10] S. Vadhiyar and J. Dongarra, "SRS - a framework for developing malleable and migratable parallel applications for distributed systems," 2002. [Online]. Available: citeseer.ist.psu.edu/vadhiyar02srs.html
[11] M. Litzkow, T. Tannenbaum, J. Basney, and M. Livny, "Checkpoint and migration of UNIX processes in the Condor distributed processing system," University of Wisconsin-Madison Computer Sciences Department, Tech. Rep. UW-CS-TR-1346, April 1997.
[12] A. Nazir, H. Liu, and S.-A. Sørensen, "Steering dynamic behaviour" (presentation), Open Grid Forum 20, Manchester, UK, 2007.
[13] H. Liu, A. Nazir, and S.-A. Sørensen, "Preliminary resource management for dynamic parallel applications in the grid," in GridNets 2008. LNICST, 2008.
[14] O. S. Lawlor and L. V. Kalé, "Supporting dynamic parallel object arrays," in Proceedings of the ACM 2001 Java Grande/ISCOPE Conference, 2001, pp. 21–29.
[15] M. J. Feeley, B. N. Bershad, J. S. Chase, and H. M. Levy, "Dynamic node reconfiguration in a parallel-distributed environment," in Proceedings of the 1991 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 1991, pp. 114–121.
[16] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. S. Sunderam, PVM: Parallel Virtual Machine: A Users' Guide and Tutorial for Networked Parallel Computing. Cambridge, MA, USA: MIT Press, 1994. [Online]. Available: citeseer.ist.psu.edu/geist94pvm.html
[17] F. Frattolillo, "Running large-scale applications on cluster grids," Int. J. High Perform. Comput. Appl., vol. 19, no. 2, pp. 157–172, 2005.
[18] M. Snir and S. Otto, MPI - The Complete Reference: The MPI Core. Cambridge, MA, USA: MIT Press, 1998.
[19] K. Ingham and S. Forrest, "A history and survey of network firewalls," 2002. [Online]. Available: http://www.cs.unm.edu/~treport/tr/0212/firewall.pdf
[20] M. Petrone and R. Zarrelli, "Enabling PVM to build parallel multidomain virtual machines," in PDP '06: Proceedings of the 14th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. Washington, DC, USA: IEEE Computer Society, 2006, pp. 187–194.
[21] J.-P. Goux, S. Kulkarni, J. Linderoth, and M. Yoder, "An enabling framework for master-worker applications on the computational grid," in HPDC, 2000, pp. 43–50. [Online]. Available: citeseer.ist.psu.edu/goux00enabling.html
[22] I. A. Technology, "Grid application framework technical white paper," Tech. Rep., 2004.
[23] K. Cooper, "New grid scheduling and rescheduling methods in the GrADS project," 2004. [Online]. Available: citeseer.ist.psu.edu/cooper04new.html
[24] "Adapting to load on workstation clusters," IEEE Computer Society Press, 1999.
[25] R. Raman, "Matchmaking frameworks for distributed resource management," Ph.D. dissertation, 2000. Supervisor: Miron Livny.

