
Rapid Application Configuration in Amazon Cloud using Configurable Virtual Appliances

Huan Liu
Accenture Technology Labs
50 W. San Fernando St., Suite 1200
San Jose, CA 95113


ABSTRACT

The Virtual Appliance (VA) promises to dramatically change how software is distributed, installed and configured. Although it simplifies some aspects of the process, it falls short of solving the complete problem. Furthermore, it introduces additional management hassles because of the proliferation of VAs, one for each commonly used scenario. Borrowing the Spring IoC (Inversion of Control) concept from the Java community, we propose a new approach that comprises three components: Configurable VAs (similar to a Java class), separate configuration metadata (similar to a configuration file), and the Rapid Application Configurator (RAC) container (similar to the Spring IoC container). This separation of concerns allows each component to be developed independently and its cost to be effectively amortized. We describe the design and implementation of RAC in the Amazon EC2 environment.

Categories and Subject Descriptors
D.2.9 [Software Engineering]: Management - Software configuration management; K.6.3 [Management of Computing and Information Systems]: Software Management

General Terms
Design, Management

Keywords
IoC, Virtual Appliance, Configuration, Cloud

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SAC’11 March 21-25, 2011, TaiChung, Taiwan. Copyright 2011 ACM 978-1-4503-0113-8/11/03 ...$10.00.

1. INTRODUCTION

Installing and configuring an application correctly is a time-consuming, labor-intensive and complex manual process [5], especially for today’s multi-tier applications (e.g., an SAP application). Although Virtual Appliances (VAs) [18], which are virtual machine images with pre-packaged and pre-installed software components, have addressed some of the challenges of software installation and configuration, two problems remain.

First, VAs cause a proliferation of machine images. Most software has many components to choose from. During a traditional software installation, the installation program prompts the user to choose the components relevant to the particular deployment. However, when VA producers generate a VA, they are not aware of the actual deployment scenario. Since a pre-installed VA can only store one particular combination of software components, the producers have to generate a large number of VAs to accommodate all potential users. Even though exhaustive enumeration may not be necessary, we still need one VA for each popular combination of software components and deployment scenario.

Second, the difficulty of setting the correct configuration parameters remains, in particular the inter-dependencies among the many VAs. In a multi-tier application, a VA has to be aware of the other VAs of the same application. For example, the application server VA needs to know the IP address of the database VA, as well as which database software is used, so that it can adjust its access parameters.

Some prior work [17] attempted to address the problem. It proposed the concept of a Virtual Appliance Network (VAN), which packages several VAs and their corresponding network configuration into a bigger VA. That proposal suffers from two drawbacks. First, it is harder to substitute a VA in the VAN, e.g., to change the database VA from MySQL to Oracle. Second, it lacks the flexibility for power users to customize the configuration beyond the default settings.

In this paper, we propose a different approach to software installation and configuration. Our approach is based on separation of concerns, i.e., the configuration of an application should be separated from the application logic. Today, the configuration is embedded in each VA. When a configuration is changed, the change may have to be applied manually to each VA.
In contrast, we propose to extract the configuration into separate configuration metadata. In addition to centralizing the configuration, this can also reduce the amount of customization required. For example, the IP address of each Virtual Machine (the virtual server instantiated from a VA) can be a variable that is set automatically during deployment, so the user does not have to specify it manually when setting up inter-dependencies. Our proposal not only simplifies application configuration, but also allows application logic and application configuration to evolve independently.

Our proposal borrows ideas from the Java programming community, specifically the IoC (Inversion of Control) container for easy J2EE application deployment. We will first describe the IoC concept, then our proposal, followed by our prototype implementation in the Amazon EC2 environment and our preliminary experience with a couple of applications.

2. SPRING IOC CONTAINER FOR JAVA PROGRAMS

We borrow design ideas from the Java programming community, which faces a similar problem of configuration complexity. Since we will draw heavily on analogies from that context, we first describe the problem there and the associated solution.

Java is an object-oriented programming language. Java programmers focus on developing classes, where each class encapsulates a set of private data and provides a set of public methods to manipulate the data. In general, a class provides a distinct functionality, such as an array implementation. From a class, one can instantiate many objects, where each object holds its own set of data according to the class description. The data can be set differently from one object to another, and an object can behave differently depending on how its data is configured.

A program is typically composed of many objects, which jointly implement the functionality of the program. An object may depend on another object in order to implement certain functionality. As an example, consider the following code:

    public class Sensor {
        Adapter adapter;   // link used to communicate with the host
        public Sensor(Adapter adapter) { this.adapter = adapter; }
    }
    public class SerialAdapter implements Adapter { ...... }
    public class ParallelAdapter implements Adapter { ...... }

This code implements a sensor object which can report some measurements. It needs an Adapter object, which implements the interface and protocol needed to communicate with the host. Depending on the particular setup, the sensor object may communicate through either a Serial adapter or a Parallel adapter.

At the start of a sensor application, a special piece of code, the initialization code, is written to instantiate objects from classes and set up dependencies between objects. For example, the sensor object needs a reference to the Adapter object to communicate with the host. Sample initialization code is shown below.

    public static void main(String[] args) {
        Adapter adapter = new SerialAdapter();
        Sensor sensor = new Sensor(adapter);   // wire the dependency by hand
    }

The initialization code is cumbersome to write and difficult to maintain. When the program configuration needs to change, e.g., to replace the Serial adapter with the Parallel adapter, the initialization code has to be changed and recompiled. Since the initialization code is often buried inside the programming logic, it is time consuming to make any change, even if it is only a simple object replacement. The difficulty in maintaining the initialization code for Java programs is similar to the difficulty in configuring applications: the program configuration is mingled with the programming logic.

To deal with these difficulties, the Spring Framework proposed the IoC (Inversion of Control) concept. Since its inception, the Spring Framework has largely taken over as the de facto standard platform for J2EE development, replacing Enterprise JavaBeans. Much of this success can be attributed to IoC, which greatly simplifies the configuration of a J2EE program. The IoC container (shown in Fig. 1) instantiates objects and applies dependencies between objects, just as a piece of initialization code would do. The user specifies the initialization and configuration through configuration metadata, which can be in several formats, such as an XML-based format, or supplied programmatically. The IoC container reads the configuration metadata and carries out the actions.

Figure 1: Spring IoC

Instead of using initialization code, the instantiation and configuration of the Sensor and Adapter objects can be achieved when the IoC container parses the configuration metadata. In an XML-format example for the sensor program, the first line instantiates a Java object (also called a “Java Bean” in J2EE) from the SerialAdapter class. The second line first instantiates a Java object from the Sensor class, and then populates the object’s data member with a reference to the SerialAdapter object.

With the XML configuration file, it is much easier to reconfigure the program when necessary. For example, if the ParallelAdapter is to be used instead, one can simply change the first line to instantiate a ParallelAdapter object, and then launch the program again without recompiling the program code.

By extracting the initialization and configuration of the program from the programming logic and placing them in a single place, we achieve separation of concerns. This not only allows the program to be reconfigured quickly, but also allows the programming logic and the configuration to evolve independently. Java programmers can focus on developing Plain Old Java Objects (POJOs), while business users can focus on reconfiguring the program for different needs.
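To make the container idea concrete, the following is a minimal sketch of a toy container written for this illustration. It is not the Spring API: the class `TinyIoC`, the bean ids `adapter` and `sensor`, the `CLASSES` registry and the exact shape of the XML are all assumptions, and the Sensor here uses field injection with a no-argument constructor rather than the constructor shown in the hand-written initialization code.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

interface Adapter { String name(); }
class SerialAdapter implements Adapter { public String name() { return "serial"; } }
class ParallelAdapter implements Adapter { public String name() { return "parallel"; } }
class Sensor { Adapter adapter; }   // populated by the container, not by the program

/** Toy container (hypothetical): reads bean definitions and wires the objects. */
class TinyIoC {
    // Hypothetical metadata: one line per object, as in the description above.
    static final String METADATA =
          "<beans>"
        + "<bean id=\"adapter\" class=\"SerialAdapter\"/>"
        + "<bean id=\"sensor\" class=\"Sensor\">"
        +   "<property name=\"adapter\" ref=\"adapter\"/>"
        + "</bean>"
        + "</beans>";

    // Classes the container may instantiate, keyed by the name used in the metadata.
    static final Map<String, Class<?>> CLASSES = new LinkedHashMap<>();
    static {
        CLASSES.put("SerialAdapter", SerialAdapter.class);
        CLASSES.put("ParallelAdapter", ParallelAdapter.class);
        CLASSES.put("Sensor", Sensor.class);
    }

    /** Instantiates every bean in document order, then injects referenced beans into fields. */
    static Map<String, Object> load(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Map<String, Object> beans = new LinkedHashMap<>();
            NodeList beanNodes = root.getElementsByTagName("bean");
            for (int i = 0; i < beanNodes.getLength(); i++) {
                Element bean = (Element) beanNodes.item(i);
                Object obj = CLASSES.get(bean.getAttribute("class"))
                        .getDeclaredConstructor().newInstance();
                NodeList props = bean.getElementsByTagName("property");
                for (int j = 0; j < props.getLength(); j++) {
                    Element p = (Element) props.item(j);
                    Field f = obj.getClass().getDeclaredField(p.getAttribute("name"));
                    f.setAccessible(true);
                    f.set(obj, beans.get(p.getAttribute("ref")));   // resolve the reference
                }
                beans.put(bean.getAttribute("id"), obj);
            }
            return beans;
        } catch (Exception e) {
            throw new IllegalStateException("invalid configuration metadata", e);
        }
    }

    public static void main(String[] args) {
        Sensor sensor = (Sensor) load(METADATA).get("sensor");
        System.out.println(sensor.adapter.name());
    }
}
```

Switching to the parallel adapter only requires editing the `class` attribute in the metadata string; none of the Java classes change, which is the separation of concerns that the rest of the paper applies to virtual appliances.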

3. A NEW CONFIGURATION MECHANISM

We propose a new way of distributing, installing and configuring applications. Our proposal borrows the Spring IoC concept and applies it to application configuration. It contains three distinct components:

1. Configurable Virtual Appliance (VA): A configurable VA is similar to a class in object-oriented programs. Unlike a traditional VA, which has only a fixed functionality, a configurable VA exposes a set of properties indicating which configuration parameters can be changed. This not only allows the configuration to be extracted from the VAs, but also allows a program to automatically change the configuration of the Virtual Machine (VM) that is instantiated from the VA. Configurability is what makes the configurable VA a significant departure from today’s VA.

2. Configuration metadata: The configuration metadata, either in a configuration file or in some configuration logic, captures an expert’s knowledge of everything related to setting up the application. The configuration file is readable by a human, but more importantly, it is designed to be interpreted automatically by a container program. Similar to the configuration file for a stand-alone application (e.g., apache2.conf for the Apache web server), the default configuration metadata covers the majority of use cases, although users are free to customize as much as needed. Unlike a stand-alone configuration file, however, it also captures the inter-dependencies among the VAs.

3. RAC container: The Rapid Application Configurator (RAC) container reads the configuration metadata and, based on the metadata, instantiates new VMs, configures VMs and sets up dependencies between VMs. Since the expert’s knowledge of configuring the application is captured in a machine-readable format instead of an installation manual, the RAC container can replace the manual installation and configuration traditionally done by a human.

Note that we use the term VA to refer to a virtual machine image (i.e., a class in Java) and the term VM to refer to a virtual machine (i.e., an object in Java). The three components are shown in Fig. 2. They resemble those of the IoC concept shown in Fig. 1. In the next section, we will describe the implementation of the three components in detail.

Figure 2: Rapid Application Configurator

Under our proposal, there are three distinct roles involved in configuring an application.

1. VA producers: They are responsible for releasing and maintaining individual VAs. Unlike with the traditional VA, the VA producer releases a configurable VA, which exposes a set of features that can be configured. VA producers are typically the software product companies, who package their own software for release.

2. Expert configurators: They understand the application well and are fully aware of the many different ways of configuring it. They are responsible for providing a list of standard configuration metadata, each corresponding to a common deployment scenario. They could be part of the software product company, or they could be services and consulting companies who understand the industry segments.

3. End users: They deploy the application for their own use. We expect most of them to take a standard configuration metadata from the expert configurators and use it as it is. However, a few may tweak the standard configuration metadata as needed to suit a particular deployment scenario.

It has been recognized that it is important to break down the job of configuration into distinct roles [4, 11], but we further divide the Deployer role into two roles: expert configurators and end users. Having a few expert configurators allows us to effectively amortize their time and effort on standard configuration metadata over a large number of end users.

In Table 1, we compare the traditional and the proposed software release and distribution processes. Traditional software is released on a CD or DVD in an installable form. To facilitate installation, an expert from the software company writes a lengthy installation manual, often hundreds of pages long, to capture the expert knowledge on how to configure the application, especially the inter-dependencies. The end user has to go through the document to learn the intricate details of how to configure the software and then implement the configuration manually.
This process often takes weeks because of the steep learning curve. In contrast, we propose that software be released as a configurable VA, which has both the software and the underlying OS pre-installed. Either the software company or a third party (e.g., a consulting firm) provides a set of configuration metadata, one for each common usage scenario. Unlike the installation manuals, the configuration metadata can cover several software components that together constitute the end application. For example, an SAP Customer Relationship Management (CRM) application could include the SAP NetWeaver application server, the CRM software and an Oracle database. Traditionally, the configurations for NetWeaver, CRM and Oracle are captured in different installation manuals. In our proposal, they are all captured in the same set of configuration metadata.

Compared to the traditional process, our proposal eliminates the human from the loop. Instead of letting a human configurator interpret the installation manual and configure manually, a program (the Rapid Application Configurator container) interprets the configuration metadata and configures automatically. The end users are no longer required to understand the many details of the configuration. Most end users use the configuration metadata as it is, while the few power users still have the option to customize as much as needed. Note that capturing and automating the configuration was not possible without virtualization technology, because the physical deployment environment is not known beforehand.

Table 1: Comparison between the traditional and the proposed software release/distribution process

                 Traditional process     Proposed process
    media        CD/DVD                  Configurable VA
    setup        installation manual     Config metadata
    interpreter  end-user (human)        RAC

There are strong incentives for all three groups of people to adopt the new approach. The VA producers would prefer one configurable VA over many fixed-function VAs for a couple of reasons. First, it is much easier to maintain. Instead of changing the many fixed-function VAs whenever there is a software update or an OS patch, they only need to update a single VA. Second, it is much easier to test. There is only one VA to test instead of many. In addition, the exposed properties limit what has to be tested. With a fixed-function VA, a user is forced to make arbitrary changes within the VA because there is no other mechanism. Not knowing what the users are going to change makes it hard to enumerate the test scenarios. In contrast, in a configurable VA, everything that potentially needs to be changed should be exposed as a property. Since the users are required to change only the configuration exposed by the properties, the number of test scenarios is greatly reduced.

For the expert configurators and the end users, both the reduced cost and the reduced time for configuration present strong incentives for adoption. First, the expert’s knowledge is captured digitally in the configuration metadata instead of in the expert’s mind, and the cost of such knowledge is effectively amortized over a large number of end users. Second, instead of having a human read through the installation manuals and acquire the necessary knowledge, the RAC container can automate the process, thus greatly reducing the time taken to configure an application.

4. DESIGN AND IMPLEMENTATION IN AMAZON CLOUD

In this section, we describe the design and implementation of a prototype in the Amazon cloud environment. Although we have chosen the Amazon cloud as the implementation platform for our prototype, we could have chosen any other virtualization platform, such as VMware or Xen, and the same IoC idea would still apply.

Fig. 3 shows the various components in our Amazon prototype implementation and the steps involved in launching a new application. A configurable VA is stored in Amazon S3 as an Amazon Machine Image (AMI). We introduce a header file for each VA which declares the configurable options supported by the VA. Together, the AMI and the header file are equivalent to a Java class. However, unlike Java, where the declaration and the implementation are in the same file, we separate the declaration into the header file and keep the implementation logic for each option in the AMI. This is similar to the convention in the C programming language.

When launched from an AMI, a VM runs in Amazon EC2 and is equivalent to a Java object. As with a Java object, an external agent, such as the RAC container, can change its member variables. To simplify the design, we require each VM to have a resident agent which polls for configuration changes and reacts to the changes as necessary. Similar to the Spring IoC container, our RAC container is the central agent responsible for interpreting the configuration metadata and configuring the application as specified.

Figure 3: RAC prototype implementation in Amazon EC2 cloud and steps to instantiate a new configuration

The user specifies the location of the configuration metadata when launching a new application. The RAC container first reads in the configuration metadata, then locates the header files associated with the referenced configurable VAs. Based on the header files, it performs an initial validation to check for obvious errors, such as setting a non-existent VA property in the configuration metadata. The RAC container then launches VMs as specified by the configuration metadata, and the resident agents on the VMs check how each VM is to be configured and perform the actions necessary to carry out the configuration.

In the following, we go into more detail on the design of the configurable VA, the configuration metadata specification and the RAC container. For ease of discussion, we consider a simple example web application which has two tiers. This web application queries a database to determine what information to display in the web browser based on the user request. The first tier is the web tier, which contains a web server that serves user HTTP requests. The second tier is the database, which hosts all data to be queried.

4.1 Configurable VA

A configurable VA exposes a set of properties that can be read and/or set. This is similar to the public data and methods that a Java class exposes. As in Java programs, these properties are a declaration to the users that shows which configurations can be read or changed.

The VA producers are responsible for implementing the configurable VAs. They focus only on exposing and implementing the set of configurable properties; they are not aware of how the VA will be configured. Because the VA is configurable, the VA producer can implement fewer VAs; otherwise, they would have to implement one VA for each combination of configurations. The separation between the configuration and the VM capability allows the VA producer to focus on producing VAs that are as generic as possible, so that they can be used in many applications. For example, Sun Microsystems could produce one MySQL VA with many configurable properties, and the same VA could then be used in many applications, such as CRM, Enterprise Resource Planning (ERP) and HR applications.

4.1.1 Header file

All property declarations are stored in a header file associated with the VA image, similar to the header file in C programs. The header file is a promise to the user of what the VA is capable of supporting and what can be configured in the VA. Direct manipulation of the VA beyond what is promised in the header file is disallowed, in order to cleanly separate the configuration from the logic. Because of this contract, the VA producers can focus on testing only the promised interface, which reduces development time and cost.

In Amazon EC2, each AMI has a unique AMI identifier. In the configuration metadata, when the users reference an AMI, they have to specify the location of its manifest file (see Sec. 4.3). For example, the manifest for a database AMI could be stored in bucket/database.manifest.xml, where bucket is a bucket in Amazon S3. The manifest file contains the list of files that make up the AMI image, which are also stored in the same bucket. For configurable VAs, we require that the header file be stored in the same bucket, with the same prefix as the manifest file but ending with the .hdr suffix. For example, bucket/database.hdr is the header file for the database AMI. When the user provides the location of the AMI manifest file in the configuration metadata, RAC can derive the header file name by concatenating the prefix and the “.hdr” extension, and then locate the file in the same bucket.

A sample header file for the web server is shown below. In the following, we will describe the different components of the header file.

    READ_URL_PORT=8080
    READ_URL_PATH=/config/data
    Enum { MySQL, Oracle } DatabaseType
    wo Enum DatabaseType database
    mandatory rw String connectionString
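The naming rule for locating a header file can be sketched as a small helper. This is only an illustration: the class and method names are hypothetical, and it assumes that the “prefix” is everything before the `.manifest.xml` suffix, which matches the bucket/database example above.

```java
/** Hypothetical helper: derives the header file location from the manifest location. */
class HeaderLocator {
    static final String MANIFEST_SUFFIX = ".manifest.xml";

    /** Maps e.g. "bucket/database.manifest.xml" to "bucket/database.hdr". */
    static String headerFor(String manifestLocation) {
        if (!manifestLocation.endsWith(MANIFEST_SUFFIX)) {
            throw new IllegalArgumentException("not a manifest location: " + manifestLocation);
        }
        // Keep the bucket and the prefix, swap the suffix.
        String prefix = manifestLocation.substring(
                0, manifestLocation.length() - MANIFEST_SUFFIX.length());
        return prefix + ".hdr";
    }

    public static void main(String[] args) {
        System.out.println(headerFor("bucket/database.manifest.xml"));
    }
}
```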

4.1.2 Data types

We support four data types for the properties: Enum, String, Int and Bool. The Enum data type can only take on a selected set of values; Int can take on any integer value; and Bool can only be True or False. String is the most generic; it can be any string as long as it does not contain any white space. The Enum, Int and Bool types are provided to ease configuration and facilitate error checking. They restrict the set of values that can be assigned, so that the users are less likely to select incorrect settings. In the event that the users make mistakes, RAC can detect the errors early, based on the header file declarations, before even turning on the VMs. Even though String is the most generic type and can describe any property, we encourage the VA producers to use the most appropriate data type wherever possible because of these error checking and prevention capabilities.

A property has a read-only, write-only or read-write attribute, specified by the keyword “ro”, “wo” or “rw” respectively. For example, the “database” property is write-only and the “connectionString” property is read-write. We say a property is writable if it is either “rw” or “wo”, and readable if it is either “rw” or “ro”. The “mandatory” keyword specifies that a value must be assigned to the property and that there is no meaningful default value assumed by the VA.
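The early error checking described above can be illustrated with a small parser and validator for the header format. This is a sketch under stated assumptions rather than the RAC implementation: the class `HeaderFile`, its fields and the `check` method are hypothetical names, the grammar is inferred from the single sample header, and because the placement of the `mandatory` keyword is itself an assumption, the parser accepts it anywhere on the declaration line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical parser for the header format shown in the sample. */
class HeaderFile {
    static final class Property {
        final String name, access, type;   // access is "ro", "wo" or "rw"
        final boolean mandatory;
        Property(String name, String access, String type, boolean mandatory) {
            this.name = name; this.access = access; this.type = type; this.mandatory = mandatory;
        }
        boolean readable() { return !access.equals("wo"); }
        boolean writable() { return !access.equals("ro"); }
    }

    final Map<String, String> settings = new LinkedHashMap<>();       // READ_URL_PORT, ...
    final Map<String, Set<String>> enums = new LinkedHashMap<>();     // enum type -> values
    final Map<String, Property> properties = new LinkedHashMap<>();

    static final String SAMPLE = String.join("\n",
        "READ_URL_PORT=8080",
        "READ_URL_PATH=/config/data",
        "Enum { MySQL, Oracle } DatabaseType",
        "wo Enum DatabaseType database",
        "mandatory rw String connectionString");

    static HeaderFile parse(String text) {
        HeaderFile h = new HeaderFile();
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            if (line.startsWith("Enum") && line.contains("{")) {        // enum type definition
                int open = line.indexOf('{'), close = line.indexOf('}');
                Set<String> values = new LinkedHashSet<>();
                for (String v : line.substring(open + 1, close).split(",")) values.add(v.trim());
                h.enums.put(line.substring(close + 1).trim(), values);
            } else if (line.contains("=")) {                            // KEY=value setting
                int eq = line.indexOf('=');
                h.settings.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            } else {                                                    // property declaration
                List<String> t = new ArrayList<>(Arrays.asList(line.split("\\s+")));
                boolean mandatory = t.remove("mandatory");
                String type = t.get(1).equals("Enum") ? t.get(2) : t.get(1);
                String name = t.get(t.size() - 1);
                h.properties.put(name, new Property(name, t.get(0), type, mandatory));
            }
        }
        return h;
    }

    /** Returns null if the assignment is acceptable, otherwise an error message. */
    String check(String property, String value) {
        Property p = properties.get(property);
        if (p == null) return "no such property: " + property;
        if (!p.writable()) return "property is read-only: " + property;
        switch (p.type) {
            case "Int":    return value.matches("-?\\d+") ? null : "not an integer: " + value;
            case "Bool":   return value.equals("True") || value.equals("False")
                                  ? null : "not True or False: " + value;
            case "String": return value.matches("\\S*") ? null : "contains white space: " + value;
            default:
                Set<String> allowed = enums.get(p.type);
                return allowed != null && allowed.contains(value)
                        ? null : "not a valid " + p.type + ": " + value;
        }
    }
}
```

With the sample header, `check("database", "Oracle")` is accepted, while an undeclared property or a value outside the Enum is reported before any VM would need to be started.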

4.1.3 Read/write mechanism

The first two lines in the sample header file specify how RAC can read property values from the VM. Each VM must expose a web interface if it has any readable (ro or rw) properties. The web interface’s port number and the path for reading the properties are specified by READ_URL_PORT and READ_URL_PATH respectively. To read a property’s value, RAC concatenates the VM’s IP address, the read port number, the read path and the property name to get the URL to query. For example, to read the connectionString, RAC reads the URL http://VM_ip:8080/config/data/connectionString, where VM_ip is the VM’s IP address.

Assigning a value to a property uses a different mechanism from reading a property. RAC exposes a web interface from which the VMs read the values assigned to their properties. The base URL of the RAC web interface for a particular VM is passed in the user data, a feature provided by Amazon whereby a user can pass in up to 16 KB of user data that is accessible to a VM at the URL http://169.254.169.254/latest/user-data. To read a particular property value, the VM concatenates the base URL with the property name and then queries the web interface.

One can argue that our design violates the IoC principle because the VM is now aware of the container. However, we are only exposing the property values, not the container internals or the objects it manages. For example, a VM would not be able to look up other VMs instantiated by the container. We have made this conscious choice in order to simplify the design. We could have modified the VM image and passed in the property values as needed, but that would make the implementation considerably harder. We could also have passed in all property values in the user data, but that would limit the flexibility. First, we would be limited in how much configuration data we can accommodate, because the user data is limited to 16 KB. Second, we would not be able to support dynamically changing property values, because the user data can only be set once, at launch. Additionally, we could have used the same VM HTTP interface for assigning values, but it would have required a dynamic script implementation, such as CGI or PHP, to record the HTTP POST, which would increase the VM implementation complexity. Furthermore, we would have to wait for the VM web server to be up before we could assign property values.

As an example, for our two-tier application, we could pass http://RAC_ip/webapp/webserver as the RAC base URL for the web server, where RAC_ip is the RAC container’s IP address. The web server would ask for the value assigned to the connection string by querying http://RAC_ip/webapp/webserver/connectionString.
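The two URL conventions above amount to simple string concatenation. The sketch below spells them out; the class and method names are hypothetical, and tolerating a trailing slash is a convenience added for this illustration.

```java
/** Hypothetical helper spelling out the two URL conventions of Sec. 4.1.3. */
class PropertyUrls {
    /** URL that RAC queries on a VM to read one of the VM's readable properties. */
    static String readUrl(String vmIp, int readPort, String readPath, String property) {
        return "http://" + vmIp + ":" + readPort + stripTrailingSlash(readPath) + "/" + property;
    }

    /** URL that a VM queries on RAC to learn the value assigned to one of its properties. */
    static String assignedValueUrl(String racBaseUrl, String property) {
        return stripTrailingSlash(racBaseUrl) + "/" + property;
    }

    private static String stripTrailingSlash(String s) {
        return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }

    public static void main(String[] args) {
        // READ_URL_PORT and READ_URL_PATH come from the header file; the base URL from user data.
        System.out.println(readUrl("VM_ip", 8080, "/config/data", "connectionString"));
        System.out.println(assignedValueUrl("http://RAC_ip/webapp/webserver", "connectionString"));
    }
}
```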

4.1.4 VM resident agent

An agent residing in each VM is responsible for reporting the values of the readable properties and for reacting to the values assigned to the writable properties. If any readable properties are explicitly defined in the header file, the agent must expose a web interface at port READ_URL_PORT, and all property values should be listed under the path READ_URL_PATH. In our example, when RAC queries http://VM_ip:8080/config/data/connectionString, the current connection string should be returned. During VM boot-up, the web interface may not be available until the web server has been started. When reading a property, RAC will wait and retry until the web server responds.

If there are any writable properties, the agent must interpret the assigned values and take the appropriate actions. The agent enumerates the properties the VM supports by querying the RAC base URL. If a property value does not exist (i.e., an HTTP 404 error is returned), the agent assumes that the default value is assigned. Otherwise, the agent reacts accordingly. In our prototype, the agent queries the property values once at startup, and then every second to see whether any value has changed. We plan to implement a notification interface in the VM’s web interface in the near future, so that the RAC container can inform the VMs when configuration values have changed.

The implementation of the agent is the responsibility of the VA producers. As long as the interfaces (exposing the VM web interface and reading the RAC web interface) and the properties are supported, the implementation choice is left to the VA producers.
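The polling behavior of the agent can be sketched as follows. This is one possible shape, not the prototype's agent: `ResidentAgent`, `ValueSource` and the change callback are hypothetical names, the property names and defaults are assumed to be known to the agent in advance, and the HTTP source simply treats a 404 response as "no value assigned".

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** Hypothetical resident agent: polls RAC for assigned values and reacts to changes. */
class ResidentAgent {
    /** Where assigned values come from; empty means "nothing assigned, use the default". */
    interface ValueSource { Optional<String> fetch(String property); }

    /** Reads values from the RAC web interface; an HTTP 404 maps to an empty result. */
    static ValueSource http(String racBaseUrl) {
        return property -> {
            try {
                HttpURLConnection c = (HttpURLConnection)
                        URI.create(racBaseUrl + "/" + property).toURL().openConnection();
                if (c.getResponseCode() == HttpURLConnection.HTTP_NOT_FOUND) return Optional.empty();
                try (InputStream in = c.getInputStream()) {
                    return Optional.of(new String(in.readAllBytes(), StandardCharsets.UTF_8).trim());
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    private final ValueSource source;
    private final Map<String, String> defaults;                 // writable property -> default
    private final Map<String, String> current = new HashMap<>();
    private final BiConsumer<String, String> onChange;          // applies a value inside the VM

    ResidentAgent(ValueSource source, Map<String, String> defaults,
                  BiConsumer<String, String> onChange) {
        this.source = source; this.defaults = defaults; this.onChange = onChange;
    }

    /** One polling pass: fetch every property and react only to values that changed. */
    void pollOnce() {
        for (Map.Entry<String, String> e : defaults.entrySet()) {
            String value = source.fetch(e.getKey()).orElse(e.getValue());
            String previous = current.put(e.getKey(), value);
            if (!value.equals(previous)) onChange.accept(e.getKey(), value);
        }
    }

    /** Polls once immediately, then again one second after each pass completes. */
    ScheduledExecutorService start() {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleWithFixedDelay(() -> {
            try { pollOnce(); } catch (RuntimeException e) { /* RAC unreachable: retry next tick */ }
        }, 0, 1, TimeUnit.SECONDS);
        return timer;
    }
}
```

Separating `pollOnce` from the timer keeps the change-detection logic easy to exercise without a network, and a future notification interface could call the same method on demand instead of on a schedule.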

4.2 RAC container

The RAC container is responsible for initializing the application based on the user-specified configuration metadata. In our prototype, the RAC container is implemented as a web service running on a dedicated server. A client interacts with the RAC container web service to pass along configuration metadata in order to instantiate a new application or reconfigure an existing application. When it receives new configuration metadata, the RAC container parses the metadata and reacts to the configuration changes. If a configuration is not valid, it returns an error to the caller so that the error can be reported to the user. If a configuration is valid, it calls the Amazon Web Services API to carry out the actions as necessary. The RAC container interfaces directly with Amazon EC2 in order to have complete visibility into the instantiated VMs. For example, RAC uses the Amazon Web Services API to obtain the VM IP addresses, program Elastic IPs, and mount EBS (Elastic Block Store) volumes as necessary.

4.3 Configuration metadata

The configuration metadata captures the configuration one wants to apply to the application. Setting the configuration correctly requires detailed knowledge of the application, its various software components, and the many configuration options for each component. Typically, the VA producers or a third party, such as a system integrator, come up with the default configuration metadata.

Just like the Spring IoC container, the RAC container can support a variety of forms of metadata. From the metadata, it builds an object model, where each object corresponds to one VM. When reading an object’s property, the RAC container automatically queries the VM’s web interface. Similarly, when writing an object’s property, the RAC container automatically posts the value on RAC’s web interface. The current RAC prototype supports two ways of passing configuration metadata to the RAC container: through a static configuration file, or through a program, such as one written in Java.

4.3.1 Static configuration file

The user can pass a static configuration file to the RAC container through the web services API. We provide a command line tool to hide the complexity of invoking the web services API. The configuration file format closely follows that of the Apache2 web server (i.e., apache2.conf), with some aspects of shell scripts and the extensions necessary to support VA instantiation. An example is shown as follows:

    elasticIP=75.101.123.11
    username=johnsmith
    password=pswd
    include webserver.conf
    include database.conf
    mountedDrive=device=/dev/sdh,\
    volume=vol-VVVV1111
    dbip=dbserver.vmip
    connectionString=user=\$username,\
    password=\$password
    webservers=webserver1.vmip
    vmip=\$elasticIP

The first three lines in the sample define several variables. There are several reasons to define variables. If a value is used repeatedly in different parts of the configuration, it is advantageous to define it in a single place. Variables should also be used if their values are expected to be customized at each deployment. These variables are typically listed at the beginning of the file so that the end users can easily see that they require customization.

The “include” tag allows the configuration data to be spread across several files to maintain modularity. The ‘VM’ tag is used to launch a new VM from a VA. The ‘id’ field specifies a handle that can be used to refer to this VM later, and the ‘image’ field specifies the location of the VA manifest file. From the manifest location, RAC can determine the AMI ID by calling the web services API (ec2-describe-images) and finding the AMI with the matching manifest file. ‘image’ is similar to a class in a Java program and ‘id’ is similar to an object handle.

The configuration for a VM is specified between the opening '<VM>' and closing '</VM>' tags in the form of a series of assignments. The properties (the left hand side of the assignments) must have been declared in the header file for the VA; otherwise, an error message is reported and the configuration aborts. To reference the property of a VM, we use the VM handle followed by the property name, i.e., the handle.property syntax. For example, 'dbserver.vmip' refers to the implicit property 'vmip' of the dbserver VM.

This sample configuration file instantiates three VMs. It first instantiates a database server, which mounts the EBS volume vol-VVVV1111. Presumably, this EBS volume contains all database data. Second, it instantiates a web server. The web server property 'dbip' holds the IP address of the database server. Furthermore, its connection string is set up with the login username and password. Note that the username and password are defined as variables at the beginning of the file, and they are expected to be customized for each installation. The 'dbip' property, in contrast, is set automatically by the container, eliminating the need to customize it manually for the deployment environment. Lastly, a software load balancer is instantiated. The 'webservers' property holds the list of web servers that the load balancer should balance across. In this example, we have only one web server. If there were more than one web server, their IP addresses would be supplied on the same line, separated by commas. We also program the load balancer with an Elastic IP address so that it maintains a consistent web presence.

Beyond changing variables to customize the configuration, one can also easily modify the configuration for a different setup, because all configuration is centralized in one or more easy-to-read text files. For example, one can add another web server by copying and pasting the configuration for webserver1, changing the name to webserver2, then adding its IP address to the load balancer.

The configuration file is processed sequentially. For simplicity, we require that dependencies be specified explicitly in the correct order, so a VM must be defined before another VM references its properties. Not every property exposed by a VA needs to be assigned a value in the configuration metadata. If a property is not assigned a value and it is marked "mandatory", the RAC container prompts the user for the information. This mechanism may be used for security-sensitive information that may not be appropriate to capture in the configuration metadata.
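The sequential processing rules described above (variables, handle.property references, declared properties, and prompting for unassigned mandatory properties) can be sketched as follows. This is a minimal illustration under stated assumptions, not the RAC parser: the `<VM id=... image=...>` tag syntax is assumed from the Apache-style format, `launch_vm`, `headers` and `prompt` are hypothetical hooks supplied by the caller, variable references are accepted as either `$name` or `\$name`, and 'include' handling is omitted.

```python
import re

# Assumed tag syntax; the text names only the 'VM' tag and its 'id' and 'image' fields.
VM_OPEN = re.compile(r"<VM\s+id=(\S+)\s+image=(\S+)>")


def process_config(text, launch_vm, headers, prompt):
    """Process configuration text top to bottom and return the VMs it defines.

    launch_vm(vm_id, image) -> dict of implicit properties, e.g. {"vmip": ...}
    headers[image]          -> {declared property name: is_mandatory}
    prompt(vm_id, name)     -> value typed in by the user
    """
    variables, vms, current = {}, {}, None
    text = re.sub(r"\\[ \t]*\n[ \t]*", "", text)  # join '\' continuation lines

    def resolve(value):
        # Substitute variables first, then handle.property references.
        value = re.sub(r"\\?\$(\w+)", lambda m: variables[m.group(1)], value)

        def lookup(m):
            vm = vms.get(m.group(1))
            # Only handles of VMs defined earlier resolve; other text is kept.
            return vm["props"][m.group(2)] if vm else m.group(0)

        return re.sub(r"\b(\w+)\.(\w+)\b", lookup, value)

    for line in (raw.strip() for raw in text.splitlines()):
        if not line:
            continue
        opened = VM_OPEN.fullmatch(line)
        if opened:
            current, image = opened.group(1), opened.group(2)
            vms[current] = {"image": image,
                            "props": dict(launch_vm(current, image))}
        elif line == "</VM>":
            vm = vms[current]
            for name, mandatory in headers[vm["image"]].items():
                if mandatory and name not in vm["props"]:
                    vm["props"][name] = prompt(current, name)
            current = None
        else:
            name, _, value = line.partition("=")
            if current is None:
                variables[name] = resolve(value)       # top-level variable
            else:
                vm = vms[current]
                if name not in headers[vm["image"]] and name not in vm["props"]:
                    raise ValueError(f"{current}: undeclared property {name!r}")
                vm["props"][name] = resolve(value)
    return vms
```

Because references resolve only against VMs that have already been launched, a configuration that mentions `webserver1.vmip` before defining webserver1 leaves the reference unresolved, which is why the ordering requirement matters.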

4.3.2 Dynamic configuration metadata

Although easy to read and maintain, a static configuration file can only capture static configuration metadata. If an application needs to be dynamically reconfigured, the user can provide the configuration metadata programmatically. With a web services API, we could support any programming language or even a workflow engine [12]. Currently, we provide a Java library that hides the details of interfacing with the RAC container's web services API. To pass configuration metadata, a Java programmer first calls GetRACHandle to obtain a reference to the RAC container. Then the programmer can call NewApplication to start a new application or call GetApplication to look up an existing application by name. Both calls return a reference to the application. Using the application handle, the programmer can start a new VM (StartVM), enumerate all existing VMs (ListVM), look up a specific VM by name (GetVM), shut down a VM (StopVM), and get and set a property of a VM (GetVMProp, SetVMProp). Besides supporting dynamically changing applications, the web service interface also allows us to apply incremental changes to an existing application that was created from an initial static configuration. This allows us to modify an application's configuration without first shutting down the application.
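The call sequence above can be illustrated with a small in-memory stand-in. The sketch below is not the Java library: it is written in Python for brevity, the snake_case names merely mirror the operations named in the text (GetRACHandle, NewApplication, GetApplication, StartVM, ListVM, StopVM, GetVMProp, SetVMProp), and every signature, the fake IP assignment, and the helper `add_web_server` are assumptions made for the example.

```python
class Application:
    """In-memory stand-in for an application handle (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self._vms = {}
        self._next_ip = 1

    def start_vm(self, vm_id, image):               # mirrors StartVM
        # A real container would launch a VM; here we only assign a fake IP.
        self._vms[vm_id] = {"image": image, "vmip": f"10.0.0.{self._next_ip}"}
        self._next_ip += 1

    def list_vm(self):                              # mirrors ListVM
        return sorted(self._vms)

    def stop_vm(self, vm_id):                       # mirrors StopVM
        del self._vms[vm_id]

    def get_vm_prop(self, vm_id, prop):             # mirrors GetVMProp
        return self._vms[vm_id][prop]

    def set_vm_prop(self, vm_id, prop, value):      # mirrors SetVMProp
        self._vms[vm_id][prop] = value


class RAC:
    """In-memory stand-in for the RAC container handle (hypothetical)."""

    def __init__(self):
        self._apps = {}

    def new_application(self, name):                # mirrors NewApplication
        self._apps[name] = Application(name)
        return self._apps[name]

    def get_application(self, name):                # mirrors GetApplication
        return self._apps[name]


_CONTAINER = RAC()


def get_rac_handle():                               # mirrors GetRACHandle
    return _CONTAINER


def add_web_server(app, vm_id, image, lb_id="lb", db_id="dbserver"):
    """Incremental change: start one more web server and register it."""
    app.start_vm(vm_id, image)
    app.set_vm_prop(vm_id, "dbip", app.get_vm_prop(db_id, "vmip"))
    servers = app.get_vm_prop(lb_id, "webservers").split(",")
    servers.append(app.get_vm_prop(vm_id, "vmip"))
    app.set_vm_prop(lb_id, "webservers", ",".join(servers))
```

The helper shows the kind of incremental change the text describes: the running application is looked up by name and extended, and only the load balancer's 'webservers' property is rewritten.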

5. RELATED WORK

The problem we are addressing, the difficulty of configuring applications, has been recognized by many, and various other approaches address it. For example, component-based deployment was proposed in [3], where legacy software is wrapped into components to facilitate management. SmartFrog [9] also allows multiple components to be configured automatically. This is similar to the concept of a VA, except that a VA wraps the software at the VM level. Like fixed-function VAs, components address only part of the problem.

System administrators face a similar problem: the challenge of configuring a cluster of machines. Many solutions exist, such as LCFG [1][2], Cfengine [6][7], and Puppet [10]. These solutions all require externalizing configuration data. However, they differ from our solution in that they focus on server configuration rather than application configuration. For example, RAC can easily create new VMs on the fly regardless of the underlying physical infrastructure.

Talwar et al. [19][20] compared manual, script-based, language-based and model-based software configuration and deployment methods. Unfortunately, these traditional methods cannot provide the ease of deployment that VM encapsulation can. rBuilder [16] makes it easy to bundle components together into a single VM. RightScale provides script-based configuration at the single-VM level for VMs running in the Amazon cloud. VMPlant [13] and [14] can install applications and allow customization during deployment. Unfortunately, none of these approaches separates the inter-dependency configuration from logic, so they cannot easily capture expert configuration knowledge on how to wire together the components of an application.

Dearle [8] presented a case study of six different software deployment technologies. Within Java program and component deployment, he described IoC and its benefits. He also described virtualization as a software deployment method and argued that virtualization has a high potential to impact the software deployment process.

The VA concept [18] is the foundation of our proposal. Unfortunately, fixed-function VAs not only lead to a proliferation of VAs, a significant management hassle, but are also limited to stand-alone (i.e., not multi-tier) applications. Virtual Appliance Network (VAN) [17] extends the idea of the VA to bundle several VAs and their corresponding network configuration into a coherent application. Their solution is similar to ours in the sense that a VAN can be configured. However, their goal is to minimize the number of parameters to be customized and configured in order to ease deployment. Instead, we advocate many configuration properties for each VA and use the configuration file to capture best practice. This difference is evident from their single CVL language design, whereas we maintain the separation of the header file and the configuration file. Similar to VAN, the Open Virtualization Format (OVF) [15] specification standardizes the format for packaging several VAs together into an application, which will facilitate industry adoption.

Like us, many stand-alone applications have adopted the configuration file concept. For example, the apache2 web server has many configurable options, and the sample configuration file captures the configuration for a particular deployment. Most users make only minimal customizations to the sample file, yet power users who are willing to spend the time learning have the option to customize as much as needed. RAC is different in several key aspects. First, it is a uniform way of configuring, i.e., it is not restricted to one application or component. This allows us to easily compose a multi-tier application. For example, in the web server farm prototype, we can easily swap out the database or load balancer vendor if needed. Second, we extract the configuration metadata from all components of a complex application into one place. This allows us to capture the inter-dependencies among the various components, make changes in a single place, and reduce the number of parameters that need to be customized (e.g., a VM's IP address). Third, by adopting a web services API, we are no longer restricted to a static configuration. Our web server farm prototype illustrates that we can dynamically reconfigure the application, where even VMs may come and go, as operating conditions change.

6. CONCLUSION

Rapid Application Configurator (RAC) borrows ideas from the Java programming community, where the lightweight container, exemplified by Spring IoC, has shown great success at solving the complex program configuration problem. We have described the design and implementation of RAC in the Amazon EC2 environment. Choosing EC2 as the platform not only simplifies our implementation, but also enables easy distribution and adoption because EC2 and S3 are universally reachable from the Internet. Our preliminary experience with multi-tier applications has been very positive. RAC enables us to quickly deploy new applications, as well as to replicate an existing application environment for debugging and testing. Because RAC fully automates installation and configuration and removes humans from the loop, it could potentially be a new way of solving some of the challenges we face. For example, most Software-as-a-Service (SaaS) solutions today are based on a shared application stack. It is difficult to maintain strong isolation and separation in such an environment. However, we can leverage virtualization to provide not only resource sharing but also strong isolation in the infrastructure layer. By using RAC, new application environments could easily be created for each new customer, thus effectively amortizing the cost of the application over many customers to achieve the cost benefits of SaaS.

7. REFERENCES

[1] Anderson, P., and Scobie, A. Large scale linux configuration with LCFG. In Proc. 4th Annual Linux Showcase and Conference (Oct. 2000).
[2] Anderson, P., and Scobie, A. LCFG - the next generation. In UKUUG Winter conference (2002).
[3] Bouchenak, S., Palma, N. D., Hagimont, D., and Taton, C. Autonomic management of clustered applications. In Proc. 2006 IEEE International Conference on Cluster Computing (Sep. 2006), pp. 1–11.
[4] Bradshaw, R., Desai, N., Freeman, T., and Keahey, K. A scalable approach to deploying and managing virtual appliances. In Proc. TeraGrid (2007).
[5] Brown, A., Keller, A., and Hellerstein, J. A model of configuration complexity and its application to a change management system. In Proc. 9th IFIP/IEEE International Symposium on Integrated Network Management (2005).
[6] Burgess, M. A site configuration engine. Computing systems (MIT press: Cambridge MA) (1995).
[7] Burgess, M. Recent developments in cfengine. In Unix.nl Conference Proceedings (2001).
[8] Dearle, A. Software deployment, past, present and future. In Proc. International Conference on Software Engineering (Future of Software Engineering) (2007).
[9] Goldsack, P., Guijarro, J., Lain, A., Mecheneau, G., Murray, P., and Toft, P. Smartfrog: Configuration and automatic ignition of distributed applications. In HP OVUA (2003).
[10] Kanies, L. Puppet: Next-generation configuration management. ;login: the USENIX Association newsletter 31, 1 (Feb. 2006).
[11] Keahey, K., and Freeman, T. Contextualization: Providing one-click virtual clusters. In Proc. IEEE Int. Conf. on e-Science (2008).
[12] Keller, A., Hellerstein, J., Wolf, J., Wu, K., and Krishnan, V. The CHAMPS system: change management with planning and scheduling. In Proc. 9th IFIP/IEEE International Symposium on Network Management and Operations (2004).
[13] Krsul, I. V., Ganguly, A., Zhang, J., Fortes, J., and Figueiredo, R. Vmplants: Providing and managing virtual machine execution environments for grid computing. In Proc. Supercomputing 2004 (July 2004).
[14] Nishimura, H., Maruyama, N., and Matsuoka, S. Virtual clusters on the fly – fast, scalable and flexible installation. In Proc. CCGrid (2007).
[15] Open virtualization format. http://www.dmtf.org/standards/mgmt/vman/.
[16] rpath rbuilder. http://www.rpath.com/rbuilder.
[17] Sapuntzakis, C., Brumley, D., Chandra, R., Zeldovich, N., Chow, J., Lam, M. S., and Rosenblum, M. Virtual appliances for deploying and maintaining software. In Proc. LISA XVII (Oct. 2003).
[18] Sapuntzakis, C., Lam, M. S., and Rosenblum, M. Virtual appliances in the collective: A road to hassle-free computing. In Proc. HotOS IX (2003).
[19] Talwar, V., Wenchang, D., and Jung, Y. Approaches for service deployment. IEEE Internet Computing 9, 2 (2005), 70–80.
[20] Talwar, V., Wu, Q., Pu, C., Yan, W., Jung, G., and Milojicic, D. Comparison of approaches to service deployment. In Proc. IEEE International Conference on Distributed Computing Systems (2005), pp. 543–552.
