
Grid Service Hosting on Virtual Clusters

Bobby House and Paul Marshall University of Colorado UCB 430, Boulder CO 80309, USA robert.house@colorado.edu, paul.marshall@colorado.edu Michael Oberg, Henry M. Tufo, and Matthew Woitaszek National Center for Atmospheric Research 1850 Table Mesa Drive, Boulder CO 80305, USA oberg@ucar.edu, tufo@cs.colorado.edu, mattheww@ucar.edu

Abstract
This paper presents an architecture for service hosting on virtual clusters spanning multiple administrative domains that balances the requirements of application developers and resource provider system administrators. The presented architecture and implementation use virtual machines to simplify the deployment of externally-accessible persistent Web and Grid services while allowing resource provider system administrators to monitor hosted virtual machines and perform critical maintenance when necessary. This approach allows developers full control of their distributed resources, specifies a mechanism for resource provider monitoring and intervention, and reduces the barrier for hosting user-supplied virtual machines on shared resource provider cyberinfrastructure.

1. Introduction
The developers of software workflows and service-oriented architectures (SOAs) often face one particularly irksome constraint when designing software for managing high-performance computing (HPC) applications: they are not in control of the resources where the software will be deployed. Complex scientific workflows typically rely on a collection of software components such as databases, Web servers, and Grid middleware to provide basic services, with custom programs strategically invoked to provide core functionality. For these types of systems to be deployed across sites operated by multiple resource providers on a Grid, the developers must limit their software selections to a least-common denominator of universally supported components or choose a hosting platform that provides an increased level of developer control.

Current production Grids, such as TeraGrid [1], the Open Science Grid (OSG) [9], and the Distributed European Infrastructure for Supercomputing Applications (DEISA) [8], use common software stacks to provide basic functionality on all systems that are members of their Grid. For example, the Coordinated TeraGrid Software and Services (CTSS) [12] stack contains a suite of middleware components and an authentication mechanism that allows users to access the TeraGrid's resources. The DEISA Common Production Environment software stack [6] is much more specific, including tools, compilers, libraries, and applications, with coordinated updates and extensive version checking to ensure a global common environment.

Mandated software stacks provide a least-common denominator of expected functionality across systems in a Grid but can lead to several problems. First, because of the large installed base, it may be difficult to obtain the consensus necessary to introduce new components. Second, widely used software stacks require a careful build-and-test methodology to ensure that components properly interact with each other and the systems that rely on them. Developers requiring nonstandard support software must convince individual resource providers to allow and assist with the installation of extra components required by their project.

Rather than work within the constraints imposed by a shared software stack on a community resource, it is now attractive for projects to manage their own cyberinfrastructure through the use of virtual machines (VMs). Less expensive than dedicated servers, VMs can be hosted at a variety of resource providers or even leased from an on-demand cloud computing hosting utility. However, the use of private VMs shifts the burden of administration from resource providers to application groups. Rather than just developing software, the application group must maintain staff with system administration expertise to ensure the continued maintenance of the hosting platform.

Moreover, many supercomputer centers are hesitant to allow black-box VMs to be attached to their public network. As projects progress through funding lifecycles, application groups may cease to be responsive to immediate system administration needs (such as security patches) that are of concern to the resource provider.

We present an architecture for service hosting across multiple administrative domains, including multiple resource providers and the application development group itself. In our expected deployment scenario, resource providers would create private VMs with full external network connectivity for projects requiring persistent application hosting services. This hosting environment is distinct from other VM-based capacity-on-demand and dynamic resource provisioning solutions (e.g., virtual private sandboxed clusters) in that it addresses the needs of persistent applications, such as workflow management solutions that interact with other HPC resources, instead of directly providing an execution environment for capacity computing tasks. The architecture allows developers to exert full control (e.g., root) over their VM hosts while retaining the ability for resource provider staff to assist with critical system administration and security issues when absolutely necessary. Resource providers may choose to deploy external or internal intrusion detection software, and also use VM and host management techniques to perform necessary administrative actions. This type of solution is particularly useful for our work at the National Center for Atmospheric Research (NCAR), which serves as a TeraGrid resource provider and also develops Grid-based applications using service-based approaches.

2. Related work
As the deployment complexity of scientific applications increases, the use of VMs to simplify application distribution becomes attractive. By packaging the entire software stack into a virtual machine, the application and its dependencies, from the operating system to the user interface, can be shipped as a unit, eliminating the need for manual configuration and compilation. Applications can then be easily deployed on institutional infrastructure or placed on leased resources through a service provider.

A number of solutions have been proposed to host applications on virtual clusters. For example, Globus Virtual Workspaces [7] and the recently launched OpenNEbula project [4] have developed the required software to enable the shipping and execution of VMs. Globus Virtual Workspaces supports the dynamic provisioning of HPC resources by creating a virtual cluster with computational nodes and head nodes that completely avoids platform software constraints [5].

The Denali project [13] focuses on hosting a large number of untrusted applications or services on a single machine, each within its own VM; much of this work focuses on the development of para-virtualization techniques to efficiently place a large number of VMs on a single resource. To simplify the use of virtual cluster environments, Sundararaj et al. [11] propose a virtual network solution where applications are hosted in VMs at remote sites, but the VMs appear as though they are hosted directly on the development group's network.

Once created, application VMs can be executed on local resources manually or using middleware such as the Globus Virtual Workspace service, or deployed on resources leased from a cloud computing resource provider. One of the most widely known services is Amazon's Elastic Compute Cloud (EC2) in combination with their Simple Storage Service (S3) [3]. Amazon provides an unmanaged environment offering virtually unlimited compute and storage capacity that is billed as resources are consumed. Users of the EC2 service may provide their own VM images, use third-party services to construct a VM image, or start with a generic VM image provided by Amazon. Regardless of the VM source, EC2 essentially allows users to purchase virtual hosts on an as-needed basis.

One immediate effect of the sudden popularity of cloud computing is that it is trivial for small groups of developers to inexpensively obtain distributed cyberinfrastructure to host their applications. While the ability to fully control private virtual machines eliminates the complexity of constructing an application around a limited software stack, it also forces application groups to become experts in Linux computer security and system administration, and to assume the staff burden of continued software maintenance. In commercial hosting environments, such as Amazon's current hosting platform, security is explicitly left to the developers. For example, Amazon's documentation states that developers should secure their VM instance as they would any other Linux host, and then proceeds to describe operating system patch updates using yum and apt-get, and security considerations such as firewall and sshd configuration [2]. Amazon assumes that owners of virtual machines will be motivated to secure their machines because they are charged for all usage.

In more of an academic research setting, many supercomputing center resource providers are hesitant to host black-box VMs on publicly exposed networks because they cannot vouch for the security of user-provided VMs, essentially rejecting a potentially useful distribution technology due to a legitimate security concern. Our proposed solution extends the typical black-box VM distribution mechanism to provide a managed hosting environment for persistent publicly-exposed services. By allowing professional system administration staff at resource providers to choose to assist with the administration of the VMs hosted at their site, expertise and responsibility are more effectively balanced between software developers and system administrators.

Figure 1. Administrative domains for developers and resource providers: independent project developers (Projects A, B, and C) host VMs across independent resource providers (RP Site A, RP Site B, and cloud hosting).

It is important to note that we do not propose outsourcing system administration from application groups to site resource providers. Rather, the primary responsibility and control reside with the application developers, just as in the case of unmanaged VM shipping. We believe that the combination of a known mechanism for resource provider intervention and increased transparency of the managed VMs will significantly reduce the barrier for hosting user-provided VMs on common resource provider cyberinfrastructure.

3. Design
Our managed hosting architecture is designed to fulfill the following requirements:

- Give application developers full control. Application developers require full control of the system hosting their project so that they may quickly and easily deploy necessary software without administrator intervention.
- Retain resource provider authority. Resource providers must have the ability to monitor systems running on their network; they will not blindly trust remote third parties to ensure system security.
- Leverage resource provider administration best practices and management infrastructure.
- Minimize resource provider overhead. While resource providers have extensive experience with system administration, they would prefer to minimize staff time commitments. The managed hosting environment should not become an infinite sink of staff time.

To meet these objectives, our architecture balances control and responsibility between application developers, resource providers, and the system developers by allowing multiple overlapping administrative domains (see Figure 1). Each participating resource provider maintains servers that host VMs for various projects. The independent project groups may then request the creation of VMs for their projects at resource providers as desired, and choose to use a site-preferred VM image or their own system image. By giving project groups full control of their VMs, they are free to deploy software using any technique they prefer. Insofar as the deployed hosting VMs use a consistent distribution, all of the hosts may be administered as a distributed cluster where each node is at a different physical location. For VMs created using an operating system distribution with active patch maintenance, the systems can be easily upgraded by the application group using their preferred schedule, change control process, and deployment technique.

Retaining the ability for resource providers to perform administrative actions on the VMs deployed on their systems eliminates concerns about attaching black-box operating systems to a publicly exposed network. Allowing resource provider administration staff to perform critical updates can help reduce windows of software vulnerability caused by variable staffing levels in participating projects. We do not expect site administrators to take ownership of administration on hosted VMs or perform routine maintenance that could affect the operation of the hosted VM (such as accidentally breaking something in an effort to be helpful), but merely to retain the ability to perform administrative actions determined to be critical for the VM to be connected to the external network.

To minimize the resource provider administrative overhead, our architecture leverages components that are already supported by operating system distribution channels. In our initial implementation, we selected a Linux distribution that facilitates the creation of custom packages that can then be installed on a VM at creation time or runtime. However, as the choice of an operating system distribution can vary widely based on each user group's unique needs, our software infrastructure can support additional guest VMs using other distributions as required by our target user groups.


4. Implementation
The implementation of the virtual hosting environment consists of software for the resource provider system administrators to create and configure VMs for projects, administer those VMs as networked hosts, and monitor the activity on those hosts (see Figure 2). For our initial implementation, we select Xen [14] as the VM hosting technology, use rBuilder [10] to generate custom VM images, and also support manually-generated Debian-based VM images.

Figure 2. High-level software architecture for the managed service hosting platform: the VM container at each resource provider (RP) site provides RP VM management (VM management scripts, VM administration using file system override, network configuration, root ssh key insertion) and RP host management (gx-map site proxy administration, site administration using root ssh keys, Grid mapfile and certificates, critical OS updates); each project's VM guest reaches the external network through a network IDS and its own VM firewall, and is covered by site security monitoring (file integrity scanning).

4.1. Resource provider VM management


The first step in the creation of a persistent networked host for a project is the provisioning of a VM. The user-supplied VM image must be modified so that it contains the configuration required for it to boot properly, communicate over the network, and accept incoming administrative commands. As a resource provider, to create a new VM we allocate an IP address for the project on our project.services.ncar.teragrid.org domain, and then use our VM management scripts to configure the image by establishing the network configuration, setting a root password, and installing the site's root ssh key.

To ensure that the resource provider administrators maintain the ability to control the VMs on their systems, the management software periodically verifies that it can connect to each host and execute commands with superuser privileges. If the host refuses the connection or does not execute the command, control is re-established by force: the administrator is notified, the VM is halted, its file system is mounted locally, and the necessary ssh keys are installed directly. If this process fails, or the host still does not accept incoming connections, the situation is flagged and an administrator is notified. It is important to note that a resource provider may perform this action only on VMs hosted on platforms under their control; it may not be used to allow a local administrator to obtain login authority on a VM for the same project hosted somewhere else. Unless a project developer manually installs the ssh keys from one site's administrators on another site or creates ssh keys that can traverse the VMs without restriction, site control is limited to the VMs hosted at that site.

At the present time, our resource provider VM management tools are implemented as a series of Python scripts. The initial VM setup is performed by our TeraGrid resource system administrators. Because we expect persistent VMs to be created only upon request from project investigators with existing allocations at sites, this minimal level of administrator intervention requires a small amount of communication between the developers and the administrators that could be beneficial should problems later arise.
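The following minimal sketch illustrates this recovery loop; it is not the production scripts, and the domain names, device paths, and key locations are placeholders chosen for illustration only.

    import subprocess

    # Hosted project VMs: Xen domain name -> externally visible host name (assumed).
    VMS = {"projecta": "projecta.services.example.org"}
    SITE_KEY = "/root/.ssh/site_admin_key.pub"     # site administrators' public key (assumed path)

    def root_ssh_ok(host):
        """Return True if a trivial command can be run as root over ssh."""
        result = subprocess.run(["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10",
                                 "root@" + host, "true"])
        return result.returncode == 0

    def force_key_reinstall(domain, mountpoint="/mnt/vmroot"):
        """Halt the guest, mount its file system in the host container, and
        re-append the site key to root's authorized_keys (paths are illustrative)."""
        subprocess.run(["xm", "shutdown", "-w", domain], check=True)       # stop the Xen guest
        subprocess.run(["mount", "/dev/vg_vms/" + domain, mountpoint], check=True)
        with open(SITE_KEY) as key, \
             open(mountpoint + "/root/.ssh/authorized_keys", "a") as authorized:
            authorized.write(key.read())
        subprocess.run(["umount", mountpoint], check=True)
        subprocess.run(["xm", "create", "/etc/xen/%s.cfg" % domain], check=True)

    if __name__ == "__main__":
        for domain, host in VMS.items():
            if not root_ssh_ok(host):
                print("WARNING: lost root access to %s; re-installing site ssh key" % domain)
                force_key_reinstall(domain)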

4.2. Resource provider host management


Once the VM has been established as a networked host, the remainder of the resource provider system administration tasks can be performed using existing cluster management techniques. Resource providers may choose to use their existing configuration and file management infrastructure to distribute important operating system configuration files that control network connectivity and host access. Other than the initial configuration, we expect very little resource provider system administrator intervention on the VM.

One area where the resource provider may assist with VM administration is the use of existing Grid management software. For example, TeraGrid resource providers run gx-map to propagate distinguished name and certificate authority updates. A gx-map proxy can be used to maintain the Grid authentication infrastructure on a VM host used for a TeraGrid project. Of course, developers may choose to omit this feature and manage their Grid-based authentication directly.
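As an illustration only, a site could push its gx-map-maintained grid mapfile and CA certificates from a management host to a hosted VM with a script along the following lines; the host name and the use of scp/rsync are assumptions rather than the gx-map proxy interface itself.

    import subprocess

    def sync_grid_security(vm_host):
        """Push the site-maintained Grid authentication files to one hosted VM."""
        # Copy the gx-map-maintained grid mapfile.
        subprocess.run(["scp", "/etc/grid-security/grid-mapfile",
                        "root@%s:/etc/grid-security/grid-mapfile" % vm_host], check=True)
        # Mirror the CA certificate directory; --delete drops revoked or stale entries.
        subprocess.run(["rsync", "-az", "--delete",
                        "/etc/grid-security/certificates/",
                        "root@%s:/etc/grid-security/certificates/" % vm_host], check=True)

    if __name__ == "__main__":
        sync_grid_security("projecta.services.example.org")   # assumed VM host name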

4.3. Client host management


After the VM host has been attached to the network, project developers may manage their host using a variety of tools and techniques. At the most direct level, the developers' accounts receive sudo access; the initial accounts, passwords, and ssh keys are established at the time the VM is requested, so developers may then continue to administer the machine (including the designation of other administrators) at their convenience.

Project developers truly do have full control of the VM, and may administer it using the tools of their choice. The developers may control the firewall directly using iptables or select a front-end such as shorewall, or elect to avoid a firewall altogether. Similarly, developers may install software using a package management system or manually by compiling from source. Projects with multiple clients may use traditional cluster administration techniques, such as pdsh, to execute commands on all of their hosts at the same time. For more extensive deployments, the developers may choose to use configuration file management software to maintain image consistency without manual intervention.
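For example, a small wrapper in the spirit of pdsh can run the same administrative command on every project VM; the host names below are placeholders, not the hosts used in our deployment.

    import subprocess
    import sys

    # Project VM host names at the participating sites (placeholders).
    HOSTS = ["projecta.site-a.example.org", "projecta.site-b.example.org"]

    def run_everywhere(command):
        """Start the command on every host over ssh, then collect the output."""
        procs = {host: subprocess.Popen(["ssh", host, command],
                                        stdout=subprocess.PIPE,
                                        stderr=subprocess.STDOUT)
                 for host in HOSTS}
        for host, proc in procs.items():
            output, _ = proc.communicate()
            print("=== %s (exit %d) ===" % (host, proc.returncode))
            print(output.decode(errors="replace"))

    if __name__ == "__main__":
        run_everywhere(sys.argv[1] if len(sys.argv) > 1 else "uptime")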

4.4. Guest operating system production


The final piece of the functional implementation is the selection of a guest operating system to serve as the managed hosting platform. In an effort to simplify the creation of virtual machines, as well as to facilitate the painless creation of custom software packages and the deployment of updated packages across distributed clusters of VMs, we selected rPath Linux as the guest operating system of choice for new VMs at our site. While we were quite pleased with the ability to easily generate packages using rPath's VM management systems, many researchers that we collaborate with expressed a strong preference for other Linux distributions. Most Linux distributions, such as Debian and SuSE, contain toolsets that enable the straightforward creation of new VM images. We are therefore updating our preliminary software stack to support the maintenance of both rPath-based and Debian-based VM images.
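As one example of such a toolset, a Debian guest file system for a new VM image could be assembled with debootstrap; the sketch below is illustrative only, and the target directory, suite, mirror, and package list are assumptions rather than our build configuration.

    import subprocess

    TARGET = "/srv/images/projecta-root"            # image build directory (assumed)
    SUITE = "stable"                                # Debian suite (assumed)
    MIRROR = "http://deb.debian.org/debian"         # package mirror (assumed)

    def build_debian_guest(extra_packages=("openssh-server",)):
        """Bootstrap a minimal Debian root file system for a new guest image."""
        subprocess.run(["debootstrap", "--include=" + ",".join(extra_packages),
                        SUITE, TARGET, MIRROR], check=True)
        # Site- and project-specific configuration (network settings, root ssh keys)
        # is applied afterwards by the VM management scripts of Section 4.1.

    if __name__ == "__main__":
        build_debian_guest()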

4.5. Resource provider considerations


The managed hosting architecture allows a wide range of administrative activities to be performed by developers or resource provider staff. In practice, the amount of resource provider system administrator intervention will be defined by policy, balancing the resource provider's interest in hosted VM monitoring against staff time considerations. For some resource providers, a completely hands-off approach may be feasible, while others may choose to perform more monitoring or even participate in system administration to satisfy local computer policies and procedures.

In our prototype deployment, our resource provider system administrators chose to install two software packages on each hosted VM: a file system integrity scanner (Samhain) and a syslog daemon that supports secure remote logging (syslog-ng). The necessary client software is installed on each project VM at the time of its creation, and both Samhain and syslog-ng are configured to communicate with separate servers that maintain the file system integrity database and system log. This provides the system administration staff with notifications for critical events, such as the replacement of operating system files, and a system log kept separate from the host being monitored. This level of monitoring is considered the minimum security practice for any host exposed to the Internet at our site.

Resource providers may also choose to perform passive monitoring on the VM host container. For example, in our prototype implementation, the network monitoring software Bro is run on the host container and monitors all network traffic in and out of hosted VMs. By using the network-based intrusion detection software, administrators have another method to identify VMs that may require staff intervention, and can proactively monitor the hosted VMs for suspicious activity. With the ability to actively administer a system, and background monitoring to highlight possible problems, VM hosting is completely transparent to both the resource provider and the developers.
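A lightweight check of this monitoring baseline might look like the sketch below, which assumes the agents can be probed with pgrep over ssh and uses a placeholder host name; it simply flags any hosted VM on which the integrity scanner or remote-logging daemon has stopped.

    import subprocess

    HOSTS = ["projecta.services.example.org"]       # hosted project VMs (assumed)
    AGENTS = ["samhain", "syslog-ng"]               # required monitoring daemons

    def agent_running(host, agent):
        """Return True if a process with the given name is running on the host."""
        result = subprocess.run(["ssh", "root@" + host, "pgrep", "-x", agent],
                                stdout=subprocess.DEVNULL)
        return result.returncode == 0

    if __name__ == "__main__":
        for host in HOSTS:
            missing = [agent for agent in AGENTS if not agent_running(host, agent)]
            if missing:
                print("ALERT: %s is not running %s" % (host, ", ".join(missing)))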

5. Results
To demonstrate the functionality of our managed service hosting architecture, we deployed the system as an alternate container for several projects at NCAR. Currently, these projects are hosted on existing TeraGrid resources. Use of these resources requires a high level of administration support because of the need to support the software stack for projects developed over the past several years. The current projects of interest include a service-oriented architecture that requires a Globus Grid service, a standalone daemon, and a support database; and a TeraGrid science gateway that requires a web server and a back-end workflow management system.

Rather than develop custom VMs for these projects, we created a VM image with a selection of software useful to all of the projects and customized each instance as necessary. (Of course, the developers could create their own VM and request that it be installed on the target site hosting platform.) Unlike virtual clusters for HPC applications, VMs are instantiated only once for each project at each site, and no regular (or automatic) VM distribution is required. The only requirement of the VM image is that it be based on a Linux distribution that is compatible with our underlying VM management software so that the site administrators may easily manage all of the VMs running at their site.

To this point, our testing has been functional in nature. Our software stack is capable of configuring the VMs as fully-exposed networked hosts. The system administration staff can monitor the VM activity using the selected security tools and perform administrative activities as necessary. By placing the projects on the VMs, we can freely give application developers superuser privileges on Grid-enabled, publicly-exposed hosts.

6. Continuing work
Our initial test deployment demonstrated the proper operation of our managed hosting software stack, and we plan to continue expanding the system's operational characteristics to support the needs of more diverse project and resource provider communities. The most important component of this work is communicating with other potential users of this infrastructure, both project developers and resource providers, to establish which additional features are necessary or desirable in order to make this an attractive hosting platform.

We are continuing to expand the software that assists with typical system administration issues for Grid clients. Our current software performs rudimentary /etc/grid-security synchronization, but we would like to expand the software's capabilities for adding user accounts across independent resource providers. In addition, we plan to more closely examine other Grid VM technologies, such as Globus Workspaces, and utilize their management interfaces when appropriate.

In our current implementation, software developers are required to administer clients manually. The developers must remember the names of their assigned hosts and their usernames at each site, and rely on passwords or SSH keys to access each host. We would like to develop a Grid-based management software stack that simplifies developer system access and administration. For example, each booted VM for a particular project could register with a Grid metadata service, and developer tools could query the metadata service to identify project VMs and simplify administration.
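One possible shape for that proposed registration step is sketched below under the assumption of a simple HTTP-based metadata service; the endpoint URL, record fields, and site label are hypothetical and do not correspond to an existing interface.

    import json
    import socket
    import urllib.request

    METADATA_SERVICE = "https://metadata.example.org/register"   # hypothetical endpoint

    def register_vm(project, site):
        """Announce this VM to a project metadata service so developer tools can find it."""
        record = {"project": project, "site": site, "hostname": socket.getfqdn()}
        request = urllib.request.Request(METADATA_SERVICE,
                                         data=json.dumps(record).encode("utf-8"),
                                         headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(request) as response:
            return response.status == 200

    if __name__ == "__main__":
        register_vm("projecta", "example-rp")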

7. Conclusions
In this paper, we presented an architecture for service hosting on virtual clusters spanning multiple administrative domains. Intended to provide a persistent managed hosting environment for distributed projects such as Grid gateways and their underlying SOAs, the virtual hosting infrastructure gives application developers the necessary control and flexibility to easily deploy software across multiple service providers while maintaining the ability for service providers to assist with system administration when necessary. This reduces the barrier to developing and deploying distributed services, such as Grid-enabled service-oriented architectures, at multiple sites across production Grids.

Acknowledgments
Computer time and support were provided by NSF MRI Grant #CNS-0421498, NSF MRI Grant #CNS-0420873, NSF MRI Grant #CNS-0420985, NSF sponsorship of the National Center for Atmospheric Research, the University of Colorado, and a grant from the IBM Shared University Research (SUR) program. We would like to thank Jason Cope for his feedback on targeting hosting platforms to the requirements of Grid-based service-oriented architectures, and Marty Wesley at rPath for his assistance with using rBuilder to deploy custom software in virtual machine images.

References
[1] TeraGrid. http://www.teragrid.org/about/.
[2] Amazon Web Services Developer Connection. Tips for securing your EC2 instance. http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1233&categoryID=100.
[3] Amazon.com, Inc. Amazon Web Services. http://www.amazon.com/aws/.
[4] Distributed Systems Architecture Group at Universidad Complutense de Madrid. OpenNEbula. http://www.opennebula.org/.
[5] I. Foster, T. Freeman, K. Keahey, D. Scheftner, B. Sotomayer, and X. Zhang. Virtual clusters for grid communities. In Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid, May 2006.
[6] D. Girou. DEISA common production environment, October 2006. http://www.deisa.eu/files/DEISA-TrainingOctober06-General-DCPE.pdf.
[7] Globus. Virtual Workspaces. http://workspace.globus.org/.
[8] H. Lederer, G. J. Pringle, D. Girou, M.-A. Hermanns, and G. Erbacci. DEISA: Extreme computing in an advanced supercomputing environment. In C. Bischof, M. Bücker, P. Gibbon, G. Joubert, T. Lippert, B. Mohr, and F. Peters, editors, Parallel Computing: Architectures, Algorithms and Applications, volume 38, page 687, 2007.
[9] R. Pordes, D. Petravick, B. Kramer, D. Olson, M. Livny, A. Roy, P. Avery, K. Blackburn, T. Wenaus, F. Würthwein, I. Foster, R. Gardner, M. Wilde, A. Blatecky, J. McGee, and R. Quick. The Open Science Grid. Journal of Physics: Conference Series, 78:012057 (15pp), 2007.
[10] rPath Inc. rBuilder Online. http://www.rpath.com/rbuilder/.
[11] A. Sundararaj and P. Dinda. Towards virtual networks for virtual machine grid computing. In Virtual Machine Research and Technology Symposium, May 2004.
[12] TeraGrid. Coordinated TeraGrid Software and Services (CTSS). http://www.teragrid.org/userinfo/software/ctss.php.
[13] A. Whitaker, M. Shaw, and S. Gribble. Denali: Lightweight virtual machines for distributed and networked applications. In USENIX Annual Technical Conference, June 2002.
[14] XenSource Inc. Xen Community. http://xen.xensource.com/.

