
SECURITY AND PRIVACY IN CLOUD COMPUTING

Bevan Barton

Adviser: Professor Amy Briggs

A Thesis Presented to the Faculty of the Computer Science Department of Middlebury College in Partial Fulfillment of the Requirements for the Degree of Bachelor of Arts

May 2010

ABSTRACT

Cloud computing is an Internet-based computing paradigm where pooled IT infrastructure is provided to users as a service. The details of the underlying infrastructure are managed by the cloud service provider (CSP) and presented to users as an abstraction. A major advantage of cloud computing over traditional IT services is that it allows minimally trained personnel to rapidly and flexibly provision vast amounts of computational resources on a pay-as-you-go basis, eliminating the need to provision and maintain datacenters, or to subscribe to less efficient IT services. Cloud users work within the applications, development platforms, or virtual machines provided by the CSP (depending on the type of cloud service) without regard to any of the underlying infrastructure layers. The efficiency of cloud computing stems largely from multitenancy: users' data and applications transparently reside on the same physical infrastructure, where resources such as memory and CPU cycles are allocated as needed. Cloud computing's shared infrastructure and other characteristics make it a cost-effective alternative to traditional IT services, but they also introduce security and privacy issues. In particular, concerns have been raised over the security of cloud computing infrastructure at the network and host levels. As a result of those vulnerabilities, malicious cloud users may be able to monitor other users' activities, or gain unauthorized access to the host processes managed by CSPs. Cloud data is usually not encrypted, for various reasons, which creates additional privacy concerns. Furthermore, general characteristics of cloud computing, such as the lack of industry standards, present other security challenges.

ACKNOWLEDGEMENTS

I would like to thank Professor Amy Briggs for assisting me in the process of writing my thesis, and for giving me valuable advice on my presentation. I would also like to thank Professor Tim Huang for inspiring me to study Computer Science, and for supporting me in independent studies and other projects. Working on computer Go players and starting Appstone, Inc. with Tim were highlights of my time at Middlebury.

TABLE OF CONTENTS

1 Introduction 1
  1.1 What Is Cloud Computing? 1
  1.2 Utility Computing 2
  1.3 Roadmap 3
2 NIST Definition and Characteristics 4
  2.1 Definition 4
  2.2 Characteristics 4
    2.2.1 On-Demand Self-Service 5
    2.2.2 Broad Network Access 5
    2.2.3 Resource Pooling 5
    2.2.4 Elasticity of Service 6
    2.2.5 Metered Service 7
    2.2.6 Massive Scalability 7
    2.2.7 Characteristics Summary 8
  2.3 Cloud Service Models 8
    2.3.1 Software as a Service (SaaS) 9
    2.3.2 Platform as a Service (PaaS) 9
    2.3.3 Infrastructure as a Service (IaaS) 11
  2.4 Cloud Deployment Models 13
    2.4.1 Public Cloud 13
    2.4.2 Private Cloud 13
    2.4.3 Community Cloud 13
    2.4.4 Hybrid Cloud 14
  2.5 Summary of Service and Deployment Models 14
  2.6 Efficiency of Cloud Computing 15
    2.6.1 Size and Location of Datacenters 15
    2.6.2 Statistical Multiplexing 16
    2.6.3 Relative Rate of Industry Growth 16
3 Virtualization 18
  3.1 Virtual Machines (VMs) 18
  3.2 Multitenancy and Isolation 19
  3.3 Significance in Cloud Computing 19
  3.4 Hypervisor 20
4 Security and Privacy Overview 22
  4.1 Introduction 22
  4.2 What is at stake? 23
    4.2.1 Service Downtime 24
    4.2.2 Data Loss 25
    4.2.3 Significance of Incidents 26
5 Cloud Infrastructure Vulnerabilities 27
  5.1 Cross-VM Data Snooping in IaaS 27
    5.1.1 Significance 28
  5.2 Targeted VM Placement 29
    5.2.1 Mapping the EC2 Network 29
    5.2.2 Determining Co-Residence 31
    5.2.3 VM Placement 31
    5.2.4 Network Vulnerabilities and Possible Fixes 33
    5.2.5 Risk Assessment 35
  5.3 Cross-VM Data Leakage 35
  5.4 VM Escape 36
    5.4.1 Cross-VM Attacks 36
    5.4.2 VM-to-Host Attacks 38
    5.4.3 Risk Assessment 38
  5.5 Attacks on the Virtualization Layer 39
    5.5.1 Attacks on Virtualized Machines 39
    5.5.2 Vaserv.com Hypervisor Vulnerability 40
    5.5.3 Summary 41
6 Privacy and Encryption 42
  6.1 Difficulty of Encryption 42
    6.1.1 Working in the Cloud 42
    6.1.2 Working Locally 43
  6.2 Fully Homomorphic Encryption 44
  6.3 Hierarchical Encryption 45
7 General Issues 47
  7.1 Lack of Industry Standards 47
  7.2 Data Lock-In and Cloudbursting 47
  7.3 Open Source Initiatives 48
  7.4 Central Points of Internet Control 48
8 Conclusion 49
Bibliography 51

CHAPTER 1 INTRODUCTION

1.1 What Is Cloud Computing?


"The interesting thing about cloud computing is that we've redefined cloud computing to include everything that we already do. I can't think of anything that isn't cloud computing with all of these announcements. The computer industry is the only industry that is more fashion-driven than women's fashion. Maybe I'm an idiot, but I have no idea what anyone is talking about. What is it? It's complete gibberish. It's insane. When is this idiocy going to stop?" - Larry Ellison, CEO of Oracle [5]

The term "cloud computing" receives a lot of criticism for its vagueness and wide scope. The above quote by Larry Ellison represents the frustration that some people feel about this overused buzzword. As we will see in the following chapter, the definition of cloud computing is indeed quite broad, and encompasses a wide variety of services. Other commentators criticize the term because of the positive connotations of clouds, which they believe should not be associated with a computing paradigm. Ron Rivest, a co-creator of the RSA cryptosystem, suggested that cloud computing be renamed "swamp computing," to encourage people to look at cloud computing in a new light [15]. The term, Rivest claims, evokes positive imagery that does not accurately depict the challenges that cloud computing presents.

"Cloud computing" and "the cloud" are indeed misleading terms. Both are suggestive of the Internet: a vast, amorphous, and decentralized network. The Internet is central to cloud computing, since it is the delivery medium of those services. However, the cloud metaphor is misleading because cloud computing is done in proprietary datacenters, rather than in decentralized networks of machines. Furthermore, the term "the cloud" suggests that there is some single entity, when there are actually many proprietary clouds. Finally, there are no widely used standards in cloud computing, and different cloud services are generally not designed to be interoperable.

The National Institute of Standards and Technology [10] maintains a definition of cloud computing, which shows that it is more than a marketing term. We will discuss that definition, and the various characteristics and flavors of cloud computing, in the following chapter. Before that, we will present a high-level definition, and see that cloud computing services offer computation as a utility.

1.2 Utility Computing

Cloud computing describes services that provide users some form of computation as a utility via the Internet [1]. An analogy can be made between a cloud computing company and an energy company. Both companies operate infrastructure that provides an on-demand service to customers. In both cases, the service provider abstracts away the need for customers to run the requisite utility-producing infrastructure themselves. The result is that customers can focus on fulfilling higher-level responsibilities. The energy company abstracts away the need to buy and run a gas generator, allowing customers to focus entirely on running appliances. In the same way, cloud computing services such as Amazon's Elastic Compute Cloud (EC2) abstract away the need to buy and run a server, allowing customers to focus entirely on higher-level tasks. Furthermore, in almost every other respect, cloud computing provides computation in a utility-like manner. We will explore the ways in which it does so, and the implications of this utility model, when we discuss the NIST definition and characteristics of cloud computing in the following chapter.

1.3 Roadmap

In this thesis, we will explore what cloud computing is and why it can be more efficient than traditional services. We will discuss security and privacy issues raised by the computing paradigm, and assess some of those risks. In Chapter 1, we take a high-level view of cloud computing: we address the popular conception of the term, and present an intuitive definition. In Chapter 2, we explore a detailed definition of cloud computing; we explain the computing paradigm's characteristics and the different service and deployment models that cloud computing services use. In doing so, we'll show how cloud computing can be more efficient than alternative services and debunk some myths associated with this computing paradigm. Chapter 3 introduces virtualization, a key cloud computing technology that largely enables CSPs to provide computation as a utility. In Chapter 4, we present an overview of security and privacy concerns specifically relevant to this computing paradigm, and discuss what is at stake by analyzing failures of CSPs. In Chapter 5, we elaborate on cloud computing infrastructure vulnerabilities, discussing the steps required to spy on specific users of IaaS cloud services, and related security issues. In Chapter 6, we discuss the difficulty of using encrypted data in the cloud computing context and survey the encryption schemes being developed to facilitate in-cloud encryption. Chapter 7 explores some general security and privacy issues confronting cloud computing. Finally, Chapter 8 offers some conclusions and perspectives on the many issues examined throughout the thesis, in particular on the risks and challenges associated with cloud computing.

CHAPTER 2 NIST DEFINITION AND CHARACTERISTICS

2.1 Definition

The National Institute of Standards and Technology (NIST) maintains a broad definition of cloud computing. The two-page document lists the definition, five essential characteristics, and service and deployment models of the computing paradigm. The NIST document is more accurately a description of existing services than an actual definition; this is evidenced by the fact that the document is in its fifteenth iteration. The NIST definition is as follows: "Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models." [10] That definition is a lot to take in, and is somewhat ambiguous. Luckily, NIST expands on it by explaining the five essential characteristics of cloud computing. We'll look at each of those in the following sections to get a better idea of what cloud computing is. After that, we'll look at the different types of cloud computing, discuss the efficiency of cloud infrastructure, and then look at the security issues it presents.

2.2 Characteristics

In addition to providing the definition, NIST outlines the five essential characteristics of cloud computing. These are on-demand self-service, broad network access, resource pooling, rapid elasticity, and metered service [10]. A sixth characteristic, massive scalability, is mentioned in other definitions, and is worth including in this discussion [9]. These characteristics emphasize that CSPs offer computing as a utility, and together they distinguish cloud computing from other paradigms.

2.2.1 On-Demand Self-Service

On-demand self-service means that computing resources can be rapidly provisioned by an individual without interacting with staff at the service provider. The high availability made possible by this characteristic is an improvement over traditional hosting companies, where IT staff are often required to initialize service, which may take hours to become available.

2.2.2 Broad Network Access

Broad network access means that cloud services can be accessed across a network via thin clients. Practically speaking, this means that users interact with a CSP through the Internet via a web browser interface. This contrasts with some non-cloud IT services, which may require specific software or devices to access the service provider.

2.2.3 Resource Pooling

Cloud providers employ a multi-tenant model, which means that many users share the same physical infrastructure. An advantage of multitenancy is that resources, such as CPU cycles, memory, and disk space, can be dynamically assigned and reassigned to users as needed. This can be thought of as the ability of cloud users to claim resources as needed; when those resources cease being used, they are automatically returned to the resource pool. This need-based allocation of resources on shared infrastructure is called statistical multiplexing, and it leads to more efficient infrastructure usage than when resources are statically assigned. By using statistical multiplexing over a wide user base, cloud providers ensure that more of their hardware is active more of the time. We will see that statistical multiplexing is a major reason why CSPs can operate more efficiently than private datacenters.

The alternative of fixed resource allocation can lead to customers paying for unused capacity. For example, a non-cloud website hosting service might charge $20 a month to host a website of up to 100 MB in size (which is statically allocated to that user). Unless the customer uses all of that 100 MB, they are paying for resources they are not using. Cloud computing eliminates that inefficiency with resource pooling.
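The inefficiency of static allocation can be sketched as a few lines of arithmetic, using the hypothetical $20/month, 100 MB plan from the example above:

```python
# Under a fixed plan the customer pays the same regardless of usage, so the
# effective price per megabyte actually used rises as usage falls.
# (Prices are the hypothetical ones from the text.)

STATIC_PRICE = 20.0     # dollars per month for the plan
STATIC_QUOTA_MB = 100   # megabytes statically allocated to the customer

def effective_cost_per_mb(used_mb):
    """Dollars per month per megabyte the customer actually uses."""
    return STATIC_PRICE / used_mb

assert effective_cost_per_mb(100) == 0.2  # full quota used: $0.20/MB
assert effective_cost_per_mb(10) == 2.0   # 10% used: 10x the effective rate
```

With pooled, metered resources, the unused 90 MB would simply be returned to the pool rather than billed.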

2.2.4 Elasticity of Service

Elasticity of service is the ability to expand and contract services according to the user's needs. The ability to scale down is a particularly important feature of cloud computing. When using the less elastic alternative of outsourcing hosting to traditional hosting providers, companies generally have to provision enough resources for peak server utilization, which can exceed average server load by as much as ten times [1]. New companies must guess their peak traffic load, which can result in under- or over-provisioning. The rapid elasticity of cloud computing eliminates those risks and the inefficiency of acquiring or renting resources that will sit idle most of the time.

The above risks are particularly relevant to companies launching new web applications, because application load can be unpredictable. A notable cloud success story is that of Animoto, a service that creates music videos from users' photos and video clips. After Animoto launched its Facebook application, its server utilization on Amazon's Elastic Compute Cloud (EC2) went from 50 servers to 3,500 in three days [1]. The APIs provided by Amazon made it possible for Animoto to automatically provision and deprovision server instances, so the company never had to worry about buying too much or too little capacity. Clearly, it would have been extremely impractical and risky for Animoto to host its own datacenter to support that usage spike.

In fact, down-scaling is a new challenge for developers of cloud applications [4]. For applications that are hosted in-house, down-scaling is not generally a concern; the company already owns its infrastructure, so it has limited down-scaling opportunities short of selling that hardware. Cloud computing presents the new incentive of down-scaling, because CSPs provide APIs for commissioning and decommissioning VM instances.
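As a rough illustration of the kind of controller a customer could build on such provisioning APIs, the following sketch computes a target fleet size from the current load. The capacity figure and function are hypothetical; the actual starting and stopping of instances would happen through provider-specific API calls (e.g., EC2's RunInstances and TerminateInstances actions):

```python
# A toy auto-scaling policy: pick the smallest fleet whose total capacity
# covers the current load, never dropping below a floor of one instance.

def target_instances(load, capacity_per_instance=100, minimum=1):
    """Smallest fleet whose total capacity covers the current load."""
    needed = -(-load // capacity_per_instance)  # ceiling division
    return max(minimum, needed)

# A load spike drives the target fleet up automatically...
assert target_instances(4_000) == 40
assert target_instances(350_000) == 3_500
# ...and back down to the floor when the spike subsides.
assert target_instances(0) == 1
```

The controller would then ask the CSP's API to provision or deprovision the difference between the target and the running fleet.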

2.2.5 Metered Service

Cloud services provide metered service: users pay for only the resources they use. This pay-as-you-go model can be more efficient than non-cloud services, where users are often charged a fixed cost for a certain tier of service. This feature of cloud computing is especially useful for some web applications. Rather than purchasing a fixed amount of hosting space from a traditional hosting provider, which will likely not match their needs, developers can instead launch their application in the cloud and pay for exactly the computation and transfer that is used.
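The trade-off between a fixed tier and metering can be sketched numerically; all prices here are invented for illustration, and real CSP rate cards differ:

```python
# Comparing a fixed-tier hosting plan against pay-as-you-go billing.

TIER_PRICE = 50.0     # dollars/month for a fixed hosting tier (hypothetical)
METERED_RATE = 0.10   # dollars per GB transferred, pay-as-you-go (hypothetical)

def metered_cost(gb_used):
    """Monthly cost under metering for a given amount of transfer."""
    return METERED_RATE * gb_used

# Light usage is far cheaper under metering; only heavy usage favors the tier.
assert metered_cost(20) < TIER_PRICE    # roughly $2 vs. $50
assert metered_cost(800) > TIER_PRICE   # roughly $80 vs. $50
```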

2.2.6 Massive Scalability

A sixth characteristic of some cloud computing services, which is not mentioned in the NIST definition, is massive scalability [9]. Because cloud infrastructure is abstracted at some level, that abstraction can create the illusion of infinite available resources. Cloud computing's very large scale is an important draw for large organizations that are considering putting their applications and data in the cloud.

2.2.7 Characteristics Summary

In summary, the five characteristics of cloud computing, plus massive scalability, emphasize its utility-like availability and ease of access. All forms of cloud computing share the above characteristics. In the next section, we'll take a look at the different service models that cloud computing services fall under.

2.3 Cloud Service Models

The five essential characteristics of cloud computing according to NIST, plus massive scalability, describe a wide range of services. Cloud services can be further categorized according to three service models, which are also specified in the NIST document. Each service model offers computational services to the user at a different level of abstraction. The three models are Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). Each service model virtualizes aspects of computation, storage, and networking [1].

Software as a Service providers abstract computation at the software level. Such services let users interact with programs that are running on the CSP's infrastructure; the CSP manages all aspects of running the software.

Platform as a Service providers expose a lower level of the stack and are used by web application developers. A PaaS provider hosts its users' web applications, abstracting away the need to maintain physical infrastructure and a software platform. Developers can focus on creating application logic, and make use of development tools that offer automatic scalability.

Infrastructure as a Service providers let users (who are application developers) rent virtual machines (VMs), thereby abstracting away the need to maintain their own physical infrastructure. IaaS services are used for general computation tasks. We'll look at characteristics and examples of each service model in the following sections.

2.3.1 Software as a Service (SaaS)

Software as a Service simply describes web applications that users access via the Internet, such as Google, Facebook, or web mail. The user works within the application that the CSP runs on its infrastructure. Users have no concern for the underlying implementation details; they do not manage the software process, development platform, database, operating system, network, or any other supporting infrastructure. For example, Facebook users can post pictures and video without worrying about how that data will be stored across Facebook's servers. Facebook manages all the underlying details and lets the user perform as much computation as they want within the confines of the application. In SaaS, the computational utility that is provided is the application's functionality. Another example is Google Docs. With that service, users no longer have to install and run an office suite on their own computers; rather, they can simply access Google Docs, which Google keeps running on its servers. Furthermore, users don't have to worry about losing files stored on their own computers, because management of document storage is another part of Google Docs' utility.

2.3.2 Platform as a Service (PaaS)

The Platform as a Service model provides a software platform and physical infrastructure on top of which web application developers can build and host their applications. The user does not manage the operating system or storage technology, or any of the underlying physical infrastructure. The software platform provides tools that abstract away scalability and network and server failover concerns [1].

A constraint of PaaS services is that they are designed for applications that function on a request-reply basis: the hosted application's functionality must be triggered remotely by users, and cannot run autonomously. The result of that restriction is that PaaS services are only used for hosting web applications, and are not used for general computing tasks. Because web applications behave in predictable request-reply patterns, automatic scaling mechanisms become possible [1]. Another potential downside to PaaS is that the application must be written in a language supported by the CSP.

Google App Engine is an example of a PaaS service: it hosts web applications that run in Java and Python run-time environments. The App Engine development platform makes use of technology that underlies Google's own web applications. Starting with their operating system and run-time environments, the platform also includes Google's proprietary file system and data store, which controls how an application's data is stored and retrieved [12]. Those software tools completely abstract away scalability, and network and server failover [1]. In particular, App Engine features a scalable database abstraction called MegaStore, which is based on Google's BigTable data store, that underlies its web applications [12]. As a result, applications that use those tools automatically scale across Google's physical infrastructure.

PaaS is a true pay-as-you-go model. Google App Engine charges for storage and bandwidth by the byte, and computation is billed by CPU cycles used.
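To make the request-reply constraint concrete, here is a minimal WSGI application (WSGI is a generic Python web interface, used here only to illustrate the model; App Engine's own APIs differ). The code runs only when an HTTP request arrives and returns a single reply; it has no mechanism for running autonomously in the background:

```python
# A minimal WSGI app: the platform invokes `application` once per incoming
# request and delivers the returned body to the client. All work happens
# inside that request-reply cycle.
def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    path = environ.get("PATH_INFO", "/")
    return [f"You requested {path}\n".encode()]
```

It is precisely this structure that lets a PaaS platform run many copies of the handler in parallel and scale the fleet with request volume.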


2.3.3 Infrastructure as a Service (IaaS)

Cloud IaaS services offer the most flexibility to users and are suitable for general computing tasks. In this model, users have complete control over virtual machines (VMs), which are software implementations of physical computers (we will discuss virtualization later on). For instance, each instance has an IP address, a virtual CPU, and a virtual disk. This affords a lot of flexibility to users, who can run whatever software they wish on their VMs, starting with the operating system. The service provider manages the underlying physical infrastructure and the virtualization software.

Amazon's Elastic Compute Cloud (EC2) is perhaps the best-known IaaS cloud service. EC2 lets users provision VMs in minutes via its website, or programmatically via an API. A great convenience of this and other IaaS services is that VMs are rented by the hour, which contrasts with monthly rentals at traditional hosting companies. Storage and bandwidth usage are billed on a pay-as-you-go basis, but the service charges a fixed hourly cost for VM instances, regardless of their activity.

IaaS services are increasingly being used by companies as an alternative to buying and maintaining private datacenters [1]. We will see later on how CSPs can be so efficient. Another use of IaaS services is performing computationally expensive operations [1]. Tasks that can be parallelized are well suited to cloud computing, because an otherwise long task (on a single computer) can be divided among many machines at no additional cost. Because CSPs charge according to actual resource usage, it costs just as much to rent one computer for 50 hours as it does to rent 50 computers for one hour each.
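That cost equivalence follows directly from per-instance-hour billing, as the following sketch shows (the rate is hypothetical, and per-hour rounding and parallelization overhead are ignored):

```python
# With per-instance-hour billing, a parallelizable job's cost depends only on
# total instance-hours, not on how they are split across machines.

HOURLY_RATE = 0.25  # dollars per instance-hour (illustrative, not a real price)

def job_cost(machines, hours_each):
    """Total cost of running `machines` instances for `hours_each` hours."""
    return HOURLY_RATE * machines * hours_each

# One machine for 50 hours costs the same as 50 machines for one hour each,
# but the second configuration finishes 50x sooner.
assert job_cost(1, 50) == job_cost(50, 1) == 12.5
```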

A note about application scalability in IaaS

Just as the terms "cloud computing" and "the cloud" are misleading, so are some of the characteristics often associated with those terms. For instance, IaaS cloud computing services, like Amazon EC2, are often associated with web application scalability. IaaS services may provide easy access to vast computation resources, but the programmer is responsible for ensuring that their application can make use of those resources effectively. For web applications, scaling the database is seen as the main scalability challenge, rather than scaling computation. Unfortunately, IaaS CSPs do not offer storage solutions that scale as usefully as their computation resources; scalable database abstractions with functionality on par with SQL are still an open research problem [1]. IaaS providers offer database abstractions that can scale automatically, but they use non-standard APIs and have relatively high latency [1]. For example, Amazon's SimpleDB and Simple Storage Service (S3), which can be used with EC2, offer fully automatic scalability. However, those services suffer in functionality and performance because they must cater to many different types of applications [1]. Full functionality is provided by Amazon's Elastic Block Store (EBS), which features a standard API, but it offers no scalability or sharing. Highly scalable database abstractions such as Google App Engine's MegaStore are not available for IaaS services, because they only support web applications that operate in a predictable request-reply manner [1]. Application scalability is an issue at the software platform and application levels, so in IaaS services it must still be crafted by the application developer. This lack of scalable storage options is seen as a significant obstacle to wider adoption of cloud computing [1].

In summary, scalability, when touted as a benefit of IaaS cloud computing, refers to elasticity of service rather than a promise of application scalability. Furthermore, just as IaaS users must implement scalability, they must also implement network and server failover. Those are also very application-dependent problems [1].


2.4 Cloud Deployment Models

The NIST definition further categorizes cloud computing services according to deployment model. A CSP's deployment model specifies who can gain access to the cloud service. There are four such models in the NIST document: public cloud, private cloud, community cloud, and hybrid cloud. The public cloud model is the most relevant for this paper, because it presents unique security and privacy concerns.

2.4.1 Public Cloud

In the public cloud deployment model, the cloud infrastructure is available to the general public. As users join, their data is added to the shared public infrastructure. This is the most prevalent deployment model, and it encompasses the popular cloud services, including Amazon Web Services (AWS) offerings such as EC2, and Google Docs.

2.4.2 Private Cloud

In the private cloud model, cloud infrastructure is operated by and for a single organization. For instance, before Amazon opened up its storage cloud (now known as Amazon S3) to the public in 2006, that cloud was private.

2.4.3 Community Cloud

In the community deployment model, cloud infrastructure is shared by several organizations that may have common requirements or concerns; in effect, it is a private cloud shared by multiple organizations.


2.4.4 Hybrid Cloud

The hybrid deployment model describes two or more clouds (possibly of mixed deployment models) that are "bound together by standardized or proprietary technology that enables data and application portability" [10]. The bridging of clouds in this way is growing increasingly popular. Companies with private clouds that are expecting usage spikes can divert excess traffic to a public or community cloud. This practice of load-balancing between clouds is called cloud bursting. Cloud bursting can be difficult to orchestrate, because different cloud services are not inherently interoperable [15]. We will see later that cloud bursting is a way to mitigate some risks of cloud computing.

2.5 Summary of Service and Deployment Models

As we have seen, cloud computing services have essential characteristics that emphasize how their services are delivered as utilities, and we've explored the different service and deployment models of cloud computing. This paper focuses primarily on the most general cloud computing service model, cloud IaaS, which is of most significance to large organizations considering cloud computing. Most security and privacy concerns that apply to the SaaS and PaaS models also apply to IaaS, and IaaS presents issues of its own. Furthermore, users of IaaS services are frequently SaaS providers, so IaaS security concerns are relevant to SaaS users as well. For similar reasons, the focus of this paper is on the public cloud deployment model, which is the most popular and the most vulnerable of the deployment models.


2.6 Efficiency of Cloud Computing

We've explored how cloud computing can be an efficient alternative to non-cloud IT solutions. In particular, cloud computing can be an appealing alternative to provisioning and maintaining a private datacenter. This is somewhat ironic, because CSPs have their own datacenters to run. How can cloud computing be an efficient alternative to datacenters, when CSPs have to maintain their own? Furthermore, how can CSPs make a profit doing it? The efficiency of CSPs is primarily due to the unprecedented economies of scale they have achieved, the location of cloud computing datacenters, and their use of statistical multiplexing to efficiently allocate resources [1].

2.6.1 Size and Location of Datacenters

Cloud computing is more of an economic achievement than a technical one [1]. At the hardware level, there is nothing unique about the commodity computers its datacenters contain. Rather, cloud computing derives most of its efficiency from the economies of scale achieved in those datacenters: extremely large networks of computers and disk arrays housed in massive warehouses. The size of cloud computing facilities is kept secret, but the warehouses typically range from 30,000 to 50,000 square feet [12]. Their scale allows CSPs to obtain hardware and software very cheaply. The location of cloud computing datacenters is also important, because electricity and bandwidth are much cheaper in some areas. Electricity is an especially important consideration; in 2006, US datacenters consumed 1.5% of the country's electricity production, at a cost of $4.5 billion [3]. Electricity consumption by CSPs is expected to double by 2011 [3]. To help mitigate the cost of electricity, CSPs build their giant datacenters near hydroelectric dams [3]. For example, many such datacenters are located near rivers in the Pacific Northwest [12]. Because of their size and location, cloud computing centers can be operated at 14-20% of the relative cost of smaller centers [15].

2.6.2 Statistical Multiplexing

Cloud datacenters also make very efficient use of their infrastructure through statistical multiplexing, in which a single resource is divided and shared among users according to their actual needs. As discussed, this contrasts with traditional hosting services that statically allocate resources to users, which leads to unused capacity. From the perspective of CSPs, statistical multiplexing allows them to serve more customers than they could through static resource allocation, because no capacity is reserved by idle users [1]. By contrast, companies that maintain private datacenters must provision for peak load, and therefore suffer substantial under-utilization; estimates of private datacenter server utilization range from 5% to 20% of capacity [1]. In the cloud computing context, statistical multiplexing is most often achieved via virtualization, which we will discuss in the next chapter.
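The advantage of multiplexing over static allocation can be illustrated with a toy simulation. The burst pattern and all numbers below are invented for illustration (they are not drawn from [1]); the point is only that pooled capacity sized for the peak of the *aggregate* demand is far smaller than the sum of per-user peaks, because individual bursts rarely coincide.

```python
import random

def static_capacity(peaks):
    """Static allocation: each user reserves their own peak demand."""
    return sum(peaks)

def multiplexed_capacity(demand_traces):
    """Statistical multiplexing: provision for the peak of the aggregate demand."""
    return max(sum(step) for step in zip(*demand_traces))

def simulate(n_users=50, n_steps=1000, seed=42):
    rng = random.Random(seed)
    # Each user is mostly idle with occasional bursts, echoing the 5-20%
    # utilization figures quoted for private datacenters.
    traces = [[rng.choice([1, 1, 1, 1, 10]) for _ in range(n_steps)]
              for _ in range(n_users)]
    peaks = [max(t) for t in traces]
    return static_capacity(peaks), multiplexed_capacity(traces)

static, pooled = simulate()
# The pooled provision is far below the sum of individual peaks.
print(static, pooled)
```

With these parameters, every user reserves for a burst of 10 under static allocation, while the pooled provision only needs to cover the largest simultaneous load actually observed.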

2.6.3 Relative Rate of Industry Growth

As we've seen, the relative efficiency of CSPs compared to non-cloud IT services is due to the size and location of cloud datacenters, and to the way CSPs dynamically allocate resources between users. This efficiency is evidenced by the growth of the industry. Cloud computing services reported $17 billion in sales in 2009, and that number is expected to increase 159% to $44 billion by 2013 [15]. By comparison, non-cloud IT spending was $359 billion in 2009, and is expected to increase by only 16%, to $416 billion, over the same period [15].


CHAPTER 3 VIRTUALIZATION

Cloud computing is based on pre-existing technologies. As such, the individual components that make up cloud computing are mostly familiar. However, the virtualization layer is an especially important aspect of cloud computing, and deserves a closer look. In this chapter we'll discuss what virtualization is, how and why it's used by CSPs, and why it's significant in IaaS cloud computing in particular.

3.1 Virtual Machines (VMs)

Virtualization is the software implementation of a physical computer. It allows for the creation of virtual machines (VMs): software processes that each mimic the functionality of a physical computer. A VM runs its own operating system and applications; all software running on the virtualization layer is indifferent to the fact that the hardware responsibilities are being fulfilled by a software intermediary. Physical resources, such as disk drives, network cards, CPUs, and memory, are mimicked by the virtualization software. VMs interact with those virtualized resources, which in turn interact with their physical counterparts on a time-sharing basis. All interaction between VMs and physical resources takes place through the virtualization layer. Virtualization is used by the vast majority of public IaaS cloud services, which rent VMs to users [1]. Virtual machines create flexibility for users, who can run arbitrary software (operating systems, web servers, databases, etc.) on them. This contrasts with non-virtualized hosting services, which may force users to use a particular operating system and software stack.


3.2 Multitenancy and Isolation

Since VMs are software processes, several of them can run simultaneously on one physical server. This condition is called multitenancy. Multitenancy is an important aspect of cloud computing because it allows for more efficient use of physical resources. First, it enables IaaS CSPs to offer fine granularity when renting out their computing power. For example, a CSP may have datacenters comprised entirely of servers with 8 CPU cores and 16 GB of RAM. With virtualization, they can partition those high-end physical servers into two, four, eight, or any other number of more modest VMs, which helps them cater to a wider customer base. For example, Amazon EC2 offers Standard, High-Memory, and High-CPU VM instances, which also come in a variety of sizes (Small through 4XL), for a total of eight different types of VM. Multitenancy via virtualization also increases the flexibility of the underlying physical infrastructure, which can simultaneously host VMs running different operating systems.

VMs running on the same physical machine are isolated from one another: processes running in one VM are not aware of processes running in co-resident VMs. Even though the underlying hardware is shared by all co-resident VMs, the virtualization layer creates isolation between the virtualized versions of those resources. For example, the virtual disk spaces of two VMs do not overlap. One result of this isolation is that co-resident VMs are not affected if one of them crashes. This isolation contrasts with processes in non-virtual environments, which can see and sometimes even communicate with each other [13].
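The partitioning arithmetic above is simple: the number of identical VMs a server can host is bounded by whichever resource runs out first. A minimal sketch, using the 8-core/16 GB server from the example and hypothetical VM sizes:

```python
def vms_per_server(server_cores, server_ram_gb, vm_cores, vm_ram_gb):
    """How many identical VMs fit on one physical server
    (whichever resource is exhausted first is the limit)."""
    return min(server_cores // vm_cores, server_ram_gb // vm_ram_gb)

# An 8-core / 16 GB server partitioned into illustrative VM sizes.
for cores, ram in [(1, 2), (2, 4), (4, 8)]:
    print((cores, ram), vms_per_server(8, 16, cores, ram))   # 8, 4, 2 VMs
```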

3.3 Significance in Cloud Computing

Virtualization makes possible all of the features of IaaS cloud computing. This is because VMs, being software processes, are significantly easier to manage than physical servers. For instance, they can be started, paused, stopped, copied, and relocated, all from a central control point. Furthermore, virtualization makes it possible for IaaS CSPs to give users control over their own instances. Notably, on-demand self-service is made possible via virtualization: CSPs can augment their virtualization software to allow users to initiate their own instances via a website or API. The control provided to users in IaaS is significant from a security standpoint, because unlike in PaaS and SaaS services, there is just one software layer separating users from the underlying infrastructure. This should not be an issue if virtualization software works as it is supposed to. In practice, however, virtualization software security is a major issue, as we will see. Virtualization is also used in other cloud service models to efficiently distribute resources between users [1]. In PaaS and SaaS, the cloud provider may use virtualization behind the scenes in order to make better use of their own hardware. Virtualization makes little difference to the users of those services, because they are provided higher levels of abstraction to work with and never come into contact with the virtualization layer.

3.4 Hypervisor

A hypervisor, or virtual machine monitor, is the software process that manages the VM instances on a single physical computer. This software takes the place of platoons of system administrators, who would have to fulfill the corresponding management roles in non-virtualized environments [12]. It directs the sharing of physical resources between co-resident VMs, creating the illusion that each VM is dealing with dedicated hardware. It is also responsible for isolating co-resident VMs from one another and from itself, so that a failure in one VM instance does not affect the host or the co-resident VMs [1]. Furthermore, the hypervisor can perform security functions to prevent malicious activity on its VMs.

Administrative actions on VMs are conducted through the hypervisor, which has near-complete control over the VMs it oversees. It can start, stop, pause, and restart its resident VMs. If configured with enough privileges, the hypervisor can monitor its instances' application usage, and view and modify data on their virtual disks. It can also potentially monitor network traffic to and from its VMs, since all such traffic generally passes through the hypervisor [13]. IaaS cloud users are provided limited access to the hypervisor to manage their instances. For example, users must be able to start and stop their VMs, and track their resource usage. However, most of the functionality of the hypervisor is barred from users, to prevent them from affecting other users' domains. Users interact with the hypervisor via a web interface or API.
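The split between the hypervisor's full control surface and the limited surface exposed to users can be sketched with a toy class. This is purely illustrative (it does not model any real hypervisor's API): users get start/stop/state operations gated by ownership checks, while privileged operations are simply not exposed.

```python
class Hypervisor:
    """Toy model of a hypervisor exposing only a limited per-user surface."""

    def __init__(self):
        self._vms = {}        # vm_id -> {"owner": ..., "state": ...}
        self._next_id = 0

    def launch(self, owner):
        vm_id = self._next_id
        self._next_id += 1
        self._vms[vm_id] = {"owner": owner, "state": "running"}
        return vm_id

    def _check_owner(self, vm_id, user):
        if self._vms[vm_id]["owner"] != user:
            raise PermissionError("users may only manage their own instances")

    def stop(self, vm_id, user):
        self._check_owner(vm_id, user)
        self._vms[vm_id]["state"] = "stopped"

    def state(self, vm_id, user):
        self._check_owner(vm_id, user)
        return self._vms[vm_id]["state"]

    # Privileged functions (disk inspection, traffic monitoring, pausing any
    # VM) would live on the hypervisor but are deliberately not exposed
    # through the user-facing API.

hv = Hypervisor()
a = hv.launch("alice")
hv.stop(a, "alice")
print(hv.state(a, "alice"))   # stopped
```

The design point is that the ownership check, not the existence of the operation, is what keeps users out of each other's domains.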


CHAPTER 4 SECURITY AND PRIVACY OVERVIEW

4.1 Introduction

The efficiency of CSPs can be a double-edged sword. Cloud infrastructure can include more points of failure than traditional IT options, including hosting a private datacenter [1]. For example, by using a CSP to host their data or applications, organizations undertake risks associated with transferring data to and from the CSP. Also, the technologies used by CSPs, such as virtualization, can introduce vulnerabilities that are not concerns when using alternative services. Other cloud security issues are not unique to cloud computing. For example, when organizations host their data and applications with a CSP, they face risks associated with extending their network to include the CSP's network [9]. One such risk is that they will employ substandard identity and access management (IAM) mechanisms, allowing unauthorized access to their cloud data [9]; that risk is also faced by users of non-cloud services. This paper focuses on identifying and assessing risks that are specific to cloud computing, or that are at least made more significant by it.

Roadmap We will start by discussing what is at stake in cloud computing. We will look at cases of CSP downtime and data loss, which illustrate the potential fragility of cloud infrastructure. While those incidents were not necessarily caused by security issues, they illustrate how little is required to compromise an entire cloud network. After that, we will look at vulnerabilities in cloud infrastructure that could lead to downtime, data loss, or data theft. We will first examine the possibility of performing a cross-VM data snooping attack on a specific IaaS cloud user; in other words, we will see how an attacker can use their VM(s) on Amazon EC2 to spy on the VM(s) of another user. That analysis will describe and assess the risks of targeted VM placement, cross-VM data leakage, and VM escape. Then, we will examine other risks related to virtualization. Notably, we will see that IaaS CSPs can introduce serious vulnerabilities into their network by augmenting their hypervisor. We will then discuss issues more specific to privacy. We will see why organizations are hesitant to entrust CSPs with their sensitive data, and why encryption is difficult; hierarchical encryption and fully homomorphic encryption will be discussed as partial solutions to that problem. Finally, we will examine some more general security and privacy concerns confronting cloud computing: the lack of industry standards and the risk of CSPs becoming central points of Internet control. We will see how these risks might be mitigated; namely, with open-source projects, and with services that create interoperability between CSPs.

4.2 What is at stake?

In order to understand the importance of security and privacy in cloud computing, it is important to look at what is at stake. In this section, we will look at CSP downtime and data loss incidents to see some of the effects cloud vulnerabilities can have. Most of the incidents described below were not caused by security vulnerabilities, but they show that CSPs are not immune to downtime or data loss, and that elements of cloud infrastructure can be quite fragile. We will see in following sections how cloud security vulnerabilities might similarly threaten CSPs. The incidents below also show the wide-ranging impact of cloud vulnerabilities: problems at the infrastructure, platform, and application levels affect all the levels above, and ultimately affect the web application end-user. For instance, if Amazon EC2 were to go down, its users (many of whom are SaaS providers) would be affected, which in turn would affect the end-users of the SaaS applications.

This section focuses on incidents that were the responsibility of the CSP, but responsibility for each level shifts between CSP and user depending on the service model. For instance, we'll see that Google AppEngine, a PaaS CSP, experienced downtime due to a platform programming error; in IaaS services, by contrast, cloud users bear responsibility for the software platform. So, it is important to keep cloud infrastructure secure at each level, whether that level is the responsibility of the CSP or of the cloud user.

4.2.1 Service Downtime

Cloud vulnerabilities have led to several well-publicized incidents of service downtime, which ultimately result in downtime of SaaS web applications hosted on those services. While not specifically a security or privacy issue, this availability risk underscores the ease with which disruptions, malicious or not, can affect all users of a cloud service. Some of the more notable incidents:

- All Google services were interrupted for 14% of customers on May 14th, 2009, because Web traffic was accidentally redirected through Asia [11].
- Apple's MobileMe failed to provide access to e-mail (and some e-mail was lost) shortly after launch, on February 15th, 2008 [11].
- Amazon S3 suffered 2 hours of downtime due to authentication service overload (2/15/08) [1].
- Amazon S3 suffered 6-8 hours of downtime due to a single-bit error (7/20/08) [1].
- AppEngine suffered 5 hours of partial downtime due to a programming error (6/17/08) [1].
- Gmail suffered 1.5 hours of downtime due to a contacts-system failure (8/11/08) [1].

These examples illustrate the potential consequences of malicious activity against CSPs. It's worth mentioning that despite these and other failures, cloud providers have a relatively good track record of availability compared to their non-cloud counterparts [1]. Nonetheless, it is still important for critical applications to avoid single points of failure. In fact, availability concerns are seen as the number-one obstacle preventing further adoption of cloud services by large organizations. Such organizations need contingency plans for outages, and cannot rely on a single provider [1]. In order to mitigate the risk of cloud downtime (regardless of the cause), some cloud users (web-application developers) are using technologies such as Google Gears, Adobe AIR, and Curl that let cloud-based applications run locally [11]. Those tools give an application access to local computing resources, increasing its flexibility and decreasing the end-user's dependence on a cloud service's availability. Some of those technologies even allow cloud-based applications to run offline [11]. A more general solution for achieving very high availability is to gravitate towards using multiple cloud providers, in the same way that Internet service providers rely on multiple network providers [1]. The main challenge of this approach lies in creating standards; today's clouds are not inherently interoperable, which is one of the main issues on the table for cloud computing [1]. We will discuss the lack of industry standards later on.

4.2.2 Data Loss

The risk of data loss, like that of downtime, is not new, but these examples show the widespread and catastrophic results such incidents can have. There have been several notable instances of data loss:

- Ma.gnolia.com, a bookmark-sharing site, lost half a terabyte of users' data in January 2009. This failure caused the service to shut down for eight months [2].
- Danger, a subsidiary of Microsoft that provides cloud storage, experienced a server failure in October 2009. As a result, a million T-Mobile Sidekick phones lost data, some of which was not recoverable [15].
- Vaserv.com, a website hosting service, had a vulnerability that resulted in the destruction of 100,000 sites it was hosting [9].

Preventing data loss, like ensuring service availability, is an operational issue for CSPs that is not unique to the cloud. However, these examples show the potential results of malicious activity against CSPs. In fact, the Vaserv.com incident was the result of an attack against that provider, which we'll discuss later.

4.2.3 Significance of Incidents

Downtime and data loss incidents are no less significant when they stem from mistakes rather than malice. In this section, we considered well-publicized cases of such incidents to show what is at stake in cloud computing. In the next chapters, we will see how security vulnerabilities could lead to similar incidents of downtime and data loss, as well as data theft.


CHAPTER 5 CLOUD INFRASTRUCTURE VULNERABILITIES

Cloud infrastructure presents security issues at the network, host, and application levels [9]. Network-level security issues stem from the remote location of cloud computing centers and the network topology of CSPs. Host-level issues stem primarily from virtualization vulnerabilities. Application-level concerns stem from software development challenges presented by cloud computing, such as scalability. This chapter addresses network- and host-level vulnerabilities of IaaS services such as Amazon EC2; application-level concerns are generally not specific to cloud computing, and fall outside the scope of this paper. In the next section, we will look at how an attacker might go about stealing data from other Amazon EC2 users. This attack involves targeted VM placement and VM escape, which illustrate network- and host-level vulnerabilities respectively. Following that, we will look at other host-level vulnerabilities in the virtualization layer, and briefly look at some application-level vulnerabilities.

5.1 Cross-VM Data Snooping in IaaS

The risk of cross-VM data snooping in IaaS has been raised in popular and academic articles [14] [15]. In this attack, a malicious user of an IaaS cloud service, such as Amazon EC2, spies on another user's activity. The attacker must initialize a VM on the same server as a target VM (targeted VM placement), and then spy on that VM via cross-VM data leakage or VM escape [14]. In this section, we will look at the steps necessary to carry out the attack, and assess the risk and likelihood of each step.


5.1.1 Significance

This attack is specific to IaaS cloud services, such as Amazon EC2, that use virtualization (most if not all IaaS services use it). However, the risk of this attack applies to more than IaaS users; it also applies to users of services that are hosted on IaaS infrastructure. For instance, popular services such as Twitter and Target.com use one or more IaaS services offered by Amazon Web Services (AWS), so such an attack would affect users of those services as well. The possibility of this attack stems primarily from two aspects of IaaS cloud computing [14]. First, it is difficult for IaaS services to conceal their network topology. This makes it possible for an attacker to discover the location of a particular VM (the target VM) on the network, and to determine whether their own VM resides on the same physical server as the target VM. Coupled with the ease of rapidly provisioning many VMs, the relative transparency of IaaS network topology may make it possible for an attacker to efficiently initialize a malicious VM on the target VM's server. The auto-scaling feature that IaaS services provide may increase this risk. Second, the risk of data snooping stems from multitenancy; users of IaaS services rent VMs that run alongside other VMs on the same hardware. Despite the theoretical isolation of VMs from one another, it may be possible to escape from those confines. We will now explore how one group demonstrated targeted VM placement, the first step in a cross-VM data snooping attack, on Amazon EC2. Following the explanation of the attack, we will assess its risk. Then, we will address VM escape, the second part of a cross-VM data snooping attack.


5.2 Targeted VM Placement

Targeted VM placement is when an attacker places a malicious VM on the same physical machine as a target application's VM. This is a network vulnerability of IaaS cloud services that increases the risk of cross-VM data snooping attacks against specific targets. Targeted VM placement involves mapping the public cloud infrastructure, locating the target application's VM, and initializing a malicious VM on its host machine. This sort of attack is not supposed to be possible, because IaaS CSPs attempt to keep their network topology private; users can only specify the rough geographic region (such as US West or US East) where their VM is hosted. The vulnerability was nonetheless demonstrated in 2009 by researchers at UCSD and MIT, who mounted such an attack on Amazon's EC2 [14]. Even though this attack was conducted on EC2, it is thought to be possible on other IaaS services, such as Microsoft's Azure and Rackspace's Mosso, because they provide the same functionality that made the EC2 attack possible [14]. This section describes that group's approach, as presented in [14]. First, the team probed the EC2 network to gain an understanding of its internal IP addressing scheme. Then, they figured out how to determine whether their malicious VM was on the target VM's physical server, a condition called co-residence. Finally, they derived a strategy to place their malicious VM on the same physical machine as the target.

5.2.1 Mapping the EC2 Network

The team's first step was to map out EC2's network topology. Their goal was to be able to derive the VM type and availability zone of any target web application hosted on EC2. VM type corresponds to the computing resources available to a VM; at the time, Amazon offered five options ranging from Small to XXL. VM type is important to know because EC2 servers house VMs of the same type; if the target VM is of type Medium, the attacker must also initialize Medium VMs. Availability zone is also important; it is the only geographic placement option offered in EC2, and zones are isolated from one another. Users can choose from US West, US East, Europe, or Asia-Pacific. The research group's approach was to initialize VMs of different types in different zones, in order to see whether there was any correlation between those VMs' internal IP addresses (internal to the EC2 system) and VM type or zone. They found that EC2's internal address space was indeed partitioned according to both zone and type. From the data they gathered, the researchers could determine the VM type and zone of any VM hosted on EC2.

Another important step is to determine the internal IP address of an EC2-hosted service. This step is trivial, because all EC2 users have access to a DNS server that maps external (public-facing) IPs to internal ones. So, any EC2 user can run a DNS query on the external IP of an EC2-hosted web application to obtain its internal IP. Running this query on any public web service will also reveal whether that service is hosted on EC2. With their map of the EC2 network and EC2's DNS server, the group could identify whether an application was hosted on EC2, along with the availability zone and type of its VM.

The research team's success in mapping EC2's topology was partly due to the fact that EC2's internal address space was partitioned according to availability zones and instance types. In their paper, the group encouraged the development of a more opaque IP addressing scheme. In addition, they suggested that use of EC2's DNS server be restricted by using VLANs or bridging. However, even with those two suggested patches in place, the group suggested that EC2's topology could still be derived using ping timing measurements and traceroute analysis, because such an approach would be independent of the internal IP addressing scheme.
So, even though Amazon announced that the VM-placement vulnerability has since been fixed (it did not disclose how), the precursor problem of mapping out a public cloud seems like it should always be possible [15]. This suggests that such vulnerabilities are probably inherent to IaaS cloud computing.

5.2.2 Determining Co-Residence

Devising a test for co-residence (whether a malicious VM is on the same physical server as the target VM) was also simple. The researchers noted that EC2's Xen-based hypervisor process ran in its own VM, alongside users' VMs on a server, and that the hypervisor routes network traffic between VMs and the network. This means that the VM housing the hypervisor has an IP address that can be used to identify the server. That IP address can be found with a traceroute analysis, which returns the IPs of the routers between the malicious VM and the target. In fact, it is sufficient to check the number of IPs returned by a traceroute: if a message passes through only one router before arriving at the target VM, then that router must correspond to the machine's hypervisor, indicating that co-residence has been achieved. The group also demonstrated that packet round-trip times could be used as a reliable indicator of co-residence. This shows that the ability to determine co-residence is probably an inherent vulnerability of IaaS services.
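The hop-count check reduces to a one-line decision over a traceroute result. A minimal sketch (the hop addresses below are illustrative, and a real attacker would obtain them from an actual traceroute):

```python
def is_coresident(hops):
    """Traceroute-based co-residence check from [14]: if exactly one hop (the
    hypervisor's own address) separates the probe VM from the target, the two
    VMs share a physical machine."""
    return len(hops) == 1

print(is_coresident(["10.250.0.1"]))               # True: one hop, co-resident
print(is_coresident(["10.250.0.1", "10.1.2.3"]))   # False: multiple hops
```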

5.2.3 VM Placement

The UCSD/MIT team now knew what type of VM instance to initialize in which availability zone, and had a reliable test for co-residence. The next step was to find a way to initialize a malicious VM on the same physical machine as the target. They explored a brute-force approach and a targeted approach.


Brute-force approach In their brute-force approach, they would simply initialize malicious VMs of the same type and in the same availability zone as the target VM(s). They would do this iteratively, launching VMs just long enough to test for co-residence. The group performed this test with a target set of 1686 VMs (public services they had determined were hosted on EC2), launching a total of 1785 probe VMs over 18 days. They found that these probe VMs achieved at least 8.4% coverage of the target set. The group used very conservative parameters in conducting their experiment, so actual coverage was probably significantly higher. This indicates that even simple brute-force approaches can successfully achieve co-residence against a large target set.

Targeted approach In their targeted approach, the group took advantage of the auto-scaling feature of EC2, and of the tendency of newly initialized VMs to be placed on the same server. The researchers forced their target application to spawn new VM instances by overloading its existing VMs with requests. Simultaneously, they engaged in instance flooding: initializing as many malicious VMs as possible at once. They found that VMs initialized around the same time had a higher likelihood of spawning on the same server. This strategy successfully placed malicious VMs 40% of the time when instance flooding with only 20 malicious VMs. The group's success with this approach depended on the target application having an auto-scaling system, so that a spike in traffic would automatically initialize new VM instances. It is reasonable to assume that many web applications hosted on EC2, especially large ones, initialize VMs via auto-scaling, because it is a major feature of IaaS CSPs; the Animoto example illustrates the usefulness of this feature. However, a similar approach can be used if the target has no auto-scaling system. By repeatedly querying the target application, an attacker can determine when a VM instance is taken down and when a new instance appears (by noting when the IP address changes). When a new instance is spawned, the attacker can engage in instance flooding to the same effect.

Demonstrating targeted VM placement was a significant result. Notably, it showed that IaaS network topology is fairly vulnerable to mapping. It also made the threat of cross-VM attacks more imminent, because VM escape, the other step required in such attacks, is thought to be possible. Despite Amazon's undisclosed fix to the problem of targeted VM placement, it seems likely that it remains a possibility. New VM instances must appear somewhere; by triggering auto-scaling scripts while simultaneously instance flooding with many VMs, it seems that determined attackers could still achieve co-residence with their targets. In the worst case, the attacker could simply carry out a brute-force approach; as the UCSD/MIT group demonstrated, even a very naive brute-force approach can be fairly successful.
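The brute-force strategy is just a launch-and-check loop: start a probe VM, keep it if the co-residence test succeeds, otherwise terminate and retry. A minimal sketch with the provider simulated rather than real; the 78-server figure is borrowed from the study's observations, while everything else (the uniform placement model, the callbacks) is an invented simplification.

```python
import random

def brute_force_placement(launch, coresident, target, budget):
    """Repeatedly launch probe VMs until one lands co-resident with the
    target, giving up after `budget` launches. Returns the number of
    launches used, or None on failure."""
    for attempt in range(1, budget + 1):
        probe_server = launch()            # launch a probe VM somewhere
        if coresident(probe_server, target):
            return attempt                 # keep this probe; attack proceeds
        # otherwise the probe would be terminated and the loop retries
    return None

# Simulated provider: each new VM lands uniformly on one of 78 servers
# (the number of distinct machines the researchers observed).
rng = random.Random(0)
launch = lambda: rng.randrange(78)
coresident = lambda a, b: a == b

print(brute_force_placement(launch, coresident, target=5, budget=1000))
```

Even this naive loop succeeds quickly when the pool of candidate servers is small, which is the core of the study's observation.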

5.2.4 Network Vulnerabilities and Possible Fixes

Another of EC2's weaknesses was the tendency of servers to have several vacancies, with VM instances initialized around the same time often assigned serially to those vacancies. Without multiple vacancies per server and serial placement of new VMs in those vacancies, targeted placement would be much more difficult to achieve. A partial fix would be to ensure that servers do not accept several new VMs at once. This could be achieved by placing all servers with vacancies into a queue: the server at the head of the queue accepts the next new VM, and then returns to the tail of the queue if it still has vacancies, or is removed from the queue otherwise. That way, each machine could accept at most one new VM at a time.

However, this is still a difficult problem, and no placement algorithm is likely to prevent the possibility of targeted VM placement. This is because there aren't very many physical servers of a particular instance type in each availability zone. For instance, in their brute-force VM placement attempt, the UCSD/MIT researchers initialized VMs on only 78 physical servers. So, when a target VM is initialized on a server, it would be fairly easy for the attacker to cycle through the queue, placing a malicious VM on each server in it. Instance flooding in that manner would achieve co-residence unless the target's server were removed from the queue (for having no more vacancies).

The solution suggested by the UCSD/MIT research team was to let users control which servers their VM instances appear on. Users could either pay for the server's unused capacity, or specify a list of approved co-residents. In other words, their suggestion was for Amazon to avoid the problem altogether, by offering both private- and community-cloud services in addition to its public cloud service. Even though this proposed solution seems like a cop-out, it is nonetheless a guaranteed patch for all VM placement vulnerabilities. It also suggests that the public cloud model of IaaS may simply not be worth the risk for large organizations, given the much more secure option of a fenced-off cloud. Amazon has taken the research group's suggestion, which is very telling of the public cloud's network vulnerabilities. Amazon's Virtual Private Cloud (VPC) service, which features network isolation and secure connectivity, is currently in its beta release. Following the announcement of the service, the vice president of Amazon Web Services indirectly acknowledged the imperfect security of the standard EC2 service, explaining that there is "a set of customers and class of applications asking for even more enhanced levels of security than our existing services provided" [15].
It remains to be seen exactly how vulnerable public IaaS cloud services are, but it is clear that the more isolated virtual private cloud model avoids these concerns, and does so at little extra cost. Renting VMs by the server-load, rather than individually, would not be prohibitive for companies like Animoto that use hundreds of servers' worth of VMs.

5.2.5  Risk Assessment

The UCSD/MIT research group unveiled network-level vulnerabilities that are inherent to the public IaaS service model. IaaS CSPs like Amazon EC2 can make targeted VM placement more difficult, but cannot prevent it outright. Therefore, this is a risk that confronts users of such services. However, VM placement is benign by itself: an attacker achieves nothing with co-residence alone; they must also launch an attack from their malicious VM. Furthermore, Amazon's VPC service eliminates the vulnerability entirely. VPC forsakes multitenancy and its benefits, but the efficiency loss of renting VMs by the server-load is trivial for large applications. In summary, targeted VM placement is easily avoidable through VPC, but nonetheless confronts many IaaS users. In the next sections, we will look at the further steps required for cross-VM data snooping.

5.3  Cross-VM Data Leakage

The UCSD/MIT group that demonstrated targeted VM placement also showed how cross-VM data leakage could be exploited to monitor a target's activity. They used side channels to monitor the activity of co-resident VMs, to a very limited extent. Because statistical multiplexing is used to allocate resources (such as network bandwidth and CPU memory caches) between co-resident VMs, they found that they could infer co-resident VMs' activity levels from the extent to which they themselves could access those resources. They would regularly poll those resources to judge their availability or latency, which indicates the extent to which co-resident VMs are using them. The group demonstrated that they could determine the rate of web traffic of a co-resident target VM, provided that the target was the only other VM on the server.

Like targeted VM placement, cross-VM information leakage is very hard for IaaS services to prevent. It can be done, but may require too much overhead. For instance, CSPs could insert random time delays into their statistical multiplexing algorithms, or make it more difficult for a VM to measure resource latency; either option would hamper the service's functionality. A target VM's activity level can be used as part of an attack, but it is very unlikely that any more meaningful information could be gleaned from these side channels. The attack is difficult because the attacker must be assured that the malicious VM and the target are the only two VMs on the server. Testing for that condition can be difficult, "unless an adversary has a priori information about load variation on the target and this load variation is (relatively) unique to the target" [14]. There would be all manner of other challenges as well, such as preventing other VMs from initializing on that machine. So, the practical implications of cross-VM data leakage on EC2 are very limited.
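The polling idea can be illustrated with a toy timing probe. This sketch only measures generic CPU contention on the local machine and is nowhere near the cache-based side channels of the actual study; the iteration count and sample count are arbitrary choices for the example.

```python
import time

def probe(n_iters=200_000):
    """Time a fixed CPU-bound workload. Longer wall-clock time
    than a quiet-machine baseline hints at contention from
    other activity sharing the processor."""
    start = time.perf_counter()
    total = 0
    for i in range(n_iters):
        total += i * i
    return time.perf_counter() - start

def load_estimate(baseline, samples=5):
    """Ratio of best observed latency to a baseline; values well
    above 1.0 suggest the shared resource is busy."""
    observed = min(probe() for _ in range(samples))
    return observed / baseline
```

Repeating such probes over time yields the kind of coarse activity trace the researchers used to estimate a co-resident VM's web-traffic rate.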

5.4  VM Escape

It may be possible to derive more than the activity level of a co-resident target VM. VM escape involves accessing a co-resident VM's data or processes. To do so, the attacker must break out of the isolation the hypervisor is supposed to enforce.

5.4.1  Cross-VM Attacks

Isolation of co-resident VMs from one another, and from the host system, is a major feature of virtualized environments. Such isolation means that applications running on one VM are invisible to those running on a co-resident VM, and have well-defined interactions with the hypervisor. However, this isolation may not always be assured; virtualization software can have vulnerabilities that compromise the compartmentalization of user domains [13]. As we saw, it is possible to glean some information about co-resident VMs' activity levels via cross-VM data leakage. In VM escape, an attacker more explicitly gains access to a co-resident VM's data or processes, or gains unauthorized access to the hypervisor.

Cross-VM attacks can occur if virtualization software fails to sufficiently isolate co-resident VMs from one another. Since several VMs can reside on one physical server, that server's computing resources, such as its CPU data caches, memory, and networking infrastructure, are shared by all its users. Although the hypervisor is meant to keep VM instances isolated from one another, it has been suggested that it is possible to extract data from shared resources (e.g., CPU data caches) [15]. This speculation is based on research showing that it is possible to extract sensitive data, such as RSA and AES secret keys, from CPU data caches in non-virtualized environments [14]. Though it has not yet been demonstrated, such data snooping may also be possible in virtualized environments [14]. Supporting that possibility is Microsoft, which is exploring technologies that are expected to make such attacks impossible within a couple of years [15].

Cross-VM attacks can theoretically be conducted in other ways as well. For example, in some hypervisor implementations, VMs are connected to the host machine via a virtual hub or switch, creating the possibility of data snooping on other VMs' network traffic [13]. This could allow VMs to snoop data packets intended for co-resident VMs, or even to redirect those VMs' packets via ARP poisoning [13].
However, this is not likely in a well-managed cloud computing environment, because properly configuring network traffic authentication can prevent it [13]. However remote, possibilities like this underscore the dangers of improperly configuring the omnipotent hypervisor. The examples of downtime and data loss we saw earlier show that small mistakes like these do occur.

5.4.2  VM-to-Host Attacks

Another form of VM escape is when a malicious VM gains access to the host machine's processes [13]. This VM-to-host attack is more dangerous than cross-VM attacks: by accessing the host, the attacker effectively gains root access to all other user domains (VM instances) on that machine [13]. Furthermore, gaining access to the hypervisor process may provide wider access to the entire cloud network, as we'll see in the next section. Clearly, this is worrying in a public cloud computing context. Luckily, this form of attack, like cross-VM attacks, would not be easy to carry out. It may only be possible if the VM software has bugs or is improperly configured [13]. Nonetheless, those possibilities should not be discounted. As we'll see, a bug in Vaserv.com's virtualization software resulted in an attack that destroyed 100,000 websites hosted by that company.

5.4.3  Risk Assessment

Despite the possibility of VM escape being mentioned in several articles on cloud computing, it is not a likely threat in that context, because it has not been demonstrated in a meaningful way even in a non-cloud setting. Virtualization software is not immune to bugs, and improperly configured virtualization software could make VM escape possible. However, these risks do not seem greater than with any other software. Since VM escape is next to impossible, cross-VM data leakage may be the only way to spy on another user's VM. As we saw, exploiting data leakage via side channels reveals little beyond the target VM's activity level, and even establishing the necessary conditions is difficult. From a practical standpoint, then, data snooping in IaaS services is not a realistic security concern.

5.5  Attacks on the Virtualization Layer

Virtualization software, which is invisible to the cloud user and managed by the CSP, is a fundamental component of many cloud services. As we've seen, the virtualization layer sits directly above the hardware layer and is responsible for creating, managing, monitoring, and deleting VM instances. Because virtualization software plays such a critical role, there is an ongoing arms race between would-be attackers and CSPs to discover vulnerabilities in virtualization applications [9].

5.5.1  Attacks on Virtualized Machines

Virtualized machines (physical ones running virtualization software) are subject to the same risks as non-virtualized ones [13]. However, the risk of an attack against a virtualized environment is greater, because more users may be affected when a virtualized machine is compromised. Furthermore, the virtualization layer introduces potential points of failure that can facilitate such attacks. The hypervisor, which manages a physical server's VM instances, is the additional attack surface presented by virtualization [13]. This risk is particularly relevant to cloud computing because, in order to provide on-demand service, IaaS services must augment their virtualization software. In doing so, they can introduce potential avenues for unauthorized access to the hypervisor.

IaaS necessitates more interaction between the hypervisor and users than is required in other virtualized environments, which introduces additional risks. More communication is necessary between users and the hypervisor because IaaS services, like all cloud computing services, allow for on-demand self-service. Cloud users can manage their own VMs via web interfaces and APIs, without relying on the IT staff at the CSP. This contrasts with non-cloud companies that offer virtual private hosting; such companies manage their clients' VM instances themselves. The additional functionality offered by IaaS CSPs creates more potential for vulnerabilities in the virtualization layer. Also, many (if not all) IaaS CSPs allow users to set up auto-scaling functions, to automatically commission or decommission VM instances as needed. Auto scaling requires that the hypervisor respond to cues from the VMs, increasing the complexity of interaction between those otherwise fairly compartmentalized entities, and thus the potential for errors.
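As a rough illustration of the kind of cue-driven logic that auto scaling adds on top of the hypervisor, consider a minimal threshold policy. The thresholds and VM limits below are invented for the example; real services use richer metrics and cooldown periods.

```python
def autoscale(current_vms, avg_cpu, min_vms=1, max_vms=20,
              scale_up=0.70, scale_down=0.30):
    """Return the new VM count for a simple threshold policy:
    commission one VM under sustained high load, decommission
    one under sustained low load, otherwise hold steady."""
    if avg_cpu > scale_up and current_vms < max_vms:
        return current_vms + 1
    if avg_cpu < scale_down and current_vms > min_vms:
        return current_vms - 1
    return current_vms
```

Even this trivial policy requires the hypervisor to accept commission/decommission requests driven by measurements from inside the VMs, which is exactly the extra interaction surface discussed above.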

5.5.2  Vaserv.com Hypervisor Vulnerability

Preventing attacks on the hypervisor is of particular concern in cloud computing; such breaches can compromise an entire cloud's data [9]. One example in particular illustrates the danger posed by bugs in hypervisor software. Vaserv.com used a virtualization application called HyperVM to manage its public hosting service. In 2009, a zero-day vulnerability (one known to hackers, but not to the application's developers at the time) in HyperVM's hosting management interface was successfully exploited [9]. The bug enabled the attackers to execute privileged commands, including rm -rf (which recursively deletes all files), across the company's entire hosting network [9]. The attack destroyed 100,000 websites hosted by Vaserv.com; half of that data was irrecoverable [9]. Although bugs in the virtualization layer are not uniquely relevant to cloud computing, this example shows the extreme impact that hypervisor vulnerabilities can have on cloud services. Such attacks also underscore the primary importance of securing the virtualization layer, even above the individual virtual machines [13].


5.5.3  Summary

In summary, virtualization software creates security concerns specific to IaaS cloud computing services because those services must augment the hypervisor. In doing so, they create more potential points of failure, as we saw in the case of Vaserv.com.


CHAPTER 6
PRIVACY AND ENCRYPTION

Organizations are hesitant to entrust CSPs with unencrypted sensitive data for various reasons. They may worry about data theft, data misuse by the CSP, or the CSP receiving a subpoena that forces it to reveal its data [1]. One study identifies data confidentiality and auditability as the third most significant obstacle to wider adoption of cloud computing [1].

6.1  Difficulty of Encryption

Encrypting cloud data solves the confidentiality problem for some organizations. Storing encrypted data in the cloud is very safe; it can even be safer than storing unencrypted data locally [1]. However, encryption limits the data's usefulness in the cloud, because encrypted documents must first be decrypted before they can be searched or otherwise manipulated. Performing decryption and encryption on large data sets, or very frequently, can be prohibitively expensive and time-consuming. As a result, organizations tend to use public cloud services only for data they can store unencrypted [15]. Organizations that are still inclined to work with encrypted cloud data can do so in two ways. The first option is to decrypt the data in the cloud, work on it remotely, and then re-encrypt the modified data. The second option is to transfer the encrypted data to a local machine before decrypting and working on it locally.

6.1.1  Working in the Cloud

The first option, performing cryptographic functions in the cloud, is fairly secure. It involves decrypting data in the cloud, working on it there, and then re-encrypting it. Of course, this means that cloud data will be unencrypted for a period of time. However, we saw that VM data snooping attacks are unlikely, and this momentary exposure is not seen as a great risk. The problem with performing encryption and decryption on cloud infrastructure is that those operations are resource-intensive. Cloud users pay for the resources they consume, so this option is generally prohibitive [15].

The cost of performing cryptographic functions is evidenced by some CSPs' lack of support for encrypted storage. For example, Amazon's Elastic Block Store (EBS), a block storage abstraction meant for use with EC2, does not provide the option of encryption. Amazon suggests that users instead run an encrypted file system on top of their EBS volumes, thereby off-loading the responsibility to those users. EBS's lack of support for encryption emphasizes that encrypting and decrypting in the cloud is not free.

6.1.2  Working Locally

The alternative to remote encryption/decryption is to perform those operations locally, storing only encrypted data in the cloud. However, this approach can be even more prohibitive than performing those operations in the cloud, because data transfer costs are relatively high and build up quickly when moving large amounts of data. For instance, Amazon's S3 service charges between $80 and $150 per terabyte retrieved from its servers (transfers to S3's servers are currently free). Ironically, S3's per-terabyte transfer cost is very near the cost of purchasing a consumer disk drive of that size. In fact, the high cost of data transfer means that performing cryptographic functions locally, on large amounts of data, can be astronomically inefficient.

This point was dramatically illustrated in a 2009 study by cloud computing researchers at Berkeley [1]. The group found that shipping ten one-terabyte hard disks from UC Berkeley to Amazon S3 servers in Washington state via overnight delivery would be vastly more cost-effective than sending the same amount of data electronically. Even assuming an optimistic 20 Mbit/second connection (actual speeds were about half that), transferring 10 terabytes of data would take over 45 days and cost $1000. By comparison, they found that shipping the ten disks overnight would have cost $400, with an effective bandwidth about 75 times greater. Amazon has responded to this issue with its Import/Export service, which facilitates sending hard disks back and forth via airmail.

This problem stems from the fact that network cost-performance is improving more slowly than disk capacity; pricing trends suggest that shipping disks via overnight delivery will become increasingly attractive, despite the risks that option creates [1]. In fact, data transfer bottlenecks are a significant obstacle to wider adoption of cloud computing [1]. Clearly, it behooves large organizations to limit unnecessary data transfer via the Internet. For this reason, shuttling encrypted data back and forth to work on it locally is not a viable option for many organizations. As we saw, performing cryptographic functions on cloud infrastructure can also be prohibitive. The result is that cloud computing services are generally not used for sensitive applications, such as medical and financial ones [15]. Despite the difficulties of storing encrypted data on cloud infrastructure, encryption nonetheless ensures privacy, and is appropriate for some applications [1]. The issue is efficiency, which is impacted by bandwidth limitations and computational resource costs. As we will see in the next section, researchers are looking for ways to make encrypted data more useful on cloud computing infrastructure.
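The Berkeley comparison can be reproduced with back-of-the-envelope arithmetic. The 15-hour overnight-delivery window below is an assumption chosen to match the reported figures, not a number from the study.

```python
DATA_BITS = 10 * 10**12 * 8   # 10 TB (decimal terabytes) in bits
LINK_BPS = 20e6               # assumed optimistic 20 Mbit/s WAN link

# Electronic transfer: roughly 46 days at 20 Mbit/s
transfer_days = DATA_BITS / LINK_BPS / 86400

# Overnight shipping: assume a ~15-hour door-to-door window
ship_seconds = 15 * 3600
effective_bps = DATA_BITS / ship_seconds
speedup = effective_bps / LINK_BPS   # roughly 75x the network link
```

The ~75x figure is just the ratio of the two effective bandwidths; it grows as disk capacity outpaces network cost-performance, which is why the authors expect disk shipping to become more attractive over time.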

6.2  Fully Homomorphic Encryption

Fully homomorphic encryption is the best expected solution to most cloud-computing privacy concerns [15]. It is an encryption scheme that allows arbitrary manipulations of encrypted data, eliminating the need to decrypt that data before working on it. With such technology, cloud applications could process data blindly, returning encrypted results that only the private-key holder could read. For example, a tax application could take in your encrypted financial data and return an encrypted tax form, without ever knowing the actual information you submitted [7].

To permit arbitrary manipulations of encrypted data, such a scheme must preserve two seemingly unlikely relationships between inputs: the encryption of the sum of inputs X and Y must equal the sum of the encrypted versions of those inputs, and the same must hold for multiplication [7]. When both operations preserve those relationships, arbitrary computations performed on encrypted data have the same effect as on the unencrypted data.

Thirty years ago, Ron Rivest, a co-creator of the RSA cryptosystem, posed the problem of fully homomorphic encryption, which many came to regard as impossible [6]. Nonetheless, a solution was found last year by a researcher at IBM [6]. The current solution is extremely inefficient and not yet practical to implement, but fully homomorphic encryption is expected to become a reality in about a decade [6].

Even though that technology is not currently practical, a similar idea is being developed for performing blind searches on encrypted metadata [15]. With such technology, users can make blind queries on encrypted documents: the user first encrypts a query string, and the server compares that encrypted string with the encrypted metadata terms. Documents matching the terms are returned, but the server learns nothing about the contents of the query or of the documents. Microsoft Research is working on a new architecture to facilitate this form of encryption [15].
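The two required relationships can be made concrete with textbook (unpadded) RSA, which satisfies only the multiplicative half; a fully homomorphic scheme must preserve both addition and multiplication. The toy primes below are far too small for real use, and raw RSA like this is insecure in practice.

```python
# Toy textbook RSA: multiplicatively homomorphic, i.e.
# E(x) * E(y) mod n decrypts to x * y mod n.
p, q = 61, 53
n = p * q                      # modulus
e = 17                         # public exponent
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)            # private exponent (Python 3.8+ modular inverse)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

x, y = 7, 11
product_of_ciphertexts = enc(x) * enc(y) % n
assert dec(product_of_ciphertexts) == (x * y) % n   # multiplies "blindly"
```

Because raw RSA has no corresponding additive property (and no semantic security), it cannot support the arbitrary blind computations described above; closing that gap for both operations is precisely what Gentry's construction achieves [6].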

6.3  Hierarchical Encryption

Hierarchical encryption is a way to marginally increase the usefulness of encrypted cloud data, and could help expand the possible uses of cloud computing [15].


Hierarchical encryption lets users provide different levels of access to a single encrypted document. For example, in a personal finance application, the user would hold the master key and could grant subkeys to other people, providing limited access to sections of that information [8]. Another use of hierarchical encryption is for electronic medical records: doctors and insurance companies could be granted subkeys in order to decrypt selected parts of the record. This is useful for cloud computing because it limits the amount of decryption that has to be performed in the cloud, and reduces the need to store redundant information.
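The delegation idea (though not the actual identity-based scheme of [8], which is public-key) can be sketched with symmetric keys by deriving each section's subkey from the master key. All key material and section names here are made up for illustration.

```python
import hashlib
import hmac

def derive_subkey(master_key: bytes, section: str) -> bytes:
    """Derive a per-section subkey from the master key.
    The master-key holder can re-derive any subkey on demand;
    a subkey holder can decrypt only its own section and cannot
    work backward to the master key (HMAC is one-way)."""
    return hmac.new(master_key, section.encode(), hashlib.sha256).digest()

master = b"patient-master-key"                     # hypothetical master key
rx_key = derive_subkey(master, "prescriptions")    # e.g. granted to a doctor
billing_key = derive_subkey(master, "billing")     # e.g. granted to an insurer
assert rx_key != billing_key
```

Each record section would then be encrypted under its own derived key, so granting a party one subkey exposes one section without duplicating the record or revealing the rest.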


CHAPTER 7
GENERAL ISSUES

7.1  Lack of Industry Standards

The lack of industry standards is seen by some as the largest issue confronting the future of cloud computing [11]. Cloud computing providers are not inherently interoperable, so by choosing one, an organization can effectively limit its flexibility by locking itself to that provider. If cloud providers adopted standards, this would be a non-issue; interoperability between providers would let users move their data and applications between CSPs. However, CSPs are hesitant to adopt standards, as it is not yet clear whether doing so would benefit those companies [11]. A vice president at Amazon Web Services echoes this concern: "We think it's very early to understand not only what the standards are, but along what dimensions standards are even useful" [11].

7.2  Data Lock-In and Cloud Bursting

Since there are no standards, it is difficult for organizations to switch from one IaaS CSP to another, or to integrate two or more clouds. This creates the potential for data lock-in, which occurs when cloud users have no option but to stay with a particular CSP because their applications will only work on that provider's systems [1]. This risk is seen as a very significant obstacle to adoption of cloud computing, compounded by uncertainty over the longevity of CSPs [1]. However, there are companies that assist in migrating applications and data between different CSPs, a process called cloud bursting [11]. A related use of cloud bursting is for organizations that want to use public cloud infrastructure only to handle their excess compute capacity. This is a popular way for organizations to integrate public cloud computing into their current systems (often their private cloud), because in many cases they have pre-existing IT infrastructure. IT companies that assist in cloud bursting are growing popular because of the difficulty and desirability of this option [11].

7.3  Open Source Initiatives

Some commentators believe that open-source projects are very important to establishing industry standards that would make companies more willing to use cloud computing [11]. This is partly because security researchers cannot examine proprietary software for vulnerabilities, making it more likely that those vulnerabilities will be discovered only by hackers [9]. There are two main open-source projects: Eucalyptus and Hadoop. Eucalyptus lets organizations create their own cloud on pre-existing infrastructure, and features an interface similar to EC2's [11]. Hadoop contains software libraries for manipulating large amounts of data in parallel across many machines, much like Google's MapReduce [1].

7.4  Central Points of Internet Control

As more individuals and organizations entrust their data and applications to cloud providers, the Internet becomes more centralized [15]. The growing popularity of the cloud challenges the original conception of the Internet as a fundamentally decentralized entity [15]. Some are concerned that cloud providers will become central control points of the Internet, increasing the likelihood of government regulation or other forms of censorship [15]. Again, adoption of standards is seen as a potential solution, because it would save users from becoming locked in to particular CSPs [11].


CHAPTER 8
CONCLUSION

As Larry Ellison, Ron Rivest, and others have implied, the term "cloud computing" has become an overused buzzword. Popular media seem to take every opportunity to comment on this broadly defined computing paradigm, and articles tend to emphasize either the awesome potential of the cloud or the dangers it presents. Notably, when popular web applications experience downtime, or their employees' passwords get stolen, columnists warn readers about the security and privacy issues of the cloud.

As we've seen, some cloud computing security risks may be exaggerated. Targeted VM placement, although an inherent vulnerability of public IaaS cloud infrastructure, does not have many practical implications. Monitoring cross-VM data leakage has been demonstrated, but reveals little and would be difficult to mount against a real-world target. Various forms of VM escape are thought to be possible, but seem extremely unlikely. Furthermore, the industry has been responsive to perceived threats. Amazon, for example, now offers its Virtual Private Cloud service, which eliminates many network- and host-level risks.

Cloud computing does, however, present new and real security challenges. Notably, the virtualization layer presents risks specific to IaaS cloud computing, which stem from the necessity of augmenting the hypervisor to accommodate on-demand service. This risk was evidenced by the vulnerability in Vaserv.com's hosting management interface that resulted in the destruction of 100,000 websites. That and other security risks are worrying, but they still are not among the most significant obstacles to wider adoption of cloud computing. Privacy issues are more important to the future of this computing paradigm. We saw that encryption is difficult to perform in the cloud, which creates privacy risks for several reasons. When encryption schemes such as hierarchical encryption and fully homomorphic encryption become realities, cloud computing will be suitable for more types of applications.

There is no easy answer as to whether cloud computing is any less secure than non-cloud IT alternatives, because the answer is highly application-dependent. However, the dynamic cloud computing industry has been responding to security and privacy issues as they arise. This is evidenced by the relatively good service availability track records of CSPs compared to private datacenters. Furthermore, Amazon's recent addition of its VPC service shows how CSPs are catering to security- and privacy-conscious customers. Industry sales figures, which are rapidly outpacing those of traditional IT services, reflect the industry's responsiveness to those concerns. Given the tremendous efficiency of cloud computing and the pressure exerted by large organizations to make it secure, it is very likely that the cloud will only continue to get safer.


BIBLIOGRAPHY

[1] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia. Above the clouds: A Berkeley view of cloud computing. Technical Report, February 2009.

[2] Michael Calore. Ma.gnolia suffers major data loss, site taken offline. Wired.com, January 2009. [http://www.wired.com/epicenter/2009/01/magnolia-suffer/].

[3] Stephen Cass. Map: Water-powered computers. Technology Review, 113(1), July/August 2009.

[4] David Chiu. Elasticity in the cloud. Crossroads, 16(3):3–4, 2010.

[5] Dan Farber. Oracle's Ellison nails cloud computing. CNET Outside the Lines. [http://news.cnet.com/8301-13953_3-10052188-80.html].

[6] Craig Gentry. Fully homomorphic encryption using ideal lattices. In STOC '09: Proceedings of the 41st Annual ACM Symposium on Theory of Computing, pages 169–178, New York, NY, USA, 2009. ACM.

[7] Andy Greenberg. IBM's blindfolded calculator. Forbes Magazine, July 2009. [http://www.forbes.com/forbes/2009/0713/breakthroughs-privacy-supersecret-encryption.html].

[8] Jeremy Horwitz and Ben Lynn. Towards hierarchical identity-based encryption. In Advances in Cryptology — EUROCRYPT 2002, pages 466–481, 2002.

[9] Tim Mather, Subra Kumaraswamy, and Shahed Latif. Cloud Security and Privacy. O'Reilly, 2009.

[10] Peter Mell and Tim Grance. The NIST definition of cloud computing. October 2009.

[11] Erica Naone. Industry challenges: The standards question. Technology Review, 113(1), July/August 2010.

[12] Erica Naone. Technology overview: Conjuring clouds. Technology Review, 113(1), July/August 2010.

[13] Jenni Susan Reuben. A survey on virtual machine security. Helsinki University of Technology, 2007.

[14] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In CCS '09: Proceedings of the 16th ACM Conference on Computer and Communications Security, pages 199–212, New York, NY, USA, 2009. ACM.

[15] David Talbot. Security in the ether. Technology Review, 113(1):36–37, January/February 2010.

