You are on page 1of 7

International Journal in Foundations of Computer Science & Technology (IJFCST), Vol. 3, No.

1, January 2013

SCALABLE CLUSTER BASED CLOUD STORAGE


Parinaz Eskandarian Miyandoab1 and Jaber Karimpour2
1

Department of Computer Engineering, Islamic Azad University, Zanjan Branch, Zanjan, Iran
parinazeskandarian@yahoo.com
2

Department of Computer Science, University of Tabriz, Tabriz, Iran


karimpour@tabrizu.ac.ir

ABSTRACT
We consider a cloud system that has to save lots of files and has to use hundreds of computers. The existing cloud storage designs are not scalable enough to support such a huge number of nodes. In this paper, we propose a novel cloud storage system containing thousands of virtual file servers on hundreds of computers. We group these virtual servers into clusters. This system is perfectly scalable because the system load is divided among the clusters. Our simulation experiments show that our cloud storage system achieves smaller file read/write latency and traffic/processing overhead than the existing systems.

KEYWORDS
Cloud Storage, Data Center, Virtual, Cluster, File Server

1. INTRODUCTION
Cloud storage [1] is a model of networked online storage where data is stored in virtualized pools of storage which are generally hosted by third parties. Hosting companies operate large data centers, and people who require their data to be hosted buy or lease storage capacity from them. The data center operators virtualize the resources according to the requirements of the customer and expose them as storage pools, which the customers can themselves use to store files or data objects. Physically, the resource may span across multiple servers. Cloud storage services such as Amazon S3 [2], cloud storage products such as EMC Atmos [3], and distributed storage research projects such as OceanStore [4] are all examples of object storage. In this paper, we propose a novel cloud storage system containing thousands of virtual file servers on hundreds of computers. We group these virtual servers into clusters. This system is perfectly scalable because the system load is divided among the clusters. Our simulation experiments show that our cloud storage system achieves smaller file read/write latency and traffic/processing overhead than the existing systems. The rest of this paper is organized as follows. We review related works in Section 2. We propose our cloud system in Section 3. Section 4 contains our simulation results and Section 5 concludes the paper.

DOI:10.5121/ijfcst.2013.3102

International Journal in Foundations of Computer Science & Technology (IJFCST), Vol. 3, No.1, January 2013

2. RELATED WORKS
In this section, we review related researches that focus on designing cloud systems. In [5], authors design a scalable architecture called vStore that provides reliable virtual disks for virtual machines (VM) in a Cloud environment. vStore uses the hosts limited local disks as a block-level cache for the network attached storages. One of the challenges of cloud storage system is difficult to balance the providing huge elastic capacity of storage and investment of expensive cost for it. In order to solve this issue in the cloud storage infrastructure, low cost PC cluster based storage server is configured in [6] to be activated for large amount of data to provide cloud users. BlueSky [7] is a network file system backed by cloud storage. BlueSky stores data persistently in a cloud storage provider allowing users to take advantage of the reliability and large storage capacity of cloud providers and avoid the need for dedicated server hardware. Authors in [8] address the problem of building a secure cloud storage system which supports dynamic users and data provenance. Gecko [9] is a design for storage arrays where a single log structure is distributed across a chain of drives, physically separating the tail of the log from its body. This design provides the benefits of logging fast, sequential writes for any number of contending applications while eliminating the disruptive effect of log cleaning activity on application I/O. Authors in [10] present a power-lean storage system, where racks of servers, or even entire data center shipping containers, can be powered down to save energy. MetaStorage [11] is a federated Cloud storage system that can integrate diverse Cloud storage providers. MetaStorage is a highly available and scalable distributed hash table that replicates data on top of diverse storage services. Authors in [12] present an architecture for a secure data repository service designed on top of a public Cloud infrastructure to support multi-disciplinary scientific communities dealing with personal and human subject data, motivated by the smart power grid domain. ecStore [13] is an elastic cloud storage system that supports automated data partitioning and replication, load balancing, efficient range query, and transactional access. Cloudy [14] is a modular cloud storage system. Cloudy provides a highly flexible architecture for distributed data storage and is designed to operate with multiple workloads.

3. CLOUD STORAGE DESIGN


In this section, we propose a cloud storage system. We call it CCS (Cluster-based Cloud Storage).

3.1. General Description


Figure 1 shows the general view of the CCS system. VMs are grouped into clusters. Each cluster has a cluster controller that manages the cluster. There is one central controller in the system that manages cluster controllers. To reduce the load on the central controller, we try to assign as many tasks as possible to cluster controllers.
2

International Journal in Foundations of Computer Science & Technology (IJFCST), Vol. 3, No.1, January 2013

Figure 1. Cloud system design

In the remaining part of this section, we describe the systems behaviour under various conditions assuming the clustered design depicted in Figure 1.

3.2. Clustering Algorithm


We put the VMs installed on the same computer in a cluster. We use the following metric to cluster the VMs. Clustering metric: Being on the same physical computer. Clustering is done once we start the cloud system and once we add/remove a VM. The clustering algorithm contains the following steps: I. II. III. IV. V. VI. The system manager adds one physical computer as the central controller to the system. The system manager decides the number of clusters for the system. The system manager adds one physical cluster controller computer for each cluster. The system manager configures the central controller to identify the cluster controllers by IP address. The central controller assigns a unique cluster id to each cluster controller. While there is a non-clustered VM vm1 in the system, do
3

International Journal in Foundations of Computer Science & Technology (IJFCST), Vol. 3, No.1, January 2013

VII.

a. The central controller assigns vm1 to the cluster (called Cluster cid) with the least number of VMs that satisfies the following condition. i. There is no VM vm2 in Cluster cid such that vm1 and vm2 are on the same physical computer. The central controller informs the cluster id (cid) to vm1.

3.3. Periodic Operations


The central controller distributes tasks among the cluster controllers itself. The central controller keeps loads of the cluster controllers in a table. The central controller updates this table after sending each request to a cluster controller. Load of a cluster controller equals summation of the loads on the VMs in the cluster. The cluster controller distributes tasks among the cluster VMs itself. The cluster controller keeps loads of its VMs in a table. The cluster controller updates this table after sending each request to a VM. Load of a VM equals number of the requests on the VM. The central controller checks cluster controllers for failure periodically and before sending a request to them. The cluster controller checks its VMs for failure periodically and before sending a request to them.

3.4. File Write Request


When an Internet user requests to write a file, the cloud system follows these steps: I. II. III. IV. The central controller receives the request from the Internet. The central controller sends the request to the cluster controller with the least load. The cluster controller sends the request to the VM with the least load. The VM saves the file inside its disk.

3.5. File Read Request


When an Internet user requests to read a file, the cloud system follows these steps: I. II. III. IV. The central controller receives the request from the Internet. The central controller sends the request to the cluster controller that contains the file in its cluster. The cluster controller sends the request to the VM that contains the file. The VM sends the file to the user.

3.6. VM Failure
If a VM fails, then The cluster controller will not write any more file in the VM. The cluster controller informs the system manager to repair the VM.

4. SIMULATION
We implemented CCS in the CloudSim [15] simulator. In this section, we evaluate the performance and the overhead of CCS. To do this, we define the simulation scenario presented in Table 1. In this scenario, we change number of VMs in different executions whereas the other parameters are fixed to evaluate the scalability of CCS.
4

International Journal in Foundations of Computer Science & Technology (IJFCST), Vol. 3, No.1, January 2013

We compare the following systems: CCS SCS: The Standard Cloud System with a central controller without clustering

4.1. Simulation Results


In SCS, the central controller has to handle all the tasks of requests. Thus, it takes long time to process a read/write request. If we have a higher request rate, then their response latency goes dramatically up. This shows SCS is not scalable and causes large read/write delays if we increase the request rates.
Table 1. Simulation Parameters.

Parameter Number of VMs (nv) Number of computers File Write Rate File Read Rate File Size Number of Files (nf) Simulation Duration

Value From 100 to 12500 50 nv/100 requests per second nv/100 requests per second 100 Kbytes 100 * nv 1 hour

CCS

SCS

Average Read Latency (s)

0.2 0.15 0.1 0.05 0 100 500 2500 12500 Number of VMs

Figure 2. Average file read latency versus number of VMs


CCS SCS

Average Write Latency (s)

0.3 0.2 0.1 0 100 500 2500 12500 Number of VMs

Figure 3. Average file write latency versus number of VMs 5

International Journal in Foundations of Computer Science & Technology (IJFCST), Vol. 3, No.1, January 2013
CCS SCS

Load on Central Controller (Tasks per second)

1000 500 0 100 500 2500 12500 Number of VMs

Figure 4. Load on the central controller versus number of VMs

In CCS, the tasks of requests are distributed among cluster controllers. Thus, the tasks assigned to the central controller increases slowly. Using this technique, CCS is capable to increase its VMs and clusters to handle more requests. This shows CCS is scalable and keeps small read/write delays if we increase the request rates. Figures 2 and 3 illustrate how much delay user requests experienced in our experiment. These results show that CCS handles user requests averagely 182 percent faster than the standard cloud system. Figure 4 illustrates how much load the central controller experienced in our experiment. These results show that CCS assigns averagely 507 percent less tasks to the central controller than the standard cloud system.

5. CONCLUSIONS
In this paper, we propose a novel cloud storage system containing thousands of virtual file servers on hundreds of computers. We group these virtual servers into clusters. This system is perfectly scalable because the system load is divided among the clusters. Our simulation experiments show that our cloud storage system achieves smaller file read/write latency and traffic/processing overhead than the existing systems.

REFERENCES
[1] [2] [3] [4] Cloud Storage, http://en.wikipedia.org/wiki/Cloud_storage Amazon S3, http://en.wikipedia.org/wiki/Amazon_S3 EMC Atmos, http://en.wikipedia.org/wiki/EMC_Atmos Sean Rhea, Chris Wells, Patrick Eaton, Dennis Geels, Ben Zhao, Hakim Weatherspoon, and John Kubiatowicz, Maintenance-Free Global Data Storage, IEEE Internet Computing , Vol 5, No 5, September/October 2001, pp 4049. Byung Chul Tak, Chunqiang Tang, and Rong N. Chang, Designing a Storage Infrastructure for Scalable Cloud Services, Technical Report, The Pennsylvania State University, 2011. Tin Tin Yee, Thinn Thu Naing, PC-Cluster based Storage System Architecture for Cloud Storage, International Journal on Cloud Computing: Services and Architecture, Volume: 1 - volume NO: 3 Issue: November 2011. Michael Vrable, Stefan Savage, and Geoffrey M. Voelker, BlueSky: A Cloud-Backed File System for the Enterprise, Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST), San Jose, CA, February 2012. Sherman S. M. Chow, Cheng-Kang Chu, Xinyi Huang, Jianying Zhou, Robert H. Deng, Dynamic Secure Cloud Storage with Provenance, Cryptography and Security, pp. 442-464, 2012. 6

[5] [6]

[7]

[8]

International Journal in Foundations of Computer Science & Technology (IJFCST), Vol. 3, No.1, January 2013

[9]

[10] [11] [12]

[13] [14] [15]

Ji-Yong Shin , Mahesh Balakrishnan , Lakshmi Ganesh , Tudor Marian , Hakim Weatherspoon, Gecko: A Contention-Oblivious Design for Cloud Storage, In Proceedings of the USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage), Boston, MA, U.S.A., Jun 2012. Lakshmi Ganesh, Hakim Weatherspoon, Ken Birman, Beyond Power Proportionality: Designing Power-Lean Cloud Storage, NCA 2011, pp.147-154, 2011. David Bermbach, Markus Klems, Stefan Tai, Michael Menzel, MetaStorage: A Federated Cloud Storage System to Manage Consistency-Latency Tradeoffs, IEEE CLOUD, pp.452-459, 2011. A. Kumbhare , Y. Simmhan , V. Prasanna, Designing a Secure Storage Repository for Sharing Scientific Datasets using Public Clouds, Proceedings of the second international workshop on Data intensive computing in the clouds, Pages 31-40, 2011. H. T. Vo, C. Chen, B. C. Ooi, Towards Elastic Transactional Cloud Storage with Range Query Support, Int'l Conference on Very Large Data Bases (VLDB), 2010. Donald Kossmann, Tim Kraska, Simon Loesing, Stephan Merkli, Raman Mittal, Flavio Pfaffhauser, Cloudy: A Modular Cloud Storage System, PVLDB 3(2), pp.1533-1536, 2010. Rodrigo N. Calheiros, Rajiv Ranjan, Anton Beloglazov, Csar A. F. De Rose, Rajkumar Buyya, CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms, SoftwarePractice & Experience, Volume 41 Issue 1, Pages 2350, January 2011.

AUTHORS
Parinaz Eskandarian was born in 1987 in Iran. Ms. Eskandarian received her B.Engr. degree from University of Tabriz Jahad Daneshgahi , and her M.S. degree from Islamic Azad University, Zanjan branch (Zanjan, Iran) in computer engineering in 2012.

Jaber Karimpour was born in 1975 in Iran. Dr. Karimpour received his B.Engr. degree and his M.S. degree from University of Tabriz in computer science. He also received his Phd degree from University of Tabriz (Tabriz, Iran) in computer science in 2009.

You might also like