
Cloud Computing: Challenges and Responses

Nguyen The Huy

Abstract
Cloud Computing (CC) is on the rise. CC differs in a number of ways from traditional computing models, and it presents numerous challenges to the digital forensics community, whose research and practice largely fall within the realm of traditional computing. We name a few of those challenges in this paper, together with some approaches the community has taken to overcome them. On the other hand, CC also offers substantial economic and computational advantages. We visit several approaches that employ CC to improve the quality of digital forensics work.

1. Introduction
Cloud Computing is defined by NIST as "a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction". In this definition, CC has five essential characteristics (on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service), three service models (Software as a Service, Platform as a Service, Infrastructure as a Service) and four deployment models (private cloud, community cloud, public cloud, hybrid cloud) [8]. On the other hand, according to Armbrust et al., CC is either Software as a Service (SaaS) or Utility Computing, and excludes the private cloud deployment model [2]. Although their definition is narrower than NIST's, their claims and arguments are at least valid for the public cloud deployment model, which is currently the most popular. Examples of public clouds today are Amazon EC2, Google AppEngine and Microsoft Azure. Although not required by CC's definition, virtualization is considered essential to achieving elasticity and the illusion of infinite capacity [2]. Armbrust et al. also noted that the construction and operation of extremely large-scale data centers is necessary for the economic and efficient use of CC, and high automation is expected in such large-scale cloud infrastructure. While private or smaller-scale clouds are possible [5], the scale of many popular CC deployments today is extremely large. As a result of that large-scale infrastructure, access to the internal construction and operation of the cloud is severely limited for outsiders (e.g. cloud users, digital forensics investigators). A private cloud, however, may offer more relaxed access to its internals. Another commonly found characteristic of CC is its distributed environment.
The broad network access and rapid elasticity characteristics of CC make that distributed environment even easier to scale.

2. Challenges of Cloud Computing as Digital Forensic Target


Many challenges from traditional computing models are still present in CC. For example, encryption and vast storage sizes still pose great difficulty for digital forensics. Traditional digital forensic tools and practices, already insufficient for handling those challenges, probably cannot readily adapt to handle the new challenges from CC. A prominent challenge from CC is the difficulty of collecting digital evidence, especially from infrastructure sources. The ability to perform data preservation and isolation is greatly hindered by extremely limited access to the sources of digital evidence (e.g. infrastructure, logs). Garfinkel argues that those fundamental forensic activities cannot be performed in a CC environment [6]. Birk even questioned the availability and validity of digital evidence, given the abstraction of internal CC infrastructure and the lack of standards within the cloud [4]. Locating digital evidence in CC is also very difficult given the highly dynamic, virtualized and distributed nature of CC. For instance, a cloud similar to the one in [5] can load a virtual instance from its library onto any node in its pool; or, as Marty observed, a load balancer's IP address on Amazon AWS changes constantly [7]. In addition, legal issues pose another big challenge for forensics work: besides existing legal complications, the fact that data belonging to multiple cloud users may reside in the same cloud entity introduces new legal obstacles [3].

3. Taking Advantage of Cloud Computing Power


As mentioned earlier, a distributed environment is an often-encountered characteristic of CC. Moreover, CC can offer a scalable distributed environment with ease. Any account of early work leveraging CC for the benefit of digital forensics must therefore include Roussev and Richard's feasibility assessment of developing digital forensics tools in a scalable, distributed environment [9]. Their work was motivated by the fact that contemporary forensics tools were insufficient for handling today's massive storage sizes and huge network bandwidth. Going further, they argued that as digital forensics tools become more sophisticated, as they should, the CPU power of a single machine would no longer be enough. They therefore proposed a specialized distributed framework upon which digital forensics tools can be built to tackle both I/O and CPU constraints. Their framework used a coordinator-workers architecture communicating through simple, predefined text-based system commands (initialization, termination, cache management and reply messages) and processing commands (e.g. hash, grep, crack). Early results from their prototype showed a significant reduction in processing time; that their software remained interactive during the run was a bonus. The application of distributed computing shows great potential for improving the performance of digital forensic tools and for developing more sophisticated ones. However, the evaluated prototype implemented only one digital forensics operation, regular expression matching (grep), which makes the usefulness of the proposed distributed framework less convincing. In addition, Roussev and Richard's decision to develop a brand-new distributed framework from scratch, instead of reusing an existing generic one, may not have been the right one. With only one forensic task ever implemented in the new framework, it is possible that the framework is unsuitable for, or does not scale well in, the development of other forensics software.

Later, in 2009, Roussev and other researchers presented a cloud-based implementation of several elementary digital forensics tools, reported to achieve linear and sublinear speedups over traditional implementations. Their work, called MMR, was an implementation of Google's MapReduce framework using the Message Passing Interface (MPI). In particular, MMR consisted of three abstraction layers: MPI, providing distributed communication; a middleware platform, providing synchronization and the MapReduce abstraction; and finally the software code containing the application logic. A comprehensive evaluation of using MMR to develop elementary forensics software and the like (such as wordcount, grep, a Bloom filter, and a pi estimator representing CPU-bound image processing algorithms) confirmed the feasibility of achieving scalable and robust performance with MMR. MMR was also reported to outperform Hadoop, the Java implementation of MapReduce [10]. This work by Roussev et al. was another attempt to apply distributed computing to the performance issues of digital forensics tools. Compared to the earlier work, its implementation and evaluation were more robust and comprehensive. Much of that robustness, we believe, resulted from using MPI and Google's MapReduce instead of developing a new specialized distributed framework. The evaluation also compared the performance of MMR and Hadoop; given the limitations of Hadoop's Java implementation [10], the result, as mentioned earlier, was expected. However, it would be very interesting to see a comparison between the two in terms of ease of design and implementation.
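The MapReduce abstraction underlying MMR can be illustrated with a minimal single-process sketch of the wordcount task from the evaluation (hypothetical code of ours, not MMR's; in MMR the map, shuffle and reduce phases run across distributed MPI workers rather than in one process):

```python
from collections import defaultdict

def map_phase(chunk):
    # Emit a (word, 1) pair for every word in the input chunk.
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    # Group all emitted values by key, as the MapReduce runtime would
    # do across workers before the reduce phase.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Aggregate each key's values; for wordcount this is a simple sum.
    return {key: sum(values) for key, values in groups.items()}

# Each chunk stands in for a slice of evidence text assigned to a worker.
chunks = ["cloud forensics", "cloud computing", "forensics tools"]
pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'cloud': 2, 'forensics': 2, 'computing': 1, 'tools': 1}
```

The forensics-relevant point is that only `map_phase` and `reduce_phase` carry task logic; distribution, grouping and synchronization belong to the framework layers.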
As forensics software becomes more sophisticated, ease of development will become more important. Java is well known for its object orientation and automatic memory management; the implementation language of MMR was unfortunately not mentioned. Taking advantage of CC in a different way, Buchanan et al. used CC to methodically evaluate the quality of digital forensics tools [5]. They were inspired by the fact that the credibility of digital forensics findings is impacted by the lack of standardization in procedures, toolkits and data sets. Building on their success with virtualization in teaching computer security and digital forensics, Buchanan et al. presented an infrastructure based on virtualization within a CC deployment. In brief, they created a set of evaluation criteria and a CC-based testing system. The system was capable of script-automating different modes of testing and of creating and preparing portable, reproducible test environments; each test environment was a virtual instance stored in a shared library. Results from the evaluation of their system showed reliable, robust and scalable execution. In addition, the system demonstrated better energy consumption and CPU utilization than a traditional stand-alone test system. Buchanan et al.'s work shows great promise for improving the quality and credibility of digital forensics research and practice. Their work can also promote and facilitate collaboration among members of the digital forensics community, because it would make it easier to create, collect and transfer testing and

training data sets. However, copyright issues may still constrain the creation and sharing of certain types of digital forensics data. Binary scrambling techniques may be used in some cases, but they were reported to create new problems for forensics techniques that rely on known binary signatures [5]. In addition, the current implementation, based on VMware technologies, cannot accommodate non-desktop (e.g. mobile phone, handheld device) digital forensics data and tools. For instance, the two most popular mobile platforms, Apple's iOS and Android, are not supported as guest OSs by VMware [11]. Given the current trend toward mobile computing, we expect demand for digital forensics research and practice on those mobile platforms to increase rapidly. It is therefore highly desirable that future versions of Buchanan et al.'s test system support such guest OSs.
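The kind of tool scoring such a platform enables can be sketched as follows (an illustration under our own assumptions, not D-FET's actual evaluation criteria or interface): each run of a tool against a reference image with known planted artefacts yields precision and recall figures that can be compared across tools and versions.

```python
def score_run(recovered, ground_truth):
    """Score one tool run against a test image's known contents.

    recovered and ground_truth are sets of artefact identifiers
    (e.g. hashes of files planted in the reference image).
    """
    true_pos = recovered & ground_truth
    precision = len(true_pos) / len(recovered) if recovered else 0.0
    recall = len(true_pos) / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Hypothetical results: the reference image contains three planted files;
# the tool under test recovers two of them plus one false positive.
ground_truth = {"doc1", "img7", "log3"}
recovered = {"doc1", "img7", "tmp9"}
precision, recall = score_run(recovered, ground_truth)
print(round(precision, 2), round(recall, 2))  # 0.67 0.67
```

Because each test environment is a stored virtual instance, such a scoring script can be run identically by any lab that receives the shared image, which is what makes results reproducible.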

4. A Different Approach to Forensics in Cloud Computing


Marty, from Loggly, Inc., highlighted the importance of log analysis in a digital forensics investigation. In his paper, he presented a list of challenges encountered when analyzing logs from a CC deployment for forensics purposes. Those challenges are often the result of failures to properly manage log files and properly generate log records. To overcome those difficulties, Marty proposed a proactive approach to generating and managing logs: a comprehensive set of guidelines on log management (e.g. enabling, storing, transporting and tuning logs) and log creation (e.g. when, what and how to log). A sample setup in a SaaS cloud was presented, together with practical tips and the actual difficulties encountered during the setup; the goal was to collect logs from a Django web application, JavaScript, the Apache web server, the load balancer, MySQL, the operating system and the back-end [7]. Even though Marty's guidelines comprehensively cover logging issues from the infrastructure level (e.g. OS, transport) to the application level (e.g. when, what and how to log), Marty also noted that the application developers' full support is necessary for the success of the framework. In addition, the guidelines are not easy to implement, and some may not even be possible. Extremely limited access to the internal structure and operation of the cloud leaves little room for any outsider to improve the log management process in support of digital forensics; this is usually the case for logs from infrastructure sources. For instance, in his sample setup, Marty was not able to obtain the MySQL logs. Even when such access is possible, the large scale and dynamic nature of CC can hamper the log management process. Marty's guidelines on infrastructure logs were also rather short and abstract compared to his guidelines on generating application logs. Yet even logging at the application level may fail to follow all the guidelines.
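As an illustration of the application-level guidance, a log record covering the when, who, what and outcome of an event might be emitted as below (the field names and helper are our own example, not Marty's exact schema):

```python
import json
import time

def log_event(user, action, obj, status, severity="INFO"):
    # One machine-parsable record capturing the when (timestamp),
    # who (user), what (action on object) and outcome (status).
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "severity": severity,
        "user": user,
        "action": action,
        "object": obj,
        "status": status,
    }
    return json.dumps(record)

print(log_event("alice", "download", "/reports/q3.pdf", "success"))
```

Records like this are trivial to parse during an investigation, whereas free-form messages force the investigator to reverse-engineer each application's log format.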
While the guidelines on what, when, where, who, why and how to log are useful and extensive, they may be rather overwhelming to any developer unfamiliar with such practice. More importantly, logging is often not mentioned, or only briefly mentioned, in business user requirements; implementing all the guidelines thus means over-investing time and effort with no clear economic gain (measured against delivering all required functionality on time and within budget). In another work on log analysis, Accorsi et al. took the practice to a higher level. While current digital forensics toolkits only performed analysis on logs from infrastructure sources, they invented a forensics

technique called RECIF for the analysis of logs from the application level. Probably inspired by the success of research in the mature field of business informatics, their technique offered a pioneering approach to digital forensics investigation. In brief: a data flow is a transition between two events in which the output of one is used as the input of the other, and a policy is a set of constraint-exception relations expressed in a simple special-purpose language. From data collected in application logs, a propagation graph of data flows is reconstructed, and the resulting graph is matched against a set of predefined business process policies to detect information leaks [1]. An interesting aspect was the use of MXML, the standardized log format for business processes; tools for transforming logs from major business process systems such as SAP, Oracle and Sage into MXML were reported to be available. In an example run, the technique correctly detected an information leak from a set of generated data flows and a separation-of-duty policy, and further work was ongoing to demonstrate its correctness in more complex scenarios. Overall, the technique looks very promising. It works on a standardized log format into which multiple other common formats can be transformed. It readily supports many common business policies, such as separation of duties and conflict of interest. It is also complementary to other digital forensics techniques, and it requires no special tools or skills (besides the ability to write business process policies in a predefined syntax, which looks simple). The technique is probably most useful in information leak scenarios where the so-called crime does not involve much technical detail; such scenarios are often the result of flaws in security policy specifications or software designs. However, the expressiveness of the language could still be enhanced.
Notably, the language cannot express policies of the type "deny all, allow a few", because it is intrinsically "allow all, deny a few".
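The core matching idea can be sketched as follows (a heavily simplified illustration of ours, not RECIF's actual algorithm or policy syntax): reconstruct data flows from logged events, then flag the few flows that a separation-of-duty policy denies.

```python
# Each logged event has a performing role, the data objects it produced
# (outputs) and the data objects it consumed (inputs).
events = {
    "e1": {"role": "clerk",   "outputs": {"invoice42"},  "inputs": set()},
    "e2": {"role": "clerk",   "outputs": {"approval42"}, "inputs": {"invoice42"}},
    "e3": {"role": "auditor", "outputs": set(),          "inputs": {"approval42"}},
}

def data_flows(events):
    # Reconstruct the propagation graph: an edge (a, b) means some
    # output of event a was consumed as an input of event b.
    return [(a, b)
            for a, ev_a in events.items()
            for b, ev_b in events.items()
            if a != b and ev_a["outputs"] & ev_b["inputs"]]

def sod_violations(events, flows):
    # Separation of duty in "allow all, deny a few" style: flag only
    # those flows where the same role performed both connected events.
    return [(a, b) for a, b in flows
            if events[a]["role"] == events[b]["role"]]

flows = data_flows(events)
print(flows)                          # [('e1', 'e2'), ('e2', 'e3')]
print(sod_violations(events, flows))  # [('e1', 'e2')]: clerk both issued and approved
```

A "deny all, allow a few" policy would instead require enumerating every permitted flow, which is exactly what such an exception-listing language cannot express.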

5. Conclusion and Future Direction


In this paper, we have reviewed a few challenges posed by CC to the digital forensics community, and we have discussed a few approaches, focusing on infrastructure and application log analysis, that tackle those challenges. On the other hand, CC delivers many computational advantages which the community can utilize to improve the performance and quality of digital forensics tools; digital forensics training in the virtualized environment of the cloud has also been explored. From the above findings, we believe the future of digital forensics in the age of CC lies in taking advantage of its economic and computational benefits and in exploring non-traditional approaches to collecting and analyzing digital evidence. Among the possible tactics, borrowing expertise from other related research fields may bring in new ideas and firmer approaches; a good example is the application of business informatics to the detection of information leaks [1].

6. References
[1]. Accorsi, R., Wonnemann, C., & Stocker, T. (2011). Towards forensics data flow analysis of business process logs. IT Security Incident Management & IT Forensics.

[2]. Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R., Konwinski, A., et al. (2009). Above the Clouds: A Berkeley View of Cloud Computing. University of California at Berkeley.

[3]. Beebe, N. (2009). Digital Forensic Research: The Good, the Bad and the Unaddressed. Advances in Digital Forensics V, IFIP AICT 306, pp. 17-36.

[4]. Birk, D. (n.d.). Technical Challenges of Forensic Investigations in Cloud Computing Environments. Retrieved April 8, 2011, from http://www.zurich.ibm.com/~cca/csc2011/submissions/birk.pdf

[5]. Buchanan, W. J., Macfarlane, R. J., Flandrin, F., Graves, J., Buchanan, B., Fan, L., et al. (2011). Cloud-based Digital Forensics Evaluation Test (D-FET) Platform. Cyberforensics.

[6]. Garfinkel, S. L. (2010). Digital forensics research: the next 10 years. Digital Forensics Research Workshop.

[7]. Marty, R. (2011). Cloud Application Logging for Forensics. SAC.

[8]. National Institute of Standards and Technology. (n.d.). The NIST Definition of Cloud Computing. Retrieved April 28, 2011, from http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc

[9]. Roussev, V., & Richard, G. G. (2004). Breaking the performance wall: the case for distributed digital forensics. Proceedings of the Fourth Digital Forensic Research Workshop.

[10]. Roussev, V., Wang, L., Richard, G., & Marziale, L. (2009). A Cloud Computing Platform for Large-Scale Forensic Computing. Advances in Digital Forensics V, IFIP AICT 306, pp. 201-214.

[11]. VMware Supports the Largest Number of Guest Operating Systems. (n.d.). Retrieved April 12, 2011, from http://www.vmware.com/technical-resources/advantages/guest-os.html
