
Big Data Analysis

Big Data Overview (1/2)


Big data is a general term used to describe the voluminous amount of unstructured and semi-structured data a company creates. It's the data that would take too much time and cost too much money to load into a relational database for analysis. Big data refers to datasets whose sizes are beyond the ability of typical database software tools to capture, store, manage and analyze.

A primary goal of looking at big data is to discover repeatable business patterns. It has many additional uses, including real-time fraud detection, web display advertising and competitive analysis, call center optimization, social media and sentiment analysis, intelligent traffic management, and smart power grids.

Big data analytics is often associated with cloud computing because analyzing large datasets in real time requires a framework like MapReduce to distribute the work among tens, hundreds or even thousands of computers. As technology advances over time, the size of the datasets that qualify as big data will also increase, and big data is expected to play a significant economic role, benefiting not only private commerce but also national economies and their citizens. Big data involves more than simply the ability to handle large volumes of data; it represents a wide range of new analytical technologies and business possibilities.
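
To make the MapReduce idea concrete, here is a minimal single-process sketch in Python. Real frameworks such as Hadoop run the same map and reduce phases in parallel across many machines; this toy example only illustrates the pattern, with invented log lines.

```python
from collections import defaultdict

# Minimal single-process sketch of the MapReduce pattern described
# above; real frameworks (e.g., Hadoop) distribute these phases
# across many machines.

def map_phase(records):
    # Emit (key, value) pairs -- here, (word, 1) for a word count.
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # Group values by key, then aggregate each group.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

logs = ["user clicked ad", "user viewed page", "user clicked link"]
print(reduce_phase(map_phase(logs)))  # {'user': 3, 'clicked': 2, ...}
```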

Big Data Can Generate Significant Financial Value Across Sectors

Source: http://searchcloudcomputing.techtarget.com/definition/big-data-Big-Data, McKinsey Big Data Report, BI Research Using Big Data for Smarter Decision Making

Big Data Overview (2/2)


Data volume is the primary attribute of big data

Big data can also be quantified by counting records, transactions, tables or files. Some organizations find it more useful to quantify big data in terms of time. For example, due to the seven-year statute of limitations in the U.S., many firms prefer to keep seven years of data available for risk, compliance and legal analysis. The scope of big data affects its quantification, too. For example, in many organizations, the data collected for general data warehousing differs from data collected specifically for analytics.

(Figure: The Three Vs of Big Data)

Data variety comes from a greater variety of sources

Big data comes from a variety of sources, including logs, click streams, social media, radio-frequency identification (RFID) data from supply chain applications, text data from call center applications, semi-structured data from various business-to-business processes, and geospatial data in logistics. The recent tapping of these sources for analytics means that so-called structured data is now joined by unstructured data (text and human language) and semi-structured data (XML, RSS feeds).

Data feed velocity as a defining attribute of big data

The collection of big data in real time isn't new; many firms have been collecting click stream data from the web for years, using streaming data to make purchase recommendations to web visitors. Even more challenging, the analytics that go with streaming data have to make sense of the data and possibly take action, all in real time.

The three Vs of big data (volume, variety and velocity) constitute a comprehensive definition. Each of the three Vs has its own ramifications for analytics.

Source: TDWI Research report on Big Data Analytics



Big Data Future


International Data Corporation (IDC) released a worldwide big data technology and services forecast report based on a survey in March 2012. As per the survey:

- The big data market is expected to grow from $3.2 billion in 2010 to $16.9 billion in 2015. This represents a compound annual growth rate (CAGR) of 40%, or about seven times that of the overall information and communications technology (ICT) market.
- The big data market is expanding rapidly, and for technology buyers, opportunities exist to use big data technology to improve operational efficiency and to drive innovation. There are also big data opportunities for both large IT vendors and start-ups. Major IT vendors are offering both database solutions and configurations supporting big data, evolving their own products as well as acquiring others. At the same time, more than half a billion dollars in venture capital has been invested in new big data technology.
- While the five-year CAGR for the worldwide market is expected to be nearly 40%, the growth of individual segments varies: 27.3% for servers, 34.2% for software and 61.4% for storage.
- The growth in appliances, cloud, and outsourcing deals for big data technology will likely mean that, over time, end users will pay less attention to technology capabilities and will focus instead on business value. System performance, availability, security and manageability will all matter greatly; however, how they are achieved will be less of a point of differentiation among vendors.
- There is a shortage of trained big data technology experts, in addition to a shortage of analytics experts. This labor supply constraint will act as an inhibitor of adoption and use of big data technologies, and it will also encourage vendors to deliver big data technologies as cloud-based solutions.
- While software and services make up the bulk of the market opportunity through 2015, infrastructure technology for big data deployments is expected to grow slightly faster, at a 44% CAGR. Storage, in particular, shows the strongest growth opportunity, growing at a 61.4% CAGR through 2015.
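
As a quick sanity check, the roughly 40% CAGR quoted above can be reproduced from the forecast's endpoints ($3.2 billion in 2010, $16.9 billion in 2015):

```python
# Reproduce the ~40% CAGR quoted above from the IDC endpoints:
# $3.2B in 2010 growing to $16.9B in 2015 (5 years).
start, end, years = 3.2, 16.9, 5
cagr = (end / start) ** (1 / years) - 1
print(f"CAGR: {cagr:.1%}")  # -> CAGR: 39.5%, i.e., roughly 40%
```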

IDC defines big data technologies as a new generation of technologies and architectures designed to extract value economically from very large volumes of a wide variety of data by enabling high-velocity capture, discovery and/or analysis.
Source: http://www.idc.com/getdoc.jsp?containerId=prUS23355112

Big Data Risks/Challenges


Big data is complex because of the variety of data it encompasses, from structured data, such as transactions one makes or measurements one calculates and stores, to unstructured data, such as text conversations, multimedia presentations and video streams. Big data presents a number of challenges relating to its complexity:

- One challenge is how to understand and use big data when it comes in an unstructured format, such as text or video.
- Another challenge is how to capture the most important data as it happens and deliver it to the right people in real time.
- A third challenge is how to store, analyze and understand the data given its size and the computational capacity required.
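
To illustrate the first challenge, the short sketch below contrasts semi-structured data, which parses directly, with unstructured text, which needs an extraction step before it can be analyzed. The record and the keyword heuristic are invented for illustration.

```python
import json
import re

# Semi-structured data parses directly: the fields are explicit.
semi_structured = '{"customer": "C123", "rating": 2}'
record = json.loads(semi_structured)

# Unstructured text needs extraction heuristics first. Here, a naive
# step pulls candidate signal words out of free text.
unstructured = "Called support twice, still no refund. Very unhappy."
negative_terms = {"unhappy", "refund", "complaint"}
tokens = set(re.findall(r"[a-z]+", unstructured.lower()))
sentiment_flags = tokens & negative_terms

print(record["rating"], sentiment_flags)  # 2 {'unhappy', 'refund'}
```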

Big data also poses security and privacy risks: a large amount of data is stored in data warehouses, centralized in a single repository. Big data and extreme workloads require optimized hardware and software. The main challenges of big data and extreme workloads are data variety and volume, and analytical workload complexity and agility. Many organizations are struggling to deal with increasing data volumes, and big data simply makes the problem worse. To solve this problem, organizations need to reduce the amount of data being stored and exploit new storage technologies that improve performance and storage utilization.

Big data's increasing economic importance also raises a number of legal issues, especially when coupled with the fact that data is fundamentally different from many other assets. For example, one piece of data can be copied perfectly and easily combined with other data, and the same piece of data can be used simultaneously by more than one person.

Sectors with a relative lack of competitive intensity and performance transparency, along with industries where profit pools are highly concentrated, are likely to be slow to fully leverage the benefits of big data.
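
One common technique for reducing the amount of data being stored is deduplication. The following is a minimal content-hashing sketch of that idea; the records are invented.

```python
import hashlib

# Minimal deduplication sketch for the storage-reduction point above:
# identical records hash to the same digest, so only one copy is kept.
records = [
    b"2012-01-05,order,1001,49.99",
    b"2012-01-05,order,1001,49.99",   # exact duplicate
    b"2012-01-06,order,1002,15.00",
]
seen, unique = set(), []
for rec in records:
    digest = hashlib.sha256(rec).hexdigest()
    if digest not in seen:
        seen.add(digest)
        unique.append(rec)
print(f"kept {len(unique)} of {len(records)} records")
```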

Source: BI Research Using Big Data for Smarter Decision Making, http://spotfireblog.tibco.com/?p=6793, https://www.privacyassociation.org/publications/2012_03_23_big_data_it_risks_and_privacy_meet_in_the_boardroom

Big Data Importance


Creating transparency

Making big data more easily accessible to relevant stakeholders in a timely manner can create tremendous value. In the public sector, making relevant data more readily accessible across otherwise separated departments can sharply reduce search and processing time.

Enabling experimentation to discover needs

As more transactional data is created and stored in digital form, organizations can collect more accurate and detailed performance data on everything from product inventories to personnel sick days. Organizations can then use controlled experiments to analyze variability in performance and identify what drives it.
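
A minimal sketch of such a controlled experiment follows: compare a control group against a treatment group and measure the difference. All figures are invented, and a real analysis would add a significance test before acting on the result.

```python
import statistics

# Invented daily sales figures for a controlled experiment:
control   = [102, 98, 95, 101, 99, 97]      # old store layout
treatment = [108, 104, 110, 103, 107, 105]  # new store layout

diff = statistics.mean(treatment) - statistics.mean(control)
spread = statistics.stdev(control + treatment)
print(f"observed lift: {diff:.1f} (pooled stdev {spread:.1f})")
# A real analysis would test significance before acting on the lift.
```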

Segmenting populations to customize actions

Big data allows organizations to create highly specific segmentations and to tailor products and services precisely to the needs of each segment. This approach is well known in marketing and risk management but can be revolutionary elsewhere.
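
The sketch below illustrates one simple form of segmentation (rule-based bucketing by spend and visit frequency). The customers and thresholds are invented; production systems typically derive segments with clustering rather than hand-written rules.

```python
# Rule-based customer segmentation sketch; all data is invented.
customers = [
    {"id": "A", "spend": 1200, "visits": 30},
    {"id": "B", "spend": 150,  "visits": 4},
    {"id": "C", "spend": 900,  "visits": 25},
    {"id": "D", "spend": 60,   "visits": 1},
]

def segment(c):
    # Thresholds here are illustrative, not empirically derived.
    if c["spend"] > 500 and c["visits"] > 10:
        return "loyal-high-value"
    if c["spend"] > 500:
        return "high-value-occasional"
    return "low-engagement"

for c in customers:
    print(c["id"], segment(c))
```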

Replacing human decision making with automated algorithms

Sophisticated analytics can substantially improve decision making, minimize risks and unearth valuable insights that would otherwise remain hidden. Such analytics have applications across many organizations; for example, tax agencies can use automated risk engines to flag candidates for further examination.
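
A toy version of such a risk engine might look like the following; the rules, weights and returns are invented purely for illustration.

```python
# Invented rule-based risk engine in the spirit of the tax-agency
# example above: score each return, flag high scorers for review.
RULES = [
    ("deductions exceed 50% of income",
     lambda r: r["deductions"] > 0.5 * r["income"], 3),
    ("round-number income reported",
     lambda r: r["income"] % 1000 == 0, 1),
    ("no withholding on high income",
     lambda r: r["income"] > 100_000 and r["withheld"] == 0, 4),
]

def risk_score(tax_return):
    return sum(weight for _, test, weight in RULES if test(tax_return))

returns = [
    {"id": 1, "income": 80_000,  "deductions": 50_000, "withheld": 9_000},
    {"id": 2, "income": 120_000, "deductions": 10_000, "withheld": 0},
]
flagged = [r["id"] for r in returns if risk_score(r) >= 3]
print("flag for examination:", flagged)  # humans review the flags
```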

Innovating new business models, products and services

Big data enables companies to create new products and services, enhance existing ones, and invent entirely new business models. Manufacturers are using data obtained from the use of actual products to improve the development of the next generation of products and to create innovative after-sales service offerings.

Source: McKinsey Big Data Report



Big Data Vendors


2012 Big Data Pure-Play Vendors, Yearly Big Data Revenue (in $US Million)

In the current market, big data pure-play vendors account for $300 million in big data-related revenue. Despite their relatively small percentage of current overall revenue (approximately 5%), big data pure-play vendors (such as Vertica, Splunk and Cloudera) are responsible for the vast majority of new innovations and modern approaches to data management and analytics that have emerged over the last several years and made big data the hottest sector in IT.

Source: http://www.forbes.com/sites/siliconangle/2012/02/17/big-data-is-big-market-big-business/

Big Data Trends

The McKinsey Global Institute estimated that enterprises globally stored more than seven exabytes of new data on disk drives in 2010, while consumers stored more than six exabytes of new data on devices such as PCs and notebooks. Big data has now reached every sector in the global economy. In total, European organizations stored almost 11 exabytes, about 70% of the storage capacity of the entire United States.

The possibilities of big data continue to evolve rapidly, driven by innovation in the underlying technologies, platforms and analytic capabilities for handling data, as well as by evolving user behavior as more and more individuals live digital lives. The use of big data is becoming a key way for leading companies to outperform their peers. McKinsey estimated that a retailer embracing big data has the potential to increase its operating margin by more than 60%. The increasing use of multimedia in sectors including health care and consumer-facing industries has contributed significantly to the growth of big data and will continue to do so.

The surge in the use of social media is producing its own stream of new data. While social networks dominate the communications portfolios of younger users, older users are adopting them at an even more rapid pace.

Source: McKinsey Big Data Report



Big Data Examples


Big data includes web logs, RFID, sensor networks, social networks, social data, Internet text and documents, Internet search indexing, call detail records, complex and/or interdisciplinary scientific research, military surveillance, medical records, photography archives, video archives, and large-scale e-commerce.

Examples of companies using big data:

- IBM has formed a partnership with the Netherlands Institute for Radio Astronomy (ASTRON) on the DOME project, developing the tools needed to crunch the data for the ambitious international Square Kilometer Array (SKA) radio telescope.
- San Francisco-based SeeChange offered a better way of designing health insurance plans with what it calls value-based benefits. The company used a substantial amount of data gleaned from personal health records, claims databases, lab feeds and pharmacy data to identify patients with chronic illnesses who would benefit from a customized compliance program.
- Boston-based Humedica combined its data analytics with a real-time clinical surveillance and decision support system. The company also sells its detailed clinical spending data to life sciences companies, with the idea that customers will use it to quantify patient populations, market share and market opportunities.
- Castlight Health aimed to push transparency in healthcare pricing by offering consumers a search engine to find prices of healthcare services. Castlight's technology allowed consumers to run side-by-side comparisons of out-of-pocket medical expenses. Armed with prices, consumers can then shop for bargains, limiting the growth of healthcare costs.
- Cleveland-based Explorys has started a Google-like service that helps clinicians analyze real-time information culled from troves of electronic medical records (EMRs), financial records and other data. The idea is that medical researchers can mine the vast amounts of data to learn how variations in treatment affect outcomes, uncovering best practices to enhance patient care and lower costs.
- Apixio's technology brings together data from structured sources like EMRs with unstructured data, such as a physician's patient encounter notes. The company's software uses natural language processing technology to interpret clinicians' free-text searches and return the most relevant results.
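
Apixio's actual technology is proprietary; as a rough illustration of the general idea behind free-text clinical search, the sketch below ranks notes by how many query terms they share. All notes are invented.

```python
import re

# Toy relevance ranking over free-text notes (invented data); real
# clinical NLP systems are far more sophisticated than term overlap.
notes = {
    "note1": "Patient reports chest pain and shortness of breath.",
    "note2": "Follow-up for diabetes; A1C improved since last visit.",
    "note3": "Chest x-ray clear; symptoms resolved with medication.",
}

def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

query = tokens("chest pain")
ranked = sorted(notes, key=lambda n: len(query & tokens(notes[n])),
                reverse=True)
print(ranked)  # note1 first: it shares the most query terms
```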

Source: http://www.forbes.com/sites/alexknapp/2012/04/09/ibm-is-using-big-data-to-crunch-the-big-bang/, http://www.medcitynews.com/2011/11/5-companies-using-big-data-to-solvehealthcare-problems/



Role of Internal Audit in Managing Big Data: Case Study


To manage data holdings effectively, an organization must first be aware of the location, condition and value of its research assets. Conducting a data audit provides this information, raising awareness of collection strengths and identifying weaknesses in data policies and management procedures. The benefits of conducting an audit for managing big data effectively are:

- Check the extent of data assets and take a deep dive into what is available. Data that is redundant or unimportant can be identified and reduced (see the sketch after this list).
- Monitor holdings and avoid big data leaks. Data hacking, social engineering and data leaks all plague companies; an audit can help a company identify areas where leakage is possible.
- Manage risks associated with big data loss and irretrievability. Data that is unstructured and lying untouched may never be retrieved; an audit can help identify such cases.
- Develop a big data strategy and implement robust big data policies. Big data requires robust management and proper structuring.
- Improve workflows and benefit from efficiency savings. Check where workflows are complex and time-consuming and where there is scope for improving efficiency.
- Realize the value of big data through improved access and reuse, checking whether there are areas that have not been used in a while.
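
As a rough illustration, the sketch below automates two of the checks above: finding redundant (duplicate) files and finding data lying untouched. The directory path and staleness threshold are hypothetical; adjust both for a real run.

```python
import hashlib
import os
import time

DATA_DIR = "/data/warehouse"           # hypothetical location
STALE_AFTER = 365 * 24 * 3600          # untouched for a year

digests, duplicates, stale = {}, [], []
for root, _, files in os.walk(DATA_DIR):
    for name in files:
        path = os.path.join(root, name)
        # Identical content hashes to the same digest -> redundancy.
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest in digests:
            duplicates.append((path, digests[digest]))
        else:
            digests[digest] = path
        # Files not accessed within the threshold -> lying untouched.
        if time.time() - os.path.getatime(path) > STALE_AFTER:
            stale.append(path)

print(f"{len(duplicates)} duplicate files, {len(stale)} stale files")
```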

Source: http://www.data-audit.eu/docs/DAF_briefing_paper.pdf

Managing Big Data Through Internal Audit


Following are issues of big data that internal audit can help mitigate:

Complex Big Data

Most companies collect large volumes of data, but they don't have comprehensive approaches for centralizing the information. Internal audit can help companies manage big data by streamlining and collating data effectively.

Big Data Security

Maintaining effective data security is increasingly recognized as a critical risk area for organizations. Loss of control over data security can have severe ramifications for an organization, including regulatory penalties, loss of reputation, and damage to business operations and profitability. Auditing can help organizations secure and control data collected.

Big Data Accessibility

Giving the right person access to big data at the right time is another challenge organizations face. Segregation of duties (SoD) is an important control that internal audit can check, as in the sketch below.
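
A minimal sketch of an SoD check follows; the conflicting role pairs and user assignments are invented for illustration.

```python
# Invented segregation-of-duties (SoD) check of the kind internal
# audit might run: flag users holding role pairs defined as conflicting.
CONFLICTS = {("create_vendor", "approve_payment"),
             ("request_access", "grant_access")}

assignments = {
    "alice": {"create_vendor", "approve_payment"},   # violation
    "bob":   {"create_vendor"},
    "carol": {"request_access", "view_reports"},
}

for user, roles in assignments.items():
    for a, b in CONFLICTS:
        if a in roles and b in roles:
            print(f"SoD violation: {user} holds both '{a}' and '{b}'")
```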

Big Data Quality

The more data one accumulates, the harder it is to keep everything consistent and correct. Internal audit can check the quality of big data.

Big Data Understanding

Understanding and interpreting big data remains one of the primary concerns for many organizations. Auditors can help simplify an organization's data effectively.

Source: http://www.acl.com/pdfs/wp_AA_Best_Practices.pdf, http://smartdatacollective.com/brett-stupakevich/48184/4-biggest-problems-big-data


