By Frank Ohlhorst
Copyright 2013 by John Wiley & Sons, Inc.
CHAPTER
As analytics and research were applied to large data sets, scientists came to the conclusion that more is better: in this case, more data, more analysis, and more results. Researchers started to incorporate related data sets, unstructured data, archival data, and real-time data into the process, which in turn gave birth to what we now call Big Data.
In the business world, Big Data is all about opportunity. According to IBM, every day we create 2.5 quintillion (2.5 × 10^18) bytes of data, so much that 90 percent of the data in the world today has been created in the last two years. These data come from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos posted online, transaction records of online purchases, and cell phone GPS signals, to name just a few. That is the catalyst for Big Data, along with the more important fact that all of these data have intrinsic value that can be extracted using analytics, algorithms, and other techniques.
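To put the IBM figure in perspective, the short Python sketch that follows converts 2.5 × 10^18 bytes per day into exabytes per day and terabytes per second. It assumes decimal (SI) prefixes, 1 EB = 10^18 bytes and 1 TB = 10^12 bytes; the helper name daily_volume_summary is hypothetical and used only for illustration.

```python
# Back-of-the-envelope conversion of the IBM figure:
# 2.5 quintillion (2.5 x 10^18) bytes created per day.
BYTES_PER_DAY = 2.5e18
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def daily_volume_summary(bytes_per_day: float) -> dict:
    """Express a daily byte count in SI exabytes/day and terabytes/second."""
    return {
        # Assumes SI prefix: 1 EB = 10^18 bytes.
        "exabytes_per_day": bytes_per_day / 1e18,
        # Assumes SI prefix: 1 TB = 10^12 bytes.
        "terabytes_per_second": bytes_per_day / SECONDS_PER_DAY / 1e12,
    }


summary = daily_volume_summary(BYTES_PER_DAY)
print(f"{summary['exabytes_per_day']:.1f} EB per day")         # 2.5 EB per day
print(f"{summary['terabytes_per_second']:.1f} TB per second")  # 28.9 TB per second
```

With binary prefixes (1 EiB = 2^60 bytes) the same daily figure comes to roughly 2.17 EiB.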
Big Data has already proved its importance and value in several areas. Organizations such as the National Oceanic and Atmospheric Administration (NOAA), the National Aeronautics and Space Administration (NASA), several pharmaceutical companies, and numerous energy companies have amassed huge amounts of data and now leverage Big Data technologies on a daily basis to extract value from them.
NOAA uses Big Data approaches to aid in climate, ecosystem, weather, and commercial research, while NASA uses Big Data for aeronautical and other research. Pharmaceutical companies and energy companies have leveraged Big Data for more tangible results, such as drug testing and geophysical analysis. The New York Times has used Big Data tools for text analysis and Web mining, while the Walt Disney Company uses them to correlate and understand customer behavior in all of its stores, theme parks, and Web properties.
Big Data plays another role in today's businesses: Large organizations increasingly face the need to maintain massive amounts of structured and unstructured data, from transaction information in data warehouses to employee tweets, and from supplier records to the regulatory filings needed to comply with government regulations. That need has been driven even more by recent court cases that have encouraged companies to keep large quantities of documents, e-mail messages, and other electronic communications, such as instant messaging and Internet provider telephony, that may be required for e-discovery if they face litigation.
Business intelligence (BI). This consists of a broad category of applications and technologies for gathering, storing, analyzing, and providing access to data. BI delivers actionable information, which helps enterprise users make better business decisions using fact-based support systems. BI works by using an in-depth analysis of detailed business data, provided by databases, application data, and other tangible data sources. In some circles, BI can provide historical, current, and predictive views of business operations.
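As a minimal illustration of the kind of historical view a BI report provides, the Python sketch that follows rolls hypothetical sales records up by region and by quarter, producing two summaries of the same detailed data. The record layout and the roll_up helper are assumptions made for illustration; they do not reflect any particular BI product.

```python
from collections import defaultdict

# Hypothetical detailed transaction records: (region, quarter, sale amount).
transactions = [
    ("East", "Q1", 120.0),
    ("East", "Q1", 80.0),
    ("East", "Q2", 200.0),
    ("West", "Q1", 50.0),
    ("West", "Q2", 75.0),
    ("West", "Q2", 25.0),
]


def roll_up(records, key_index):
    """Sum sale amounts grouped by the field at key_index (0 = region, 1 = quarter)."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)


by_region = roll_up(transactions, 0)
by_quarter = roll_up(transactions, 1)
print(by_region)   # {'East': 400.0, 'West': 150.0}
print(by_quarter)  # {'Q1': 250.0, 'Q2': 300.0}
```

A real BI system would run the same kind of aggregation against a data warehouse rather than an in-memory list.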
Data mining. This is a process in which data are analyzed from different perspectives and then turned into summary data that are deemed useful. Data mining is normally used with data at rest or with archival data. Data mining techniques focus on modeling and knowledge discovery for predictive, rather than purely descriptive, purposes, which makes data mining an ideal process for uncovering new patterns in large data sets.
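Pattern discovery of this kind can be sketched with a simplified frequent-itemset count, the first step of association-rule mining algorithms such as Apriori. The Python that follows is restricted to item pairs and uses hypothetical market-basket data; the frequent_pairs helper is illustrative, and a production algorithm would prune candidates and extend to larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket data: each set is one customer transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support transactions."""
    counts = Counter()
    for items in transactions:
        # sorted() gives each pair a canonical order, e.g. ('beer', 'diapers').
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    # Keep only pairs whose co-occurrence count meets the support threshold.
    return {pair: n for pair, n in counts.items() if n >= min_support}


print(frequent_pairs(baskets, min_support=3))
```

With a support threshold of 3, the sketch reports the four pairs that co-occur in at least three of the five baskets, such as ('beer', 'diapers'), a pattern that no single transaction record states on its own.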
Statistical applications. These look at data using algorithms based on statistical principles and normally concentrate on data sets related to polls, censuses, and other static data sets. Statistical applications ideally deliver sample observations that can be used to study population data sets for the purpose of estimating, testing, and predictive analysis. Empirical data, such as surveys and experimental reporting, are the primary sources for analyzable information.
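As a sketch of how sample observations can be used to estimate a property of a larger population, the Python that follows draws a random sample from a simulated population and estimates the population mean with a normal-approximation confidence interval. The simulated data and the estimate_mean helper are illustrative assumptions. The z = 1.96 multiplier gives roughly 95 percent coverage for large samples, while small samples would call for a t-distribution instead.

```python
import math
import random
import statistics


def estimate_mean(sample, z=1.96):
    """Estimate a population mean from a sample, with a normal-approximation
    confidence interval (z = 1.96 gives roughly 95% coverage)."""
    mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    std_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, (mean - z * std_error, mean + z * std_error)


# Hypothetical population: 100,000 simulated survey scores (mean 50, sd 10).
rng = random.Random(42)
population = [rng.gauss(50.0, 10.0) for _ in range(100_000)]

# Study the population through a random sample of 1,000 observations.
sample = rng.sample(population, 1_000)
mean, (low, high) = estimate_mean(sample)
print(f"sample mean {mean:.2f}, 95% CI ({low:.2f}, {high:.2f})")
print(f"true population mean {statistics.mean(population):.2f}")
```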
Predictive analysis. This is a subset of statistical applications in which data sets are examined to come up with predictions, based on trends and information gleaned from databases. Predictive analysis tends to be big in the financial and scientific
Nevertheless, venturing into the world of Hadoop is not a plug-and-play experience; there are certain prerequisites, hardware requirements, and configuration chores that must be met to ensure success. The first step