
Big Data Analytics: Turning Big Data into Big Money

By Frank Ohlhorst
Copyright 2013 by John Wiley & Sons, Inc.

CHAPTER

The Evolution
of Big Data

To truly understand the implications of Big Data analytics, one has to reach back into the annals of computing history, specifically business intelligence (BI) and scientific computing. The ideology behind Big Data can most likely be traced back to the days before the age of computers, when unstructured data were the norm (paper records) and analytics was in its infancy. Perhaps the first Big Data challenge came in the form of the 1880 U.S. census, when the information concerning approximately 50 million people had to be gathered, classified, and reported on.
With the 1880 census, just counting people did not give the U.S. government enough information to work with; particular elements, such as age, sex, occupation, education level, and even the number of insane people in a household, had to be accounted for. That information had intrinsic value to the process, but only if it could be tallied, tabulated, analyzed, and presented. New methods of relating the data to other data collected came into being, such as associating occupations with geographic areas, birth rates with education levels, and countries of origin with skill sets.
The 1880 census truly yielded a mountain of data to deal with, yet only severely limited technology was available to do any of the analytics. The problem of Big Data could not be solved for the 1880 census, so it took over seven years to manually tabulate and report on the data.


c08 22 October 2012; 17:59:52



With the 1890 census, things began to change, thanks to the introduction of the first Big Data platform: a mechanical device called the Hollerith Tabulating System, which worked with punch cards that could hold about 80 variables. The Hollerith Tabulating System revolutionized the value of census data, making it actionable and increasing its value an untold amount. Analysis now took six weeks instead of seven years. That allowed the government to act on information in a reasonable amount of time.
The census example points out a common theme with data analytics: Value can be derived only by analyzing data in a time frame in which action can still be taken to utilize the information uncovered. For the U.S. government, the ability to analyze the 1890 census led to an improved understanding of the populace, which the government could use to shape economic and social policies ranging from taxation to education to military conscription.
In today's world, the information contained in the 1890 census would no longer be considered Big Data, according to the definition: data sets so large that common technology cannot accommodate and process them. Today's desktop computers certainly have enough horsepower to process the information contained in the 1890 census by using a simple relational database and some basic code.
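As a rough illustration of what "a simple relational database and some basic code" could look like, the following minimal sketch tallies people by state and occupation with Python's built-in sqlite3 module. The table layout, column names, and sample rows are invented for illustration; they are assumptions and are not taken from the actual 1890 census schedules.

```python
import sqlite3

# Hypothetical, made-up records: (age, sex, occupation, state).
SAMPLE_PEOPLE = [
    (34, "M", "farmer", "Ohio"),
    (29, "F", "teacher", "Ohio"),
    (41, "M", "farmer", "Iowa"),
    (52, "F", "farmer", "Iowa"),
    (19, "M", "clerk", "Ohio"),
]


def tally_occupations_by_state(rows):
    """Load rows into an in-memory table and count people per (state, occupation)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE person (age INTEGER, sex TEXT, occupation TEXT, state TEXT)"
        )
        conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT state, occupation, COUNT(*) FROM person "
            "GROUP BY state, occupation ORDER BY state, occupation"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for state, occupation, count in tally_occupations_by_state(SAMPLE_PEOPLE):
        print(state, occupation, count)
```

The GROUP BY query does in one statement the kind of cross-tabulation (occupation against geographic area) that had to be done by hand in 1880.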
That realization transforms what Big Data is all about. Big Data involves having more data than you can handle with the computing power you already have, and you cannot easily scale your current computing environment to address the data. The definition of Big Data therefore continues to evolve with time and advances in technology. Big Data will always remain a paradigm shift in the making.
That said, the momentum behind Big Data continues to be driven by the realization that large unstructured data sources, such as those from the 1890 census, can deliver almost immeasurable value. The next giant leap for Big Data analytics came with the Manhattan Project, the U.S. development of the atomic bomb during World War II. The Manhattan Project not only introduced the concept of Big Data analysis with computers, it was also the catalyst for "Big Science," which in turn depends on Big Data analytics for success. The next largest Big Science project began in the late 1950s with the launch of the U.S. space program.


As the term Big Science gained currency in the 1960s, the Manhattan Project and the space program became paradigmatic examples. However, the International Geophysical Year, an international scientific project that lasted from July 1, 1957, to December 31, 1958, provided scientists with an alternative model: a synoptic collection of observational data on a global scale.
This new, potentially complementary model of Big Science encompassed multiple fields of practice and relied heavily on the sharing of large data sets that spanned multiple disciplines. The change in data gathering techniques, analysis, and collaboration also helped to redefine how Big Science projects are planned and accomplished. Most important, the International Geophysical Year project laid the foundation for more ambitious projects that gathered more specialized data for specific analysis, such as the International Biological Program and later the Long-Term Ecological Research Network. Both increased the mountains of data gathered, incorporated newer analysis technologies, and pushed IT technology further into the spotlight.
The International Biological Program encountered difficulties when the institutional structures, research methodologies, and data management implied by the Big Science mode of research collided with the epistemic goals, practices, and assumptions of many of the scientists involved. By 1974, when the program ended, many participants viewed it as a failure.
Nevertheless, what many viewed as a failure really was a success. The program transformed the way data were collected, shared, and analyzed and redefined how IT can be used for data analysis. Historical analysis suggests that many of the original incentives of the program (such as the emphasis on Big Data and the implementation of the organizational structure of Big Science) were in fact realized by the program's visionaries and its immediate investigators. Even though the program failed to follow the exact model of the International Geophysical Year, it ultimately succeeded in providing a renewed legitimacy for synoptic data collection.
The lessons learned from the birth of Big Science spawned new Big Data projects: weather prediction, physics research (supercollider data analytics), astronomy images (planet detection), medical research (drug interaction), and many others. Of course, Big Data doesn't apply only to science; businesses have latched onto its techniques, methodologies, and objectives, too. This has allowed the businesses to uncover value in data that might previously have been overlooked.

BIG DATA: THE MODERN ERA

Big Science may have led to the birth of Big Data, but it was Big Business that brought Big Data through its adolescence into the modern era. Big Science and Big Business differ on many levels, of course, especially in analytics. Big Science uses Big Data to answer questions or prove theories, while Big Business uses Big Data to discover new opportunities, measure efficiencies, or uncover relationships among what was thought to be unrelated data sets.
Nonetheless, both use algorithms to mine data, and both need technologies that can work with mountains of data. But the similarities end there. Big Science gathers data based on experiments and research conducted in controlled environments. Big Business gathers data from sources that are transactional in nature and often has little control over the origin of the data.
For Big Business, and businesses of almost any size, there is an avalanche of data available that is increasing exponentially. Perhaps Google CEO Eric Schmidt said it best: "Every two days now we create as much information as we did from the dawn of civilization up until 2003. That's something like five exabytes of data." An exabyte is an incredibly large, almost unimaginable amount of information: 10 to the 18th power bytes. Think of an exabyte as the number 1 followed by 18 zeros.
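To put the scale in more familiar terms, the short sketch below converts between decimal storage units. The unit table and the helper name convert are illustrative assumptions, and the figures use the decimal (powers of ten) definitions rather than binary ones.

```python
# Decimal (SI) storage units expressed in bytes.
UNITS = {
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
}


def convert(amount, from_unit, to_unit):
    """Convert an amount between two of the units listed in UNITS."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


if __name__ == "__main__":
    # Five exabytes expressed in terabytes.
    print(convert(5, "exabyte", "terabyte"))
```

Under these definitions, five exabytes is five million terabytes.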
It is that massive amount of exponentially growing data that defines the future of Big Data. Once again, we may need to look at the scientific community to determine where Big Data is headed for the business world. Farnam Jahanian, the assistant director for computer and information science and engineering for the National Science Foundation (NSF), kicked off a May 1, 2012, briefing about Big Data on Capitol Hill by calling data "a transformative new currency for science, engineering, education, and commerce." That briefing, which was organized by TechAmerica, brought together a panel of leaders from government and industry to discuss the opportunities for innovation arising from the collection, storage, analysis, and visualization of large, heterogeneous data sets, all the while taking into consideration the significant security and privacy implications.
Jahanian noted that Big Data is characterized not only by the enormous volume of data but also by the diversity and heterogeneity of the data and the velocity of its generation, the result of modern experimental methods, longitudinal observational studies, scientific instruments such as telescopes and particle accelerators, Internet transactions, and the widespread deployment of sensors all around us. In doing so, he set the stage for why Big Data is important to all facets of the IT discovery and innovation ecosystem, including the nation's academic, government, industrial, entrepreneurial, and investment communities.
Jahanian further explained the implications of the modern era of Big Data with three specific points:

First, insights and more accurate predictions from large and complex collections of data have important implications for the economy. Access to information is transforming traditional businesses and is creating opportunities in new markets. Big Data is driving the creation of new IT products and services based on business intelligence and data analytics and is boosting the productivity of firms that use it to make better decisions and identify new business trends.
Second, advances in Big Data are critical to accelerate the pace of discovery in almost every science and engineering discipline. From new insights about protein structure, biomedical research and clinical decision making, and climate modeling to new ways to mitigate and respond to natural disasters and new strategies for effective learning and education, there are enormous opportunities for data-driven discovery.
Third, Big Data also has the potential to solve some of the nation's most pressing challenges in science, education, environment and sustainability, medicine, commerce, and cyber and national security, with enormous societal benefit and laying the foundations for U.S. competitiveness for many decades to come.

Jahanian shared the President's Council of Advisors on Science and Technology's recent recommendation for the federal government to increase R&D investments for collecting, storing, preserving, managing, analyzing, and sharing increased quantities of data, because "the potential to gain new insights [by moving] from data to knowledge to action has tremendous potential to transform all areas of national priority."
Partly in response to this recommendation, the White House Office of Science and Technology Policy, together with other agencies, announced a $200 million Big Data R&D initiative to advance core techniques and technologies. According to Jahanian, within this initiative, the NSF's strategy for supporting the fundamental science and underlying infrastructure enabling Big Data science and engineering involves the following:

• Advances in foundational techniques and technologies (i.e., new methods) to derive knowledge from data.
• Cyberinfrastructure to manage, curate, and serve data to science and engineering research and education communities.
• New approaches to education and workforce development.
• Nurturance of new types of collaborations (multidisciplinary teams and communities enabled by new data access policies) to make advances in the grand challenges of the computation- and data-intensive world today.

Ultimately, Jahanian concluded, "realizing the enormous potential of Big Data requires a long-term, bold, sustainable, and comprehensive approach, not only by NSF but also throughout the government and our nation's research institutions."
The panel discussions that followed echoed many of Jahanian's remarks. For example, Nuala O'Connor Kelly, the senior counsel for information governance and chief privacy leader at General Electric (GE), said, "For us, it's the volume and velocity and variety of data [and the opportunity that's presented for using] that data to achieve new results for the company and for our customers and clients [throughout the world]." She cited as an example that GE Healthcare collects and monitors maintenance data from its machines deployed worldwide and can automatically ship replacement parts just days in advance of their malfunctioning, based on the analytics of machine functionality. "Much of [this] is done remotely and at tremendous cost savings," she said.
Caron Kogan, the strategic planning director at Lockheed Martin, and Flavio Villanustre, the vice president of technology at LexisNexis Risk Solutions, described similar pursuits within their companies, particularly in intelligence and fraud prevention, respectively.
GE's Kelly touched on privacy aspects. "Control may no longer be about not having the data at all," she pointed out. "A potentially more efficient solution is one of making sure there are appropriate controls technologically and processes and policies and laws in place and then ensuring appropriate enforcement." She emphasized striking the right balance between policies that ensure the protection of individuals and those that enable technological innovation and economic growth.
Bill Perlowitz, the chief technology officer in Wyle Laboratories' science, technology, and engineering group, referenced a paradigm shift in scientific exploration:

Before, if you had an application or software, you had value; now that value is going to be in the data. For scientists that represents a shift from [hypothesis-driven] science to data-driven research. Hypothesis-driven science limits your exploration to what you can imagine, and the human mind . . . can only go so far. Data-driven science allows us to collect data and then see what it tells us, and we don't have a pretense that we may understand what those relationships are and what we may find. So for a research scientist, these kinds of changes are very exciting and something we've been trying to get to for some time now.

Perhaps Nick Combs, the federal chief technology officer at EMC Corporation, summed it up best when describing the unprecedented growth in data: "It's [no longer about finding a] needle in a haystack or connecting the dots. That's child's play."
What all of this means is that the value of Big Data and the transformation of the ideologies and technologies are already here. The government and scientific communities are preparing themselves for the next evolution of Big Data and are planning how to address the new challenges and figure out better ways to leverage the data.


TODAY, TOMORROW, AND THE NEXT DAY

As the amount of data gathered grows exponentially, so does the evolution of the technology used to process the data. According to the International Data Corporation, the volume of digital content in the world will grow to 2.7 billion terabytes in 2012, up 48 percent from 2011, and will reach 8 billion terabytes by 2015. That will be a lot of data!
The flood of data is coming from both structured corporate databases and unstructured data from Web pages, blogs, social networking messages, and other sources. Currently, for example, there are countless digital sensors worldwide in industrial equipment, automobiles, electrical meters, and shipping crates. Those sensors can measure and communicate location, movement, vibration, temperature, humidity, and even chemical changes in the air. Today, Big Business wields data like a weapon. Giant retailers, such as Walmart and Kohl's, analyze sales, pricing, economic, demographic, and weather data to tailor product selections at particular stores and determine the timing of price markdowns.
Logistics companies like United Parcel Service mine data on truck delivery times and traffic patterns to fine-tune routing. A whole ecosystem of new businesses and technologies is springing up to engage with this new reality: companies that store data, companies that mine data for insight, and companies that aggregate data to make them manageable. However, it is an ecosystem that is still emerging, and its exact shape has yet to make itself clear.
Even though Big Data has been around for some time, one of the biggest challenges of working with it still remains, and that is assembling data and preparing them for analysis. Different systems store data in different formats, even within the same company. Assembling, standardizing, and cleaning data of irregularities, all without removing the information that makes them valuable, remain a central challenge.
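A small sketch can make the assembly problem concrete. It assumes two hypothetical systems inside one company (a billing system and a CRM) that store the same facts under different field names and date formats. All names, fields, and sample values here are invented for illustration.

```python
from datetime import datetime

# Hypothetical records from two systems inside the same company.
BILLING_ROWS = [{"CustomerName": "  Ada Lovelace ", "SignupDate": "03/15/2012"}]
CRM_ROWS = [{"name": "GRACE HOPPER", "signup": "2012-04-01"}]


def normalize_name(raw):
    """Trim whitespace, collapse internal runs of spaces, and title-case."""
    return " ".join(raw.split()).title()


def parse_date(raw):
    """Try each known date format in turn and return an ISO 8601 date string."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


def standardize(billing_rows, crm_rows):
    """Map both source layouts onto one common schema: name and signup_date."""
    unified = []
    for row in billing_rows:
        unified.append({"name": normalize_name(row["CustomerName"]),
                        "signup_date": parse_date(row["SignupDate"])})
    for row in crm_rows:
        unified.append({"name": normalize_name(row["name"]),
                        "signup_date": parse_date(row["signup"])})
    return unified


if __name__ == "__main__":
    for record in standardize(BILLING_ROWS, CRM_ROWS):
        print(record)
```

Even this toy version shows the tension the text describes: a cleaning rule such as title-casing makes records comparable, but it can also destroy information (a name like "McDonald" would be altered), so each rule has to be weighed against what it removes.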
Currently, Hadoop, an open source software framework derived from Google's MapReduce and Google File System papers, is being used by several technology vendors to do just that. Hadoop maps tasks across a cluster of machines, splitting them into smaller subtasks, before reducing the results into one master calculation. It's really an old grid-computing technique given new life in the age of cloud computing. Many of the challenges of yesterday remain today, and technology is just now catching up with the demands of Big Data analytics. However, Big Data remains a moving target.
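The map-and-reduce pattern described above can be sketched in a few lines of plain Python. This is not Hadoop's API and it runs in a single process; it only illustrates the shape of the technique, using a word count as the assumed example task. The function names are hypothetical.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(lines, chunk_size):
    """Split the input into smaller subtasks of at most chunk_size lines."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Map step: count words in one chunk, independently of all other chunks."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def word_count(lines, chunk_size=2):
    """Run the map step over every chunk, then fold the partial counts together."""
    chunks = split_into_chunks(lines, chunk_size)
    # On a real cluster, each chunk would be handed to a different machine.
    partials = map(map_chunk, chunks)
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    print(word_count(["big data", "big science", "big business data"]))
```

Because each chunk is processed without reference to the others, the map step is the part that can be spread across machines; only the final merge needs to see all of the partial results.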
As the future brings more challenges, it will also deliver more solutions, and Big Data has a bright future, with tomorrow delivering the technologies that ease leveraging the data. For example, Hadoop is converging with other technology advances such as high-speed data analysis, made possible by parallel computing, in-memory processing, and lower-cost flash memory in the form of solid-state drives.
The prospect of being able to process troves of data very quickly, in-memory, without time-consuming forays to retrieve information stored on disk drives, will be a major enabler, and this will allow companies to assemble, sort, and analyze data much more rapidly. For example, T-Mobile is using SAP's HANA to mine data on its 30 million U.S. customers from stores, text messages, and call centers to tailor personalized deals.
What used to take T-Mobile a week to accomplish can now be done in three hours with the SAP system. Organizations that can utilize this capability to make faster and more informed business decisions will have a distinct advantage over competitors. In a short period of time, Hadoop has transitioned from relative obscurity as a consumer Internet project into the mainstream consciousness of enterprise IT.
Hadoop is designed to handle mountains of unstructured data. However, as it exists, the open source code is a long way from meeting enterprise requirements for security, management, and efficiency without some serious customization. Enterprise-scale Hadoop deployments require costly IT specialists who are capable of guiding a lot of somewhat disjointed processes. That currently limits adoption to organizations with substantial IT budgets.
As tomorrow delivers refined platforms, Hadoop and its derivatives will start to fit into the enterprise as a complement to existing data analytics and data warehousing tools, available from established business process vendors, such as Oracle, HP, and SAP. The key will be to make Hadoop much more accessible to enterprises of all sizes, which can be accomplished by creating high availability platforms that take much of the complexity out of assembling and preparing huge amounts of data for analysis.
Aggregating multiple steps into a streamlined automated process with significantly enhanced security will prove to be the catalyst that drives Big Data from today to tomorrow. Add those enhancements to new technologies, such as appliances, and the momentum should continue to pick up, thanks to easy management through user-friendly GUIs.
The true value of Big Data lies in the amount of useful data that can be derived from it. The future of Big Data is therefore to do for data and analytics what Moore's Law has done for computing hardware and exponentially increase the speed and value of business intelligence. Whether the need is to link geography and retail availability, use patient data to forecast public health trends, or analyze global climate trends, we live in a world full of data. Effectively harnessing Big Data will give businesses a whole new lens through which to see it.
However, the advance of Big Data technology doesn't stop with tomorrow. Beyond tomorrow probably holds surprises that no one has even imagined yet. As technology marches ahead, so will the usefulness of Big Data. A case in point is IBM's Watson, an artificial intelligence computer system capable of answering questions posed in natural language. In 2011, as a test of its abilities, Watson competed on the quiz show Jeopardy!, in the show's only human-versus-machine match to date. In a two-game, combined-point match, broadcast in three episodes aired February 14-16, Watson beat Brad Rutter, the biggest all-time money winner on Jeopardy!, and Ken Jennings, the record holder for the longest championship streak (74 wins).
Watson had access to 200 million pages of structured and unstructured content consuming four terabytes of disk storage, including the full text of Wikipedia, but was not connected to the Internet during the game. Watson demonstrated that there are new ways to deal with Big Data and new ways to measure results, perhaps exemplifying where Big Data may be headed.
So what's next for Watson? IBM has stated publicly that Watson was a client-driven initiative, and the company intends to push Watson in directions that best serve customer needs. IBM is now working with financial giant Citi to explore how the Watson technology could improve and simplify the banking experience. Watson's applicability doesn't end with banking, however; IBM has also teamed up with health insurer WellPoint to turn Watson into a machine that can support the doctors of the world.
According to IBM, Watson is best suited for use cases involving critical decision making based on large volumes of unstructured data. To drive the Big Data-crunching message home, IBM has stated that 90 percent of the world's data was created in the last two years, and 80 percent of that data is unstructured. Furthering the value proposition of Watson and Big Data, IBM has also stated that five new research documents come out of Wall Street every minute, and medical information is doubling every five years.
IBM views the future of Big Data a little differently than other vendors do, most likely based on its Watson research. In IBM's future, Watson becomes a service (as IBM calls it, Watson-as-a-Service), which will be delivered as a private or hybrid cloud service.
Watson aside, the health care industry seems ripe as a source of prediction for how Big Data will evolve. Examples abound for the benefits of Big Data and the medical field; however, getting there is another story altogether. Health care (or in this context, "Big Medicine") has some specific challenges to overcome and some specific goals to achieve to realize the potential of Big Data:

• Big Medicine is drowning in information while also dying of thirst. For those in the medical profession, that axiom can be summed up with a situation that most medical personnel face: When you're in the institution and you're trying to figure out what's going on and how to report on something, you're dying of thirst in a sea of information. There is a tremendous amount of information, so much so that it becomes a Big Data problem. How does one tap into that information and make sense of it? The answer has implications not only for patients but also for the service providers, ranging from nurses, physicians, and hospital administrators, even to government and insurance agencies. The big issue is that the data are not organized; they are a mixture of structured and unstructured data. How the data will ultimately be handled over the next few years will be driven by the government, which will require a tremendous amount of information to be recorded for reporting purposes.
• Technologies that tap into Big Data need to become more prevalent and even ubiquitous. From the patient's perspective, analytics and Big Data will aid in determining which hospital in a patient's immediate area is the best for treating his or her condition. Today there are a huge number of choices available, and most people choose by word of mouth, insurance requirements, doctor recommendations, and many other factors. Wouldn't it make more sense to pick a facility based on report cards derived by analytics? That is the goal of the government, which wants patients to be able to look at a report card for various institutions. However, the only way to create that report card is to unlock all of the information and impose regulations and reporting. That will require various types of IT to tap into unstructured information, like dashboard technologies and analytics, business intelligence technologies, clinical intelligence technologies, and revenue cycle management intelligence for institutions.
• Decision support needs to be easier to access. Currently in medical institutions, evidence-based medicine and decision support are not as easy to access as they should be. Utilizing Big Data analytics will make the decision process easier and will provide the hard evidence to validate a particular decision path. For example, when a patient is suffering from a particular condition, there's a high potential that something is going to happen to that patient because of his or her history. The likely outcomes or progressions can be brought up at the beginning of the care cycle, and the treating physician can be informed immediately. Information like that and much more will come from the Big Data analytics process.
• Information needs to flow more easily. From a patient's perspective, health care today limits information. Patients often have little perspective on what exactly is happening, at least until a physician comes in. However, the majority of patients are apprehensive about talking to the physician. That becomes an informational blockade for both the physician and the patient and creates a situation in which it becomes more difficult for both physicians and patients to make choices. Big Data has the potential to solve that problem as well; the flow of information will be easier not only for physicians to manage but also for patients to access. For example, physicians will be able to look on their tablets or smartphones and see there is a 15-minute emergency-room wait over here and a 5-minute wait over there. Scheduling, diagnostic support, and evidence-based medicine support in the workflow will improve.
• Quality of care needs to be increased while driving costs down. From a cost perspective and a quality-of-care point of view, there are a number of different areas that can be improved by Big Data. For example, if a patient experiences an injury while staying in a hospital, the hospital will not be reimbursed for his or her care. The system can see that this has the potential to happen and can alert everyone. Big Data can enable a proactive approach for care that reduces accidents or other problems that affect the quality of care. By preventing problems and accidents, Big Data can yield significant savings.
The physician-patient relationship needs to improve. Thanks to
social media and mobile applications, which are benefiting from
Big Data techniques, it is becoming easier to research health
issues and allow patients and physicians to communicate more
frequently. Stored data and unstructured data can be analyzed
against social data to identify health trends. That information
can then be used by hospitals to keep patients healthier and out
of the facility. In the past, hospitals made more money the sicker
a patient was and the longer they kept him or her there.
However, with health care reform, hospitals are going to start
being compensated for keeping patients healthier. Because of
that, there will be an explosion of mobile applications and even
social media, allowing patients to have easier access to nurses
and physicians. Health care is undergoing a transformation in
which the focus is more on keeping patients healthy and driving
down costs. These two major areas are going to drive a great
deal of change, and a lot of evolution will take place from a
health information technology point of view, all underpinned
by the availability of data.

Health care proves that Big Data has definite value and will
arguably be the leader in Big Data developments. However, the lessons
learned by the health care industry can readily be applied to other
business models, because Big Data is all about knowing how to utilize
and analyze data to fit specific needs.

CHANGING ALGORITHMS

Just as data evolve and force the evolution of Big Data platforms, the
very basic elements of analytics evolve as well. Most approaches to
dealing with large data sets within a classification learning paradigm
attempt to increase computational efficiency. Given the same amount
of time, a more efficient algorithm can explore more of the hypothesis
space than a less efficient algorithm. If the hypothesis space contains
an optimal solution, a more efficient algorithm has a greater chance of
finding that solution (assuming the hypothesis space cannot be
exhaustively searched within a reasonable time). It is that desire for
efficiency and speed that is forcing the evolution of algorithms and the
supporting systems that run them.
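
The point about efficiency and the hypothesis space can be made concrete
with a small sketch. Everything in it is an assumption chosen for
illustration: the hypotheses are simple threshold rules, the data are
synthetic, and an evaluation budget stands in for "the same amount of
time." The faster learner simply gets through more candidates, so it can
never do worse than the slower one on the same search order.

```python
import random

def accuracy(threshold, data):
    """Fraction of (x, label) pairs that the rule `x >= threshold` gets right."""
    return sum((x >= threshold) == label for x, label in data) / len(data)

def budgeted_search(candidates, data, budget):
    """Evaluate at most `budget` candidate hypotheses; return the best one found."""
    best = max(candidates[:budget], key=lambda t: accuracy(t, data))
    return best, accuracy(best, data)

rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_THRESHOLD = 637     # hypothetical optimum hidden in the hypothesis space
data = [(x, x >= TRUE_THRESHOLD) for x in range(0, 1000, 5)]

# Hypothesis space: 1,000 possible thresholds, visited in one shared random order.
candidates = list(range(1000))
rng.shuffle(candidates)

# Same time budget, different cost per hypothesis: the "fast" learner
# gets through ten times as many candidates as the "slow" one.
slow_best, slow_acc = budgeted_search(candidates, data, budget=20)
fast_best, fast_acc = budgeted_search(candidates, data, budget=200)

print(f"slow learner: threshold={slow_best}, accuracy={slow_acc:.3f}")
print(f"fast learner: threshold={fast_best}, accuracy={fast_acc:.3f}")
```

Because the fast learner's candidates include every candidate the slow
learner saw, its best accuracy is at least as high; whether it actually
reaches the optimum still depends on how much of the space the budget
covers.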
However, a more efficient algorithm results in more searching or a
faster search, not a better search. If the learning biases of the algorithm
are inappropriate, an increase in computational efficiency may not
equal an improvement in prediction performance. Therein lies the
problem: More efficient algorithms normally do not lead to additional
insights, just improved performance. However, improving the
performance of a Big Data analytics platform increases the amount of data
that can be analyzed, and that may lead to new insights.
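
A minimal sketch of the learning-bias point follows, again under
assumptions made up for illustration: the target pattern is a band in the
middle of a numeric range, one learner is restricted to single-threshold
rules, and the other is allowed interval rules. The first learner searches
its entire hypothesis space, so no amount of extra speed could help it.

```python
def accuracy(rule, data):
    """Fraction of (x, label) pairs on which `rule(x)` matches the label."""
    return sum(rule(x) == label for x, label in data) / len(data)

# Hypothetical target concept: the label is True only inside a middle band.
data = [(x, 400 <= x < 600) for x in range(1000)]

# Learner A is biased toward single-threshold rules of the form `x >= t`.
# The search below is exhaustive, so it is as "efficient" as a search can be.
threshold_rules = [(lambda x, t=t: x >= t) for t in range(1001)]
best_threshold_acc = max(accuracy(rule, data) for rule in threshold_rules)

# Learner B's bias admits interval rules, so one hypothesis fits the band exactly.
interval_acc = accuracy(lambda x: 400 <= x < 600, data)

print(f"best threshold rule: {best_threshold_acc:.3f}")
print(f"interval rule:       {interval_acc:.3f}")
```

The threshold learner falls short of the interval learner even after
checking every rule available to it, which is the sense in which more
searching is not the same as better searching.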
The trick here is to create new algorithms that are more flexible,
that incorporate machine learning techniques, and that remove the
bias from analysis. Computer systems are now becoming powerful
and subtle enough to help reduce human biases from our decision
making. And this is the key: Computers can do it in real time. That will
inevitably transform the objective observer concept into an organic,
evolving database.
Today these systems can chew through billions of bits of data,
analyze them via self-learning algorithms, and package the insights for
immediate use. Neither we nor the computers are perfect, but in
tandem we might neutralize our biased, intuitive failings when we
price a car, prescribe a medicine, or deploy a sales force.
In the real world, accurate algorithms will translate to fewer
hunches and more facts. Take, for example, the banking and mortgage
market, where even the most knowledgeable human can quickly be
outdone by an algorithm. Big Data systems are now of such scale that
they can analyze the value of tens of thousands of mortgage-backed
securities by picking apart the ongoing, dynamic creditworthiness of
tens of millions of individual home owners. Such a system has already
been built for Wall Street traders.
By crunching billions of data points about traffic flows, an
algorithm might find that on Fridays a delivery fleet should stick to the
highways, despite the gut instinct of a dispatcher for surface road
shortcuts.
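
The delivery-fleet example can be sketched at toy scale. The trip log
below is made up purely for illustration, and the function name
`best_route_by_day` is hypothetical; a real system would work from
billions of records and far richer features, but the core step of letting
averaged observations override a hunch looks roughly like this:

```python
from collections import defaultdict
from statistics import mean

# Illustrative, made-up trip log: (weekday, route type, travel time in minutes).
trips = [
    ("Fri", "highway", 42), ("Fri", "highway", 45),
    ("Fri", "surface", 58), ("Fri", "surface", 61),
    ("Tue", "highway", 44), ("Tue", "highway", 46),
    ("Tue", "surface", 39), ("Tue", "surface", 41),
]

def best_route_by_day(records):
    """Return {weekday: route type with the lowest mean travel time}."""
    times = defaultdict(list)
    for day, route, minutes in records:
        times[(day, route)].append(minutes)

    averages = defaultdict(dict)
    for (day, route), values in times.items():
        averages[day][route] = mean(values)

    # For each weekday, pick the route type whose average time is smallest.
    return {day: min(routes, key=routes.get) for day, routes in averages.items()}

recommendation = best_route_by_day(trips)
print(recommendation)
```

With this toy log the recommendation is the highway on Fridays and surface
roads on Tuesdays, whatever the dispatcher's instinct might be.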
Big Data is at an evolutionary juncture where human judgment
can be improved or even replaced by machines. That may sound
ominous, but the same systems are already predicting hurricanes,
warning of earthquakes, and mapping tornadoes.
Businesses are seeing the value, and the systems and algorithms
are starting to supplement human judgment and are even on a path to
replace it, in some cases. Until recently, however, businesses have
been thwarted by the cost of storage, slower processing speeds, and the
flood of the data themselves, spread sloppily across scores of different
databases inside one company.
With technology and pricing points now solving those problems,
the evolution of algorithms and Big Data platforms is bound to
accelerate and change the very way we do predictive analysis, research,
and even business.
