
Big Data Analytics: Turning Big Data into Big Money

By Frank Ohlhorst
Copyright 2013 by John Wiley & Sons, Inc.

CHAPTER

Why Big Data Matters

Knowing what Big Data is and knowing its value are two different things. Even with an understanding of Big Data analytics, the value of the information can still be difficult to visualize. At first glance, the well of structured, unstructured, and semistructured data seems almost unfathomable, with each bucket drawn being little more than a mishmash of unrelated data elements.
Finding what matters and why it matters is one of the first steps in drinking from the well of Big Data and the key to avoid drowning in information. However, this question still remains: Why does Big Data matter? It seems difficult to answer for small and medium businesses, especially those that have shunned business intelligence solutions in the past and have come to rely on other methods to develop their markets and meet their goals.
For the enterprise market, Big Data analytics has proven its value, and examples abound. Companies such as Facebook, Amazon, and Google have come to rely on Big Data analytics as part of their primary marketing schemes as well as a means of servicing their customers better.
For example, Amazon has leveraged its Big Data well to create an extremely accurate representation of what products a customer should buy. Amazon accomplishes that by storing each customer's searches and purchases and almost any other piece of information available, and then applying algorithms to that information to compare one customer's information with all of the other customers' information.

c02 22 October 2012; 17:53:9

Amazon has learned the key trick of extracting value from a large data well and has applied performance and depth to a massive amount of data to determine what is important and what is extraneous. The company has successfully captured the data exhaust that any customer or potential customer has left behind to build an innovative recommendation and marketing data element.
The results are real and measurable, and they offer a practical advantage for a customer. Take, for example, a customer buying a jacket in a snowy region. Why not suggest purchasing gloves to match, or boots, as well as a snow shovel, an ice melt, and tire chains? For an in-store salesperson, those recommendations may come naturally; for Amazon, Big Data analytics is able to interpret trends and bring understanding to the purchasing process by simply looking at what customers are buying, where they are buying it, and what they have purchased in the past. Those data, combined with other public data such as census, meteorological, and even social networking data, create a unique capability that services the customer and Amazon as well.
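The idea of comparing one customer's purchases with everyone else's can be illustrated with a minimal sketch. This is not Amazon's actual method, which the text does not describe; it is a simple overlap-based recommender, and the customer names, items, and the `recommend` function are hypothetical.

```python
from collections import Counter

# Hypothetical purchase histories: customer id -> set of items bought.
PURCHASES = {
    "alice": {"jacket", "gloves", "boots"},
    "bob": {"jacket", "gloves", "snow shovel"},
    "carol": {"jacket", "tire chains", "gloves"},
    "dave": {"sandals", "sunscreen"},
}


def recommend(customer, purchases, top_n=3):
    """Suggest items bought by customers whose purchases overlap with this one's."""
    own = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(own & items)  # shared items act as a crude similarity score
        if overlap == 0:
            continue  # ignore customers with nothing in common
        for item in items - own:
            scores[item] += overlap
    # Sort by score (descending), then by name, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]
```

Under these assumptions, a new customer who has bought only a jacket would be shown gloves first, because every other jacket buyer in the sample also bought gloves. A production system would weigh far more signals, such as location, searches, and timing.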
Much the same can be said for Facebook, where Big Data comes into play for critical features such as friend suggestions, targeted ads, and other member-focused offerings. Facebook is able to accumulate information by using analytics that leverage pattern recognition, data mash-ups, and several other data sources, such as a user's preferences, history, and current activity. Those data are mined, along with the data from all of the other users, to create focused recommendations, which are reported to be quite accurate for the majority of users.

BIG DATA REACHES DEEP

Google leverages the Big Data model as well, and it is one of the originators of the software elements that make Big Data possible. However, Google's approach and focus is somewhat different from that of companies like Facebook and Amazon. Google aims to use Big Data to its fullest extent, to judge search results, predict Internet traffic usage, and service customers with Google's own applications. From the advertising perspective, Web searches can be tied to products that fit into the criteria of the search by delving into a vast mine of Web search information, user preferences, cookies, histories, and so on.
Of course, Amazon, Google, and Facebook are huge enterprises and have access to petabytes of data for analytics. However, they are not the only examples of how Big Data has affected business processes. Examples abound from the scientific, medical, and engineering communities, where huge amounts of data are gathered through experimentation, observation, and case studies. For example, the Large Hadron Collider at CERN can generate one petabyte of data per second, giving new meaning to the concept of Big Data. CERN relies on those data to determine the results of experiments using complex algorithms and analytics that can take significant amounts of time and processing power to complete.
Many pharmaceutical and medical research firms are in the same category as CERN, as well as organizations that research earthquakes, weather, and global climates. All benefit from the concept of Big Data. However, where does that leave small and medium businesses? How can these entities benefit from Big Data analytics? These businesses do not typically generate petabytes of data or deal with tremendous volumes of uncategorized data, or do they?
For small and medium businesses (SMB), Big Data analytics can deliver value for multiple business segments. That is a relatively recent development within the Big Data analytics market. Small and medium businesses have access to scores of publicly available data, including most of the Web and social networking sites. Several hosted services have also come into being that can offer the computing power, storage, and platforms for analytics, changing the Big Data analytics market into a pay-as-you-go, consume-what-you-need entity. This proves to be very affordable for the SMB market and allows those businesses to take it slow and experiment with what Big Data analytics can deliver.

OBSTACLES REMAIN

With the barriers of data volume and costs somewhat eliminated, there are still significant obstacles for SMB entities to leverage Big Data. Those obstacles include the purity of the data, analytical knowledge, an understanding of statistics, and several other philosophical and educational challenges. It all comes down to analyzing the data not just because they are there but for a specific business purpose.
For SMBs looking to gain experience in analytics, the first place to turn to is the Web, namely, for analyzing web site traffic. Here an SMB can use a tool like Blekko (http://www.blekko.com) to look at traffic distribution to a web site. This information can be very valuable for SMBs that rely on a company web site to disseminate marketing information, sell items, or communicate with current and potential customers. Blekko fits the Big Data paradigm because it looks at multiple large data sets and creates visual results that have meaningful, actionable information. Using Blekko, a small business can quickly gather statistics about its web site and compare it with a competitor's web site.
Although Blekko may be one of the simplest examples of Big Data analytics, it does illustrate the point that even in its simplest form, Big Data analytics can benefit SMBs, just as it can benefit large enterprises. Of course, other tools exist, and new ones are coming to market all of the time. As those tools mature and become accessible to the SMB market, more opportunities will arise for SMBs to leverage the Big Data concept.
Gathering the data is usually half the battle in the analytics game. SMBs can search the Web with tools like 80Legs, Extractiv, and Needlebase, all of which offer capabilities for gathering data from the Web. The data can include social networking information, sales lists, real estate listings, product lists, and product reviews and can be gathered into structured storage and then analyzed. The gathered data prove to be a valuable resource for businesses that look to analytics to enhance their market standings.
Big Data, whether done in-house or on a hosted offering, provides value to businesses of any size, from the smallest business looking to find its place in its market to the largest enterprise looking to identify the next worldwide trend. It all comes down to discovering and leveraging the data in an intelligent fashion.
The amount of data in our world has been exploding, and analyzing large data sets is already becoming a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus. Business leaders in every sector are going to have to deal with the implications of Big Data, either directly or indirectly.
Furthermore, the increasing volume and detail of information acquired by businesses and government agencies, paired with the rise of multimedia, social media, instant messaging, e-mail, and other Internet-enabled technologies, will fuel exponential growth in data for the foreseeable future. Some of that growth can be attributed to increased compliance requirements, but a key factor in the increase in data volumes is the increasingly sensor-enabled and instrumented world. Examples include RFID tags, vehicles equipped with GPS sensors, low-cost remote sensing devices, instrumented business processes, and instrumented web site interactions.
The question may soon arise of whether Big Data is too big, leading to a situation in which determining value may prove more difficult. This will evolve into an argument for the quality of the data over the quantity. Nevertheless, it will be almost impossible to deal with ever-growing data sources if businesses don't prepare to deal with the management of data head-on.

DATA CONTINUE TO EVOLVE

Before 2010, managing data was a relatively simple chore: Online transaction processing systems supported the enterprise's business processes, operational data stores accumulated the business transactions to support operational reporting, and enterprise data warehouses accumulated and transformed business transactions to support both operational and strategic decision making.
The typical enterprise now experiences a data growth rate of 40 to 60 percent annually, which in turn increases financial burdens and data management complexity. This situation implies that the data themselves are becoming less valuable and more of a liability for many businesses, or a low-commodity element.
Nothing could be further from the truth. More data mean more value, and countless companies have proved that axiom with Big Data analytics. To exemplify that value, one needs to look no further than at how vertical markets are leveraging Big Data analytics, which leads to a disruptive change.


For example, smaller retailers are collecting click-stream data from web site interactions and loyalty card data from traditional retailing operations. This point-of-sale information has traditionally been used by retailers for shopping basket analysis and stock replenishment, but many retailers are now going one step further and mining the data for a customer buying analysis. Those retailers are then sharing those data (after normalization and identity scrubbing) with suppliers and warehouses to bring added efficiency to the supply chain.
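Shopping basket analysis, mentioned above, can be sketched in a few lines. The example below is only an illustration under assumed data: the baskets and the function names `pair_support` and `confidence` are hypothetical, and real retail systems work on far larger transaction logs with more refined measures.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets; each set is one transaction.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical (alphabetical) ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Estimate how often the consequent appears in baskets holding the antecedent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)
```

With this sample, bread and milk appear together in half of the baskets, and every basket that contains butter also contains bread. Pairs with high support and confidence are natural candidates for joint promotion or coordinated stock replenishment.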
Another example of finding value comes from the world of science, where large-scale experiments create massive amounts of data for analysis. Big science is now paired with Big Data. There are far-reaching implications in how big science is working with Big Data; it is helping to redefine how data are stored, mined, and analyzed. Large-scale experiments are generating more data than can be held at a lab's data center (e.g., the Large Hadron Collider at CERN generates over 15 petabytes of data per year), which in turn requires that the data be immediately transferred to other laboratories for processing, a true model of distributed analysis and processing.
Other scientific quests are prime examples of Big Data in action, fueling a disruptive change in how experiments are performed and data interpreted. Thanks to Big Data methodologies, continental-scale experiments have become both politically and technologically feasible (e.g., the Ocean Observatories Initiative, the National Ecological Observatory Network, and USArray, a continental-scale seismic observatory).
Much of the disruption is fed by improved instrument and sensor technology; for instance, the Large Synoptic Survey Telescope has a 3.2-gigapixel camera and generates over 6 petabytes of image data per year. It is the platform of Big Data that is making such lofty goals attainable.
The validation of Big Data analytics can be illustrated by advances in science. The biomedical corporation Bioinformatics recently announced that it has reduced the time it takes to sequence a genome from years to days, and it has also reduced the cost, so it will be feasible to sequence an individual's genome for $1,000, paving the way for improved diagnostics and personalized medicine.
The financial sector has seen how Big Data and its associated analytics can have a disruptive impact on business. Financial services firms are seeing larger volumes through smaller trading sizes, increased market volatility, and technological improvements in automated and algorithmic trading.

DATA AND DATA ANALYSIS ARE GETTING MORE COMPLEX

One of the surprising outcomes of the Big Data paradigm is the shift of where the value can be found in the data. In the past, there was an inherent hypothesis that the bulk of value could be found in structured data, which usually constitute about 20 percent of the total data stored. The other 80 percent of data is unstructured in nature and was often viewed as having limited or little value.
That perception began to change once the successes of search engine providers and e-retailers showed otherwise. It was the analysis of that unstructured data that led to click-stream analytics (for e-retailers) and search engine predictions that launched much of the Big Data movement. The first examples of the successful processing of large volumes of unstructured data led other industries to take note, which in turn has led to enterprises mining and analyzing structured and unstructured data in conjunction to look for competitive advantages.
Unstructured data bring complexity to the analytics process. Technologies such as image processing for face recognition, search engine classification of videos, and complex data integration during geospatial processing are becoming the norm in processing unstructured data. Add to that the need to support traditional transaction-based analysis (e.g., financial performance), and it becomes easy to see complexity growing exponentially. Moreover, other capabilities are becoming a requirement, such as web click-stream data driving behavioral analysis.
Behavioral analytics is a process that determines patterns of behavior from human-to-human and human-to-system interaction data. It requires large volumes of data to build an accurate model. The behavioral patterns can provide insight into which series of actions led to an event (e.g., a customer sale or a product switch). Once these patterns have been determined, they can be used in transaction processing to influence a customer's decision.
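One simple way to look for the series of actions that led to an event is to count the action sequences that immediately precede it. The sketch below assumes hypothetical session logs and hypothetical function names (`patterns_before`, `conversion_rate`); it stands in for the much richer models a real behavioral analytics system would build.

```python
from collections import Counter

# Hypothetical interaction logs: one ordered list of actions per session.
SESSIONS = [
    ["search", "view", "review", "purchase"],
    ["search", "view", "leave"],
    ["view", "review", "purchase"],
    ["search", "view", "review", "leave"],
]


def patterns_before(sessions, event, length=2):
    """Count the action sequences of a given length that immediately precede an event."""
    counts = Counter()
    for actions in sessions:
        for i, action in enumerate(actions):
            if action == event and i >= length:
                counts[tuple(actions[i - length:i])] += 1
    return counts


def conversion_rate(sessions, pattern, event):
    """Fraction of occurrences of a pattern that are followed directly by the event."""
    n = len(pattern)
    seen = followed = 0
    for actions in sessions:
        for i in range(len(actions) - n + 1):
            if tuple(actions[i:i + n]) == tuple(pattern):
                seen += 1
                if i + n < len(actions) and actions[i + n] == event:
                    followed += 1
    return followed / seen if seen else 0.0
```

In this sample, viewing a product and then reading a review precedes both purchases, and two of the three sessions containing that pattern end in a sale. With only four sessions the estimate is not reliable, which is consistent with the point that accurate behavioral models need large volumes of data.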


While models of transactional data analytics are well understood and much of the value is realized from structured data, it is the value found in behavioral analytics that allows the creation of a more predictive model. Behavioral interactions are less understood, and they require large volumes of data to build accurate models. This is another case where more data equal more value; this is backed by research that suggests that a sophisticated algorithm with little data is less accurate than a simple algorithm with a large amount of data. Evidence of this can be found in the algorithms used for voice and handwriting recognition and crowd sourcing.

THE FUTURE IS NOW

New developments for processing unstructured data are arriving on the scene almost daily, with one of the latest and most significant coming from the social networking site Twitter. Making sense of its massive database of unstructured data was a huge problem, so huge, in fact, that it purchased another company just to help it find the value in its massive data store. The success of Twitter revolves around how well the company can leverage the data that its users generate. This amounts to a great deal of unstructured information from the more than 200 million accounts the site hosts, which generates 230 million Twitter messages a day.
To address the problem, the social networking giant purchased BackType, the developer of Storm, a software product that can parse live data streams such as those created by the millions of Twitter feeds. Twitter has released the source code of Storm, making it available to others who want to pursue the technology. Twitter is not interested in commercializing Storm.
Storm has proved its value for Twitter, which can now perform analytics in real time and identify trends and emerging topics as they develop. For example, Twitter uses the software to calculate how widely Web addresses are shared by multiple Twitter users in real time.
With the capabilities offered by Storm, a company can process Big Data in real time and garner knowledge that leads to a competitive advantage. For example, calculating the reach of a Web address could take up to 10 minutes using a single machine. However, with a Storm cluster, that workload can be spread out to dozens of machines, and a result can be discovered in just seconds. For companies that make money from emerging trends (e.g., ad agencies, financial services, and Internet marketers), that faster processing can be crucial.
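The reach calculation described above lends itself to a split-and-merge pattern, which the following sketch illustrates. It does not use Storm or its API; a local thread pool stands in for a cluster, reach is assumed to mean the number of distinct followers of the accounts that shared the address, and the data and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: which accounts shared a URL, and who follows each account.
SHARED_BY = {"http://example.com/a": ["u1", "u2", "u3"]}
FOLLOWERS = {
    "u1": {"f1", "f2"},
    "u2": {"f2", "f3"},
    "u3": {"f3", "f4", "f5"},
}


def partial_reach(accounts):
    """Collect the distinct followers for one partition of accounts."""
    followers = set()
    for account in accounts:
        followers |= FOLLOWERS.get(account, set())
    return followers


def reach(url, workers=2):
    """Count distinct followers exposed to a URL, splitting the work across workers."""
    accounts = SHARED_BY.get(url, [])
    # Round-robin partitioning; each slice could run on a separate machine.
    partitions = [accounts[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_reach, partitions))
    # Merge step: the union removes followers counted by more than one partition.
    return len(set().union(*partials))
```

Here `reach("http://example.com/a")` counts five distinct followers even though the three accounts have seven follower entries between them. The partitions are independent, so adding workers shortens the lookup phase, while the final union is the only step that needs all partial results in one place.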
Like Twitter, many organizations are discovering that they have access to a great deal of data, and those data, in all forms, could be transformed into information that can improve efficiencies, maximize profits, and unveil new trends. The trick is to organize and analyze the data quickly enough, a process that can now be accomplished using open source technologies and lumped under the heading of Big Data.
Other examples abound of how unstructured, semistructured, and structured Big Data stores are providing value to business segments. Take, for example, the online shopping service LivingSocial, which leverages technologies such as the Apache Hadoop data processing platform to garner information about what its users want.
The process has allowed LivingSocial to offer predictive analysis in real time, which better services its customer base. The company is not alone in its quest for squeezing the most value out of its unstructured data. Other major shopping sites, shopping comparison sites, and online versions of brick-and-mortar stores have also implemented technologies to bring real-time analytics to the forefront of customer interaction.
However, in that highly competitive market, finding new ways to interpret the data and process them faster is proving to be the critical competitive advantage and is driving Big Data analytics forward with new innovations and processes. Those enterprises and many others learned that data in all forms cannot be considered a commodity item, and just as with gold, it is through mining that one finds the nuggets of value that can affect the bottom line.
