
Big Data Analytics: Turning Big Data into Big Money

By Frank Ohlhorst
Copyright 2013 by John Wiley & Sons, Inc.

CHAPTER 3

Big Data and the Business Case

Big Data is quickly becoming more than just a buzzword. A plethora of organizations have made significant investments in the technology that surrounds Big Data and are currently starting to leverage the content within to find real value.
Even so, there is still a great deal of confusion about Big Data, similar to what many information technology (IT) managers have experienced in the past with disruptive technologies. Big Data is disruptive in the way that it changes how business intelligence (BI) is used in a business, and that is a scary proposition for many senior executives.
That situation puts chief technology officers, chief information officers, and IT managers in the unenviable position of trying to prove that a disruptive technology will actually improve business operations. Further complicating this situation is the high cost associated with in-house Big Data processing, as well as the security concerns that surround the processing of Big Data analytics off-site.
Perhaps some of the strife comes from the term Big Data itself. Nontechnical people may think of Big Data literally, as something associated with big problems and big costs. Presenting Big Data as Big Analytics instead may be the way to win over apprehensive decision makers while building a business case for the staff, technology, and results that Big Data relies upon.


c03 22 October 2012; 17:55:52


22 BI G DATA ANAL YTI CS

The trick is to move beyond the accepted definition of Big Data, which implies that it is nothing more than data sets that have become too large to manage with traditional tools, and explain that Big Data is a combination of technologies that mines the value of large databases. And large is the key word here, simply because massive amounts of data are being collected every second, more than ever before, and the size of these data is greater than can be practically managed by today's strategies and technologies.
That has created a revolution in which Big Data has become centered on the tsunami of data and how it will change the execution of business processes. These changes include introducing greater efficiencies, building new processes for revenue discovery, and fueling innovation. Big Data has quickly grown from a new buzzword being tossed around technology circles into a practical definition of what it is really all about: Big Analytics.

REALIZING VALUE

A number of industries, including health care, the public sector, retail, and manufacturing, can obviously benefit from analyzing their rapidly growing mounds of data. Collecting and analyzing transactional data gives organizations more insight into their customers' preferences, so the data can then be used as a basis for the creation of products and services. This allows organizations to remedy emerging problems in a timely and more competitive manner.
The use of Big Data analytics is thus becoming a key foundation for competition and growth for individual firms, and it will most likely underpin new waves of productivity, growth, and consumer surplus.

THE CASE FOR BIG DATA

Building an effective business case for a Big Data project involves identifying several key elements that can be tied directly to a business process and are easy to understand as well as quantify. These elements are knowledge discovery, actionable information, short-term and long-term benefits, the resolution of pain points, and several others that are aligned with making a business process better by providing insight.

In most instances, Big Data is a disruptive element when introduced into an enterprise, and this disruption includes issues of scale, storage, and data center design. The disruption normally involves costs associated with hardware, software, staff, and support, all of which affect the bottom line. That means that return on investment (ROI) and total cost of ownership (TCO) are key elements of a Big Data business plan. The trick is to accelerate ROI while reducing TCO. The simplest way to do this is to associate a Big Data business plan with other IT projects driven by business needs.
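To make the TCO point concrete, here is a minimal sketch with entirely hypothetical figures. It assumes TCO is simply the sum of annualized hardware, software, staff, and support costs over a planning horizon, and that hardware bought for another initiative (a compliance archive, say) can be split between projects by an agreed share. The `AnnualCosts` type and `hardware_share` parameter are illustrative names, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass
class AnnualCosts:
    """Hypothetical annualized costs for a Big Data project, in dollars."""
    hardware: float
    software: float
    staff: float
    support: float


def total_cost_of_ownership(costs: AnnualCosts, years: int, hardware_share: float = 1.0) -> float:
    """Sum every cost category over `years`.

    `hardware_share` is the fraction of the hardware bill charged to the Big Data
    project; values below 1.0 model storage and processing shared with another
    IT initiative.
    """
    if not 0.0 <= hardware_share <= 1.0:
        raise ValueError("hardware_share must be between 0 and 1")
    per_year = costs.hardware * hardware_share + costs.software + costs.staff + costs.support
    return per_year * years


# Placeholder numbers only; they do not describe any real deployment.
costs = AnnualCosts(hardware=200_000, software=50_000, staff=300_000, support=40_000)
standalone = total_cost_of_ownership(costs, years=3)
shared = total_cost_of_ownership(costs, years=3, hardware_share=0.4)
print(f"Standalone TCO: ${standalone:,.0f}")
print(f"Shared-infrastructure TCO: ${shared:,.0f}")
```

Sharing lowers TCO only if the shared storage and processing genuinely meet both projects' needs; the share itself is a negotiated assumption, not a measured value.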
While that might sound like a real challenge, businesses are actually investing in storage technologies and improved processing to meet other business goals, such as compliance, data archiving, cloud initiatives, and continuity planning. These initiatives can provide the foundation for a Big Data project, thanks to the two primary needs of Big Data: storage and processing.
Lately the natural growth of business IT solutions has been focused on processes that take on a distributed nature, in which storage and applications are spread out over multiple systems and locations. This also proves to be a natural companion to Big Data, further helping to lay the foundation for Big Analytics.
Building a business case involves using case scenarios and providing supporting information. An extensive supply of examples exists, with several draft business cases, case scenarios, and other collateral, all courtesy of the major vendors involved with Big Data solutions. Notable vendors with massive amounts of collateral include IBM, Oracle, and HP.
While there is no set formula for building a business case, there are some critical elements that can be used to define how a business case should look, which helps to ensure the success of a Big Data project.
A solid business case for Big Data analytics should include the following:

Background. This includes the drivers of the project, how others are using Big Data, what business processes Big Data will align with, and the overall goal of implementing the project.

Benefit analysis. It is often difficult to quantify the benefits of Big Data in static, tangible terms. Big Data analytics is all about the
interpretation of data and the visualization of patterns, which amounts to a subjective analysis, highly dependent on humans to translate the results. However, that does not prevent a business case from including benefits driven by Big Data in nonsubjective terms (e.g., identifying sales trends, locating possible inventory shrinkage, quantifying shipping delays, or measuring customer satisfaction). The trick is to align the benefits of the project with the needs of a business process or requirement. An example of that would be to identify a business goal, such as 5 percent annual growth, and then show how Big Data analytics can help to achieve that goal.

Options. There are several paths to take to the destination of Big Data, ranging from in-house big iron solutions (data centers running large mainframe systems) to hosted offerings in the cloud to a hybrid of the two. It is important to research these options and identify how each may work for achieving Big Data analytics, as well as the pros and cons of each. Preferences and benefits should also be highlighted, allowing a financial decision to be tied to a technological decision.

Scope and costs. Scope is more of a management issue than a physical deployment issue. It all comes down to how the implementation scope affects the resources, especially personnel and staff. Scope questions should identify the who and the when of the project, in which personnel hours and technical expertise are defined, as well as the training and ancillary elements. Costs should also be associated with staffing and training issues, which helps to create the big picture for TCO calculations and provides the basis for accurate ROI calculations.

Risk analysis. Calculating risk can be a complex endeavor. However, since Big Data analytics is truly a business process that provides BI, risk calculations can include the cost of doing nothing compared to the benefits delivered by the technology. Other risks to consider are security implications (where the data live and who can access it), CPU overhead (whether the analytics will limit the processing power available for line-of-business applications), compatibility and integration issues
(whether the installation and operation will work with the existing technology), and disruption of business processes (installation creates downtime). All of these elements can be considered risks with a large-scale project and should be accounted for to build a solid business case.
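The benefit and risk items above can be tied to numbers. The sketch below, using hypothetical figures throughout, turns the 5 percent annual growth goal into year-by-year revenue targets and estimates the cost of doing nothing as the revenue shortfall if the business keeps growing at an assumed 2 percent baseline. The baseline rate, revenue, and project cost are placeholders, and the function names are illustrative.

```python
def revenue_targets(base_revenue: float, growth_rate: float, years: int) -> list[float]:
    """Revenue needed in each future year to sustain compound growth at `growth_rate`."""
    return [base_revenue * (1 + growth_rate) ** year for year in range(1, years + 1)]


def cumulative_shortfall(base_revenue: float, target_rate: float, baseline_rate: float, years: int) -> float:
    """Total revenue missed over `years` if growth stays at `baseline_rate` instead of `target_rate`."""
    targets = revenue_targets(base_revenue, target_rate, years)
    baseline = revenue_targets(base_revenue, baseline_rate, years)
    return sum(t - b for t, b in zip(targets, baseline))


# Hypothetical inputs: $10M revenue today, a 5 percent growth goal,
# 2 percent growth if nothing changes, and a $1.2M three-year project cost.
BASE = 10_000_000
gap = cumulative_shortfall(BASE, target_rate=0.05, baseline_rate=0.02, years=3)
project_cost = 1_200_000
print(f"Year-by-year targets: {[round(t) for t in revenue_targets(BASE, 0.05, 3)]}")
print(f"Cost of doing nothing (3-year revenue shortfall): ${gap:,.0f}")
print(f"Shortfall exceeds project cost: {gap > project_cost}")
```

This compares revenue, not profit, and it assumes analytics is what closes the gap; a real business case would discount both figures by the risks listed above.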

Of course, the most critical theme of a business case is ROI. The return, or benefit, that an organization is likely to receive in relation to the cost of the project is a ratio that can change as more research is done and information is gathered while building a business case. Ideally, that ratio improves as more research is done and the business case writers discover additional value from the implementation of a Big Data analytics solution. Either way, ROI is usually the most important factor in determining whether a project will ultimately go forward. The determination of ROI has become one of the primary reasons that companies and nonprofit organizations engage in the business case process in the first place.
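ROI is commonly expressed as net benefit divided by cost. As a minimal sketch, with hypothetical benefit estimates for three rounds of research against a fixed cost, the ratio moves as additional value is discovered:

```python
def roi(total_benefit: float, total_cost: float) -> float:
    """Return on investment as a fraction: (benefit - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


# Hypothetical estimates refined over successive rounds of business case research.
cost = 1_500_000
benefit_estimates = {
    "initial draft": 1_800_000,
    "after interviews with business units": 2_100_000,
    "after pilot results": 2_700_000,
}
for stage, benefit in benefit_estimates.items():
    print(f"{stage}: ROI = {roi(benefit, cost):.0%}")
```

Cost estimates usually move as well, so in practice both the numerator and the denominator should be revisited at each round.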

THE RISE OF BIG DATA OPTIONS

Teradata, IBM, HP, Oracle, and many other companies have been offering terabyte-scale data warehouses for more than a decade, but those offerings were tuned for processes in which data warehousing was the primary goal. Today, data tend to be collected and stored in a wider variety of formats and can include structured, semistructured, and unstructured elements, each of which tends to have different storage and management requirements. For Big Data analytics, data must be able to be processed in parallel across multiple servers. This is a necessity, given the amounts of information being analyzed.
In addition to having exhaustively maintained transactional data from databases and carefully culled data residing in data warehouses, organizations are collecting untold amounts of log data from servers and other forms of machine-generated data, customer comments from internal and external social networks, and other sources of loose, unstructured data.
Such data sets are growing at an exponential rate, thanks to Moore's Law. Moore's Law states that the number of transistors that
can be placed on an integrated circuit doubles approximately every two years (a period often quoted as 18 months). Each new generation of processors is roughly twice as powerful as its most recent predecessor. Similarly, the power of new servers roughly doubles on the same schedule, which means their activities will generate correspondingly larger data sets.
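The doubling rule compounds quickly. Written as N(t) = N0 × 2^(t/T), where T is the doubling period in months, a minimal sketch compares the two commonly quoted periods over six years. Treating data volume as growing in proportion to server power is the chapter's premise, not something this arithmetic proves.

```python
def growth_factor(months: float, doubling_period_months: float = 24.0) -> float:
    """Multiplicative growth after `months` if capacity doubles every `doubling_period_months`."""
    return 2 ** (months / doubling_period_months)


for period in (18, 24):
    print(f"Doubling every {period} months -> x{growth_factor(72, period):.0f} after six years")
```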
The Big Data approach represents a major shift in how data are handled. In the past, carefully culled data were piped through the network to a data warehouse, where they could be further examined. However, as the volume of data increases, the network becomes a bottleneck. That is the kind of situation in which a distributed platform, such as Hadoop, comes into play. Distributed systems allow the analysis to occur where the data reside.
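Moving the analysis to the data is the idea behind Hadoop's MapReduce model. The sketch below is a pure-Python simulation, not Hadoop's API: each hypothetical node tallies section clicks in its own local log partition (the map step), and only those small per-node tallies cross the "network" to be merged (the reduce step). Node names and the `<user> <section>` log format are assumptions for illustration.

```python
from collections import Counter

# Hypothetical click logs ("<user> <section>") already stored on three separate nodes.
node_partitions = {
    "node-a": ["u1 sports", "u2 news", "u3 sports"],
    "node-b": ["u4 finance", "u1 sports"],
    "node-c": ["u2 news", "u5 news", "u3 finance"],
}


def map_local(lines: list[str]) -> Counter:
    """Runs on the node that holds `lines`; returns only a compact per-section tally."""
    return Counter(line.split()[1] for line in lines)


def reduce_counts(partials: list[Counter]) -> Counter:
    """Merges the per-node tallies, which are far smaller than the raw logs."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


partials = [map_local(lines) for lines in node_partitions.values()]
totals = reduce_counts(partials)
print(totals)
```

In a real Hadoop cluster the framework handles scheduling, shuffling, and node failures; the point here is only that the raw log lines never leave the node that stores them.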
Traditional data systems are not able to handle Big Data effectively, either because those systems are not designed to handle the variety of today's data, which tend to have much less structure, or because the data systems cannot scale quickly and affordably. Big Data analytics works very differently from traditional BI, which normally relies on a clean subset of user data placed in a data warehouse to be queried in a limited number of predetermined ways.
Big Data takes a very different approach, in which all of the data an organization generates are gathered and interacted with. That allows administrators and analysts to worry about how to use the data later. In that sense, Big Data solutions prove to be more scalable than traditional databases and data warehouses.
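Gathering everything first and deciding how to use it later is often described as schema-on-read, in contrast with a warehouse's schema-on-write. A minimal sketch, assuming raw events are kept as JSON lines with no fixed format and structure is imposed only when a specific question is asked (all field names are hypothetical):

```python
import json

# Raw events stored as-is; records do not share a fixed set of fields.
raw_events = [
    '{"type": "purchase", "customer": "c1", "amount": 42.5}',
    '{"type": "comment", "customer": "c2", "text": "Shipping was slow"}',
    '{"type": "purchase", "customer": "c2", "amount": 19.0, "coupon": "SPRING"}',
]


def purchases_by_customer(lines: list[str]) -> dict[str, float]:
    """Apply a 'schema' at read time: keep only purchase records and the fields this question needs."""
    totals: dict[str, float] = {}
    for line in lines:
        event = json.loads(line)
        if event.get("type") != "purchase":
            continue  # other record types stay stored for future questions
        totals[event["customer"]] = totals.get(event["customer"], 0.0) + event["amount"]
    return totals


print(purchases_by_customer(raw_events))
```

The comment record and the extra `coupon` field are neither rejected nor lost; a later question can read them with a different "schema."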
To understand how the options around Big Data have evolved, one must go back to the birth of Hadoop and the dawn of the Big Data movement. Hadoop's roots can be traced back to a 2004 Google white paper describing MapReduce, the framework Google built to process data spread across many different servers (a 2003 paper described the underlying Google File System). Google kept its implementation for internal use, but Doug Cutting, a developer who had already created the Lucene open source search library, created an open source implementation of those designs, naming the technology Hadoop after his son's stuffed toy elephant.
One of Hadoop's first adopters was Yahoo, which dedicated large amounts of engineering work to refine the technology around 2006. Yahoo's primary challenge was to make sense of the vast amount of interesting data stored across separate systems. Unifying those data
and analyzing them as a whole became a critical goal for Yahoo, and Hadoop turned out to be an ideal platform to make that happen. Today Yahoo is one of the biggest users of Hadoop and has deployed it on more than 40,000 servers.
The company uses the technology for multiple business cases and analytics chores. Yahoo's Hadoop clusters hold massive log files of what stories and sections users click on; advertisement activity is also stored, as are lists of all of the content and articles Yahoo publishes. For Yahoo, Hadoop has proven to be well suited for searching for patterns in large sets of text.

BEYOND HADOOP

Another name to become familiar with in the Big Data realm is the Cassandra database, a technology that can store 2 million columns in a single row. That makes Cassandra ideal for appending more data onto existing user accounts without knowing ahead of time how the data should be formatted.
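To picture why wide rows help, here is a toy, in-memory model of a wide-column row. It is not Cassandra's API or storage engine; the `WideRowStore` class and its methods are hypothetical and only mimic the idea that each row key maps to an open-ended, name-sorted set of columns, so new kinds of data can be appended to a user's row without declaring a format first.

```python
from collections import defaultdict


class WideRowStore:
    """Toy wide-column store: row key -> {column name: value}, read back sorted by column name."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, str]] = defaultdict(dict)

    def put(self, row_key: str, column: str, value: str) -> None:
        # No schema check: any column name can be added to any row at any time.
        self._rows[row_key][column] = value

    def get_slice(self, row_key: str, start: str = "", end: str = "\uffff") -> list[tuple[str, str]]:
        """Return columns whose names fall in [start, end], sorted by name (like a column slice)."""
        row = self._rows.get(row_key, {})
        return sorted((c, v) for c, v in row.items() if start <= c <= end)


store = WideRowStore()
store.put("user:1001", "email", "pat@example.com")
# Later, a new kind of data is appended without altering any table definition.
store.put("user:1001", "login:2012-10-01T08:15", "web")
store.put("user:1001", "login:2012-10-02T19:40", "mobile")
print(store.get_slice("user:1001", start="login:", end="login:\uffff"))
```

Encoding a timestamp in the column name is one common way to keep appended events ordered within a row; it is shown here as a design illustration, not a recommendation for any specific schema.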
Cassandra's roots can also be traced to an online service provider, in this case Facebook, which needed a massive distributed database to power the service's inbox search. Like Yahoo, Facebook drew on a design Google had published: it wanted to use the Google Bigtable architecture, which could provide a column- and row-oriented database structure that could be spread across a large number of nodes.
However, Bigtable had a serious limitation: it used a master-node-oriented design. Bigtable depended on a single node to coordinate all read-and-write activities on all of the nodes. This meant that if the master node went down, the whole system would become unusable.
Cassandra's distribution model was based on an architecture called Dynamo, which the Amazon engineers who developed it described in a 2007 white paper. Amazon uses Dynamo to keep track of what its millions of online customers are putting in their shopping carts.
Dynamo gave Cassandra an advantage over Bigtable, since Dynamo is not dependent on any one master node. Any node can accept data for the whole system, as well as answer queries. Data are replicated on multiple hosts, creating resiliency and eliminating the single point of failure.
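The masterless, replicated design can be sketched with consistent hashing, the partitioning technique the Dynamo paper describes. In this toy version the node names, the replication factor of 3, and MD5 as the ring hash are assumptions for illustration. Any node can compute which nodes hold a key, each key is stored on the next three distinct nodes clockwise on the ring, and losing one of them leaves the other replicas at the front of the key's list.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a node name or key to a point on the hash ring."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with one position per node (no virtual nodes)."""

    def __init__(self, nodes: list[str], replicas: int = 3) -> None:
        if replicas > len(nodes):
            raise ValueError("cannot place more replicas than there are nodes")
        self.replicas = replicas
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def preference_list(self, key: str) -> list[str]:
        """The nodes responsible for `key`: the first `replicas` nodes clockwise from its position."""
        start = bisect.bisect_right(self._positions, ring_position(key))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(self.replicas)]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
ring = HashRing(nodes, replicas=3)
key = "cart:customer-42"
owners = ring.preference_list(key)
print(f"{key} is stored on {owners}")

# Simulate losing the first owner: the surviving replicas move to the front
# of the key's preference list, so reads can continue without any master node.
survivors = HashRing([n for n in nodes if n != owners[0]], replicas=3)
print(f"After {owners[0]} fails: {survivors.preference_list(key)}")
```

Production systems such as Dynamo and Cassandra add virtual nodes, hinted handoff, and tunable consistency on top of this basic placement rule.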

WITH CHOICE COME DECISIONS

Many of the tools first developed by online service providers are becoming more available for enterprises as open source software. These days, Big Data tools are being tested by a wider range of organizations, beyond the large online service providers. Financial institutions, telecommunications companies, government agencies, utility companies, retailers, and energy companies are all testing Big Data systems.
Naturally, more choices can make a decision harder, which is perhaps one of the biggest challenges associated with putting together a business plan that meets project needs while not introducing any additional uncertainty into the process. Ideally, a Big Data business plan will support both long-term strategic analysis and one-off transactional and behavioral analysis, delivering both immediate and long-term benefits.
While Hadoop is applicable to the majority of businesses, it is not the only game in town (at least when it comes to open source implementations). Once an organization has decided to leverage its heaps of machine-generated and social networking data, setting up the infrastructure will not be the biggest challenge. The biggest challenge may come from deciding whether to go it alone with an open source implementation or to turn to one of the commercial implementations of Big Data technology. Vendors such as Cloudera, Hortonworks, and MapR are commercializing Big Data technologies, making them easier to deploy and manage.
Add to that the growing crop of Big Data on-demand services from cloud service providers, and the decision process becomes that much more complex. Decision makers will have to invest in research and perform due diligence to select the proper platform and implementation methodology to make a business plan successful. However, most of that legwork can be done during the business plan development phase, when the pros and cons of the various Big Data methodologies can be weighed and then measured against the overall goals of the business plan. Which technology will get there the fastest, with the lowest cost, and without mortgaging future capabilities?
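One common way to weigh those pros and cons is a weighted scoring matrix built around the three questions above: speed, cost, and future flexibility. The sketch is hypothetical throughout; the weights and the 1-to-5 scores for an in-house open source deployment, a commercial distribution, and a cloud on-demand service are placeholders a team would replace with the results of its own due diligence.

```python
# Criteria weights must sum to 1.0; scores run from 1 (poor) to 5 (excellent).
WEIGHTS = {"time_to_value": 0.35, "cost": 0.40, "future_flexibility": 0.25}

# Placeholder scores only; they are not an assessment of any real product.
OPTIONS = {
    "in-house open source": {"time_to_value": 2, "cost": 4, "future_flexibility": 5},
    "commercial distribution": {"time_to_value": 4, "cost": 3, "future_flexibility": 4},
    "cloud on-demand service": {"time_to_value": 5, "cost": 3, "future_flexibility": 2},
}


def weighted_score(scores: dict[str, int], weights: dict[str, float]) -> float:
    """Sum of weight x score across all criteria."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


ranked = sorted(OPTIONS, key=lambda name: weighted_score(OPTIONS[name], WEIGHTS), reverse=True)
for name in ranked:
    print(f"{name}: {weighted_score(OPTIONS[name], WEIGHTS):.2f}")
```

The ranking is only as good as the weights; re-running the matrix with different weights is a cheap way to see how sensitive the choice is before it is written into the business plan.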
