Big Data Analytics: Turning Big Data into Big Money

By Frank Ohlhorst
Copyright 2013 by John Wiley & Sons, Inc.

CHAPTER

Best Practices for Big Data Analytics

Like any other technology or process, there obviously are best
practices that can be applied to the problems of Big Data. Best
practices usually arise from years of testing and measuring results,
giving them a solid foundation to build on. However, Big Data, as it
is applied today, is relatively new, short-circuiting the
tried-and-true methodology used in the past to derive best practices.
Nevertheless, best practices are presenting themselves at a fairly
accelerated rate, which means that we can still learn from the
mistakes and successes of others to define what works best and what
doesn't.

The evolutionary aspect of Big Data tends to affect best practices,
so what may be best today may not necessarily be best tomorrow.
That said, there are still some core proven techniques that can be
applied to Big Data analytics and that should withstand the test of
time. With new terms, new skill sets, new products, and new
providers, the world of Big Data analytics can seem unfamiliar, but
tried-and-true data management best practices do hold up well in this
still-emerging discipline.

As with any business intelligence (BI) and/or data warehouse
initiative, it is critical to have a clear understanding of an
organization's data management requirements and a well-defined
strategy before venturing too far down the Big Data analytics path.
Big Data analytics is widely hyped, and companies in all sectors are
being flooded with new data sources and ever larger amounts of
information. Yet making a big investment to attack the Big Data
problem without first figuring out how doing so can really add value
to the business is one of the most serious missteps for would-be
users.

c09 22 October 2012; 18:0:37

The trick is to start from a business perspective and not get too
hung up on the technology, which may entail mediating conversations
among the chief information officer (CIO), the data scientists, and
other businesspeople to identify what the business objectives are
and what value can be derived. Defining exactly what data are
available and mapping out how an organization can best leverage the
resources is a key part of that exercise.

CIOs, IT managers, and BI and data warehouse professionals need
to examine what data are being retained, aggregated, and utilized and
compare that with what data are being thrown away. It is also critical
to consider external data sources that are currently not being tapped
but that could be a compelling addition to the mix. Even if companies
aren't sure how and when they plan to jump into Big Data analytics,
there are benefits to going through this kind of evaluation sooner
rather than later.

Beginning the process of accumulating data also makes you better
prepared for the eventual leap to Big Data, even if you don't know
what you are going to use it for at the outset. The trick is to start
accumulating the information as soon as possible. Otherwise there
may be a missed opportunity because information may fall through
the cracks, and you may not have that rich history of information to
draw on when Big Data enters the picture.

START SMALL WITH BIG DATA

When analyzing Big Data, it makes sense to define small, high-value
opportunities and use those as a starting point. Ideally, those smaller
tasks will build the expertise needed to deal with the larger questions
an organization may have for the analytics process. As companies
expand the data sources and types of information they are looking
to analyze, and as they start to create the all-important analytical
models that can help them uncover patterns and correlations in both
structured and unstructured data, they need to be vigilant about
homing in on the findings that are most important to their stated
business objectives.

It is critical to avoid situations in which you end up with a process
that identifies new patterns and data relationships that offer little
value to the business process. That creates a dead spot in an analytics
matrix where patterns, though new, may not be relevant to the
questions being asked.

Successful Big Data projects tend to start with very targeted goals
and focus on smaller data sets. Only then can that success be built
upon to create a true Big Data analytics methodology that starts small
and grows after the practice has served the enterprise rather well,
allowing value to be created with little upfront investment while
preparing the company for the potential windfall of information that
can be derived from analytics.

That can be accomplished by starting with small bites (i.e., taking
individual data flows and migrating those into different systems for
converged processing). Over time, those small bites will turn into big
bites, and Big Data will be born. The ability to scale will prove
important: as data collection increases, the system will need to grow
to accommodate the data.

THINKING BIG

Leveraging open source Hadoop technologies and emerging packaged
analytics tools makes an open source environment more familiar to
business analysts trained in using SQL. Ultimately, scale will become
the primary factor when mapping out a Big Data analytics road map,
and business analysts will need to eschew the ways of SQL to grasp the
concept of distributed platforms that run on nodes and clusters.

It is critical to consider what the buildup will look like. That
planning can be done by determining how much data will need to be
gathered six months from now and calculating how many more servers
may be needed to handle it. You will also have to make sure that the
software is up to the task of scaling. One big mistake is to be
ignorant about the potential growth of the solution and the potential
popularity of the solution once it is rolled into production.
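
As a rough illustration of that buildup calculation, the following
Python sketch projects data volume six months out and estimates the
extra servers required. The function names, the compound monthly growth
model, the usable capacity per server, and the threefold replication
factor are all assumptions made for this example rather than figures
from any particular deployment; substitute your own measurements.

```python
import math


def projected_volume_tb(current_tb, monthly_growth_rate, months):
    """Compound the current data volume forward at a fixed monthly growth rate."""
    return current_tb * (1 + monthly_growth_rate) ** months


def additional_servers(current_tb, monthly_growth_rate, months,
                       usable_tb_per_server, replication_factor=3):
    """Estimate how many extra servers the projected growth would require.

    replication_factor is an assumed number of copies kept of each block
    of data; set it to 1 if the platform stores a single copy.
    """
    future_tb = projected_volume_tb(current_tb, monthly_growth_rate, months)
    extra_raw_tb = (future_tb - current_tb) * replication_factor
    # Round up: a partially filled server still has to be bought.
    return max(0, math.ceil(extra_raw_tb / usable_tb_per_server))
```

With 100 TB on hand, 10 percent monthly growth, and 20 TB of usable
storage per server, the sketch suggests planning for 12 additional
servers within six months. The estimate covers storage only; processing
and memory headroom would need a similar exercise.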

As analytics scales, data governance becomes increasingly important,
a situation that is no different with Big Data than it is with any other
large-scale network operation. The same can be said for information
governance practices, which are just as important today with Big Data
as they were yesterday with data warehousing. A critical caveat is to
remember that information is a corporate asset and should be treated
as such.

AVOIDING WORST PRACTICES

There are many potential reasons that Big Data analytics projects fall
short of their goals and expectations, and in some cases it is better to
know what not to do rather than knowing what to do. This leads us
to the idea of identifying worst practices, so that you can avoid
making the same mistakes that others have made in the past. It is
better to learn from the errors of others than to make your own. Some
worst practices to look out for are the following:

• Many organizations make the mistake of assuming that simply
  deploying a data warehousing or BI system will solve critical
  business problems and deliver value. However, IT as well as BI and
  analytics program managers get sold on the technology hype and
  forget that business value is their first priority; data analysis
  technology is just a tool used to generate that value. Instead of
  blindly adopting and deploying something, Big Data analytics
  proponents first need to determine the business purposes that
  would be served by the technology in order to establish a business
  case and only then choose and implement the right analytics tools
  for the job at hand. Without a solid understanding of business
  requirements, the danger is that project teams will end up creating
  a Big Data disk farm that really isn't worth anything to the
  organization, earning the teams an unwanted spot in the data
  doghouse.

• Building an analytics system, especially one involving Big Data,
  is complex and resource-intensive. As a result, many organizations
  hope the software they deploy will be a magic bullet that instantly
  does it all for them. People should know better, of course, but
  they still have hope. Software does help, sometimes dramatically.
  But Big Data analytics is only as good as the data being analyzed
  and the analytical skills of those using the tools.

• Insanity is often defined as repeating a task and expecting
  different results, and there is some modicum of insanity in the
  world of analytics. People forget that trying what has worked for
  them in the past, even when they are confronted with a different
  situation, leads to failure. In the case of Big Data, some
  organizations assume that big just means more transactions and
  large data volumes. It may, but many Big Data analytics initiatives
  involve unstructured and semistructured information that needs to
  be managed and analyzed in fundamentally different ways than is the
  case with the structured data in enterprise applications and data
  warehouses. As a result, new methods and tools might be required to
  capture, cleanse, store, integrate, and access at least some of
  your Big Data.

• Sometimes enterprises go to the other extreme and think that
  everything is different with Big Data and that they have to start
  from scratch. This mistake can be even more fatal to a Big Data
  analytics project's success than thinking that nothing is
  different. Just because the data you are looking to analyze are
  structured differently doesn't mean the fundamental laws of data
  management have been rewritten.

• A corollary to the misconception that the technology can do it
  all is the belief that all you need are IT staffers to implement
  Big Data analytics software. First, in keeping with the theme
  mentioned earlier of generating business value, an effective Big
  Data analytics program has to incorporate extensive business and
  industry knowledge into both the system design stage and ongoing
  operations. Second, many organizations underestimate the extent of
  the analytical skills that are needed. If Big Data analysis is only
  about building reports and dashboards, enterprises can probably
  just leverage their existing BI expertise. However, Big Data
  analytics typically involves more advanced processes, such as data
  mining and predictive analytics. That requires analytics
  professionals with statistical, actuarial, and other sophisticated
  skills, which might mean new hiring for organizations that are
  making their first forays into advanced analytics.

• Too often, companies measure the success of Big Data analytics
  programs merely by the fact that data are being collected and then
  analyzed. In reality, collecting and analyzing the data is just the
  beginning. Analytics only produces business value if it is
  incorporated into business processes, enabling business managers
  and users to act on the findings to improve organizational
  performance and results. To be truly effective, an analytics
  program also needs to include a feedback loop for communicating the
  success of actions taken as a result of analytical findings,
  followed by a refinement of the analytical models based on the
  business results.

• Many Big Data analytics projects fall into a big trap: Proponents
  oversell how fast they can deploy the systems and how significant
  the business benefits will be. Overpromising and underdelivering is
  the surest way to get the business to walk away from any
  technology, and it often sets back the use of the particular
  technology within an organization for a long time, even if many
  other enterprises are achieving success. In addition, when you set
  expectations that the benefits will come easily and quickly,
  business executives have a tendency to underestimate the required
  level of involvement and commitment. And when a sufficient resource
  commitment isn't there, the expected benefits usually don't come
  easily or quickly, and the project is labeled a failure.

BABY STEPS

It is said that every journey begins with the first step, and the
journey toward creating an effective Big Data analytics program holds
true to that axiom. However, it takes more than one step to reach a
destination of success. Organizations embarking on Big Data analytics
programs require a strong implementation plan to make sure that the
analytics process works for them. Choosing the technology that will be
used is only half the battle when preparing for a Big Data initiative.
Once a company identifies the right database software and analytics
tools and begins to put the technology infrastructure in place, it's
ready to move to the next level and develop a real strategy for
success.

The importance of effective project management processes to
creating a successful Big Data analytics program also cannot be
overstated. The following tips offer advice on steps that businesses
should take to help ensure a smooth deployment:

• By their very nature, Big Data analytics projects involve large
  data sets. But that doesn't mean that all of a company's data
  sources, or all of the information within a relevant data source,
  will need to be analyzed. Organizations need to identify the
  strategic data that will lead to valuable analytical insights. For
  instance, what combination of information can pinpoint key
  customer-retention factors? Or what data are required to uncover
  hidden patterns in stock market transactions? Focusing on a
  project's business goals in the planning stages can help an
  organization home in on the exact analytics that are required,
  after which it can and should look at the data needed to meet those
  business goals. In some cases, this will indeed mean including
  everything. In other cases, though, it means using only a subset of
  the Big Data on hand.

• Coping with complexity is the key aspect of most Big Data
  analytics initiatives. In order to get the right analytical
  outputs, it is essential to include business-focused data owners in
  the process to make sure that all of the necessary business rules
  are identified in advance. Once the rules are documented, technical
  staffers can assess how much complexity they create and the work
  required to turn the data inputs into relevant and valuable
  findings. That leads into the next phase of the implementation.

• Business rules are just the first step in developing effective
  Big Data analytics applications. Next, IT or analytics
  professionals need to create the analytical queries and algorithms
  required to generate the desired outputs. But that shouldn't be
  done in a vacuum. The better and more accurate that queries are in
  the first place, the less redevelopment will be required. Many
  projects require continual reiterations because of a lack of
  communication between the project team and business departments.
  Ongoing communication and collaboration lead to a much smoother
  analytics development process.

• A successful Big Data analytics initiative requires ongoing
  attention and updates in addition to the initial development work.
  Regular query maintenance and keeping on top of changes in business
  requirements are important, but they represent only one aspect of
  managing an analytics program. As data volumes continue to increase
  and business users become more familiar with the analytics process,
  more questions will inevitably arise. The analytics team must be
  able to keep up with the additional requests in a timely fashion.
  Also, one of the requirements when evaluating Big Data analytics
  hardware and software options is assessing their ability to support
  iterative development processes in dynamic business environments.
  An analytics system will retain its value over time if it can adapt
  to changing requirements.

• With interest growing in self-service BI capabilities, it
  shouldn't be shocking that a focus on end users is a key factor in
  Big Data analytics programs. Having a robust IT infrastructure that
  can handle large data sets and both structured and unstructured
  information is important, of course. But so is developing a system
  that is usable and easy to interact with, and doing so means taking
  the various needs of users into account. Different types of people
  (from senior executives to operational workers, business analysts,
  and statisticians) will be accessing Big Data analytics
  applications in one way or another, and their adoption of the tools
  will help to ensure overall project success. That requires
  different levels of interactivity that match user expectations and
  the amount of experience they have with analytics tools: for
  instance, building dashboards and data visualizations to present
  findings in an easy-to-understand way to business managers and
  workers who aren't inclined to run their own Big Data analytics
  queries.

There's no one way to ensure Big Data analytics success. But
following a set of frameworks and best practices, including the tips
outlined here, can help organizations to keep their Big Data
initiatives on track. The technical details of a Big Data installation
are quite intensive and need to be looked at and considered in an
in-depth manner. That isn't enough, though: Both the technical aspects
and the business factors must be taken into account to make sure that
organizations get the desired outcomes from their Big Data analytics
investments.

THE VALUE OF ANOMALIES

There are people who believe that anomalies are something best
ignored when processing Big Data, and they have created sophisticated
scrubbing programs to discard what is considered an anomaly. That
can be a sound practice when working with particular types of data,
since anomalies can color the results. However, there are times when
anomalies prove to be more valuable than the rest of the data in
a particular context. The lesson to be learned is "Don't discard data
without further analysis."

Take, for example, the world of high-end network security, where
encryption is the norm, access is logged, and data are examined in real
time. Here the ability to identify something that fits into the
uncharacteristic movement of data is of the utmost importance; in
other words, security problems are detected by looking at anomalies.
That idea can be applied to almost any discipline, ranging from
financial auditing to scientific inquiry to detecting cyber-threats,
all critical services that are based on identifying something out of
the ordinary.

In the world of Big Data, that something out of the ordinary may
constitute a single log entry out of millions, which, on its own, may
not be worth noticing. But when analyzed against traffic, access, and
data flow, that single entry may have untold value and can be a key
piece of forensic information. With computer security, seeking
anomalies makes a great deal of sense. Nevertheless, many data
scientists are reluctant to put much stock in anomalies for other
tasks.

Anomalies can actually be the harbingers of trends. Take online
shopping, for example, in which many buying trends start off as
isolated anomalies created by early adopters of products; these can
then grow into a fad and ultimately become a top product. That type
of information (early trending) can make or break a sales cycle.
Nowhere is this more true than on Wall Street, where anomalous
stock trades can set off all sorts of alarms and create frenzies, all
driven by the detection of a few small events uncovered in a pile of
Big Data.

Given a large enough data set, anomalies commonly appear. One of
the more interesting aspects of anomaly value comes from the realm
of social networking, where posts, tweets, and updates are thrown into
Big Data and then analyzed. Here businesses are looking at information
such as customer sentiment, using a horizontal approach to compare
anomalies across many different types of time series, the idea being
that different dimensions could share similar anomaly patterns.

Retail shopping is a good example of that. A group of people may do
grocery shopping relatively consistently throughout the year at
Safeway, Trader Joe's, or Whole Foods but then do holiday shopping at
Best Buy and Toys "R" Us, leading to the expected year-end increases.
A company like Apple might see a level pattern for most of the year,
but when a new iPhone is released, the customers dutifully line up
along with the rest of the world around that beautiful structure of
glass and steel.

This information translates to the proverbial needle in a haystack
that needs to be brought forth above other data elements. It is the
concept that for about 300 days of the year, the Apple store is a
typical electronics retailer in terms of temporal buying patterns (if
not profit margins). However, that all changes when an anomalous event
(such as a new product launch) translates into two or three annual
blockbuster events, and it becomes the differentiating factor between
an Apple store and other electronics retailers. Common trends among
industries can be used to discount the expected seasonal variations in
order to focus on the truly unique occurrences.
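
One simple way to picture this discounting is to divide each period's
sales by an industry-wide seasonal factor and then look for periods
that still stand out. The Python sketch below is only an illustration:
the function name, the sample figures, the seasonal factors, and the
choice of a median-based cutoff are assumptions made for the example,
not a method prescribed in this chapter.

```python
from statistics import median


def unusual_periods(store_sales, industry_index, threshold=3.0):
    """Return the positions of periods that remain unusual after
    the shared seasonal pattern has been divided out.

    store_sales:    sales per period for one retailer
    industry_index: assumed seasonal factor per period for the industry
                    (1.0 = typical period, 1.5 = 50 percent seasonal lift)
    """
    adjusted = [sales / factor
                for sales, factor in zip(store_sales, industry_index)]
    center = median(adjusted)
    # Median absolute deviation: a measure of spread that a few
    # extreme periods cannot distort.
    spread = median(abs(value - center) for value in adjusted)
    if spread == 0:
        return [i for i, value in enumerate(adjusted) if value != center]
    return [i for i, value in enumerate(adjusted)
            if abs(value - center) / spread > threshold]


# Hypothetical monthly sales: a product launch in month 9 plus the
# usual holiday lift in months 11 and 12.
monthly_sales = [100, 102, 98, 101, 99, 100, 103, 97, 300, 100, 150, 200]
industry_index = [1.0] * 10 + [1.5, 2.0]
flagged = unusual_periods(monthly_sales, industry_index)
```

In this made-up series, only position 8 (the ninth month, the launch)
is flagged. The holiday months are not, because their increase matches
the seasonal lift that the whole industry sees.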

For Twitter data, there are often big disparities among dimensions.
Hashtags are typically associated with transient or irregular
phenomena, as opposed to, for instance, the massive regularity of
tweets emanating from a big country. Because of this greater degree of
within-dimension similarity, we should treat the dimensions
separately. The dimensional application of algorithms can identify
situations in which hashtags and user names, rather than locations and
time zones, dominate the list of anomalies, indicating that there is
very little similarity among the items in each of these groups.

Given so many anomalies, making sense of them becomes a difficult
task, raising the following questions: What could have caused the
massive upsurges in the otherwise regular traffic? What domains are
involved? Are URL shorteners and Twitter live video streaming services
involved? Sorting by the magnitude of the anomaly yields a cursory and
excessively restricted view; correlations of the anomalies often exist
within and between dimensions. There can be a great deal of synergy
among algorithms, but it may take some sort of clustering procedure to
uncover these correlations.

EXPEDIENCY VERSUS ACCURACY

In the past, Big Data analytics usually involved a compromise between
performance and accuracy. This situation was caused by the fact that
technology had to deal with large data sets that often required hours
or days to analyze and run the appropriate algorithms on. Hadoop
solved some of these problems by using clustered processing, and other
technologies have been developed that have boosted performance. Yet
real-time analytics has been mostly a dream for the typical
organization, which has been constrained by budgetary limits for
storage and processing power, two elements that Big Data devours at
prodigious rates.

These constraints created a situation in which if you needed
answers fast, you would be forced to look at smaller data sets, which
could lead to less accurate results. Accuracy, in contrast, often
required the opposite approach: working with larger data sets and
taking more processing time.

As technology and innovation evolve, so do the available options.
The industry is addressing the speed-versus-accuracy problem with
in-memory processing technologies, in which data are processed in
volatile memory instead of directly on disk. Data sets are loaded into
a high-speed cache and the algorithms are applied there, reducing all
of the input and output typically needed to read and write to and from
physical disk drives.
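
The difference can be pictured with a deliberately small Python
sketch. It is not how any commercial in-memory product works; the file
layout, the function names, and the three stand-in "algorithms" are
assumptions chosen only to show that loading a data set once and
reusing it in memory cuts the number of trips to disk.

```python
import csv
import os
import tempfile

read_count = 0  # number of times the file on disk has been opened and read


def read_rows(path):
    """Read every value from a one-column CSV file on disk."""
    global read_count
    read_count += 1
    with open(path, newline="") as handle:
        return [float(row[0]) for row in csv.reader(handle)]


def mean(values):
    return sum(values) / len(values)


ALGORITHMS = [min, max, mean]  # stand-ins for real analytical algorithms


def analyze_from_disk(path):
    """Disk-bound approach: every algorithm re-reads the data from disk."""
    return [algorithm(read_rows(path)) for algorithm in ALGORITHMS]


def analyze_in_memory(path):
    """In-memory approach: read once, then run every algorithm on the copy
    held in memory."""
    cached = read_rows(path)
    return [algorithm(cached) for algorithm in ALGORITHMS]


def demo():
    """Run both approaches on a small sample file and count the disk reads."""
    global read_count
    with tempfile.NamedTemporaryFile("w", suffix=".csv", newline="",
                                     delete=False) as tmp:
        csv.writer(tmp).writerows([[value] for value in (3, 1, 4, 1, 5)])
        path = tmp.name
    try:
        read_count = 0
        disk_results = analyze_from_disk(path)
        disk_reads = read_count
        read_count = 0
        memory_results = analyze_in_memory(path)
        memory_reads = read_count
    finally:
        os.remove(path)
    return disk_results, disk_reads, memory_results, memory_reads
```

Both approaches return the same answers, but the disk-bound version
reads the file once per algorithm (three times here) while the
in-memory version reads it once. On a five-row file the saving is
trivial; the point is that the gap widens as data sets and the number
of algorithms grow.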

IN-MEMORY PROCESSING

Organizations are realizing the value of analyzed data and are seeking
ways to increase that value even further. For many, the path to more
value comes in the form of faster processing. Discovering trends and
applying algorithms to process information takes on additional value if
that analysis can deliver real-time results.

However, the latency of disk-based clusters and wide area network
connections makes it difficult to obtain instantaneous results from BI
solutions. The question then is whether real-time processing can
deliver enough value to offset the additional expenses of faster
technologies. To answer this, one must determine what the ultimate goal
of real-time processing is. Is it to speed up results for a particular
business process? Is it to meet the needs of a retail transaction? Is
it to gain a competitive edge?

The reasons can be many, yet the value gained is still dictated by
the price feasibility of faster processing technologies. That is where
in-memory processing comes into play. However, there are many
other factors that will drive the move toward in-memory processing.
For example, a recent study by The Economist estimated that humans
created about 150 exabytes of information in the year 2005. Although
that may sound like an expansive amount, it pales in comparison to
the over 1,200 exabytes created in 2011.

Furthermore, the research firm IDC (International Data Corporation)
estimates that digital content doubles every 18 months. Further
complicating the processing of data is the related growth of
unstructured data. In fact, research outlet Gartner projects that as
much as 80 percent of enterprise data will take on the form of
unstructured elements, spanning traditional and nontraditional sources.

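
To put those growth figures on a common footing, the short Python
sketch below converts them into yearly growth factors. It assumes
steady compound growth between the two years cited and treats 1,200
exabytes as a floor, since the text says "over 1,200"; the function
and variable names are introduced here purely for illustration.

```python
import math


def annual_growth_factor(start, end, years):
    """Constant yearly multiplier that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years)


def doubling_time_months(yearly_factor):
    """Months needed to double at a constant yearly growth factor."""
    return 12 * math.log(2) / math.log(yearly_factor)


# From the exabyte figures cited above: 150 in 2005, at least 1,200 in 2011.
implied_yearly = annual_growth_factor(150, 1200, 2011 - 2005)
implied_doubling = doubling_time_months(implied_yearly)

# From the estimate that digital content doubles every 18 months.
eighteen_month_yearly = 2 ** (12 / 18)
```

Under these assumptions, the exabyte figures imply growth of roughly
41 percent a year, or a doubling about every 24 months, while doubling
every 18 months corresponds to roughly 59 percent a year. The two
estimates come from different sources and need not measure the same
thing, but either way the volume to be processed grows several times
over within a typical planning horizon.
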
The type of data, the amount of data, and the expediency of accessing
the data all influence the decision of whether to use in-memory
processing. Nevertheless, these factors might not hold back the coming
tide of advanced in-memory processing solutions simply because of the
value that in-memory processing brings to businesses.

To understand the real-world advantages of in-memory processing,
you have to look at how Big Data has been dealt with to date
and understand the current physical limits of computing, which are
dictated by the speed of accessing data from relational databases,
processing instructions, and all of the other elements required to
process large data sets.

Using disk-based processing meant that complex calculations that
involved multiple data sets or algorithmic search processing could not
happen in real time. Data scientists would have to wait a few hours to
a few days for meaningful results, which is not the best solution for
fast business processes and decisions.

Today businesses are demanding faster results that can be used to
make quicker decisions and be used with tools that help organizations
to access, analyze, govern, and share information. All of this brings
increasing value to Big Data.

The use of in-memory technology brings that expediency to analytics,
ultimately increasing the value, which is further accentuated by
the falling prices of the technology. The availability and capacity per
dollar of system memory have increased in the last few years, leading
to a rethinking of how large amounts of data can be stored and acted
upon.

Falling prices and increased capacity have created an environment
where enterprises can now store a primary database in silicon-based
main memory, resulting in an exponential improvement in performance
and enabling the development of completely new applications.
Physical hard drives are no longer the limiting element for expediency
in processing.

When business decision makers are provided with information and
analytics instantaneously, new insights can be developed and business
processes executed in ways never thought possible. In-memory
processing signals a significant paradigm shift for IT operations
dealing with BI and business analytics as they apply to large data
sets.

In-memory processing is poised to create a new era in business
management in which managers can base their decisions on real-time
analyses of complex business data. The primary advantages are as
follows:

• Tremendous improvements in data-processing speed and volume,
  which can amount to performance gains of hundreds of times over
  older technologies.

• In-memory processing that can handle rapidly expanding volumes
  of information while delivering access speeds that are thousands
  of times faster than those of traditional physical disk storage.

• Better price-to-performance ratios that can offset the overall
  costs of in-memory processing, compared to disk-based processing,
  yet still offer real-time analytics.

• The ability to leverage the significant reductions in the cost of
  central processing units and memory in recent years, combined
  with multicore and blade architectures, to modernize data
  operations while delivering measurable results.

In-memory processing offers these advantages and many others by
shifting the analytics process from a cluster of hard drives and
independent CPUs to a single comprehensive database that can handle all
the day-to-day transactions and updates, as well as analytical requests,
in real time.
In-memory computing technology allows for the processing of
massive quantities of transactional data in the main memory of the
server, thereby providing immediate results from the analysis of these
transactions.
Since in-memory technology allows data to be accessed directly
from memory, query results come back much more quickly than they
would from a traditional disk-based warehouse. The time it takes to
update the database is also significantly reduced, and the system can
handle more queries at a time.
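As a minimal sketch of the idea, the snippet below uses Python's built-in sqlite3 module with the special ":memory:" database name, which keeps the whole database in main memory. SQLite is only a stand-in here, not one of the in-memory analytics platforms the chapter has in mind, and the table name, columns, and sample rows are hypothetical. It shows a single store taking an update and answering an analytical query straight afterward.

```python
import sqlite3


def build_in_memory_db(rows):
    """Load (region, amount) transaction rows into a RAM-resident SQLite database."""
    conn = sqlite3.connect(":memory:")  # ":memory:" keeps the database in main memory
    conn.execute("CREATE TABLE transactions (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
    conn.commit()
    return conn


def sales_by_region(conn):
    """Run an analytical query against the same store that receives the updates."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM transactions "
        "GROUP BY region ORDER BY region"
    )
    return dict(cursor.fetchall())


# Hypothetical sample data, for illustration only.
sample_rows = [("east", 120.0), ("west", 75.5), ("east", 30.0)]
conn = build_in_memory_db(sample_rows)
totals = sales_by_region(conn)

# A new transaction is visible to the very next analytical query.
conn.execute("INSERT INTO transactions VALUES (?, ?)", ("west", 24.5))
updated_totals = sales_by_region(conn)
```

The same connection serves both the insert and the aggregate query, which mirrors the single-database arrangement described above on a toy scale.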
With this vast improvement in process speed, query quality, and
business insight, in-memory database management systems promise
performance that is 10 to 20 times faster than traditional disk-based
models.

The elements of in-memory computing are not new, but they have
now been developed to a point where common adoption is possible.
Recent improvements in hardware economics and innovations in
software have now made it possible for massive amounts of data to be
sifted, correlated, and updated in seconds with in-memory technology.
Technological advances in main memory, multicore processing, and
data management have combined to deliver dramatic increases in
performance.
In-memory technology promises impressive benefits in many areas.
The most significant are cost savings, enhanced efficiency, and greater
immediate visibility of a sort that can enable improved decision making.
Businesses of all sizes and across all industries can benefit from the
cost savings obtainable through in-memory technology. Database
management currently accounts for more than 25 percent of most
companies' IT budgets. Since in-memory databases use hardware
systems that require far less power than traditional database management
systems, they dramatically reduce hardware and maintenance costs.
In-memory databases also reduce the burden on a company's
overall IT landscape, freeing up resources previously devoted to
responding to requests for reports. And since in-memory solutions are
based on proven mature technology, the implementations are
nondisruptive, allowing companies to return to operations quickly and easily.
Any company with operations that depend on frequent data
updates will be able to run more efficiently with in-memory
technology. The conversion to in-memory technology allows an entire
technological layer to be removed from a company's IT architecture,
reducing the complexity and infrastructure that traditional systems
require. This reduced complexity allows data to be retrieved nearly
instantaneously, making all of the teams in the business more efficient.
In-memory computing allows any business user to easily carve out
subsets of BI for convenient departmental usage. Work groups can
operate autonomously without affecting the workload imposed on
a central data warehouse. And, perhaps most important, business
users no longer have to call for IT support to gain relevant insight into
business data.
These performance gains also allow business users on the road to
retrieve more useful information via their mobile devices, an ability
that is increasingly important as more businesses incorporate mobile
technologies into their operations.
With that in mind, it becomes easy to see how in-memory
technology allows organizations to compile a comprehensive overview of
their business data, instead of being limited to subsets of data that have
been compartmentalized in a data warehouse.
With those improvements to database visibility, enterprises are
able to shift from after-event analysis (reactive) to real-time decision
making (proactive) and then create business models that are predictive
rather than response based. More value can be realized by combining
easy-to-use analytic solutions from the start with the analytics
platform. This allows anyone in the organization to build queries and
dashboards with very little expertise, which in turn has the potential to
create a pool of content experts who, without external support, can
become more proactive in their actions.
In-memory technology further benefits enterprises because it
allows for greater specificity of information, so that the data elements
are personalized to both the customer and the business user's
individual needs. That allows a particular department or line of business to
self-service specific needs whose results can trickle up or down the
management chain, affecting account executives, supply chain
management, and financial operations.
Customer teams can combine different sets of data quickly and
easily to analyze a customer's past and current business conditions
using in-memory technology from almost any location, ranging from
the office to the road, on their mobile devices. This allows business
users to interact directly with customers using the most up-to-date
information; it creates a collaborative situation in which business users
can interact with the data directly. Business users can experiment with
the data in real time to create more insightful sales and marketing
campaigns. Sales teams have instant access to the information they
need, leading to an entirely new level of customer insight that can
maximize revenue growth by enabling more powerful up-selling and
cross-selling.
With traditional disk-based systems, data are usually processed
overnight, which may result in businesses being late to react to important
supply alerts. In-memory technology can eliminate that problem by
giving businesses full visibility of their supply-and-demand chains on a
second-by-second basis. Businesses are able to gain insight in real time,
allowing them to react to changing business conditions. For example,
businesses may be able to create alerts, such as an early warning to restock
a specific product, and can respond accordingly.
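The restock alert mentioned above can be sketched in a few lines of Python. This is an illustrative toy, not a description of any vendor's alerting feature: the StockLevel class, the record_sale function, the product names, and the reorder points are all hypothetical. Inventory is held in an ordinary in-memory dictionary, and each sale is checked against a threshold as it is recorded rather than in an overnight batch.

```python
from dataclasses import dataclass


@dataclass
class StockLevel:
    on_hand: int        # units currently in stock
    reorder_point: int  # hypothetical threshold that triggers an alert


def record_sale(inventory, product, quantity):
    """Apply a sale to the in-memory inventory; return an alert string or None."""
    level = inventory[product]
    level.on_hand -= quantity
    if level.on_hand <= level.reorder_point:
        return (
            f"Restock {product}: {level.on_hand} left "
            f"(reorder point {level.reorder_point})"
        )
    return None


# Hypothetical inventory held entirely in memory.
inventory = {
    "widget": StockLevel(on_hand=12, reorder_point=10),
    "gadget": StockLevel(on_hand=50, reorder_point=5),
}

first_alert = record_sale(inventory, "widget", 1)   # 11 left, above the threshold
second_alert = record_sale(inventory, "widget", 3)  # 8 left, at or below the threshold
```

Because the check runs as part of each update, the warning is available the moment stock crosses the threshold instead of after the next batch run.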
Financial controllers face increasing challenges brought on by
increased data volumes, slow data processing, delayed analytics, and
slow data-response times. These challenges can limit the controllers'
analysis time frames to several days rather than the more useful
months or quarters. This can lead to a variety of delays, particularly at
the closing of financial periods. However, in-memory technology,
large-volume data analysis, and a flexible modeling environment can
result in faster-closing financial quarters and better visibility of
detailed finance data for extended periods.
In-memory technology has the potential to help businesses in any
industry operate more efficiently, from consumer products and
retailing to manufacturing and financial services. Consumer products
companies can use in-memory technology to manage their suppliers,
track and trace products, manage promotions, provide support in
complying with Environmental Protection Agency standards, and
perform analyses on defective and under-warranty products.
Retail companies can manage store operations in multiple locations,
conduct point-of-sale analytics, perform multichannel pricing analyses,
and track damaged, spoiled, and returned products. Manufacturing
organizations can use in-memory technology to ensure operational
performance management, conduct analytics on production and
maintenance, and perform real-time asset utilization studies. Financial
services companies can conduct hedge fund trading analyses, such as
managing client exposures to currencies, equities, derivatives, and other
instruments. Using information accessed from in-memory technology,
they can conduct real-time systematic risk management and reporting
based on market trading exposure.
As the popularity of Big Data analytics grows, in-memory
processing is going to become the mainstay for many businesses looking
for a competitive edge.
