Kashan Jafri
Richard Mintz
Evan Ross
Marc Wright
Excel 2010 – Powerful BI Tool for Analyzing Big Datasets
Copyright
This document is provided “as-is”. Information and views expressed throughout, including
URLs and other Internet Web site references, may change without notice. You bear the risk
of using the content.
Some examples depicted here are provided for illustration purposes only and bear no real
association or connection to any known entities or organizations and no such inference is
intended nor should be supposed.
This document does not provide you with any legal rights to any intellectual property. You
may copy and use this document for your internal, reference purposes.
INTRODUCTION
This paper discusses our approach, findings and recommendations for architecting a
high-throughput, general-purpose Business Intelligence solution that uses an I/O-balanced
approach to symmetric multiprocessing (SMP), with Microsoft SQL Server 2012 Enterprise at
the core and Excel 2010 as the front-end business analysis tool.
We describe our experiences and the design path we took with a relatively stable yet large
dataset of about 76 billion rows. Our considerations focus on relatively high-volume data
needs; they do not specifically address high-velocity, high-variety or highly complex data
challenges.
This content is relevant to audiences including CIOs, CTOs, IT planners, architects, DBAs,
and business intelligence (BI) users interested in deploying SMP-based DW/BI capabilities
that manage big datasets and support reporting and analysis with the power and familiarity
of Microsoft Excel 2010.
Big data can be described and classified using four key aspects:
Volume — How Plentiful: Many organizations are simply floundering in the current ocean
of ever-growing data in all its forms and sizes. This data is also driving
the creation of even more data — data derived from data! How can an organization turn the
data from over 256 million daily financial trades into usable, legible and actionable
information? (The Toronto Stock Exchange, August 2012)
Velocity — How Fast: Sometimes a minute or possibly two is the decision window. For
time-sensitive processes such as fraud detection or identifying known terrorists at a border
crossing, big data must often be analyzed and queried as it pours into a business via live
transactions and interactions. How do you keep a country safe and open to travel and trade,
but closed to crime, when faced with data points such as these: travelers processed,
24,513,463; land vehicles processed (cars, trucks, buses), 9,304,652; aircraft processed,
90,685…? (The Canada Border Services Agency – CBSA, April to June 2012)
Variety — How Different/Varied: Big data can be any type of data. Structured data lives in
traditional relational databases such as Microsoft Access or SQL Server; unstructured data is
everything else: plain text, audio, video, Microsoft Office documents, etc. Imagine searching
and indexing all global web content (all text, all audio, PDFs, all documents, all videos,
and every picture), then serving this mosaic of content to over 212 million Americans in the
course of one month! (Google, May 2012)
of data (complex, large volumes), and then present that information to the decision maker
at the point of decision (rapid delivery). A payment services company leverages corporate
data to conduct fraud detection; this process allows it to deter more than US$37.7 million
in fraudulent transactions. (MoneyGram International, 2011)
Performance
The solution required us to efficiently process complex queries on large historical datasets.
Expected response times for most data queries are within minutes. The following table
provides three success measures that the client required us to meet:
General Requirements
The Data Warehouse must be optimized to address reporting and analysis needs
In-house query tools and other off-the-shelf analytical tools must be able to
integrate with the new data warehouse back-end
The data must only be accessible by named and managed departments within the
organization.
Ease of Use
Self-service analytics – end-users must be able to quickly conduct their own queries
to unlock insights with interactive data exploration and graphing/charting
The solution must provide query tools with an intuitive user interface for creating ad
hoc queries and continuing analysis in Microsoft Excel
Ability to export to Excel .XLS or .CSV formats
Ability to visually categorize and select data (e.g., by industry or by client)
Several approaches were considered when architecting the SMP solution for our client.
Because of the large data volumes (170 million rows per day), not all approaches were
feasible on currently available hardware. To achieve the best possible performance at a
reasonable cost, we took a multi-tier approach to the hardware and software. The table
below provides an overview of the reference architecture.
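As a back-of-envelope check on why hardware feasibility mattered, the volumes quoted in this paper's architecture diagrams (about 170 million rows per day, about 76 billion rows kept for 2 years, about 2 TB compressed) can be turned into a few sizing figures. This is a minimal sketch that assumes 365-day years and decimal terabytes; the function names are illustrative only.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365  # assumption: leap days ignored


def ingest_rate_per_second(rows_per_day: float) -> float:
    """Average sustained rate the ETL tier must absorb."""
    return rows_per_day / SECONDS_PER_DAY


def daily_partition_count(years: int) -> int:
    """Number of daily partitions in a rolling window of `years` years."""
    return years * DAYS_PER_YEAR


def average_rows_per_partition(total_rows: float, partitions: int) -> float:
    """Mean partition size implied by the total row count."""
    return total_rows / partitions


def compressed_bytes_per_row(total_bytes: float, total_rows: float) -> float:
    """Average on-disk footprint of one fact row after compression."""
    return total_bytes / total_rows


if __name__ == "__main__":
    rate = ingest_rate_per_second(170e6)
    parts = daily_partition_count(2)
    avg_rows = average_rows_per_partition(76e9, parts)
    bpr = compressed_bytes_per_row(2e12, 76e9)  # 2 TB taken as 2 * 10**12 bytes
    print(f"{rate:,.0f} rows/s, {parts} partitions, "
          f"{avg_rows:,.0f} rows/partition, {bpr:.1f} bytes/row")
```

Under these assumptions the load works out to roughly 2,000 rows per second sustained, 730 daily partitions, and on the order of 26 compressed bytes per fact row; these are averages derived from rounded figures.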
APPROACHES
The following section describes the various approaches considered for the solution, along
with the pros and cons of each. This development methodology, quickly standing up candidate
solutions and evaluating them against the client's business requirements, allowed us to
quickly arrive at the solution best suited to the client's needs.
[Figure: Source Data (Flat Files), ~170 million rows/day → SSIS ETL → SQL 2012 Data Warehouse (2 years of data, ~76 billion rows, rolling daily partitions, ~2 TB compressed) → SQL 2012 Analysis Services Tabular (In-Memory) → End-users via Reporting Services (list reports), Excel PivotTables (ad-hoc analysis) and SharePoint 2010, PerformancePoint & PowerView (rich visualization), over SQL, MDX and DAX query paths.]
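The rolling daily partitions in the warehouse tier suggest a sliding-window load. One common pattern in SQL Server 2012, where a table carrying a nonclustered columnstore index is read-only, is to load each day into a staging table with identical structure and indexes, switch it in as a new partition, then switch out and merge away the oldest day. The sketch below only generates the T-SQL for one day's slide. Everything specific in it is a hypothetical assumption rather than a detail from this paper: the object names (pf_FactDaily, ps_FactDaily, dbo.FactTrade and its staging and archive tables), a RANGE RIGHT partition function with one date boundary per day, empty partitions at both ends, and a 730-day window. In practice the staging table would also need a CHECK constraint matching the target day, and the archive table must be empty and on the same filegroup.

```python
from __future__ import annotations

from datetime import date, timedelta

# Assumption: "2 years of data" kept as a 730-day window of daily partitions.
RETENTION_DAYS = 730


def sliding_window_plan(load_day: date, retention_days: int = RETENTION_DAYS) -> dict:
    """Boundaries touched when `load_day` is added and the oldest day expires.

    Assumes a RANGE RIGHT partition function that already has a boundary at
    `load_day`, so the rightmost partition is empty before the split.
    """
    return {
        # Splitting at the next day keeps an empty partition at the right end.
        "split_boundary": load_day + timedelta(days=1),
        # The day that falls out of the window once `load_day` is loaded.
        "expired_day": load_day - timedelta(days=retention_days),
    }


def sliding_window_tsql(
    load_day: date,
    pf: str = "pf_FactDaily",                # hypothetical partition function
    ps: str = "ps_FactDaily",                # hypothetical partition scheme
    fact: str = "dbo.FactTrade",             # hypothetical fact table
    stage: str = "dbo.FactTrade_Stage",      # hypothetical staging table
    archive: str = "dbo.FactTrade_Archive",  # hypothetical empty archive table
    filegroup: str = "PRIMARY",
    retention_days: int = RETENTION_DAYS,
) -> list[str]:
    """Return the statements of one T-SQL batch for a daily slide."""
    plan = sliding_window_plan(load_day, retention_days)
    new_boundary = plan["split_boundary"].isoformat()
    day = load_day.isoformat()
    old = plan["expired_day"].isoformat()
    return [
        f"ALTER PARTITION SCHEME {ps} NEXT USED [{filegroup}];",
        f"ALTER PARTITION FUNCTION {pf}() SPLIT RANGE ('{new_boundary}');",
        f"DECLARE @p int = $PARTITION.{pf}('{day}');",
        f"ALTER TABLE {stage} SWITCH TO {fact} PARTITION @p;",
        f"SET @p = $PARTITION.{pf}('{old}');",
        f"ALTER TABLE {fact} SWITCH PARTITION @p TO {archive};",
        f"ALTER PARTITION FUNCTION {pf}() MERGE RANGE ('{old}');",
    ]


if __name__ == "__main__":
    print("\n".join(sliding_window_tsql(date(2012, 8, 15))))
```

Because the split is made while the rightmost partition is still empty, and the merge happens only after the oldest day has been switched out, neither operation has to move data; both are metadata changes.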
Approach 3: Column Store Index & SQL 2012 SSAS Multi-dimensional with ROLAP Fact Table
[Figure: Source Data (Flat Files, ~170 million rows/day) loaded via SSIS ETL; end-users reach the data through Reporting Services (list reports), Excel PivotTables & PowerPivot (ad-hoc analysis) and SharePoint 2010, PerformancePoint (rich visualization), over SQL and MDX query paths.]
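Approaches 3 and 4 both rely on a column store index on the fact table. As a toy illustration of the idea only (not of how SQL Server actually stores or compresses columnstore data), keeping each column in its own array lets an aggregate read just the columns it needs, and highly repetitive columns, such as the trade date in a fact table loaded one day at a time, compress well, here with simple run-length encoding. The rows and column names are made up.

```python
from itertools import groupby

# Hypothetical fact rows: (TradeDate, Symbol, Quantity)
rows = [
    ("2012-08-14", "ABC", 100),
    ("2012-08-14", "ABC", 250),
    ("2012-08-14", "XYZ", 75),
    ("2012-08-15", "XYZ", 300),
]

# Column-oriented layout: one list per column instead of one tuple per row.
column_names = ("TradeDate", "Symbol", "Quantity")
columns = {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


encoded_dates = run_length_encode(columns["TradeDate"])
encoded_symbols = run_length_encode(columns["Symbol"])

# An aggregate like SUM(Quantity) only has to touch the Quantity column.
total_quantity = sum(columns["Quantity"])

if __name__ == "__main__":
    print(encoded_dates, encoded_symbols, total_quantity)
```

The longer the runs of identical values, the better this kind of encoding does, which is one reason fact tables partitioned and loaded by day tend to compress well in a column-oriented format.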
Approach 4: Column Store Index & SQL 2012 SSAS Multi-dimensional with MOLAP Fact Table
[Figure: Source Data (Flat Files, ~170 million rows/day) → SSIS ETL → SQL 2012 Data Warehouse (column store index on fact; 2 years of data, ~76 billion rows; rolling daily partitions) → SQL 2012 Analysis Services OLAP (MOLAP fact table, daily partitions) → End-users via Reporting Services (list reports), Excel PivotTables & PowerPivot (ad-hoc analysis) and SharePoint 2010, PerformancePoint (rich visualization), over SQL and MDX query paths.]
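The MOLAP design in Approach 4 partitions the cube's fact data by day, mirroring the warehouse. One way to keep such partitions consistent is to generate each partition's source query from its date, using half-open date ranges so that no row can fall into two partitions; Analysis Services does not check partition queries for overlap, and overlapping ones would double-count. This is a sketch under assumptions: the partition naming convention and the table and column names are hypothetical.

```python
from datetime import date, timedelta


def daily_partition_definitions(
    first_day: date,
    last_day: date,
    fact_table: str = "dbo.FactTrade",  # hypothetical relational fact table
    date_column: str = "TradeDate",     # hypothetical date column
) -> list:
    """Return (partition_name, source_query) pairs, one per day, inclusive."""
    definitions = []
    day = first_day
    while day <= last_day:
        next_day = day + timedelta(days=1)
        name = f"Fact_{day:%Y%m%d}"  # hypothetical naming convention
        # Half-open range [day, next_day) so adjacent partitions never overlap.
        query = (
            f"SELECT * FROM {fact_table} "
            f"WHERE {date_column} >= '{day.isoformat()}' "
            f"AND {date_column} < '{next_day.isoformat()}'"
        )
        definitions.append((name, query))
        day = next_day
    return definitions


if __name__ == "__main__":
    for name, query in daily_partition_definitions(date(2012, 8, 14), date(2012, 8, 15)):
        print(name, "->", query)
```

With one partition per day, the daily load typically only requires processing the newest partition, while the older ones are left as they are.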
CONCLUSIONS
Designing a system for analysis requires different techniques from designing one for
reporting if each is to perform optimally. The choice of tools also matters: new features
such as Power View require a tabular model. Multiple solutions may be required to address
the business requirements, but by keeping calculations and hierarchies within the relational
model we can ensure that all methods of analysis use the same data and return the same
result (a single version of the truth).
#2 | Column Store Index & SQL 2012 tabular in DirectQuery Mode | DirectQuery only supports DAX-capable query tools. Power View supported. | Excel does not support DAX queries (Excel 2010 PivotTables query Analysis Services using MDX). We had to abandon this option.
#4 | Column Store Index & SQL 2012 SSAS Multi-dimensional with MOLAP Fact table | SSRS, Excel, PerformancePoint fully supported. Power View not supported. | MOLAP cube allowed for a very granular partitioning strategy (by day) while still delivering very good query responses. Power View does not consume (Multi-Dimensional) cubes.
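To illustrate the conclusion that calculations and hierarchies belong in the relational model, the sketch below uses SQLite (Python's built-in sqlite3 module) as a stand-in for the SQL Server 2012 warehouse. A derived measure (Notional) and a Year > Month date hierarchy are defined once in a view; a detail-style report query and a grouped analysis query both read that view, so their totals agree. The table, view, column names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FactTrade (TradeDate TEXT, Symbol TEXT, Quantity INTEGER, Price REAL);
INSERT INTO FactTrade VALUES
  ('2012-07-30', 'ABC', 100, 10.0),
  ('2012-07-31', 'ABC', 200, 10.5),
  ('2012-08-01', 'XYZ', 50, 20.0);
-- Calculation (Notional) and date hierarchy (Year > Month) defined once, relationally
CREATE VIEW vFactTrade AS
SELECT TradeDate,
       substr(TradeDate, 1, 4) AS TradeYear,
       substr(TradeDate, 1, 7) AS TradeMonth,
       Symbol,
       Quantity,
       Quantity * Price AS Notional
FROM FactTrade;
""")

# "Report" path: detail rows, totalled by the client.
detail = conn.execute("SELECT Notional FROM vFactTrade").fetchall()
report_total = sum(notional for (notional,) in detail)

# "Analysis" path: aggregated at the Month level of the hierarchy.
by_month = dict(conn.execute(
    "SELECT TradeMonth, SUM(Notional) FROM vFactTrade GROUP BY TradeMonth"
).fetchall())
analysis_total = sum(by_month.values())

if __name__ == "__main__":
    print(report_total, analysis_total, by_month)
```

If each front-end instead re-derived Notional or the month hierarchy on its own, any difference in those definitions would surface as conflicting numbers; defining them in the view gives every tool the same answer.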