Copyright
This document is provided as-is. Information and views expressed throughout, including URLs and other Internet Web site references, may change without notice. You bear the risk of using the content. Some examples depicted here are provided for illustration purposes only and bear no real association or connection to any known entities or organizations, and no such inference is intended nor should be supposed. This document does not provide you with any legal rights to any intellectual property. You may copy and use this document for your internal, reference purposes. © 2012 Dimensional Strategies Inc. All rights reserved.
INTRODUCTION
Excel 2010 Powerful BI Tool for Analyzing Big Datasets
This paper discusses our approach, findings and recommendations for architecting a high-throughput, general-purpose Business Intelligence solution using an I/O-balanced approach to symmetric multiprocessing (SMP), with Microsoft SQL Server 2012 Enterprise at the core and Excel 2010 as the front-end business analytical tool. We describe our experiences and the design path we took with a relatively stable yet large dataset of about 76 billion rows. Our considerations focus on approaches to relatively high-volume data needs and do not specifically address high-velocity, high-variety or highly complex data challenges. This content is relevant to audiences including CIOs, CTOs, IT planners, architects, DBAs, and business intelligence (BI) users with an interest in deploying SMP-based DW/BI capabilities that address big-dataset management, with reporting and analysis that leverage the power and familiarity of features found in Microsoft Excel 2010.
Volume (How Plentiful): Many organizations are floundering in the current ocean of ever-growing data, with all its many forms and conceivable sizes. This data is also driving the creation of even more data: data derived from data. How can an organization turn the data from over 256 million daily financial trades into usable, legible and actionable information? (The Toronto Stock Exchange, August 2012)
Velocity (How Fast): Sometimes the decision window is a minute or possibly two. For time-sensitive processes such as fraud detection or identifying a known terrorist at a border crossing, big data must often be analyzed and queried as it comes pouring into a business via live transactions and interactions. How do you keep a country safe and open to travel and trade, but closed to crime, with the following kinds of statistics and data points: 24,513,463 travelers processed; 9,304,652 land vehicles processed (cars, trucks, buses); 90,685 aircraft processed? (The Canada Border Services Agency (CBSA), April to June 2012)
Variety (How Different/Varied): Big data is any type of data. Structured data is the kind held in traditional relational databases such as Microsoft Access or SQL Server. Unstructured data is anything else: plain text, audio, video, Microsoft Office documents, etc. Consider the task of searching and indexing all global web content (all text, all audio, PDFs, all documents, all videos, and every picture) and then serving this mosaic of content to over 212 million Americans in the course of one month. (Google, May 2012)
Complexity (How Difficult to Manage and/or Analyze): People's relationships and interactions are among the most complex and actively morphing information domains to model, monitor and analyze. Who is who? Who knows who? Who does what? In order to answer these questions, organizations must resolve and relate all relevant sources of data (complex, large volumes), and then present that information to the decision maker at the point of decision (rapid delivery). One payment services company leverages corporate data to conduct fraud detection, a process that allows it to deter more than US$37.7 million in fraudulent transactions. (MoneyGram International, 2011)
General Requirements
- The data warehouse must be optimized to address report and analysis needs.
- In-house query tools and other off-the-shelf analytical tools must be able to integrate with the new data warehouse back-end.
- The data must be accessible only by named and managed departments within the organization.
Ease of Use
- Self-service analytics: end-users must be able to quickly conduct their own queries to unlock insights with interactive data exploration and graphing/charting.
- The solution must provide query tools with an intuitive user interface for creating ad hoc queries and continuing analysis in Microsoft Excel.
- Ability to export to Excel .XLS or .CSV formats.
- Ability to visually categorize and select data (e.g. by industry or by client).
APPROACHES
The following section describes the various approaches considered for the solution. Each approach is described along with its respective pros and cons. This development methodology of quickly standing up solutions and evaluating them against the client's business requirements allowed us to quickly identify the solution best suited to the client's needs.
[Figure: Approach 1, SQL 2012 Analysis Services Tabular in in-memory mode. Diagram labels: SSIS ETL; SQL 2012 Data Warehouse (2 years of data, ~76 billion rows, rolling daily partitions); SQL 2012 Analysis Services Tabular In-Memory (~2 TB compressed); Reporting Services (list reports, SQL query); Ad-hoc Analysis with Excel PivotTables (MDX & DAX query); End-users.]
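As a rough sanity check on the figures quoted for the in-memory tabular model (~76 billion rows, ~2 TB compressed), the short Python sketch below estimates the average compressed footprint per row. It assumes decimal terabytes (1 TB = 10^12 bytes) and treats both figures as approximations; the function name is hypothetical and is not part of any Microsoft tool.

```python
def compressed_bytes_per_row(total_bytes: float, row_count: float) -> float:
    """Average compressed footprint per row, in bytes."""
    if row_count <= 0:
        raise ValueError("row_count must be positive")
    return total_bytes / row_count


# Figures quoted in this paper (both approximate).
ROWS = 76e9          # ~76 billion fact rows
MODEL_BYTES = 2e12   # ~2 TB compressed, assuming 1 TB = 10**12 bytes

bytes_per_row = compressed_bytes_per_row(MODEL_BYTES, ROWS)
print(f"~{bytes_per_row:.0f} bytes per row")
```

Even at a few tens of bytes per row, an in-memory model of this size has to be held in server memory as a whole, which is consistent with the observation later in this paper that the data volumes prohibited this option.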
Approach 2: Column Store Index & SQL 2012 Tabular in DirectQuery Mode
The second approach still uses an Analysis Services Tabular model, but in DirectQuery mode. This allows the resulting queries to be pushed down to the SQL Server database engine. When combined with a column store index on the fact tables, performance would be acceptable for the data volumes we were considering. The main drawback to this approach is that DirectQuery models support only the DAX query language, not MDX. This means that the only supported front-end tool at this time is Power View, which is an excellent tool for visualizing data but lacks analytic functionality when compared to Excel PivotTables or traditional OLAP front-end tools. Without a powerful analytic front-end tool, this approach will not work for the vast majority of business users.
Approach 3: Column Store Index & SQL 2012 SSAS Multi-dimensional with ROLAP Fact Table
Once Analysis Services Tabular was ruled out as a solution for our client, we began considering traditional Analysis Services multidimensional models. An OLAP cube was created based on the star schema in the SQL data warehouse, at first using a ROLAP fact table in order to take advantage of the column store index already being created to service list reporting through SSRS. The end-user experience using an OLAP cube was acceptable: users would perform ad-hoc analysis through Excel PivotTables and drill through to SSRS reports to access detail-level data. Since Power View is not supported on multidimensional cubes, it is not available with this approach. For our client, Power View would be nice to have as a visualization tool, but it was not considered a requirement for this solution.
[Figure: Approach 3: Column Store Index & SQL 2012 SSAS Multi-dimensional with ROLAP Fact Table. Diagram labels: Source Data (flat files, ~170 million rows/day); Reporting Services (list reports, SQL query); Ad-hoc Analysis (MDX query); End-users.]
Approach 4: Column Store Index & SQL 2012 SSAS Multi-dimensional with MOLAP Fact Table
In order to further improve performance for end-users, a traditional OLAP cube with MOLAP fact tables was also created, this time partitioned by day. This allows the nightly ETL process to load only the most recent day of data, greatly reducing the overall run-time of the nightly process. Both the end-user experience and the performance of the system were deemed acceptable, and this was selected as the most desirable approach.
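The daily-partition idea can be sketched as follows. This is a minimal Python illustration, not the actual SSIS or Analysis Services processing logic used in the project: the partition naming scheme (Fact_YYYYMMDD), the function names, and the 730-day retention window are assumptions made for illustration, based on the "2 years of data, rolling daily partitions" described in this paper.

```python
from datetime import date, timedelta

# Assumption: "2 years" approximated as 730 daily partitions.
RETENTION_DAYS = 730


def partition_name(day: date) -> str:
    """Hypothetical naming scheme for a daily partition."""
    return f"Fact_{day:%Y%m%d}"


def nightly_plan(load_day: date, retention_days: int = RETENTION_DAYS) -> dict:
    """Return which partition to process and which one to roll off.

    Only the partition for load_day is processed; the single partition
    that has just fallen outside the retention window is dropped.
    """
    expired_day = load_day - timedelta(days=retention_days)
    return {
        "process": partition_name(load_day),
        "drop": partition_name(expired_day),
    }


plan = nightly_plan(date(2012, 8, 31))
print(plan)
```

The point of the sketch is that the nightly work is proportional to one day of data rather than to the full history, which is the property the paragraph above attributes to partitioning by day.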
[Figure: Approach 4: Column Store Index & SQL 2012 SSAS Multi-dimensional with MOLAP Fact Table. Diagram labels: SSIS ETL (SQL query); SQL 2012 Data Warehouse (column store index on fact, 2 years of data, ~76 billion rows, rolling daily partitions); SQL 2012 Analysis Services (OLAP, MOLAP fact table, daily partitions); Reporting Services (list reports, SQL query); Ad-hoc Analysis (MDX query); End-users.]
CONCLUSIONS
Designing a system for analysis versus reporting requires different techniques to ensure optimum performance. The choice of tools also makes a difference, with new features such as Power View requiring a tabular model. Multiple solutions may be required to address the business requirements, but by keeping calculations and hierarchies within the relational model we can ensure that all methods of analysis use the same data and return the same result (a single version of the truth).
Approach #1: SQL 2012 Tabular in In-Memory Mode
BI Tool Supported: All Microsoft tools (PowerPivot, Power View, PerformancePoint, SSRS, etc.) can consume this model.
Conclusions/Observations: The Tabular model was about 2 TB in size. In the end, the data volumes prohibited the use of this option.
Approach #2: Column Store Index & SQL 2012 Tabular in DirectQuery Mode
BI Tool Supported: DirectQuery only supports DAX-capable query tools. Power View supported.
Conclusions/Observations: Excel does not support DAX queries. We had to abandon this option.
Approach #3: Column Store Index & SQL 2012 SSAS Multidimensional with ROLAP Fact Table
BI Tool Supported: SSRS, Excel, PerformancePoint fully supported. Power View not supported.
Conclusions/Observations: Uses extra disk space (compared to our MOLAP option). Power View does not consume multidimensional cubes.
Approach #4: Column Store Index & SQL 2012 SSAS Multidimensional with MOLAP Fact Table
BI Tool Supported: SSRS, Excel, PerformancePoint fully supported. Power View not supported.
Conclusions/Observations: The MOLAP cube allowed for a very granular partitioning strategy (by day) while still delivering very good query responses. Power View does not consume multidimensional cubes.
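The comparison above can also be expressed as a small lookup. The Python sketch below encodes only what this paper states about each approach; the data structure, the field names, and the ruled_out_by_volume flag are illustrative assumptions and are not part of any product API. Excel is listed for approach #1 because the paper describes Excel PivotTables as a front end for the in-memory tabular model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approach:
    number: int
    name: str
    tools: frozenset                   # front-end tools named in this paper
    ruled_out_by_volume: bool = False  # hypothetical flag for approach #1


APPROACHES = [
    Approach(1, "Tabular, in-memory",
             frozenset({"Excel", "PowerPivot", "Power View",
                        "PerformancePoint", "SSRS"}),
             ruled_out_by_volume=True),
    Approach(2, "Tabular, DirectQuery + column store index",
             frozenset({"Power View"})),
    Approach(3, "Multidimensional, ROLAP + column store index",
             frozenset({"SSRS", "Excel", "PerformancePoint"})),
    Approach(4, "Multidimensional, MOLAP + column store index",
             frozenset({"SSRS", "Excel", "PerformancePoint"})),
]


def candidates(required_tool: str) -> list:
    """Approach numbers that support the tool and were not ruled out by size."""
    return [a.number for a in APPROACHES
            if required_tool in a.tools and not a.ruled_out_by_volume]


print(candidates("Excel"))       # approaches usable with Excel as the front end
print(candidates("Power View"))  # approaches usable with Power View
```

With Excel as the required front end, only the two multidimensional approaches remain, which matches the path the project followed before settling on approach #4.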