Course Overview
The course: what and how
0. Introduction
I. Data Warehousing
II. Decision Support and OLAP
III. Data Mining
IV. Looking Ahead
0. Introduction
Data Warehousing, OLAP and data mining: what and why (now)?
Relation to OLTP
A case study
Demos, labs
What product promotions have the biggest impact on revenue?
What impact will new products/services have on revenue and margins?
Data, data everywhere, yet ... I can't find the data I need
data is scattered over the network
many versions, subtle differences
Evolution
60s: Batch reports
hard to find and analyze information
inflexible and expensive; every new request must be reprogrammed
70s: Terminal-based DSS (Decision Support Systems) and EIS (Executive Information Systems)
still inflexible, not integrated with desktop tools
customer activity (1986-89): monthly summary
customer activity detail (1987-89): custid, activity date, amount, clerk id, order no
customer activity detail (1990-91): custid, activity date, amount, line item no, order no
Definition of DSS
A decision support system (DSS) is a system that helps decision makers at various levels make decisions. It uses data, analytical models, and user-friendly software to support decision making.
Definition of EIS
An executive information system (EIS) is a system that helps high-level executives make policy decisions. It uses higher-level data, analytical models, and user-friendly software to support decision making.
Evolution
80s: Desktop data access and analysis tools
query tools, spreadsheets, GUIs: easier to use, but access only operational databases
90s: Data warehousing with integrated OLAP (online analytical processing) engines and tools
A data warehouse is subject-oriented, integrated, time-variant, and non-volatile.
Subject-Oriented
A data warehouse is organized around the major subjects of the organization, such as customer, supplier, product, and sales. It provides a simple, concise view of each subject by excluding data that are not useful to the decision support process.
Integrated
A data warehouse is constructed by integrating multiple data sources, such as relational databases, flat files, and online transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attributes, and so on.
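As a minimal sketch of the consistency step, assume two hypothetical sources, `crm` and `billing`, that encode the same customer attributes differently: one codes gender as `'M'`/`'F'` and writes dates as day/month/year, the other codes gender as `1`/`0` and writes ISO dates. The source names, codes, and formats below are illustrative assumptions, not part of any real system.

```python
from datetime import datetime

# Hypothetical encodings used by two sources for the same attributes
GENDER_CODES = {
    "crm": {"M": "male", "F": "female"},
    "billing": {1: "male", 0: "female"},
}
DATE_FORMATS = {"crm": "%d/%m/%Y", "billing": "%Y-%m-%d"}


def standardize(source: str, record: dict) -> dict:
    """Re-encode attribute values so that all sources follow one convention."""
    signup = datetime.strptime(record["signup_date"], DATE_FORMATS[source])
    return {
        "custid": record["custid"],
        "gender": GENDER_CODES[source][record["gender"]],
        # Every date is stored in ISO 8601 form (YYYY-MM-DD)
        "signup_date": signup.date().isoformat(),
    }


a = standardize("crm", {"custid": 7, "gender": "F", "signup_date": "05/03/2021"})
b = standardize("billing", {"custid": 7, "gender": 0, "signup_date": "2021-03-05"})
print(a == b)  # True: both sources now describe the customer identically
```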
Time-Variant
A data warehouse maintains both historical and current data, so it can provide information from a historical perspective.
Non-Volatile
Once data is loaded into the data warehouse, the stored data is not modified in place; new data is added by later loads, and existing data is only read.
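A minimal sketch of how the time-variant and non-volatile properties fit together, assuming a hypothetical `AppendOnlyFacts` table that stores customer balances: each load appends a time-stamped snapshot instead of overwriting the previous value, so historical questions ("what was the balance as of a given date?") can still be answered.

```python
from datetime import date


class AppendOnlyFacts:
    """Hypothetical warehouse table: snapshots are appended, never updated in place."""

    def __init__(self):
        self._rows = []  # (load_date, custid, balance) tuples

    def load(self, load_date, custid, balance):
        # Each periodic load adds a new time-stamped row; old rows stay untouched
        self._rows.append((load_date, custid, balance))

    def as_of(self, custid, when):
        """Return the balance from the latest load on or before `when`, or None."""
        snapshots = [r for r in self._rows if r[1] == custid and r[0] <= when]
        if not snapshots:
            return None
        return max(snapshots, key=lambda r: r[0])[2]


facts = AppendOnlyFacts()
facts.load(date(2023, 1, 31), 42, 100.0)
facts.load(date(2023, 2, 28), 42, 250.0)
print(facts.as_of(42, date(2023, 2, 15)))  # 100.0
print(facts.as_of(42, date(2023, 3, 1)))   # 250.0
```

The class exposes only `load` and `as_of`, mirroring the idea that the warehouse is loaded and read but not edited.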
[Figure: operational databases organized by application (Loans, Credit Card, Trust, Savings) compared with a data warehouse organized by subject (Customer, Vendor, Product, Activity)]
[Figure: data flows from the data sources through cleaning and transformation into the data warehouse; new updates follow the same path]
Data Collection
Data warehousing collects data from various data sources, such as relational databases, flat files, and online records. The collected data are stored in a database inside the warehouse. The type of data collection used depends on the architecture of the warehouse.
Integration
Each data source may use a different schema. The data warehouse gets data from sources with different schemas and converts it into a common, integrated schema.
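A minimal sketch of converting rows from two differently structured sources into one common schema. The source names, their column layouts, and the warehouse columns `custid`, `name`, `city` are hypothetical; the point is that each source gets its own mapping onto the same target columns.

```python
# Hypothetical common warehouse schema (column order of the output tuples)
COMMON_SCHEMA = ("custid", "name", "city")

# Per-source mapping: warehouse column -> function extracting it from a source row
MAPPINGS = {
    # Source A splits the name into two columns
    "source_a": {
        "custid": lambda r: r["cust_no"],
        "name": lambda r: f'{r["first_name"]} {r["last_name"]}',
        "city": lambda r: r["city"],
    },
    # Source B nests the address inside another record
    "source_b": {
        "custid": lambda r: r["id"],
        "name": lambda r: r["full_name"],
        "city": lambda r: r["address"]["city"],
    },
}


def to_common_schema(source: str, row: dict) -> tuple:
    """Convert one source row into a tuple ordered by COMMON_SCHEMA."""
    mapping = MAPPINGS[source]
    return tuple(mapping[col](row) for col in COMMON_SCHEMA)


rows = [
    to_common_schema("source_a", {"cust_no": 1, "first_name": "Ana",
                                  "last_name": "Silva", "city": "Lisbon"}),
    to_common_schema("source_b", {"id": 2, "full_name": "Ravi Kumar",
                                  "address": {"city": "Chennai"}}),
]
print(rows)  # [(1, 'Ana Silva', 'Lisbon'), (2, 'Ravi Kumar', 'Chennai')]
```

Keeping the mappings as data means adding a new source only requires a new mapping entry, not new conversion code.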
Star Schema
A single fact table and one dimension table for each dimension
Does not capture hierarchies directly
[Figure: star schema with a central fact table (date, custno, prodno, cityname, ...) joined to the time, cust, prod, and city dimension tables]
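A minimal star schema sketch using Python's built-in `sqlite3` module. The table names, columns, and sample rows are hypothetical; the fact table keys (`date`, `custno`, `prodno`, `cityname`) follow the figure. Note that `region` sits directly in `city_dim` as a plain column: the city-to-region hierarchy is flattened rather than captured directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One dimension table per dimension
    CREATE TABLE time_dim (date TEXT PRIMARY KEY, month INTEGER, year INTEGER);
    CREATE TABLE cust_dim (custno INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE prod_dim (prodno INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE city_dim (cityname TEXT PRIMARY KEY, region TEXT);
    -- The single fact table holds foreign keys plus the measure
    CREATE TABLE sales_fact (
        date TEXT REFERENCES time_dim(date),
        custno INTEGER REFERENCES cust_dim(custno),
        prodno INTEGER REFERENCES prod_dim(prodno),
        cityname TEXT REFERENCES city_dim(cityname),
        amount REAL
    );
    INSERT INTO time_dim VALUES ('2024-01-15', 1, 2024), ('2024-02-10', 2, 2024);
    INSERT INTO cust_dim VALUES (1, 'Ana'), (2, 'Ravi');
    INSERT INTO prod_dim VALUES (10, 'Juice'), (11, 'Soap');
    INSERT INTO city_dim VALUES ('Lisbon', 'West'), ('Chennai', 'South');
    INSERT INTO sales_fact VALUES
        ('2024-01-15', 1, 10, 'Lisbon', 30.0),
        ('2024-02-10', 2, 11, 'Chennai', 12.5),
        ('2024-02-10', 1, 10, 'Lisbon', 20.0);
""")

# Every dimension is exactly one join away from the fact table
query = """
    SELECT p.name, t.month, SUM(f.amount)
    FROM sales_fact f
    JOIN prod_dim p ON f.prodno = p.prodno
    JOIN time_dim t ON f.date = t.date
    GROUP BY p.name, t.month
    ORDER BY p.name, t.month
"""
result = conn.execute(query).fetchall()
print(result)  # [('Juice', 1, 30.0), ('Juice', 2, 20.0), ('Soap', 2, 12.5)]
```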
Snowflake schema
Represents dimensional hierarchies directly by normalizing the dimension tables
Easy to maintain and saves storage
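[Figure: snowflake schema; the fact table (date, custno, prodno, cityname, ...) joins to the time, cust, prod, and city dimension tables, and the city table is further normalized into a region table]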
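A minimal snowflake sketch, again with `sqlite3` and hypothetical tables and rows: the region level is moved out of the city dimension into its own `region_dim` table. Each region name is now stored once instead of once per city, which is the storage saving mentioned above, at the cost of an extra join when rolling up to region.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The region level is normalized out of the city dimension
    CREATE TABLE region_dim (region_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE city_dim (
        cityname TEXT PRIMARY KEY,
        region_id INTEGER REFERENCES region_dim(region_id)
    );
    CREATE TABLE sales_fact (cityname TEXT REFERENCES city_dim(cityname), amount REAL);
    INSERT INTO region_dim VALUES (1, 'South'), (2, 'West');
    INSERT INTO city_dim VALUES ('Chennai', 1), ('Madurai', 1), ('Lisbon', 2);
    INSERT INTO sales_fact VALUES ('Chennai', 10.0), ('Madurai', 5.0), ('Lisbon', 7.5);
""")

# Rolling up to region follows the hierarchy: fact -> city -> region
by_region = conn.execute("""
    SELECT r.region, SUM(f.amount)
    FROM sales_fact f
    JOIN city_dim c ON f.cityname = c.cityname
    JOIN region_dim r ON c.region_id = r.region_id
    GROUP BY r.region
    ORDER BY r.region
""").fetchall()
print(by_region)  # [('South', 15.0), ('West', 7.5)]
```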
Before a data source delivers data into the data warehouse database, the data should be corrected (cleaned).
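A minimal sketch of cleaning records before they are loaded. The validity rules used here (a non-empty city name and a non-negative numeric amount) and the field names are hypothetical examples; real rules depend on the data. Records that can be corrected are fixed (whitespace trimmed, case normalized), and the rest are set aside rather than loaded.

```python
def clean(records):
    """Split records into (valid, rejected) after simple corrections."""
    valid, rejected = [], []
    for rec in records:
        rec = dict(rec)  # work on a copy so the caller's data is not mutated
        # Correction: trim whitespace and normalize capitalization
        rec["city"] = rec.get("city", "").strip().title()
        amount = rec.get("amount")
        # Hypothetical validity rules: city present, amount a non-negative number
        if rec["city"] and isinstance(amount, (int, float)) and amount >= 0:
            valid.append(rec)
        else:
            rejected.append(rec)
    return valid, rejected


valid, rejected = clean([
    {"custid": 1, "city": "  chennai ", "amount": 20.0},
    {"custid": 2, "city": "", "amount": 5.0},
    {"custid": 3, "city": "Lisbon", "amount": -4},
])
print(valid)          # [{'custid': 1, 'city': 'Chennai', 'amount': 20.0}]
print(len(rejected))  # 2
```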
Update of data
Updates to tables at the data sources must be propagated to the data warehouse.
If the warehouse tables have the same structure as the source tables, propagating updates is easy.
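A minimal sketch of the easy case, where warehouse rows have the same structure as source rows. It assumes (hypothetically) that every source row carries a `key` and a `modified` timestamp, so the refresh can copy only the rows changed since the last refresh, a common "watermark" approach to incremental loading.

```python
def propagate_updates(source_rows, warehouse, last_refresh):
    """Copy rows changed at the source since `last_refresh` into the warehouse.

    Assumes each source row has a `key` and a `modified` timestamp, and that the
    warehouse table has the same structure, so rows are copied unchanged.
    Returns the new watermark (the latest change seen).
    """
    changed = [r for r in source_rows if r["modified"] > last_refresh]
    for row in changed:
        warehouse[row["key"]] = dict(row)
    # Keep the old watermark if nothing changed
    return max((r["modified"] for r in changed), default=last_refresh)


source = [{"key": 1, "qty": 5, "modified": 100},
          {"key": 2, "qty": 8, "modified": 205}]
warehouse = {1: {"key": 1, "qty": 5, "modified": 100}}
watermark = propagate_updates(source, warehouse, last_refresh=150)
print(sorted(warehouse), watermark)  # [1, 2] 205
```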
Summarizing data
The raw data generated by transactions may be too large to store online, so the warehouse can store summaries of the transactions for easier querying.
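A minimal sketch of summarizing: hypothetical raw transactions (date, product, amount) are collapsed into one total per month and product, so queries at that level scan far fewer rows than the raw data.

```python
from collections import defaultdict

# Hypothetical raw transactions: (date 'YYYY-MM-DD', product, amount)
transactions = [
    ("2024-01-03", "Juice", 4.0),
    ("2024-01-17", "Juice", 6.0),
    ("2024-01-20", "Soap", 2.5),
    ("2024-02-02", "Juice", 5.0),
]


def monthly_summary(rows):
    """Collapse individual transactions into one total per (month, product)."""
    totals = defaultdict(float)
    for day, product, amount in rows:
        totals[(day[:7], product)] += amount  # 'YYYY-MM' prefix is the month key
    return dict(totals)


summary = monthly_summary(transactions)
print(summary)
# {('2024-01', 'Juice'): 10.0, ('2024-01', 'Soap'): 2.5, ('2024-02', 'Juice'): 5.0}
```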
Decision Support
Used to manage and control the business
Data is historical or point-in-time
Optimized for inquiry rather than update
OLAP
[Figure: a request goes to the OLAP server, which queries the data warehouse with SQL and returns the result]
TYPES OF OLAP
MOLAP (Multidimensional OLAP)
ROLAP (Relational OLAP)
Multi-dimensional Data
Hey... I sold $100M worth of goods
Dimensions: Product, Region, Time
Hierarchical summarization paths
[Figure: sales cube with Region (N, S, W), Product (Juice, Cola, Milk, Cream, Toothpaste, Soap), and Month (1 to 7) axes]
Product: Industry → Category → Product
Region: Country → Region → City → Office
Time: Year → Quarter → Month / Week → Day
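Architecture of a Data Warehouse
[Figure: ERP systems, purchased data, and legacy data feed the data acquisition component, which loads the warehouse data; the design, data access, middleware, and management components work with the warehouse data and a metadata repository (data dictionary, information directory)]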
Design Component
The data warehouse designer designs the database of the data warehouse, and the warehouse administrator manages the data warehouse. The designer and the administrator use the design component to design the warehouse and store its data.
Types of design
Bottom-up design: business value can be returned as soon as the first data marts are created.
Top-down design: atomic data, that is, data at the lowest level of detail, are stored in the data warehouse.
Hybrid design: hybrid methodologies have evolved to combine the fast turnaround of bottom-up design with the enterprise-wide data consistency of top-down design.
Management Component
Administering data acquisition operations
Managing backup copies of the data
Recovering lost data
Providing security for the data stored in the data warehouse
Authorizing access to the data stored in the data warehouse
Middleware Component
These components connect the warehouse to the local databases. They include an analytical server, used to analyze multidimensional data, and intelligent data warehousing middleware, which controls access to the warehouse database.
Data mart
A data mart is a database that contains the data a small group of users needs for their own department.
A data mart supports the information requirements of a single department in an organization. It has a smaller data model, a shorter implementation time, less data, and fewer users.
Because each department has its own data mart, it can summarize, sort, select, and structure its own data without confusing it with other departments' data. The department can do whatever DSS processing it wants. Processing and storage costs are lower than for the data warehouse. The department can also choose software for its data mart that is powerful enough to fit its needs.
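A minimal sketch of one way to build a departmental data mart, assuming the mart is derived from the warehouse (sources feed the warehouse, which feeds the marts). The `warehouse_sales` and `grocery_mart` tables and their rows are hypothetical; the mart keeps only one department's data, already summarized to the level that department needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse_sales (dept TEXT, product TEXT, month TEXT, amount REAL);
    INSERT INTO warehouse_sales VALUES
        ('grocery', 'Juice', '2024-01', 10.0),
        ('grocery', 'Milk', '2024-01', 4.0),
        ('personal_care', 'Soap', '2024-01', 2.5);
    -- Departmental data mart: only grocery data, summarized per product and month
    CREATE TABLE grocery_mart AS
        SELECT product, month, SUM(amount) AS total
        FROM warehouse_sales
        WHERE dept = 'grocery'
        GROUP BY product, month;
""")
mart = conn.execute("SELECT * FROM grocery_mart ORDER BY product").fetchall()
print(mart)  # [('Juice', '2024-01', 10.0), ('Milk', '2024-01', 4.0)]
```

Because the mart is a separate table, the department can re-sort, re-summarize, or index it without affecting the warehouse or other departments.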
[Figure: life cycle of prototype, deploy, operate, and enhance]
The builder must forecast how users will use the warehouse. The design should support accessing data by any meaningful values of the attributes. To build a good data warehouse, the data acquisition process must follow these steps:
Extract the data from multiple heterogeneous sources.
Format the data for consistency within the warehouse.
Clean the data to ensure validity.
Convert the data from relational, object-oriented, or hierarchical models to a multidimensional model.
Load the data into the warehouse.
Use good monitoring tools to recover from incorrect loads.
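A compact sketch of the acquisition steps above, using only the standard library. The two sources (a CSV export and a list of dicts standing in for another system), their field names, and the `sales_fact` table are hypothetical. Recovery from a bad load is approximated by running the load in a single `sqlite3` transaction, which is rolled back if any insert fails; real monitoring tools do much more than this.

```python
import csv
import io
import sqlite3

# Hypothetical heterogeneous sources
CSV_SOURCE = "custid,city,amount\n1, chennai ,20\n2,lisbon,abc\n"
OTHER_SOURCE = [{"customer": 3, "town": "Madurai", "value": 7.5}]


def extract():
    # Extract: read each source and map its fields onto common names
    yield from csv.DictReader(io.StringIO(CSV_SOURCE))
    for r in OTHER_SOURCE:
        yield {"custid": r["customer"], "city": r["town"], "amount": r["value"]}


def transform(rows):
    for r in rows:
        # Format: consistent spacing and capitalization
        city = str(r["city"]).strip().title()
        # Clean: skip rows whose amount is not numeric
        try:
            amount = float(r["amount"])
        except ValueError:
            continue
        # Convert: shape each record as a fact-table row
        yield (int(r["custid"]), city, amount)


def load(conn, facts):
    # Load: one transaction, committed on success and rolled back on error
    with conn:
        conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", facts)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (custid INTEGER, city TEXT, amount REAL)")
load(conn, list(transform(extract())))
loaded = conn.execute("SELECT * FROM sales_fact ORDER BY custid").fetchall()
print(loaded)  # [(1, 'Chennai', 20.0), (3, 'Madurai', 7.5)]
```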
Data Mining
Data mining is sorting through data to identify patterns and establish relationships.
Application Areas
Industry: Application
Finance: Credit card analysis
Insurance: Claims, fraud analysis
Telecommunication: Call record analysis
Consumer goods: Promotion analysis
Data service providers: Value added data
Utilities: Power usage analysis
[Figure: data from databases, flat files, and the data warehouse is turned into information]
Data transformation: the selected data are transformed into a form suitable for mining, for example by performing aggregation operations
Data mining: intelligent methods are applied to extract data patterns
Pattern evaluation: identify the patterns of interest
Knowledge presentation: present the mined knowledge to the user
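A toy sketch of the last four steps applied to hypothetical shopping-basket rows. The mining step here is simple co-occurrence counting of item pairs (the support measure used in association-rule mining), and the evaluation step keeps pairs whose support reaches an assumed threshold of 0.5; both the data and the threshold are illustrative choices.

```python
from collections import Counter
from itertools import combinations

# Hypothetical selected data: (basket_id, item) rows
rows = [(1, "Milk"), (1, "Cereal"), (2, "Milk"), (2, "Cereal"), (2, "Juice"),
        (3, "Milk"), (3, "Soap"), (4, "Milk"), (4, "Cereal")]


def transform(rows):
    """Data transformation: aggregate rows into one item set per basket."""
    baskets = {}
    for basket, item in rows:
        baskets.setdefault(basket, set()).add(item)
    return list(baskets.values())


def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    return Counter(pair for b in baskets for pair in combinations(sorted(b), 2))


def evaluate(pair_counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs whose support reaches the threshold."""
    return {pair: c / n_baskets for pair, c in pair_counts.items()
            if c / n_baskets >= min_support}


baskets = transform(rows)
patterns = evaluate(mine(baskets), len(baskets))
# Knowledge presentation: report the surviving patterns to the user
for (a, b), support in patterns.items():
    print(f"{a} and {b} appear together in {support:.0%} of baskets")
# Cereal and Milk appear together in 75% of baskets
```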
Structuring/Modeling Issues
[Figure: data marts and a data warehouse]
True Warehouse
[Figure: data sources feed the data warehouse, which in turn feeds the data marts]
What Is OLAP?
Online Analytical Processing: term coined by E. F. Codd in a 1994 paper contracted by Arbor Software
Generally synonymous with earlier terms such as Decision Support, Business Intelligence, and Executive Information System
OLAP = Multidimensional Database
MOLAP: Multidimensional OLAP (Arbor Essbase, Oracle Express)
ROLAP: Relational OLAP (Informix MetaCube, MicroStrategy DSS Agent)
Result: OLAP shifted from a small vertical niche to a mainstream DBMS category
Strengths of OLAP
It is a powerful visualization paradigm
It provides fast, interactive response times
OLAP Is FASMI
Fast Analysis of Shared Multidimensional Information
[Figure: sales cube with Product and Date axes]
[Figure: sales cube with product (Household, Telecomm, Video, Audio), region (Europe, Far East, India), and sales channel (Retail, Direct, Special) dimensions]
Multidimensional Spreadsheets
Analysts need spreadsheets that support:
pivot tables (cross-tabs)
drill-down and roll-up
slice and dice
sort
selections
derived attributes
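A minimal cross-tab (pivot table) sketch in plain Python, on hypothetical sales rows: regions become rows, products become columns, each cell sums the matching amounts, and a "Total" column is added as a derived attribute computed from the other cells.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, product, amount)
sales = [("North", "Juice", 10), ("North", "Soap", 4), ("South", "Juice", 7),
         ("South", "Juice", 3), ("West", "Soap", 6)]


def pivot(rows):
    """Cross-tab: regions as rows, products as columns, summed amounts in cells."""
    table = defaultdict(lambda: defaultdict(int))
    for region, product, amount in rows:
        table[region][product] += amount
    return table


table = pivot(sales)
products = sorted({p for _, p, _ in sales})
print("Region", *products, "Total", sep="\t")
for region in sorted(table):
    cells = [table[region][p] for p in products]  # missing cells read as 0
    # "Total" is a derived attribute computed from the other cells
    print(region, *cells, sum(cells), sep="\t")
```

This prints one header line followed by `North 10 4 14`, `South 10 0 10`, and `West 0 6 6` (tab-separated).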
OLAP Operations
Roll Up
Drill Down
Single Cell
Multiple Cells
Slice
Dice
Prentice Hall
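A minimal sketch of roll-up, drill-down, slice, and dice on a small cube stored as a dictionary from (product, city, month) to sales. The cells and the city-to-country concept hierarchy are hypothetical illustrations, not data from the figure.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, city, month) -> sales
cube = {
    ("Juice", "Chennai", 1): 5, ("Juice", "Madurai", 1): 3,
    ("Soap", "Chennai", 1): 2, ("Juice", "Chennai", 2): 4,
    ("Soap", "Lisbon", 2): 6,
}
# Hypothetical concept hierarchy for the location dimension
CITY_TO_COUNTRY = {"Chennai": "India", "Madurai": "India", "Lisbon": "Portugal"}


def roll_up(cube):
    """Roll up: climb the location hierarchy from city to country."""
    out = defaultdict(int)
    for (product, city, month), value in cube.items():
        out[(product, CITY_TO_COUNTRY[city], month)] += value
    return dict(out)


def drill_down(cube, country):
    """Drill down: return the city-level cells behind one country."""
    return {k: v for k, v in cube.items() if CITY_TO_COUNTRY[k[1]] == country}


def slice_(cube, month):
    """Slice: fix one dimension (month) to a single value."""
    return {(p, c): v for (p, c, m), v in cube.items() if m == month}


def dice(cube, products, cities):
    """Dice: select a sub-cube using value sets on two dimensions."""
    return {k: v for k, v in cube.items() if k[0] in products and k[1] in cities}


print(roll_up(cube)[("Juice", "India", 1)])  # 8
print(slice_(cube, 2))  # {('Juice', 'Chennai'): 4, ('Soap', 'Lisbon'): 6}
print(dice(cube, {"Juice"}, {"Chennai"}))
# {('Juice', 'Chennai', 1): 5, ('Juice', 'Chennai', 2): 4}
```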
ROLAP (Database Layer and Presentation Layer)
Generate SQL execution plans in the ROLAP engine to obtain OLAP functionality.
MOLAP (Database Layer and Presentation Layer)
Store atomic data in a proprietary data structure (MDDB), pre-calculate as many outcomes as possible, and obtain OLAP functionality via proprietary algorithms running against this data.
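A small sketch contrasting the two approaches on the same hypothetical sales rows. The ROLAP side keeps relational rows in `sqlite3` and generates a `GROUP BY` query per request; the MOLAP side pre-calculates every group-by combination up front into an in-memory dictionary standing in for a proprietary MDDB. Both answer the same question with the same numbers; they differ in when the work is done.

```python
import sqlite3
from collections import defaultdict
from itertools import combinations

ROWS = [("Juice", "North", 10.0), ("Juice", "South", 7.0), ("Soap", "North", 4.0)]
DIMS = ("product", "region")

# ROLAP style: keep relational rows and generate SQL for each request
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)


def rolap_query(group_by):
    # Column names come only from the fixed DIMS tuple, never from user text
    if not group_by:
        sql = "SELECT SUM(amount) FROM sales"
    else:
        cols = ", ".join(group_by)
        sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols}"
    return {tuple(r[:-1]): r[-1] for r in conn.execute(sql)}


# MOLAP style: pre-calculate every group-by combination (the full cube)
def precompute(rows):
    cube = {}
    for k in range(len(DIMS) + 1):
        for group_by in combinations(DIMS, k):
            idx = [DIMS.index(d) for d in group_by]
            agg = defaultdict(float)
            for row in rows:
                agg[tuple(row[i] for i in idx)] += row[-1]
            cube[group_by] = dict(agg)
    return cube


MOLAP_CUBE = precompute(ROWS)
print(rolap_query(("region",)) == MOLAP_CUBE[("region",)])  # True
```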