Data Warehousing Basic Concepts

DATA WAREHOUSING
Basic Concepts
People Making Technology Work
Agenda
Evolution of DWH
Why should we consider Data Warehousing solutions?
Definition of Data Warehouse
Characteristics of DWH
Difference between DWs and OLTP
DWH Life Cycle
DWH Architecture
Dimensional Data Modeling
Star Schema Design
Fact Table
Fact Granularity
Dimension Tables
Snowflake Schema Design
Important aspects of Star Schema & Snow Flake Schema
Data Acquisition (ETL)
ETL Concepts
Evolution of DWH
Traditional approaches to computer system design during the 1980s:
Were not optimized for analysis and reporting
Could not support company-wide reporting from a single system
Often required writing specific computer programs to develop reports, which was slow and expensive
Why should we consider Data Warehousing solutions?
When users request access to a large amount of historical information for reporting purposes, you should strongly consider a warehouse or mart. Users benefit when the information is organized efficiently for this type of access.
Definition of Data Warehousing
A DWH is a type of relational database system designed specifically for query and analysis processing rather than transaction processing.
DWH systems are also called Historical DBs, Read-only DBs, Integrated DBs, Decision Support Systems, Executive Information Systems, and Business Information Systems.
Characteristics of DWH
Subject Oriented
Non Volatile
Integrated
Time Variant
Differences between DWH (OLAP) and OLTP
DWH database (OLAP)
Designed for analysis of business measures by category and attributes.
Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table.
Loaded with consistent, valid data; requires no real-time validation.
Supports few concurrent users relative to OLTP.
OLTP database
Designed for real-time business operations.
Optimized for a common set of transactions, usually adding or retrieving a single row at a time per table.
Optimized for validation of incoming data during transactions; uses validation data tables.
Supports thousands of concurrent users.
Feature            DWH database (OLAP)                     OLTP database
Data structures    Multidimensional database structures    Normalized data structures
Indexes            Many                                    Few
Joins              Few                                     Many
Aggregated data    More                                    Few
No. of users       Few                                     More
Data updates       Periodic update of data                 More data modification
Data volume        Huge volumes of data                    Small volumes of data
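The two access patterns can be sketched side by side. This is only an illustration, assuming a hypothetical `orders` table held in an in-memory SQLite database; the table, column names, and values are not from the slides.

```python
import sqlite3

# Hypothetical orders table (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# OLTP-style work: add or retrieve a single row at a time.
conn.execute("INSERT INTO orders VALUES (1, 'East', 100.0)")
conn.execute("INSERT INTO orders VALUES (2, 'West', 250.0)")
conn.execute("INSERT INTO orders VALUES (3, 'East', 50.0)")
single_order = conn.execute(
    "SELECT region, amount FROM orders WHERE order_id = ?", (2,)
).fetchone()

# OLAP-style work: read many rows and aggregate a measure by category.
totals_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region").fetchall()
)
```

In practice the two workloads usually run on separate databases, each tuned for its own pattern; a single table is used here only to keep the sketch short.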
DWH Life Cycle
Business Analyst
Data Modeler
ETL Developer
Report Developer
Testing
DWH Architecture
Three common architectures are:
DWH Architecture (Basic)
DWH Architecture (with a staging area)
DWH Architecture (with a staging area and data marts)
DWH Architecture (Basic)
DWH Architecture (with a staging area)
DWH Architecture (with a staging area and data marts)
Dimensional Data Modeling

To develop a star schema design, a data modeler follows the dimensional modeling approach. Dimensional modeling is a three-stage process:
Conceptual Modeling
Logical Modeling
Physical Modeling
Before implementing the schema design, a data modeler should work through the following steps:
Understand the client's business requirements
Understand the grain of the fact
Design the dimension tables
Design the fact tables
Example of Dimensional Data Model (Star Schema Design)
Fact Table
Contains numeric measures of the business
Contains facts and is connected to dimensions
Has two types of columns: facts (measures) and foreign keys to dimension tables
May contain date-stamped data
May contain either detail-level facts or facts that have been aggregated
Steps in designing Fact Table

Identify a business process for analysis (such as sales).
Identify measures or facts (sales dollar).
Identify dimensions for the facts (product dimension, location dimension, time dimension, organization dimension).
List the columns that describe each dimension (region name, branch name).
Determine the lowest level of summary in the fact table (sales dollar).
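The design steps above can be sketched as a small star schema. This is a minimal illustration using SQLite; the table names, column names, and sample rows are assumptions, and the organization dimension is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim  (product_id  INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE location_dim (location_id INTEGER PRIMARY KEY, region_name TEXT, branch_name TEXT);
CREATE TABLE time_dim     (time_id     INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- Fact table: one foreign key per dimension plus the numeric measure.
CREATE TABLE sales_fact (
    product_id   INTEGER REFERENCES product_dim(product_id),
    location_id  INTEGER REFERENCES location_dim(location_id),
    time_id      INTEGER REFERENCES time_dim(time_id),
    sales_dollar REAL
);
""")
conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                 [(1, "Widget"), (2, "Gadget")])
conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?)",
                 [(1, "East", "Branch A"), (2, "West", "Branch B")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [(1, 2005, 1), (2, 2005, 2)])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 100.0), (1, 2, 1, 40.0), (2, 1, 2, 75.0), (1, 1, 2, 60.0),
])

# Analyze the measure by two dimensions: join the fact table to each one.
sales_by_product_region = conn.execute("""
    SELECT p.product_name, l.region_name, SUM(f.sales_dollar)
    FROM sales_fact f
    JOIN product_dim p  ON p.product_id  = f.product_id
    JOIN location_dim l ON l.location_id = f.location_id
    GROUP BY p.product_name, l.region_name
    ORDER BY p.product_name, l.region_name
""").fetchall()
```

Each dimension joins directly to the fact table, which is what gives the schema its star shape.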
Types of Facts (Measures)
Additive - Measures that can be added across all dimensions.
Semi Additive - Measures that can be added across some dimensions but not others.
Non Additive - Measures that cannot be added across any dimension.
In the example, the sales fact table is connected to the location, product, time, and organization dimensions. The measure "Sales Dollar" in the sales fact table can be added across all dimensions, independently or in combination, for example:
Sales Dollar value for a particular product
Sales Dollar value for a product in a location
Sales Dollar value for a product in a year within a location
Sales Dollar value for a product in a year within a location sold or serviced by an employee
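The difference between additive and semi-additive measures can be shown with a small sketch. The account balance data below is an assumption for illustration and does not come from the slides: deposits behave like an additive measure, while a balance can be summed across accounts but not across days.

```python
from collections import defaultdict

# Hypothetical daily snapshot rows: (day, account, balance, deposits)
rows = [
    ("2005-01-01", "A", 100.0, 100.0),
    ("2005-01-01", "B", 50.0, 50.0),
    ("2005-01-02", "A", 120.0, 20.0),
    ("2005-01-02", "B", 50.0, 0.0),
]

# Additive: deposits can be summed across accounts and across days.
total_deposits = sum(deposits for _day, _account, _balance, deposits in rows)

# Semi-additive: balances can be summed across accounts within one day...
balance_by_day = defaultdict(float)
for day, _account, balance, _deposits in rows:
    balance_by_day[day] += balance

# ...but not across days. The latest day's total is the meaningful figure.
closing_balance = balance_by_day[max(balance_by_day)]

# Summing balances over days double counts money that simply stayed put.
naive_sum_over_days = sum(balance_by_day.values())
```

Here the closing balance matches the total deposits, while the naive sum over days does not correspond to any real amount.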
Fact Granularity
A fact table holds numerical information. Fact granularity is the level of detail at which that fact information is stored. The level is determined by the dimension tables. For the time dimension, for example: Year? Quarter? Month? Week? Day?
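A short sketch of why the grain matters, assuming hypothetical facts stored at day level: a fine grain can always be rolled up to month, quarter, or year, whereas facts stored only at a coarser grain cannot be broken back down.

```python
from collections import defaultdict
from datetime import date

# Hypothetical facts stored at day grain: (sale_date, sales_dollar)
daily_facts = [
    (date(2005, 1, 1), 100.0),
    (date(2005, 1, 15), 50.0),
    (date(2005, 2, 3), 75.0),
]

def roll_up(facts, key):
    """Aggregate day-grain facts to the coarser grain chosen by `key`."""
    totals = defaultdict(float)
    for sale_date, amount in facts:
        totals[key(sale_date)] += amount
    return dict(totals)

by_month = roll_up(daily_facts, lambda d: (d.year, d.month))
by_quarter = roll_up(daily_facts, lambda d: (d.year, (d.month - 1) // 3 + 1))
by_year = roll_up(daily_facts, lambda d: d.year)
```

The trade-off is storage: a finer grain means more rows in the fact table.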
Dimension Tables
Contain textual information that represents attributes of the business
Contain relatively static data
Are joined to the fact table through a foreign key reference
Are usually smaller than fact tables
Example of Location Dimension
Location Dimension
Columns: Location Dimension Id, Country Name, State Name, County Name, City Name, Date Time Stamp
Sample rows (Location Dimension Id values not listed):

Country Name   State Name   County Name   City Name     Date Time Stamp
USA            New York     Shelby        Manhattan     1/1/2005 11:23:31 AM
USA            Florida      Jefferson     Panama City   1/1/2005 11:23:31 AM
USA            California   Montgomery    San Jose      1/1/2005 11:23:31 AM
USA            New Jersey   Hudson        Jersey City   1/1/2005 11:23:31 AM
Star Schema Design benefits
Easy for users to understand
Fast response to queries
Supports multidimensional analysis
Supported by many front-end tools
Snowflake Schema Design

Dimension table hierarchies are broken into simpler tables
Some organizations normalize the dimension tables to save space
Both fact and dimension tables are normalized
Increases the number of joins, which slows data retrieval
May become large and unmanageable
Degrades query performance
Example of Snowflake Schema
Important aspects of Star Schema & Snow Flake Schema
In a star schema, every dimension has a primary key.
In a star schema, a dimension table has no parent table, whereas in a snowflake schema a dimension table has one or more parent tables.
In a star schema, hierarchies for the dimensions are stored in the dimension table itself, whereas in a snowflake schema the hierarchies are broken into separate tables.
These hierarchies help to drill down the data from the topmost levels to the lowermost levels.
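The contrast can be sketched with a location hierarchy stored both ways. The table and column names are assumptions for illustration, using SQLite: the star version keeps country, state, and city in one table, while the snowflake version gives each level its own table with a reference to its parent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the whole hierarchy lives in a single dimension table.
CREATE TABLE location_dim_star (
    location_id INTEGER PRIMARY KEY,
    country_name TEXT, state_name TEXT, city_name TEXT
);
INSERT INTO location_dim_star VALUES
    (1, 'USA', 'New York', 'Manhattan'),
    (2, 'USA', 'New Jersey', 'Jersey City');

-- Snowflake: each hierarchy level is a separate table with a parent table.
CREATE TABLE country (country_id INTEGER PRIMARY KEY, country_name TEXT);
CREATE TABLE state (
    state_id INTEGER PRIMARY KEY, state_name TEXT,
    country_id INTEGER REFERENCES country(country_id)
);
CREATE TABLE location_dim_snow (
    location_id INTEGER PRIMARY KEY, city_name TEXT,
    state_id INTEGER REFERENCES state(state_id)
);
INSERT INTO country VALUES (1, 'USA');
INSERT INTO state VALUES (1, 'New York', 1), (2, 'New Jersey', 1);
INSERT INTO location_dim_snow VALUES (1, 'Manhattan', 1), (2, 'Jersey City', 2);
""")

# Star: no joins are needed to read the full hierarchy.
star_rows = conn.execute("""
    SELECT country_name, state_name, city_name
    FROM location_dim_star ORDER BY location_id
""").fetchall()

# Snowflake: the same answer needs one join per parent table.
snow_rows = conn.execute("""
    SELECT c.country_name, s.state_name, l.city_name
    FROM location_dim_snow l
    JOIN state s   ON s.state_id   = l.state_id
    JOIN country c ON c.country_id = s.country_id
    ORDER BY l.location_id
""").fetchall()
```

Both queries return the same rows; the snowflake form stores 'USA' once instead of once per city, at the cost of two extra joins.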
Data Acquisition
Data acquisition is the process of extracting relevant business information from different source systems, transforming the data from one format into another, integrating it into a homogeneous format, and loading it into a warehouse database. It has three stages:
Data Extraction (E)
Data Transformation (T)
Data Loading (L)
Sample ETL Process Flow
ETL Process
The ETL process has the following basic steps:
Mapping the data between the source systems and the target database
Cleansing the source data in the staging area
Transforming the cleansed source data and then loading it into the target system
Source System - A database, application, file, or other storage facility from which the data in a data warehouse is derived.
Mapping - The definition of the relationship and data flow between source and target objects.
Staging Area - A place where data is processed before entering the warehouse.
Cleansing - The process of resolving inconsistencies and fixing anomalies in source data, typically as part of the ETL process.
Transformation - The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.
Transportation - The process of moving copied or transformed data from a source to a data warehouse.
Target System - A database, application, file, or other storage facility into which the transformed source data is loaded in a data warehouse.
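The ETL terms above can be tied together in a small end-to-end sketch. Everything here is an assumption for illustration: the source is a CSV string, the staging area is an in-memory list, and the target is an in-memory SQLite table named `sales_by_region`.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real source could be a database,
# an application, or a file.
SOURCE_CSV = """region,amount
 east ,100
West,250
east,
EAST,50
"""

def extract(text):
    """Extract: read raw rows from the source system."""
    return list(csv.DictReader(io.StringIO(text)))

def cleanse(rows):
    """Cleanse: resolve inconsistent region spelling, drop rows with no amount."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        clean.append({"region": row["region"].strip().title(),
                      "amount": float(amount)})
    return clean

def transform(rows):
    """Transform: aggregate the cleansed rows to one total per region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return sorted(totals.items())

def load(conn, records):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", records)
    conn.commit()

target = sqlite3.connect(":memory:")
load(target, transform(cleanse(extract(SOURCE_CSV))))
loaded = target.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
```

The mapping between source and target is implicit in the function bodies here; ETL tools usually make it an explicit, configurable definition.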
Thank You !!!
