Abstract
Testing is an essential part of the design life-cycle of any software product. Data
warehouse testing is particularly important because users need to trust the quality
of the information they access. The drivers for this are the increase in enterprise
mergers and acquisitions, compliance regulations, management's growing focus on
data, and data-driven decision making. A data warehouse is a collection of large
amounts of data that management uses to make strategic decisions.
This paper looks at the different strategies for testing a data warehouse
application. It suggests various approaches that can be beneficial while testing the
ETL process in a DW. A data warehouse is a critical business application, and
defects in it result in business losses that cannot be accounted for. Here, we walk
you through some of the basic phases and strategies to minimize defects.
The focus is on the different components in a data warehouse architecture, its
design, and aligning the test strategy accordingly. Data storage has become cheaper
and easier, and data-driven decisions have proved to be more accurate. In this
context, testing data warehouse implementations is of utmost significance.
Organizational decisions depend entirely on enterprise data, so the data must be of
the utmost quality. Successful data warehousing will help investors, business
leaders, and project managers.
Intended Audience:
Test Practitioners and Engineers, Software and Test Managers, QA Managers and
Development Managers as well as other professionals interested in building and
delivering better software.
Objectives:
1. Provide an overview of Data warehouse testing
2. Data warehouse Life cycle & Testing activities for DW Projects
3. Test Data needs for DW Projects
4. Address challenges for DW Testing like voluminous data and heterogeneous
sources
5. Highlight Business Case Studies of successful DW Testing & Implementations
Table of Contents
1. Introduction to Data Warehouse .............................................. 4
2. Data Warehouse Testing Approach ............................................. 5
3. Methodological Framework .................................................... 6
4. A Timeline for Testing ...................................................... 7
5. Types of Data warehouse Testing ............................................. 7
6. ETL Testing check points .................................................... 9
7. KEY hot points in ETL Testing .............................................. 10
8. Database Testing VS. DW Testing ............................................ 12
9. Challenges in DW Testing ................................................... 13
10. Tools: Data Warehouse Testing ............................................. 13
11. Case Study ................................................................ 14
    The Client ................................................................ 14
    The Solution .............................................................. 14
    The Benefit ............................................................... 15
12. Conclusion and Lessons Learnt ............................................. 15
References .................................................................... 16
Author Bio .................................................................... 16
Appendix ...................................................................... 16
[Figure: Data warehouse architecture. Data flows from the data sources through a
staging area and an operational data source into the dimensional data warehouse,
which feeds the reporting layer (Report 1, Report 2, Report 3).]
3. Methodological Framework
Below are the stages involved in the ETL process:
Requirements Elicitation: Requirements are elicited from users and
represented in either an informal or a formal way.
Analysis and reconciliation: Data sources are inspected, normalized, and
integrated to obtain a reconciled schema.
Conceptual design: A conceptual schema for the data mart is designed
considering both user requirements and the data available in the reconciled
schema.
Logical design: A logical schema for the data mart is obtained by properly
translating the conceptual schema.
Data staging design: ETL procedures are designed considering the source
schema, the reconciled schema, and the data mart logical schema.
Physical design: This includes index selection, schema fragmentation, and all
other issues related to physical allocation.
Implementation: This includes the implementation of ETL procedures and the
creation of front-end reports.
4. A Timeline for Testing
From a methodological point of view, the three main phases of testing are:
Create a test plan. The test plan describes the tests that must be performed
and their expected coverage of the system requirements.
Prepare test cases. Test cases enable the implementation of the test plan by
detailing the testing steps together with their expected results. The reference
databases for testing should be prepared during this phase, and a wide,
comprehensive set of representative workloads should be defined.
Execute tests. A test execution log tracks each test along with its results.
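The execution-log step above can be sketched as a minimal structure; the field names (`test_id`, `executed_at`, `result`, `notes`) are illustrative, not a prescribed schema:

```python
# Sketch: a minimal test execution log tracking each test case and its result.
from datetime import datetime, timezone

def log_result(log, test_id, passed, notes=""):
    log.append({
        "test_id": test_id,
        "executed_at": datetime.now(timezone.utc).isoformat(),
        "result": "PASS" if passed else "FAIL",
        "notes": notes,
    })

log = []
log_result(log, "TC-001", True)
log_result(log, "TC-002", False, "row count mismatch in fact table")

assert [entry["result"] for entry in log] == ["PASS", "FAIL"]
```

In practice the same information usually lives in a test management tool; the point is that every executed test leaves a timestamped, queryable record.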
7. KEY hot points in ETL Testing
There are several levels of testing that can be performed during data warehouse
testing and they should be defined as part of the testing strategy in different phases
of testing. Some examples are described below:
Constraint Testing: During constraint testing, the objective is to validate unique
constraints, primary keys, foreign keys, indexes, and relationships. The test script
should include these validation points. Some ETL processes can be developed to
validate constraints during the loading of the warehouse. If the decision is made to
add constraint validation to the ETL process, the ETL code must validate all business
rules and relational data requirements.
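A minimal sketch of the uniqueness and referential-integrity checks described above; the `customers`/`orders` tables and their key columns are hypothetical:

```python
# Sketch: validating a primary-key constraint and a foreign-key relationship
# on loaded rows. Table and column names are illustrative only.

def check_primary_key(rows, key):
    """Return values of `key` that appear more than once (should be empty)."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row[key]
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return duplicates

def check_foreign_key(child_rows, fk, parent_rows, pk):
    """Return foreign-key values with no matching parent row (orphans)."""
    parent_keys = {row[pk] for row in parent_rows}
    return {row[fk] for row in child_rows if row[fk] not in parent_keys}

customers = [{"customer_id": 1}, {"customer_id": 2}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},  # orphan: no customer 3
]

assert check_primary_key(customers, "customer_id") == set()
assert check_foreign_key(orders, "customer_id", customers, "customer_id") == {3}
```

The same checks are often expressed as SQL anti-joins against the warehouse itself; the in-memory version above just makes the logic explicit.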
Source to Target Counts: The objective of the count test scripts is to determine if
the record counts in the source match the record counts in the target. Some ETL
processes are capable of capturing record count information such as records read,
records written, records in error, etc.
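The count reconciliation reduces to a single invariant over the three figures mentioned above; the numbers below are illustrative:

```python
# Sketch: reconciling ETL record counts. The invariant checked is
# records_read == records_written + records_in_error; anything else means
# rows were silently lost or duplicated.

def counts_reconcile(records_read, records_written, records_in_error):
    return records_read == records_written + records_in_error

assert counts_reconcile(1000, 990, 10)
assert not counts_reconcile(1000, 985, 10)  # 5 records unaccounted for
```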
Source to Target Data Validation: No ETL process is smart enough to perform
source to target field-to-field validation. This piece of the testing cycle is the most
labor intensive and requires the most thorough analysis of the data. There are a
variety of tests that can be performed during source to target validation. Below is
a list of tests that are considered best practices:
Threshold testing
Field-to-field testing
Initialization
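The three test types listed above can be sketched on hand-built rows; all field names, the source-to-target mapping, and the default values are hypothetical:

```python
# Sketch of threshold, field-to-field, and initialization tests on
# illustrative source/target rows.

def field_to_field(source_row, target_row, mapping):
    """Compare mapped source fields against target fields; return mismatches."""
    return {
        tgt: (source_row[src], target_row[tgt])
        for src, tgt in mapping.items()
        if source_row[src] != target_row[tgt]
    }

def threshold(value, low, high):
    """Threshold test: flag values outside an expected range."""
    return low <= value <= high

def initialized(target_row, defaults):
    """Initialization test: fields absent from the source get their defaults."""
    return all(target_row.get(field) == d for field, d in defaults.items())

src = {"cust_name": "Acme", "amount": 120.0}
tgt = {"customer_name": "Acme", "order_amount": 120.0, "load_flag": "Y"}

assert field_to_field(src, tgt,
                      {"cust_name": "customer_name",
                       "amount": "order_amount"}) == {}
assert threshold(tgt["order_amount"], 0, 100000)
assert initialized(tgt, {"load_flag": "Y"})
```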
Transformation and Business Rules: Tests verify all possible outcomes of the
transformation rules, default values, and straight moves, as specified in the
business specification document. As a special mention, boundary conditions must be
tested on the business rules.
Batch Sequence & Dependency Testing: ETLs in a DW are essentially a set of
processes that execute in a particular sequence. Dependencies exist among the
various processes, and these are critical for maintaining the integrity of the
data. Executing the sequences in the wrong order can result in inaccurate data in
the warehouse. The testing process must include at least two iterations of the
end-to-end execution of the whole batch sequence, and the data must be checked for
integrity during this testing. The most common types of errors caused by an
incorrect sequence are referential integrity failures, incorrect end-dating (if
applicable), and rejected records.
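One way to verify a batch ran in a valid order is to compare the execution log against the declared dependencies; the job names and dependency graph here are illustrative:

```python
# Sketch: checking that an observed execution order respects declared
# job dependencies. `dependencies` maps each job to the set of jobs that
# must complete before it.

def sequence_respects_dependencies(execution_order, dependencies):
    position = {job: i for i, job in enumerate(execution_order)}
    return all(
        position[dep] < position[job]
        for job, deps in dependencies.items()
        for dep in deps
    )

deps = {"load_facts": {"load_dims"}, "load_dims": {"stage_extract"}}

assert sequence_respects_dependencies(
    ["stage_extract", "load_dims", "load_facts"], deps)
assert not sequence_respects_dependencies(
    ["stage_extract", "load_facts", "load_dims"], deps)  # facts before dims
```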
Job Restart Testing: In a real production environment, ETL jobs/processes can fail
for a number of reasons (for example, database failures or connectivity failures),
and a job can fail when only partly executed. A good design always allows the jobs
to be restarted from the point of failure. Although this is more of a design
suggestion/approach, it is suggested that every ETL job is built and tested for
restart capability.
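Restart capability can be exercised with a checkpointed load; this sketch simulates a mid-run failure and verifies that the rerun neither skips nor duplicates records (the checkpointing scheme is illustrative):

```python
# Sketch: a restartable load that resumes from the last committed checkpoint
# instead of reprocessing (and duplicating) earlier records.

def load_with_restart(records, checkpoint, fail_at=None):
    """Process records after `checkpoint`; return (loaded, new_checkpoint)."""
    loaded = []
    for i, record in enumerate(records):
        if i < checkpoint:
            continue  # already committed in a previous run
        if fail_at is not None and i == fail_at:
            return loaded, i  # simulate a mid-run failure
        loaded.append(record)
    return loaded, len(records)

records = ["r0", "r1", "r2", "r3"]
first, ckpt = load_with_restart(records, 0, fail_at=2)  # fails after r0, r1
assert (first, ckpt) == (["r0", "r1"], 2)
second, ckpt = load_with_restart(records, ckpt)         # restart at failure point
assert second == ["r2", "r3"]                           # no duplicates reloaded
```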
Error Handling: Understanding why a script fails during data validation, and
confirming that failure through process validation, shows whether the ETL process
is working as designed. During process validation the testing team will work to
identify additional data cleansing needs, as well as identify consistent error
patterns that could possibly be averted by modifying the ETL code.
Whether to take the time to modify the ETL process will need to be determined by
the project manager, development lead, and business integrator. It is the
responsibility of the validation team to identify any and all records that seem
suspect.
Once a record has been both data and process validated and the script has passed,
the ETL process is functioning correctly. Conversely, if suspect records have been
identified and documented during data validation that are not supported by process
validation, the ETL process is not functioning correctly. The development team will
need to become involved in finding the appropriate solution. For example, during
the execution of the source to target count scripts, suspect counts may be
identified (there are fewer records in the target table than in the source
table). The
records that are missing should be captured during the error process and can be
found in the error log. If those records do not appear in the error log, the ETL
process is not functioning correctly and the development team needs to become
involved.
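The reconciliation rule described above (every record missing from the target must appear in the error log) can be sketched as a set difference; the key sets are illustrative:

```python
# Sketch: any source record that is neither in the target nor in the error
# log has been silently dropped, which means the ETL process is broken.

def unaccounted_records(source_keys, target_keys, error_log_keys):
    missing = set(source_keys) - set(target_keys)
    return missing - set(error_log_keys)

source = {1, 2, 3, 4, 5}
target = {1, 2, 4}
error_log = {3}  # record 3 was rejected and logged

# Record 5 is missing from the target but absent from the error log:
assert unaccounted_records(source, target, error_log) == {5}
```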
Negative Testing: Negative testing checks whether the application fails where it
should fail, using invalid inputs and out-of-boundary scenarios, and verifies the
behavior of the application in those cases.
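A sketch of negative tests against a hypothetical row validator; the rules (required `customer_id`, age bounds of 0 to 150) are invented for illustration:

```python
# Sketch: negative tests feed deliberately invalid rows and assert that the
# validator rejects them rather than letting them load.

def validate_row(row):
    if not row.get("customer_id"):
        raise ValueError("missing customer_id")
    if not (0 <= row.get("age", -1) <= 150):
        raise ValueError("age out of bounds")
    return True

assert validate_row({"customer_id": 7, "age": 34})  # valid row passes

for bad in ({"age": 34},                       # missing customer_id
            {"customer_id": 7, "age": -1},     # below boundary
            {"customer_id": 7, "age": 200}):   # above boundary
    try:
        validate_row(bad)
        raise AssertionError("invalid row was accepted")
    except ValueError:
        pass  # expected: the row was rejected
```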
8. Database Testing VS. DW Testing
The difference between a database and a data warehouse is not just a matter of data volume.
ETL is the building block for a Data warehouse. Data warehouse testing thus should
be aligned with the data modeling underlying a data warehouse. Specific test
strategies should be designed for Extraction, Transformation and for the loading
modules.
Database Testing
Smaller in scale
Usually used to test data at the source instead of testing using a GUI
Usually homogeneous data
Normalized data
CRUD operations
Consistent data
9. Challenges in DW Testing
Voluminous data from heterogeneous sources.
Data quality is not assured at the source.
Estimation is difficult: only the data volume might be available, with no accurate
picture of the quality of the underlying data.
Business knowledge: organization-wide enterprise data knowledge may not be
feasible.
100% data verification is not feasible. In such cases, the extraction,
transformation, and loading components must be thoroughly tested to ensure all
types of data behave as expected.
Very high cost of quality: defects that slip through translate into significantly
higher costs.
Heterogeneous data sources are updated asynchronously.
Transaction-level traceability is difficult to attain in a data warehouse.
10. Tools: Data Warehouse Testing
There are no standard guidelines on the tools that can be used for data warehouse
testing. The majority of teams go with the tool that was used for the data
warehouse implementation. A drawback of this approach is redundancy: the same
transformation logic needs to be implemented once for the DWH implementation and
again for its testing.
Tool selection also depends on the test strategy: exhaustive verification,
sampling, aggregation, and so on. Reusability and scalability of the test suite
being developed are also very important factors to consider.
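When exhaustive verification is ruled out, a seeded sample keeps spot checks reproducible across runs; this sketch assumes rows keyed by an integer surrogate key, which is an illustrative setup:

```python
# Sketch: sampling-based verification for when row-by-row comparison of the
# whole warehouse is infeasible. A fixed seed makes the sample repeatable.
import random

def sample_keys(all_keys, sample_size, seed=42):
    rng = random.Random(seed)
    keys = sorted(all_keys)
    return rng.sample(keys, min(sample_size, len(keys)))

def verify_sample(source, target, keys):
    """Compare source and target rows for the sampled keys only."""
    return [k for k in keys if source.get(k) != target.get(k)]

source = {k: k * 10 for k in range(1000)}
target = dict(source)
target[7] = -1  # seed a defect

keys = sample_keys(source.keys(), 100)
assert len(keys) == 100
assert verify_sample(source, target, [7]) == [7]       # defect found when sampled
assert verify_sample(source, target, [1, 2, 3]) == []  # clean rows pass
```

Sampling trades coverage for cost: a defect confined to unsampled rows goes unseen, which is why it is usually combined with full count and aggregate checks.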
11. Case Study
The Client
Our client is an investment advisory firm managing assets for institutional and
private clients worldwide, valued at $42.4 billion in total assets and $34.2
billion in institutional/private client assets.
The Challenge
Our client generated all financial information and reports via manual calculations,
delivering the final reports in Excel format. This did not satisfy customer needs,
and customers threatened to take their business to competitors where they could get
reports in an industry-standard format. The challenge: implement a Financial
Reporting Solution
within a year. To achieve this, our client needed to build an infrastructure and a data
warehouse that would store data to generate ad hoc and canned reports in a timely
manner. But, the biggest challenge was how to validate the reports and ensure that
all the calculations were accurate. The solution was to find a testing partner with
extensive experience validating financial reports. This is exactly what SQA
Solution offered: experienced financial reporting QA engineers with strong
financial backgrounds and the technical skills to deliver a high-quality,
bullet-proof reporting solution.
The Solution
The SQA Solution team began by assessing the work and offered a Free Rapid
Assessment to understand the scope of work, schedule, and the resourcing needs.
We assembled a team of eight: one QA Lead and seven Senior QA Engineers. Our QA
lead was responsible for the overall Test strategy, Test Planning, Daily Status
reporting, and day-to-day team management.
Our team assessed the reporting requirements, data sources, and data targets. We also
reviewed source to target maps and came up with 800+ test cases that focused on
ensuring:
Data Completeness: ensuring all expected data is loaded.
Data Transformation: ensuring all data is transformed correctly according to
business rules and/or design specifications.
Data Quality: ensuring the ETL application correctly rejects, substitutes default
values for, corrects, or ignores and reports invalid data.
Performance & Scalability: ensuring data loads and queries perform within expected
time frames and that the technical architecture is scalable.
Reporting UI Testing: verifying the reports' user interface.
Data Calculations
Integration Testing
Compatibility Testing
User-acceptance Testing
Regression Testing
The Benefit
A high-quality data warehouse and reporting solution helped the client retain
existing customers and acquire new ones.
Achieved optimal performance for complex reports.
Delivered a great user experience.
Met project deadlines.
12. Conclusion and Lessons Learnt
In this paper we proposed a comprehensive approach which adapts and extends the
testing methodologies proposed for general-purpose software to the peculiarities of
data warehouse projects. Our proposal builds on a set of tips and suggestions
coming from our direct experience on real projects, as well as from interviews we
conducted with data warehouse practitioners. As a result, a set of relevant testing
activities were identified, classified, and framed within a reference design
methodology.
To validate our approach on a case study, we are currently supporting a
professional design team engaged in a large data warehouse project, which will help
us better focus on relevant issues such as test coverage and test documentation. In
particular, to better validate our approach and understand its impact, we will
apply it to one of two data marts developed in parallel, so as to assess the extra
effort due to comprehensive testing on the one hand, and the savings in
post-deployment error correction activities and the gains in terms of better data
and design quality on the other.
Author Bio
Prasuna Potteti
| System Integration | Deloitte Consulting
Prasuna Potteti is a senior consultant with core competency in software testing, data
warehouse testing, test data management, and testability. Familiar with many test
tools, she helps teams develop test plans, test cases, and test execution
strategies. She has helped develop automated functional tests, as well as
performance tests, for packaged applications and custom development.
Prasuna has been actively involved in competency building in testing COE (Center of
Excellence), Deloitte.
Appendix
DW - Data Warehouse
ETL - Extract, Transform & Load
ODS - Operational Data Source
QA - Quality Assurance
IT - Information Technology