
ETL Test Scenarios and Test Cases

Based on my experience, I have prepared a comprehensive set of test scenarios and test cases to validate the ETL process. I will keep updating this content. Thanks.
Each test scenario below is followed by its test cases.

Mapping doc validation

1. Verify whether the mapping doc provides the corresponding ETL information. A change log should be maintained in every mapping doc.
2. Define a default test strategy if the mapping doc leaves out optional information, e.g. data types, lengths, etc.

Structure validation

1. Validate the source and target table structures against the corresponding mapping doc (a sample structure comparison query is shown below).
2. Source and target data types should be the same.
3. The lengths of the data types in both source and target should be equal.
4. Verify that data field types and formats are specified.
5. The source data type length should not be greater than the target data type length; otherwise data may be truncated.
6. Validate the column names in the table against the mapping doc.
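
Where both the source and the target are database tables, this structure comparison can be scripted from the data dictionary. The sketch below is only illustrative: it assumes a database that exposes INFORMATION_SCHEMA (e.g. SQL Server, MySQL, PostgreSQL; Oracle uses ALL_TAB_COLUMNS instead), and SRC_CUSTOMER / TGT_CUSTOMER are placeholder table names.

    -- columns whose type differs or whose target length is smaller than the source length
    SELECT s.column_name,
           s.data_type AS src_type, t.data_type AS tgt_type,
           s.character_maximum_length AS src_length,
           t.character_maximum_length AS tgt_length
    FROM   information_schema.columns s
    JOIN   information_schema.columns t
           ON t.column_name = s.column_name
    WHERE  s.table_name = 'SRC_CUSTOMER'
    AND    t.table_name = 'TGT_CUSTOMER'
    AND   (s.data_type <> t.data_type
           OR s.character_maximum_length > t.character_maximum_length);
    -- any row returned points to a type mismatch or possible truncation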

Constraint Validation

Ensure that the constraints defined for each table are as expected (a sample dictionary query is shown below).
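
One way to script this check, assuming a database that exposes INFORMATION_SCHEMA (in Oracle the equivalent dictionary view is ALL_CONSTRAINTS); TGT_CUSTOMER is a placeholder table name:

    -- list the constraints actually defined on the target table
    SELECT constraint_name, constraint_type   -- PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK
    FROM   information_schema.table_constraints
    WHERE  table_name = 'TGT_CUSTOMER';
    -- compare the result against the constraints listed in the mapping doc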

Data Consistency Issues

1. The data type and length for a particular attribute may vary across files or tables even though the semantic definition is the same.
Example: an account number may be defined as NUMBER(9) in one field or table and VARCHAR2(11) in another.
2. Misuse of integrity constraints: when referential integrity constraints are misused, foreign key values may be left dangling or inadvertently deleted.
Example: an account record is missing but its dependent records are not deleted (see the orphan-record query below).
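
A dangling foreign key can be caught with an anti-join. The table and column names below (ACCOUNT, ACCOUNT_TXN, ACCOUNT_NO) are placeholders for the parent table, the dependent table, and the key column:

    -- dependent rows whose parent account record is missing
    SELECT d.*
    FROM   ACCOUNT_TXN d
    LEFT JOIN ACCOUNT a ON a.ACCOUNT_NO = d.ACCOUNT_NO
    WHERE  a.ACCOUNT_NO IS NULL;
    -- every row returned is a dangling (orphaned) foreign key value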

Data Completeness Issues

Ensure that all expected data is loaded into the target table.


1. Compare record counts between source and target, and check for any rejected records.
2. Check that data is not truncated in the columns of the target table.
3. Check boundary values (e.g. only data from year 2008 onwards has to be loaded into the target).
4. Compare the unique values of key fields between the source data and the data loaded into the warehouse. This is a valuable technique that points out a variety of possible data errors without doing a full validation on all fields (sample queries are shown below).
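
The count and key comparisons can be expressed as simple queries. SRC_SALES, TGT_SALES, and SALE_ID are placeholder names, and MINUS is Oracle syntax (use EXCEPT in most other databases):

    SELECT COUNT(*) FROM SRC_SALES;   -- source count
    SELECT COUNT(*) FROM TGT_SALES;   -- target count; the two should match, net of valid rejects

    -- key values present in the source but missing from the target
    SELECT SALE_ID FROM SRC_SALES
    MINUS
    SELECT SALE_ID FROM TGT_SALES;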

Data Correctness Issues

1. Data that is misspelled or inaccurately recorded.


2. Null, non-unique, or out-of-range data may be stored when integrity constraints are disabled.
Example: the primary key constraint is disabled during an import, and rows with null unique identifiers are added to the existing data.

Data Transformation

1. Create a spreadsheet of input data scenarios and expected results, and validate these with the business customer. This is an excellent requirements elicitation step during design and can also be used as part of testing.
2. Create test data that includes all scenarios. Work with an ETL developer to automate populating the data sets from the scenario spreadsheet, so the data can be regenerated easily as scenarios change.
3. Use data profiling results to compare the range and distribution of values in each field between the source and target data.
4. Validate accurate processing of ETL-generated fields, for example surrogate keys (sample checks are shown below).
5. Validate that the data types within the warehouse are the same as specified in the data model or design.
6. Create data scenarios between tables that test referential integrity.
7. Validate parent-to-child relationships in the data. Create data scenarios that test the handling of orphaned child records.
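
For point 4, a minimal sketch of the surrogate key checks, assuming a placeholder dimension table TGT_DIM_CUSTOMER with an ETL-generated key CUSTOMER_SK:

    -- surrogate key values generated more than once
    SELECT CUSTOMER_SK, COUNT(*) AS cnt
    FROM   TGT_DIM_CUSTOMER
    GROUP BY CUSTOMER_SK
    HAVING COUNT(*) > 1;

    -- surrogate key values that were never populated
    SELECT COUNT(*) FROM TGT_DIM_CUSTOMER WHERE CUSTOMER_SK IS NULL;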

Data Quality

1. Number check: if a number is stored in the source with a prefix (e.g. xx_30) but the target should hold only the number (30), validate that the value is loaded without the prefix (xx_).
2. Date check: dates have to follow the agreed date format, and it should be the same across all records (e.g. the standard format yyyy-mm-dd).
3. Precision check: the precision value should display as expected in the target table.
Example: the source holds 19.123456 but the target should display it as 19.123 or as a rounded value.
4. Data check: based on business logic, records that do not meet certain criteria should be filtered out.
Example: only records with date_sid >= 2008 and GLAccount != CM001 should be loaded into the target table (a sample query is shown below).
5. Null check: some columns should hold null based on the business requirement.
Example: the Termination Date column should be null unless the Active Status column is T or Deceased.
Note: data cleanness rules are decided during the design phase only.
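
For the data check in point 4, the filter criteria from the example can be turned into a reverse query against the target; TGT_GL_FACT is a placeholder table name:

    -- rows loaded into the target that violate the stated filter criteria
    SELECT *
    FROM   TGT_GL_FACT
    WHERE  date_sid < 2008
    OR     GLAccount = 'CM001';
    -- the query should return zero rows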

Null Validation

Verify that there are no null values in columns where "Not Null" is specified (a sample query is shown below).
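
A minimal null check, assuming a placeholder table TGT_CUSTOMER whose CUSTOMER_NAME column is marked "Not Null" in the mapping doc:

    -- nulls in a column defined as NOT NULL
    SELECT COUNT(*) AS null_count
    FROM   TGT_CUSTOMER
    WHERE  CUSTOMER_NAME IS NULL;
    -- expected result: 0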

Duplicate check

1. Validate that the unique key, the primary key, and any other columns that should be unique as per the business requirements do not have duplicate rows.
2. Check whether duplicate values exist in any column that is built by extracting multiple source columns and combining them into one.
3. Sometimes, as per the client requirements, we need to ensure that there are no duplicates in a combination of multiple columns within the target.
Example: one policy holder can take multiple policies and multiple claims. In this case we need to verify the combination of CLAIM_NO, CLAIMANT_NO, COVEREGE_NAME, EXPOSURE_TYPE, EXPOSURE_OPEN_DATE, EXPOSURE_CLOSED_DATE, EXPOSURE_STATUS, PAYMENT (a sample query is shown below).
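
A duplicate check on a column combination can be written with GROUP BY / HAVING. The target table name TGT_CLAIM_EXPOSURE is a placeholder, and the GROUP BY list should be extended to the full column combination from the example above:

    -- combinations that occur more than once in the target
    SELECT CLAIM_NO, CLAIMANT_NO, COVEREGE_NAME, EXPOSURE_TYPE, COUNT(*) AS cnt
    FROM   TGT_CLAIM_EXPOSURE
    GROUP BY CLAIM_NO, CLAIMANT_NO, COVEREGE_NAME, EXPOSURE_TYPE
    HAVING COUNT(*) > 1;
    -- any row returned is a duplicate combination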

DATE Validation

Date values are used in many areas of ETL development:

1. To know the row creation date, e.g. CRT_TS.
2. To identify active records from the ETL development perspective, e.g. VLD_FROM, VLD_TO.
3. To identify active records from the business requirements perspective, e.g. CLM_EFCTV_T_TS, CLM_EFCTV_FROM_TS.
4. Sometimes the updates and inserts are generated based on the date values.
Possible test scenarios to validate the date values:
a. From_Date should not be greater than To_Date.
b. The format of date values should be proper.
c. Date values should not contain junk values or null values (a sample query is shown below).
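
A sketch of scenarios (a) and (c) combined, using the VLD_FROM / VLD_TO columns mentioned above and a placeholder table name TGT_CLAIM:

    -- active-record date ranges that are reversed or missing
    SELECT *
    FROM   TGT_CLAIM
    WHERE  VLD_FROM > VLD_TO
    OR     VLD_FROM IS NULL
    OR     VLD_TO   IS NULL;
    -- the query should return zero rows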

Complete Data Validation (using minus and intersect)

1. To validate the complete data set between the source and target tables, a minus query is the best solution.
2. We need to run source minus target and target minus source.
3. If a minus query returns any rows, those should be considered mismatching rows.
4. We also need to find the matching rows between source and target using an intersect statement.
5. The count returned by the intersect should match the individual counts of the source and target tables.
6. If the minus queries return zero rows and the intersect count is less than the source count or the target count, then we can conclude that duplicate rows exist.
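
Putting the above together, with placeholder table names SRC_POLICY and TGT_POLICY (MINUS is Oracle syntax; use EXCEPT in most other databases):

    -- rows in the source that are missing or different in the target
    SELECT * FROM SRC_POLICY
    MINUS
    SELECT * FROM TGT_POLICY;

    -- rows in the target that do not exist in the source
    SELECT * FROM TGT_POLICY
    MINUS
    SELECT * FROM SRC_POLICY;

    -- matching rows; this count should equal both the source and the target counts
    SELECT COUNT(*) FROM
    ( SELECT * FROM SRC_POLICY
      INTERSECT
      SELECT * FROM TGT_POLICY ) t;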

Some Useful test scenarios

1. Verify that the extraction process did not extract duplicate data from the source. (This usually happens in repeatable processes where at point zero we need to extract all data from the source file, but during the next intervals we only need to capture the modified and new rows.)
2. The QA team will maintain a set of SQL statements that are automatically run at this stage to validate that no duplicate data has been extracted from the source systems (a sample check is shown below).
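
A sample check of the kind described in point 2, assuming a placeholder staging table STG_ORDERS with natural key ORDER_ID:

    -- source rows extracted more than once into staging during the incremental run
    SELECT ORDER_ID, COUNT(*) AS cnt
    FROM   STG_ORDERS
    GROUP BY ORDER_ID
    HAVING COUNT(*) > 1;
    -- a non-empty result means the extract pulled the same source row more than once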

Data cleanness

Unnecessary columns should be deleted before loading into the staging area.
Example 1: Suppose the telephone number and the STD code are in different columns and the requirement says they should be in one column; then, with the help of an expression transformation, the values are concatenated into one column.
Example 2: If a name column contains extra spaces, we have to trim them, so before loading into the staging area the spaces are trimmed with the help of an expression transformation.
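
The same cleansing logic can be verified with SQL against the staging area. STG_CUSTOMER and the column names below are placeholders, and this only illustrates the expected output of the expression transformation, not the transformation itself:

    -- trim stray spaces from the name and combine STD code with the telephone number
    SELECT TRIM(CUSTOMER_NAME)      AS CUSTOMER_NAME,
           STD_CODE || TELEPHONE_NO AS PHONE_NUMBER   -- '||' is Oracle/ANSI concatenation
    FROM   STG_CUSTOMER;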
