
Common ETL Testing Issues in Informatica

1. Unique constraint violation error.

Issue Details: A job fails, and the workflow session log shows a unique constraint
violation error.
Troubleshooting:
Check the records in the corresponding table and delete the violating record (a query such as the one sketched below can locate the duplicates).
Check the run date mentioned in the transfer control table and try incrementing it.
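
As an illustration, queries along these lines can surface and remove the duplicates. The table and column names (tgt_table, key_col) are placeholders, not names from any real job, and the DELETE uses an Oracle ROWID idiom, so treat it as a sketch:

    -- List key values that occur more than once in the target (placeholder names)
    SELECT key_col, COUNT(*) AS cnt
    FROM tgt_table
    GROUP BY key_col
    HAVING COUNT(*) > 1;

    -- After reviewing the duplicates, keep one row per key and delete the rest (Oracle)
    DELETE FROM tgt_table t
    WHERE t.rowid NOT IN (SELECT MIN(t2.rowid)
                          FROM tgt_table t2
                          GROUP BY t2.key_col);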

2. Table or view does not exist.

Issue Details: A job fails, and the session log shows an error such as "table or view does
not exist".
Troubleshooting:
Check the profile file for the corresponding AutoSys job.
Change the values for the current phase of testing so that they point to the correct servers.
Check the connection string in PowerCenter.
Change the source and target servers/DBs accordingly.
Once the above two items are updated, log in to the corresponding server with the BL IDs used for that phase of testing and verify the table or view there (a dictionary query such as the one below can confirm it exists).
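
For the verification step, an Oracle data dictionary query along these lines can confirm whether the object exists under the expected owner; OWNER_NAME and OBJECT_NAME are placeholders:

    -- Does the table or view exist for this owner? (Oracle data dictionary)
    SELECT owner, object_name, object_type
    FROM all_objects
    WHERE owner = 'OWNER_NAME'
      AND object_name = 'OBJECT_NAME'
      AND object_type IN ('TABLE', 'VIEW');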

3. Error related to the Integration Service.

Issue Details: A job fails, and the session log shows an error related to the Integration
Service.

Troubleshooting:
Check the profile file for the corresponding AutoSys job.
Check that the job points to the correct PowerCenter repository.
Verify that the Integration Service is set correctly.
Open the PowerCenter Monitor and verify that the workflow folder is available under the same Integration Service.

4. Batch Launcher ID access issue.

Issue Details: A job fails, and the logs show a Batch Launcher (BL) ID access issue.
Troubleshooting:
Check that the BL ID has access to the corresponding server/database.
If not, get access to that server/DB for the Batch Launcher ID (the existing grants can be listed with a query like the one below).
If the ID is accessible, check that the profile file has the right database/server name.
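
One way to inspect the grants, assuming an Oracle database; BL_USER is a placeholder for the Batch Launcher ID:

    -- List object privileges granted to the Batch Launcher ID (Oracle)
    SELECT table_name, privilege
    FROM all_tab_privs
    WHERE grantee = 'BL_USER'
    ORDER BY table_name;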

5. Record count mismatch issue.

Issue Details: During the record count check, the source and target counts differ by a large
amount.
Troubleshooting:
Check the load type; if it is a full load, this is an issue, so raise a defect.
If it is an incremental load, check the source extract time and stop extract time, change the timestamp in the parameter table, and rerun the job.
Check the number of processed rows against the number of loaded rows in the load summary of the session log. (A minimal count comparison is sketched below.)
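
A minimal count comparison, assuming both tables are reachable from one session (for example through a database link); the schema and table names are placeholders:

    -- Compare source and target row counts side by side (placeholder names)
    SELECT (SELECT COUNT(*) FROM src_schema.orders) AS src_count,
           (SELECT COUNT(*) FROM tgt_schema.orders) AS tgt_count
    FROM dual;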

6. Workflow and session log generation issue.

Issue Details: An AutoSys job fails and no workflow or session logs are generated.
Troubleshooting:
Check the JIL source of the job: in the command line of the JIL source, the hyphen and the dot (.) should be placed at the appropriate positions.
Check that the profile files and the connection strings point to the right databases/servers.

7. Data loading issue for full load.

Issue Details: After running the AutoSys job, no data is loaded at the target side, and the
load summary section of the logs shows 0 rows extracted and transformed for a full load.
Troubleshooting:
Check the source table; there should be data in it. It is also possible that the source holds only data older than the cut-off date in the control table or the last processed timestamp.
If data is loading from STG to the DW side, data should be present in the main STG table.
Check the max load date in the DW table against the process date in the stage table; if they already match, increment the process date (as sketched below).
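
A sketch of that date check; dw_table, stg_ctl, and their columns are hypothetical names standing in for the shop's actual DW table and stage control table:

    -- Compare the DW max load date with the stage process date (placeholder names)
    SELECT (SELECT MAX(load_date) FROM dw_table) AS max_load_date,
           (SELECT process_date FROM stg_ctl)    AS process_date
    FROM dual;

    -- If they already match, move the process date forward by one day (assumes a DATE column)
    UPDATE stg_ctl SET process_date = process_date + 1;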

8. Data loading issue for incremental load.

Issue Details: After running the AutoSys job, no data is loaded at the target side, and the
load summary section of the logs shows 0 rows extracted and transformed for an incremental load.
Troubleshooting:
Check the transfer parameter entries for the source and stop extract times, then check the same in the logs. Correct the extraction window in the transfer parameter table so that it covers the period of data to be loaded (see the sketch below).
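
A sketch of inspecting and resetting the window; etl_transfer_param and its columns are placeholders for the actual transfer parameter table, and the reset date is only an example:

    -- Inspect the extraction window for the job (placeholder names)
    SELECT job_name, source_extract_time, stop_extract_time
    FROM etl_transfer_param
    WHERE job_name = 'JOB_NAME';

    -- Move the window start back so the missed period is re-extracted
    UPDATE etl_transfer_param
    SET source_extract_time = to_date('01-01-2020', 'MM-DD-YYYY')
    WHERE job_name = 'JOB_NAME';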

9. Incremental job failure.

Issue Details: An AutoSys incremental job fails.

Troubleshooting:
Check the transfer parameter table and verify the parameter values corresponding to the incremental job.

10. AutoSys job failure for a job that has no workflow.

Issue Details: An AutoSys job that does not have a workflow fails.
Troubleshooting:
Check the AutoSys logs in the .err file. If the job failed, the .err file size is non-zero; if the job succeeded, it is zero bytes.

11. AutoSys job failure during flat file data loading.

Issue Details: A job fails while loading data from flat files into a table.
Troubleshooting:
Check the AutoSys logs and catch the error from the .err file.
If the issue is related to the files (for example, an invalid file), run the gunzip -t filename command in the directory where the file is placed. It will return the exact error for that file.

12. Large-scale data comparison differences.

Issue Details: During the data comparison between source and target, a large number of
differences are found in the DB comparator result.

Troubleshooting:
Check the metadata columns at the target end and remove those columns from the target-end query of the DB comparator.
Check the ORDER BY in both queries and give each a proper ORDER BY clause on the primary or unique key at both the source and target end.
Remove the timestamps from the comparison rule, as they are interpreted differently by the Sybase and Oracle databases. (A matched pair of queries is sketched below.)
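
As an illustration, the two comparator queries might end up looking like this: identical business columns, metadata and timestamp columns omitted, and both sides ordered by the same unique key. All names are placeholders:

    -- Source side: business columns only, ordered by the unique key
    SELECT cust_id, cust_name, balance
    FROM src_schema.customer
    ORDER BY cust_id;

    -- Target side: same column list and order; metadata columns (e.g. etl_load_ts) excluded
    SELECT cust_id, cust_name, balance
    FROM tgt_schema.customer
    ORDER BY cust_id;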

13. Box job failure.

Issue Details: A box job fails.

Troubleshooting:
Check all the sub-jobs under the box job.
Pick the failed job, check its session logs, and look for any of the issues described above.

14. Box job running for a long time.

Issue Details: A box job keeps running for a long time.
Resolution:
Verify that no job under the box job is in "on hold" status.
Change any on-hold sub-job to "off hold" and trigger the box job.
Put failed sub-jobs on ice if they are not mandatory/critical dependent jobs.

15. Workflow Monitor issue.

Issue Details: After running a job, the workflow status is not visible in the Workflow
Monitor, and an error appears while opening it.

Job Level

Are objects named correctly? Do the object names adhere to established standards? Do the
names make sense?

Are global variables validated and/or initialized? This should be coded as one of the first
steps in the job. If any of the global variables fails validation (NULL when it should not be
NULL) then raise an exception to fail the job.

Is there a Try/Catch coded at the job level? This is a requirement in most shops as it
provides a way for the job to handle failure notification.

Have all errors and warnings been resolved? I like to see the results of the validation dialog
showing that all errors and warnings have been resolved. Objects should not be migrated out
of development until all errors and warnings are resolved. In a job that has 200 warnings,
how do I know that you have looked at every warning and confirmed that it does not matter?
There are some warnings that will not be resolved. One of these is when a table is used as
both the source and target.

Catch Object

Does failure notification - using smtp_to() - exist?

Is the email address an alias? Avoid using developer email addresses for this. Instead, the email
should go to an alias for a group of email addresses. That way you don't have to change the
ETL when a developer or operator moves out of your group.

Is an exception (re)raised? This is a very common fault. Developers will add the try/catch
but forget to re-raise the exception, or they comment it out during unit testing. Without re-raising
the exception, the job will complete successfully when it actually failed.

Variables

Are they declared in the correct scope? Don't declare variables as global when they are only used
locally.

Do variable names adhere to naming standards?

Are variables the correct data type?

Calls (to child objects)

Is the call correct? Is a hard-coded value acceptable?

While Loop

Is a loop appropriate?

Is the expression coded such that it uses a variable that can be incremented?

Does code exist within the loop that increments the variable used in the expression?

Conditional

Is the expression correct?

If nothing is included in the FALSE branch, is that acceptable?

Dataflow

Show the Display Optimized SQL output in the code review document - a quick look at this
can often tell you how the Dataflow will (or won't) run.

Source Table

Show the columns and the data types in the code review document.

If a Join Rank is set, is it needed and is it correct?

Is the cache type correct? A setting of No should be carefully scrutinized.

Is the Array fetch size reasonable?

SQL Transform

Use of this transform must be justified. Has the lost metadata been accounted for
with a dummy branch that uses Query transforms?

Show the result set columns and data types in the code review document.

Is the cache type correct? A setting of No should be carefully scrutinized.

Is the Array fetch size reasonable?

Validate the SQL text. Hard-coded database/schema names and object owners should
be avoided.

Query Transform

Mapping Tab

Show both the input and output schemas. Reviewers may need to verify data
types.

Show the mapping column. If a column's mapping does not fit within the
displayed limit, then list that column's mapping separately.

Distinct Tab - This does not have to be called out separately as long as the SELECT
tab is shown in the Mapping screen print. However, if Distinct is turned ON then the
developer should be prepared to explain why.

FROM Tab

Show all Input schemas

Show all Join pairs

WHERE Tab

Validate parentheses - unnecessary parentheses make it difficult to read the code.

Look for implicit data type conversions - never allow an implicit date
conversion such as '12-05-2010'. Always use an explicit date conversion
instead, such as to_date('12-05-2010', 'MM-DD-YYYY'); a contrasting example
appears at the end of this WHERE Tab list.

Are functions used with indexed columns? Can this be avoided by using a
different expression?
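
The contrast in Oracle SQL; orders and order_date are placeholder names:

    -- Implicit conversion (avoid): the string-to-date cast depends on session settings
    SELECT order_id FROM orders WHERE order_date >= '12-05-2010';

    -- Explicit conversion (preferred): the format is stated, so the result is unambiguous
    SELECT order_id FROM orders WHERE order_date >= to_date('12-05-2010', 'MM-DD-YYYY');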

GROUP BY Tab - Was a GROUP BY used when a DISTINCT should have been
used?

ORDER BY Tab

Is it necessary?

Does the ORDER BY match the Primary Key of the Table_Comparison?

Advanced Tab

Requires justification

This is rarely necessary.

Table Comparison Transform

Show both the input and the output schemas

Is the Compare table correct?

Is the Generated key column set? Is it necessary? Setting this for a non SCD Type II table
adds unnecessary overhead.

Is the "Input contains duplicate keys" option set? This requires justification.

Is the "Detect deleted rows" option set? This requires justification. If the source is not a full
data set then this is a fault that must be addressed.

The Comparison method selected should be justified. Why did you use that particular
method?

Input primary key columns

Is the list correct?

Is the order correct? If Sorted input is used then it must match the sort order of the
incoming data.

Compare columns

Is the list correct?

Carefully scrutinize the list. Only those columns that determine if an UPDATE will
occur should be in the list. Columns like LAST_UPDATE_DATE should not be in the
list if they are set by the ETL.

PK columns should not also appear in the Compare column list.

Columns that are changed later in the Dataflow should not appear in the list.

Run as a separate process - Requires justification

Filter - Requires justification

History Preserving Transform

Show the entire transform in the document

Are hard-coded values acceptable for the New record and Current flag items?

Compare columns - Only Type II columns should be in the list.

Preserve delete rows as update rows - Requires justification

Key Generation Transform

Requires justification - Is the use of a sequence a better solution?

Is the table and column name correct?

If the Increment value is not 1 then justification must be provided.

Target Table (Template)

Requires justification - Is a target table allowed in production?

Target Table

Do the number of columns in the Schema In and Schema Out match? If not, justification must
be provided.

Does the Schema Out have a Primary Key assigned? If not, that should raise a red flag. The
"Use input keys" option must be Yes if the target is to handle update or delete operations.

Is the "Rows per commit" setting reasonable?

Is "Delete data from table before loading" allowed in production?

If the "Column comparison" option is not "Compare by name" then justification must be
provided.

If "Update key columns", "Auto correct load", or "Include in transaction" are set to Yes,
then justification must be provided.

Bulk Loader

To be used in Dataflows that produce only inserts!

Is the Truncate option allowed in production? Some databases may allow
insert/update/delete but not truncate.

Is the "Rows per commit" and "Network Packet Size" reasonable?

Is the "Maximum rejects" correct?

Enable partitioning and Bulk Loader are mutually exclusive with Oracle.

May not be appropriate for delta loads against very large tables.

Any use of the "Load Triggers", "Pre-Load Commands", or "Post-Load Commands" tabs must
be justified.

Map Operation Transform

Are the settings correct? It is very common to find a Map Operation that (incorrectly) has the
default settings.

Reverse Pivot

Is the "Input data is grouped" option turned ON?



Yes: The incoming data must be sorted in the correct order

No: Changing it to ON presents a huge performance optimization opportunity.

Other items to include in the code review document

Database

Data model - You do have a data model, don't you?

DDL - Data types, indexes, etc.

Stored procedures and functions

Impact Analysis - What other objects (outside the scope of the job being reviewed) are
affected?

Unit test cases - Show me that you thoroughly tested the code.

Job Monitor results - Show me what happened in the execution of the job.
