
1. What is Data warehouse?

According to Bill Inmon, known as the father of data warehousing, a data warehouse is a
subject-oriented, integrated, time-variant, non-volatile collection of data in support of
management's decision making process.
2. What are the types of data warehouses?

There are three types of data warehouses
Enterprise Data Warehouse
ODS (operational data store)
Data Mart
3. What is Data mart?
A data mart is a subset of a data warehouse that is designed for a particular line of
business, such as sales, marketing, or finance. In a dependent data mart, data is derived
from an enterprise-wide data warehouse. In an independent data mart, data is collected directly
from the sources.
4. What is star schema?
A star schema is the simplest form of data warehouse schema. It consists of a fact table and one
or more dimension tables.
5. What is snow flake schema?
A snowflake schema is one fact table connected to a number of dimension tables, with the
dimension tables normalized into further sub-dimension tables. The snowflake and star schemas are
both methods of storing data which are multidimensional in nature.
Overview: Data Warehousing Informatica Interview Questions and Solutions
What is the difference between static cache and dynamic cache?
Ans: In case of a dynamic cache, when we are inserting a new row, the server checks the lookup cache to
see if it already exists; if not, it inserts the row into the target as well as
the cache. But in case of a static cache, the new row is written only to the target and not to the lookup
cache.
A static lookup cache remains unchanged during the session, but in case of a dynamic
cache the server inserts and updates rows in the cache during the session.
How do we join two tables without joiner or SQL override?
Ans: We can join the tables using a Lookup transformation, or by creating a cartesian product.
What are pre-session and post-session options?
Ans: These are shell commands that the Informatica server runs before (pre-session) and after
(post-session) the session. We need the required permissions and privileges to use these options in Informatica.
How do we copy mapping?
Ans: The steps to copy a mapping are: 1.) Open the Navigator window. 2.) Open
the mapping in the repository. 3.) Click Copy.
What are the tasks done by the Informatica server?
Ans: Tasks done by the Informatica server are: a.) Managing the sessions b.) Running the workflows
c.) Tuning performance d.) Scheduling of sessions and workflows
What is the difference between source qualifier and joiner transformation?
Ans: We need matching keys in source qualifier which we do not need in joiner transformation.
What are mapping parameters and mapping variables?
Ans: Mapping parameters are parameters which remain constant throughout the session. They are
specified before running the session. Unlike parameters, mapping variables can change throughout the
session.
Why do we use partitioning of the session in Informatica?
Ans: Partitioning improves session performance by reducing the time taken to read the
source and load the data into the target.
To achieve the session partition what are the necessary tasks you have to do?
Ans: Configure the session to partition source data.
Install the Informatica server on a machine with multiple CPUs.
Can you copy the batches?
Ans: NO
How many number of sessions that you can create in a batch?
Ans: Any number of sessions.
When does the Informatica server mark a batch as failed?
Ans: If one of the sessions is configured to run only if the previous session completes, and that previous session fails.
What is the command used to run a batch?
Ans: pmcmd is used to start a batch.
What are the different options used to configure the sequential batches?
Ans:
Two options
Run the session only if previous session completes successfully.
Always runs the session.
How can you recognize whether or not the newly added rows in the source are getting inserted in
the target?
Ans: In a Type 2 mapping we have three options to recognize the newly added rows: version
number, flag value, and effective date range.
Can you start a session within a batch individually?
Ans: Sessions can be started individually in this way only in the case of sequential batches.
In the case of concurrent batches it is not possible.
Can you copy the session to a different folder or repository?
Yes. By using copy session wizard, you can copy a session in a different folder or repository.
What is the command used to run a batch?
pmcmd is used to start a batch.

What are the types of mapping in Getting Started Wizard?

Simple Pass through mapping

Slowly growing target mapping
What are the different types of Type2 slowly changing dimensions?

There are three ways of handling Type 2 slowly changing dimensions

SCD with versioning

SCD with flags

SCD with Date

What are the different threads in DTM process?

Master thread

Mapping thread

Reader thread

Writer thread

Pre and post session threads

What are the active and passive transformations?
An active transformation can change the number of rows that pass through it.
A passive transformation does not change the number of rows that pass through it.
What are the scheduling options to run a session?
Different options of scheduling are

Run only on demand: the Informatica server runs the session only when the user starts the session explicitly.

Run once: the Informatica server runs the session only once at a specified date and time.

Run every: the Informatica server runs the session at regular intervals, as configured.

Customized repeat: the Informatica server runs the session at the date and time specified in the
repeat dialog box.
How can we store previous session logs?
Run the session in timestamp mode; the session log will then not overwrite the previous
session log.
What is the difference between Mapping and Mapplet?
Mapping is the collection of Source Definition, Target Definition, Transformation(s) and/or
Mapplet.
A Mapplet is a set of reusable transformations for applying the same business logic in multiple mappings.
What is the use of Repository Manager?
Repository Manager is used to manage folders, view Source/Target dependencies, Mapping
Dependencies, adding and removing repositories, Versioning etc.

What are the various Tasks in Informatica Workflow Manager?

The various Tasks in Informatica are

Assignment Task

Command Task

Control Task

Decision Task

E-mail Task

Event Raise Task

Event Wait Task

Session Task

Timer Task

Link Task

What is the use of Source Analyzer?
Source Analyzer is used to create source definition.
What is the use of Target Designer?

Target Designer is used to create Target Definition.

What was Target Designer called prior to Informatica 8.6?

Warehouse Designer

Can we use Reusable Sequence Generator transformation in Mapplet?

No

Why Cache? Explain with an example?

If there are x records with y columns coming from the source data and we need to
extract only z columns (far fewer), then the cache stores those columns for the respective
records in the $PMCacheDir location of the Informatica server, so that we don't need to extract each record
from the database again and load it into Informatica. Doing this increases the performance of the system.
What is the use of Shared Object?
It is easier to explain this using an example.
Suppose there is an object which is used by many users in the company.

If the object undergoes a change, this change would have to be updated for each and every user. Instead,
if the object is made shared, then the update has to be done only to the object and all other users
get the update.

What is Dual table?

Dual is a table that is created by Oracle along with the data dictionary. It gives an output of exactly
one column named DUMMY and one record with the value 'X'.

Select * from Dual;

DUMMY
-----
X

Can we install Power Center Informatica 8.x on Windows 7?

No, we can't install Power Center Informatica 8.x on Windows 7. We can install Informatica 9.x on Windows 7.

What are the types of variables in Informatica?

There are three types of variables in Informatica

Predefined variable represented by $

User defined variable represented with $$

System variable denoted by $$$

Difference between Informatica normal join and Oracle equi join?

An equi join in Oracle is performed on Oracle (relational) sources, while an Informatica
join can be performed on non-relational sources too (an Oracle table and a flat file, etc.).

What is degenerate dimension?

Dimension which has no dimension table of its own and is derived from the fact table.

What is requirements gathering?

It is carried out by a Business Analyst. It is nothing but interacting with the end users and getting to
know what their requirements are. Based on these requirements, the remaining phases Analysis,
Design, Implementation and Testing and finally Maintenance are carried out.

What is Junk dimension?

The dimension that is formed by lumping of smaller dimensions is called Junk dimension.
What is Staging Area?

The staging area is a database where data from different source systems is brought together,
and this database acts as an input to data cleansing.

What are the types of joins in Informatica and in Oracle?

There are four types of joins in Oracle

equi join

non equi join

self join

outer join
Joins in Informatica
master join (right outer join)

detailed join (left outer join)

normal join

What is the file extension or format of files for the Informatica Objects like sessions, mappings
etc. in the Repository?
The format of files for Informatica Objects in the Repository is XML
Where can we find Versioning in nformatica? What happens if Versioning is turned off?
In Informatica, we can find Versioning in the Repository Manager. If Versioning is turned off, we will
not be able to track the changes for the respective Sessions/Mappings/Workflows.
What is tracing level? What are the types of tracing level?
Tracing level is the amount of information that the Informatica server writes into a log file.
Types of tracing level
Normal
Terse
Verbose init
Verbose data
In a joiner transformation, we take the table with fewer rows as the master and the table with more
rows as the detail. Why?
In a joiner, each row of the detail is compared with the rows of the master, so the
fewer the rows in the master, the fewer the comparisons and the better the
performance of the system.
What are all the databases the Informatica server on Windows can connect to?
The Informatica server on Windows can connect to
SQL server database
Oracle
Sybase
Teradata
MS Access
MS Excel
Informix
DB2
What are the databases the Informatica server on UNIX can connect to?
Informatica on UNIX can connect to
Sybase
Teradata
Informix
DB2
Oracle
What is an overview window?
It is a window where you can see all the transformations in a mapping.
In how many ways can we update a source definition?
Two ways
We can reimport the source definition
We can edit the source definition
What is mapping?
Mapping is nothing but data flow between source and target.
What is a session?
A session is a set of instructions that tells the Informatica server when and how to move the data
from sources to targets.
What are the different things we can do using PMCMD command?
We can START, STOP and ABORT session using PMCMD command.
What are the differences between Informatica 6 and 7?
Informatica 7 has
We can use pmcmd command
Union and custom transformation
Version controlling
What is the use of bitmap indexes?
Bitmap indexes are used to join large fact tables to smaller dimension tables.
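As a rough illustration (hypothetical table and column names), such an index might be created in Oracle as:
CREATE BITMAP INDEX sales_prod_bix
ON sales_fact (product_key);
-- bitmap indexes suit low-cardinality columns and speed up star-schema joins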
How can we delete duplicate rows from flat files?
We can make use of sorter transformation and select distinct option.
If a session fails after loading 10,000 records into the target, how can we start loading into the
target from the 10,001st record?
We can run the session with the recovery strategy mode.
What is the limit of joiner transformations?
We cannot use sequence generator or update strategy transformations before or after joiner
transformations.
How does server recognize the source and target databases?
By using ODBC if they are relational, FTP if they are flat files.
What is rank index in a group?
Power Center Designer automatically creates a RANKINDEX port when the Rank
transformation is used. The purpose of this RANKINDEX port is to store the ranking for the column(s) we
are interested in.
What are the constants or flags for each database operation and their numeric equivalent in
Update Strategy?
Insert DD_INSERT 0
Update DD_UPDATE 1
Delete DD_DELETE 2
Reject DD_REJECT 3
Can you generate reports using nformatica?
No, we cannot generate reports using Informatica itself; it is just an ETL tool, but we can generate
metadata reports.
Alternatively, using the Informatica Data Analyzer tool we can generate reports.
Can you start batches within a batch?
No, you can't
What are the types of groups in Router Transformation?
There are three types of groups in Router Transformation namely
Input group
Output group
Default group
What are batches? What are the types of batches?
Batches provide a way to group sessions for either sequential or parallel execution by the Informatica
server.
Concurrent batches which run at the same time
Sequential batches which run one after the other
What is Power Center Repository?
Power Center Repository allows you to share metadata among different repositories and to
create a datamart domain.
What are the types of metadata stored in the repository?
Source definition
Target definition
Mappings
Mapplet
Transformations
Differences between dynamic cache and static cache
In case of a dynamic cache, when we want to insert a new row, the server first looks in the lookup
cache and, if the row is not present in the cache, it inserts the row into the cache as well as the target.
In case of a static cache, the row is stored only in the target table and not in the cache.
What is the use of source qualifier?
Source qualifier is used to convert different data types to Informatica compatible data types.
What is code page compatibility?
It is nothing but compatibility of code pages for maintaining data accuracy. It comes into the picture when
data is in different languages.
What are Synonyms?
Synonyms are alternative names for database objects such as tables, views, stored procedures
etc.
Syntax
Create [or replace] synonym synonym_name for [schema.]object_name;
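For example, assuming a hypothetical HR.EMPLOYEES table:
CREATE OR REPLACE SYNONYM emp FOR hr.employees;
-- queries can now refer to EMP instead of HR.EMPLOYEES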
Types of Lookup Cache?
Static cache
Dynamic cache
Persistent cache
Recache from database
Shared cache
What are various stages of SDLC?
Requirements Gathering, Analysis, Design, Implementation and Testing, and Maintenance.
What is Star Schema?
Star Schema is the simplest form of schema, which has one fact table and at least one dimension
table. The dimensions here are denormalized.
What is Fact table?
It is the centralized table in a Star Schema. A fact table has two types of columns. The first type is
measures and the second type is the foreign keys to the dimension tables.
What is SnowFlake Schema?
In a SnowFlake Schema, the dimensions are further divided into sub-dimensions. The dimensions
here are normalized.
What is Dimension table?
A dimension table is one that describes the business entities of an enterprise.
What is operational data store (ODS)?
An operational data store is defined to be a structure that is
Subject-oriented
Integrated
Volatile
and holds current data that is a day or perhaps a month old.
What is Logical Data Modeling?
Logical Data Modeling is a type of data modeling which represents business requirements of an
organization.
What is Physical Data Modeling?
Physical Data Modeling is a type of data modeling which includes all required tables, columns,
and relationships for the physical implementation of a database.
Difference between mapplet and reusable transformation?
Using mapplet we can make set of transformations reusable whereas in reusable transformation,
we can make only one transformation as reusable.
What is Data cleansing?
It is the process of converting data from different formats of files or databases into a single required
format.
What are the types of files created by the Informatica server while the session is running?
Types of files created are
Cache file
Session log file
Informatica server log file
Output file
Reject file
What are the types of repositories created by the Informatica repository manager?
Four types of repositories are created using the repository manager
Standalone repository
Global repository
Local repository
Versioned repository
What are the two types of processes that run the session?
The two types of processes that run the session are
Load Manager
DTM processes (Data Transformation Manager)
What is the difference between Union and Lookup transformation?
Union transformation is Active while Lookup transformation is Passive.
For Union transformation, the source tables or data should have similar structure while its not the
case with the Lookup transformation.
Lookup transformation can look up source or target tables, while Union transformation works only on source
data.
What is throughput in Informatica?
Throughput is nothing but the rate at which the Informatica server reads the data from sources and
writes it successfully to the target.
Where can we find the throughput option in Informatica?
We can view this in the Workflow Monitor.
In the Workflow Monitor, right click on the session, then click on Get Run Properties, and under
Source/Target Statistics we can find the throughput option.
What is code page?
Code page consists of encoding to specify characters in set of one or more languages and are
selected based on source language.
What are the types of loading in Informatica?
The two types of loading available in Informatica are
Bulk Loading
Normal Loading
What is the difference between local index and global index?
Global index is nothing but a single index covering all partitions whereas local index has separate
index for each partition.
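A rough Oracle sketch, assuming a range-partitioned SALES table (hypothetical names):
-- local index: one index segment per partition of SALES
CREATE INDEX sales_cust_lix ON sales (customer_id) LOCAL;
-- global (non-partitioned) index: a single index structure covering all partitions
CREATE INDEX sales_amt_gix ON sales (amount);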
What is Complex Mapping?
Complex Mapping will have the following features
Difficult requirements
A large number of transformations
Complex business logic
How many ways you can add ports?
Two ways
From other transformation
Click on add port button
How many sessions can you group in a batch?
Any number of sessions but the lesser the number of sessions in a batch, the easier the
migration.
What is the difference between Aggregator transformation and Expression Transformation?
Aggregator transformation uses aggregate functions and performs calculations on an entire group,
whereas Expression transformation performs calculations on a row-by-row basis.
What is the difference between Filter transformation and Router transformation?
Filter transformation drops the data that do not meet the condition whereas Router transformation
captures the data even though the condition is not met and saves it in Default output group.
Filter transformation works on single condition only while Router transformation works on multiple
conditions as well.
Filter transformation gives only one output. Router transformation can give more than one output.
What is the similarity between Router and Filter transformations?
Router transformation and Filter transformations are used to filter the data based on condition.
Both Filter and Router transformation are Active transformations.
Both Filter and Router transformation are connected.
What is the difference between Source Qualifier transformation and Joiner transformation?
Source Qualifier transformation is used to join the data from homogeneous sources while Joiner
transformation is used to join data from heterogeneous sources as well as homogenous sources
from different schemas.
We need matching keys to join two relational sources in the Source Qualifier transformation, which is
not the case with the Joiner transformation.
What is PARAM file?
Param file is an ordinary text file where we can define the value for the parameter which is
defined in session.
These parameter files are defined in session properties.
Which transformation should we use to normalize the COBOL and relational sources?
We need to make use of Normalizer Transformation
What are the various transformations which we cannot use in Mapplet?
Transformations which we cannot use in Mapplet are
Normalizer Transformation
XML Source Qualifier Transformation
Target Definition
Cobol Sources
Pre and Post Session Stored Procedures
What is the difference between Joiner transformation and Lookup transformation?
Joiner is active transformation while Lookup is passive transformation.
Joiner works on source data only while Lookup works on source as well as target data.
Joiner transformation supports equi joins only while Lookup supports equi join as well as non equi
joins.
Joiner transformation is connected while Lookup transformation can be either connected or
unconnected.
What is the use of Lookup transformation?
Lookup transformation is used to check whether matching record exists in the target table and if
the matching record doesn't exist, it inserts accordingly.
How can we improve the session performance in Aggregator transformation?
We can increase the session performance by sending the sorted input to the aggregator
transformation.
What is aggregated cache?
Aggregator cache is a temporary location which stores the input data values while the
aggregation calculation is being carried out.
What is Data Driven?
The Informatica Server follows the instructions coded into the update strategy within the session mapping,
which determine how to flag the records for Insert, Update, Delete and Reject.
Difference between connected Lookup and Unconnected Lookup?
Connected lookup receives input directly from mapping pipeline whereas unconnected lookup
receives input from :LKP expression of another transformation.
Connected lookup returns more than one column in a row where as unconnected
lookup returns only one column in each row.
Connected lookup Supports user-defined values while unconnected doesn't.
Connected lookup is not reusable whereas unconnected is.
Performance of connected lookup is lower compared to unconnected lookup.
What is a Mapplet?
Mapplet is an object which consists of set of reusable transformations which can be used in
different mappings.
Can Lookup be done on flat files?
Yes
What is the transformation used in loading 4 flat files of similar structure to a single target?
We can make use of Union transformation
Difference between direct and indirect loading options in sessions?
Direct loading can be used on single transformations while indirect loading can be used on
multiple transformations
In direct loading, we can perform the recovery process, while in indirect loading we cannot perform the
recovery process.
What are the various techniques for implementing a dimensional model?
Star schema
Snowflake schema
What are the types of dimensions?
There are three types of dimensions
Slowly changing dimensions
Confirmed dimensions
Casual dimensions
What is SQL override?
It is nothing but overriding the SQL in a source qualifier or lookup for additional logic.
What are the default values for variables?
The default values are
Number=0
Variable=NULL
Date=1/1/1753
What are the types of data movement in the Informatica server?
There are two data movement modes in Informatica
ASCII mode
Unicode mode
What happens to the discarded records in Filter transformation?
Discarded rows do not appear in session logs and reject files.
How to increase the performance of session in using Filter transformation?
To increase the performance of session, we need to use Filter transformation as close as possible
to the sources in the mapping.
In case the flat files (which come through FTP) haven't arrived, what happens?
The session is going to fail because of a fatal error.
Why do we need to reinitialise aggregate cache?
We need to reinitialise the aggregate cache only to remove the previous data present in the
aggregator cache so that it can be used by the new source.
What are the various mapping objects available in Informatica?
Mapping objects that are available in Informatica are
Source Definition
Target Definition
Link
Transformation
Mapplet
What is the default source option for Update Strategy?
Data Driven
What is the use of Update Strategy transformation?
Update Strategy is used to perform DML operations like Insert, Update, Delete and Reject on
already populated targets.
What is the default join that the Joiner transformation provides?
Normal join
What are the types of Lookup Cache?
Static cache
Dynamic cache
Persistent cache
Recache from database
Shared cache
What are the differences between unique key and primary key?
A primary key cannot contain a null value, whereas a unique key can contain one and only one null
value.
In case of SQL Server, with default options, a primary key is created as a clustered index while a
unique key is created as a non-clustered index.
A unique key is similar to a primary key, but we can have more than one unique key per table.
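A small illustration with a hypothetical table:
CREATE TABLE customers (
customer_id NUMBER PRIMARY KEY, -- no NULLs, only one primary key per table
email VARCHAR2(100) UNIQUE, -- may be NULL
tax_reference VARCHAR2(30) UNIQUE -- a table can have several unique keys
);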
What are the types of joins in Informatica and in Oracle?
There are four types of joins in Oracle
equi join
non equi join
self join
outer join
Joins in Informatica
master join (right outer join)
detailed join (left outer join)
normal join
What is the difference between Rowid and RowNum?
Rowid is the physical address of a row. If we know the Rowid, we can read the entire row.
RowNum is a temporary number allocated during query execution.
What are pseudo columns? What are the various types of pseudo columns?
Pseudo columns are columns which are not stored in the table but can be used in SQL queries
as if they were part of the table
RowNum
Rowid
Sysdate
User
Currval
Nextval
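For example, assuming a hypothetical EMP table, pseudo columns are selected as if they were real columns:
SELECT ROWNUM, ROWID, ename, SYSDATE, USER
FROM emp
WHERE ROWNUM <= 3;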
What are the various types of Statements in Oracle?
The various Statements that are available in Oracle are
Data Manipulation Language (DML) Statements
Data Definition Language (DDL) Statements
Transaction Control (TC) Statements
Session Control (SC) Statements
System Control (SC) Statements
What are the various types of DML Statements?
Select
Update
Delete
Insert
Merge
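As a brief sketch of MERGE, the statement that combines update and insert (hypothetical staging and dimension tables):
MERGE INTO customers_dim t
USING customers_stg s
ON (t.customer_id = s.customer_id)
WHEN MATCHED THEN
UPDATE SET t.customer_name = s.customer_name
WHEN NOT MATCHED THEN
INSERT (customer_id, customer_name)
VALUES (s.customer_id, s.customer_name);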
What are the various AGGREGATE FUNCTIONS?
SUM
AVG
MIN
MAX
COUNT
STDDEV
VARIANCE
FIRST
LAST
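For example, assuming a hypothetical EMP table:
SELECT deptno, COUNT(*), SUM(sal), AVG(sal), MIN(sal), MAX(sal)
FROM emp
GROUP BY deptno;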
What are the types of joins in oracle?
Equijoins or simple joins or inner joins
Non-equijoin
Outer join
Self join
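A quick sketch of these join types, assuming the usual hypothetical EMP, DEPT and SALGRADE tables:
-- equi join (inner join)
SELECT e.ename, d.dname FROM emp e, dept d WHERE e.deptno = d.deptno;
-- non-equi join
SELECT e.ename, s.grade FROM emp e, salgrade s WHERE e.sal BETWEEN s.losal AND s.hisal;
-- outer join (all departments, even those with no employees)
SELECT e.ename, d.dname FROM dept d LEFT OUTER JOIN emp e ON e.deptno = d.deptno;
-- self join (each employee with his or her manager)
SELECT e.ename, m.ename AS manager FROM emp e JOIN emp m ON e.mgr = m.empno;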
Difference between char and varchar?
CHAR is a fixed-length data type whereas VARCHAR is a variable-length data type. So in case of
CHAR, even though we enter a shorter value for the column, it allocates (and pads to) the maximum
declared size, while in case of VARCHAR it does not allocate the maximum size, only the actual length.
Difference between varchar and varchar2
Varchar and varchar2 are both variable-length datatypes. In older Oracle versions the varchar datatype had a
maximum size of 2000 bytes, whereas varchar2 allows 4000 bytes.
Oracle now treats varchar as a synonym for varchar2 and recommends using varchar2.
What is Normalisation? Define 1NF, 2NF and 3NF.
It is the process of reducing a complex data structure into a simpler one by removing
redundancy.
First normal form: First normal form states that each field is atomic.
Second normal form: Second normal form states that data redundancy can be reduced if all the
non-key attributes which depend on only one part of a composite primary key
are put into a separate table along with that part of the key. This should also satisfy 1NF. For
example, if a table has attributes partid, city, state and country, with (partid, country) as the
composite primary key, and city and state depend only on country, then the table is separated into two
different tables: one with (partid, country) and another with (country, state, city).
Third normal form: Third normal form states that if a dependency exists between non-key
attributes, then those attributes are isolated into a different table. This should also satisfy 1NF and
2NF.
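The part/country example above could be split roughly as follows (hypothetical column sizes):
-- before 2NF: city and state depend only on country, not on the whole key (partid, country)
CREATE TABLE part_country (
partid NUMBER,
country VARCHAR2(40),
PRIMARY KEY (partid, country)
);
CREATE TABLE country_location (
country VARCHAR2(40) PRIMARY KEY,
state VARCHAR2(40),
city VARCHAR2(40)
);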
========================================================================
When to use Abort, Decode functions?
Ans: Abort can be used to abort/stop the session on an error condition. If the primary key
column contains NULL and you need to stop the session from continuing, then you may use the
ABORT function in the default value for the port. It can be used with the IIF and DECODE functions to
abort the session.
What is constraint based loading ?
Ans: In constraint based loading, the data is loaded into the target tables based on their
constraints, i.e. if we want to load the EMP and DEPT data, it first loads the data of DEPT and then EMP,
because DEPT is the PARENT table and EMP is the CHILD table. In simple terms, it loads the PARENT table first,
then the CHILD table.
In a joiner transformation, you should specify the source with fewer rows as the master source.
Why?
Ans: Joiner transformation compares each row of the master source against the detail source.
The fewer unique rows in the master, the fewer iterations of the join comparison occur, which
speeds the join process.
What is Incremental Aggregation?
Ans: Whenever a session is created for a mapping with an Aggregator Transformation, the session option
for Incremental Aggregation can be enabled. When PowerCenter performs incremental
aggregation, it passes new source data through the mapping and uses historical cache data to
perform new aggregation calculations incrementally.
What is the default join that source qualifier provides?
Ans: Inner equi join.
What are the options in the target session of update strategy transformations?
Ans:
Insert
Delete
Update
Update as update
Update as insert
Update else insert
Truncate table
Which transformation should we use to normalize the COBOL and relational sources?
Ans: Normalizer Transformation.
What is the use of tracing levels in transformation?
Ans: Tracing levels store information about mapping and transformations.
What are the basic needs to join two sources in a source qualifier?
Ans: Two sources should have primary and foreign key relationships. Two sources should have
matching data types.
What is update strategy transformation?
Ans: This transformation is used to maintain history data, or just the most recent changes, in the
target table.
Describe two levels in which update strategy transformation sets?
Ans: Within a session. When you configure a session, you can instruct the Informatica Server to
either treat all records in the same way (for example, treat all records as inserts), or use
instructions coded into the session mapping to flag records for different database operations.
Within a mapping. Within a mapping, you use the Update Strategy transformation to flag records
for insert, delete, update, or reject.
What are the various types of transformation?
Ans: Various types of transformation are: Aggregator Transformation, Expression Transformation,
Filter Transformation, Joiner Transformation, Lookup
Transformation, Normalizer Transformation, Rank Transformation, Router Transformation,
Sequence Generator Transformation, Stored Procedure Transformation,
Sorter Transformation, Update Strategy Transformation, XML Source Qualifier Transformation,
Advanced External Procedure Transformation, External Transformation.
What is the difference between active transformation and passive transformation?
Ans: An active transformation can change the number of rows that pass through it, but a passive
transformation can not change the number of rows that pass through it.
What is the use of control break statements?
Ans: They execute a set of codes within the loop and endloop.
FINAL INTERVIEW QUESTIONS (ETL - INFORMATICA)
Data warehousing Basics
Definition of data warehousing?
A data warehouse is a subject oriented, integrated, time variant, non volatile collection of data
in support of management's decision making process.

Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more about your
company's sales data, you can build a warehouse that concentrates on sales. Using this
warehouse, you can answer questions like "Who was our best customer for this item last year?"
This ability to define a data warehouse by subject matter, sales in this case makes the data
warehouse subject oriented.
Integrated
Integration is closely related to subject orientation. Data warehouses must put data from
disparate sources into a consistent format. They must resolve such problems as naming conflicts
and inconsistencies among units of measure. When they achieve this, they are said to be
integrated.
Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change. This is logical
because the purpose of a warehouse is to enable you to analyze what has occurred.
Time Variant
In order to discover trends in business, analysts need large amounts of data. This is very much in
contrast to online transaction processing (OLTP) systems, where performance requirements
demand that historical data be moved to an archive. A data warehouse's focus on change over
time is what is meant by the term time variant.
How many stages are there in data warehousing?
Data warehouse generally includes two stages
ETL
Report Generation
ETL
Short for extract, transform, load, three database functions that are combined into one tool
Extract -- the process of reading data from a source database.
Transform -- the process of converting the extracted data from its previous form into required form
Load -- the process of writing the data into the target database.
ETL is used to migrate data from one database to another, to form data marts and data
warehouses and also to convert databases from one format to another format.
It is used to retrieve the data from various operational databases, transform it into useful
information and finally load it into the data warehousing system.
1. INFORMATICA
2. AB INITIO
3. DATASTAGE
4. BODI
5. ORACLE WAREHOUSE BUILDER
Report generation
In report generation, OLAP (online analytical processing) is used. It is a set of
specifications which allows client applications to retrieve data for analytical processing.
It is a specialized tool that sits between a database and the user in order to provide various analyses
of the data stored in the database.
OLAP Tool is a reporting tool which generates the reports that are useful for Decision support for
top level management.

Business Objects
Cognos
Micro strategy
Hyperion
Oracle Express
Microsoft Analysis Services

Difference Between OLTP and OLAP
1. OLTP: Application oriented (e.g., purchase order entry is functionality of an application) | OLAP: Subject oriented (subject in the sense of customer, product, item, time)
2. OLTP: Used to run the business | OLAP: Used to analyze the business
3. OLTP: Detailed data | OLAP: Summarized data
4. OLTP: Repetitive access | OLAP: Ad-hoc access
5. OLTP: Few records accessed at a time (tens), simple queries | OLAP: Large volumes accessed at a time (millions), complex queries
6. OLTP: Small database | OLAP: Large database
7. OLTP: Current data | OLAP: Historical data
8. OLTP: Clerical user | OLAP: Knowledge user
9. OLTP: Row-by-row loading | OLAP: Bulk loading
10. OLTP: Time invariant | OLAP: Time variant
11. OLTP: Normalized data | OLAP: De-normalized data
12. OLTP: E-R schema | OLAP: Star schema
What are the types of data warehousing?
EDW (Enterprise data warehouse)
It provides a central database for decision support throughout the enterprise
It is a collection of data marts
DATA MART
It is a subset of the data warehouse
It is a subject oriented database which supports the needs of individual departments in an
organization
It is called a high performance query structure
It supports a particular line of business like sales, marketing etc.
ODS (Operational data store)
It is defined as an integrated view of operational databases designed to support operational
monitoring
It is a collection of operational data sources designed to support transaction processing
Data is refreshed near real-time and used for business activity
It is an intermediate layer between OLTP and OLAP which helps to create instant reports
4. What are the data models involved in Data Warehouse Architecture?
5. What are the types of approaches in DWH?
Bottom up approach: first we develop the data marts, then we integrate these data marts into the
EDW
Top down approach: first we develop the EDW, then from that EDW we develop the data marts
Bottom up
OLTP -> ETL -> Data mart -> DWH -> OLAP
Top down
OLTP -> ETL -> DWH -> Data mart -> OLAP
Top down
Cost of initial planning & design is high
Takes a longer duration, often more than a year
Bottom up
Planning & designing the data marts without waiting for the global warehouse design
Immediate results from the data marts
Tends to take less time to implement
Errors in critical modules are detected earlier.
Benefits are realized in the early phases.
It is considered the best approach
Data Modeling Types:
Conceptual Data Modeling
Logical Data Modeling
Physical Data Modeling
Dimensional Data Modeling
1. Conceptual Data Modeling
A conceptual data model includes all major entities and relationships and does not contain
much detailed information about attributes; it is often used in the INITIAL PLANNING
PHASE
Conceptual data model is created by gathering business requirements from various sources
like business documents, discussion with functional teams, business analysts, smart
management experts and end users who do the reporting on the database. Data modelers create
conceptual data model and forward that model to functional team for their review.
Conceptual data modeling gives an idea to the functional and technical team about how
business requirements would be projected in the logical data model.
2. Logical Data Modeling
This is the actual implementation and extension of a conceptual data model. Logical data
model includes all required entities, attributes, key groups, and relationships that represent
business information and define business rules.
3. Physical Data Modeling
A physical data model includes all required tables, columns, relationships, and database properties
for the physical implementation of databases. Database performance, indexing strategy, physical
storage and denormalization are important parameters of a physical model.
Logical vs. Physical Data Modeling
Logical Data Model: Represents business information and defines business rules | Physical Data Model: Represents the physical implementation of the model in a database.
Entity | Table
Attribute | Column
Primary Key | Primary Key Constraint
Alternate Key | Unique Constraint or Unique Index
Inversion Key Entry | Non Unique Index
Rule | Check Constraint, Default Value
Relationship | Foreign Key
Definition | Comment
Dimensional Data Modeling
A dimensional model consists of fact and dimension tables
It is an approach used to develop the schema of DB designs
Types of Dimensional modeling
Star schema
Snow flake schema
Star flake schema (or) Hybrid schema
Multi star schema
What is Star Schema?
The star schema is a logical database design which contains a centrally located fact table
surrounded by at least one or more dimension tables
Since the database design looks like a star, it is called a star schema DB
The dimension table contains primary keys and the textual descriptions
It contains de-normalized business information
A fact table contains a composite key and measures
The measures are key performance indicators which are used to evaluate the
enterprise performance in the form of success and failure
Eg: Total revenue, Product sale, Discount given, No. of customers
To generate a meaningful report, the report should contain at least one dimension and one fact
table
Advantages of the star schema
Fewer joins
Improved query performance
Slicing down
Easy understanding of data.
Disadvantage:
Requires more storage space
Example of Star Schema:
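A minimal sketch of such a schema in SQL, with hypothetical table and column names:
CREATE TABLE product_dim (product_key NUMBER PRIMARY KEY, product_name VARCHAR2(50), category VARCHAR2(30));
CREATE TABLE customer_dim (customer_key NUMBER PRIMARY KEY, customer_name VARCHAR2(50), city VARCHAR2(30));
CREATE TABLE date_dim (date_key NUMBER PRIMARY KEY, calendar_date DATE, calendar_month VARCHAR2(10), calendar_year NUMBER);
-- the fact table holds the measures plus foreign keys to each dimension
CREATE TABLE sales_fact (
product_key NUMBER REFERENCES product_dim,
customer_key NUMBER REFERENCES customer_dim,
date_key NUMBER REFERENCES date_dim,
sales_amount NUMBER,
quantity NUMBER
);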
Snowflake Schema
In a star schema, if the dimension tables are split into one or more further dimension tables
(the de-normalized dimension tables are split into normalized dimension tables), we get a snowflake schema.
Example of Snowflake Schema:
In the snowflake schema, the example diagram shown below has 4 dimension tables, 4 lookup
tables and 1 fact table. The reason is that hierarchies (category, branch, state, and month) are
being broken out of the dimension tables (PRODUCT, ORGANIZATION, LOCATION, and TIME)
respectively and separately.
It increases the number of joins and gives poorer performance in retrieval of data.
In a few organizations, they try to normalize the dimension tables to save space.
Since dimension tables hold comparatively little space, the snowflake schema approach may be avoided.
Bitmap indexes cannot be effectively utilized
Important aspects of Star Schema & Snowflake Schema
In a star schema every dimension will have a primary key.
In a star schema, a dimension table will not have any parent table.
Whereas in a snowflake schema, a dimension table will have one or more parent tables.
Hierarchies for the dimensions are stored in the dimensional table itself in a star schema.
Whereas hierarchies are broken into separate tables in a snowflake schema. These hierarchies
help to drill down the data from the topmost hierarchies to the lowermost hierarchies.
Star flake schema (or) Hybrid Schema
Hybrid schema is a combination of Star and Snowflake schema
Multi Star schema
Multiple fact tables sharing a set of dimension tables
Confirmed (conformed) dimensions are nothing but reusable dimensions.
These are the dimensions which you use multiple times or in multiple data marts;
they are common across different data marts
Measure Types (or) Types of Facts
Additive - Measures that can be summed up across all dimensions.
Ex: Sales Revenue
Semi Additive - Measures that can be summed up across few dimensions and not with others
Ex: Current Balance
Non Additive - Measures that cannot be summed up across any of the dimensions.
Ex: Student attendance
Surrogate Key
Joins between fact and dimension tables should be based on surrogate keys
Users should not obtain any information by looking at these keys
These keys should be simple integers
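One common way to generate such simple integer keys on the database side is an Oracle sequence (hypothetical names); inside Informatica mappings the Sequence Generator transformation plays the same role:
CREATE SEQUENCE customer_sk_seq START WITH 1 INCREMENT BY 1;
INSERT INTO customer_dim (customer_key, customer_name)
VALUES (customer_sk_seq.NEXTVAL, 'Sample Customer');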
A sample data warehouse schema
WHY DO WE NEED A STAGING AREA FOR DWH?
The staging area is needed to clean operational data before loading it into the data warehouse.
Cleaning in the sense of merging data which comes from different sources.
It's the area where most of the ETL is done
Data Cleansing
It is used to remove duplicates
It is used to correct wrong email addresses
It is used to identify missing data
It is used to convert data types
It is used to capitalize names & addresses.
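For instance, duplicate removal in the staging area can be done with a query along these lines (hypothetical staging table and key columns):
DELETE FROM customer_stg s
WHERE s.ROWID NOT IN (
SELECT MIN(t.ROWID)
FROM customer_stg t
GROUP BY t.customer_id, t.email
);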
Types of Dimensions:
There are four types of dimensions
Confirmed Dimensions
Junk Dimensions (Garbage Dimensions)
Degenerative Dimensions
Slowly changing Dimensions
Garbage Dimension or Junk Dimension
Confirmed dimension: something which can be shared by multiple fact tables or multiple data marts.
Junk dimension: a grouping of flag values.
Degenerative dimension: something dimensional in nature but existing in the fact table (e.g. Invoice No).
It is neither a fact nor strictly a dimension attribute, but is useful for some kinds of
analysis. It is kept as an attribute in the fact table and called a degenerated dimension.
Degenerate dimension: a column of the key section of the fact table that does not have an
associated dimension table but is used for reporting and analysis; such a column is called a degenerate
dimension or line item dimension.
For example, we have a fact table with customer_id, product_id, branch_id, employee_id, bill_no, and
date in the key section and price, quantity, amount in the measure section. In this fact table, bill_no in
the key section is a single value; it has no associated dimension table. Instead of creating a
separate dimension table for that single value, we can include it in the fact table to improve
performance. So here the column bill_no is a degenerate dimension or line item dimension.

Informatica Architecture
The Power Center domain
It is the primary unit of administration.
There can be a single domain or multiple domains.
It is a collection of nodes and services.
Nodes
A node is the logical representation of a machine in a domain
One node in the domain acts as a gateway node to receive service requests from clients and
route them to the appropriate service and node
Integration Service:
The Integration Service does all the real work. It extracts data from sources, processes it as per the
business logic and loads data to targets.
Repository Service:
Repository Service is used to fetch the data from the repository and sends it back to the
requesting components (mostly client tools and integration service)
Power Center Repository:
Repository is nothing but a relational database which stores all the metadata created in Power
Center.
Power Center Client Tools:
The Power Center Client consists of multiple tools.
Power Center Administration Console:
This is simply a web-based administration tool you can use to administer the Power Center
installation.

Q. How can you define a transformation? What are different types of transformations available in
Informatica?
A. A transformation is a repository object that generates, modifies, or passes data. The Designer
provides a set of transformations that perform specific functions. For example, an Aggregator
transformation performs calculations on groups of data. Below are the various transformations
available in Informatica:
Aggregator
Custom
Expression
External Procedure
Filter
Input
Joiner
Lookup
Normalizer
Rank
Router
Sequence Generator
Sorter
Source Qualifier
Stored Procedure
Transaction Control
Union
Update Strategy
XML Generator
XML Parser
XML Source Qualifier
Q. What is a source qualifier? What is meant by Query Override?
A. Source Qualifier represents the rows that the PowerCenter Server reads from a relational or
flat file source when it runs a session. When a relational or a flat file source definition is added to
a mapping, it is connected to a Source Qualifier transformation.
PowerCenter Server generates a query for each Source Qualifier Transformation whenever it
runs the session. The default query is a SELECT statement containing all the source columns.
Source Qualifier has capability to override this default query by changing the default settings of
the transformation properties. The list of selected ports or the order they appear in the default
query should not be changed in overridden query.
Q. What is aggregator transformation?
A. The Aggregator transformation allows performing aggregate calculations, such as averages
and sums. Unlike Expression Transformation, the Aggregator transformation can only be used to
perform calculations on groups. The Expression transformation permits calculations on a
row-by-row basis only.
Aggregator Transformation contains group by ports that indicate how to group the data. While
grouping the data, the aggregator transformation outputs the last row of each group unless
otherwise specified in the transformation properties.
Various group by functions available in Informatica are: AVG, COUNT, FIRST, LAST, MAX,
MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE.
Q. What is Incremental Aggregation?
A. Whenever a session is created for a mapping with an Aggregator Transformation, the session option for
Incremental Aggregation can be enabled. When PowerCenter performs incremental aggregation,
it passes new source data through the mapping and uses historical cache data to perform new
aggregation calculations incrementally.
Q. How Union Transformation is used?
A. The union transformation is a multiple input group transformation that can be used to merge
data from various sources (or pipelines). This transformation works just like the UNION ALL
statement in SQL, which is used to combine the result sets of two SELECT statements.
Q. Can two flat files be joined with Joiner Transformation?
A. Yes, joiner transformation can be used to join data from two flat file sources.
Q. What is a look up transformation?
A. This transformation is used to look up data in a flat file or a relational table, view or synonym. It
compares lookup transformation ports (input ports) to the source column values based on the
lookup condition. Later returned values can be passed to other transformations.
Q. Can a lookup be done on Flat Files?
A. Yes.
Q. What is a mapplet?
A. A mapplet is a reusable object that is created using mapplet designer. The mapplet contains
set of transformations and it allows us to reuse that transformation logic in multiple mappings.
Q. What does reusable transformation mean?
A. Reusable transformations can be used multiple times in a mapping. The reusable
transformation is stored as a metadata separate from any other mapping that uses the
transformation. Whenever any changes to a reusable transformation are made, all the mappings
where the transformation is used will be invalidated.
Q. What is update strategy and what are the options for update strategy?
A. Informatica processes the source data row-by-row. By default every row is marked to be
inserted in the target table. If the row has to be updated/inserted based on some logic, the Update
Strategy transformation is used. The condition can be specified in Update Strategy to mark the
processed row for update or insert.
Following options are available for update strategy:
DD_INSERT: If this is used, the Update Strategy flags the row for insertion. The equivalent numeric
value of DD_INSERT is 0.
DD_UPDATE: If this is used, the Update Strategy flags the row for update. The equivalent numeric
value of DD_UPDATE is 1.
DD_DELETE: If this is used, the Update Strategy flags the row for deletion. The equivalent numeric
value of DD_DELETE is 2.
DD_REJECT: If this is used, the Update Strategy flags the row for rejection. The equivalent numeric
value of DD_REJECT is 3.

Q. What are the types of loading in Informatica?
There are two types of loading: 1. Normal loading and 2. Bulk loading.
In normal loading, it loads record by record and writes a log for that. It takes comparatively longer
to load data to the target.
In bulk loading, it loads a number of records at a time to the target database. It takes less time to load
data to the target.
Q. What is aggregate cache in aggregator transformation?
The aggregator stores data in the aggregate cache until it completes aggregate calculations.
When you run a session that uses an aggregator transformation, the Informatica server creates
index and data caches in memory to process the transformation. If the Informatica server requires
more space, it stores overflow values in cache files.
Q. What type of repositories can be created using Informatica Repository Manager?
A. Informatica PowerCenter includes following types of repositories:
Standalone Repository: A repository that functions individually and this is unrelated to any other
repositories.
Global Repository: This is a centralized repository in a domain. This repository can
contain shared objects across the repositories in a domain. The objects are shared through global
shortcuts.
Local Repository: Local repository is within a domain and it is not a global repository. Local
repository can connect to a global repository using global shortcuts and can use objects in its
shared folders.
Versioned Repository: This can either be local or global repository but it allows version control
for the repository. A versioned repository can store multiple copies, or versions of an object. This
feature allows efficiently developing, testing and deploying metadata in the production
environment.
Q. What is a code page?
A. A code page contains encoding to specify characters in a set of one or more languages. The
code page is selected based on source of the data. For example if source contains Japanese text
then the code page should be selected to support Japanese text.
When a code page is chosen, the program or application for which the code page is set, refers to
a specific set of data that describes the characters the application recognizes. This influences the
way that application stores, receives, and sends character data.
Q. Which all databases PowerCenter Server on Windows can connect to?
A. PowerCenter Server on Windows can connect to following databases:
IBM DB2
Informix
Microsoft Access
Microsoft Excel
Microsoft SQL Server
Oracle
Sybase
Teradata
Q. Which all databases PowerCenter Server on UNIX can connect to?
A. PowerCenter Server on UNIX can connect to following databases:
IBM DB2
Informix
Oracle
Sybase
Teradata
Q. How to execute PL/SQL script from nformatica mapping?
A. Stored Procedure (SP) transformation can be used to execute PL/SQL scripts. In SP
Transformation the PL/SQL procedure name can be specified. Whenever the session is executed,
the session will call the pl/sql procedure.
Q. What is Data Driven?
The Informatica server follows instructions coded into update strategy transformations within the
session mapping which determine how to flag records for insert, update, delete or reject. If we do
not choose the data driven option setting, the Informatica server ignores all update strategy
transformations in the mapping.
Q. What are the types of mapping wizards that are provided in Informatica?
The Designer provides two mapping wizards.
1. Getting Started Wizard - Creates mapping to load static facts and dimension tables as well as
slowly growing dimension tables.
2. Slowly Changing Dimensions Wizard - Creates mappings to load slowly changing dimension
tables based on the amount of historical dimension data we want to keep and the method we
choose to handle historical dimension data.
Q. What is Load Manager?
A. While running a Workflow, the PowerCenter Server uses the Load Manager
process and the Data Transformation Manager Process (DTM) to run the workflow and carry out
workflow tasks. When the PowerCenter Server runs a workflow, the Load Manager performs the
following tasks:
1. Locks the workflow and reads workflow properties.
2. Reads the parameter file and expands workflow variables.
3. Creates the workflow log file.
4. Runs workflow tasks.
5. Distributes sessions to worker servers.
6. Starts the DTM to run sessions.
7. Runs sessions from master servers.
8. Sends post-session email if the DTM terminates abnormally.
When the PowerCenter Server runs a session, the DTM performs the following tasks:
1. Fetches session and mapping metadata from the repository.
2. Creates and expands session variables.
3. Creates the session log file.
4. Validates session code pages if data code page validation is enabled. Checks
Query conversions if data code page validation is disabled.
5. Verifies connection object permissions.
6. Runs pre-session shell commands.
7. Runs pre-session stored procedures and SQL.
8. Creates and runs mappings, reader, writer, and transformation threads to extract,
transform, and load data.
9. Runs post-session stored procedures and SQL.
10. Runs post-session shell commands.
11. Sends post-session email.
Q. What is Data Transformation Manager?
A. After the load manager performs validations for the session, it creates the DTM
process. The DTM process is the second process associated with the session run. The
primary purpose of the DTM process is to create and manage threads that carry out
the session tasks.
The DTM allocates process memory for the session and divide it into buffers. This
is also known as buffer memory. It creates the main thread, which is called the
master thread. The master thread creates and manages all other threads.
If we partition a session, the DTM creates a set of threads for each partition to
allow concurrent processing. When the Informatica server writes messages to the
session log it includes the thread type and thread ID.
Following are the types of threads that DTM creates:
Master Thread - Main thread of the DTM process. Creates and manages all other
threads.
Mapping Thread - One Thread for Each Session. Fetches Session and Mapping
Information.
Pre and Post Session Thread - One Thread each to Perform Pre and Post Session
Operations.
Reader Thread - One Thread for Each Partition for Each Source Pipeline.
Writer Thread - One Thread for Each Partition if target exist in the source pipeline
write to the target.
Transformation Thread - One or More Transformation Thread For Each Partition.
Q. What is Session and Batches?
Session - A session is a set of instructions that tells the Informatica Server how
and when to move data from sources to targets. After creating the session, we
can use either the server manager or the command line program pmcmd to start
or stop the session.
Batches - A batch provides a way to group sessions for either serial or parallel execution by the
Informatica Server. There are two types of batches:
1. Sequential - Run Session One after the Other.
2. Concurrent - Run Session At The Same Time.
Q. How many ways you can update a relational source definition and what
are they?
A. Two ways
1. Edit the definition
2. Reimport the definition
Q. What is a transformation?
A. It is a repository object that generates, modifies or passes data.
Q. What are the designer tools for creating transformations?
A. Mapping designer
Transformation developer
Mapplet designer
Q. In how many ways can you create ports?
A. Two ways
1. Drag the port from another transformation
2. Click the add button on the ports tab.
Q. What are reusable transformations?
A. A transformation that can be reused is called a reusable transformation
They can be created using two methods:
1. Using transformation developer
2. Create normal one and promote it to reusable
Q. What is aggregate cache in aggregator transformation?
A. The aggregator stores data in the aggregate cache until it completes the aggregate calculations.
When you run a session that uses an aggregator transformation, the Informatica server creates
index and data caches in memory to process the transformation. If the Informatica server requires
more space, it stores overflow values in cache files.
Q. What r the settings that u use to configure the joiner transformation?
Master and detail source
Type of join
Condition of the join
Q. What are the join types in joiner transformation?
A. Normal (Default) -- only matching rows from both master and detail
Master outer -- all detail rows and only matching rows from master
Detail outer -- all master rows and only matching rows from detail
Full outer -- all rows from both master and detail (matching or non matching)
Q. What are the joiner caches?
A. When a Joiner transformation occurs in a session, the Informatica Server reads all the records
from the master source and builds index and data caches based on the master rows. After
building the caches, the Joiner transformation reads records
from the detail source and performs joins.
Q. What are the types of lookup caches?
Static cache: You can configure a static, or read-only, cache for any lookup table. By default the
Informatica server creates a static cache. It caches the lookup table and lookup values in the
cache for each row that comes into the transformation. When the lookup condition is true, the
Informatica server does not update the cache while it processes the lookup transformation.
Dynamic cache: If you want to cache the target table and insert new rows into the cache and
the target, you can configure the lookup transformation to use a dynamic cache. The Informatica server
dynamically inserts data into the target table.
Persistent cache: You can save the lookup cache files and reuse them the next time the
Informatica server processes a lookup transformation configured to use the cache.
Recache from database: If the persistent cache is not synchronized with the lookup table,
you can configure the lookup transformation to rebuild the lookup cache.
Shared cache: You can share the lookup cache between multiple transformations. You can
share an unnamed cache between transformations in the same mapping.
Q. What is Transformation?
A: A transformation is a repository object that generates, modifies, or passes data.
A transformation performs a specific function. There are two types of transformations:
1. Active
An active transformation can change the number of rows that pass
through it. Eg: Aggregator, Filter, Joiner, Normalizer, Rank, Router, Source Qualifier, Update
Strategy, ERP Source Qualifier, Advanced External Procedure.
2. Passive
A passive transformation does not change the number of rows that pass through it. Eg: Expression, External Procedure,
Input, Lookup, Stored Procedure, Output, Sequence Generator, XML Source Qualifier.
Q. What are Options/Type to run a Stored Procedure?
A: Normal: During a session, the stored procedure runs where the
transformation exists in the mapping on a row-by-row basis. This is useful for calling the stored
procedure for each row of data that passes through the mapping, such as running a calculation
against an input port. Connected stored procedures run only in normal mode.
Pre-load of the Source. Before the session retrieves data from the source, the stored procedure
runs. This is useful for verifying the existence of tables or performing joins of data in a temporary
table.
Post-load of the Source. After the session retrieves data from the source, the stored procedure
runs. This is useful for removing temporary tables.
Pre-load of the Target. Before the session sends data to the target, the stored procedure runs.
This is useful for verifying target tables or disk space on the target system.
Post-load of the Target. After the session sends data to the target, the stored procedure runs. This
is useful for re-creating indexes on the database. It must contain at least one Input and one
Output port.
Q. What kinds of sources and targets can be used in Informatica?
Sources may be Flat file, relational db or XML.
Target may be relational tables, XML or flat files.
Q: What is Session Process?
A: The Load Manager process starts the session, creates the DTM process, and
sends post-session email when the session completes.
Q. What is DTM process?
A: The DTM process creates threads to initialize the session, read, write, transform
data and handle pre and post-session operations.
Q. What are the different types of tracing levels?
The tracing level represents the amount of information that the Informatica Server writes to a log file.
Tracing levels store information about mappings and transformations. There are 4 types of tracing
levels supported:
1. Normal: Specifies initialization and status information, summarization of the successful
rows and target rows, and information about rows skipped due to transformation errors.
2. Terse: Specifies initialization information, error messages, and notification of rejected data (less detail than Normal).
3. Verbose Initialization: In addition to Normal tracing, specifies the location of the data cache
files and index cache files that are created, and detailed transformation statistics for each and
every transformation within the mapping.
4. Verbose Data: Along with Verbose Initialization, records each and every record processed by
the Informatica server.
Q. TYPES OF DIMENSIONS?
A dimension table consists of the attributes about the facts. Dimensions store the textual
descriptions of the business.
Conformed Dimension:
Conformed dimensions mean the exact same thing with every possible fact table to which they
are joined.
Eg: The date dimension table connected to the sales facts is identical to the date dimension
connected to the inventory facts.
Junk Dimension:
A junk dimension is a collection of random transactional codes, flags, and/or text attributes that are
unrelated to any particular dimension. The junk dimension is simply a structure that provides a
convenient place to store the junk attributes.
Eg: Assume that we have a gender dimension and a marital status dimension. In the fact table we
would need to maintain two keys referring to these dimensions. Instead of that, create a junk dimension
which has all the combinations of gender and marital status (cross join the gender and marital status
tables and create a junk table). Now we can maintain only one key in the fact table, as sketched below.
Degenerated Dimension:
A degenerate dimension is a dimension which is derived from the fact table and doesn't have its
own dimension table.
Eg: A transactional code in a fact table.
Slowly changing dimension:
Slowly changing dimensions are dimension tables that have slowly increasing
data as well as updates to existing data.
Q. What are the output files that the Informatica server creates during the
session run?
Informatica server log: The Informatica server (on UNIX) creates a log for all status and
error messages (default name: pm.server.log). It also creates an error log for error
messages. These files are created in the Informatica home directory.
Session log file: The Informatica server creates a session log file for each session. It writes
information about the session into the log file, such as the initialization process, creation of SQL
commands for reader and writer threads, errors encountered, and the load summary. The
amount of detail in the session log file depends on the tracing level that you set.
Session detail file: This file contains load statistics for each target in mapping.
Session detail includes information such as table name, number of rows written or
rejected. You can view this file by double clicking on the session in monitor window.
Performance detail file: This file contains information known as session performance
details, which helps you identify where performance can be improved. To generate this file,
select the performance detail option in the session property sheet.
Reject file: This file contains the rows of data that the writer does not write to
targets.
Control file: The Informatica server creates a control file and a target file when you run a
session that uses the external loader. The control file contains information about
the target flat file, such as data format and loading instructions for the external
loader.
Post session email: Post-session email allows you to automatically communicate
information about a session run to designated recipients. You can create two
different messages: one if the session completes successfully, the other if the session
fails.
Indicator file: If you use a flat file as a target, you can configure the Informatica
server to create an indicator file. For each target row, the indicator file contains a
number to indicate whether the row was marked for insert, update, delete or reject.
Output file: If the session writes to a target file, the Informatica server creates the
target file based on the file properties entered in the session property sheet.
Cache files: When the Informatica server creates a memory cache, it also creates cache
files.
The Informatica server creates index and data cache files for the following
transformations:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
Q. What is meant by lookup caches?
A. The Informatica server builds a cache in memory when it processes the first row
of data in a cached lookup transformation. It allocates memory for the cache
based on the amount you configure in the transformation or session properties. The
Informatica server stores condition values in the index cache and output values in
the data cache.
Q. How do you identify existing rows of data in the target table using lookup
transformation?
A. There are two ways to look up the target table to verify whether a row exists or not:
1. Use a connected dynamic cache lookup and then check the value of the NewLookupRow
output port to decide whether the incoming record already exists in the table / cache
or not.
2. Use an unconnected lookup, call it from an expression transformation, and check
the lookup condition port value (Null / Not Null) to decide whether the incoming
record already exists in the table or not.
Q. What are Aggregate tables?
An aggregate table contains the summary of existing warehouse data, grouped to certain
levels of dimensions. Retrieving the required data from the actual table, which may have millions of
records, takes more time and also affects server performance. To avoid this we can
aggregate the table to a certain required level and use it. These tables reduce the load on the
database server, increase the performance of the query, and return the result very
quickly, as in the sketch below.
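A hedged sketch of such an aggregate table, assuming a hypothetical SALES_FACT table with SALE_DATE, PRODUCT_KEY and SALES_AMOUNT columns:
-- Pre-summarize detail rows to month / product level so reports avoid scanning the full fact table.
CREATE TABLE sales_month_agg AS
SELECT TO_CHAR(sale_date, 'YYYY-MM') AS sale_month,
       product_key,
       SUM(sales_amount)             AS total_sales,
       COUNT(*)                      AS txn_count
FROM   sales_fact
GROUP  BY TO_CHAR(sale_date, 'YYYY-MM'), product_key;
Queries that only need monthly totals can then read sales_month_agg instead of the detail fact table.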
Q. What is the level of granularity of a fact table?
The level of granularity means the level of detail that you put into the fact table in a data warehouse. For
example: based on the design you can decide to store the sales data for each transaction. The level of
granularity then means how much detail you are willing to keep for each transactional fact: product sales
with respect to each transaction, or aggregated up to the minute.
Q. What is session?
A session is a set of instructions to move data from sources to targets.
Q. What is a worklet?
A worklet is an object that represents a set of workflow tasks, allowing you to reuse a set of workflow
logic in several workflows.
Use of a worklet: you can bind many of the tasks in one place so that they can easily be identified
and also serve a specific purpose.
Q. What is workflow?
A workflow is a set of instructions that tells the Informatica server how to execute the tasks.
Q. Why cannot we use sorted input option for incremental aggregation?
In incremental aggregation, the aggregate calculations are stored in a historical cache on the
server. In this historical cache the data need not be in sorted order. If you give sorted input, the
records come presorted for that particular run, but in the historical cache the data may not be in
sorted order. That is why this option is not allowed.
Q. What is target load order plan?
You specify the target load order based on source qualifiers in a mapping. If you have multiple
source qualifiers connected to multiple targets, you can designate the order in which the
Informatica server loads data into the targets.
The Target Load Plan defines the order in which data is extracted from the source qualifier transformations.
Set it in the Mappings (tab) > Target Load Order Plan.
Q. What is constraint based loading?
Constraint based load order defines the order of loading the data into the multiple targets based
on primary and foreign keys constraints.
To set the option: double-click the session, go to the
Config Object tab and check Constraint Based Load Ordering.
Q. What is the status code in stored procedure transformation?
The status code provides error handling for the Informatica server during the session. The stored
procedure issues a status code that notifies whether or not the stored procedure completed
successfully. This value cannot be seen by the user. It is used only by the Informatica server to
determine whether to continue running the session or stop.
Q. Define nformatica Repository?
The Informatica repository is a relational database that stores information, or metadata, used by
the Informatica Server and Client tools. Metadata can include information such as mappings
describing how to transform source data, sessions indicating when you want the Informatica
Server to perform the transformations, and connect strings for sources and targets.
The repository also stores administrative information such as usernames and passwords,
permissions and privileges, and product version.
Use the Repository Manager to create the repository. The Repository Manager connects to the
repository database and runs the code needed to create the repository tables. These tables
store metadata in a specific format that the Informatica Server and Client tools use.
Q. What is a metadata?
Designing a data mart involves writing and storing a complex set of instructions. You need to
know where to get data (sources), how to change it, and where to write the information (targets).
PowerMart and PowerCenter call this set of instructions metadata. Each piece of metadata (for
example, the description of a source table in an operational database) can contain comments
about it.
In summary, metadata can include information such as mappings describing how to transform
source data, sessions indicating when you want the Informatica Server to perform the
transformations, and connect strings for sources and targets.

Q. What is the metadata reporter?
It is a web-based application that enables you to run reports against repository metadata. With the
Metadata Reporter, you can access information about your repository without having knowledge of
SQL, the transformation language, or the underlying tables in the repository.
Q. What are the types of metadata stored in the repository?
Source definitions. Definitions of database objects (tables, views, synonyms) or files that provide
source data.
Target definitions. Definitions of database objects or files that contain the target data.
Multi-dimensional metadata. Target definitions that are configured as cubes and dimensions.
Mappings. A set of source and target definitions along with transformations containing business
logic that you build into the transformation. These are the instructions that the Informatica Server
uses to transform and move data.
Reusable transformations. Transformations that you can use in multiple mappings.
Mapplets. A set of transformations that you can use in multiple mappings.
Sessions and workflows. Sessions and workflows store information about how and when the
Informatica Server moves data. A workflow is a set of instructions that describes how and when to
run tasks related to extracting, transforming, and loading data. A session is a type of task that you
can put in a workflow. Each session corresponds to a single mapping.
The following are additional types of metadata stored in the repository:
Database Connections
Global Objects
Multidimensional Metadata
Reusable Transformations
Short cuts
Transformations
Q. How can we store previous session logs?
Go to Session Properties > Config Object > Log Options
Select the properties as follows..
Save session log by > Session Runs
Save session log for these runs > change the number to the number of log
files you want to save (default is 0).
If you want to save all of the log files created by every run, then select the option
Save session log by > Session Timestamp.
You can find these properties in the session/workflow properties.
Q. What is Changed Data Capture?
Changed Data Capture (CDC) helps identify the data in the source system that has changed
since the last extraction. With CDC, data extraction takes place at the same time the insert, update,
or delete operations occur in the source tables, and the change data is stored inside the database
in change tables.
The change data thus captured is then made available to the target systems in a controlled
manner.
Q. What is an indicator file and how can it be used?
An indicator file is used for event-based scheduling when you don't know when the source data will be
available. A shell command, script, or batch file creates and sends this indicator file to a
directory local to the Informatica Server. The server waits for the indicator file to appear before
running the session.
Q. What is an audit table, and what are the columns in it?
An audit table is nothing but a table which contains details about your workflow names and session
names. It contains information about workflow and session status and their details.
WKFL_RUN_ID
WKFL_NME
START_TMST
END_TMST
ROW_INSERT_CNT
ROW_UPDATE_CNT
ROW_DELETE_CNT
ROW_REJECT_CNT
Q. If a session fails after loading 10000 records into the target, how can we load from the 10001th record
when we run the session the next time?
Select the Recovery Strategy in the session properties as "Resume from the last checkpoint". Note:
set this property before running the session.
Q. Informatica Reject File - how to identify the rejection reason
D - Valid data or Good Data. The writer passes it to the target database. The target accepts it unless
a database error occurs, such as finding a duplicate key while inserting.
O - Overflowed Numeric Data. Numeric data exceeded the specified precision or scale for the
column. Bad data, if you configured the mapping target to reject overflow or truncated data.
N - Null Value. The column contains a null value. Good data. The writer passes it to the target, which
rejects it if the target database does not accept null values.
T - Truncated String Data. String data exceeded a specified precision for the column, so the
Integration Service truncated it. Bad data, if you configured the mapping target to reject overflow
or truncated data.
Also note that the second column contains the column indicator flag value 'D', which signifies
that the Row Indicator is valid.
Now let us see how Data in a Bad File looks like:
0,D,7,D,John,D,5000.375,O,,N,BrickLand Road Singapore,T
Q. What is Insert Else Update and Update Else Insert?
These options are used when dynamic cache is enabled.
The Insert Else Update option applies to rows entering the lookup transformation with the row
type of insert. When this option is enabled, the Integration Service inserts new rows in the cache
and updates existing rows. When disabled, the Integration Service does not update existing rows.
The Update Else Insert option applies to rows entering the lookup transformation with the row
type of update. When this option is enabled, the Integration Service updates existing rows, and
inserts a new row if it is new. When disabled, the Integration Service does not insert new rows.
Q. What are the Different methods of loading Dimension tables?
Conventional Load - Before loading the data, all the Table constraints will be checked against the
data.
Direct load (Faster Loading) - All the Constraints will be disabled. Data will be loaded directly.
Later the data will be checked against the table constraints and the bad data won't be indexed.
Q. What are the different types of Commit intervals?
The different commit intervals are:
Source-based commit. The Informatica Server commits data based on the number of
source rows. The commit point is the commit interval you configure in the session properties.
Target-based commit. The Informatica Server commits data based on the number of target
rows and the key constraints on the target table. The commit point also depends on the buffer
block size and the commit interval.
Q. How to add source flat file header into target file?
Edit Task-->Mapping-->Target-->Header Options--> Output field names
Q. How to load the name of the file into a relational target?
Source Definition-->Properties-->Add currently processed file name port
Q. How to return multiple columns through an unconnected lookup?
Suppose your lookup table has f_name, m_name, and l_name and you are using an unconnected lookup. In
the override SQL of the lookup, concatenate them as f_name||'~'||m_name||'~'||l_name; you can then get this value
from the unconnected lookup in an expression. Use the SUBSTR and INSTR functions in the expression transformation to
separate the three columns and make them individual ports for the downstream transformations
/ target, as sketched below.
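A minimal sketch of the lookup SQL override described above (employee_lkp and its columns are illustrative names, not from the original answer):
-- Lookup SQL override: return the three name parts as one delimited string per key.
SELECT f_name || '~' || m_name || '~' || l_name AS full_name,
       emp_id
FROM   employee_lkp
The unconnected lookup returns full_name through its return port; the downstream Expression transformation then uses SUBSTR and INSTR on the '~' delimiters to split it back into three output ports.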
-----------------------------------------------------------------------------------------
Q. What is a Factless fact table? For which purpose do we use this in our DWH projects? Please give
the proper answer.
It is a fact table which does not contain any measurable data.
EX: Student attendance fact (it contains only Boolean values - whether the student attended class or
not, Yes or No).

A Factless fact table contains only the keys; there are no measures, or in other words we can say
that it contains no facts. Generally it is used to integrate fact tables.

A Factless fact table contains only foreign keys. We can use two kinds of aggregate functions on
the factless fact: one is count and the other is distinct count.

2 purposes of a factless fact:

1. Coverage: to indicate what did NOT happen.
Like: which product did not sell well in a particular region?
2. Event tracking: to know if the event took place or not.
Like: a fact for tracking students' attendance will not contain any measures.
Q. What is a staging area?
The staging area is where we apply our logic while extracting the data from the source: we cleanse the
data and put it into a meaningful, summarized form for the data warehouse.

Q. What is constraint based loading
Constraint based load order defines the order of loading the data into the multiple targets based
on primary and foreign keys constraints.
Q. Why is the union transformation an active transformation?
The only condition for a transformation to become active is that the row number changes.
Now the thing is how a row number can change. There are
2 conditions:
1. either the number of rows coming in and going out is different,
eg: in case of a filter we have data like
id name dept row_num
1 aa 4 1
2 bb 3 2
3 cc 4 3
and we have a filter condition like dept=4, then the output would
be like

id name dept row_num
1 aa 4 1
3 cc 4 2

So the row numbers changed and it is an active transformation

2. or the order of the rows changes,
eg: when the Union transformation pulls in data, suppose we have
2 sources
sources1:
id name dept row_num
1 aa 4 1
2 bb 3 2
3 cc 4 3
source2:
id name dept row_num
4 aaa 4 4
5 bbb 3 5
6 ccc 4 6

it never restricts the data from any source, so the data can
come in any order

id name dept row_num old row_num
1 aa 4 1 1
4 aaa 4 2 4
5 bbb 3 3 5
2 bb 3 4 2
3 cc 4 5 3
6 ccc 4 6 6

so the row_num values are changing. Thus we say that Union is an active transformation.
Q. What is the use of a batch in Informatica? How many types of batches are there in Informatica?
With a batch, we can run sessions either sequentially or concurrently.
A grouping of sessions is known as a batch.
Two types of batches:
1) Sequential: runs sessions one after another.
2) Concurrent: runs the sessions at the same time.

If you have sessions with source-target dependencies you have to go for a sequential batch to start the
sessions one after another. If you have several independent sessions you can use concurrent batches,
which run all the sessions at the same time.

Q. What is the joiner cache?
When we use the joiner transformation, the Integration Service maintains a cache; all the records
are stored in the joiner cache. Joiner caches are of 2 types: 1. Index cache 2. Data cache.

The index cache stores all the port values which participate in the join condition, and the data cache
stores all the ports which do not participate in the join condition.
Q. What is the location of the parameter file in Informatica?
$PMBWPARAM
Q. How can you display only hidden files in UNIX?
$ ls -la
total 16
8 drwxrwxrwx 2 zzz yyy 4096 Apr 26 12:00 ./
8 drwxrwxrwx 9 zzz yyy 4096 Jul 31 16:59 ../

Correct answer is

ls -a|grep "^\."
$ls -a

Q. How to delete the data in the target table after it is loaded?
SQ ---> Properties tab --> Post SQL
delete from target_tablename

Post SQL statements on the Source Qualifier are executed using the source database connection after the
pipeline runs. Alternatively, write a Post SQL on the target to truncate the table; we also have the truncate
target table option in the session properties.
Q. What is polling in informatica?
It displays the updated information about the session in the monitor window. The monitor window
displays the status of each session when you poll the Informatica server.
Q. How will I stop my workflow after 10 errors?
Use the session-level error handling property and set the condition Stop on errors: 10.

---> Config Object > Error Handling > Stop on errors
Q. How can we calculate the fact table size?
A fact table's size is a multiple of the combinations of its dimension values.

i.e. if we want to find the fact table size for 3 years of daily history with 200 products and 200
stores:
3 * 365 * 200 * 200 = 43,800,000 rows (the fact table size).
Q. Without using the Email task, how will you send a mail from Informatica?
By using the 'mailx' command in a UNIX shell script.
Q. How will you compare two mappings in two different repositories?
In the Designer client, go to the Mappings menu; there is an
option 'Compare'. Here we can compare two mappings in two different repositories.

In Informatica Designer go to Mappings ---> Compare..

We can compare 2 folders within the same repository.
We can compare 2 folders within different repositories.

Q. What is constraint based load order
Constraint based load order defines the order in which data loads into the multiple targets based
on primary key and foreign key relationship.
Q. What is target load plan
Suppose I have 3 pipelines in a single mapping in the Designer:

emp source--->sq--->tar1
dept source--->sq--->tar2
bonus source--->sq--->tar3

my requirement is to load first into tar2, then tar1, and then finally tar3.

For this type of loading, to control the extraction of data from the source by the source qualifier, we use the
target load plan.

Q. What is meant by data driven, and in which scenario do we use it?
Data driven is available at the session level. It means that when we are using an update strategy transformation, the
integration service determines from the mapping how to update/insert each row in the database.
Data driven is nothing but instructing, per source row, which action should be taken on the target,
i.e. (update, delete, reject, insert). If we use the update strategy transformation in a mapping then we
select the data driven option in the session.
Q. How to run workflow in unix?
Syntax: pmcmd startworkflow -sv <service name> -d <domain name> -u <user name> -p
<password> -f <folder name> <workflow name>

Example
pmcmd startworkflow -service
${INFA_SERVICE} -domain
${INFA_DOMAIN} -uv xxx_PMCMD_ID -pv PSWD -folder
${ETLFolder} -wait ${ETLWorkflow} \
Q. What is the main difference between a Joiner Transformation and Union Transformation?
Joiner Transformation merge horizontally
Union Transformation merge vertically

A Joiner Transformation is used to join data from heterogeneous sources, i.e. (a SQL database and a flat
file), whereas the Union transformation is used to join data from
the same relational sources (an Oracle table and another Oracle table).

The Joiner Transformation combines data records horizontally based on a join condition,
and can combine data from two different sources having different metadata.
The Joiner transformation supports heterogeneous and homogeneous data sources.

The Union Transformation combines data records vertically from multiple sources having the same
metadata.
The Union transformation also supports heterogeneous data sources.
The Union transformation functions as the UNION ALL set operator.





Q. What is constraint based loading exactly? And how to do this? I think it is when we have a
primary key-foreign key relationship. Is it correct?
Constraint based load order defines loading the data into multiple targets depending on the primary
key - foreign key relationship.

To set the option: double-click the session,
go to Config Object and check Constraint Based Load Ordering.

Q. Difference between the top down (W.H. Inmon) and bottom up (Ralph Kimball) approach?
Top Down approach:

As per W.H. Inmon, first we need to build the Data Warehouse, and after that we build the
Data Marts; but this makes the DWH somewhat difficult to maintain.

Bottom up approach:

As per Ralph Kimball, first we need to build up the Data Marts, then we build up the
Data Warehouse.
This approach is most useful in real time while creating the Data Warehouse.


Q. What are the different caches used in informatica?
Static cache
Dynamic cache
Shared cache
Persistent cache

Q. What is the command to get the list of files in a directory in unix?
$ls -lrt

Q. How to load multiple flat files into a single target where there is no common column in the flat
files?
In the workflow session properties, in the Mapping tab, under properties choose Source filetype - Indirect.
Give the Source filename: <file_path>

This <file_path> file should contain the names of all the files which you want to load.
Q. How to connect two or more tables with a single source qualifier?
Create an Oracle source with however many columns you want and write the join query in the SQL
query override. The column order and data types should be the same as in the SQL query.
Q. How to call unconnected lookup in expression transformation?
:LKP.LKP_NAME(PORTS)
Q. What is the difference between connected and unconnected lookup?
Connected lookup:

It is used to join two tables.
It returns multiple rows.
It must be in the mapping pipeline.
You can implement a lookup condition.
Using a connected lookup you can generate sequence numbers by
enabling the dynamic lookup cache.

Unconnected lookup:
It returns a single output through the return port.
It acts as a lookup function (:LKP).
It is called by another transformation.
It is not connected to either source or target.
------
CONNECTED LOOKUP:
>> It participates in the data pipeline.
>> It contains multiple inputs and multiple outputs.
>> It supports static and dynamic cache.

UNCONNECTED LOOKUP:
>> It does not participate in the data pipeline.
>> It contains multiple inputs and a single output.
>> It supports static cache only.
Q. Types of partitioning in Informatica?
Partition 5 types

1. Simple pass through
2. Key range
3. Hash
4. Round robin
5. Database

Q. Which transformation uses cache?
1. Lookup transformation
2. Aggregator transformation
3. Rank transformation
4. Sorter transformation
5. Joiner transformation

Q. Explain about union transformation?
A union transformation is a multiple input group transformation, which is used to merge the data
from multiple sources, similar to the UNION ALL SQL statement combining the results from 2 or more
SQL statements.
Similar to UNION ALL, the union transformation doesn't remove duplicate rows. It is an
active transformation.

Q. Explain about Joiner transformation?
The Joiner transformation is used to join source data from two related heterogeneous sources.
However, it can also be used to join data from the same source. The Joiner transformation joins sources with at
least one matching column. It uses a condition that matches one or more pairs of columns
between the 2 sources.
To configure a Joiner t/r various settings that we do are as below:
1) Master and detail source
2) Types of join
3) Condition of the join

Q. Explain about Lookup transformation?
A Lookup transformation is used in a mapping to look up data in a relational table, flat file, view or synonym.
The Informatica server queries the lookup source based on the lookup ports in the
transformation. It compares lookup transformation port values to lookup source column values based on the
lookup condition.
The Lookup transformation is used to perform the tasks mentioned below:
1) To get a related value.
2) To perform a calculation.
3) To update SCD tables.

Q. How to identify this row for insert and this row for update in dynamic lookup cache?
Based on the NewLookupRow port, the Informatica server indicates which one is an insert and which one
is an update.
NewLookupRow - 0 ... no change
NewLookupRow - 1 ... Insert
NewLookupRow - 2 ... Update

Q. How many ways can we implement SCD2?
1) Date range
2) Flag
3) Versioning

Q. How will you check the bottlenecks in Informatica? Where do you start checking?
You start as per this order

1. Target
2. Source
3. Mapping
4. Session
5. System
Q. What is incremental aggregation?
When the aggregator transformation executes, all the output data gets stored in a temporary
location called the aggregator cache. The next time the mapping runs, the aggregator
transformation runs only for the new records loaded after the first run. These output values get
added to the values in the aggregator cache. This is called incremental aggregation. In
this way we can improve performance.
---------------------------

Incremental aggregation means applying only the captured changes in the source to the aggregate
calculations in a session.
When the source changes only incrementally and we can capture those changes, then we
can configure the session to process only those changes. This allows the Informatica server to
update the target table incrementally, rather than forcing it to process the entire source and
recalculate the same calculations each time you run the session. By doing this the
session performance obviously increases.

Q. How can I explain my project architecture in an interview? Tell me your project flow from source
to target.

Project architecture is like:
1. Source Systems: like Mainframe, Oracle, PeopleSoft, DB2.
2. Landing tables: these are tables that act like the source. Used for easy access, for backup purposes,
and as reusable sources for other mappings.
3. Staging tables: From landing tables we extract the data into staging tables after all validations
done on the data.
4. Dimension/Fact tables: these are the tables that are used for analysis and to make decisions by
analyzing the data.
5. Aggregation tables: these tables hold summarized data, useful for managers who want to
view month-wise sales, year-wise sales etc.
6. Reporting layer: phases 4 and 5 are useful for reporting developers to generate reports.

Q. What type of transformation is not supported by mapplets?

Normalizer transformation
COBOL sources, joiner
XML source qualifier transformation
XML sources
Target definitions
Pre & Post Session stored procedures
Other mapplets

Q. How informatica recognizes mapping?
All are organized by the Integration Service.
PowerCenter talks to the Integration Service, and the Integration Service talks to the session. A session has
a mapping structure. This is the flow of execution.

Q. Can every transformation reusable? How?
Except for the source qualifier transformation, all transformations support the reusable property. Reusable
transformations are developed in two ways:
1. In the mapping, select the transformation you want to reuse and double
click on it; there you get an option to make it a reusable transformation.
You need to check the option for converting a non-reusable to a reusable
transformation (except for the source qualifier transformation).
2. By using the transformation developer.

Q. What is Pre Sql and Post Sql?
Pre SQL means that the integration service runs SQL commands against the source database
before it reads the data from source.

Post SQL means integration service runs SQL commands against target database after it writes
to the target.


Q. nsert else update option in which situation we will use?

If the source table contains multiple records: when the record specified in the associated port is not
found in the lookup cache, it is inserted into the cache; when it is found, the data in the
associated ports of that record is changed (updated).
----------------------

We set this property when the lookup transformation uses a dynamic cache and the session property
TREAT SOURCE ROWS AS "Insert" has been set.
--------------------

We use this option when we want to maintain history.

If the records are not available in the target table then it inserts the records into the target, and if the
records are available in the target table then it updates the records.

Q. What is incremental loading? In which situations will we use incremental loading?

Incremental loading is an approach. Suppose you have a mapping to load the data from an employee
table to an employee_target table on the hire date basis. Again, suppose you have already moved the
employee data from source to target up to the employees with hire date 31-12-2009. Your organization
now wants to load data into employee_target today. Your target already has the data of the
employees having hire dates up to 31-12-2009, so you now pick up the source data for those hired
from 1-1-2010 till date. You needn't take the data before that date; if you did,
you would add the overhead of loading data into the target which already exists there. So in the source
qualifier you filter the records as per hire date, and you can also parameterize the hire date, which
helps control from which date you want to load data into the target.
This is the concept of incremental loading, sketched below.
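A minimal sketch of such a parameterized Source Qualifier filter, assuming a mapping parameter named $$LAST_EXTRACT_DATE (the table, columns, and date format are illustrative):
-- Source Qualifier SQL override: pick up only rows hired after the previous load.
SELECT emp_id, emp_name, hire_date, salary
FROM   employee
WHERE  hire_date > TO_DATE('$$LAST_EXTRACT_DATE', 'DD-MM-YYYY')
The parameter file supplies a new value for $$LAST_EXTRACT_DATE before each run, so only the incremental slice of the source is read.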

Q. What is target update override?
By Default the integration service updates the target based on key columns. But we might want to
update non-key columns also, at that point of time we can override the
UPDATE statement for each target in the mapping. The target override takes effect only when the
source rows are marked as update by an update strategy in the mapping, as in the sketch below.
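A hedged sketch of a target update override using the :TU qualifier, assuming a hypothetical T_EMP target with key EMP_ID and non-key columns EMAIL and LOAD_DATE:
-- Target update override: update non-key columns using the target transformation's ports (:TU).
UPDATE T_EMP
SET    EMAIL     = :TU.EMAIL,
       LOAD_DATE = :TU.LOAD_DATE
WHERE  EMP_ID    = :TU.EMP_ID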

Q. What is the Mapping parameter and Mapping variable?
Mapping parameter: A mapping parameter is a constant value that can be defined before the mapping
runs. A mapping parameter lets you reuse the mapping for various constant values.

Mapping variable: A mapping variable represents a value that can change during the mapping
run. The value is stored in the repository; the integration service retrieves it from the repository and uses the
incremented value for the next run.

Q. What is rank and dense rank in informatica with any examples and give sql query for this both
ranks
for eg: the file contains the records with column
100
200(repeated rows)
200
300
400
500
the rank function gives output as
1
2
2
4
5
6
and dense rank gives
1
2
2
3
4
5

for eg: the file contains the records with column
empno sal
100 1000
200(repeated rows) 2000
200 3000
300 4000
400 5000
500 6000

Rank :

select rank() over (order by sal) from emp

1
2
2
4
5
6

Dense Rank
select dense_rank() over (order by sal) from emp
and dense rank gives

1
2
2
3
4
5

Q. What is the incremental aggregation?
The first time you run an upgraded session using incremental aggregation, the Integration Service
upgrades the index and data cache files. If you want to partition a session using a mapping with
incremental aggregation, the Integration Service realigns the index and data cache files.

Q. What is a session parameter?
A parameter file is a text file where we can define the values for the parameters. Session parameters
are used to assign database connection values.

Q. What is a mapping parameter?
A mapping parameter represents a constant value that can be defined before the mapping runs. A
mapping parameter is defined in a parameter file, which is saved with the extension .prm. A mapping
parameter lets you reuse the mapping with various constant values.

Q. What is parameter file?
A parameter file can be a text file. Parameter file is to define the values for parameters and
variables used in a session. A parameter file is a file created by text editor such as word pad or
notepad. You can define the following values in parameter file
Mapping parameters
Mapping variables
Session parameters

Q. What is session override?
Session override is an option in Informatica at the session level. Here we can manually give a SQL
query which is issued to the database when the session runs. It is nothing but overriding the
default SQL which is generated by a particular transformation at the mapping level.
Q. What are the diff. b/w informatica versions 8.1.1 and 8.6.1?
There is a little change in the Administrator Console. In 8.1.1 we can do all the creation of the Integration Service and Repository
Service, web service, domain, node, grid (if we have the licensed version). In 8.6.1 the Informatica
Admin Console manages both the Domain page and the Security page. The Domain page covers all
of the above, i.e. creation of the Integration Service and Repository Service, web service, domain, node, grid (if we have the
licensed version) etc. The Security page covers creation of users, privileges, LDAP configuration,
Export/Import of users and privileges etc.

Q. What are the uses of a Parameter file?
A parameter file is one which contains the values of mapping variables.
Type this in Notepad and save it:
foldername.sessionname
$$inputvalue1=
---------------------------------
Parameter files are created with an extension of .prm

These are created to pass values that can be changed for a Mapping Parameter or Session
Parameter during the mapping run.

Mapping Parameters:
A parameter value is defined in a parameter file for a parameter that has already been created in the mapping
with data type, precision and scale.

The Mapping parameter file syntax (xxxx.prm).
[FolderName.WF:WorkFlowName.ST:SessionName]
$$ParameterName1=Value
$$ParameterName2=Value

After that we have to select the properties Tab of Session and Set Parameter file name including
physical path of this xxxx.prm file.

Session Parameters:
The Session Parameter files syntax (yyyy.prm).
[FolderName.SessionName]
$InputFileValue1=Path of the source flat file

After that we have to select the properties Tab of Session and Set Parameter file name including
physical path of this yyyy.prm file.

Do following changes in Mapping Tab of Source Qualifier's
Properties section
Attributes values

Source file Type ---------> Direct
Source File Directory --------> Empty
Source File Name --------> $InputFileValue1

Q. What is the default data driven operation in informatica?
This is the default option for the update strategy transformation.
The integration service follows the instructions coded in the update strategy within the session's mapping to
determine how to flag records for insert, delete, update, or reject. If you do not set the data driven option,
the integration service ignores update strategy transformations in the mapping.
Q. What is a threshold error in Informatica?
When the target is used by the update strategy with DD_REJECT or DD_UPDATE and a limited
count is configured, then if the number of rejected records exceeds that count, the
session ends with a failed status. This error is called a Threshold Error.


Q. So many times I saw "$PM parser error". What is meant by PM?
PM: POWER MART
1) A parsing error will come for the input parameter to the lookup.
2) Informatica is not able to resolve the input parameter CLASS for your lookup.
3) Check that the port CLASS exists as either an input port or a variable port in your expression.
4) Check the data type of CLASS and the data type of the input parameter for your lookup.

Q. What is a candidate key?
A candidate key is a combination of attributes that can be uniquely used to identify a database
record without any extraneous data (unique). Each table may have one or more candidate keys.
One of these candidate keys is selected as the table primary key; the others are called Alternate Keys.

Q. What is the difference between Bitmap and Btree index?
A Bitmap index is used for repeating (low-cardinality) values.
ex: Gender: male/female
Account status: Active/Inactive
A B-tree index is used for (mostly) unique values.
ex: empid.
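A short sketch of how the two index types are created (Oracle syntax; table and column names are illustrative):
-- Bitmap index on a low-cardinality, repeating column.
CREATE BITMAP INDEX idx_emp_gender ON employee (gender);

-- B-tree index (the default CREATE INDEX) on a high-cardinality, mostly unique column.
CREATE INDEX idx_emp_empid ON employee (empid);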
Q. What is Throughput in Informatica?
Throughput is the rate at which the PowerCenter server reads rows in bytes from the source, or writes
rows in bytes into the target, per second.

You can find this option in the Workflow Monitor. Right click on the session, choose Properties, and in the
Source/Target Statistics tab you can find the throughput details for each instance of source and
target.

Q. What are set operators in Oracle
UNION
UNION ALL
MINUS
INTERSECT
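A small illustration of two of these operators on the standard EMP and DEPT tables (assuming the usual DEPTNO column in both):
SELECT deptno FROM dept
MINUS            -- departments that have no employees
SELECT deptno FROM emp;

SELECT deptno FROM dept
INTERSECT        -- departments that do have employees
SELECT deptno FROM emp;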

Q. How can I schedule an Informatica job using the UNIX cron scheduling tool?

Crontab
The crontab (cron derives from chronos, Greek for time; tab stands for table) command, found in
Unix and Unix-like operating systems, is used to schedule commands to be executed periodically.
To see what crontabs are currently running on your system, you can open a terminal and run:
sudo crontab -l
To edit the list of cronjobs you can run:
sudo crontab -e
This will open the default editor (could be vi or pico; if you want you can change the default
editor) to let us manipulate the crontab. If you save and exit the editor, all your cronjobs are saved
into the crontab. Cronjobs are written in the following format:
* * * * * /bin/execute/this/script.sh
Scheduling explained
As you can see there are 5 stars. The stars represent different date parts in the following order:
1. minute (from 0 to 59)
2. hour (from 0 to 23)
3. day of month (from 1 to 31)
4. month (from 1 to 12)
5. day of week (from 0 to 6) (0=Sunday)
Execute every minute
If you leave the star, or asterisk, it means every. Maybe
that's a bit unclear. Let's use the previous example
again:
* * * * * /bin/execute/this/script.sh
They are all still asterisks! So this means
execute /bin/execute/this/script.sh:
1. every minute
2. of every hour
3. of every day of the month
4. of every month
5. and every day in the week.
In short: this script is being executed every minute,
without exception.
Execute every Friday 1AM
So if we want to schedule the script to run at 1AM every
Friday, we would need the following cronjob:
0 1 * * 5 /bin/execute/this/script.sh
Get it? The script is now being executed when the system
clock hits:
1. minute: 0
2. of hour: 1
3. of day of month: * (every day of month)
4. of month: * (every month)
5. and weekday: 5 (=Friday)
Execute on weekdays at 1AM
So if we want to schedule the script to run at 1AM every weekday, we would need the following
cronjob:
0 1 * * 1-5 /bin/execute/this/script.sh
Get it? The script is now being executed when the system
clock hits:
1. minute: 0
2. of hour: 1
3. of day of month: * (every day of month)
4. of month: * (every month)
5. and weekday: 1-5 (=Monday til Friday)
Execute 10 minutes past every hour on the 1st of every month
Here's another one, just for practicing
10 * 1 * * /bin/execute/this/script.sh
Fair enough, it takes some getting used to, but it offers great flexibility.

Q. Can anyone tell me the difference between persistent and dynamic caches? Under which
conditions do we use these caches?
Dynamic:
1) When you use a dynamic cache, the Informatica Server updates the lookup cache as it passes
rows to the target.
2) With a dynamic cache, we can update the cache with new data as well.
3) A dynamic cache is not reusable.
(When we need updated cache data, that is when we need a dynamic cache.)

Persistent:
1) You can configure a Lookup transformation to use a non-persistent or persistent cache. The PowerCenter Server
saves or deletes lookup cache files after a successful session based on the Lookup Cache
Persistent property.
2) With a persistent cache, we are not able to update the cache with new data.
3) A persistent cache is reusable.

(When we need the previous cache data, that is when we need a persistent cache.)
----------------------------------
A few more additions to the above answer:
1. A dynamic lookup allows modifying the cache, whereas a persistent lookup does not allow us to
modify the cache.
2. A dynamic lookup uses 'NewLookupRow', a default port in the cache, but a persistent cache does not use any
default ports.
3. As the session completes, the dynamic cache is removed, but the persistent cache is saved on the Informatica
PowerCenter server.

Q. How to obtain performance data for individual transformations?
There is a property at the session level, Collect Performance Data; you can select that property. It
gives you performance details for all the transformations.
Q. List of Active and Passive Transformations in Informatica?
Active Transformation - An active transformation changes the number of rows that pass through
the mapping.
Source Qualifier Transformation
Sorter Transformations
Aggregator Transformations
Filter Transformation
Union Transformation
Joiner Transformation
Normalizer Transformation
Rank Transformation
Router Transformation
Update Strategy Transformation
Advanced External Procedure Transformation
Passive Transformation - Passive transformations do not change the number of rows that pass
through the mapping.
Expression Transformation
Sequence Generator Transformation
Lookup Transformation
Stored Procedure Transformation
XML Source Qualifier Transformation
External Procedure Transformation
Q. Eliminating duplicate records without using dynamic lookups?
You can identify duplicate records with a simple one-line SQL query:

Select id, count (*) from seq1 group by id having count (*)>1;

Below are the ways to eliminate the duplicate records:
1. By enabling the option in Source Qualifier transformation as select distinct.
2. By enabling the option in sorter transformation as select distinct.
3. By enabling all the values as group by in Aggregator transformation.
Q. Can anyone give an idea of how we perform a test load in Informatica? What do we test as part
of a test load in Informatica?
With a test load, the Informatica Server reads and transforms data without writing to targets. The
Informatica Server does everything as if running the full session. The Informatica Server writes
data to relational targets, but rolls back the data when the session completes. So, you can enable
the collect performance details property and analyze how efficient your mapping is. If the session
is running for a long time, you may like to find out the bottlenecks that exist. It may be a
bottleneck of type target, source, mapping etc.
The basic idea behind a test load is to see the behavior of the Informatica Server with your session.
Q. What is ODS (Operational Data Store)?
A collection of operational or base data that is extracted from operational databases and
standardized, cleansed, consolidated, transformed, and loaded into an enterprise data architecture.
An ODS is used to support data mining of operational data, or as the store for base data that is
summarized for a data warehouse.
The ODS may also be used to audit the data warehouse to assure the summarized and derived data
is calculated properly. The ODS may further become the enterprise shared operational database,
allowing operational systems that are being reengineered to use the ODS as their operational
databases.

Q. How many tasks are there in informatica?
Session Task
Email Task
Command Task
Assignment Task
Control Task
Decision Task
Event-Raise
Event- Wait
Timer Task
Link Task


Q. What are business components in Informatica?

Domains
Nodes
Services

Q. WHAT IS VERSIONING?
It's used to keep a history of the changes done on the mappings and workflows.
1. Check in: You check in when you are done with your changes so that everyone can see those
changes.
2. Check out: You check out from the main stream when you want to make any change to the
mapping/workflow.
3. Version history: It will show you all the changes made and who made them.

Q. Difference between $$$SessStartTime and SESSSTARTTIME?
$$$SessStartTime - Returns the session start time as a string value (String datatype)
SESSSTARTTIME - Returns the date along with the date timestamp (Date datatype)
Q. Difference between $, $$, $$$ in Informatica?
1. $ refers to
system variables/session parameters like $Bad file, $input
file, $output file, $DB connection, $source, $target etc.
2. $$ refers to
user defined variables/mapping parameters like $$State, $$Time, $$Entity, $$Business_Date,
$$SRC, etc.
3. $$$ refers to
system parameters like $$$SessStartTime.
$$$SessStartTime returns the session start time as a string value; the format of the string
depends on the database you are using. An example of using it in a source qualifier SQL override is sketched below.
Q. Finding Duplicate Rows based on Multiple Columns?
SELECT firstname, COUNT(firstname), surname, COUNT(surname), email, COUNT(email)
FROM employee
GROUP BY firstname, surname, email
HAVING (COUNT(firstname) > 1) AND (COUNT(surname) > 1) AND (COUNT(email) > 1);

Q. Finding Nth Highest Salary in Oracle?
Pick out the Nth highest salary, say the 4th highest salary.
Select * from
(select ename,sal,dense_rank() over (order by sal desc) emp_rank from emp)
where emp_rank=4;
Q. Find out the third highest salary?
SELECT MIN(sal) FROM emp WHERE
sal IN (SELECT DISTINCT TOP 3 sal FROM emp ORDER BY sal DESC);

Q. How do you handle error logic in Informatica? What are the transformations that you used
while handling errors? How did you reload those error records in target?
Row indicator: It generally happens when working with an update strategy transformation. The
writer/target rejects the rows going to the target.
Column indicator:
D -Valid
o - Overflow
n - Null
t - Truncate
When the data contains nulls or overflows, it is rejected instead of being written to the target.
The rejected data is stored in reject files. You can check the data and reload it into the
target using the reject reload utility.
Q. Difference between STOP and ABORT?
Stop - If the Integration Service is executing a Session task when you issue the stop command,
the Integration Service stops reading data. It continues processing, writing data, and
committing data to targets. If the Integration Service cannot finish processing and committing
data, you can issue the abort command.
Abort - The Integration Service handles the abort command for the Session task like the stop
command, except it has a timeout period of 60 seconds. If the Integration Service cannot finish
processing and committing data within the timeout period, it kills the DTM process and terminates
the session.
Q. WHAT IS AN INLINE VIEW?
An inline view is the term given to a sub query in the FROM clause of a query, which can be used as a table.
An inline view effectively is a named sub query.
Ex: Select Tab1.col1, Tab1.col2, Iview.col1, Iview.col2
From Tab1, (Select statement) Iview
Where Tab1.col1 = Iview.col1
SELECT B.DNAME, A.ENAME, A.SAL FROM EMP A,
(SELECT DNAME, DEPTNO FROM DEPT) B
WHERE A.DEPTNO = B.DEPTNO
In the above query, (SELECT DNAME, DEPTNO FROM DEPT) B is the inline view.
Inline views are determined at runtime, and in contrast to normal views they are not stored in the
data dictionary.
Disadvantages of using a normal view are:
1. A separate view needs to be created, which is an overhead
2. Extra time taken in parsing the view
These problems are solved by the inline view, by using a select statement in the sub query and using that as a
table.
Advantages of using inline views:
1. Better query performance
2. Better visibility of code
Practical uses of inline views:
1. Joining grouped data with non-grouped data
2. Getting data to use in another query
Q. WHAT IS THE GENERATED KEY AND GENERATED COLUMN ID IN THE NORMALIZER
TRANSFORMATION?
The integration service increments the generated key (GK) sequence number each time it
processes a source row. When the source row contains a multiple-occurring column or a multiple-
occurring group of columns, the normalizer transformation returns a row for each occurrence.
Each row contains the same generated key value.
The normalizer transformation has a generated column ID (GCID) port for each multiple-
occurring column. The GCID is an index for the instance of the multiple-occurring data. For
example, if a column occurs 3 times in a source record, the normalizer returns a value of 1, 2 or 3
in the generated column ID.
Q. WHAT IS THE DIFFERENCE BETWEEN SUBSTR AND INSTR?
The INSTR function searches a string for a sub-string and returns an integer indicating the position of the
character in the string that is the first character of this occurrence.
The SUBSTR function returns a portion of a string, beginning at a character position, substring_length
characters long. SUBSTR calculates lengths using characters as defined by the input character
set.
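A quick SQL illustration of both functions:
-- INSTR returns the position of the first 'a' (7); SUBSTR extracts 5 characters from position 1 ('infor').
SELECT INSTR('informatica', 'a')   AS first_a_position,
       SUBSTR('informatica', 1, 5) AS first_five_chars
FROM   dual;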
Q. WHAT ARE THE DIFFERENT ORACLE DATABASE OBJECTS?
TABLES
VIEWS
INDEXES
SYNONYMS
SEQUENCES
TABLESPACES
Q. WHAT IS @@ERROR?
The @@ERROR automatic variable returns the error code of the last Transact-SQL statement. If
there was no error, @@ERROR returns zero. Because @@ERROR is reset after each Transact-
SQL statement, it must be saved to a variable if it is needed for further processing after checking it.
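A small Transact-SQL sketch of that pattern (emp, sal, and deptno are illustrative names):
-- Capture @@ERROR immediately, before another statement resets it.
DECLARE @err INT;
UPDATE emp SET sal = sal * 1.1 WHERE deptno = 10;
SET @err = @@ERROR;
IF @err <> 0
    PRINT 'Update failed with error code ' + CAST(@err AS VARCHAR(10));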
Q. WHAT IS THE DIFFERENCE BETWEEN A CORRELATED SUB QUERY AND A NESTED SUB
QUERY?
A correlated subquery runs once for each row selected by the outer query. It contains a reference
to a value from the row selected by the outer query.
A nested subquery runs only once for the entire nesting (outer) query. It does not contain any
reference to the outer query row.
For example,
Correlated Subquery:
Select e1.empname, e1.basicsal, e1.deptno from emp e1 where e1.basicsal = (select
max(basicsal) from emp e2 where e2.deptno = e1.deptno)
Nested Subquery:
Select empname, basicsal, deptno from emp where (deptno, basicsal) in (select deptno,
max(basicsal) from emp group by deptno)

Q. HOW DOES ONE ESCAPE SPECIAL CHARACTERS WHEN BUILDING SQL QUERIES?
The LIKE keyword allows for string searches. The '_' wild card character is used to match exactly
one character, '%' is used to match zero or more occurrences of any characters. These
characters can be escaped in SQL. Example:
SELECT name FROM emp WHERE id LIKE '%\_%' ESCAPE '\';
Use two quotes for every one displayed. Example:
SELECT 'Frank''s Oracle site' FROM DUAL;
SELECT 'A ''quoted'' word.' FROM DUAL;
SELECT 'A ''''double quoted'''' word.' FROM DUAL;
Q. DIFFERENCE BETWEEN SURROGATE KEY AND PRIMARY KEY?
Surrogate key:
1. Query processing is fast.
2. It is only numeric.
3. The developer generates the surrogate key, for example using a Sequence Generator transformation.
4. Eg: 12453
Primary key:
1. Query processing is slower.
2. Can be alphanumeric.
3. The source system provides the primary key.
4. Eg: C10999
Q. HOW DOES ONE ELIMINATE DUPLICATE ROWS IN AN ORACLE TABLE?
Method 1:
DELETE from table_name A
where rowid > (select min(rowid) from table_name B where A.key_values = B.key_values);
Method 2:
Create table table_name2 as select distinct * from table_name1;
drop table table_name1;
rename table_name2 to table_name1;
In this method, all the indexes, constraints, triggers etc. have to be re-created.
Method 3:
DELETE from table_name t1
where exists (select 'x' from table_name t2 where t1.key_value=t2.key_value and t1.rowid >
t2.rowid)
Method 4:
DELETE from table_name where rowid not in (select max(rowid) from my_table group by
key_value )
Q. QUERY TO RETRIEVE NTH ROW FROM AN ORACLE TABLE?
The query is as follows:
select * from my_table where rownum <= n
MNUS
select * from my_table where rownum < n;
Q. How does the server recognize the source and target databases?
If it is a relational database - by using an ODBC connection.
If it is a flat file - by using an FTP connection or a local file directory.
Q. WHAT ARE THE DIFFERENT TYPES OF INDEXES SUPPORTED BY ORACLE?
1. B-tree index
2. B-tree cluster index
3. Hash cluster index
4. Reverse key index
5. Bitmap index
6. Function Based index
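A hedged sketch of how a few of these index types are created on the EMP table used in earlier
examples (the index names are only illustrative):
CREATE INDEX emp_ename_idx ON emp (ename);               -- ordinary B-tree index
CREATE BITMAP INDEX emp_deptno_bix ON emp (deptno);      -- bitmap index on a low-cardinality column
CREATE INDEX emp_ename_upper_idx ON emp (UPPER(ename));  -- function-based index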
Q. TYPES OF NORMALIZER TRANSFORMATION?
There are two types of Normalizer transformation.
VSAM Normalizer transformation
A non-reusable transformation that is a Source Qualifier transformation for a COBOL source. The
Mapping Designer creates VSAM Normalizer columns from a COBOL source in a mapping. The
column attributes are read-only. The VSAM Normalizer receives a multiple-occurring source
column through one input port.
Pipeline Normalizer transformation
A transformation that processes multiple-occurring data from relational tables or flat files. You
might choose this option when you want to process multiple-occurring data from another
transformation in the mapping.
A VSAM Normalizer transformation has one input port for a multiple-occurring column. A pipeline
Normalizer transformation has multiple input ports for a multiple-occurring column.
When you create a Normalizer transformation in the Transformation Developer, you create a
pipeline Normalizer transformation by default. When you create a pipeline Normalizer
transformation, you define the columns based on the data the transformation receives from
another type of transformation such as a Source Qualifier transformation.
The Normalizer transformation has one output port for each single-occurring input port.
Q. WHAT ARE ALL THE TRANSFORMATIONS YOU WOULD USE IF THE SOURCE IS AN XML FILE?
XML Source Qualifier
XML Parser
XML Generator
Q. List the files in ascending order in UNIX?
ls -lt (sort by last date modified, newest first)
ls -ltr (reverse order, oldest first)
ls -lS (sort by size of the file)
Q. How do you identify the empty lines in a flat file in UNIX? How to remove them?
grep -v '^$' filename (prints only the non-empty lines)
Q. How do you send the session report (.txt) to the manager after the session is completed?
Use the session email variables: %a (attach the named file) and %g (attach the session log file).
Q. How to check all the running processes in UNIX?
$> ps -ef
Q. How can I display only and only the hidden files in the current directory?
ls -a | grep "^\."
Q. How to display the first 10 lines of a file?
# head -10 logfile
Q. How to display the last 10 lines of a file?
# tail -10 logfile
Q. How did you schedule sessions in your project?
1. Run once - Set two parameters: the date and time when the session should start.
2. Run every - The Informatica server runs the session at a regular interval, as configured; parameters:
days, hours, minutes, end on, end after, forever.
3. Customized repeat - Repeat every 2 days, daily frequency (hr, min), every week, every month.
Q. What is lookup override?
This feature is similar to entering a custom query in a Source Qualifier transformation. When
entering a Lookup SQL Override, you can enter the entire override, or generate and edit the
default SQL statement.
The lookup query override can include WHERE clause.
Q. What is Sql Override?
The Source Qualifier provides the SQL Query option to override the default query. You can enter
any SQL statement supported by your source database. You might enter your own SELECT
statement, or have the database perform aggregate calculations, or call a stored procedure or
stored function to read the data and perform some tasks.
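As a hedged illustration, a Source Qualifier SQL override might look like the following, assuming
EMP and DEPT relational sources (the selected columns must match the Source Qualifier ports in
number and order):
SELECT e.empno, e.ename, e.sal, d.dname
FROM emp e, dept d
WHERE e.deptno = d.deptno
  AND e.hiredate >= TO_DATE('01-JAN-2004', 'DD-MON-YYYY')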
Q. How to get a sequence value using an Expression transformation?
Use a variable port that increments for every row (v_temp is a variable port, o_seq an output port):
v_temp = v_temp + 1
o_seq = IIF(ISNULL(v_temp), 0, v_temp)
Q. How to get Unique Records?
Source > SQ > SRT > EXP > FLT or RTR > TGT
In the Expression transformation (ports are evaluated top to bottom):
flag = DECODE(TRUE, eid = pre_eid, 'Y', 'N')
flag_out = flag
pre_eid = eid
Q. WHAT ARE THE DIFFERENT TRANSACTION LEVELS AVAILABLE IN TRANSACTION
CONTROL TRANSFORMATION (TCL)?
The following are the transaction levels or built-in variables:
TC_CONTINUE_TRANSACTION: The Integration Service does not perform any
transaction change for this row. This is the default value of the expression.
TC_COMMIT_BEFORE: The Integration Service commits the transaction, begins a new
transaction, and writes the current row to the target. The current row is in the new transaction.
TC_COMMIT_AFTER: The Integration Service writes the current row to the target, commits
the transaction, and begins a new transaction. The current row is in the committed transaction.
TC_ROLLBACK_BEFORE: The Integration Service rolls back the current transaction,
begins a new transaction, and writes the current row to the target. The current row is in the new
transaction.
TC_ROLLBACK_AFTER: The Integration Service writes the current row to the target, rolls
back the transaction, and begins a new transaction. The current row is in the rolled back
transaction.
Q. What is difference between grep and find?
grep is used for finding any string in a file.
Syntax - grep <string> <filename>
Example - grep 'compu' details.txt
Displays every line in which the string 'compu' is found.
find is used to find files or directories in a given path.
Syntax - find <path> -name <pattern>
Example - find . -name "compu*"
Displays all file names starting with compu.
Q. WHAT ARE THE DIFFERENCES BETWEEN DDL, DML AND DCL COMMANDS?
DDL is Data Definition Language statements:
CREATE - create objects in the database
ALTER - alter the structure of the database
DROP - delete objects from the database
TRUNCATE - remove all records from a table, including all space allocated for the
records
COMMENT - add comments to the data dictionary
DML is Data Manipulation Language statements:
SELECT - retrieve data from the database
INSERT - insert data into a table
UPDATE - update existing data within a table
DELETE - delete records from a table; the space for the records remains
CALL - call a PL/SQL or Java subprogram
EXPLAIN PLAN - explain the access path to data
LOCK TABLE - control concurrency
DCL is Data Control Language statements:
GRANT - give users access privileges to the database
REVOKE - withdraw access privileges given with the GRANT command
(COMMIT, SAVEPOINT, ROLLBACK and SET TRANSACTION are strictly transaction control (TCL)
statements rather than DCL: COMMIT saves work done, SAVEPOINT marks a point in a transaction
to which you can later roll back, ROLLBACK restores the database to its state at the last COMMIT,
and SET TRANSACTION changes transaction options such as which rollback segment to use.)
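To make the distinction concrete, a short hedged sketch with one statement of each kind (the
staging table and user names are only illustrative):
TRUNCATE TABLE emp_stage;                      -- DDL: removes all rows and releases the space
INSERT INTO emp_stage SELECT * FROM emp;       -- DML: manipulates data and can be rolled back
GRANT SELECT ON emp_stage TO report_user;      -- DCL: controls access privileges
SAVEPOINT before_cleanup;                      -- transaction control
DELETE FROM emp_stage WHERE deptno = 40;       -- DML
ROLLBACK TO before_cleanup;                    -- undoes only the DELETE above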
Q. What is Stored Procedure?
A stored procedure is a named group of SQL statements that have been previously created and
stored in the server database. Stored procedures accept input parameters so that a single
procedure can be used over the network by several clients using different input data. And when
the procedure is modified, all clients automatically get the new version. Stored procedures reduce
network traffic and improve performance. Stored procedures can be used to help ensure the
integrity of the database.
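A minimal Oracle PL/SQL sketch of such a procedure (the name and logic are only illustrative):
CREATE OR REPLACE PROCEDURE raise_dept_sal (p_deptno IN NUMBER, p_pct IN NUMBER) AS
BEGIN
    -- apply a percentage raise to every employee of the given department
    UPDATE emp SET sal = sal * (1 + p_pct / 100) WHERE deptno = p_deptno;
    COMMIT;
END;
/
A client then only sends the call, for example EXECUTE raise_dept_sal(10, 5);, instead of the full SQL.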
Q. What is View?
A view is a tailored presentation of the data contained in one or more tables (or other views).
Unlike a table, a view is not allocated any storage space, nor does a view actually contain data;
rather, a view is defined by a query that extracts or derives data from the tables the view
references. These tables are called base tables.
Views present a different representation of the data that resides within the base tables. Views are
very powerful because they allow you to tailor the presentation of data to different types of users.
Views are often used to:
Provide an additional level of table security by restricting access to a predetermined set of
rows and/or columns of a table
Hide data complexity
Simplify commands for the user
Present the data in a different perspective from that of the base table
solate applications from changes in definitions of base tables
Express a query that cannot be expressed without using a view
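For example, a hedged sketch of a view that restricts both rows and columns (the view and grantee
names are only illustrative):
CREATE OR REPLACE VIEW dept10_emp_v AS
    SELECT empno, ename, sal
    FROM emp
    WHERE deptno = 10;
GRANT SELECT ON dept10_emp_v TO report_user;   -- users see only department 10 rows and three columns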
Q. What is Trigger?
A trigger is a SQL procedure that initiates an action when an event (NSERT, DELETE or
UPDATE) occurs. Triggers are stored in and managed by the DBMS. Triggers are used to
maintain the referential integrity of data by changing the data in a systematic fashion. A trigger
cannot be called or executed; the DBMS automatically fires the trigger as a result of a data
modification to the associated table. Triggers can be viewed as similar to stored procedures in
that both consist of procedural logic that is stored at the database level. Stored procedures,
however, are not event-driven and are not attached to a specific table as triggers are. Stored
procedures are explicitly executed by invoking a CALL to the procedure while triggers are
implicitly executed. n addition, triggers can also execute stored Procedures.
Nested Trigger: A trigger can also contain NSERT, UPDATE and DELETE logic within itself, so
when the trigger is fired because of data modification it can also cause another data modification,
thereby firing another trigger. A trigger that contains data modification logic within itself is called a
nested trigger.
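A minimal Oracle sketch of such a trigger, assuming a hypothetical EMP_AUDIT table exists to
record salary changes:
CREATE OR REPLACE TRIGGER trg_emp_sal_audit
AFTER UPDATE OF sal ON emp
FOR EACH ROW
BEGIN
    -- fired implicitly by the DBMS for every updated row; it is never called directly
    INSERT INTO emp_audit (empno, old_sal, new_sal, changed_on)
    VALUES (:OLD.empno, :OLD.sal, :NEW.sal, SYSDATE);
END;
/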
Q. What is View?
A simple view can be thought of as a subset of a table. t can be used for retrieving data, as well
as updating or deleting rows. Rows updated or deleted in the view are updated or deleted in the
table the view was created with. t should also be noted that as data in the original table changes,
so does data in the view, as views are the way to look at part of the original table. The results of
using a view are not permanently stored in the database. The data accessed through a view is
actually constructed using standard T-SQL select command and can come from one to many
different base tables or even other views.
Q. What is ndex?
An index is a physical structure containing pointers to the data. ndices are created in an existing
table to locate rows more quickly and efficiently. t is possible to create an index on one or more
columns of a table, and each index is given a name. The users cannot see the indexes; they are
just used to speed up queries. Effective indexes are one of the best ways to improve performance
in a database application. A table scan happens when there is no index available to help a query.
n a table scan SQL Server examines every row in the table to satisfy the query results. Table
scans are sometimes unavoidable, but on large tables, scans have a terrific impact on
performance. Clustered indexes define the physical sorting of a database table's rows in the
storage media. For this reason, each database table may
have only one clustered index. Non-clustered indexes are created outside of the database table
and contain a sorted list of references to the table itself.
Q. What is the difference between clustered and a non-clustered index?
A clustered index is a special type of index that reorders the way records in the table are
physically stored. Therefore table can have only one clustered index. The leaf nodes of a
clustered index contain the data pages. A nonclustered index is a special type of index in which
the logical order of the index does not match the physical stored order of the rows on disk. The
leaf node of a nonclustered index does not consist of the data pages. nstead, the leaf nodes
contain index rows.
Q. What is Cursor?
Cursor is a database object used by applications to manipulate data in a set on a row-by-row
basis, instead of the typical SQL commands that operate on all the rows in the set at one time.
n order to work with a cursor we need to perform some steps in the following order:
Declare cursor
Open cursor
Fetch row from the cursor
Process fetched row
Close cursor
Deallocate cursor
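A hedged T-SQL sketch of those steps (the cursor simply walks the EMP rows one at a time):
DECLARE @empno INT, @ename VARCHAR(50);
DECLARE emp_cursor CURSOR FOR
    SELECT empno, ename FROM emp;              -- declare cursor
OPEN emp_cursor;                               -- open cursor
FETCH NEXT FROM emp_cursor INTO @empno, @ename;
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT CAST(@empno AS VARCHAR(10)) + ' ' + @ename;    -- process fetched row
    FETCH NEXT FROM emp_cursor INTO @empno, @ename;      -- fetch next row
END;
CLOSE emp_cursor;                              -- close cursor
DEALLOCATE emp_cursor;                         -- deallocate cursor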
Q. What is the difference between a HAVNG CLAUSE and a WHERE CLAUSE?
1. HAVING specifies a search condition for a group or an aggregate. HAVING can be used only with the
SELECT statement.
2. HAVING is typically used with a GROUP BY clause. When GROUP BY is not used, HAVING
behaves like a WHERE clause.
3. The HAVING clause is basically used only with the GROUP BY clause in a query, whereas the WHERE
clause is applied to each row before the rows become part of the GROUP BY processing in a query.
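A short example that uses both clauses together on the EMP table:
SELECT deptno, AVG(sal) AS avg_sal
FROM emp
WHERE job <> 'CLERK'          -- WHERE filters individual rows before grouping
GROUP BY deptno
HAVING AVG(sal) > 2000;       -- HAVING filters the groups after aggregation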
RANK CACHE
Sample Rank Mapping
When the Power Center Server runs a session with a Rank transformation, it compares an input
row with the rows in the data cache. If the input row out-ranks a stored row, the Power Center Server
replaces the stored row with the input row.
Example: Power Center caches the first 5 rows if we are finding the top 5 salaried employees. When
the 6th row is read, it is compared with the 5 rows in the cache and replaces a cached row if required.
1) RANK NDEX CACHE:
The index cache holds group information from the group by ports. f we are Using Group By on
DEPTNO, then this cache stores values 10, 20, 30 etc.
All Group By Columns are in RANK NDEX CACHE. Ex. DEPTNO
2) RANK DATA CACHE:
t holds row data until the Power Center Server completes the ranking and is generally larger than
the index cache. To reduce the data cache size, connect only the necessary input/output ports to
subsequent transformations.
All Variable ports if there, Rank Port, All ports going out from RANK Transformations are stored in
RANK DATA CACHE.
Example: All ports except DEPTNO n our mapping example.
Aggregator Caches
1. The Power Center Server stores data in the aggregate cache until it completes Aggregate
calculations.
2. t stores group values in an index cache and row data in the data cache. f the Power Center
Server requires more space, it stores overflow values in cache files.
Note: The Power Center Server uses memory to process an Aggregator transformation with
sorted ports. t does not use cache memory. We do not need to configure cache memory for
Aggregator transformations that use sorted ports.
1) Aggregator ndex Cache:
The index cache holds group information from the group by ports. f we are using Group By on
DEPTNO, then this cache stores values 10, 20, 30 etc.
All Group By Columns are in AGGREGATOR NDEX CACHE. Ex. DEPTNO
2) Aggregator Data Cache:
DATA CACHE is generally larger than the AGGREGATOR NDEX CACHE.
Columns in Data Cache:
Variable ports if any
Non group by input/output ports.
Non group by input ports used in non-aggregate output expression.
Port containing aggregate function
JONER CACHES
Joiner always caches the MASTER table. We cannot disable caching. t builds ndex cache and
Data Cache based on MASTER table.
1) Joiner ndex Cache:
All Columns of MASTER table used in Join condition are in JONER NDEX CACHE.
Example: DEPTNO in our mapping.
2) Joiner Data Cache:
Master column not in join condition and used for output to other transformation or target table are
in Data Cache.
Example: DNAME and LOC in our mapping example.
Lookup Cache Files
1. Lookup ndex Cache:
Stores data for the columns used in the lookup condition.
2. Lookup Data Cache:
For a connected Lookup transformation, stores data for the connected output ports, not
including ports used in the lookup condition.
For an unconnected Lookup transformation, stores data from the return port.
OLTP and OLAP
Logical Data Modeling Vs Physical Data Modeling
Router Transformation And Filter Transformation
Source Qualifier And Lookup Transformation
Mapping And Mapplet
Joiner Transformation And Lookup Transformation
Dimension Table and Fact Table
Connected Lookup and Unconnected Lookup
Connected Lookup vs. Unconnected Lookup:
Connected Lookup - Receives input values directly from the pipeline.
Unconnected Lookup - Receives input values from the result of a :LKP expression in another transformation.
Connected Lookup - We can use a dynamic or static cache.
Unconnected Lookup - We can use a static cache only.
Connected Lookup - Cache includes all lookup columns used in the mapping.
Unconnected Lookup - Cache includes all lookup/output ports in the lookup condition and the lookup/return port.
Connected Lookup - If there is no match for the lookup condition, the Power Center Server returns the default value for all output ports.
Unconnected Lookup - If there is no match for the lookup condition, the Power Center Server returns NULL.
Connected Lookup - If there is a match for the lookup condition, the Power Center Server returns the result of the lookup condition for all lookup/output ports.
Unconnected Lookup - If there is a match for the lookup condition, the Power Center Server returns the result of the lookup condition into the return port.
Connected Lookup - Passes multiple output values to another transformation.
Unconnected Lookup - Passes one output value to another transformation.
Connected Lookup - Supports user-defined default values.
Unconnected Lookup - Does not support user-defined default values.
Cache Comparison
Persistence and Dynamic Caches
Dynamic
1) When you use a dynamic cache, the Informatica Server updates the lookup cache as it passes
rows to the target.
2) With a dynamic cache, we can also update the cache with new data.
3) A dynamic cache is not reusable.
(When we need updated cache data, we need a dynamic cache.)
Persistent
1) A Lookup transformation can use a non-persistent or persistent cache. The PowerCenter Server
saves or deletes lookup cache files after a successful session based on the Lookup Cache
Persistent property.
2) With a persistent cache, we are not able to update the cache with new data.
3) A persistent cache is reusable.
(When we need the previous cache data, we need a persistent cache.)
View And Materialized View
Star Schema And Snow Flake Schema
nformatica - Transformations
n nformatica, Transformations help to transform the source data according to the requirements
of target system and it ensures the quality of the data being loaded into target.
Transformations are of two types: Active and Passive.
Active Transformation
An active transformation can change the number of rows that pass through it from source to
target. (i.e) t eliminates rows that do not meet the condition in transformation.
Passive Transformation
A passive transformation does not change the number of rows that pass through it (i.e) t passes
all rows through the transformation.
Transformations can be Connected or Unconnected.
Connected Transformation
Connected transformation is connected to other transformations or directly to target table in the
mapping.
Unconnected Transformation
An unconnected transformation is not connected to other transformations in the mapping. t is
called within another transformation, and returns a value to that transformation.
Following are the list of Transformations available in nformatica:
Aggregator Transformation
Expression Transformation
Filter Transformation
Joiner Transformation
Lookup Transformation
Normalizer Transformation
Rank Transformation
Router Transformation
Sequence Generator Transformation
Stored Procedure Transformation
Sorter Transformation
Update Strategy Transformation
XML Source Qualifier Transformation
n the following pages, we will explain all the above nformatica Transformations and their
significances in the ETL process in detail.
=========================================================================
=====
Aggregator Transformation
Aggregator transformation is an Active and Connected transformation.
This transformation is useful to perform calculations such as averages and sums (mainly to
perform calculations on multiple rows or groups).
For example, to calculate total of daily sales or to calculate average of monthly or yearly sales.
Aggregate functions such as AVG, FRST, COUNT, PERCENTLE, MAX, SUM etc. can be used
in aggregate transformation.
=========================================================================
=====
Expression Transformation
Expression transformation is a Passive and Connected transformation.
This can be used to calculate values in a single row before writing to the target.
For example, to calculate discount of each product
or to concatenate first and last names
or to convert date to a string field.
=========================================================================
=====
Filter Transformation
Filter transformation is an Active and Connected transformation.
This can be used to filter rows in a mapping that do not meet the condition.
For example,
To know all the employees who are working in Department 10 or
To find out the products that falls between the rate category $500 and $1000.
=========================================================================
=====
Joiner Transformation
Joiner Transformation is an Active and Connected transformation. This can be used to join two
sources coming from two different locations or from same location. For example, to join a flat file
and a relational source or to join two flat files or to join a relational source and a XML source.
n order to join two sources, there must be at least one matching port. While joining two sources it
is a must to specify one source as master and the other as detail.
The Joiner transformation supports the following types of joins:
1)Normal
2)Master Outer
3)Detail Outer
4)Full Outer
Normal join discards all the rows of data from the master and detail source that do not match,
based on the condition.
Master outer join discards all the unmatched rows from the master source and keeps all the rows
from the detail source and the matching rows from the master source.
Detail outer join keeps all rows of data from the master source and the matching rows from the
detail source. t discards the unmatched rows from the detail source.
Full outer join keeps all rows of data from both the master and detail sources.
=========================================================================
=====
Lookup transformation
Lookup transformation is Passive and it can be both Connected and UnConnected as well. t is
used to look up data in a relational table, view, or synonym. Lookup definition can be imported
either from source or from target tables.
For example, if we want to retrieve all the sales of a product with an D 10 and assume that the
sales data resides in another table. Here instead of using the sales table as one more source,
use Lookup transformation to lookup the data for the product, with D 10 in sales table.
Connected lookup receives input values directly from mapping pipeline whereas
Unconnected lookup receives values from: LKP expression from another transformation.
Connected lookup returns multiple columns from the same row whereas
Unconnected lookup has one return port and returns one column from each row.
Connected lookup supports user-defined default values whereas
Unconnected lookup does not support user defined values.
=========================================================================
=====
Normalizer Transformation
Normalizer Transformation is an Active and Connected transformation.
t is used mainly with COBOL sources where most of the time data is stored in de-normalized
format.
Also, Normalizer transformation can be used to create multiple rows from a single row of data.
=========================================================================
=====
Rank Transformation
Rank transformation is an Active and Connected transformation.
t is used to select the top or bottom rank of data.
For example,
To select top 10 Regions where the sales volume was very high
or
To select 10 lowest priced products.
=========================================================================
=====
Router Transformation
Router is an Active and Connected transformation. t is similar to filter transformation.
The only difference is, filter transformation drops the data that do not meet the condition whereas
router has an option to capture the data that do not meet the condition. t is useful to test multiple
conditions.
t has input, output and default groups.
For example, if we want to filter data like where State=Michigan, State=California, State=New
York and all other States. t's easy to route data to different tables.
=========================================================================
=====
Sequence Generator Transformation
Sequence Generator transformation is a Passive and Connected transformation. t is used to
create unique primary key values or cycle through a sequential range of numbers or to replace
missing keys.
It has two output ports, CURRVAL and NEXTVAL, that can be connected to other transformations
(you cannot add ports to this transformation).
The NEXTVAL port generates a sequence of numbers when connected to a transformation or target.
CURRVAL is the NEXTVAL value plus the Increment By value.
=========================================================================
=====
Sorter Transformation
Sorter transformation is a Connected and an Active transformation.
t allows sorting data either in ascending or descending order according to a specified field.
Also used to configure for case-sensitive sorting, and specify whether the output rows should be
distinct.
=========================================================================
=====
Source Qualifier Transformation
Source Qualifier transformation is an Active and Connected transformation. When adding a
relational or a flat file source definition to a mapping, it is must to connect it to a Source Qualifier
transformation.
The Source Qualifier performs the various tasks such as
Overriding Default SQL query,
Filtering records;
join data from two or more tables etc.
=========================================================================
=====
Stored Procedure Transformation
Stored Procedure transformation is a Passive and Connected & Unconnected transformation. t is
useful to automate time-consuming tasks and it is also used in error handling, to drop and
recreate indexes and to determine the space in database, a specialized calculation etc.
The stored procedure must exist in the database before creating a Stored Procedure
transformation, and the stored procedure can exist in a source, target, or any database with a
valid connection to the nformatica Server. Stored Procedure is an executable script with SQL
statements and control statements, user-defined variables and conditional statements.
=========================================================================
=====
Update Strategy Transformation
Update strategy transformation is an Active and Connected transformation.
t is used to update data in target table, either to maintain history of data or recent changes.
You can specify how to treat source rows in table, insert, update, delete or data driven.
=========================================================================
=====
XML Source Qualifier Transformation
XML Source Qualifier is a Passive and Connected transformation.
XML Source Qualifier is used only with an XML source definition.
t represents the data elements that the nformatica Server reads when it executes a session with
XML sources.
=========================================================================
=====
Constraint-Based Loading
n the Workflow Manager, you can specify constraint-based loading for a session. When you
select this option, the ntegration Service orders the target load on a row-by-row basis. For every
row generated by an active source, the ntegration Service loads the corresponding transformed
row first to the primary key table, then to any foreign key tables. Constraint-based loading
depends on the following requirements:
Active source: Related target tables must have the same active source.
Key relationships: Target tables must have key relationships.
Target connection groups: Targets must be in one target connection group.
Treat rows as insert. Use this option when you insert into the target. You cannot use updates with
constraint based loading.
Active Source:
When target tables receive rows from different active sources, the ntegration Service reverts to
normal loading for those tables, but loads all other targets in the session using constraint-based
loading when possible. For example, a mapping contains three distinct pipelines. The first two
contain a source, source qualifier, and target. Since these two targets receive data from different
active sources, the ntegration Service reverts to normal loading for both targets. The third
pipeline contains a source, Normalizer, and two targets. Since these two targets share a single
active source (the Normalizer), the ntegration Service performs constraint-based loading: loading
the primary key table first, then the foreign key table.
Key Relationships:
When target tables have no key relationships, the ntegration Service does not perform
constraint-based loading.
Similarly, when target tables have circular key relationships, the ntegration Service reverts to a
normal load. For example, you have one target containing a primary key and a foreign key related
to the primary key in a second target. The second target also contains a foreign key that
references the primary key in the first target. The ntegration Service cannot enforce constraint-
based loading for these tables. t reverts to a normal load.
Target Connection Groups:
The ntegration Service enforces constraint-based loading for targets in the same target
connection group. f you want to specify constraint-based loading for multiple targets that receive
data from the same active source, you must verify the tables are in the same target connection
group. f the tables with the primary key-foreign key relationship are in different target connection
groups, the ntegration Service cannot enforce constraint-based loading when you run the
workflow. To verify that all targets are in the same target connection group, complete the following
tasks:
Verify all targets are in the same target load order group and receive data from the same
active source.
Use the default partition properties and do not add partitions or partition points.
Define the same target type for all targets in the session properties.
Define the same database connection name for all targets in the session properties.
Choose normal mode for the target load type for all targets in the session properties.
Treat Rows as nsert:
Use constraint-based loading when the session option Treat Source Rows As is set to insert. You
might get inconsistent data if you select a different Treat Source Rows As option and you
configure the session for constraint-based loading.
When the mapping contains Update Strategy transformations and you need to load data to a
primary key table first, split the mapping using one of the following options:
Load primary key table in one mapping and dependent tables in another mapping. Use
constraint-based loading to load the primary table.
Perform inserts in one mapping and updates in another mapping.
Constraint-based loading does not affect the target load ordering of the mapping. Target load
ordering defines the order the ntegration Service reads the sources in each target load order
group in the mapping. A target load order group is a collection of source qualifiers,
transformations, and targets linked together in a mapping. Constraint based loading establishes
the order in which the ntegration Service loads individual targets within a set of targets receiving
data from a single source qualifier.
Example
The following mapping is configured to perform constraint-based loading:
In the first pipeline, target T_1 has a primary key, and T_2 and T_3 contain foreign keys referencing
the T_1 primary key. T_3 has a primary key that T_4 references as a foreign key.
Since these tables receive records from a single active source, SQ_A, the ntegration Service
loads rows to the target in the following order:
1. T_1
2. T_2 and T_3 (in no particular order)
3. T_4
The ntegration Service loads T_1 first because it has no foreign key dependencies and contains
a primary key referenced by T_2 and T_3. The ntegration Service then loads T_2 and T_3, but
since T_2 and T_3 have no dependencies, they are not loaded in any particular order. The
ntegration Service loads T_4 last, because it has a foreign key that references a primary key in
T_3.After loading the first set of targets, the ntegration Service begins reading source B. f there
are no key relationships between T_5 and T_6, the ntegration Service reverts to a normal load
for both targets.
f T_6 has a foreign key that references a primary key in T_5, since T_5 and T_6 receive data
from a single active source, the Aggregator AGGTRANS, the ntegration Service loads rows to
the tables in the following order:
T_5
T_6
T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database
connection for each target, and you use the default partition properties. T_5 and T_6 are in
another target connection group together if you use the same database connection for each
target and you use the default partition properties. The ntegration Service includes T_5 and T_6
in a different target connection group because they are in a different target load order group from
the first four targets.
Enabling Constraint-Based Loading:
When you enable constraint-based loading, the ntegration Service orders the target load on a
row-by-row basis. To enable constraint-based loading:
1. n the General Options settings of the Properties tab, choose nsert for the Treat Source Rows
As property.
2. Click the Config Object tab. n the Advanced settings, select Constraint Based Load Ordering.
3. Click OK.
Target Load Plan
When you use a mapplet in a mapping, the Mapping Designer lets you set the target load plan for
sources within the mapplet.
Setting the Target Load Order
You can configure the target load order for a mapping containing any type of target definition. n
the Designer, you can set the order in which the ntegration Service sends rows to targets in
different target load order groups in a mapping. A target load order group is the collection of
source qualifiers, transformations, and targets linked together in a mapping. You can set the
target load order if you want to maintain referential integrity when inserting, deleting, or updating
tables that have the primary key and foreign key constraints.
The ntegration Service reads sources in a target load order group concurrently, and it processes
target load order groups sequentially.
To specify the order in which the ntegration Service sends data to targets, create one source
qualifier for each target within a mapping. To set the target load order, you then determine in
which order the ntegration Service reads each source in the mapping.
The following figure shows two target load order groups in one mapping:
n this mapping, the first target load order group includes TEMS, SQ_TEMS, and T_TEMS. The
second target load order group includes all other objects in the mapping, including the
TOTAL_ORDERS target. The ntegration Service processes the first target load order group, and
then the second target load order group.
When it processes the second target load order group, it reads data from both sources at the
same time.
To set the target load order:
Create a mapping that contains multiple target load order groups.
Click Mappings > Target Load Plan.
The Target Load Plan dialog box lists all Source Qualifier transformations in the mapping
and the targets that receive data from each source qualifier.
Select a source qualifier from the list.
Click the Up and Down buttons to move the source qualifier within the load order.
Repeat steps 3 to 4 for other source qualifiers you want to reorder. Click OK.
Mapping Parameters & Variables
Mapping parameters and variables represent values in mappings and mapplets.
When we use a mapping parameter or variable in a mapping, first we declare the mapping
parameter or variable for use in each mapplet or mapping. Then, we define a value for the
mapping parameter or variable before we run the session.
Mapping Parameters
A mapping parameter represents a constant value that we can define before running a session.
A mapping parameter retains the same value throughout the entire session.
Example: When we want to extract records of a particular month during ETL process, we will
create a Mapping Parameter of data type and use it in query to compare it with the timestamp
field in SQL override.
After we create a parameter, it appears in the Expression Editor.
We can then use the parameter in any expression in the mapplet or mapping.
We can also use parameters in a source qualifier filter, user-defined join, or extract override, and
in the Expression Editor of reusable transformations.
Mapping Variables
Unlike mapping parameters, mapping variables are values that can change between sessions.
The ntegration Service saves the latest value of a mapping variable to the repository at the
end of each successful session.
We can override a saved value with the parameter file.
We can also clear all saved values for the session in the Workflow Manager.
We might use a mapping variable to perform an incremental read of the source. For example, we
have a source table containing time stamped transactions and we want to evaluate the
transactions on a daily basis. nstead of manually entering a session override to filter source data
each time we run the session, we can create a mapping variable, $$IncludeDateTime. In the
source qualifier, create a filter to read only rows whose transaction date equals
$$IncludeDateTime, such as:
TIMESTAMP = $$IncludeDateTime
In the mapping, use a variable function to set the variable value to increment one day each time
the session runs. If we set the initial value of $$IncludeDateTime to 8/1/2004, the first time the
Integration Service runs the session, it reads only rows dated 8/1/2004. During the session, the
Integration Service sets $$IncludeDateTime to 8/2/2004. It saves 8/2/2004 to the repository at the
end of the session. The next time it runs the session, it reads only rows from August 2, 2004.
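When the source qualifier uses an SQL override instead of a filter condition, the same variable can
be referenced there; the Integration Service expands it as text before sending the query to the
database. A hedged sketch, assuming a hypothetical TRANSACTIONS source table and a date
format agreed in the mapping:
SELECT t.*
FROM transactions t
WHERE t.trans_date >= TO_DATE('$$IncludeDateTime', 'MM/DD/YYYY')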
Used in following transformations:
Expression
Filter
Router
Update Strategy
nitial and Default Value:
When we declare a mapping parameter or variable in a mapping or a mapplet, we can enter an
initial value. When the ntegration Service needs an initial value, and we did not declare an initial
value for the parameter or variable, the ntegration Service uses a default value based on the
data type of the parameter or variable.
Data ->Default Value
Numeric ->0
String ->Empty String
Date time ->1/1/1
Variable Values: Start value and current value of a mapping variable
Start Value:
The start value is the value of the variable at the start of the session. The ntegration Service
looks for the start value in the following order:
Value in parameter file
Value saved in the repository
nitial value
Default value
Current Value:
The current value is the value of the variable as the session progresses. When a session starts,
the current value of a variable is the same as the start value. The final current value for a variable
is saved to the repository at the end of a successful session. When a session fails to complete,
the ntegration Service does not update the value of the variable in the repository.
Note: f a variable function is not used to calculate the current value of a mapping variable, the
start value of the variable is saved to the repository.
Variable Data Type and Aggregation Type:
When we declare a mapping variable in a mapping, we need to configure the data type and
aggregation type for the variable. The Integration Service uses the aggregation type of a mapping
variable to determine the final current value of the mapping variable.
Aggregation types are:
Count: nteger and small integer data types are valid only.
Max: All transformation data types except binary data type are valid.
Min: All transformation data types except binary data type are valid.
Variable Functions
Variable functions determine how the ntegration Service calculates the current value of a
mapping variable in a pipeline.
SetMaxVariable: Sets the variable to the maximum value of a group of values. t ignores rows
marked for update, delete, or reject. Aggregation type set to Max.
SetMinVariable: Sets the variable to the minimum value of a group of values. t ignores rows
marked for update, delete, or reject. Aggregation type set to Min.
SetCountVariable: ncrements the variable value by one. t adds one to the variable value when a
row is marked for insertion, and subtracts one when the row is Marked for deletion. t ignores
rows marked for update or reject. Aggregation type set to Count.
SetVariable: Sets the variable to the configured value. At the end of a session, it compares the
final current value of the variable to the start value of the variable. Based on the aggregate type of
the variable, it saves a final value to the repository.
Creating Mapping Parameters and Variables
Open the folder where we want to create parameter or variable.
n the Mapping Designer, click Mappings > Parameters and Variables. -or- n the Mapplet
Designer, click Mapplet > Parameters and Variables.
Click the add button.
Enter name. Do not remove $$ from name.
Select Type and Data type. Select Aggregation type for mapping variables.
Give nitial Value. Click ok.
Example: Use of Mapping of Mapping Parameters and Variables
EMP will be source table.
Create a target table MP_MV_EXAMPLE having columns: EMPNO, ENAME, DEPTNO,
TOTAL_SAL, MAX_VAR, MN_VAR, COUNT_VAR and SET_VAR.
TOTAL_SAL = SAL + COMM + $$BONUS ($$BONUS is a mapping parameter that changes every
month)
SET_VAR: We will add one month to the HIREDATE of every employee.
Create shortcuts as necessary.
Creating Mapping
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give name. Ex: m_mp_mv_example
4. Drag EMP and target table.
5. Transformation -> Create -> Select Expression for list -> Create > Done.
6. Drag EMPNO, ENAME, HREDATE, SAL, COMM and DEPTNO to Expression.
7. Create Parameter $$Bonus and Give initial value as 200.
8. Create variable $$var_max of MAX aggregation type and initial value 1500.
9. Create variable $$var_min of MN aggregation type and initial value 1500.
10. Create variable $$var_count of COUNT aggregation type and initial value 0. COUNT
is visible when datatype is NT or SMALLNT.
11. Create variable $$var_set of MAX aggregation type.
12. Create 5 output ports out_ TOTAL_SAL, out_MAX_VAR, out_MN_VAR,
out_COUNT_VAR and out_SET_VAR.
13. Open expression editor for TOTAL_SAL. Do the same as we did earlier for SAL+ COMM. To
add $$BONUS to it, select variable tab and select the parameter from mapping parameter. SAL +
COMM + $$Bonus
14. Open the Expression editor for out_max_var.
15. Select the variable function SETMAXVARIABLE from the left side pane. Select
$$var_max from the variable tab and SAL from the ports tab as shown below:
SETMAXVARIABLE($$var_max, SAL)
17. Open the Expression editor for out_min_var and write the following expression:
SETMINVARIABLE($$var_min, SAL). Validate the expression.
18. Open the Expression editor for out_count_var and write the following expression:
SETCOUNTVARIABLE($$var_count). Validate the expression.
19. Open the Expression editor for out_set_var and write the following expression:
SETVARIABLE($$var_set, ADD_TO_DATE(HIREDATE, 'MM', 1)). Validate.
20. Click OK. Expression Transformation below:
21. Link all ports from expression to target and Validate Mapping and Save it.
22. See mapping picture on next page.
PARAMETER FLE
A parameter file is a list of parameters and associated values for a workflow, worklet, or session.
Parameter files provide flexibility to change these variables each time we run a workflow or
session.
We can create multiple parameter files and change the file we use for a session or workflow. We
can create a parameter file using a text editor such as WordPad or Notepad.
Enter the parameter file name and directory in the workflow or session properties.
A parameter file contains the following types of parameters and variables:
Workflow variable: References values and records information in a workflow.
Worklet variable: References values and records information in a worklet. Use predefined worklet
variables in a parent workflow, but we cannot use workflow variables from the parent workflow in
a worklet.
Session parameter: Defines a value that can change from session to session, such as a database
connection or file name.
Mapping parameter and Mapping variable
USNG A PARAMETER FLE
Parameter files contain several sections preceded by a heading. The heading identifies the
ntegration Service, ntegration Service process, workflow, worklet, or session to which we want
to assign parameters or variables.
Make session and workflow.
Give connection information for source and target table.
Run workflow and see result.
Sample Parameter File for Our example:
n the parameter file, folder and session names are case sensitive.
Create a text file in notepad with name Para_File.txt
[Practice.ST:s_m_MP_MV_Example]
$$Bonus=1000
$$var_max=500
$$var_min=1200
$$var_count=0
CONFIGURING PARAMETER FILE
We can specify the parameter file name and directory in the workflow or session properties.
To enter a parameter file in the workflow properties:
1. Open a Workflow in the Workflow Manager.
2. Click Workflows > Edit.
3. Click the Properties tab.
4. Enter the parameter directory and name in the Parameter Filename field.
5. Click OK.
To enter a parameter file in the session properties:
1. Open a session in the Workflow Manager.
2. Click the Properties tab and open the General Options settings.
3. Enter the parameter directory and name in the Parameter Filename field.
4. Example: D:\Files\Para_File.txt or $PMSourceFileDir\Para_File.txt
5. Click OK.
Mapplet
A mapplet is a reusable object that we create in the Mapplet Designer.
t contains a set of transformations and lets us reuse that transformation logic in multiple
mappings.
Created in Mapplet Designer in Designer Tool.
We need to use same set of 5 transformations in say 10 mappings. So instead of making 5
transformations in every 10 mapping, we create a mapplet of these 5 transformations. Now we
use this mapplet in all 10 mappings. Example: To create a surrogate key in target. We create a
mapplet using a stored procedure to create Primary key for target table. We give target table
name and key column name as input to mapplet and get the Surrogate key as output.
Mapplets help simplify mappings in the following ways:
nclude source definitions: Use multiple source definitions and source qualifiers to provide source
data for a mapping.
Accept data from sources in a mapping
nclude multiple transformations: As many transformations as we need.
Pass data to multiple transformations: We can create a mapplet to feed data to multiple
transformations. Each Output transformation in a mapplet represents one output group in a
mapplet.
Contain unused ports: We do not have to connect all mapplet input and output ports in a
mapping.
Mapplet nput:
Mapplet input can originate from a source definition and/or from an nput transformation in the
mapplet. We can create multiple pipelines in a mapplet.
We use Mapplet nput transformation to give input to mapplet.
Use of Mapplet nput transformation is optional.
Mapplet Output:
The output of a mapplet is not connected to any target table.
We must use Mapplet Output transformation to store mapplet output.
A mapplet must contain at least one Output transformation with at least one connected port in the
mapplet.
Example1: We will join EMP and DEPT table. Then calculate total salary. Give the output to
mapplet out transformation.
EMP and DEPT will be source tables.
Output will be given to transformation Mapplet_Out.
Steps:
Open folder where we want to create the mapping.
Click Tools -> Mapplet Designer.
Click Mapplets-> Create-> Give name. Ex: mplt_example1
Drag EMP and DEPT table.
Use Joiner transformation as described earlier to join them.
Transformation -> Create -> Select Expression for list -> Create -> Done
Pass all ports from joiner to expression and then calculate total salary as described in
expression transformation.
Now Transformation -> Create -> Select Mapplet Out from list > Create -> Give name and
then done.
Pass all ports from expression to Mapplet output.
Mapplet -> Validate
Repository -> Save
Use of a mapplet in a mapping:
We can use a mapplet in a mapping by simply dragging the mapplet from the mapplet folder in the
left pane, just as we drag source and target tables.
When we use the mapplet in a mapping, the mapplet object displays only the ports from the nput
and Output transformations. These are referred to as the mapplet input and mapplet output ports.
Make sure to give correct connection information in session.
Making a mapping: We will use mplt_example1, and then create a filter
transformation to filter records whose Total Salary is >= 1500.
mplt_example1 will be source.
Create target table same as Mapplet_out transformation as in picture above. Creating
Mapping
Open folder where we want to create the mapping.
Click Tools -> Mapping Designer.
Click Mapping-> Create-> Give name. Ex: m_mplt_example1
Drag mplt_Example1 and target table.
Transformation -> Create -> Select Filter for list -> Create -> Done.
Drag all ports from mplt_example1 to filter and give filter condition.
Connect all ports from filter to target. We can add more transformations after filter if
needed.
Validate mapping and Save it.
Make session and workflow.
Give connection information for mapplet source tables.
Give connection information for target table.
Run workflow and see result.
Indirect Loading For Flat Files
Suppose you have 10 flat files of the same structure. All the flat files have the same number of
columns and the same data types. Now we need to load all 10 files into the same target.
Names of files are say EMP1, EMP2 and so on.
Solution1:
1. mport one flat file definition and make the mapping as per need.
2. Now in session give the Source File name and Source File Directory location of one file.
3. Make workflow and run.
4. Now open session after workflow completes. Change the Filename and Directory to give
information of second file. Run workflow again.
5. Do the above for all 10 files.
Solution2:
1. mport one flat file definition and make the mapping as per need.
2. Now in session give the Source Directory location of the files.
3. Now in the Source Filename field use $InputFileName. This is a session parameter.
4. Now make a parameter file and give the value of $InputFileName:
$InputFileName=EMP1.txt
5. Run the workflow
6. Now edit parameter file and give value of second file. Run workflow again.
7. Do same for remaining files.
Solution3:
1. mport one flat file definition and make the mapping as per need.
2. Now make a notepad file that contains the location and name of each 10 flat files.
Sample:
D:\EMP1.txt
E:\EMP2.txt
E:\FLES\DWH\EMP3.txt and so on
3. Now make a session and in Source file name and Source File Directory location fields, give the
name and location of above created file.
4. n Source file type field, select ndirect.
5. Click Apply.
6. Validate Session
7. Make Workflow. Save it to repository and run.
ncremental Aggregation
When we enable the session option-> ncremental Aggregation the ntegration Service performs
incremental aggregation, it passes source data through the mapping and uses historical cache
data to perform aggregation calculations incrementally.
When using incremental aggregation, you apply captured changes in the source to aggregate
calculations in a session. f the source changes incrementally and you can capture changes, you
can configure the session to process those changes. This allows the ntegration Service to
update the target incrementally, rather than forcing it to process the entire source and recalculate
the same data each time you run the session.
For example, you might have a session using a source that receives new data every day. You can
capture those incremental changes because you have added a filter condition to the mapping that
removes pre-existing data from the flow of data. You then enable incremental aggregation.
When the session runs with incremental aggregation enabled for the first time on March 1, you
use the entire source. This allows the ntegration Service to read and store the necessary
aggregate data. On March 2, when you run the session again, you filter out all the records except
those time-stamped March 2. The ntegration Service then processes the new data and updates
the target accordingly. Consider using incremental aggregation in the following circumstances:
You can capture new source data. Use incremental aggregation when you can capture new
source data each time you run the session. Use a Stored Procedure or Filter transformation to
process new data.
ncremental changes do not significantly change the target. Use incremental aggregation when
the changes do not significantly change the target. f processing the incrementally changed
source alters more than half the existing target, the session may not benefit from using
incremental aggregation. n this case, drop the table and recreate the target with complete source
data.
Note: Do not use incremental aggregation if the mapping contains percentile or median functions.
The ntegration Service uses system memory to process these functions in addition to the cache
memory you configure in the session properties. As a result, the ntegration Service does not
store incremental aggregation values for percentile and median functions in disk caches.
ntegration Service Processing for ncremental Aggregation
(i)The first time you run an incremental aggregation session, the ntegration Service processes
the entire source. At the end of the session, the ntegration Service stores aggregate data from
that session run in two files, the index file and the data file. The ntegration Service creates the
files in the cache directory specified in the Aggregator transformation properties.
(ii)Each subsequent time you run the session with incremental aggregation, you use the
incremental source changes in the session. For each input record, the ntegration Service checks
historical information in the index file for a corresponding group. f it finds a corresponding group,
the ntegration Service performs the aggregate operation incrementally, using the aggregate data
for that group, and saves the incremental change. f it does not find a corresponding group, the
ntegration Service creates a new group and saves the record data.
(iii)When writing to the target, the ntegration Service applies the changes to the existing target. t
saves modified aggregate data in the index and data files to be used as historical data the next
time you run the session.
(iv) f the source changes significantly and you want the ntegration Service to continue saving
aggregate data for future incremental changes, configure the ntegration Service to overwrite
existing aggregate data with new aggregate data.
Each subsequent time you run a session with incremental aggregation, the ntegration Service
creates a backup of the incremental aggregation files. The cache directory for the Aggregator
transformation must contain enough disk space for two sets of the files.
(v)When you partition a session that uses incremental aggregation, the ntegration Service
creates one set of cache files for each partition.
The ntegration Service creates new aggregate data, instead of using historical data, when you
perform one of the following tasks:
Save a new version of the mapping.
Configure the session to reinitialize the aggregate cache.
Move the aggregate files without correcting the configured path or directory for the files in
the session properties.
Change the configured path or directory for the aggregate files without moving the files to
the new location.
Delete cache files.
Decrease the number of partitions.
When the ntegration Service rebuilds incremental aggregation files, the data in the
previous files is lost.
Note: To protect the incremental aggregation files from file corruption or disk failure, periodically
back up the files.
Preparing for Incremental Aggregation:
When you use incremental aggregation, you need to configure both mapping and session
properties:
Implement mapping logic or filter to remove pre-existing data.
Configure the session for incremental aggregation and verify that the file directory has
enough disk space for the aggregate files.
Configuring the Mapping
Before enabling incremental aggregation, you must capture changes in source data. You can use
a Filter or Stored Procedure transformation in the mapping to remove pre-existing source data
during a session.
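For example, a minimal sketch of a Source Qualifier SQL override that passes only newly changed rows. The ORDERS table, its columns, the LAST_UPDATE_DATE audit column, and the $$LAST_RUN_DATE mapping parameter are all hypothetical; adapt them to your own source and change-capture logic:

SELECT ORDERS.ORDER_ID,
       ORDERS.CUSTOMER_ID,
       ORDERS.AMOUNT
FROM   ORDERS
-- keep only rows changed since the previous successful run
WHERE  ORDERS.LAST_UPDATE_DATE > TO_DATE('$$LAST_RUN_DATE', 'MM/DD/YYYY HH24:MI:SS')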
Configuring the Session
Use the following guidelines when you configure the session for incremental aggregation:
(i) Verify the location where you want to store the aggregate files.
The index and data files grow in proportion to the source data. Be sure the cache directory has
enough disk space to store historical data for the session.
When you run multiple sessions with incremental aggregation, decide where you want the files
stored. Then, enter the appropriate directory for the process variable, $PMCacheDir, in the
Workflow Manager. You can enter session-specific directories for the index and data files.
However, by using the process variable for all sessions using incremental aggregation, you can
easily change the cache directory when necessary by changing $PMCacheDir.
Changing the cache directory without moving the files causes the Integration Service to
reinitialize the aggregate cache and gather new aggregate data.
In a grid, Integration Services rebuild incremental aggregation files they cannot find. When an
Integration Service rebuilds incremental aggregation files, it loses aggregate history.
(ii) Verify the incremental aggregation settings in the session properties.
You can configure the session for incremental aggregation in the Performance settings on the
Properties tab.
You can also configure the session to reinitialize the aggregate cache. If you choose to reinitialize
the cache, the Workflow Manager displays a warning indicating the Integration Service overwrites
the existing cache and a reminder to clear this option after running the session.
When should we go for hash partitioning?
Scenarios for choosing hash partitioning (an example follows this list):
Not enough knowledge about how much data maps into a given range.
Sizes of range partitions differ substantially, or are difficult to balance manually.
Range partitioning would cause data to be clustered undesirably.
Features such as parallel DML, partition pruning, and partition-wise joins are important.
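As an illustration, here is a minimal sketch of an Oracle hash-partitioned table (the table name, columns, and partition count are hypothetical). Rows are distributed across the four partitions by a hash of ITEM_ID, which is useful when no natural range key exists:

Create table items_hash (item_id number(10),
item_name varchar2(30),
price number(10,2))
partition by hash (item_id)
partitions 4;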
You can define the following partition types in the Workflow Manager:
1) Database Partitioning
The Integration Service queries the IBM DB2 or Oracle system for table partition information. It
reads partitioned data from the corresponding nodes in the database. Use database partitioning
with Oracle or IBM DB2 source instances on a multi-node table space. Use database partitioning
with DB2 targets.
2) Hash Partitioning
Use hash partitioning when you want the Integration Service to distribute rows to the partitions by
group. For example, you need to sort items by item ID, but you do not know how many items have
a particular ID number.
3) Key Range
You specify one or more ports to form a compound partition key. The Integration Service passes
data to each partition depending on the ranges you specify for each port. Use key range
partitioning where the sources or targets in the pipeline are partitioned by key range.
4) Simple Pass-Through
The Integration Service passes all rows at one partition point to the next partition point without
redistributing them. Choose pass-through partitioning where you want to create an additional
pipeline stage to improve performance, but do not want to change the distribution of data across
partitions.
5) Round-Robin
The Integration Service distributes data evenly among all partitions. Use round-robin partitioning
where you want each partition to process approximately the same number of rows.
Creating Partition Tables
To create a partition table, give the following statement:
Create table sales (year number(4),
product varchar2(10),
amt number(10))
partition by range (year)
(
partition p1 values less than (1992) ,
partition p2 values less than (1993),
partition p5 values less than (MAXVALUE)
);
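As a brief illustration of how the range key routes rows (the sample values are hypothetical), rows are inserted normally and an individual partition can be queried directly:

Insert into sales values (1991, 'WIDGET', 500);   -- lands in partition p1 (year < 1992)
Insert into sales values (1994, 'GADGET', 750);   -- lands in partition p5 (MAXVALUE catch-all)
Select * from sales partition (p1);               -- reads only partition p1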
The following example creates a table with list partitioning:
Create table customers (custcode number(5),
Name varchar2(20),
Addr varchar2(10),
City varchar2(20),
Bal number(10,2))
Partition by list (city)
(Partition north_India values ('DELHI','CHANDIGARH'),
Partition east_India values ('KOLKATA','PATNA'),
Partition south_India values ('HYDERABAD','BANGALORE',
'CHENNAI'),
Partition west_India values ('BOMBAY','GOA'));
alter table sales add partition p6 values less than (1996);

alter table customers add partition central_India values ('BHOPAL','NAGPUR');

Alter table sales drop partition p5;

Alter table sales merge partitions p2, p3 into
partition p23;

The following statement adds a new set of cities ('KOCHI', 'MANGALORE') to an existing partition
list.
ALTER TABLE customers
MODIFY PARTITION south_india
ADD VALUES ('KOCHI', 'MANGALORE');
The statement below drops a set of cities ('KOCHI' and 'MANGALORE') from an existing partition
value list.
ALTER TABLE customers
MODIFY PARTITION south_india
DROP VALUES ('KOCHI','MANGALORE');
SPLITTING PARTITIONS

You can split a single partition into two partitions. For example, to split the partition p5 of the sales
table into two partitions, give the following command.

Alter table sales split partition p5 into
(Partition p6 values less than (1996),
Partition p7 values less than (MAXVALUE));

TRUNCATING PARTITION

Truncating a partition will delete all rows from the partition.

To truncate a partition, give the following statement:

Alter table sales truncate partition p5;

LISTING INFORMATION ABOUT PARTITION TABLES

To see how many partitioned tables there are in your schema, give the following statement:

Select * from user_part_tables;

To see partition-level partitioning information:

Select * from user_tab_partitions;
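For instance, a minimal sketch that narrows the dictionary query to the SALES table created above (TABLE_NAME, PARTITION_NAME, and HIGH_VALUE are standard columns of USER_TAB_PARTITIONS, but verify against your Oracle version):

Select table_name, partition_name, high_value
from user_tab_partitions
where table_name = 'SALES';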
TASKS
The Workflow Manager contains many types of tasks to help you build workflows and worklets.
We can create reusable tasks in the Task Developer.
Types of tasks:
Task Type      Tool where task can be created        Reusable or not
Session        Task Developer                        Yes
Email          Workflow Designer                     Yes
Command        Worklet Designer                      Yes
Event-Raise    Workflow Designer                     No
Event-Wait     Worklet Designer                      No
Timer          Workflow or Worklet Designer          No
Decision       Workflow or Worklet Designer          No
Assignment     Workflow or Worklet Designer          No
Control        Workflow or Worklet Designer          No
SESSION TASK
A session is a set of instructions that tells the Power Center Server how and when to move data
from sources to targets.
To run a session, we must first create a workflow to contain the Session task.
We can run as many sessions in a workflow as we need. We can run the Session tasks
sequentially or concurrently, depending on our needs.
The Power Center Server creates several files and in-memory caches depending on the
transformations and options used in the session.
EMAIL TASK
The Workflow Manager provides an Email task that allows us to send email during a workflow.
It is usually created by the Administrator, and we just drag and use it in our workflow.
Steps:
1. In the Task Developer or Workflow Designer, choose Tasks-Create.
2. Select an Email task and enter a name for the task. Click Create.
3. Click Done.
4. Double-click the Email task in the workspace. The Edit Tasks dialog box appears.
5. Click the Properties tab.
6. Enter the fully qualified email address of the mail recipient in the Email User Name field.
7. Enter the subject of the email in the Email Subject field. Or, you can leave this field blank.
8. Click the Open button in the Email Text field to open the Email Editor.
9. Click OK twice to save your changes.
Example: To send an email when a session completes:
Steps:
1. Create a workflow wf_sample_email.
2. Drag any session task to the workspace.
3. Edit the Session task and go to the Components tab.
4. See the On Success Email option there and configure it.
5. In Type, select Reusable or Non-reusable.
6. In Value, select the email task to be used.
7. Click Apply -> Ok.
8. Validate workflow and Repository -> Save.
9. We can also drag the email task and use it as per need.
10. We can set the option to send email on success or failure in the Components tab of a
session task.
COMMAND TASK
The Command task allows us to specify one or more shell commands in UNIX, or DOS
commands in Windows, to run during the workflow.
For example, we can specify shell commands in the Command task to delete reject files, copy a
file, or archive target files.
Ways of using the Command task:
1. Standalone Command task: We can use a Command task anywhere in the workflow or worklet
to run shell commands.
2. Pre- and post-session shell command: We can call a Command task as the pre- or post-
session shell command for a Session task. This is done in the COMPONENTS tab of a session. We
can run it in Pre-Session Command, Post-Session Success Command, or Post-Session Failure
Command. Select the Value and Type option as we did in the Email task.
Example: To copy a file sample.txt from the D drive to the E drive in Windows:
Command: COPY D:\sample.txt E:\
Steps for creating a Command task:
1. In the Task Developer or Workflow Designer, choose Tasks-Create.
2. Select Command Task for the task type.
3. Enter a name for the Command task. Click Create. Then click Done.
4. Double-click the Command task. Go to the Commands tab.
5. In the Commands tab, click the Add button to add a command.
6. In the Name field, enter a name for the new command.
7. In the Command field, click the Edit button to open the Command Editor.
8. Enter only one command in the Command Editor.
9. Click OK to close the Command Editor.
10. Repeat steps 5-9 to add more commands in the task.
11. Click OK.
Steps to create the workflow using the Command task:
1. Create a task using the above steps to copy a file in the Task Developer.
2. Open the Workflow Designer. Workflow -> Create -> Give name and click OK.
3. Start is displayed. Drag a session, say s_m_Filter_example, and the Command task.
4. Link Start to the Session task and the Session to the Command task.
5. Double-click the link between the Session and the Command task and give the condition in the
editor as $S_M_FILTER_EXAMPLE.Status=SUCCEEDED.
6. Workflow -> Validate.
7. Repository -> Save.
WORKING WITH EVENT TASKS
We can define events in the workflow to specify the sequence of task execution.
Types of Events:
Pre-defined event: A pre-defined event is a file-watch event. This event waits for a specified file to
arrive at a given location.
User-defined event: A user-defined event is a sequence of tasks in the workflow. We create
events and then raise them as per need.
Steps for creating a User-Defined Event:
1. Open any workflow where we want to create an event.
2. Click Workflow -> Edit -> Events tab.
3. Click the Add button to add events and give the names as per need.
4. Click Apply -> Ok. Validate the workflow and save it.
Types of Event Tasks:
EVENT RAISE: The Event-Raise task represents a user-defined event. We use this task to raise a
user-defined event.
EVENT WAIT: The Event-Wait task waits for a file-watch event or user-defined event to occur before
executing the next session in the workflow.
Example 1: Use an Event-Wait task and make sure that session s_filter_example runs when the
abc.txt file is present in the D:\FILES folder.
Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_event_wait_file_watch -> Click OK.
2. Task -> Create -> Select Event Wait. Give name. Click Create and Done.
3. Link Start to the Event-Wait task.
4. Drag s_filter_example to the workspace and link it to the Event-Wait task.
5. Right-click the Event-Wait task and click EDIT -> EVENTS tab.
6. Select the Pre Defined option there. In the blank space, give the directory and filename to watch.
Example: D:\FILES\abc.txt
7. Workflow Validate and Repository Save.
Example 2: Raise a user-defined event when session s_m_filter_example succeeds. Capture this
event in an Event-Wait task and run session S_M_TOTAL_SAL_EXAMPLE.
Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_event_wait_event_raise -> Click OK.
2. Workflow -> Edit -> Events tab and add event EVENT1 there.
3. Drag s_m_filter_example and link it to the START task.
4. Click Tasks -> Create -> Select EVENT RAISE from the list. Give name ER_Example. Click
Create and then Done. Link ER_Example to s_m_filter_example.
5. Right-click ER_Example -> EDIT -> Properties tab -> Open Value for User Defined Event and
select EVENT1 from the list displayed. Apply -> OK.
6. Click the link between ER_Example and s_m_filter_example and give the condition
$S_M_FILTER_EXAMPLE.Status=SUCCEEDED.
7. Click Tasks -> Create -> Select EVENT WAIT from the list. Give name EW_WAIT. Click Create
and then Done.
8. Link EW_WAIT to the START task.
9. Right-click EW_WAIT -> EDIT -> EVENTS tab.
10. Select User Defined there. Select EVENT1 by clicking the Browse Events button.
11. Apply -> OK.
12. Drag S_M_TOTAL_SAL_EXAMPLE and link it to EW_WAIT.
13. Workflow -> Validate.
14. Repository -> Save.
Run the workflow and see.
TIMER TASK
The Timer task allows us to specify the period of time to wait before the Power Center Server
runs the next task in the workflow. The Timer task has two types of settings:
Absolute time: We specify the exact date and time, or we can choose a user-defined workflow
variable to specify the exact time. The next task in the workflow will run as per the date and time
specified.
Relative time: We instruct the Power Center Server to wait for a specified period of time after the
Timer task, the parent workflow, or the top-level workflow starts.
Example: Run session s_m_filter_example 1 minute after the Timer task starts.
Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_timer_task_example -> Click OK.
2. Click Tasks -> Create -> Select TIMER from the list. Give name TIMER_Example. Click Create
and then Done.
3. Link TIMER_Example to the START task.
4. Right-click TIMER_Example -> EDIT -> TIMER tab.
5. Select the Relative Time option, give 1 min, and select the 'From start time of this task' option.
6. Apply -> OK.
7. Drag s_m_filter_example and link it to TIMER_Example.
8. Workflow -> Validate and Repository -> Save.
DECISION TASK
The Decision task allows us to enter a condition that determines the execution of the workflow,
similar to a link condition.
The Decision task has a pre-defined variable called $Decision_task_name.condition that
represents the result of the decision condition.
The Power Center Server evaluates the condition in the Decision task and sets the pre-defined
condition variable to True (1) or False (0).
We can specify one decision condition per Decision task.
Example: The Command task should run only if either s_m_filter_example or
S_M_TOTAL_SAL_EXAMPLE succeeds. If either s_m_filter_example or
S_M_TOTAL_SAL_EXAMPLE fails, then S_m_sample_mapping_EMP should run.
Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_decision_task_example -> Click OK.
2. Drag s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE to the workspace and link both of
them to the START task.
3. Click Tasks -> Create -> Select DECISION from the list. Give name DECISION_Example. Click
Create and then Done. Link DECISION_Example to both s_m_filter_example and
S_M_TOTAL_SAL_EXAMPLE.
4. Right-click DECISION_Example -> EDIT -> GENERAL tab.
5. Set 'Treat Input Links As' to OR. Default is AND. Apply and click OK.
6. Now edit the Decision task again and go to the PROPERTIES tab. Open the Expression Editor by
clicking the VALUE section of the Decision Name attribute and enter the following condition:
$S_M_FILTER_EXAMPLE.Status = SUCCEEDED OR $S_M_TOTAL_SAL_EXAMPLE.Status =
SUCCEEDED
7. Validate the condition -> Click Apply -> OK.
8. Drag the Command task and the S_m_sample_mapping_EMP task to the workspace and link them
to the DECISION_Example task.
9. Double-click the link between S_m_sample_mapping_EMP and DECISION_Example and give the
condition: $DECISION_Example.Condition = 0. Validate and click OK.
10. Double-click the link between the Command task and DECISION_Example and give the
condition: $DECISION_Example.Condition = 1. Validate and click OK.
11. Workflow Validate and Repository Save.
Run the workflow and see the result.
CONTROL TASK
We can use the Control task to stop, abort, or fail the top-level workflow or the parent workflow
based on an input link condition.
A parent workflow or worklet is the workflow or worklet that contains the Control task.
We give the condition to the link connected to the Control task.
Control Option         Description
Fail Me                Fails the control task.
Fail Parent            Marks the status of the WF or worklet that contains the Control task as failed.
Stop Parent            Stops the WF or worklet that contains the Control task.
Abort Parent           Aborts the WF or worklet that contains the Control task.
Fail Top-Level WF      Fails the workflow that is running.
Stop Top-Level WF      Stops the workflow that is running.
Abort Top-Level WF     Aborts the workflow that is running.
Example: Drag any 3 sessions and if any one fails, then abort the top-level workflow.
Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_control_task_example -> Click OK.
2. Drag any 3 sessions to the workspace and link all of them to the START task.
3. Click Tasks -> Create -> Select CONTROL from the list. Give name cntr_task.
4. Click Create and then Done.
5. Link all sessions to the control task cntr_task.
6. Double-click the link between cntr_task and any session, say s_m_filter_example, and give the
condition: $S_M_FILTER_EXAMPLE.Status = SUCCEEDED.
7. Repeat the above step for the remaining 2 sessions also.
8. Right-click cntr_task -> EDIT -> GENERAL tab. Set 'Treat Input Links As' to OR. Default is
AND.
9. Go to the PROPERTIES tab of cntr_task and select the value 'Fail Top-Level Workflow' for the
Control Option. Click Apply and OK.
10. Workflow Validate and Repository Save.
Run the workflow and see the result.
ASSIGNMENT TASK
The Assignment task allows us to assign a value to a user-defined workflow variable.
See the Workflow Variables topic to add user-defined variables.
To use an Assignment task in the workflow, first create and add the
Assignment task to the workflow. Then configure the Assignment task to assign values or
expressions to user-defined variables.
We cannot assign values to pre-defined workflow variables.
Steps to create an Assignment task:
1. Open any workflow where we want to use the Assignment task.
2. Edit the workflow and add user-defined variables.
3. Choose Tasks-Create. Select Assignment Task for the task type.
4. Enter a name for the Assignment task. Click Create. Then click Done.
5. Double-click the Assignment task to open the Edit Task dialog box.
6. On the Expressions tab, click Add to add an assignment.
7. Click the Open button in the User Defined Variables field.
8. Select the variable for which you want to assign a value. Click OK.
9. Click the Edit button in the Expression field to open the Expression Editor.
10. Enter the value or expression you want to assign.
11. Repeat steps 7-10 to add more variable assignments as necessary.
12. Click OK.
Scheduler
We can schedule a workflow to run continuously, repeat at a given time or interval, or we can
manually start a workflow. The Integration Service runs a scheduled workflow as configured.
By default, the workflow runs on demand. We can change the schedule settings by editing the
scheduler. If we change schedule settings, the Integration Service reschedules the workflow
according to the new settings.
A scheduler is a repository object that contains a set of schedule settings.
A scheduler can be non-reusable or reusable.
The Workflow Manager marks a workflow invalid if we delete the scheduler associated with the
workflow.
If we choose a different Integration Service for the workflow or restart the Integration Service, it
reschedules all workflows.
If we delete a folder, the Integration Service removes workflows from the schedule.
The Integration Service does not run the workflow if:
The prior workflow run fails.
We remove the workflow from the schedule.
The Integration Service is running in safe mode.
Creating a Reusable Scheduler
For each folder, the Workflow Manager lets us create reusable schedulers so we can reuse the
same set of scheduling settings for workflows in the folder.
Use a reusable scheduler so we do not need to configure the same set of scheduling settings in
each workflow.
When we delete a reusable scheduler, all workflows that use the deleted scheduler become
invalid. To make the workflows valid, we must edit them and replace the missing scheduler.
Steps:
Open the folder where we want to create the scheduler.
In the Workflow Designer, click Workflows > Schedulers.
Click Add to add a new scheduler.
In the General tab, enter a name for the scheduler.
Configure the scheduler settings in the Scheduler tab.
Click Apply and OK.
Configuring Scheduler Settings
Configure the Schedule tab of the scheduler to set run options, schedule options, start options,
and end options for the schedule.
There are 3 run options:
Run on Demand
Run Continuously
Run on Server initialization
1. Run on Demand:
The Integration Service runs the workflow when we start the workflow manually.
2. Run Continuously:
The Integration Service runs the workflow as soon as the service initializes. The Integration Service
then starts the next run of the workflow as soon as it finishes the previous run.
3. Run on Server initialization:
The Integration Service runs the workflow as soon as the service is initialized. The Integration Service
then starts the next run of the workflow according to the settings in Schedule Options.
Schedule options for Run on Server initialization:
Run Once: Run the workflow just once.
Run Every: Run the workflow at regular intervals, as configured.
Customized Repeat: The Integration Service runs the workflow on the dates and times specified in
the Repeat dialog box.
Start options for Run on Server initialization:
Start Date
Start Time
End options for Run on Server initialization:
End On: The Integration Service stops scheduling the workflow on the selected date.
End After: The Integration Service stops scheduling the workflow after the set number of
workflow runs.
Forever: The Integration Service schedules the workflow as long as the workflow does not fail.
Creating a Non-Reusable Scheduler
In the Workflow Designer, open the workflow.
Click Workflows > Edit.
In the Scheduler tab, choose Non-reusable. Select Reusable if we want to select an existing
reusable scheduler for the workflow.
Note: If we do not have a reusable scheduler in the folder, we must
create one before we choose Reusable.
Click the right side of the Scheduler field to edit scheduling settings for the non-reusable
scheduler.
If we select Reusable, choose a reusable scheduler from the Scheduler
Browser dialog box.
Click OK.
Points to Ponder:
To remove a workflow from its schedule, right-click the workflow in the Navigator window and
choose Unschedule Workflow.
To reschedule a workflow on its original schedule, right-click the workflow in the Navigator window
and choose Schedule Workflow.
Pushdown Optimization Overview
You can push transformation logic to the source or target database using pushdown
optimization. When you run a session configured for pushdown optimization, the Integration
Service translates the transformation logic into SQL queries and sends the SQL queries to the
database. The source or target database executes the SQL queries to process the
transformations.
The amount of transformation logic you can push to the database depends on the
database, transformation logic, and mapping and session configuration. The Integration Service
processes all transformation logic that it cannot push to a database.
Use the Pushdown Optimization Viewer to preview the SQL statements and mapping logic
that the Integration Service can push to the source or target database. You can also use the
Pushdown Optimization Viewer to view the messages related to pushdown optimization.
Consider a mapping containing transformation logic that can be pushed
to the source database:
This mapping contains a Filter transformation that filters out all items except those with an
ID greater than 1005. The Integration Service can push the transformation logic to the database.
It generates the following SQL statement to process the transformation logic:
INSERT INTO ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC, n_PRICE) SELECT
ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ITEMS.ITEM_DESC, CAST(ITEMS.PRICE AS INTEGER)
FROM ITEMS WHERE (ITEMS.ITEM_ID > 1005)
The Integration Service generates an INSERT SELECT statement to get the ID, NAME,
and DESCRIPTION columns from the source table. It filters the data using a WHERE clause. The
Integration Service does not extract data from the database at this time.
Pushdown Optimization Types
You can configure the following types of pushdown optimization:
Source-side pushdown optimization. The Integration Service pushes as much transformation
logic as possible to the source database.
Target-side pushdown optimization. The Integration Service pushes as much transformation logic
as possible to the target database.
Full pushdown optimization. The Integration Service attempts to push all transformation logic to
the target database. If the Integration Service cannot push all transformation logic to the
database, it performs both source-side and target-side pushdown optimization.
Running Source-Side Pushdown Optimization Sessions
When you run a session configured for source-side pushdown optimization, the Integration
Service analyzes the mapping from the source to the target or until it reaches a downstream
transformation it cannot push to the database.
The Integration Service generates and executes a SELECT statement based on the
transformation logic for each transformation it can push to the database. Then, it reads the results
of this SQL query and processes the remaining transformations.
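As a hedged sketch reusing the ITEMS example above, source-side pushdown for that Filter transformation would generate a query similar to the following; the Integration Service then reads the filtered rows and processes any remaining transformation logic itself:

SELECT ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ITEMS.ITEM_DESC, ITEMS.PRICE
FROM   ITEMS
WHERE  (ITEMS.ITEM_ID > 1005)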
Running Target-Side Pushdown Optimization Sessions
When you run a session configured for target-side pushdown optimization, the Integration Service
analyzes the mapping from the target to the source or until it reaches an upstream transformation
it cannot push to the database. It generates an INSERT, DELETE, or UPDATE statement based
on the transformation logic for each transformation it can push to the database. The Integration
Service processes the transformation logic up to the point that it can push the transformation logic
to the target database. Then, it executes the generated SQL.
Running Full Pushdown Optimization Sessions
To use full pushdown optimization, the source and target databases must be in the same
relational database management system. When you run a session configured for full pushdown
optimization, the Integration Service analyzes the mapping from the source to the target or until it
reaches a downstream transformation it cannot push to the target database. It generates and
executes SQL statements against the source or target based on the transformation logic it can
push to the database.
When you run a session with large quantities of data and full pushdown optimization, the
database server must run a long transaction. Consider the following database performance
issues when you generate a long transaction:
A long transaction uses more database resources.
A long transaction locks the database for longer periods of time. This reduces database
concurrency and increases the likelihood of deadlock.
A long transaction increases the likelihood of an unexpected event.
To minimize database performance issues for long transactions, consider using source-side or
target-side pushdown optimization.
Integration Service Behavior with Full Optimization
When you configure a session for full optimization, the Integration Service analyzes the mapping
from the source to the target or until it reaches a downstream transformation it cannot push to the
target database. If the Integration Service cannot push all transformation logic to the target
database, it tries to push all transformation logic to the source database. If it cannot push all
transformation logic to the source or target, the Integration Service pushes as much
transformation logic as possible to the source database, processes intermediate transformations that it
cannot push to any database, and then pushes the remaining transformation logic to the target
database. The Integration Service generates and executes an INSERT SELECT, DELETE, or
UPDATE statement for each database to which it pushes transformation logic.
For example, consider a mapping that contains a Source Qualifier, an Aggregator, a Rank, and an
Expression transformation.
The Rank transformation cannot be pushed to the source or target database. If you configure the
session for full pushdown optimization, the Integration Service pushes the Source Qualifier
transformation and the Aggregator transformation to the source, processes the Rank
transformation, and pushes the Expression transformation and target to the target database. The
Integration Service does not fail the session if it can push only part of the transformation logic to
the database.
Active and Idle Databases
During pushdown optimization, the Integration Service pushes the transformation logic to one
database, which is called the active database. A database that does not process transformation
logic is called an idle database. For example, a mapping contains two sources that are joined by
a Joiner transformation. If the session is configured for source-side pushdown optimization, the
Integration Service pushes the Joiner transformation logic to the source in the detail pipeline,
which is the active database. The source in the master pipeline is the idle database because it
does not process transformation logic.
The Integration Service uses the following criteria to determine which database is active or idle:
1. When using full pushdown optimization, the target database is active and the source database
is idle.
2. In sessions that contain a Lookup transformation, the source or target database is active, and
the lookup database is idle.
3. In sessions that contain a Joiner transformation, the source in the detail pipeline is active, and
the source in the master pipeline is idle.
4. In sessions that contain a Union transformation, the source in the first input group is active.
The sources in other input groups are idle.
To push transformation logic to an active database, the database user account of the active
database must be able to read from the idle databases.
Working with Databases
You can configure pushdown optimization for the following databases:
IBM DB2
Microsoft SQL Server
Netezza
Oracle
Sybase ASE
Teradata
Databases that use ODBC drivers
When you push transformation logic to a database, the database may produce different output
than the Integration Service. In addition, the Integration Service can usually push more
transformation logic to a database if you use a native driver, instead of an ODBC driver.
Comparing the Output of the Integration Service and Databases
The Integration Service and databases can produce different results when processing the same
transformation logic. The Integration Service sometimes converts data to a different format when
it reads data. The Integration Service and database may also handle null values, case sensitivity,
and sort order differently.
The database and Integration Service produce different output when the following settings and
conversions are different:
Nulls treated as the highest or lowest value. The Integration Service and a database can treat null
values differently. For example, you want to push a Sorter transformation to an Oracle database.
In the session, you configure nulls as the lowest value in the sort order. Oracle treats null values
as the highest value in the sort order.
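A hedged illustration of the difference in Oracle syntax (the table and column are hypothetical): Oracle sorts nulls as the highest value by default, so matching a session configured with nulls as the lowest value requires an explicit NULLS FIRST:

-- Oracle default: nulls sort as the highest value (last in ascending order)
SELECT item_name FROM items ORDER BY item_name;
-- To mimic a session configured with nulls as the lowest value
SELECT item_name FROM items ORDER BY item_name NULLS FIRST;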
Sort order. The Integration Service and a database can use different sort orders. For example,
you want to push the transformations in a session to a Microsoft SQL Server database, which is
configured to use a sort order that is not case sensitive. You configure the session properties to
use the binary sort order, which is case sensitive. The results differ based on whether the
Integration Service or the Microsoft SQL Server database processes the transformation logic.
Case sensitivity. The Integration Service and a database can treat case sensitivity differently. For
example, the Integration Service uses case-sensitive queries and the database does not. A Filter
transformation uses the following filter condition: IIF(col_varchar2 = 'CA', TRUE, FALSE). You
need the database to return rows that match 'CA.' However, if you push this transformation logic
to a Microsoft SQL Server database that is not case sensitive, it returns rows that match the
values 'Ca,' 'ca,' 'cA,' and 'CA.'
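A hedged sketch of the difference in T-SQL (the table and column are hypothetical, and Latin1_General_CS_AS is just one common case-sensitive collation): a case-insensitive SQL Server database matches every casing unless the comparison is forced to a case-sensitive collation:

-- On a case-insensitive database this matches 'CA', 'Ca', 'ca', and 'cA'
SELECT * FROM items WHERE col_varchar2 = 'CA';
-- Forcing a case-sensitive comparison returns only rows that match 'CA'
SELECT * FROM items WHERE col_varchar2 COLLATE Latin1_General_CS_AS = 'CA';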
Numeric values converted to character values. The Integration Service and a database can
convert the same numeric value to a character value in different formats. The database can
convert numeric values to an unacceptable character format. For example, a table contains the
number 1234567890. When the Integration Service converts the number to a character value, it
inserts the characters '1234567890.' However, a database might convert the number to '1.2E9.'
The two sets of characters represent the same value. However, if you require the characters in
the format '1234567890,' you can disable pushdown optimization.
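For instance, a hedged Oracle sketch showing how an explicit format mask keeps the full digit string (the literal value is taken from the example above):

-- Default conversion keeps every digit in Oracle
SELECT TO_CHAR(1234567890) FROM dual;
-- An explicit format mask makes the intended character output unambiguous
SELECT TO_CHAR(1234567890, 'FM9999999999') FROM dual;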
Precision. The Integration Service and a database can have different precision for particular
datatypes. Transformation datatypes use a default numeric precision that can vary from the native
datatypes. For example, a transformation Decimal datatype has a precision of 1-28. The
corresponding Teradata Decimal datatype has a precision of 1-18. The results can vary if the
database uses a different precision than the Integration Service.
Using ODBC Drivers
When you use native drivers for all databases, except Netezza, the Integration Service generates
SQL statements using native database SQL. When you use ODBC drivers, the Integration
Service usually cannot detect the database type. As a result, it generates SQL statements using
ANSI SQL. The Integration Service can push more functions to the database when it generates
SQL statements in the native database language than when it uses ANSI SQL.
Note: Although the Integration Service uses an ODBC driver for the Netezza database, the
Integration Service detects that the database is Netezza and generates native database SQL
when pushing the transformation logic to the Netezza database.
In some cases, ANSI SQL is not compatible with the database syntax. The following sections
describe problems that you can encounter when you use ODBC drivers. When possible, use
native drivers to prevent these problems.
Working with Dates
The Integration Service and database can process dates differently. When you configure the
session to push date conversion to the database, you can receive unexpected results or the
session can fail.
The database can produce different output than the Integration Service when the following date
settings and conversions are different:
Date values converted to character values. The Integration Service converts the transformation
Date/Time datatype to the native datatype that supports subsecond precision in the database.
The session fails if you configure the datetime format in the session to a format that the database
does not support. For example, when the Integration Service performs the ROUND function on a
date, it stores the date value in a character column, using the format MM/DD/YYYY
HH:MI:SS.US. When the database performs this function, it stores the date in the default date
format for the database. If the database is Oracle, it stores the date as the default DD-MON-YY. If
you require the date to be in the format MM/DD/YYYY HH:MI:SS.US, you can disable pushdown
optimization.
Date formats for TO_CHAR and TO_DATE functions. The Integration Service uses the date
format in the TO_CHAR or TO_DATE function when the Integration Service pushes the function
to the database. The database converts each date string to a datetime value supported by the
database.
For example, the Integration Service pushes the following expression to the database:
TO_DATE( DATE_PROMISED, 'MM/DD/YY' )
The database interprets the date string in the DATE_PROMISED port based on the specified date
format string MM/DD/YY. The database converts each date string, such as 01/22/98, to the
supported date value, such as Jan 22 1998 00:00:00.
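For example, a minimal sketch of the equivalent database-side conversion in Oracle syntax (DATE_PROMISED here is replaced with the literal string from the example above):

-- The database parses the string with the given format and returns a datetime value
SELECT TO_DATE('01/22/98', 'MM/DD/YY') FROM dual;   -- Jan 22 1998 00:00:00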
If the Integration Service pushes a date format to an IBM DB2, a Microsoft SQL Server, or a
Sybase database that the database does not support, the Integration Service stops pushdown
optimization and processes the transformation.
The Integration Service converts all dates before pushing transformations to an Oracle or
Teradata database. If the database does not support the date format after the date conversion,
the session fails.
HH24 date format. You cannot use the HH24 format in the date format string for Teradata. When
the Integration Service generates SQL for a Teradata database, it uses the HH format string
instead.
Blank spaces in date format strings. You cannot use blank spaces in the date format string in
Teradata. When the Integration Service generates SQL for a Teradata database, it substitutes the
space with 'B.'
Handling subsecond precision for a Lookup transformation. If you enable subsecond precision for
a Lookup transformation, the database and Integration Service perform the lookup comparison
using the subsecond precision, but return different results. Unlike the Integration Service, the
database does not truncate the lookup results based on subsecond precision. For example, you
configure the Lookup transformation to show subsecond precision to the millisecond. If the lookup
result is 8:20:35.123456, a database returns 8:20:35.123456, but the Integration Service returns
8:20:35.123.
SYSDATE built-in variable. When you use the SYSDATE built-in variable, the Integration Service
returns the current date and time for the node running the service process. However, when you
push the transformation logic to the database, the SYSDATE variable returns the current date
and time for the machine hosting the database. If the time zone of the machine hosting the
database is not the same as the time zone of the machine running the Integration Service
process, the results can vary.
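A hedged illustration of why the results can differ: once the logic is pushed down, SYSDATE is evaluated by the database, so it reflects the clock and time zone of the database host rather than of the Integration Service node:

-- Evaluated on the database server when the expression is pushed down
SELECT SYSDATE FROM dual;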