
INFORMATICA POWERCENTER

Transaction:

A transaction is a business operation

Technical point of view :

It is a set of DML operations (Insert, Update, Delete)

OLTP System=OLTP applications(Front end)+Database(Back end)

Data warehousing=ETL Development+BI Development

Enterprise Data warehouse:

An Enterprise Data Warehouse is a relational DB that is specially designed for analyzing the
business, making decisions to achieve the business goals, and responding to business
problems, but it is not designed for business transaction processing

A Data warehouse is a concept of consolidating the data from multiple OLTP databases

From a storage-capacity point of view, relational DBs are categorized into three types

1.Low range

2.Mid range

3.High range

1.Low range DB:

Can organize and manage megabytes of information

Example:Ms-Access
2.Mid range DB:

Can organize and manage gigabytes of information

Example:Oracle, Microsoft SQL Server, Sybase, DB2, Informix, PostgreSQL

3.High range DB:

Can organize and manage terabytes and petabytes of information

Example:Teradata, Netezza, Greenplum, Hadoop.


Data storage Patterns:

There are two types of data storage patterns which are supported by relational DB

1.NFS-Normal File storage

2.DFS-Distributed File storage

NFS-Normal File storage:

1.Single disk for storing the data

2.Shared-everything architecture (data is shared on a single disk)

3.Data is read sequentially

4.All mid-range DBs are developed on the NFS platform

5.Limited scalability or expansion

6.Strongly recommended for OLTP applications

7.Recommended for data warehousing for small and medium-scale enterprises with storage
capacity in gigabytes

8.The default processor in NFS is only one

9.Disks can't be scaled in NFS


Example:Oracle, Sybase, SQL Server, DB2, Redbrick, Informix, PostgreSQL

Note:A processor is a S/W component that runs as an .exe

DFS-Distributed File Storage:

1.Multiple disks for storing the data

2.Shared-nothing architecture (every processor has dedicated memory & disk that is not shared
by another processor)

3.Data is read in parallel (supports parallelism)

4.Unlimited scalability

5.Designed only for building an Enterprise data warehouse, but not for OLTP

Example:Teradata, Netezza, Hadoop, Greenplum
Enterprise DWH database Evaluation:

1.Database that supports enormous storage capacity (billions of rows and terabytes of data)

2.DB that supports the distributed file storage pattern

3.DB that supports shared-nothing architecture

4.Database that supports unlimited scalability (expansion)

5.DB that supports massively parallel processing

6.DB that supports a mature optimizer to handle complex SQL queries (runs the queries
faster with less system resource usage)

7.DB that supports high availability (users can always access the data)

8.100% of the data is available without data loss even when S/W or H/W components are down

9.Database that supports parallel loading

10.DB that supports low TCO (total cost of ownership): easy to set up, administer & manage

11.Single DB server that can provide access to hundreds of users concurrently

Data Acquisition:

It is a process of extracting the data from multiple source systems, transforming the data into a
consistent format and loading it into a target system. To implement the ETL process we need ETL
tools

Types of ETL tools

Two types of ETL tools to build Data Acquisition

1.GUI based ETL tool

2.Code-based ETL tool

Code Based ETL:

ETL applications are developed using programming languages and utilities such as

SQL, PL/SQL, SAS, Teradata ETL utilities


GUI Based ETL:

ETL applications are developed using a simple graphical user interface with point & click features

Example:Informatica, DataStage, Ab Initio, SSIS

MSBI is a package (ETL + Reporting = SSIS + SSRS)

Data Cleansing:

It is a process of filtering or rejecting unwanted source data or records

Data Scrubbing: It is the process of Deriving new attributes or columns

Data Merging:

It is the process of combining the data from multiple source systems

Data merging is of two types (see the SQL sketch after this list)


1.Join

2.Union
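
A minimal SQL sketch of the two merging styles, purely for reference; the table names CUSTOMERS, ORDERS, CUSTOMERS_US and CUSTOMERS_EU are hypothetical:

-- Join: combine columns from two sources on a common key
SELECT c.CUST_ID, c.CUST_NAME, o.ORDER_ID, o.AMOUNT
FROM CUSTOMERS c
JOIN ORDERS o ON o.CUST_ID = c.CUST_ID;

-- Union: stack rows from two similar sources into one result set
SELECT CUST_ID, CUST_NAME FROM CUSTOMERS_US
UNION ALL
SELECT CUST_ID, CUST_NAME FROM CUSTOMERS_EU;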

Data warehouse:

1.A Data warehouse is a relational DB that is used to store historical data for query & analysis

2.Data in a Data warehouse is derived from source systems (OLTP/SOR)

SOR-->Source of records

OLTP: (Online Transactional Processing)

A computer system that stores time-sensitive, transaction-related data that is processed
immediately and is always kept current.

Difference Between OLTP And Data ware house

Tables in Data Warehouse:

There are two types of tables we have in Data Warehouse

1. Dimension Table

2. Fact Table
1. Dimension Table:

Stores textual or descriptive information about a business process

Dimension table examples in the Retail domain:

Customer, Product, Stores, Employees, Promotions, Time

Dimension table examples in the Banking domain:

Applications, Customers, Products, Branches, Promotions, Time, Billing Cycle Dimension

Fact Table:

A Fact table stores measurements or metrics of a business process

Fact table examples in the Retail domain:

Sales, Purchase, Inventory

Fact tables examples in Banking Domain

1. SA_LoanTransaction Fact

2. CC_Transaction Fact

3. CC_Statement Fact

A Fact table consists of keys and measures, and it has a composite primary key made up of the dimension keys.

Composite Primary Key
Store Key(X)   Prod Key(X)   Date Key(X)   Revenue
S1             P1            D1            3000
S1             P2            D1            2000
S2             P1            D1            2000
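
A minimal SQL sketch of such a fact table; SALES_FACT and its column names are hypothetical, and the composite primary key is the combination of the three dimension keys:

CREATE TABLE SALES_FACT
(
  STORE_KEY NUMBER(10),
  PROD_KEY  NUMBER(10),
  DATE_KEY  NUMBER(10),
  REVENUE   NUMBER(12,2),
  CONSTRAINT PK_SALES_FACT PRIMARY KEY (STORE_KEY, PROD_KEY, DATE_KEY)
);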

Types of Fact tables:

There are three types of fact tables

1. Factless Fact table

2. Cumulative Fact table

3. Snapshot Fact table

1. Factless Fact table:

1.A Factless Fact table consists of only keys and no measures

2.A Factless Fact table is used to record events

3.A Factless Fact table acts as a bridge between the dimension tables

Example of a Factless Fact table: Employee Attendance Factless Fact

Dimension Tables

Auditorium:   Aud Id, Aud Name, Aud Type, Aud Mgr, Aud Address
Sponsors:     Sponsor Id, Sponsor Name, Contribution, Address
Time:         Date Key, Month Key, Qtr, Year
Participant:  Participant Id, Participant Name, Gender, Address
Events:       Event Id, Event Name, Event Type, Event Desc

Fact Table
Aud Id   Sponsor Id   Participant Id   Event Id
A1       S1           P1               E1
A1       S1           P2               E1
A2       S1           P3               E1
2. Cumulative Fact table:

It consists of additive facts; it describes what happened over a period of time

Ex: Sales Fact table, Order Fact table

3. Snapshot Fact table:

It consists of semi-additive facts and non-additive facts; it describes the state of things at a
particular instant of time

Ex: Bank Fact table, Inventory Fact table

Degenerate Dimension Key:

A key in a Fact table that is not associated with any Dimension table

Example:Order Id, Sale Id, Bill No, Invoice No, etc

Types of Facts:

There are 3 types of Facts in Fact tables

1. Additive Facts

2.Semi Additive Facts

3. Non Additive Facts

1.Additive Fact: Business measurements in a fact table that can be summed up across all of
the dimension keys
Fact Table
Store Key   Prod Key   Date Key    Revenue
S1          P1         12-Jan-15   600
S1          P2         12-Jan-15   400
S2          P2         12-Jan-15   800
S2          P3         13-Jan-15   500
S3          P1         13-Jan-15   700
S3          P3         14-Jan-15   900

Reports generated using the keys in the above fact table:

Revenue Report By Store       Revenue Report By Product       Revenue Report By Date
Store Key   Revenue           Product Key   Revenue           Date Key      Revenue
S1          1000              P1            1300              12-Jan-15     1800
S2          1300              P2            1200              13-Jan-15     1200
S3          1600              P3            1400              14-Jan-15     900
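
The three reports above simply sum the additive measure across different dimension keys; a SQL sketch, assuming the hypothetical SALES_FACT table from the earlier example:

SELECT STORE_KEY, SUM(REVENUE) AS REVENUE FROM SALES_FACT GROUP BY STORE_KEY;
SELECT PROD_KEY,  SUM(REVENUE) AS REVENUE FROM SALES_FACT GROUP BY PROD_KEY;
SELECT DATE_KEY,  SUM(REVENUE) AS REVENUE FROM SALES_FACT GROUP BY DATE_KEY;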
Semi Additive Fact: Business measurements in a fact table that can be summed up across only a
few of the dimension keys

Bank Fact table:
Acct Id   Transaction Date   Balance   Profit Margin
21653     12-Jan-15          700000    -
21654     12-Jan-15          400000    -
21653     13-Jan-15          900000    -
21654     13-Jan-15          600000    -
Reports:

Balance By Acct Id (summing across dates is misleading; the latest balance is the meaningful figure)
Acct Id   SUM(Balance)   Latest Balance
21653     1600000        900000
21654     1000000        600000

Balance By Date
Date Key    Balance
12-Jan-15   1100000
13-Jan-15   1500000

The above example is for a semi-additive fact: Balance can be summed across accounts for a given date, but not across dates for a given account
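
A SQL sketch of the same point, assuming a hypothetical BANK_FACT table with ACCT_ID, TXN_DATE and BALANCE columns:

-- Valid: balance by date (summed across accounts)
SELECT TXN_DATE, SUM(BALANCE) AS TOTAL_BALANCE
FROM BANK_FACT
GROUP BY TXN_DATE;

-- For balance by account, take the latest snapshot rather than SUM across dates
SELECT ACCT_ID, BALANCE
FROM BANK_FACT b
WHERE TXN_DATE = (SELECT MAX(TXN_DATE) FROM BANK_FACT x WHERE x.ACCT_ID = b.ACCT_ID);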

3.Non Additive Fact: Business measurements in a fact table that cannot be summed up across
any dimension keys. Note: In a fact table, percentages are always non-additive

SEM1 80%
SEM2 60%
TOTAL 140% (Wrong)

Note: Another example of a Non Additive Fact is Unit Price

Types of Dimensions:

The following are the diff types of dimensions in DW

1. Conformed Dimension

2. Degenerate Dimension

3. Shrunken Dimension

4. Junk Dimension

5.Dirty Dimension


Conformed Dimension: A dimension that is shared across multiple fact tables is called a
conformed dimension, or a dimension that is used to join data marts
Banking Domain:

Degenerate Dimension:

If a fact table acts as a dimension and is shared with another fact table (or) maintains a foreign key
in another fact table, such a table is called a degenerate dimension.

Shrunken Dimension:

A dimension that is a subset of another dimension

Or

A dimension that is not directly linked to the Fact table


Junk Dimension:

A dimension that is organized based on low-cardinality indicator or flag values

Cardinality is the no of unique values in a column, or cardinality expresses the minimum and the
maximum no of instances of an entity ‘B’ that can be associated with an instance of entity ‘A’

The minimum and maximum no can be 0, 1 or “n”

Dirty Dimension:

If a record occurs more than once in a table, differing only in a non-key attribute, such a
table is called a dirty dimension

Example (a junk dimension built from the low-cardinality payment indicator columns in Orders):

Orders (source):
Order Id   Order Date   Payment Mode   Payment Mode Type   Comm/Non Comm   Amount
111        -            Cash           Cash                No              -
112        -            Cash           Cash                No              -
113        -            Credit         Master              No              -
114        -            Cash           Cash                No              -
115        -            Cash           Cash                No              -
116        -            Credit         Visa                Yes             -
117        -            Cash           Cash                No              -

Junk Dimension:
Ord Ind Id   Payment Mode   Payment Type   Comm/Non Comm
1            Cash           Cash           No
2            Credit         Master         No
3            Credit         Visa           Yes

Orders Fact:
Order Id   Order Date   Ord Ind Id   Amount
111        -            1            -
112        -            1            -
113        -            2            -
114        -            1            -
115        -            1            -
116        -            3            -
117        -            1            -

Slowly Changing Dimension:

A dimension that changes slowly and irregularly

Or

A dimension that changes across time

There are three choices to handle slowly changing dimensions

1.SCD Type-I

2.SCD Type-II

3.SCD Type-III

1. SCD Type-I:

Only the most recent change is maintained

Type-I keeps the current status only

Type-I is used for error correction

Source
CID   CNAME   DOB
11    BEN     12-JAN-1967
12    ALEN    15-FEB-1966

Target
CKEY   CID   CNAME   DOB
101    11    BEN     12-JAN-1967
102    12    ALEN    15-FEB-1966

SCD Type-II:

Each change is inserted as a new record

Type-II is used to maintain the historical status

PRODUCTS (source)
PID   PNAME   PRICE   EFF_DATE
11    ABC     300     12-JAN-10
12    PQR     270     15-JAN-10

The price of product 12 changed to 199 on 27-AUG-11.

Target
PKEY   PID   PNAME   PRICE   EFF_DATE    END_DATE
100    11    ABC     300     12-JAN-10
101    12    PQR     270     15-JAN-10   26-AUG-11
102    12    PQR     199     27-AUG-11

A Type-II dimension is also referred to as a dirty dimension

A Type-II dimension contains redundant data
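
A minimal SQL sketch of the Type-II handling for the price change above; PRODUCT_DIM is a hypothetical name for the target table, and in PowerCenter the same logic is built with Lookup, Router and Update Strategy transformations, as shown later in these notes:

-- Expire the current version of product 12
UPDATE PRODUCT_DIM
   SET END_DATE = TO_DATE('26-AUG-2011','DD-MON-YYYY')
 WHERE PID = 12
   AND END_DATE IS NULL;

-- Insert the new version with a new surrogate key
INSERT INTO PRODUCT_DIM (PKEY, PID, PNAME, PRICE, EFF_DATE, END_DATE)
VALUES (102, 12, 'PQR', 199, TO_DATE('27-AUG-2011','DD-MON-YYYY'), NULL);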

SCD Type-III: The change is appended as a new column

Type-III is used to maintain partial history

Source
CID   CNAME   LOC
11    BEN     HYD
12    TOM     CHE

Target
CKEY   CID   CNAME   CURR LOC   PREVLOC
101    11    BEN     HYD        -
102    12    TOM     CHE        -

Source (TOM moves from CHE to BNG)
CID   CNAME   LOC
11    BEN     HYD
12    TOM     BNG

Target
CKEY   CID   CNAME   CURR LOC   PREVLOC
101    11    BEN     HYD        -
102    12    TOM     BNG        CHE

Source (BEN moves from HYD to KER)
CID   CNAME   LOC
11    BEN     KER
12    TOM     BNG

Target
CKEY   CID   CNAME   CURR LOC   PREVLOC
101    11    BEN     KER        HYD
102    12    TOM     BNG        CHE
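
A minimal SQL sketch of the Type-III change above (CUST_DIM is a hypothetical name for the target table): the old value is shifted into the PREVLOC column and the new value overwrites CURR LOC:

UPDATE CUST_DIM
   SET PREVLOC = LOC,
       LOC     = 'KER'
 WHERE CID = 11;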

Role Play Dimension: A dimension that is recycled for multiple roles within the DB (for example, a Date dimension reused as Order Date and Ship Date)

Data Modeling:

Model: A business presentation of the structure of the data in one or more databases

OLTP: The ER model is used

The model is normalized

The model is efficient for transactions

Data warehouse: The Dimensional model is used


The model is designed based on Facts & Dimensions

The model is efficient in query processing

Schema: A schema is a collection of a user's objects, which can be a Table, View or Synonym

Types of Schema:

1.Star Schema

2.Snow Flake Schema

3.Galaxy Schema

1.Star Schema: In a star schema the centre of the star is the Fact table and the corners are Dimension
tables

A simple star schema consists of only one Fact table

Star schema dimensions do not have parent tables

Star schema dimensions are denormalized

A star schema is denormalized (everything in one table) and is efficient in query processing


2. Snow Flake Schema

Snowflake schema dimensions have one or more parent tables

A snowflake schema is normalized

A snowflake schema is efficient in transaction processing

Customer
Cid Cname Gender Geoid
11 C1 1 111
12 C2 1 111
13 C3 0 112
14 C4 1 111

Geography
Geoid City State Country Region
111 Hyd Ts India Asia
112 VSP Ap India Asia

Denormalized (star-schema) Customer table:
Cid   Cname   Gender   Geoid   City   State   Country   Region
11    C1      1        111     Hyd    Ts      India     Asia
12    C2      1        111     Hyd    Ts      India     Asia
13    C3      0        112     VSP    Ap      India     Asia
14    C4      1        111     Hyd    Ts      India     Asia

A star schema uses more space than a snowflake schema
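
For reference, the denormalized Customer table above is just the result of joining the snowflaked Customer and Geography tables; a SQL sketch using the tables shown above:

SELECT c.Cid, c.Cname, c.Gender, c.Geoid,
       g.City, g.State, g.Country, g.Region
FROM Customer c
JOIN Geography g ON g.Geoid = c.Geoid;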


Galaxy Schema:

Multiple Fact tables are connected to multiple Dimensions tables

Index: (Fast accessing path)

1.B*Tree Index

2.BitMap Index

1.B*Tree Index

It is used on High Cardinality columns

Example for B*Tree Index=EMPNO

2.BitMap Index

It is used on Low Cardinality columns

Example for Bit Map Index=GENDER
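
A sketch of how the two index types are created in Oracle SQL. EMP here is the sample table used elsewhere in these notes; since EMP has no GENDER column, DEPTNO is used as the low-cardinality example, and the bitmap index assumes an Oracle edition that supports it:

CREATE INDEX IDX_EMP_EMPNO ON EMP (EMPNO);           -- B*Tree index on a high-cardinality column
CREATE BITMAP INDEX IDX_EMP_DEPTNO ON EMP (DEPTNO);  -- Bitmap index on a low-cardinality column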


1.Flat File to Oracle:

Using SQLLDR to load the flat file data in to Oracle Table

STEP1:

Create Data file with few sample records

=============================================================================

STATE_ID,STATE_NAME,COUNTRY_ID

250.00,Rio Negro,111

251.00,Buenos Aires,111

252.00,Victoria,115

253.00,South Australia,115

254.00,Queensland,115

255.00,Northern Territory,115

256.00,New South Wales,115

257.00,Australian Capital Territory,115

258.00,Sao Paulo,110

259.00,Santa Catarina,110

260.00,Rio de Janeiro,110

=============================================================================

Save the file in the following directory

C:\SOURCE\States.txt

STEP2:

Create table in the oracle data base using the script given below
CREATE TABLE STATES(STATE_ID NUMBER(5,2),STATE_NAME VARCHAR2(25),COUNTRY_ID
NUMBER(3));

STEP3: Create control file using note pad file:

LOAD DATA

INFILE 'C:\SOURCE\States.txt'

APPEND INTO TABLE STATES

FIELDS TERMINATED BY "," (STATE_ID,STATE_NAME,COUNTRY_ID)

Save the control file in the following directory: C:\SOURCE\States.ctl

STEP4:

Connect to SQL*Plus and use the following command: SQL>HOST CMD

C:\oracle\product\10.2.0\db_1\BIN>SQLLDR scott/tiger@oracle direct=true skip=1


log=c:\SOURCE\States.log control=c:\SOURCE\States.ctl

C:\oracle\product\10.2.0\db_1\BIN>exit

STEP5:

SQL>SELECT * FROM STATES

Informatica Powercenter 9.5:

Informatica PowerCenter is a data integration tool from Informatica Corporation, which was founded in
1993 and is headquartered in Redwood City, California.

Informatica PowerCenter is a GUI-based data integration platform which can access the data from
various types of source systems and transform the data into a universal format

It is a client server based ETL product.

Informatica Products:

Informatica Power Center

Informatica Power Mart

Informatica Power Exchange


Informatica Power Analyzer

Informatica Data Quality

Master Data Management

Informatica B2B Integration

Informatica Information Life Cycle Management

Informatica Power Center Architecture:

When we install Informatica PowerCenter the following components are installed.

1.Power Center Clients:

1.Power Center Designer

2. Power Center Work flow Manager

3. Power Center Workflow Monitor

4. Power Center Repository Manager

2. Power Center Repository

3.Power Center Repository Services(PCRS)

4.Power Center Integration Services(PCIS)

5.Power Center Domain

6.Informatica Administrator(Web Client)

1.Power Center Clients:

1.Power Center Designer: It is a GUI-based client component which allows you to design ETL
applications known as Mappings.

A Mapping defines Extraction, Transformation and Loading

The following objects can be created from designer client


1.Source Definition(Meta Data)

2.Target Definition

3.Mappings with Business rules

2. Power Center Work flow Manager:

It is a GUI-based client component which allows you to create the following objects.

1.Session

2.Work flow

3.Schedulers

Session:

A Session is a task that executes a mapping.

A session is a set of instructions that tells the ETL server how to move data from sources to targets

Work flow:

A Work flow is a set of instructions that tells how to execute the session tasks

A Work flow is designed with two types of batch process

1.Sequential Batch Process

2.Concurrent or Parallel batch Process

1.Sequential Batch Process:

A sequential batch process is recommended when there is a dependency between data loads
2.Concurrent batch Process:

Work flow executes the session tasks all at once this is recommended when there is no dependency
between data loads.

Work flow is a top object in the object development hierarchy

Schedule:It is an Automation of executing the work flow

Power Center work flow Monitor:

a)It is a GUI based client component which allows you to monitor the execution of sessions and work
flows on ETL server

b)Can collect ETL statistics such as

No of records Extracted
No of records Rejected

No of recorded Loaded

Throughput in Power Center:

It defines the efficiency, i.e., the rate at which records are extracted per second and loaded per second

Throughput can also be expressed in bytes/sec

It can be used to evaluate the ETL server efficiency

Users can access session Log(Execution Log)

Development of ETL Objects:

Step1: Create Source Definition

Step2: Create Target Definition

Step3: Design Mapping (ETL application with or without Business rules)

Step4: Create Session for each Mapping

Step5: Design Work flow

Step6: Run Work flow

Step7: Monitor Workflow

Power Center Repository Manager:

It is a GUI based Administrative client which is used to perform the following tasks

a) Create ,Edit ,Delete Folders


b) Objects Back up and Restore
c) Assign users to access the folders with read ,write, execute permissions
Power Center Repository:

A repository is the brain of the ETL system; it stores ETL objects or Meta Data.

A relational DB is required to create a repository

The repository DB consists of system tables that store the ETL objects

Power Centre Repository Service[PCRS]:

A Power Center client component connects to the repository DB using repository service

A repository service is a set of processes that insert, update, delete and retrieve metadata from the
repository

Instance: Instance is a Image of Original Objects

Or

Instance is a Image of physical Objects

Note: Repository Service provides Design Level Environment

Power Center Integration Service(PCIS):

An Integration Service is an ETL server that performs Extraction, Transformation and Loading

It provides the run-time environment where ETL objects are executed. The Integration Service creates logs and
saves them in the repository database through the repository service.
Integration service consists of following server components

1.Reader
2.DTM

3.Writer

Reader: It connects to the source and extracts the data from tables, files, etc

Data Transformation Manager (DTM): It processes the data according to the business rules that you
configured in the mapping

Writer: It connects to the target system and loads the data into the tables (or) files

Note: The log created by the Integration Service is saved in the repository; that log can be accessed from the Workflow
Monitor

Power Center Domain:

1.The Informatica power center has the ability to scale the services and shared resources across multiple
machines

2.The power center domain is a primary unit for managing and administrating application
services(PCRS,PCIS)
3.Power Center Domain is a collection of one or more Nodes

4.A Node which hosts the Domain is known as the Primary Node or Master Gateway Node

5.If the Master Gateway Node fails, user requests can’t be processed

6.Hence it is recommended to configure more than one Node as a Master Gateway Node

7.If a worker Node fails the requests can be distributed to other Nodes [High Availability]

8.Each Node is created or Configured with application services

Informatica Administrator (Web Client):

1.It is an Administrative web client which is used to manage&administrate power center Domain

2.The following admin tasks can be performed with web client

a) creation of users,Groups

b) assign roles&permissions to the users or users groups

c) enable and disable existing nodes

d) Configuring existing nodes to increase the processing efficiency

f) adding or deleting Nodes


g) Creation of application services (PCRS, PCIS)

Pre Requisites of an ETL process:

STEP1: Set Up Source & Target Data Base

STEP2: Create ODBC connections for Sources & Target DB

Set Up Target Data Base:

Start--- >Programs---> Oracle---> Application Development---> SQL PLUS

Log on to Oracle with following details

Create User:

SQL>SHO USER

SQL>Create user BATCH7AM identified by TARGET;

Assign permission to User:

SQL>Grant DBA to BATCH7AM;

ETL Development process:



1.Creation of Source Definition:

A Source Definition is created using Source Analyzer tools

A Source Analyzer connects to the Source DB using ODBC connection

2.Creation Of Target Definition:

A Target Definition is created using Target Designer tool in the Designer Client Component

A Target Designer connects to the Target DB using ODBC connection

3.Create Mapping(with or with out Business Rule):

A Mapping is made up of following metadata components

a) Source(E)
b) Business Rule(T)

c) Target (L)

A Mapping without a business rule is known as a Flat Mapping

A Mapping is created using Mapping Designer Client component

4.Creation of Session:

1.A Session is a task that runs the mapping

2.It is created using Task Developer tool in Work flow Manager Client component

3.Every Session is configured with the following details

a) Source Connection

b) Target Connection

c) Load Type

Creation of Reader Connection(Oracle):

From the client Power Center Workflow Manager select the Connections menu, click on Relational, select
the type Oracle, click on New and enter the following details
Creation of Writer Connection(Oracle):

From the client Power center work flow Manager Select connections menu click on Relational select the
type Oracle click on New Enter the following details

Configuring the Session:

1.Double click the session select the mapping tab from left window select the source

2.From Left window select source and from connections section click on ( down arrow) to open
relational connection browser select connection ORACLE_SCOTT_DB

3.From Left window select target and from connections section click on(down arrow) to open relational
connection browser select connection ORACLE_BATCH7AM_DB click Ok

4.From properties section select target load type=Normal click apply and click Ok

5.From Repository menu Click on Save

Create Work Flow:

From Client Power Center Work flow Manager select Tools menu click on Work flow Designer

2.From work flow menu select create enter the Work flow Name w_s_flatmapping_oracle

3.From left window drag the session drop in Work flow Designer

4.From Tool Bar click on Link Task

5.Drag the link from start drop on session instance

6.From Repository menu click on Save


7.Run work flow

8.From Workflow menu click on Start Workflow

Creation of target tables Using Target Designer Tool:

1.Open the client Power Center Designer and from the Tools menu select Target Designer

2.From left window expand sources subfolder

3.Drop the source definition [EMP] to the target Designers work space

4.Double click the target definition click on Rename DIM_EMPLOYEES

5.Select columns tab from tool bar click on Cut to delete columns

6.From tool bar click on add a new column click apply click Ok

7.From target menu click on generate/Execute SQL

8.Click on connect and connect to the DB with the following Details

Select Create table& Click on Generate&Execute and click Ok then the SQL stores in a file ,file
name called MKTABLES.SQL
Transformations&Types of Transformations:

A transformation is a Power Center object which allows you to develop the business rules to process the
data in the desired business format.
Transformations are categorized in two types

1.Active transformation

2.Passive Transformation

1.Active transformation:

A transformation that can affect the no of rows (or) change the no of rows is known as an Active
transformation

The following are the list of active transformations used to process the data

1.Source Qualifier Transformation

2.Filter Transformation

3.Rank Transformation

4.Sorter Transformation

5.TransactionControl Transformation

6.Update Strategy Transformation

7.Normalizer Transformation

8.Aggregator Transformation

9.Joiner Transformation

10.Union Transformation

11 .Router Transformation

12.SQL Transformation

13.JAVA Transformation

14.Look Up Transformation (from version 9.0 onwards it acts as an Active transformation)


1.Passive transformation:

A transformation that doesn’t affect the no of rows (or) doesn’t change the no of rows is known as a
Passive transformation

The following are the list of passive transformations used to process the data

1.Look Up Transformation (up to Informatica 8.6 it acts as a Passive transformation)

2.Expression Transformation

3.SQL Transformation (it acts as a dual transformation)

4.Stored Procedure Transformation

5.Sequence Generator Transformation

6.XML Source Qualifier Transformation

Ports & Types of Ports:

A port represents column of the table (or) File

Every Transformation can have two basic types of Ports

1.Input Port(I)

2.Output Port(O)

Input Port(I): A port which can receive the data is known as an Input Port

Output Port(O): A port which can provide the data is known as an Output Port

Connected &Unconnected Transformations:

Connected Transformation:

A Transformation which is part of the mapping data flow direction is known as a connected
Transformation
2.It is connected to the source and connected to the Target

3. A connected transformation can receive multiple Input Ports and Can return Multiple Output Ports

Note: All active and passive transformations can be configured as connected transformation

Un Connected Transformation:

A Transformation which is not a part of the data flow direction, neither connected to the source nor
connected to the target, is known as an Unconnected Transformation

2.It can receive multiple input ports but it always returns a single output port.

3. The following transformations can be configured as Un Connected

Look Up Transformation

Stored Procedure Transformation

Filter Transformation:

1.It is an active transformation that can filter the records based on the given condition

2. The condition can be defined on single/multiple ports

3. The Integration Service evaluates the condition, which returns True/False

4. True indicates that the records are allowed for further processing (or) loading into the target

5. False indicates that the records are rejected from the filter transformation

6. Rejected records can’t be captured (they can’t even be identified in the session log)

7. The Filter transformation functions as the WHERE clause in SQL (see the SQL sketch after this list)

8. The filter transformation supports a single condition on one/more ports
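
As noted in point 7, a filter condition plays the role of a WHERE clause; a sketch of the SQL equivalent of a filter condition such as DEPTNO = 30 AND SAL > 2000 on the EMP source:

SELECT *
FROM EMP
WHERE DEPTNO = 30
  AND SAL > 2000;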

Limitations :

Allows you to define only a single condition

Rejected records can’t be captured


Performance Considerations:

1.Keep the filter transformation as close to the Source Qualifier transformation as possible to filter the
rows early in the data flow; as a result we can reduce the no of rows for further processing

2.Copy only the required ports from the source qualifier to the expression transformation

3.Consider the data concatenation rule while designing the mapping

Expression Transformation:

1.It is a passive transformation which allows you to calculate expressions for each row

2.It performs row-by-row processing

3.Expressions are developed using functions & arithmetic operators

4.An expression transformation is created with 3 types of ports

Input, Output, Variable

5.Expressions are developed either in Output (O) or Variable ports (V)

6.Variable ports are recommended to simplify complex expressions and to reuse expressions

Scenario1:

Calculate the tax for each employee who belongs to the sales department. If sal is greater than 5000
then calculate the tax as Sal*0.17, else calculate the tax as Sal*0.13

The sales department is identified by department identification no 30

Logic:
Expression transformation

SAL [I]

TAX [O]: IIF(SAL>5000, SAL*0.17, SAL*0.13)

LOAD_DATE [O]: SYSDATE

Scenario2:

Calculate the total salary for each employee based on Sal and Commission

Total sal=Sal+Comm

Comm May have Nulls

Logic:

Expression transformation

TotSal=IIF(ISNULL(COMM),SAL,SAL+COMM)

Scenario3:

Implement the LIKE operator using a filter transformation on the JOB column of the EMP table, where
‘SALESMAN’ is represented in 3 different formats (a sketch follows the list):

SALESMAN

SALES-MAN

PRE-SALES
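
A hedged sketch of the intent: in SQL all three formats can be matched with one LIKE pattern, as below; inside the filter transformation the same test is usually written with a string function, for example INSTR(JOB, 'SALES') > 0.

SELECT *
FROM EMP
WHERE JOB LIKE '%SALES%';   -- matches SALESMAN, SALES-MAN and PRE-SALES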

Variable Port:

A port which can store data temporarily is known as a variable port (V)

2.Variable ports are created to simplify complex expressions and reuse expressions in several Output
Ports

3.Variable ports are local to the transformation

4.They increase the efficiency of calculations


5.The default value for a numeric variable port is “0”

6.The default value for a variable port with data type string is “space”

7.Variable ports are not visible in the normal view of a transformation, only in the edit view

Router Transformation:

A Router Transformation is an active transformation which allows you to create multiple
conditions and pass the data to multiple targets

2.A router transformation is created with two types of the groups

1.Input Group

2.Output Group

Input Group:

Only Input Group can receive the data from source pipe line

Output Group:

Multiple Output Group categorized in to two types

1.User defined Output group

2.Default group

1.User defined Output group:

1.Each user defined output group has one condition

2.All Group conditions are evaluated for each row

3.One row can pass multiple conditions

Default Group

1.Always one default group

2.Captures the rows that fail all group conditions (rejected records)


Performance Considerations:

The router transformation has a performance advantage over multiple filter transformations because a row
is read once into the Input Group but evaluated multiple times based on the no of groups, whereas using
multiple filter transformations requires the same data to be duplicated for each filter transformation.
Source Qualifier Transformation:
1.An active transformation that can read the data from relational sources and flat files

2.Can define a SQL override when the source is a relational database

3.SQL Override: It is the process of changing or overriding the default SQL

4.User Defined SQL select statement

5.The following properties can be defined with source qualifier transformation

a) SQL Query(SELECT)

b) User defined Join(WHERE)

c)Source Filter(Modified WHERE)

d) No of sorted Ports(Order By)

f) Select Distinct(Distinct)

g) Pre&Post SQL Commands

SQL Query: Allows you to override the default SELECT query, for example as sketched below
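
A minimal sketch of an override on the EMP source, purely as an illustration (the default generated query selects all connected ports; the override below adds a join to DEPT and a filter):

SELECT EMP.EMPNO, EMP.ENAME, EMP.SAL, DEPT.DNAME
FROM EMP, DEPT
WHERE EMP.DEPTNO = DEPT.DEPTNO
  AND EMP.SAL > 1000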


User Defined Joins:

1.Joins separate sources using the WHERE clause; can join any no of source tables

2.Supports homogeneous relational tables only

3.Supports standard SQL joins like Inner Joins (or) Equi Joins

a.Inner Join

b.Left Outer Join

c.Right Outer Join

d. Full Outer Join

Source Qualifier-Advantages:

1.Can join any no of tables full functionality of standard SQL available

2.May reduce the volume of data on network(when we are writing where clause in SQL statement)

Source Qualifier-Dis Advantages:

1.Can only join homogeneous relational tables

2.Can effect the performance on the source database(when source database cache is less memory)

Difference Between Joiner and Source Qualifier Transformation

Joiner:

1.Can join only two sources

2.Supports heterogeneous relational tables

3.The Joiner uses its own cache (index and data cache)

4.Supports non-relational sources (Flat Files)

Source Qualifier:

1.Can join any no of tables

2. supports homogeneous relational tables

3. Source Qualifier uses Database Cache

4. Doesn’t support non relational sources


Aggregator Transformation:
1.Aggregator is an active and connected transformation

2.Use the Aggregator transformation to perform aggregate or group-level calculations

Ports In Aggregator:

Input Port:

It receives the data from source or other transformation

Output port:

To perform aggregate calculations to pass data to target or to the transformation

Variable Port:

To perform non aggregate calculation

Note: Aggregate functions is not allowed in Variable port

Group By Port: To specify the Groups

1.By default the Integration Service returns the last record from each group; if no group-by port is specified the
Integration Service considers the entire data as a single group and returns the last record

2.The Aggregator transformation supports both single and nested aggregate functions

Single: Min(sal)

Nested: Min(Avg(sal))

3.The Aggregator doesn’t support multilevel aggregate functions within a single Aggregator transformation

4.Another important use of the Aggregator is de-duplication

Aggregate Expression:

1.Aggrigate expressions are allowed in aggregator transformation only

2. Aggregate functions are always allowed in Output port

Examples for Aggregate functions:

Min,Max,Avg,First,Last,Sum,Count
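
The equivalent SQL, for reference: an Aggregator with DEPTNO as the group-by port and aggregate expressions in its output ports behaves like the query below on the EMP source:

SELECT DEPTNO,
       MIN(SAL) AS MIN_SAL,
       MAX(SAL) AS MAX_SAL,
       SUM(SAL) AS SUM_SAL,
       COUNT(*) AS EMP_COUNT
FROM EMP
GROUP BY DEPTNO;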
Sorted Aggregator:

An Aggregator transformation with the Sorted Input option enabled is called a sorted aggregator

Guidelines to implement a sorted aggregator

1. Perform sorting on all group-by ports before passing data to the aggregator

2. If multiple ports are selected for group by, perform sorting on all the ports in the same order

3. If Sorted Input is enabled and unsorted data is provided, the Integration Service fails the session

Performance Tuning:

To improve the performance of the aggregator, enable Sorted Input

Union Transformation:
This is an active transformation which combines similar data sources into a single result set or
data set

2. The Union transformation functions as the Union All set operator in SQL

3.The Union transformation is created with two types of groups

a) Input Group (multiple I/P groups)

b) Output Group (always one O/P group, which provides the result set)

Each I/P Group receives the data from a source pipeline

4.The Union transformation supports heterogeneous data sources, i.e., different databases

Note: Union eliminates duplicates

Union All allows duplicates

Union All is faster performance-wise
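
For reference, the transformation's behaviour corresponds to the UNION ALL below (duplicates are kept), assuming two hypothetical source tables CUST_US and CUST_EU with the same structure:

SELECT CID, CNAME, LOC FROM CUST_US
UNION ALL
SELECT CID, CNAME, LOC FROM CUST_EU;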


Sorter Transformation:
1. Sorter is Active and connected transformation

2. use sorter transformation to perform sorting in ascending or descending order

3. To perform sorting based on Case sensitivity

4. To perform distinct operations on records

5. Sorter transformation it contain I/P,O/P,Key ports

6. Use key port to specify based on which column sorting has to be performed

7.You can select one or more ports as key ports

8.If more than one port is selected as a key port, the Integration Service performs sorting on all the columns in
sequential order from top to bottom, in the order the ports appear in the sorter transformation

9.Within a single sorter transformation both ascending and descending sort order can be configured

10.If a sorter transformation is configured without a key port & the Distinct option, the Integration Service makes
the mapping invalid
11.If the Distinct option is selected then the sorter transformation eliminates duplicate records; hence the sorter is
called an active transformation

Sorter Cache:

1.The Integration Service creates a single cache to process the sorter transformation

2.When a session is started, the Integration Service caches the data before
performing the sort operation

3.The Integration Service performs the sort operation inside the cache memory based on the key column & sort order, and
returns the records out

4.If the cache memory is less than the memory space required to perform sorting, the Integration Service writes the
data to disk memory

5.The process of writing data to disk memory & swapping it between cache memory & disk is called
paging

6.Paging reduces the performance of the sort operation

7.To improve the performance of the sort operation, increase the cache memory.

Stored Procedure Transformation:

1.This is of type passive transformation that can import the stored procedure from database

2.A stored procedure transformation can configure in to two different categories

a) Connected stored procedure

b) Un connected stored procedure

Connected stored procedure:

1.It is connected to the source and connected to the target

2.Can receive multiple I/P ports and can return Multiple O/P ports
Un Connected stored procedure:

1.Neither connected to the source nor connected to the target

2.Can receive multiple I/P ports and can return a single O/P port

Uses:

1.Calculations per each row

2.Dropping and re-creating indexes

3.Calculating the space required for Loading

PL SQL Program:

CREATE OR REPLACE PROCEDURE STG_CALC_PROCS
(
  V_EMPNO IN  NUMBER,
  TOTSAL  OUT NUMBER,
  TAX     OUT NUMBER,
  HRA     OUT NUMBER
)
IS
BEGIN
  -- Derive total salary, tax and HRA for the given employee
  SELECT SAL + NVL(COMM, 0), SAL * 0.1, SAL * 0.4
    INTO TOTSAL, TAX, HRA
    FROM EMP
   WHERE EMPNO = V_EMPNO;
END;
/
Procedure for Implementing stored procedure:

Source: Table[emp]

Target:Table[empno,ename,sal,comm,totsal,tax,hra)

Mapping:M_STG_EMP_ST_PROC

1.Drag and drop the source def,target def in mapping designer work space

2.From transformation menu select create select transformation type stored procedure

Enter the name click on create connect to the DB with following details

ODBC Data source Name:

Username:

Owner Name:

Password:

Then next click on connect then select the procedure of name STG_CALC_PROCS from scott user click
on OK

3.From SQ_EMP connect port Empno to the Stored Procedure

EMPNO->V_EMPNO

4.From stored procedure connect the 3 output ports to the target

5.double click the stored procedure transformation select properties tab

6.From the SQ connect the remaining ports to the target

7.Save mapping,create session,create workflow,run work flow


Transaction Control Transformation:

1.TCL is an active and connected transformation

2.A set of rows that are bounded by a commit or rollback is called a transaction

3.Informatica power center the transactions can be controlled in two different ways

a)Mapping

b)Session

Configuring user defined commit:

User defined commits has to be specified at two levels

a) Maping Level

To configure user defined commit use TCL transformation in mapping

b) session lEVEL

To configure user defined commit at session level set commit type as userdefined

TCL Variable:

To configure user defined commit type TCL provides the following variable
1.TC_CONTINUE_TRANSACTION

2.TC_COMMIT_BEFORE

3.TC_COMMIT_AFTER

4.TC_ROLLBACK_BEFORE

5.TC_ROLLBACK_AFTER

Transaction control-Mapping:

If we want to control the transaction based on a given condition then create a transaction control
transformation in the mapping

This is known as a user-defined commit

Ex: IIF(SAL>8000,TC_COMMIT_AFTER,TC_ROLLBACK_AFTER)

Transaction Control-Session:

1.Transaction can be controlled based on no of rows

2.define commit interval property at session level

3.A commit interval is the no of rows that you want to use as a basis for commits

4.default commit interval is 10,000

5.Session can be configured with commit type

a) Source

b) Target

c) Userdefined

6.Default commit type is target


Rank Transformation:
This is an active transformation which allows you to calculate the “TOP” and “BOTTOM”
performers

The Rank Transformation is created with following types of ports

1.I/P port

2.O/P port

3.V/Port(Varible)

4.R/port(Rank)

Rank Port:

The port which participates in the rank calculation is known as the Rank port

Variable Port:

A port which allows you to develop an expression to store data temporarily for the rank calculation is known
as a variable port

Variable ports support writing expressions which are required for the rank calculation

Set the following properties to calculate the ranks

1.Top (or) Bottom = TOP

2.No of Ranks = 3

2.No of Ranks=3
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO

----- ---------- --------- ----- --------- ----- ----- ------

7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 30

7521 WARD SALESMAN 7698 22-FEB-81 1250 500 30

7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 30

7698 BLAKE MANAGER 7839 01-MAY-81 2850 30

7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 30

7900 JAMES CLERK 7698 03-DEC-81 950 30

EMPNO ENAME JOB SAL DEPTNO TAX

----- ---------- --------- ----- ------ -----

7698 BLAKE MANAGER 2850 30 484.5

7499 ALLEN SALESMAN 1600 30 272

7844 TURNER SALESMAN 1500 30 255

EMPNO ENAME JOB SAL DEPTNO TAX RANKINDEX

----- ---------- --------- ----- ------ ----- ---------

7698 BLAKE MANAGER 2850 30 484.5 1

7499 ALLEN SALESMAN 1600 30 272 2

7844 TURNER SALESMAN 1500 30 255 3


Mapplets&Types of Mapplets:
1.A Mapplet is a reusable object created with business rules using a set of transformations

2.A mapplet is created using mapplet designer tool in designer client component

There are two types of mapplets

a) Active Mapplet
b) Passive Mapplet
1.Active Mapplet:

A mapplet is created with at least one active transformation is known as active mapplet

2.Passive Mapplet:

A Mapplet which is created with all passive transformations is known as passive mapplet

Mapplet Limitations:

Keep the following instructions in the mind while creating the mapplet

1 If you want to create a stored procedure T/R you should create stored procedure with type normal

2.If you want to use a sequence generator T/R you should use reusable sequence generator T/R

3.The following T/R can’t be used

a) Normalizer T/R

b) XML Source qualifier T/R

c) PRE-POST Stored procedure T/R

d)Mapplet(Nested Mapplet cant be created)

Advantages:

1.Business rules can be reused across multiple mappings

2.Save Development time

Note:

Set of T/R are placed between Mapplet 1/P&Mapplet O/P

A Mapplet can also be created without Mapplet I/P T/R


Create a Mapplet without Mapplet I/P T/R:

Reusable Transformation:
A reusable Transformation is a reusable object created with business rules using Single T/R

There are two ways to create a reusable T/R

1.Using Transformation Developer Tool


2.Converting Non reusable T/R in to reusable T/R

Limitations:

Source Qualifier T/R can’t be created as a reusable Transformation

Transformation Developer Tool:

Open the client Power center Designer from tools menu select Transformation Developer,from
Transformation menu select Create and select the Transformation type Sequence Generator and enter
the name “DW_KEY” click on create and done

And next repository menu click on save

Converting Non reusable T/R into reusable T/R:

Select a Mapping in the Mapping Designer workspace, select the T/R which you want to convert to
reusable, double click the transformation and from the Transformation tab select

Make Reusable, next click on YES, click Apply and click Ok

Key Points:

When we drag a reusable transformation to the mapping designer workspace it is created as an
instance; you can modify the instance properties, and that does not reflect on the original Object

User Defined function(UDF):

1.It is a reusable object that extends power center built in functionality

2.It can be private or public

3.It is created using power center designer client

4. :UDF is an identifier that identifies User Defined function

Procedure for creating User Defined function:

From the left window select the User Defined Functions sub folder, from the Tools menu select “User Defined
Function”, click on “NEW”, provide the function name “TRIM”, select type “Public” and next click on
New Argument
And next click on Launch Editor

Write the function LTRIM(RTRIM(Arg1))

And from repository menu click on Save

Constraint Based Load Order:


Loads the data in an order which is based on the primary key and foreign key relationship

Business Purpose:

Loading data into snowflake dimensions which are related by primary key and foreign key relationships

Constraint Based Loading:


Design a mapping described above

Create a session and double click the session and Select Configure Object

Target Load Plan:


Define the order In which data Extracted from Source Qualifier
Single Mapping Two Pipe lines:

Procedure:

Design a Mapping with two pipelines as described above

Select the Mapping Menu Click on “Target Load Plan” and change the Load Order using Up and Down
arrows click ok and save mapping

Unconnected Look Up Transformation:


1. It is not a part of mapping data flow direction

2.It is Neither connected to the source nor connected to the target


3.It can receive the multiple I/P port but it returns single output port that should designated as return
port[R]

4. If the [R] port is not checked then the mapping is valid ,but the session created for that mapping will
fail at run time.

5.An unconnected look up is very commonly used when the look up is not needed for every input
record

6.Look up data is called at the point in the mapping that needs it

7.Look up function can be set with in any transformation that supports to write expressions(Expression
transformation)

8.Use a look up function within conditional statement(IIF())

9.The condition is evaluated for each record, but the look up function is only called if the condition
evaluates True

10.The un connected look up transformation is called following key expression

(:LKP.LookupName)

Business Purpose:

1.A source table or file may have a percentage of records with incomplete data. The holes in the data can
be filled by performing a look up to another table or tables.

2. As only a percentage of the rows are affected, it is better to perform the look up on only those rows
that need it and not the entire data set

SCD-Type1:
SCD Type1 is used to maintain current status of the data in Dimension table

2.SCD-I never maintains the historical data

Business Functionality:

1.Insert New records which are coming from Source

2.Update existing records that are coming with changes from the source

EXAMPLE DATA:
EMPNO ENAME JOB SAL

7369 SMITH CLERK 800

7499 ALLEN SALESMAN 1600

INITIAL LOAD

EMP_SURR_KEY EMPNO ENAME JOB SAL

1 7369 SMITH CLERK 800

2 7499 ALLEN SALESMAN 1600

SMITH IS PROMOTED FROM CLERK TO ANALYST

EMP_SURR_KEY EMPNO ENAME JOB SAL

1 7369 SMITH ANALYST 800

2 7499 ALLEN SALESMAN 1600

3.To verify the existence of records in the target

4.To assign a flag (INSERT/UPDATE) for each record

5.To route records to the INSERT/UPDATE flow

6.To generate the surrogate key

7.To change the row type to UPDATE

8.Target instance for INSERT

9.Target instance for UPDATE


Implementation of SCD-TYPE1:

1.create a target with name EMP_SCD1(EMP_SURR_KEY+EMP)

2.Create a Mapping with name M_EMP_SCD1

3.Drag & Drop EMP source definition,Two instances of target EMP_SCD1 into mapping designer
workspace

4.Rename the target instances as EMP_SCD1_INSERT, EMP_SCD1_UPDATE

5.Create a Look Up T/R on target EMP_SCD1 with name LKP_EMP_SCD1

6.Select EMPNO port from Source Qualifier and Connect to Look Up T/R

7.Double click on Look UP T/R go to condition tab and enter the below condition

EMPNO=EMPNO1

8.Go to ports tab and configure the ports as shown below

9.Click on apply click on Ok

10.Create an Expression T/R,Select EMP_SURR_KEY,JOB,SAL ports from Look Up T/R and connect to
Expression T/R

11.Select All ports from Source Qualifier and connect to Expression T/R

12.Double click on the Expression T/R, go to the ports tab, add two O/P ports INSERT, UPDATE and
enter the expressions as shown below.

INSERT = IIF(ISNULL(EMP_SURR_KEY),'TRUE','FALSE')

UPDATE = IIF(NOT ISNULL(EMP_SURR_KEY) AND (JOB != JOB1 OR SAL != SAL1),'TRUE','FALSE')


13.Click on Apply and click on Ok

14.Create a ROUTER T/R, select EMP_SURR_KEY, all ports coming from the source, and the INSERT, UPDATE


ports from the EXPRESSION T/R and connect them to the ROUTER T/R

15.Double click on the ROUTER T/R, go to the groups tab and add two groups as shown below

16.Click on Apply and click on OK

Implementing INSERT Flow:


1.Create a SEQUENCE GENERATOR T/R, connect the NEXTVAL port to EMP_SURR_KEY in EMP_SCD1_INSERT (target
instance)

2.Select EMPNO, ENAME --------DEPTNO from the INSERT group of the ROUTER and connect them to


EMP_SCD1_INSERT (target instance)

Implementing UPDATE Flow:


1.Create an UPDATE STRATEGY T/R, connect the EMP_SURR_KEY, JOB, SAL ports from the UPDATE
group of the ROUTER T/R to the UPDATE STRATEGY T/R

2.Double click on the UPDATE STRATEGY T/R, go to the properties tab and enter the update
strategy expression value as DD_UPDATE or 1

3.Connect the EMP_SURR_KEY, JOB, SAL ports from the UPDATE STRATEGY T/R to the


EMP_SCD1_UPDATE target instance

4.Save the mapping, create a session with name S_M_EMP_SCD1, create a workflow with name
W_S_M_EMP_SCD1, and execute the workflow

Note: Don’t Enable Truncate Target table option in session properties

SCD-TYPE-2:

SCD-Type2 method is used to maintain current data along with complete history in Dimension tables

Business Functionality:

1.Insert New records that are coming from source


2.Insert existing records that are coming from the source with a change

SOURCE
EMPNO   ENAME   JOB        DEPTNO
7369    SMITH   CLERK      20
7499    ALLEN   SALESMAN   30

TARGET
EMP_SURR_KEY   EMPNO   ENAME   JOB        DEPTNO   IND   START_DATE   END_DATE
1              7369    SMITH   CLERK      20       Y     17/12/1980
2              7499    ALLEN   SALESMAN   30       Y     20/02/1981

SMITH IS PROMOTED FROM CLERK TO ANALYST

EMP_SURR_KEY   EMPNO   ENAME   JOB        DEPTNO   IND   START_DATE   END_DATE
1              7369    SMITH   CLERK      20       N     17/12/1980   31/12/2014
2              7499    ALLEN   SALESMAN   30       Y     20/02/1981
3              7369    SMITH   ANALYST    20       Y     01/01/2015

Blue Print or Proto type of SCD-Type2:

3.To verify the existence of records in target

4.To collect rows from Look up to SQ

5.To route records to INSERT/UPDATE


6.To generate the Date & IND values

7.To generate the surrogate key

8.Target instance for INSERT

9.To generate the Date & IND values

10.To change the row type to UPDATE

11.Target instance for UPDATE

Steps to implement SCD-Type2:


1.Create a target EMP_SCD2 (EMP_SURR_KEY + EMP + STARTDATE, ENDDATE, IND)

2.Create a Mapping with name M_EMP_SCD2, drag & drop the EMP source definition and two instances of the
EMP_SCD2 target into the mapping designer workspace

3.Rename the target instances as EMP_SCD2_INSERT and EMP_SCD2_UPDATE

4.Create a Look up T/R on target EMP_SCD2

5. Select EMPNO port from SQ T/R and connect to Look UP T/R

6.Double click on Look Up T/R go to ports tab and configure the ports as shown below

7.Go to condition tab and add below condition

EMPNO=EMPNO1

8.Go to properties tab and enter below query for Look Up SQL override attribute

SELECT EMP_SCD2.EMP_SURR_KEY AS EMP_SURR_KEY,
       EMP_SCD2.JOB AS JOB,
       EMP_SCD2.SAL AS SAL,
       EMP_SCD2.EMPNO AS EMPNO
FROM EMP_SCD2
WHERE EMP_SCD2.ENDDATE IS NULL
   OR EMP_SCD2.IND = 'Y'

10.Create an Expression T/R, select all ports from the Look Up T/R and the Source Qualifier and connect them to the
Expression T/R

11.Create a ROUTER T/R, select all ports from the EXPRESSION T/R and connect them to the ROUTER T/R

12.Double click on the ROUTER T/R, add two groups INSERT and UPDATE, and enter the below expressions for the
INSERT and UPDATE groups

13.

INSERT = ISNULL(EMP_SURR_KEY)

OR

(JOB != JOB1)

OR

(SAL != SAL1)

UPDATE = NOT ISNULL(EMP_SURR_KEY)

AND

(JOB != JOB1 OR SAL != SAL1)

14.Click on Apply and click on OK

Implementing INSERT flow:

1.Create an Expression transformation, select EMPNO, ENAME ------DEPTNO from the INSERT group of the


router and connect them to the EXPRESSION transformation.

2.Double click on the Expression transformation, go to the ports tab and add two O/P ports STARTDATE, IND

3.Click on Apply and click on OK

4.Select all ports from the EXPRESSION T/R and connect them to the EMP_SCD2_INSERT target instance

5.Create a SEQUENCE GENERATOR T/R, connect the NEXTVAL port to the EMP_SURR_KEY column in the
EMP_SCD2_INSERT target instance

Implementing UPDATE flow:

1.Create an EXPRESSION T/R, select the EMP_SURR_KEY port from the UPDATE group of the ROUTER T/R
and connect it to the EXPRESSION T/R

2.Double click on the EXPRESSION T/R and add two output ports

3.Click on Apply and click on OK

4.Create an UPDATE STRATEGY T/R, select all ports from the EXPRESSION T/R and connect them to the
UPDATE STRATEGY T/R

5.Double click on the UPDATE STRATEGY T/R, go to the properties tab and enter DD_UPDATE or 1
as the update strategy expression, click on Apply and click on OK

6.Select all ports from the UPDATE STRATEGY T/R and connect them to the respective ports in the
EMP_SCD2_UPDATE target instance

7.Create a session and workflow, and run the workflow

Note: Don’t enable the Truncate Target Table option in session properties


Importing& Exporting repository objects:
Exporting Repository Objects:

1.Repository objects such as Mappings, Sessions etc can be exported into a metadata file format called
.XML

2.Use Repository Manager client to perform object exports

Procedure:

1.Open client Powercenter repository Manager

2.From Left window expand the folders

3.Expand Mappings,Select all the Mappings

4.From Repository Menu click on Export Object

5.Select the file directory enter the file name “Dev_20150512 and click on save

Import Repository Objects:

Repository Objects such as Work flows,Sessions,Mapping etc can be imported from metadata file called
.XML

Procedure:Create a New folder from left window select the new folder(Repository Manager Client)

1.From repository menu click on Import Objects

2.click on browse to select an XML file

3.Select XML file and click on OK

4.Click on Next

5.Click on Add All

6.Click on Next

7.Choose the destination folder and click on Next and again click on Next and click on Import

8.Click on Done

9.Select the folder and right click and click on refresh




Flat Files:

A data file with a .txt or .csv or .dat extension is called a Flat file

2. A flat file can be used as a Source and a Target

3.There are two types of flat files

a) Delimited flat files

b) Fixed-width flat files

Delimited Flat File:

1.In a delimited flat file each column is separated by a special character like
Comma(,), Pipe(|), Dollar($) etc

2.A combination of special characters can also be used as a delimiter

3.The most commonly used delimiter is the Comma

File List:

1.A file list is the method used for loading the data from multiple files to a single target by using a single
source definition

2.A file list is also called the indirect loading method.

3.A file list can be applied to files that have similar metadata
Mapping Parameter:

A Parameter represents a constant value which can be defined before mapping run.

1.A mapping parameter is a constant value, which means the value remains the same throughout the session

2.Mapping parameters are local to the mapping

3.Mapping parameters can be used within a SQL override

4.A mapping parameter is created with a name, data type, precision and scale

Advantages:

1.Mapping parameters are created to standardize the business rules and increase the flexibility of
development

2.Mappings can be reused for various constants

3.Mapping parameters are assigned constant values in a parameter file which is saved with an
extension of either .txt or .prm

Syntax of parameter file:

[FOLDER.WF:WORKFLOW.ST:SESSION]

$$Param1=const value
$$Param2=const value

$$Param3=const value

Procedure:

1.Source table column:EMP

2.Target Table column:STG_EMP_PARAM[Empno,Ename,Job,Sal,Tax,Deptno]

3.Create a mapping with name ‘M_Stg_Emp_Param’

4.Drop the source & target definations

5.Create a transformation type expression

6.From Mapping menu select parameters&variables

7.From tool bar click on add a new variable

Name Type Data type Prec Scale

$$DNO Parameter Integer 10 0

$$PERCENT Parameter decimal 6 2

8.Click ok

9.From Source Qualifier copy the required ports to the expression Empno,Ename,Job,Sal,Deptno

10.Double click the source qualifier and select the properties tab

Transformation Attribute: SQL Query
Value:
SELECT EMP.EMPNO, EMP.ENAME, EMP.JOB, EMP.SAL, EMP.DEPTNO
FROM EMP WHERE EMP.DEPTNO = $$DNO

11.Click apply & click ok

12.Double click the expression transformation select the ports tab

Portname Data type Prec Scale I O V

Tax decimal 7 2 yes


In the Expression Write the following derivation SAL*$$PERCENT

13.From exp connect the ports to the target

14.Save Mapping

Creation of Parameters file :

1.Open the notepad and type the following syntax

[BATCH7AM.WF:W_S_M_PARAM.ST:S_M_PARAM]

$$DNO=20

$$PERCENT=0.25

2. From the File menu, save the file as Test.txt at C:\Param\Test.txt

3.Create a session with name ‘S_M_PARAM’

4. Double click the session and select the Properties tab

Attribute Value

Parameter file name C:\Param\Test.txt

5. Select the Mapping tab and define the reader and writer connections

6.Click Apply and Click Ok

7.Save the session

8. Create a workflow with the name 'W_S_M_PARAM'

9.Run workflow

Advantage: mapping parameters can be used to perform incremental extraction from the source, for example:
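
A rough sketch (here $$LAST_EXTRACT_DATE is a hypothetical mapping parameter, and HIREDATE stands in for whatever date or audit column the source actually has): the SQL override extracts only rows newer than the last run, and after each successful run the parameter value in the parameter file is updated to the latest extracted date.

SELECT EMP.EMPNO, EMP.ENAME, EMP.JOB, EMP.SAL, EMP.DEPTNO
FROM EMP
WHERE EMP.HIREDATE > TO_DATE('$$LAST_EXTRACT_DATE', 'DD-MON-YYYY')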


SQL Transformation:
1. It is an active or passive, connected transformation

2. It is used to process SQL queries in the midstream of a pipeline; we can insert, update, delete and retrieve rows from the database at run time using the SQL transformation

3. The SQL transformation processes external SQL scripts or SQL queries created in the SQL editor

Script Mode (Passive):

The SQL transformation runs SQL scripts that are located externally. We pass a script name to the transformation with each input row, and the SQL transformation outputs one row for each input row.

Query Mode (Active):

The SQL transformation executes a query that we define in the query editor. We can pass strings or parameters to the query to define dynamic queries or change the selection parameters, and the transformation can output multiple rows when the query contains a SELECT statement, for example:
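
A minimal query-mode sketch (FIR, FIRNO, STATUS and DISTRICT are illustrative names): the value of the DISTRICT input port is bound into the query for each input row using the ?port? parameter-binding syntax.

SELECT FIRNO, STATUS
FROM FIR
WHERE DISTRICT = ?DISTRICT?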

Example (Script Mode):

Create SQL script and save it in notepad file

INSERT INTO FIR_AGG
SELECT Q.DISTRICT,
       COUNT(*) AS TOTALFIR,
       COUNT(ONLINEFIR) AS ONLINEFIR,
       COUNT(OFFLINEFIR) AS OFFLINEFIR
FROM (SELECT DISTRICT,
             CASE WHEN INSERT_DATE = UPDATE_DATE THEN 1 ELSE NULL END AS ONLINEFIR,
             -- assumption: an offline FIR is one whose UPDATE_DATE differs from its INSERT_DATE
             CASE WHEN INSERT_DATE <> UPDATE_DATE THEN 1 ELSE NULL END AS OFFLINEFIR
      FROM FIR) Q
GROUP BY Q.DISTRICT

Save as C:\SOURCE\FIR_SCRIPT.txt

And also create one more note pad text file and save it as C:\SOURCE\SCRIPT_ADD.txt
PC Designer:

Source Analyzer (SA): import C:\SOURCE\SCRIPT_ADD.TXT

Target Designer (TD): FIR_RES (flat file) with the ports

STATUS   STRING   10

MESSAGE  STRING   1000

Mapping Designer (MD): create mapping MAP_SQL

Create an SQL transformation S1, select Script Mode and click OK

Note: Edit the session, and in the Mapping tab assign C:\SOURCE\FIR_ADD.txt to SQ_FIR_ADD


Informatica SCD Type1:
Source: CUST

CID

NAME

DOB

Create table CUST(CID NUMBER,NAME VARCHAR2(12),DOB DATE);

Insert into CUST values(11,'BEN','12-JAN-67');

Insert into CUST values(12,'ALEX','15-JAN-62');

Insert into CUST values(13,'JOHN','13-FEB-85');

SELECT * FROM CUST;

CID NAME DOB

---- ------------ ---------

13 john 13-FEB-85

11 ben 12-JAN-67

12 alex 15-JAN-62

Target: CUST_TYP1

CKEY

CID

NAME

DOB
Mapping Designer: M_SCDTYPE1
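
The mapping details are not reproduced in this document; as a rough SQL sketch of the Type 1 logic (overwrite the existing customer row, insert new customers), assuming a hypothetical sequence CKEY_SEQ stands in for the Sequence Generator transformation that produces the surrogate key:

-- update-else-insert: existing customers are overwritten, new customers are added
MERGE INTO CUST_TYP1 T
USING CUST S
ON (T.CID = S.CID)
WHEN MATCHED THEN
  UPDATE SET T.NAME = S.NAME, T.DOB = S.DOB
WHEN NOT MATCHED THEN
  INSERT (CKEY, CID, NAME, DOB)
  VALUES (CKEY_SEQ.NEXTVAL, S.CID, S.NAME, S.DOB);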

Save Mapping and Create Session with Name S_M_SCDTYPE1

Workflow Designer: Create workflow with name Wf_S_M_SCDTYPE1

Step: Run Workflow

After that check the data in target table

SELECT * FROM CUST_TYP1;

CKEY CID NAME DOB

----- ---------- ------------ ---------

100 13 john 13-FEB-85

101 11 ben 12-JAN-67

102 12 alex 15-JAN-62


Update source table

UPDATE CUST SET DOB ='16-AUG-85' WHERE CID=13;

UPDATE CUST SET DOB='14-FEB-68' WHERE CID=11;

Start workflow once again and check data in target table

SELECT * FROM CUST_TYP1;

CKEY CID NAME DOB

---- ---------- ------------ ---------

100 13 john 16-AUG-85

101 11 ben 14-FEB-68

102 12 alex 15-JAN-62

Informatica SCD Type2:

Source:CUST

CID

NAME

LOC

Create table CUST(CID NUMBER,NAME VARCHAR2(20),LOC VARCHAR2(20));

Insert into CUST values(11,'BEN','CHE');

Insert into CUST values(12,'ALEN','MUM');

Insert into CUST values(13,'RAM','PUN');


SELECT * FROM CUST;

CID NAME LOC

---- -------------------- ---------

11 BEN CHE

12 ALEN MUM

13 RAM PUN

Target: CUST_TYPE2

DROP TABLE CUST_TYPE2;

CREATE TABLE CUST_TYPE2 (

CKEY number,

CID number,

NAME varchar2(20),

LOC varchar2(20),

FLAG Number

);

Mapping Designer: M_SCDTYPE2
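
The mapping itself is not reproduced here; as a rough SQL sketch of the Type 2 (flag) logic, assuming the change comparison is on LOC only and a hypothetical sequence CKEY_SEQ stands in for the Sequence Generator transformation:

-- expire the current version of customers whose LOC has changed
UPDATE CUST_TYPE2 T
SET T.FLAG = 0
WHERE T.FLAG = 1
  AND EXISTS (SELECT 1 FROM CUST S WHERE S.CID = T.CID AND S.LOC <> T.LOC);

-- insert a new current version for new or changed customers
INSERT INTO CUST_TYPE2 (CKEY, CID, NAME, LOC, FLAG)
SELECT CKEY_SEQ.NEXTVAL, S.CID, S.NAME, S.LOC, 1
FROM CUST S
WHERE NOT EXISTS (SELECT 1 FROM CUST_TYPE2 T WHERE T.CID = S.CID AND T.LOC = S.LOC);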


Create Session with Name S_M_SCDTYPE2 and create Workflow with name Wf_S_M_SCDTYPE2

Step: Start workflow

Check the data in target table

SELECT * FROM CUST_TYPE2;


CKEY CID NAME LOC FLAG

----- ---------- -------------------- ---------------------------------------

100 11 BEN CHE 1

101 12 ALEN MUM 1

102 13 RAM PUN 1


update cust set loc='BLG' where CID=11;

update cust set loc='CHE' where CID=12;

SELECT * FROM CUST;

CID NAME LOC

---- -------------------- ---------

11 BEN BLG

12 ALEN CHE

13 RAM PUN

After that Start Workflow Wf_S_M_SCDTYPE2

And check data in target table

Select * from CUST_TYPE2;

CKEY CID NAME LOC FLAG

----- ---------- -------------------- -------------------- -------------------

100 11 BEN CHE 0

101 12 ALEN MUM 0

102 13 RAM PUN 1

103 11 BEN BLG 1

104 12 ALEN CHE 1

Informatica SCD Type3:


Source: CUSTSRC

CID

NAME

LOC

Create table CUSTSRC(CID NUMBER,NAME VARCHAR2(20),LOC VARCHAR2(20));


Insert into CUSTSRC values(11,'BEN','CHE');

Insert into CUSTSRC values(12,'ALEN','MUM');

Insert into CUSTSRC values(13,'RAM','PUN');

SELECT * FROM CUSTSRC;

CID NAME LOC

---- -------------------- ---------

11 BEN CHE

12 ALEN MUM

13 RAM PUN

Target: CUST_TYPE3

DROP TABLE CUST_TYPE3;

CREATE TABLE CUST_TYPE3 (

CKEY number,

CID number(15),

NAME varchar2(20),

CLOC varchar2(20),

PLOC varchar2(10)

);
Mapping Designer: M_SCDTYPE3
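
The mapping is not reproduced here; as a rough SQL sketch of the Type 3 logic (keep only the current and the previous location), assuming a hypothetical sequence CKEY_SEQ stands in for the Sequence Generator transformation:

-- move the current location to PLOC and store the new location for changed customers
UPDATE CUST_TYPE3 T
SET (PLOC, CLOC) = (SELECT T.CLOC, S.LOC FROM CUSTSRC S WHERE S.CID = T.CID)
WHERE EXISTS (SELECT 1 FROM CUSTSRC S WHERE S.CID = T.CID AND S.LOC <> T.CLOC);

-- insert customers that do not exist in the target yet
INSERT INTO CUST_TYPE3 (CKEY, CID, NAME, CLOC)
SELECT CKEY_SEQ.NEXTVAL, S.CID, S.NAME, S.LOC
FROM CUSTSRC S
WHERE NOT EXISTS (SELECT 1 FROM CUST_TYPE3 T WHERE T.CID = S.CID);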

Create Session with Name S_M_SCDTYPE3 and create Workflow with name Wf_S_M_SCDTYPE3

Step: Start workflow

Check the data in target table

SELECT * FROM CUST_TYPE3;

CKEY CID NAME CLOC PLOC

----- ---------- -------------------- ------------------------------------

100 11 BEN CHE

101 12 ALEN MUM

102 13 RAM PUN

Next Update source table data

UPDATE CUSTSRC SET LOC='BNG' WHERE CID=11;

UPDATE CUSTSRC SET LOC='VJY' WHERE CID=12;


COMMIT;

SELECT * FROM CUSTSRC;

CID NAME LOC

---- -------------------- -----------------

11 BEN BNG

12 ALEN VJY

13 RAM PUN

Next check the data in target table

SELECT * FROM CUST_TYPE3;

CKEY CID NAME CLOC PLOC

----- ---------- -------------------- -------------------- -------

100 11 BEN BNG CHE

101 12 ALEN VJY MUM

102 13 RAM PUN

Informatica SCD Type2(Version):


Source: CUST

CID

NAME

LOC

Create table CUST(CID NUMBER,NAME VARCHAR2(20),LOC VARCHAR2(20));

Insert into CUST values(11,'BEN','CHE');

Insert into CUST values(12,'ALEN','MUM');

Insert into CUST values(13,'RAM','PUN');


SELECT * FROM CUST;

CID NAME LOC

---- --------------------- ---------

11 BEN CHE

12 ALEN MUM

13 RAM PUN

Target: cust_type2_vrsn

CREATE TABLE CUST_TYPE2_VRSN (

CKEY number NOT NULL,

CID number,

NAME varchar2(20),

LOC varchar2(20),

VERSION number

);

Generate and execute

Mapping Designer: create Mapping with name M_SCDTYPE2_VERSION
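
The mapping is not reproduced here; as a rough SQL sketch of the Type 2 (version) logic, assuming the change comparison is on LOC only and a hypothetical sequence CKEY_SEQ stands in for the Sequence Generator transformation:

-- insert a new version row whenever the incoming LOC differs from every version stored so far
INSERT INTO CUST_TYPE2_VRSN (CKEY, CID, NAME, LOC, VERSION)
SELECT CKEY_SEQ.NEXTVAL, S.CID, S.NAME, S.LOC,
       NVL((SELECT MAX(V.VERSION) FROM CUST_TYPE2_VRSN V WHERE V.CID = S.CID), 0) + 1
FROM CUST S
WHERE NOT EXISTS (SELECT 1 FROM CUST_TYPE2_VRSN V WHERE V.CID = S.CID AND V.LOC = S.LOC);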


Create a session with the name S_M_SCDTYPE2_VERSION and assign the source, target and lookup connections

Next create workflow with name Wf_S_M_SCDTYPE2_VERSION

Before start work flow check data in source table and target table

SELECT * FROM CUST;

CID NAME LOC

------ -------------------- ----------

11 BEN CHE

12 ALEN MUM

13 RAM PUN

SELECT * FROM CUST_TYPE2_VRSN

No rows selected

Next Start workflow and check data in target table


SELECT * FROM CUST_TYPE2_VRSN;

CKEY CID NAME LOC VERSION

---- ---------- -------------------- -------------------- --------------------

100 11 BEN CHE 1

101 12 ALEN MUM 1

102 13 RAM PUN 1

UPDATE CUST SET LOC='BNG' WHERE CID=11;

UPDATE CUST SET LOC='CHE' WHERE CID=12;

Commit;

Start Workflow again and check the data in target table

SELECT * FROM CUST_TYPE2_VRSN;

CKEY CID NAME LOC VERSION

---- ---------- -------------------- -------------------- -------------------------

100 11 BEN CHE 1

101 12 ALEN MUM 1

102 13 RAM PUN 1

103 11 BEN BNG 2

104 12 ALEN CHE 2

Project Development Life Cycle:


Kick-off Meeting 1

Kick-off Meeting 2

Analysis Phase

Design Phase

Coding Phase

Reviews

Testing Phase

Go Live Phase

Support

1.Analysis Phase:

Business Analyst:

Gathers the business requirements into a BRS (Business Requirement Specification), which covers the business process, organization structure, target user requirement details and the source systems

Based on the BRS, the senior team provides the hardware and software requirements

Outcome:

SRS (System Requirement Specification) consists of the following details

1. Operating system to be used

2. Database to be used

3. ETL, OLAP and modeling tools to be used

2.Design Phase:

The Data Warehouse Architect / ETL Architect provides the solution to build the data warehouse or data marts based on the business requirements (business process, organization structure, target user requirement details, source systems)

Outcome:

HLD (High Level Design Document) consists of the following Details

1.Summary Information

2.Project Architecture

3.System Architecture

4.Source
5.DB

6.ETL tool Details

7.Data Flow Diagram

8.Data Model

9.Source Object Details

10.Target Object Details

11.Staging Object Details

12.Mapping Details

Senior Technical Team:

Provides detailed technical specifications for each mapping

Outcome: LLD (Low Level Design Document). It consists of source and target object details (field names, data types, lengths, descriptions), the entire mapping flow, a detailed technical design for each mapping, a block diagram, the business logic, pre and post dependencies, schedule options and error handling

ETL Team:

Mapping Design Document is prepared for each Mapping

Outcome: Mapping Design Document

3.Coding Phase:

Mappings are created based on the design document

Code Review: checks the business logic and whether naming standards are followed or not

Peer Review:

A team member reviews the same points mentioned above; if everything is OK, the code moves on to testing
4.Testing Phase:

1. Unit testing: mappings are tested by individual developers using the Debugger, or by enabling test load to test the mapping with limited test data

2. SIT (System Integration testing): Mappings are tested according to their dependencies

3. UAT (User acceptance testing): Mappings are tested in the presence of onsite users

5. Production (Go Live) Phase:

Jobs are scheduled and monitored using scheduling tools such as UC4, DAC, Autosys, Control-M and Tivoli Workload Scheduler

Project Architecture:

Tell Me About Your Self:

Hi, good morning.

My name is Bhaskar Reddy Allam.

Coming to my education details, I completed my MCA from a college affiliated to JNTU University, Hyderabad.

Coming to my professional summary:

I started my career at Infinite Computer Solutions, which I joined as a fresher. I was trained there on DataStage and Informatica and was then mapped to a project.

My first project was AP GBS DATA MART, where we developed data marts from a global data warehouse. After signing off from that project, I was mapped to another project in the same company. In my second project we worked for Prudential Insurance, where we developed a data warehouse for their proposed system. I worked as an Associate Software Engineer at IBM from June 2014 to October 2016.

Later I was selected by IBM India Private Limited, Pune. At IBM I am working for the client GE, for whom we developed a data warehouse for their internal billing system. A few days back I was relieved from that project and they are trying to map me to another project. I have been working at IBM from June 2014 till date.

In my six years of experience I have gradually grown as a developer.

In these six years I have been involved in many activities such as developing mappings, performance tuning and data profiling.

During this time I have gained hands-on experience with Informatica, DataStage, Information Analyzer, QualityStage, Trillium Discovery and Oracle, along with some knowledge of PL/SQL and the UNIX environment.

Documented By

Bhaskar Reddy Allam

Mail:abreddy2003@gmail.com

Mobile:9948047694
