
I am giving a generic explanation of the project. Any project, whether banking, sales, retail, e-commerce or insurance, can use this explanation.
First you have to start with:
1) First explain the objective of the project and the client's expectations.
2) Then explain where your involvement and responsibility start, the scope of your job, and its limitations.
Add some points from the Project Architecture post, such as the offshore and onsite model, team structure, etc.
The main objective of this project is to provide a system with all the information regarding sales / transactions (sales if it is the sales domain, transactions if it is the banking or insurance domain) of the entire organization all over the country, US / UK (based on the client location).
We get the daily transaction data from all branches at the end of the day. We have to validate the transactions and implement the business logic based on the transaction type or transaction code.
We have to load all the historical data into the DWH, and once the historical load is finished, we have to load the delta loads.
A delta load means the last 24 hours of transactions captured from the source system; in other words, you can call it Change Data Capture (CDC).
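A minimal SQL sketch of such a delta extract, assuming an illustrative source table TRANSACTIONS with a LOAD_TIMESTAMP audit column (both names are assumptions, not the actual project objects):

-- Pull only the transactions captured in the last 24 hours (Oracle syntax)
SELECT txn_id,
       txn_code,
       account_id,
       txn_amount,
       load_timestamp
FROM   transactions
WHERE  load_timestamp >= SYSDATE - 1;  -- SYSDATE - 1 = exactly 24 hours ago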
These delta loads are scheduled on a daily basis. Pick some points from the "What is Target Staging Area" post (source-to-staging mappings, staging to warehouse) based on your comfort level.
Each transaction contains a transaction code. Based on the transaction code you can identify whether that transaction belongs to sales or purchase / car insurance or health insurance / deposit, loan or payment (you have to change the words based on the project), and based on that code the business logic changes. We validate the transactions, calculate the measures and load them to the database.
One mapping explanation: In the Informatica mapping, we first look up all the transaction codes against the code master table to identify the transaction type, implement the correct logic, and filter out unnecessary transactions. In an organization there will be a lot of transactions, but you have to consider only the transactions required for your project: only the transactions whose code exists in the code master table are considered. Other transactions are loaded into a table called the wrap table, and invalid records (transaction code missing, null, or spaces) go to the error table. For each dimension table we create a surrogate key and load the data into the DWH tables.
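The same routing logic can be sketched in SQL, assuming hypothetical tables DAILY_TXN, CODE_MASTER, WRAP_TABLE and ERROR_TABLE (in the real mapping this is done with Lookup, Router and Filter transformations, not SQL):

-- Invalid records (transaction code missing, null or spaces) go to the error table
INSERT INTO error_table
SELECT * FROM daily_txn
WHERE  txn_code IS NULL OR TRIM(txn_code) IS NULL;

-- Codes not found in the code master go to the wrap table
INSERT INTO wrap_table
SELECT t.* FROM daily_txn t
WHERE  t.txn_code IS NOT NULL
AND    NOT EXISTS (SELECT 1 FROM code_master c WHERE c.txn_code = t.txn_code);

-- Valid transactions continue on to the warehouse load with their type attached
SELECT t.*, c.txn_type
FROM   daily_txn t
JOIN   code_master c ON c.txn_code = t.txn_code;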
SCD2 Mapping:
We implement an SCD2 mapping for the customer dimension or account dimension to keep the history of customers or accounts. We use the SCD2 date method. Before talking about this, you should understand the SCD2 date method clearly; be careful with it.
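As a rough sketch of the SCD2 date method, assuming an illustrative CUSTOMER_DIM table with EFF_START_DATE / EFF_END_DATE columns and a surrogate-key sequence (all names are assumptions), the update-then-insert pattern looks like this:

-- Close the current version of a customer whose attributes have changed
UPDATE customer_dim
SET    eff_end_date = SYSDATE
WHERE  customer_id  = :customer_id
AND    eff_end_date = TO_DATE('9999-12-31', 'YYYY-MM-DD');

-- Insert the new version as the open (current) record
INSERT INTO customer_dim
       (customer_key, customer_id, customer_name, address,
        eff_start_date, eff_end_date)
VALUES (customer_dim_seq.NEXTVAL, :customer_id, :customer_name, :address,
        SYSDATE, TO_DATE('9999-12-31', 'YYYY-MM-DD'));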
Role and Responsibilities
Pick from the Project Architecture post and tell it according to your comfort level. We are responsible only for development and testing. For scheduling we use third-party tools (Control-M, AutoSys, Job Tracker, Tivoli, etc.): we simply give the dependencies between each mapping and the run time, and based on that information the scheduling tool team schedules the mappings. We do not schedule in Informatica. That's it, finished.
Please reply if you require more explanation regarding any point.

1. Conformed Dimension:
A dimension that is used by more than one fact table is called a conformed dimension.
Ex: a Product dimension related to both the Order fact and the Sales fact.
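A small illustration (table and column names are made up): the same PRODUCT_DIM surrogate key joins to both fact tables.

-- Same product dimension reused by two different fact tables
SELECT p.product_name, SUM(o.order_qty) AS total_ordered
FROM   order_fact o JOIN product_dim p ON p.product_key = o.product_key
GROUP BY p.product_name;

SELECT p.product_name, SUM(s.sales_amount) AS total_sales
FROM   sales_fact s JOIN product_dim p ON p.product_key = s.product_key
GROUP BY p.product_name;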
2. Junk Dimension:
A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension.
A good example would be a trade fact in a company that brokers equity trades. The fact would contain several metrics (principal amount, net amount, price per share, commission, margin amount, etc.) and would be related to several dimensions such as account, date, rep, office, exchange, etc.; the small leftover flags and indicators on the trade can then be grouped into a single junk dimension.
3. Degenerate Dimension:
In a data warehouse, a degenerate dimension is a dimension that is derived from the fact table and does not have its own dimension table.
Ex: a line number in a fact table.
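An illustrative DDL sketch (names assumed): the invoice number and line number sit directly in the fact table, with no dimension table of their own.

CREATE TABLE sales_fact (
    date_key        NUMBER        NOT NULL,
    product_key     NUMBER        NOT NULL,
    customer_key    NUMBER        NOT NULL,
    invoice_number  VARCHAR2(20),            -- degenerate dimension
    line_number     NUMBER,                  -- degenerate dimension
    sales_amount    NUMBER(12,2)
);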
4. Slowly Changing Dimensions:
A Slowly Changing Dimension (SCD) is a dimension whose attributes change over time, usually slowly and irregularly rather than on a regular schedule.
Ex: nothing but inserts and updates.
A dimension table typically has two types of columns: primary keys referenced by fact tables and textual/descriptive data.
Eg: Time, Customer
Types of Dimensions:
Slowly Changing Dimensions
Rapidly Changing Dimensions
Junk Dimensions
Inferred Dimensions
Conformed Dimensions
Degenerate Dimensions
Role Playing Dimensions
Shrunken Dimensions
Static Dimensions
Slowly Changing Dimensions:
Attributes of a dimension that undergo changes over time. It depends on the business requirement whether the history of changes to a particular attribute should be preserved in the data warehouse. Such an attribute is called a slowly changing attribute, and a dimension containing such an attribute is called a slowly changing dimension.
Rapidly Changing Dimensions:
A dimension attribute that changes frequently is a rapidly changing attribute. If you don't
need to track the changes, the rapidly changing attribute is no problem, but if you do need
to track the changes, using a standard slowly changing dimension technique can result in
a huge inflation of the size of the dimension. One solution is to move the attribute to its
own dimension, with a separate foreign key in the fact table. This new dimension is
called a rapidly changing dimension.
Junk Dimensions:
A junk dimension is a single table with a combination of different and unrelated attributes
to avoid having a large number of foreign keys in the fact table. Junk dimensions are
often created to manage the foreign keys created by rapidly changing dimensions.
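A hedged sketch of what such a table might look like (columns are purely illustrative):

-- One junk dimension holding the valid combinations of small, unrelated flags
CREATE TABLE order_junk_dim (
    junk_key        NUMBER PRIMARY KEY,
    payment_type    VARCHAR2(10),   -- e.g. CASH / CARD
    gift_wrap_flag  CHAR(1),        -- Y / N
    rush_order_flag CHAR(1)         -- Y / N
);
-- The fact table then carries a single junk_key instead of three tiny foreign keys.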
Inferred Dimensions:
While loading fact records, a dimension record may not yet be ready. One solution is to
generate a surrogate key with null for all the other attributes. This should technically be
called an inferred member, but is often called an inferred dimension.
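For example (table and column names assumed), when a fact row arrives for a customer the dimension has not seen yet, a placeholder row can be inserted with only the keys populated:

-- Create an inferred member for a late-arriving customer
INSERT INTO customer_dim
       (customer_key, customer_id, customer_name, address, inferred_flag)
VALUES (customer_dim_seq.NEXTVAL, :customer_id, NULL, NULL, 'Y');
-- When the full customer record arrives later, this row is updated in place.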
Conformed Dimensions:
A dimension that is used in multiple locations is called a conformed dimension. A conformed dimension may be used with multiple fact tables in a single database, or across multiple data marts or data warehouses.
Degenerate Dimensions:
A degenerate dimension is when the dimension attribute is stored as part of the fact table, and
not in a separate dimension table. These are essentially dimension keys for which there
are no other attributes. In a data warehouse, these are often used as the result of a drill
through query to analyze the source of an aggregated number in a report. You can use
these values to trace back to transactions in the OLTP system.
Role Playing Dimensions:
A role-playing dimension is one where the same dimension key along with its
associated attributes can be joined to more than one foreign key in the fact table. For
example, a fact table may include foreign keys for both ship date and delivery date. But
the same date dimension attributes apply to each foreign key, so you can join the same
dimension table to both foreign keys. Here the date dimension is taking multiple roles to
map ship date as well as delivery date, and hence the name of role playing dimension.
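A quick sketch of the ship-date / delivery-date example (names are illustrative): the one physical DATE_DIM table is joined twice, once per role, using aliases.

SELECT ship_dt.calendar_date      AS ship_date,
       delivery_dt.calendar_date  AS delivery_date,
       f.order_amount
FROM   order_fact f
JOIN   date_dim ship_dt      ON ship_dt.date_key     = f.ship_date_key
JOIN   date_dim delivery_dt  ON delivery_dt.date_key = f.delivery_date_key;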
Shrunken Dimensions:
A shrunken dimension is a subset of another dimension. For example, the orders fact table may include a foreign key for product, but the target fact table may include a foreign key only for product category, which is in the product table but much less granular. Creating a smaller dimension table, with product category as its primary key, is one way of dealing with this situation of heterogeneous grain. If the product dimension is snowflaked, there is probably already a separate table for product category, which can serve as the shrunken dimension.
Static Dimensions:
Static dimensions are not extracted from the original data source, but are created within the context of the data warehouse. A static dimension can be loaded manually, for example with status codes, or it can be generated by a procedure, such as a date or time dimension.
A complex mapping generally will have the following characteristics:
Difficult requirements
A large number of transformations
Difficult business logic
May require a combination of two or more methods
Complex business logic
More than 30 unconnected lookups
Star Schema: It has a single fact table connected to dimension tables like a star. In a star schema, only one join establishes the relationship between the fact table and any one of the dimension tables. A star schema has one fact table associated with numerous dimension tables and depicts a star.

Snowflake Schema: It is an extension of the star schema. In a snowflake schema, very large dimension tables are normalized into multiple tables. It is used when a dimension table becomes very big. In a snowflake schema, since there are relationships between the dimension tables, many joins have to be performed to fetch the data. Every dimension table is associated with a sub-dimension table.
The main difference between the star schema and the snowflake schema is that
the star schema is highly denormalized and the snowflake schema is normalized. So data access latency is lower in a star schema than in a snowflake schema. As the star schema is denormalized, the size of the data warehouse will be larger than that of a snowflake schema.
Performance-wise, the star schema is good, but if memory utilization is a major concern, then the snowflake schema is better than the star schema.
A dimension table will not have a parent table in a star schema, whereas snowflake schemas have one or more parent tables.
The dimension table itself contains the hierarchies of the dimension in a star schema, whereas the hierarchies are split into different tables in a snowflake schema. Drilling down the data from the topmost hierarchy to the lowermost hierarchy can be done.
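A minimal DDL contrast, with assumed table names, showing the same product/category attributes modeled both ways:

-- Star schema: category is denormalized into the product dimension
CREATE TABLE product_dim (
    product_key    NUMBER PRIMARY KEY,
    product_name   VARCHAR2(100),
    category_name  VARCHAR2(100)
);

-- Snowflake schema: the category is normalized out into a sub-dimension table
CREATE TABLE category_dim (
    category_key   NUMBER PRIMARY KEY,
    category_name  VARCHAR2(100)
);
CREATE TABLE product_dim_sf (
    product_key    NUMBER PRIMARY KEY,
    product_name   VARCHAR2(100),
    category_key   NUMBER REFERENCES category_dim (category_key)
);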
How do you load the last 10 records of a flat file into the target?
Use Sequence Generator and Rank transformations: create the rank on the sequence-number port and select Bottom 10 in the Rank properties.
How will you display "Mr." for male and "Mrs." for female in the target table?
First method:
select decode(column_name, 'male', 'Mr.', 'female', 'Mrs.') from table_name;
Second method:
Use an Expression transformation. Add a new output port with the condition
IIF(fieldname = 'male', 'Mr.', 'Mrs.')
and then add another port that concatenates this field with the first name: CONCAT(newfield, first_name).

Which will perform better, IIF or DECODE?
DECODE is much faster than IF-ELSE because DECODE already has all the values of the column we want to decode built in, whereas in an IF-ELSE statement we need to specify each condition explicitly.
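For reference, the same rule written both ways in SQL (the customers table and gender column are assumptions for illustration):

-- DECODE form
SELECT DECODE(gender, 'male', 'Mr.', 'female', 'Mrs.') AS title FROM customers;

-- Equivalent IF/ELSE-style form using CASE
SELECT CASE WHEN gender = 'male' THEN 'Mr.' ELSE 'Mrs.' END AS title FROM customers;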
I want to load data into two targets; one is a dimension table and the other is a fact table. How can I load the data at the same time?
Generally we all know that in a data warehouse environment we should load data into the dimension table first and then load the fact table, because the fact table contains the primary keys of the dimension tables along with the measures.
So we first need to check whether the fact table we are going to load has a foreign key relationship with the dimension table or not.
If yes, use a pipeline mapping: load the dimension data in the first pipeline, and in the second pipeline load the fact table data by taking a Lookup transformation on the dimension table (which already has the loaded data), return the key value from the lookup, calculate the measures using an Aggregator with a group by on the dimension keys, and map to the target (fact) ports as required.
Most importantly, specify the Target Load Plan with the dimension target first and the fact table target second.
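A SQL sketch of the second pipeline's logic, with assumed table names (STAGING_TXN, CUSTOMER_DIM, SALES_FACT); in Informatica the same thing is done with a Lookup and an Aggregator:

-- Fact load: look up the already-loaded dimension keys, then aggregate the measures
INSERT INTO sales_fact (customer_key, date_key, total_amount)
SELECT d.customer_key,
       t.date_key,
       SUM(t.txn_amount) AS total_amount
FROM   staging_txn  t
JOIN   customer_dim d ON d.customer_id = t.customer_id   -- lookup on the natural key
GROUP BY d.customer_key, t.date_key;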
Explain different types of modeling.
Modeling is defined as converting the requirements of the business users into technical structures.
1. Conceptual modeling
2. Logical modeling
3. Physical modeling
Example modeling tool: ERwin
If one flat file contains n records and we have to load records 51 to 100 into the target, how do we use expressions in Informatica?
Use a Sequence Generator to get a row number for each record, then use a Filter with the condition (row number greater than 50 and less than or equal to 100).

How will you get the 1st, 3rd and 5th records in a table? What is the query in Oracle?
select *
from (select sal,
             emp_id,
             row_number() over (order by sal) row_num
      from emp) ref
where row_num in (1, 3, 5);


What is unit testing & how it is done?
Unit testing can be broadly classified into 2 categories.
Quantitative Testing:
Validate your Source and Target
a) Ensure that your connectors are configured properly.
b) If you are using a flat file, make sure you have enough read/write permission
on the file share.
c) You need to document all the connector information.
Analyze the Load Time:
a) Execute the session and review the session statistics.
b) Check the read and write counters and how long it takes to perform the
load.
c) Use the session and workflow logs to capture the load statistics.
d) You need to document all the load timing information.
Analyze the success rows and rejections:
a) Have customized SQL queries to check the source/targets and here we
will perform the Record Count Verification.
b) Analyze the rejections and build a process to handle them. This requires a
clear requirement from the business on how data rejections should be handled:
do we need to reload, or reject and inform? Discussions are required and an
appropriate process must be developed.
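A typical record-count verification query for point (a) might look like this (the table names and the :run_date bind variable are illustrative):

-- Compare source and target row counts for one load date
SELECT (SELECT COUNT(*) FROM src_transactions WHERE load_date = :run_date) AS src_count,
       (SELECT COUNT(*) FROM dwh_transactions WHERE load_date = :run_date) AS tgt_count
FROM   dual;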

Performance Improvements:
a) Network Performance
b) Session Performance
c) Database Performance
d) Analyze and if required define the Informatica and DB partitioning
requirements.
Qualitative Testing:
Analyze & validate your transformation business rules. More of
functional testing.
a) You need to review field by field from source to target and ensure that
the required transformation logic is applied.
b) If you are making changes to existing mappings, make use of the data
lineage feature available with Informatica PowerCenter. This will help
you find the consequences of altering or deleting a port from an existing
mapping.
c) Ensure that appropriate dimension lookups have been used and your
development is in sync with your business requirements.
