
DATASTAGE SCENARIOS LAB HANDOUT:

1. SEQUENTIAL FILE STAGE IN DATASTAGE:


Sequential File stage is a file stage which is used to read the data
sequentially or in parallel.
If it is 1 file - it reads the data sequentially.
If it is N files - it reads the data in parallel.
The Sequential File stage supports 1 Input link, 1 Output link and 1 Reject link.
To read the data, we have read methods. The read methods are
a) Specific File(s)
b) File Pattern
Specific File is for a particular file,
and File Pattern is used for wild cards.
And in Reject Mode it has
Continue
Fail and
Output
If you select Continue - on any data type mismatch it drops the bad rows and sends the rest of
the data to the target.
If you select Fail - the job aborts on any data type mismatch.
Output - it sends the mismatched data to a rejected-data file.
The error data we get are
Data type mismatch
Format mismatch
Condition mismatch
And we have an option called
Missing File Mode. In this option
we have three sub-options:
Depends
Error
Ok
(That means: how to handle it if any file is missing.)
2. WHAT IS DATASET IN DATA STAGE? USES OF DATA SET?:
Dataset is a file stage which is used for staging the data when we design
dependent jobs.
Dataset overcomes the limitations of the Sequential File stage.
By default, Dataset is processed in parallel.
Dataset stores the data in the native format.
Dataset stores the data inside the repository (i.e. inside Datastage).
And it can hold more than 2 GB of data (> 2 GB).
There are two types of Datasets. They are
1) Virtual and
2) Persistent
Virtual is nothing but the data formed while passing through the link.
Persistent is nothing but the data loaded in the target.

Alias names of Datasets are


1) Orchestrate File
2) Operating System file

And a Dataset consists of multiple files. They are


a) Descriptor File
b) Data File
c) Control file
d) Header Files

In the Descriptor File, we can see the schema details and the address of the data.
In the Data File, we can see the data in the native format.
And the Control and Header files reside in the operating system.

And we can organize the data by using the Dataset utilities.


They are
GUI (Dataset Management) in the Windows environment.
CMD (Orchadmin) in the UNIX environment.
3. WHAT IS DATASET AND TYPES OF DATA SETS?
Dataset is a parallel processing stage which is used for staging the data
when we design dependent jobs.
By default, Dataset is a parallel processing stage.
Dataset stores the data in binary format.
If we use Dataset in the jobs, the data is stored inside Datastage, that is,
inside the repository.
Dataset overcomes the limitations of the Sequential File.
The limitations of sequential files are
1) Memory limitation (it can store only up to 2 GB in a single file)
2) Sequential (by default it is processed sequentially)
3) Conversion problem (every time we run the job, it has to convert the data
from one format to another)
4) Stores the data outside Datastage (whereas Dataset stores the data
inside Datastage)

There are two types of Datasets


1) Virtual Dataset
2) Persistent Dataset
A Virtual Dataset is the temporary dataset which is formed while passing through the
link.
A Persistent Dataset is the permanent dataset which is formed when loaded
into the target.

Alias names of Datasets are


a) Orchestrate Files
b) Operating System Files

A Dataset consists of multiple files


The Dataset files are
1) Descriptor Files
2) Data Files
3) Control Files
4) Header Files

1) The Descriptor File contains the schema details and the address of the data.
It is stored at a path such as C:/Data/file.ds

2) The Data Files contain the data in binary format.


They are stored under c:/IBM/InformationServer/Server/Data/file.ds
3) and 4) The Control and Header files reside in the operating system.
The Dataset organization operations are View, Copy, Delete.
The Dataset utilities for organizing are
GUI - Dataset Management (in the Windows environment)
CMD - Orchadmin (in the UNIX environment)
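For example, from the UNIX command line the Orchadmin utility can be used to view and delete datasets. A rough sketch (the dataset path is illustrative, APT_CONFIG_FILE must point to a valid configuration file, and the exact options should be confirmed with orchadmin's help on your installation):
orchadmin describe /data/file.ds   # show the schema and partition details
orchadmin dump /data/file.ds       # view the records stored in the dataset
orchadmin rm /data/file.ds         # delete the dataset and its data files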

4. WHAT ARE THE CLIENT COMPONENTS IN DATASTAGE 7.5 VERSION?
In Datastage 7.5X2 Version, there are 4 client components. They are
1) Datastage Designer
2) Datastage Director
3) Datastage Manager
4) Datastage Admin
In Datastage Designer, We
Create the Jobs
Compile the Jobs
Run the Jobs

In Director, We can
View the Jobs
View the Logs
Batch Jobs
Unlock Jobs
Scheduling Jobs
Monitor the JOBS
Message Handling

In Manager, We can
Import & Export the Jobs
Node Configuration

And by using Admin, We can


create the Projects
Organize the Projects
Delete the Projects

5. MULTIPLE JOIN STAGES TO JOIN THREE TABLES:


If we have three tables to join and we don't have the same key column in all the
tables, we cannot
join the tables using one Join stage.
In this case we can use multiple Join stages to join the tables.
You can take sample data as below

soft_com_1
e_id,e_name,e_job,dept_no
001,james,developer,10
002,merlin,tester,20
003,jonathan,developer,10
004,morgan,tester,20
005,mary,tester,20

soft_com_2
dept_no,d_name,loc_id
10,developer,200
20,tester,300

soft_com_3
loc_id,add_1,add_2
100,melbourne,victoria
200,brisbane,queensland

Take Job Design as below

1.Read and load the data in three sequential files.


In the first Join stage,
Go to Properties ---- Select Key column as dept_no
and you can select Join type = Inner
Drag and drop the required columns in Output
Click Ok
In Second Join Stage
Go to Properties ---- Select Key column as loc_id
and you can select Join type = Inner
Drag and Drop the required columns in the output
Click ok

Give file name to the Target file, That's it


Compile and Run the Job

2. Read and load the data in Seq. Files


Go to the Column Generator stage to create a column with sample data.
In Properties, select the name of the column to create,
and drag and drop the columns into the target.
Now go to the Join stage and select the key column which we have
created (you can give
any name; based on the business requirement you can give an understandable
name).
In the Output drag and drop all the required columns.
Give a file name to the Target file. Then
Compile and Run the Job.
Sample Tables You can take as below

Table1
e_id,e_name,e_loc
100,andi,chicago
200,borny,Indiana
300,Tommy,NewYork

Table2
Bizno,Job
20,clerk
30,salesman

6. WHEN TO CHOOSE JOIN STAGE OR LOOKUP STAGE IN DATASTAGE:
How to choose between the stages:
Join stage or Lookup stage.
We need to be careful when selecting the stages. We need to think about the
performance of the job before selecting the stages. Time is precious to
the clients; that's why we need to make the job run in as little time as possible. We need to
try our best to get good performance from the job.
Both the Join stage and the Lookup stage perform the same thing, that is,
they combine the tables we have. But why was the Lookup stage
introduced?
The Lookup stage has some extra benefits which do not come with the Join
stage. The Lookup stage does not require the data to be sorted. Sorting is
mandatory with the Join stage. In the Lookup stage, columns with different
column names can be joined as well, which is not possible in the Join
stage. That means in the Join stage the key column names must be the same.
A Lookup stage supports reject links; if our requirement demands reject links
we can't go with the Join stage, because the Join stage doesn't support reject
links. And the Lookup stage has an option to fail the job if the lookup fails. It
is useful when the lookup is expected to be successful.
The Lookup stage keeps the reference data in memory, which yields better
performance for smaller volumes of data. If you have a large amount of data,
you need to go with the Join stage.

7. WHAT IS LOOKUP STAGE? TYPES OF LOOKUPS:


Lookup stage is a processing stage which performs horizontal combining.
A Lookup is a cross verification of the primary data with the secondary (reference) data.
The Lookup stage supports
N Inputs (for Normal Lookup)
2 Inputs (for Sparse Lookup)
1 Output
and 1 Reject link.
Up to Datastage version 7 we have only 2 types of Lookups:
a) Normal Lookup and b) Sparse Lookup
But in Datastage version 8, enhancements have taken place. They are
c) Range Lookup and d) Caseless Lookup

Normal Lookup:-- In a Normal Lookup, all the reference records are copied to
the memory and the primary records are cross verified with the reference
records.
Sparse Lookup:-- In a Sparse Lookup, each primary record is sent to
the reference source and cross verified with the reference records.
We go for a Sparse Lookup when there is not enough memory for the reference data
and the primary data is relatively smaller than the reference data.
Range Lookup:--- A Range Lookup performs range checking
on selected columns.
For example:-- If we want to check the range of salary, in order to find the
grades of the employees, then we can use the Range Lookup.

8.LOOKUP STAGE EXAMPLES:


Lookup stage is a processing stage used to perform lookup operations
and to map short codes in the input dataset to expanded information from a
lookup table, which is then joined to the incoming data and output.
For example if we have the primary data as below.
Table1
e_id,ename,e_state
100,sam,qld
200,jammy,vic
300,tom,Tas
400,putin,wa
table1Ref
e_state,full_state
qld,queensland
vic,victoria

Take the job design as below

Read and load the two tables in sequential files.


Go to lookup stage and Drag and drop the primary columns to the output.
And Join e_state from primary table to the e_state in reference table
And drag and drop the Full state to the output.
In properties select lookup failure as drop
now click ok
Give Target file name and Compile & Run the Job

9.RANGE LOOKUP EXAMPLES IN DATASTAGE:


Range Lookup is used to check the range of the records against another table's
records.
For example, if we have an employees list with salaries from $1500 to
$3000,
and we would like to check the range of the employees with respect to salaries,
we can do it by using a Range Lookup.
For Example if we have the following sample data.

xyzcomp ( Table Name )


e_id,e_name,e_sal
100,james,2000
200,sammy,1600
300,williams,1900
400,robin,1700
500,ponting,2200
600,flower,1800
700,mary,2100

Take Job Design as

Lsal is nothing but low salary


Hsal is nothing but High salary
Now Read and load the data in Sequential files
And Open Lookup file--- Select e_sal in the first table data

And Open Key expression and


Here Select e_sal >=lsal And
e_sal <=hsal
Click Ok
Than Drag and Drop the Required columns into the output and click Ok
Give File name to the Target File.
Then Compile and Run the Job . That's it you will get the required Output.

10. AGGREGATOR STAGE AND FILTER STAGE EXAMPLES:


If we have a data as below

table_a
dno,name
10,siva
10,ram
10,sam
20,tom
30,emy
20,tiny
40,remo

And we need to get the records that repeat multiple times into one target,
and the records that are not repeated with respect to dno need to go to another
target.
Take Job design as

Read and load the data in sequential file.


In the Aggregator stage select Group = dno,
Aggregation type = Count Rows,
Count output column = dno_count (user defined).
In the output drag and drop the required columns. Then click OK.
In the Filter stage
----- in the first Where clause: dno_count>1
----- Output link = 0
----- in the second Where clause: dno_count<=1
----- Output link = 1
Drag and drop the outputs to the two targets. Give the Target file names and Compile and Run
the Job. You will get the required data in the Targets.

11. AGGREGATOR STAGE TO FIND NO OF PEOPLE GROUP WISE:
We can use the Aggregator stage to find the number of people in each
department.
For example, if we have the data as below
e_id,e_name,dept_no
1,sam,10
2,tom,20
3,pinky,10
4,lin,20
5,jim,10
6,emy,30
7,pom,10
8,jem,20
9,vin,30
10,den,20
Take Job Design as below
Seq.-------Agg.Stage--------Seq.File

Read and load the data in source file.


Go to Aggregator Stage and Select Group as Dept_No
and Aggregator type = Count Rows
Count Output Column = Count ( This is User Determined)
Click Ok ( Give File name at the target as your wish )
Compile and Run the Job

12. TRANSFORMER STAGE FOR DEPARTMENT WISE DATA:


In order to get the data according to department wise.
And if we have the data as below

a_comp ( Table name )


e_id,e_name,e_job,dept_no
100,rocky,clerck,10
200,jammy,sales,20
300,tom,clerck,10
400,larens,clerck,10
500,wagon,sales,20
600,lara,manager,30
700,emy,clerck,10
800,mary,sales,20
900,veer,manager,30

And we have three targets. Our requirement is as below


In the 1st target, we need the department 10 and 20 records

In the 2nd target, we need the department 30 records


In the 3rd target, we need the department 10 and 30 records
You can take the job design as below
Read and Load the data in Sequential File
Go to Transformer Stage,
Just Drag and Drop all the columns in to the three Targets.
In the 1st constraint write the expression as,
dept_no=10 or dept_no=20
In 2nd constraint write expression as,
dept_no=30
In 3rd Constraint write expression as,
dept_no=10 or dept_no=30
click ok
Give file names in all the targets.
Compile and run the jobs.

13. TRANSFORMER STAGE TO FILTER THE DATA:


If our requirement is to filter the data department wise from the file below
samp_table
1,sam,clerck,10
2,tom,developer,20
3,jim,clerck,10
4,don,tester,30
5,zeera,developer,20
6,varun,clerck,10
7,luti,production,40
8,raja,priduction,40

And our requirement is to get the target data as below


In Target1 we need the department 10 & 40 employees.
In Target2 we need the department 30 employees.
In Target3 we need the department 20 & 40 employees.

Take Job Design as below

Read and Load the data in Source file


In the Transformer stage just drag and drop the data to the target tables.
Write the expressions in the constraints as below
dept_no=10 or dept_no=40 for target 1
dept_no=30 for target 2
dept_no=20 or dept_no=40 for target 3
Click ok
Give file name at the target file and
Compile and Run the Job to get the Output

14. SORT STAGE AND TRANSFORMER STAGE WITH SAMPLE DATA EXAMPLE:
If we have some customer information as below.
cust_info
c_id,c_name,c_plan
11,smith,25
22,james,30
33,kelvin,30
22,james,35
11,smith,30
44,wagon,30
55,ian,25
22,james,40

We can see the customers' information and their mobile plans (for example).
If we would like to find the lowest plan taken by each customer,
Take Job Design as

Seq.File--------Sort------Tx-----------------D.s

Read and load the data in the Sequential file.


In the Sort stage, sort ascending on c_id and c_plan and select Create Key Change Column = True to generate a group id.
In the Transformer stage write KeyChange=1 in the constraint.
Write the file name for the target D.S file.
Compile and Run the Job.
You get the output as required:
the lowest plans of the customers.

15. CONVERT ROWS INTO COLUMNS USING SORTING AND TRANSFORMER STAGE:
If you have some data like below, to convert rows into columns

xyz_comp
e_id,e_name,e_add
100,jam,chicago
200,sam,newyork
300,tom,washington
400,jam,indiana
500,sam,sanfransico
600,jam,dellas
700,tom,dellas

Take Job Design as

Seq.File----Sort-----Tx-----R.d-----D.s

Tx- Transformer stage


R.D- Remove Duplicates Stage
Here we are taking remove duplicate stage, inorder to remove duplicates
after getting the output.
Read and load the data in the Sequential File stage.
In the Sort stage select the key column as e_name
and select Create Key Change Column = True.
In the output drag and drop all the columns.
Go to the Transformer stage and create two stage variables, Temp and Add.
Map the key change column to Temp and
in the Add derivation write the expression as
If Temp=1 then e_add else Add:',':e_add
Then create one column in the output table as hist_add.
Now drag and drop Add (from the stage variables) to hist_add (output
column).
That's it, click OK.
In the Remove Duplicates stage select the key column as e_name, then
select Duplicate to retain = Last and click OK.
Give the file name to the target file.
Compile and Run the Job.

16. HOW TO DO SORTING WITHOUT SORT STAGE:


To do the sorting without a Sort stage,
you can first build the job as a normal read-and-load process, as follows.
If we want to read the data using Sequential File
Design as follows : -------------

Seq. File ------------------------Dataset File

To read the data in Sequential file


Open Properties of Sequential file
and give the file name.
Now you can give the file path, by clicking on the browse for the file.
And in Options select True ( If the first line is column name )
You can just leave the rest of the options as it is.
Now go to Column and click on load, then select the file you like to read
from the

table definitions .
( This file should be same which you have given in the properties. )
Now in the Target Dataset - Give file name.
Now for the sorting process.
In the Target Open Dataset properties
And go to Partitioning ---- Select Partitioning type as Hash
In Available Columns Select Key Column ( E_Id for EXAMPLE) to be
sorted.
Click Perform Sort
Click Ok
Compile And Run
The data will be Sorted in the Target.

17. HOW TO CREATE GROUP ID IN SORT STAGE IN DATASTAGE:
Group ids are created in two different ways.
We can create group ids by using
a) Key Change Column
b) Cluster Key Change Column
Both of these options are used to create group ids.
When we select either option and set it to True, it will create the group ids group
wise.
The data is divided into groups based on the key column, and it gives 1 for
the first row of every group and 0 for the rest of the rows in all groups.
Key Change Column and Cluster Key Change Column are used based on the data
we are getting from the source.
If the data we are getting is not sorted, then we use Key Change Column to
create
group ids.
If the data we are getting is already sorted, then we use Cluster Key Change
Column to
create group ids.
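For example, if the sorted input key column dno contains 10, 10, 10, 20, 20, 30, the generated key change column contains 1, 0, 0, 1, 0, 1 - a 1 for the first row of each group and a 0 for the rest.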

18. HOW TO CONVERT ROWS INTO COLUMNS USING DATASTAGE:
If we have some customer information with different addresses as below.

mult_add
e_id,e_name,e_add
10,john,melbourne
20,smith,canberra
10,john,sydney
30,rockey,perth
10,john,perth
20,smith,towand
If you would like to get all the multiple addresses of a customer into one single row
from multiple rows,
we can perform this using the Sort stage, Transformer stage and Remove
Duplicates stage.
Take the job design as below

SeqFile----Sort-----Tx----R.D----D.S

Read and load the data in the Seq. File.


In the Sort stage select the key column (e_name) and select Create Key Change Column = True to generate
group ids.
In the Transformer stage create one stage variable and name it
temporary,
and write the expression for it as
If keychange=1 then e_add else temporary:',':e_add. And click OK.
Go to Remove Duplicates, select Duplicate to retain = Last in the properties and select the key column
to remove duplicates (the name column, e_name, here).
That's it, compile and run the job.
You will get the required output.

19. CONCATENATE DATA USING TRANSFORMER STAGE:


If we have a Table as below
e_id,e_name,e_job,e_Sal
1,sam,clerck,2000
2,tim,salesman,2100
3,ram,clerck,1800
4,jam,salesman,2000
5,emy,clerck,2500

Read and Load the data in sequential file


In the Transformer stage create one column as Total_one.
In the derivation you can write an expression as below.
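For example (an illustrative derivation only; the exact columns to concatenate depend on the requirement), you could write
e_name : ',' : e_job : ',' : e_Sal
where the colon (:) is the concatenation operator in the Transformer expression.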
click ok
Give File name in the target file
Compile and Run the Job That's it

20. TRANSFORMER STAGE USING STRIPWHITESPACES FUNCTION:
Stripwhitespaces is the function used to remove the spaces before, after and in the middle
of the characters.
Some times we get the data as below

e_id,e_name
10,em y
20, j ul y
30,re v o l
40,w a go n

Take Job Design as

SeQ.File ------ Tx------D.s

Read and load the data in Sequential file stage


Go to Transformer stage
Here, we use stripwhitespaces function in the required column derivation.
You can write expression as below
Stripwhitespaces(e_name) for e_name

Click ok
Compile and Run the data
You will get the data with all the spaces removed: the spaces between the characters,
and the leading and trailing spaces as well.
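For instance, with the sample row 30,re v o l the derivation above should return revol, since the embedded and surrounding spaces are stripped out.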

21. FIELD FUNCTION IN TRANSFORMER STAGE WITH EXAMPLE:
Sometimes we get the data as below
Customers
1,tommy,2000
2,sam,2300
3,margaret,2000
4,pinky,1900
5,sheela,2000

Take Job Design as

Seq.File ------- Tx ------ Ds

Read and load the data in Seq.file


Select first line is column name
And in Transformer stage Create three columns to get the data
You can take columns names as c_id,c_name,c_sal with respective data
types.
Write the expression in Derivations to the columns as below
Field (dslink3.customers,',',1) for c_id
Field (dslink3.customers,',',2) for c_name
Field (dslink3.customers,',',3) for c_sal

That's it, you will get the data in 3 different columns in the output as
required,
after you compile and run the job.
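For instance, for the input row 1,tommy,2000 the derivations above return 1 for c_id, tommy for c_name and 2000 for c_sal, because Field() extracts the 1st, 2nd and 3rd comma-delimited fields respectively.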

22. RIGHT AND LEFT FUNCTIONS IN TRANSFORMER STAGE


For example some times we get the data from warehouse as below
This is just a sample example data

Customers
1 bhaskar 2000
2 ramesh 2300
3 naresh 2100
4 kiran 1900
5 sunitha 2000

The values are exactly aligned; they just have spaces between them.
Our requirement is to get the data into three different columns from the
single column.
Here 'customers' is the name of the single column we are getting, and we have
only
that one column.
Now Take Job Design as below

Seq.File------------Tx-------------Ds

Read the data in Seq.file


and dont forget to tick first line is column name.
In Transformer stage Create 3 columns and write the expressions in
derivations.
Create Columns as c_id , c_name, c_sal
You can create the names as your wish.
Expressions for three columns are

left(dslink3.customers,1) for c_id


right(left(dslink3.customers,8),7) for c_name
right(dslink3.customers,4) for c_sal

That's it
Give name for the file in the Target.
Now Compile and Run the Job.
You will get the Output as required.

23. TRANSFORMER STAGE FILTER THE DATA:


If our requirement is to filter the data department wise from the file below

samp_tabl
1,sam,clerck,10
2,tom,developer,20
3,jim,clerck,10
4,don,tester,30
5,zeera,developer,20
6,varun,clerck,10
7,luti,production,40
8,raja,priduction,40

And our requirement is to get the target data as below

In Target1 we need the department 10 & 40 employees.


In Target2 we need the department 30 employees.
In Target3 we need the department 20 & 40 employees.
Take Job Design as below

Read and Load the data in Source file


In the Transformer stage just drag and drop the data to the target tables.
Write the expressions in the constraints as below

dept_no=10 or dept_no=40 for target 1


dept_no=30 for target 2
dept_no=20 or dept_no=40 for target 3

Click ok
Give file name at the target file and
Compile and Run the Job to get the Output

24. TRANSFORMER STAGE USING PADSTRING FUNCTION:


Padstring is a function used to pad data onto the end of a string.
If we have a data as below
Table_1
e_id,e_name
10,emy
20,july
30,revol
40,wagon

(Remember to give a gap between the words to understand the Padstring function)
Take Job Design as
Seq.File------------Tx--------------D.s

Read and load the data in sequential file.


Now go to the Transformer stage; here, in the required column derivation, write
your
expression as below

padstring(e_name,'@',5) for e_name

Here '@' is the pad character you want to append after the data
and 5 is the pad length. Now click OK.
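For example, with this derivation the value emy should become emy@@@@@ (the original string followed by five '@' pad characters).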

Give file name at the target file


Compile and Run the Job

25. CHANGE CAPTURE EXAMPLES:


The Change Capture stage is used to capture the changes between two versions of
a table.
It is mainly used to capture the changes between the after data and the
before data.
Take Example Data as below

Change_after_data
e_id,e_name,e_add
11,kim,syd
22,jim,canb
33,pim,syd
44,lim,canb
55,pom,perth

Change_before_data
e_id,e_name,e_add
11,kim,syd
22,jim,mel
33,pim,perth
44,lim,canb
55,pom,adeliade
66,shila,bris
Take Job Design as below

Read and load the data in two files as below

In the Change Capture stage select the key column as e_id


and the value column as e_add, because e_add is the column we are going to
update.

Compile and Run the Job now to get the required output.
26. MERGE STAGE EXAMPLES:
The Merge stage is a processing stage which is used to perform horizontal
combining. It is one of the stages that perform this operation, like the Join stage
and the Lookup stage. The only differences between these stages are the size variance
and the input requirements between them.
Example for Merge Stage

Sample Tables
MergeStage_Master
cars,ac,tv,music_system
BMW,avlb,avlb,Adv
Benz,avlb,avlb,Adv
Camray,avlb,avlb,basic
Honda,avlb,avlb,medium
Toyota,avlb,avlb,medium

Mergestage_update1
cars,cooling_glass,CC
BMW,avlb,1050
Benz,avlb,1010
Camray,avlb,900
Honda,avlb,1000
Toyota,avlb,950

MergeStage Update2
cars,model,colour
BMW,2008,black
Benz,2010,red
Camray,2009,grey
Honda,2008,white
Toyota,2010,skyblue

Take Job Design as below


Read and load the Data into all the input files.

In Merge Stage Take cars as Key column. In Output Column Drag and Drop
all the columns to the output files.
Give File name to the Target/Output file and If you want you can give reject
links (n-1)
Compile and Run the Job to get the required output

27. BASIC JOB EXAMPLE FROM SEQUENTIAL FILE STAGE TO DATASET STAGE:
This is the basic job for Datastage learners.
You can understand how we can read the data and how we can load the data
into the
target.

If we want to read the data using Sequential File


Design as follows : -------------

Seq. File ------------------------Dataset File

To read the data in Sequential file


Open Properties of Sequential file
and give the file name.
Now you can give the file path, by clicking on the browse for the file.
And in Options select True ( If the first line is column name )
You can just leave the rest of the options as it is.
Now go to Column and click on load, then select the file you like to read
from the
table definitions.
(This file should be same which you have given in the properties. )
Now in the Target Dataset - Give file name.
Now Compile and run
thats it

You will get the output.

28. SEQUENTIAL FILE STAGE PROPERTIES:


The Sequential File stage is a file stage in Datastage.
The Sequential File stage has different properties to read the data.
In the source, we will have two options:
a) File = ? and b) Read Method = ?
File means the file we are going to read. We need to give the path of the file.
The read methods are of two types:
a) Specific File(s) and b) File Pattern
Most of the time we work with specific files.
Specific File means reading a named file directly.
File Pattern means using wild cards.
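For example, a File Pattern value such as /data/source/cust_*.txt (an illustrative path) would read every file matching that wild card in one pass.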
In Options: we have different options in the Sequential File stage. They are
a) First Line is Column Names
b) Keep File Partitions
c) Reject Mode
d) Report Progress
a) First Line is Column Names:
When we read the data, if the first line contains the column names we
select True,
or else we select False. The default is False.
b) Keep File Partitions: we select True to partition the imported data set
according to the
organization of the input file(s).
c) Missing File Mode:
In Missing File Mode we have three different options to select. They are
1) Depends
2) Error
3) Ok

Ok means skip the missing file.


Error means stop the job if one of the files mentioned doesn't exist.
Depends: if we select Depends, the behaviour is Error unless
the file name has
a node name prefix of *, in which case it is Ok.
The default option is Depends.

d) Reject Mode:
In Reject Mode, if we would like to control what happens to the error data, we go with three
different options.

1) Continue
2) Fail
3) Output

1) Continue: it drops the error data and loads the rest of the data into
the target.
This is the default option.

2) Fail: the job will be aborted if any row has error data.

3) Output: it is used to carry the rejected rows down a reject link.

29. LIMITATION OF SEQUENTIAL FILE STAGE:


The Sequential File is a file stage which is used to read the data.
It has some limitations; in those cases we use the Dataset file.
The limitations are
a) Memory limit
b) Sequential processing
c) Conversion problem
d) It stores the data outside the repository.

Memory limit means it has a limit on how much it can hold. If the data exceeds
2 GB, the file is full and we need to split the data into two files.
If we would like to save the data in a single file, in that case we can go with the Dataset
file.
The Sequential File is processed sequentially by default.
So the Sequential File has a conversion problem when the data transfers from
stage to
stage.

When we save the data, it stores the data outside the repository.

30. STAGES IN DATASTAGE:


There are different types of Stages in Datastage.
1)Database Stages
2)File Stages
3)Development Stages
4)Processing Stages
5)Real Time Stages

In Database Stages, mostly used stages are


a)Oracle Enterprise
b)ODBC Enterprise
c)ODBC Conductor
d)DB2 Enterprise
e)Teradata Enterprise
f) Informix
g)Dynamic RDBMS
h)Universe
i)MS SQL Server Load

And in Processing Stages mostly used stages are


a)Aggregator Stage
b)Command Stage
c)Pivot Stage
d)Transformer Stage
e)Sort Stage
f)Merge Stage
g)Link Partitioner Stage
h)Join Stage
i)Lookup Stage
j)Copy Stage
k)Funnel Stage
l)Surrogate Key
M)SCD
N)Filter Stage

The file stages are as follows


a)Sequential Stage
b)Data Set Stage
c)File Set Stage
d)Lookup file set stage
e)Complex Flat File Stage
f)External Source Stage

The Development Stages are


a)Row Generator
b)Column Generator
c)Head Stage
d)Tail Stage
e)Sample Stage
f)Peek Stage
31. FUNNEL STAGE IN DATA STAGE
Sometimes we get data in multiple files which belong to the same bank's
customer information.
At that time we need to funnel the tables to get the multiple files' data into a
single file (table).
For example, if we have the data in two files as below
xyzbank1
e_id, e_name,e_loc
111,tom,sydney
222,renu,melboourne
333,james,canberra
444,merlin,Melbourne

xyzbank2
e_id,e_name,e_loc
555,flower,perth
666,paul,goldenbeach
777,raun,Aucland
888,ten,kiwi

For the Funnel, take the job design as

Read and load the data into two sequential files.


Go to the Funnel stage properties and
select Funnel Type = Continuous Funnel
(or any other type according to your requirement).
Go to the output and drag and drop the columns
(remember the source columns' structure should be the same). Then click OK.
Give the file name for the target dataset, then
compile and run the job.

32. WHAT IS FUNNEL STAGE, FUNNEL STAGE FOR MULTIPLE INPUTS?
Funnel is the processing stage which is used to perform vertical
combining.
That is, we combine multiple tables into a single output link.
There is no key column dependency between the tables.
We just append the tables that we have.
There are some mandatory constraints when we combine the tables:
a) The structure should be the same.
b) The column names should be the same (if they are not the same, we need to make them
the same by renaming the columns, for example with a Copy or Modify stage).
c) The format should be the same.

33. SURROGATE KEY STAGE:
A Surrogate Key is a unique identification key. It is an alternative to the natural key.
A natural key may be an alphanumeric composite key, but the
surrogate key is
always a single numeric key.
The Surrogate Key stage is used to generate key columns, for which characteristics can
be
specified. The surrogate key generator produces sequential, incremental and unique
integers from a
provided start point. It can have a single input and a single output link.

34. WHAT IS SCD IN DATASTAGE AND TYPES OF SCDS IN DATASTAGE?
SCDs are nothing but Slowly Changing Dimensions.
SCDs are the dimensions that contain data that changes slowly, rather than
changing on a regular schedule.
SCDs are implemented mainly in three types.
They are
Type-1 SCD
Type-2 SCD
Type-3 SCD
Type-1 SCD: In the Type-1 SCD methodology, the older data
(records) is overwritten with the new data (records), and therefore it does not maintain
the
historical information.
This is used for correcting the spelling of names and for small
updates of
customer records.
Type-2 SCD: In the Type-2 SCD methodology, the complete
historical
information is tracked by creating multiple records for a given natural key
(primary key) in the dimension tables, with separate surrogate keys or a
different
version number. We have unlimited historical data preservation, as a new
record is inserted each time a change is made.
Here we use different types of options in order to track the historical data of
customers, like
a) Active flag
b) Date functions
c) Version numbers
d) Surrogate keys
We use these to track all the historical data of the customer.
According to our input, we use the required option to track it.
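A small illustration (sample values only): if customer 100 moves from sydney to perth, a Type-2 dimension keeps both rows, for example
c_id, city, surrogate_key, version, active_flag
100, sydney, 1, 1, N
100, perth, 2, 2, Y
The old row is closed (flag N) and the new row becomes the current one (flag Y).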

Type-3 SCD: The Type-3 SCD maintains partial historical
information.

35. HOW TO USE SCD TYPE 2 IN DATA STAGE:


SCDs are nothing but Slowly Changing Dimensions.
Slowly Changing Dimensions are the dimensions that contain data that
changes slowly rather than on a regular schedule.
The most common Slowly Changing Dimensions are of three types.
They are the Type-1, Type-2 and Type-3 SCDs.

Type-2 SCD:-- The Type-2 methodology tracks the complete historical


information by creating multiple records for a given natural key in the
dimension tables with separate surrogate keys or different version
numbers.
And we have unlimited history preservation, as a new record is
inserted each time a change is made.

36. WHAT ARE THE TYPES OF ERRORS IN DATASTAGE?


You may get many errors in datastage while compiling the jobs or running
the jobs.
Some of the errors are as follows:
a) Source file not found -
if you are trying to read a file which is not there with that name.
b) Sometimes you may get fatal errors.
c) Data type mismatches -
this occurs when data type mismatches occur in the jobs.
d) Field size errors.
e) Metadata mismatch.
f) Data type sizes between source and target are different.
g) Column mismatch.
i) Process time out -
if the server is busy, this error can come sometimes.

37. WHAT DATA STAGE PROJECT CONTAINS:


Datastage is a comprehensive ETL tool. It is used for extracting,
transforming and loading jobs.
A Datastage project is what the Datastage work is done in.
We log in to the Datastage Designer in order to enter the Datastage tool
for datastage jobs, designing of the jobs etc.
Datastage jobs are maintained according to the project standards.
Every project contains the
Datastage jobs, built-in components, table definitions, repository and the components
required for the project.

38. WHAT IS DATASTAGE ADMINISTRATOR AND USES OF DATASTAGE ADMINISTRATOR
The Datastage Administrator is used for various purposes like
a) Adding the projects
b) Create the Projects
c) Delete the Projects
And we can set all the Project Properties here.

Datastage Administrator also provides the command line interface to the


datastage repository.
And we can perform different settings of the project using Datastage
Administrator like
1) Setting Up Datastage Users in Administrator.
2) We can create , delete , move the datastage projects from one place to
another place.
3) We can clean up the project files which are not required.
4) We can purge the job log files.
5) We can set the job properties in Datastage Administrator.
6) We can trace the Server Activity.

39. UNIX COMMANDS IN DATA STAGE:


UNIX commands are very important to remember in order to operate on
the files in the Unix operating system.
1. CAT:
1) The cat command is used to read one or more files and print them to the target (standard output).
For example, the syntax can be written as
a) cat file1
b) cat file1 file2

2. CD:
2) The cd command is used to change the directory.
Syntax is cd[Dir]
Ex: cd tech

3. FTP
3) The ftp command is used to transfer files to and from a remote server.
The syntax is
ftp [options] [hostname]
The options are like
d - debugging is enabled
g - filename globbing is disabled
v - display all responses from the remote server

4. GREP
4) The grep command is used to search one or more files for lines
that contain a pattern.
The syntax is
grep [options] [pattern] [files]
Some of the options are as below
b - display the block numbers at the beginning of each line
h - display the matched lines, but do not display the file names
c - display only a count of the matching lines
i - ignore case
s - silent mode
v - display all lines that do NOT match
w - match whole words only
5. KILL
5) The kill command is used to kill one process id or multiple ids.
The syntax is
kill [options] ids
The options are
l - lists the signal names
signal - the signal number or name

6. LS
6) The ls command is used to list all the files in the directory.
The syntax is
ls [options] [names]
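Some example invocations (the file names, host name and process id are illustrative only):
cat file1 file2            # print both files to standard output
cd /home/dsadm             # change to the dsadm home directory
ftp etlhost                # open an ftp session to a remote server
grep -i "warning" job.log  # search a log file, ignoring case
kill -9 4521               # forcibly kill process id 4521
ls -l /data                # long listing of the /data directory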

40. TUTORIALS FOR DATA STAGE JOBS:


Datastage is a comprehensive ETL tool. ETL is extracting, transforming and
loading the jobs. This is the best place to learn Datastage online. This
tutorial is created for best datastage practices, stage explanations, Datastage
concepts, data warehouse fundamentals required for datastage and
Datastage jobs, with simple examples and different tasks.
In this Datastage tutorial, many things are covered, like different examples
with sample data to solve different types of Datastage jobs. Datastage is a
powerful tool to solve any type of job. It is one of the top competitors in
the world.
And in this Datastage tutorial, you can learn how to extract the data from
the source, how to implement logical expressions in the transformation and
how to load into different types of targets.
There are different types of jobs using the Sequential stage, Dataset file, Fileset files,
Transformer stage, Sort stage, Aggregator stage, Change Capture
stage, Surrogate Key stage, Copy stage, Modify stage, Remove Duplicates
stage, Join stage, Lookup stage, Merge stage, Debug stages, Head and Tail
stages, Peek stages, Column and Row Generator stages etc.
In this Datastage tutorial we also discuss all the features of Datastage,
different types of plug-ins in datastage, job control routines and how to
generate reports for Datastage jobs.

41. WHERE DO WE USE UNIX COMMANDS IN DATASTAGE JOBS


Unix is the operating system we use in software companies.
We use this operating system because it has high security features.
Some of the common places Unix is used in Datastage are as below:
a) In job sequences, using the Execute Command activity and routines.
b) In the Sequential File stage, to check files with unix commands; later we can give unix
commands as a filter.
c) We use unix commands in the Transformer stage before- and after-stage
subroutines.
d) We use them in the Sequential File stage as filter commands.

42. COLUMN GENERATOR STAGE EXAMPLE WITH SAMPLE DATA
The Column Generator is a development/generating stage that is used to
generate a column with sample data based on a user-defined data type.
Take Job Design as

Seq.File--------------Col.Gen------------------Ds

Take source data as a

xyzbank
e_id,e_name,e_loc
555,flower,perth
666,paul,goldencopy
777,james,aucland
888,cheffler,kiwi

In order to generate a column (for example: unique_id),


first read and load the data in the seq. file.
Go to the Column Generator stage -- Properties -- select Column Method = Explicit.
In Column To Generate, give the column name (for example: unique_id).
In the Output tab drag and drop the columns.
Go to Columns, write the column name, and you can change the data type for
unique_id in the SQL type and give the length, with a suitable name.
Then compile and Run.

43. WHAT IS MODELING OF DATASTAGE, MODELING OF DATASTAGE
Modeling is a logical and physical representation of the source system.
Modeling has two types of modeling tools.
They are
ERWIN and ER-STUDIO.
In the source system there will be an ER model, and
in the target system there will be an ER model and a dimensional model.
Dimension:- The table which is designed from the client's perspective. We
can look at the dimension tables in many ways.
And there are two types of modeling approaches.
They are
Forward Engineering (F.E)
Reverse Engineering (R.E)
F.E:- F.E is the process of starting the modeling from scratch, for example for a banking
sector.
Ex: Any bank which requires a data warehouse.
R.E:- R.E is the process of altering an existing model for another bank.

44. DATAMART (DATA WAREHOUSE):


A Datamart is the access layer of the data warehouse environment. That means
we create a datamart to retrieve the data for the users faster.
The Datamart is a subset of the Data Warehouse. That means all the data available
in the datamart will be available in the data warehouse. A Datamart is
created for the purpose of a specific business (for example a telecom database
or a banking database etc).
There are many reasons to create a Datamart, and the Datamart has a lot of importance
and advantages.
It is easy to access frequently needed data from the database when required by
the client.
We can give access to a group of users to view the Datamart when it is
required. Of course the performance will be good.
It is easy to maintain and to create the datamart. It will be related to a specific
business.
And it is low cost to create a datamart rather than creating a data warehouse with
a huge space.

45. DATASTAGE VERSIONS:


AND THE DIFFERENCE BETWEEN DATASTAGE 7.5X2 AND
DATASTAGE 8.0.1
Differences between the Datastage 7.5X2 and Datastage 8.0.1 versions:
1) In Datastage 7.5X2, there are 4 client components. They are
a) Datastage Design
b) Datastage Director
c) Datastage Manager
d) Datastage Admin

And in
2) Datastage 8.0.1 Version, there are 5 components. They are
a) Datastage Design
b) Datastage Director
c) Datastage Admin
d) Web Console
e) Information Analyzer
Here the Datastage Manager is integrated into the Datastage Designer
option.
2) The Datastage 7.5X2 version is OS dependent. That is, OS users are Datastage
users.

And in 8.0.1
2) it is OS independent. That is, users can be created in Datastage itself, but with a one-
time dependency.
3) The Datastage 7.5X2 version has a file-based repository (folder).
3) The Datastage 8.0.1 version has a database repository.

4) No Web based Administration here.


4) Web Based Administration.

5) There are 2 Architecture Components here. They are


a) Server
b) Client
5) There are 5 Architecture Components. They are
a) Common user Interface.
b) Common Repository.
c) Common Engine.
d) Common Connectivity.
e) Common Shared Services.
6) P-3 and P-4 can be performed here.
P-3 is Data Transformation.
P-4 is Metadata Management

6) P-1,P-2,P3,P4 can be performed here.


P-1 is Data Profiling
P-2 is Data Quality
P-3 is Data Transformation
P-4 is Metadata Management

7) Server is IIS
7) Server is Websphere

8) No Web based Admin


8) Web based Admin.

46. FEATURES OF DATA STAGE:


Datastage Features are
1) Any to Any (Any Source to Any Target)
2) Platform Independent.
3) Node Configuration.
4) Partition Parallelism.
5) Pipeline Parallelism.

1) Any to Any
That means Datastage can Extract the data from any source and can loads
the data into the any target.

2) Platform Independent
A job developed on one platform can run on any other platform.
That means if we design a job for single-node (uniprocessor) processing, it can be run on
an SMP machine.

3 )Node Configuration
Node Configuration is a technique to create logical C.P.U
Node is a Logical C.P.U

4) Partition Parallelism
Partition parallelism is a technique of distributing the data across the nodes
based on the partition techniques. The partition techniques are
a) Key Based
b) Key Less

a) Key based Techniques are


1 ) Hash 2)Modulus 3) Range 4) DB2

b) Key less Techniques are


1 ) Same 2) Entire 3) Round Robin 4 ) Random

5) Pipeline Parallelism
Pipeline parallelism is the process in which the extraction, transformation and
loading occur simultaneously.
Re-partitioning: the redistribution of already distributed data is re-partitioning.
Reverse partitioning: reverse partitioning is called collecting.
Collecting methods are
Ordered
Round Robin
Sort Merge
Auto

47. FACT TABLES IN DATA WAREHOUSE AND GIVE EXAMPLE:


A Fact Table is an entity which represents the numerical measurements of a
business.
That means we create the fact tables for loading the numerical data.
For example, in a banking model the account numbers and balances are the
measurements within the fact tables.

48. NODE CONFIGURATION:


WHAT IS A NODE, AND WHAT IS NODE CONFIGURATION:
A node is a logical CPU in datastage.
Each node in a configuration file is distinguished by a virtual name and
defines a number, speed, CPUs, memory availability etc.
Node configuration is a technique of creating logical CPUs.
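A minimal sketch of a one-node configuration file (the fastname and resource paths are illustrative; real files usually define several nodes):
{
  node "node1"
  {
    fastname "etlserver"
    pools ""
    resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
    resource scratchdisk "/tmp" {pools ""}
  }
}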

49. OLAP (ONLINE ANALYTICAL PROCESSING)


Online Analytical Processing (OLAP) is characterized by a relatively low
volume of transactions. Actually, the queries are often very complex. In an
OLAP system the response time is longer. In an OLAP database there is aggregated,
historical data, stored in multi-dimensional schemas.
50. OLTP (ONLINE TRANSACTION PROCESSING)
WHAT IS OLTP AND USES OF OLTP
OLTP is nothing but Online Transaction Processing.
It is characterized by a large number of short online transactions. The
main emphasis for an OLTP system is very fast query processing, in
order to get the data faster to the end users. And we use online
transaction processing for fast processing. An OLTP system is used for data
integrity in multi-access environments, and its effectiveness is measured by the
number of transactions per second.
51. PARTITIONING TECHNIQUES:
WHAT IS PARTITION PARALLELISM
Partition parallelism is a technique of distributing the records across the
nodes based on different partition techniques.
Partition techniques are very important to get good performance from the
job.
We need to select the right partition technique for the right stage.
The partition techniques are

Key based Techniques And


Key less Techniques

1.Key based Techniques are


a) Hash
b) Modulus
c) Range
d) DB2

2.Key Less Techniques are


a) Same
b) Entire
c) Round Robin
d) Random

52. DIFFERENCE BETWEEN HASH AND MODULUS TECHNIQUES


Hash and Modulus are key-based partition techniques.
Hash and Modulus are used for different purposes.
If the key column data type is textual, then we use the Hash partition technique for
the job.
If the key column data type is numeric, we use the Modulus partition technique.
If one key column is numeric and another is textual, then we also use the Hash partition
technique.
If all the key columns are of numeric data type, then we use the Modulus partition
technique.
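For example, with 4 partitions the Modulus technique sends a row to partition (key value mod 4): a row with e_id 10 goes to partition 2 (10 mod 4 = 2) and a row with e_id 13 goes to partition 1 (13 mod 4 = 1), so rows with equal key values always land in the same partition.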

53. PARTITION IN DATASTAGE:


Partition Techniques are used in datastage to get good performance.
They are different types of Partition Techniques in datastag. They are

a) Key Based Partition Techniques


b) Key Less Partition Techniques
Key Less Techniques are
1) Same
2) Entire
3) Round Robin
4) Random

a) Same: this technique is used in order not to alter the existing partitioning
from the previous stage.
b) Entire: each partition gets the entire dataset, that is, the rows are duplicated.
c) Round Robin: in the Round Robin technique the rows are evenly distributed
among the partitions.
d) Random: the partition a row is assigned to is random.

54. KEY BASED PARTIONING TECHNIQUES:


Key Partitioned Techniques are
1) Hash
2) Modulus
3) Range
4) DB2

Hash:-- In the Hash partitioning technique, rows with the same key column


values go to the same partition.
Hash is the technique most often used in datastage.

We use the Hash partitioning technique when the key column data type is text.

Modulus:--- It assigns each row of an input dataset to a partition, as


determined by a specified numeric key column.
And we use the Modulus partition technique when the key column data type is
numeric.
If one key column is numeric and another is text, then we will also go with
the Hash partitioning technique.
The Modulus partition technique can be performed only on numbers.

Range: The Range partition technique is similar to the Hash partition technique,


but the partition mapping is user determined and the partitions are ordered.
Rows are distributed according to the values in one or more key fields, using
a range map.

Db2: The DB2 partitioning technique matches the DB2 EEE partitioning.

55. PROJECT CONCEPTS:


WHAT IS AN ETL PROJECT PHASE | PROJECT PHASES WITH AN ETL
TOOL (DATASTAGE)
An ETL project contains four phases to implement the project.
ETL means Extraction, Transformation, Loading.
ETL is the tool used to extract the data,
transform the job
and load the data.
It is used for business development.
And the four phases are

1) Data Profiling
2) Data Quality
3) Data Transformation
4) Meta data management

Data Profiling:-
Data Profiling is performed in 5 steps. Data Profiling analyses whether the
source data is good or dirty.
And these 5 steps are
a) Column Analysis
b) Primary Key Analysis
c) Foreign Key Analysis
d) Cross-domain Analysis
e) Baseline Analysis
After completing the analysis, if the data is good there is no problem. If the data
is dirty, it is sent for cleansing. This is done in the second phase.

Data Quality:-
Data Quality, after getting the dirty data, cleans the data by using 5
different steps.
They are
a) Parsing
b) Correcting
c) Standardizing
d) Matching
e) Consolidating

Data Transformation:-
After completing the second phase, it gives the golden copy.
The golden copy is nothing but a single version of the truth.
That means the data is a good one now.

56. SERVER COMPONENTS OF DATASTAGE 7.5X2:


There are three server (architecture) components in datastage 7.5x2.
They are
a) Repository
b) Server (Engine)
c) Datastage Package Installer

Repository:--
Repository is an environment where we create job, design, compile and run
etc.
Some Components it contains are
JOBS,TABLE DEFINITIONS,SHARED CONTAINERS, ROUTINES ETC

Server( engine):-- Here it runs executable jobs that extract , transform, and
load data into a datawarehouse.

Datastage Package Installer:--


It is a user interface used to install packaged datastage jobs and plugins.

57. SURROGATE KEY IN DATASTAGE


A Surrogate Key is a unique identification key. It is an alternative to the natural key.
A natural key may be an alphanumeric composite key, but the
surrogate key is
always a single numeric key.
The Surrogate Key stage is used to generate key columns, for which characteristics can
be
specified. The surrogate key generator produces sequential, incremental and unique
integers from a
provided start point. It can have a single input and a single output link.

58. IMPORTANCE OF THE SURROGATE KEY:


A Surrogate Key is a primary key for a dimension table. (The surrogate key is an
alternative to the natural primary key.) The main importance of using a surrogate key is that it is
not affected by the changes going on in the database.
And with a surrogate key the natural key values may repeat (as in Type-2 SCDs), which cannot
happen with the primary key itself.
By using a surrogate key we can continue the sequence for any job. If any
job was aborted after n records were loaded, by using the surrogate key you can
continue the sequence from n+1.

59. ROLES AND RESPONSIBILITIES OF DEVELOPER:


1) Preparing Questions
2) Logical Designs ( i.e Flow Chart )
3) Physical Designs ( i.e Coding )
4) Unit Testing
5) Performance Tuning.
6) Peer Review
7) Design Turnover Document or Detailed Design Document or Technical
design Document
8) Doing Backups
9) Job Sequencing ( It is for Senior Developer )

60. RCP (RUNTIME COLUMN PROPAGATION):


RCP is nothing but Runtime Column Propagation.
When we run Datastage jobs, the columns may change from one stage to
another
stage. With RCP enabled, any columns arriving at a stage that are not explicitly
defined in the job design are automatically propagated through to the
target at run time, so we do not need to define every column in every stage.
If we want only the explicitly defined (required) columns to be loaded into
the target, we disable RCP.

61. PERFPRMANCE TUNING IN DATA STAGE:


It is very important to do performance tuning in any datastage job.
If the job is taking too much time to run, we need to
modify the job design so that we can give good performance to the job.
For that:
a) Avoid using the Transformer stage wherever possible. For example, if you
are using the Transformer stage only to change column names or to drop
columns, use the Copy stage rather than the Transformer stage. It will
give better performance to the job.
b) Take care to choose the correct partitioning technique, according to the job and the
requirement.
c) Use user-defined queries for extracting the data from databases.
d) If the reference data is small, use SQL join statements rather than a Lookup
stage.
e) If you have a large number of stages in the job, divide the job into multiple
jobs.

62. DATASTAGE INSTALLATION STEPS FOR WINDOWS:


Doing the Datastage installation is a little bit time consuming, so you need to set aside
some time for it. It will take a maximum of 4 hours to install
Datastage on your system.
Here you are going to learn how to install the Datastage 8.0.1 version.

Before continuing with the Datastage installation you need to check your
system requirements.
Your system should have
a) A minimum of 2 GB RAM
b) Windows Server 2003 (you can also install on Windows XP, but
it is better to have Windows Server 2003)
c) You need to install Oracle 9i/10g before the installation of Datastage.
d) You need to keep the firewall off.
Open Your Cd to Install
1) Click on Install.exe
On the next Screen You can see
IBM Information Server
-----Client
-----Engine
-----Domain
-----Metadata Repository
-----Select All

Click on Select All and Next


Than Give File name from Browse
2) Select Product Module and Component.
3) Select Installation Type as
Typical
4) In Database Server Selection Select Install DB2 Version 9.1
5) In Metadata Repository Configuration

Databaser Owner------- xmeta


Password ------- xmeta
Confirm Password------ xmeta
Database name -------- xmeta
Databasser Instance

Database Location : c:location


Next
6) In websphere Application Server Select Install Websphere Application
Server

Next
7) In Websphere Server Administrator Information
Username ------- admin
password ------- admin
confirm password--admin

Administrator
Click on Start

8)Websphere Datastage Provides


Click on New Project
Give any name ( For example Project1)
ok

9) In Information Analysis Database


Database Owner : Welcome
Password : Welcome
Database Location : c: Location

10) In DB2 Server Selection


c:/ibm/ -----

11) Db2 Instance ownwe information


instance owner : db2 admin
password : server
confirm password : server
Instance Name: Db2inst
Instance part number: 50000
Click on Next

12) In odbc Drivers


Open Database Connectivity
Directory Name
Click ok Next

13) Job Monitoring port Configuration


First tcp/ (50000)

14) In National Language support


click on install NLS for websphere datastage server
Click on Next

15) IBM Websphere MQ Plugin selection


click next

16) Oracle Operator Configuration


click on configure an existing

Oracle 10g ( Which you have installed in your system already)


In IBM Information Server Desktop Shortcuts
Click tick

17) Pre-installation summary


Click on Install
If you got any warnings in the process just click ok

18) Restart when ever required, if it asks.


Click Finnish

19) Click on Webconsole Information Server


Start----Control Panel ---- Administrative Tools---Services

In services
a) ASB Agent Started Automatic
b) Datastage Engine Started Automatic
c) Datastage Telnet Started Automatic
d) DB2- DB2 Copy Started Automatic
e) IBM Websphere App Server v6 Started Automatic

20) Oracle Service you can give as orcl or oracle

21) In IBM Information Server


Login
username admin
password admin

22)In Tools -- Internet Security -- -Internet -- Custom level--prt all enable


User Authentication
Anonymous
Click on ok

You will get the message as


Welcome TO IBM

63. IN DATASTAGE, IN WHICH DATABASE IS THE DATA STORED?


First of all, Datastage is a front-end tool; it is for designing jobs like
parallel jobs etc. We create a folder in the system, and the data is stored in
that folder.
That means the data is stored according to the path we give.

64. WHAT IS DATASTAGE? ASCENTIAL DATASTAGE? HISTORY OF DATASTAGE? WHAT IS THE DATASTAGE ETL TOOL?
Datastage is a comprehensive ETL tool which provides end-to-end ERP
solutions.
It is a powerful tool in the market for end-to-end ERP solutions. ETL is
nothing but Extract, Transform, Load.

History of Datastage:
In 1997, Vmark (a top-100, UK based company) created it. Mr. Lee
Cheffler was the father of Datastage.
At that time "Datastage" was called Data Integrator.

And this product has been acquired by many companies. It went to
Torrent and then into the hands of Informix.
Informix had the popular product Database, and then they also had Data
Integrator.
In later years, in 2000, I.B.M acquired the Informix Database product.
Informix then changed the name of the company to Ascential and they
changed the name of the product to Datastage Server Jobs in 2000.
Later, in 2002, Ascential Datastage was integrated with Orchestrate (PX, UNIX).
Orchestrate is another tool.
Datastage got its parallel capabilities from 2002 onwards by integrating with
Orchestrate.
This software worked only in the Unix environment.
In December 2004, Ascential Datastage 7.5X2 was integrated with the MKS Toolkit.
This is used to run the software in the Windows environment. The toolkit
creates a partial unix environment in Windows to run the Datastage software.
And the Ascential suite components are like
Profile Stage
Quality Stage
Audit Stage
Meta Stage
Datastage PX
Datastage TX
In 2005 I.B.M acquired everything of Ascential (with Datastage)
and named it I.B.M Datastage:
I.B.M Datastage 7.5X2

65. ASCENCIAL DATASTAGE NAMING CONVENTIONS:

JOB NAME PREFIXES

Job prefixes are optional but they help to quickly identify the type of job and
can make job navigation and job reporting easier.
Parallel jobs - par
Server jobs - ser
Sequence jobs - seq
Batch jobs - bat
Mainframe jobs - mfe

STAGE NAMES

The stage type prefix is used on all stage names so it appears on metadata
reports that do not include a diagram of the stage or a description of the
stage type. The name alone can be used to indicate the stage type.

Source and target stage names identify the name of the entity, such as a
table name or a sequential file name. The stage name strips out any dynamic
part of the name - such as a timestamp - and file extensions.

Database stage - db_table name


Dataset - ds_datasetname
Hash file - hf_hashfilename
Sequential file stage - sf_filename

The prefix identifies the source type; the rest of the name indicates how to
find that source outside of DataStage or how to refer to that source in
another DataStage job.

Transformation stages

Aggregation - AG_CalculatedContent (Prices, SalesAmounts,


YTDPrices)
Changed Data Capture - CD
Funnel - FO_FunnelType (Continuous, round robin)
Lookup - LU
Pivot - PI
Remove Duplicates - RD
Sort - SO_SortFields
Transformer - TR_PrimaryFunction (HandleNulls, QA, Map)

LINK NAMES

The link name describes what data is travelling down the link. Link names
turn up in process metadata via the link count statistics so it is very
important to use names that make process reporting user friendly.

Only some links in a job are important to project administrators. The link
naming convention has two types of link names: - Links of importance have
a five letter prefix followed by a double underscore followed by link details.
- Intermediate links have a link name without a double underscore.

Links of Importance:
- The first primary link in a job consists of SourceType(char2)pri(primary).
- Any link from a reference source consists of SourceType(char2)ref(reference).
- Any link loading to a target consists of TargetType(char2)UpdateAction(char3).
- Any reject link consists of SourceType(char2)rej(reject).

Any project can add new links of importance, such as the output count of a
remove duplicates or aggregation stage.

Example: dbpri__stockitem is the first link in a job. dbups__stockitem is the


link loading to a target database table with an upsert option. dbref__orgcodes
is a reference lookup to of orgcodes to a database table. dbrej__stockitems is
a reject of upserts to the stockitem table.

You can then produce a pivot report against the link row count statistics to
show the row counts for a particular job using the five letter prefix as for
each type of row count.

Documented By
Bhaskar Reddy.A
Mail.abreddy2003@gmail.com
Contact:91-9916355577

