
11. How to run a Shell Script within the scope of a Data stage job?

A) By using the "ExecSH" command in the Before/After job subroutine properties.
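
For example, a hypothetical before-job setting (the script path is a placeholder):

Before-job subroutine: ExecSH
Input value: /home/dsadm/scripts/prepare_input.sh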

12. How to handle date conversions in DataStage? Convert a mm/dd/yyyy format to
yyyy-dd-mm?

A) We use a) the "Iconv" function - Internal Conversion.
b) the "Oconv" function - External Conversion.

The function to convert mm/dd/yyyy format to yyyy-dd-mm is:

Oconv(Iconv(Fieldname, "D/MDY[2,2,4]"), "D-YDM[4,2,2]")
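
For example (a minimal sketch; the input date is illustrative):

X = Oconv(Iconv("05/27/1997", "D/MDY[2,2,4]"), "D-YDM[4,2,2]")   ;* X = "1997-27-05"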

13. How do you execute a DataStage job from the command line prompt?

A) Using the "dsjob" command as follows:
dsjob -run -jobstatus projectname jobname
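
For example (hypothetical project and job names; -jobstatus makes dsjob wait for the job to finish and return its status as the exit code):

dsjob -run -jobstatus ProjectX LoadCustomers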

14. Functionality of Link Partitioner and Link Collector?


Link Partitioner: It actually splits data into various partitions or data flows using
various partition methods.
Link Collector: It collects the data coming from partitions, merges it into a single data
flow and loads to target.

15. Types of Dimensional Modeling?


A) Dimensional modeling is subdivided into 2 types:
a) Star Schema - Simple & Much Faster. Denormalized form.
b) Snowflake Schema - Complex with more Granularity. More normalized form.

18. Containers Usage and Types?


Container is a collection of stages used for the purpose of Reusability.
There are 2 types of Containers.
a) Local Container: Job Specific
b) Shared Container: Used in any job within a project.

19. Compare and Contrast ODBC and Plug-In stages?


ODBC: a) Poor Performance.
b) Can be used for Variety of Databases.
c) Can handle Stored Procedures.

Plug-In: a) Good Performance.
b) Database specific (only one database).
c) Cannot handle Stored Procedures.

20. Dimension Modelling types along with their significance


Data Modelling is broadly classified into 2 types:
a) E-R Diagrams (Entity-Relationship).
b) Dimensional Modelling.
Q 21 Explain Data Stage Architecture?
Data Stage contains two components:
Client Component.
Server Component.

Client Components:
• Data Stage Administrator
• Data Stage Manager
• Data Stage Designer
• Data Stage Director

Server Components:
• Data Stage Engine
• Meta Data Repository
• Package Installer

Data Stage Administrator:

Used to create projects.
Contains a set of project properties.

We can set the buffer size (by default 128 MB) and increase it if required.
We can set the Environment Variables.
In Tunables we have in-process and inter-process buffering:
In-process: data is read sequentially within a single process.
Inter-process: data is read as it comes, with stages running in separate processes.
It just interfaces to the metadata.

Data Stage Manager:

We can view and edit the Metadata Repository.
We can import table definitions.
We can export Data Stage components in .xml or .dsx format.
We can create routines and transforms.
We can compile multiple jobs.

Data Stage Designer:


We can create jobs, compile jobs, and run jobs. We can
declare stage variables in a Transformer; we can call routines, transforms, macros, and functions.
We can write constraints.

Data Stage Director:


We can run the jobs.
We can schedule the jobs. (Schedule can be done daily, weekly, monthly, quarterly)
We can monitor the jobs.
We can release the jobs.

Q 22 What is the Data Stage Engine?

It is the engine that runs in the background on the server and executes DataStage jobs; the server engine (DSEngine) is derived from the UniVerse database engine rather than being a Java engine.

Q 23 What is Star Schema?


Star Schema is a de-normalized multi-dimensional model. It contains a centralized fact
table surrounded by dimension tables.
Dimension Table: contains a primary key and descriptive attributes that give context to the fact table.
Fact Table: contains foreign keys to the dimension tables, plus measures and aggregates.

Q 24 Explain Types of Fact Tables?


Factless Fact: contains only foreign keys to the dimension tables, with no measures.
Additive Fact: measures can be added across all dimensions.
Semi-Additive: measures can be added across some dimensions but not others, e.g. percentages, discounts.
Non-Additive: measures cannot be added across any dimension, e.g. averages.
Conformed Fact: a measure that is defined identically in two or more fact tables, so that
the facts can be compared across dimensions using the same set of measures.

Q 25 What are stage variables?


Stage variables are variables declared in a Transformer stage and used to store intermediate values. Stage
variables are evaluated at run time (because memory is allocated at run time).
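
For example, a hypothetical stage variable svRunningTotal (names are illustrative) could keep a running total across input rows with the derivation:

svRunningTotal + InLink.Amount

Stage variables are evaluated from top to bottom for each input row, before the output column derivations.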

Q 26 What are Macros?


They are built from Data Stage functions and do not require arguments.
A number of macros are provided in the JOBCONTROL.H file to facilitate getting
information about the current job, and links and stages belonging to the current job.
These can be used in expressions (for example for use in Transformer stages), job control
routines, filenames and table names, and before/after subroutines.

DSHostName
DSProjectName
DSJobStatus
DSJobName
DSJobController
DSJobStartDate
DSJobStartTime
DSJobStartTimestamp
DSJobWaveNo
DSJobInvocations
DSJobInvocationId
DSStageName
DSStageLastErr
DSStageType
DSStageInRowNum
DSStageVarList
DSLinkRowCount
DSLinkLastErr
DSLinkName

Q 27 What are Routines?
Routines are stored in the Routines branch of the Data Stage Repository, where you can
create, view or edit. The following programming components are classified as routines:
Transform functions, Before/After subroutines, Custom UniVerse functions, ActiveX
(OLE) functions, Web Service routines

What are the command line functions that import and export the DS jobs?
Answer:
• dsimport.exe - imports the DataStage components.
• dsexport.exe - exports the DataStage components.
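
A typical export from the command line might look like this (a sketch only: host, user, project and path are placeholders, and the exact option syntax varies by DataStage version):

dsexport.exe /H=myhost /U=dsadm /P=secret MyProject C:\backup\MyProject.dsx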

What does a Config File in parallel extender consist of?


Answer:
The config file consists of the following:
a) the number of processes or nodes (the degree of parallelism);
b) the actual disk and scratch disk storage locations.
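
A minimal single-node configuration file might look like this (the host name and paths are placeholders):

{
  node "node1"
  {
    fastname "etl_server"
    pools ""
    resource disk "/data/datasets" {pools ""}
    resource scratchdisk "/data/scratch" {pools ""}
  }
}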

Did you parameterize the job or hard-code the values in the jobs?
Answer:
Always parameterize the job. The values come either from Job Properties or from
a 'Parameter Manager' - a third-party tool. There is no reason to hard-code
parameters in your jobs. The most often parameterized variables in a job are: DB DSN name,
username, password, and the dates against which the data is to be checked.

How did you connect to DB2 in your last project?

Answer:
Most of the time the data was sent to us in the form of flat files: the data was dumped and
sent to us. In cases where we needed to connect to DB2 for lookups,
we used ODBC drivers to connect to DB2 (or DB2-UDB), depending on the situation
and availability. Certainly DB2-UDB is better in terms of performance since, as you know,
native drivers are always better than ODBC drivers. 'iSeries Access ODBC Driver
9.00.02.02' - ODBC drivers to connect to AS400/DB2.

What are Routines and where/how are they written and have you written any
routines before?
Answer:
Routines are stored in the Routines branch of the DataStage Repository, where you can
create, view or edit them.
The following are different types of Routines:
1. Transform Functions
2. Before-After Job subroutines
3. Job Control Routines

How did you handle an 'Aborted' sequencer?

Answer:
In almost all cases we have to delete the data inserted by the aborted run from the database manually, fix
the job, and then run the job again.

What will you do in a situation where somebody wants to send you a file and use that
file as an input or reference, and then run the job?
Answer:
• Under Windows: use the 'WaitForFileActivity' stage in a Sequencer and then run
the job. You can also schedule the sequencer around the time the file is expected to
arrive.
• Under UNIX: poll for the file. Once the file has arrived, start the job or sequencer that depends on
the file.
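
A minimal polling sketch under UNIX (the file path and names are hypothetical):

while [ ! -f /data/in/trigger.dat ]; do sleep 60; done    # wait for the file to arrive
dsjob -run -jobstatus ProjectX LoadCustomers              # then run the job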

What is the utility you use to schedule the jobs on a UNIX server, other than using
Ascential Director?
Answer:
Use the crontab utility together with the dsjob command (typically wrapped in a shell script), with the proper parameters passed.
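
For example, a hypothetical crontab entry to run a wrapper script every day at 2 a.m. (the paths are placeholders):

0 2 * * * /home/dsadm/scripts/run_load.sh >> /home/dsadm/logs/run_load.log 2>&1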

How do you rename all of the jobs to support your new File-naming conventions?
Answer:
Create an Excel spreadsheet with the new and old names. Export the whole project as a .dsx file.
Write a Perl program which does a simple rename of the strings by looking up the Excel
file. Then import the new .dsx file, preferably into a new project for testing. Recompile all
jobs. Be aware that the names of the jobs must also be changed in your job control jobs
or Sequencer jobs, so you have to make the necessary changes to those Sequencers.

What's the difference between Datastage Developers...?


Answer:
A Datastage developer is one who codes the jobs. A Datastage designer is one who designs
the job; that is, he deals with the blueprints and designs the jobs with the stages that are
required for developing the code.

In how many places can you call Routines?

Answer:
You can call routines in four places:
1. Transform of a routine
a. Date Transformation
b. Upstring Transformation
2. Transform of the Before & After Subroutines
3. XML transformation
4. Web-based transformation

What is the Batch Program and how is it generated?

Answer: A batch program is a program generated at run time and maintained by
Datastage itself, but you can easily change it on the basis of your requirements (Extraction,
Transformation, Loading). Batch programs are generated depending on your job's nature, either a
simple job or a sequencer job; you can see this program under the job control option.

Question: Suppose 4 jobs are controlled by a sequencer (job 1, job 2, job 3, job
4). If job 1 has 10,000 rows but after the run only 5,000 rows have been loaded into the
target table, the rest are not loaded and the job aborts. How
can you sort out the problem?
Answer:
Suppose the job sequencer synchronizes or controls the 4 jobs but job 1 has a problem. In this
condition you should go to the Director and check what type of problem is shown: a data type
problem, a warning message, a job failure or a job abort. If the job fails, it usually means a data type problem or
a missing column action. So you should go to the Run window -> Click -> Tracing -> Performance, or
in your target table -> General -> Action, where there are two options:
(i) On Fail -- Commit, Continue
(ii) On Skip -- Commit, Continue.
First check how much data has already been loaded, then select the On Skip option to
continue past it; for the remaining data that was not loaded, select On Fail,
Continue. Run the job again and you should get a success message.

Question: What happens if RCP is disabled?

Answer:
In that case OSH has to perform an import and an export every time the job runs, and the
processing time of the job also increases.

What is the difference between the Filter stage and the Switch stage?

Ans: There are two main differences, and probably some minor ones as well. The two
main differences are as follows.
1) The Filter stage can send one input row to more than one output link; the
Switch stage cannot - the C switch construct has an implicit break in every case.
2) The Switch stage is limited to 128 output links; the Filter stage can have a theoretically
unlimited number of output links. (Note: this is not a challenge!)
Advantages of DataStage?
Answer:

Business advantages:

• It helps in making better business decisions;
• It is able to integrate data coming from all parts of the company;
• It helps to understand new and already existing clients;
• We can collect data on different clients and compare them;
• It makes research into new business possibilities possible;
• We can analyze trends in the data it reads.

Technological advantages:

• It handles all company data and adapts to the needs;
• It offers the possibility of organizing complex business intelligence;
• It is flexible and scalable;
• It accelerates the running of projects;
• It is easily implementable.

What is the architecture of Data Stage?

Basically the architecture of DS is a client/server architecture.

Client components & server components

There are 4 types of client components:


1. Data stage designer
2. Data stage administrator
3. Data stage director
4. Data stage manager

Data stage designer is used to design the jobs.

Data stage manager is used to import & export the project and to view & edit the
contents of the repository.

Data stage administrator is used for creating projects, deleting projects & setting
the environment variables.

Data stage director is used to run the jobs, validate the jobs, and schedule the jobs.

Server components

DS server: runs executable server jobs, under the control of the DS director, that extract,
transform, and load data into a DWH.
DS Package installer: a user interface used to install packaged DS jobs and plug-ins.

Repository or project: a central store that contains all the information required to build
DWH or data mart.

I have some jobs where every month the log details must be deleted automatically. What steps do you
take for that?

We have to set the auto-purge option in the DS Administrator.

I want to run multiple instances of a single job. How can you handle that?

In the job properties, set the option ALLOW MULTIPLE INSTANCES.
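
With multiple instances enabled, each run is addressed as jobname.invocationid; for example (hypothetical names):

dsjob -run -jobstatus ProjectX LoadCustomers.RegionA
dsjob -run -jobstatus ProjectX LoadCustomers.RegionB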

What is version controlling in DS?

In DS, version controlling is used for backing up the project or jobs.

This option is available from DS version 7.1 onwards.
Version control tools are of 2 types:
1. VSS - Visual SourceSafe
2. CVS - Concurrent Versions System

VSS is designed by Microsoft, but the disadvantage is that only one user can access a file at a time;
other users must wait until the first user completes the operation.
With CVS, many users can access the repository concurrently.

What is the difference between clear log file and clear status file?

Clear log: we can clear the log details by using the DS Director. Under the Job menu the
Clear Log option is available. By using this option we can clear the log details of a
particular job.

Clear status file: lets the user remove the status of the records associated with all
stages of the selected jobs (in DS Director).

I developed a job with 50 stages; at run time one stage is missing. How can you
identify which stage is missing?

By using the Usage Analysis tool, which is available in DS Manager, we can find out which
items are used in the job.

How to do row transposition in DS?

The Pivot stage is used for transposition. Pivot is an active stage that maps sets of
columns in an input table to a single column in an output table.

If a job is locked by some user, how can you unlock that particular job in DS?

We can unlock the job by using the Clean Up Resources option, which is available in DS
Director. Otherwise we can find the PID (process id) and kill the process on the UNIX server.
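
For example, on the UNIX server (a hypothetical lookup; process names vary by version):

ps -ef | grep dsapi_slave    # find the client session holding the lock
kill <PID>                   # then kill that process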

I am getting an input value like X = Iconv("31 DEC 1967", "D"). What is the X value?

The X value is zero.
The Iconv function converts a string to an internal storage format. It treats 31 DEC 1967 as
day zero and counts days from that date (31-dec-1967).
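
For example:

X = Iconv("01 JAN 1968", "D")   ;* X = 1, one day after the zero date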

I have three jobs A, B and C, which are dependent on each other. I want to run the A & C
jobs daily and the B job only on Sunday. How can you do it?

First, schedule the A & C jobs Monday to Saturday in one sequence.
Then take the three jobs, ordered according to their dependency, in one more sequence and schedule
that sequence for Sunday only.

Before-Stage and After-Stage Routines

Because the Transformer stage is an active stage type, you can specify routines to be
executed before or after the stage has processed the data. For example, you might use a
before-stage routine to prepare the data before processing starts. You might use an
after-stage routine to send an electronic message when the stage has finished.

Date Conversions
The following examples show the effect of various D (Date) conversion codes.

Conversion Expression                           Internal Value

X = Iconv("31 DEC 1967", "D")                   X = 0
X = Iconv("27 MAY 97", "D2")                    X = 10740
X = Iconv("05/27/97", "D2/")                    X = 10740
X = Iconv("27/05/1997", "D/E")                  X = 10740
X = Iconv("1997 5 27", "D YMD")                 X = 10740
X = Iconv("27 MAY 97", "D DMY[,A3,2]")          X = 10740
X = Iconv("5/27/97", "D/MDY[Z,Z,2]")            X = 10740
X = Iconv("27 MAY 1997", "D DMY[,A,]")          X = 10740
X = Iconv("97 05 27", "DYMD[2,2,2]")            X = 10740

Date Conversions
The following examples show the effect of various D (Date) conversion codes.

Conversion Expression                           External Value

X = Oconv(0, "D")                               X = "31 DEC 1967"
X = Oconv(10740, "D2")                          X = "27 MAY 97"
X = Oconv(10740, "D2/")                         X = "05/27/97"
X = Oconv(10740, "D/E")                         X = "27/05/1997"
X = Oconv(10740, "D-YJ")                        X = "1997-147"
X = Oconv(10740, "D2*JY")                       X = "147*97"
X = Oconv(10740, "D YMD")                       X = "1997 5 27"
X = Oconv(10740, "D MY[A,2]")                   X = "MAY 97"
X = Oconv(10740, "D DMY[,A3,2]")                X = "27 MAY 97"
X = Oconv(10740, "D/MDY[Z,Z,2]")                X = "5/27/97"
X = Oconv(10740, "D DMY[,A,]")                  X = "27 MAY 1997"
X = Oconv(10740, "DYMD[2,2,2]")                 X = "97 05 27"
X = Oconv(10740, "DQ")                          X = "2"
X = Oconv(10740, "DMA")                         X = "MAY"
X = Oconv(10740, "DW")                          X = "2"
X = Oconv(10740, "DWA")                         X = "TUESDAY"

• Complex Flat File Stage:


A new parallel Complex Flat File stage has been added to read or write files that contain
complex structures (for example groups, arrays, REDEFINES, OCCURS DEPENDING ON, etc.). Arrays
from a complex source can be passed as-is, or optionally flattened or normalized.
