DS Faqs

TCS
1.What are the sources systems and target systems in your project?
-> source: flatfiles,oracle,(any database).
-> target: db2(8.1px),orcle,sql-server.
2.what are the ETL stages you have used in your project?
-> transformer,sort,join,dataset,lookup,change capture,aggrigrater.
3.what are the difference between oracle and odbc enterprise stages?
4.which stage gives good performance?
-> dataset stage
5.what is the diff between dataset and sequentical stages?
->SEQ FILE:
1.extract/load from/to seq file.max==2GB.

2.when used as a source at the time of compilation it will be converted into native
format from ASCII.
3.By default it will be processed in sequence only.
4.doesn't support null values.
5.processed at the server.
6.supports .csv,.txt,.xls etc...
7.does not support lkp fileset.
-> DATASET:
1.used as an intermediate stage.

2.at compile time conversion is not required.
3.datasets get processed in our local system.so performance is improved as the server
is not loaded.
4.supports only .ds
5. 2GB limit is not thete
6.support lkp
6.what are the diff processing stages you have used in your project?
-> transformer,modify,changeapply,merge,sort,survgate.
7.what is the shared container? what is the purpose of using this stage?
-> create reusable object that many jobs within the project can include.
-> When we go for parallel shared container the logic can be reusable across many
jobs
8.what is the funnel stage?
-> this is stage is use to comibne maltiple input data(same metadata) to single outputdta.
1.CONTINOUS FUNNEL: NO PARTCULAR ORDER( LOAD WITHOUT
ORDERING).
2.SORT FUNNEL: IT LOAD IN PARTCULAR ORDER(ASCENDING OR
DECENDING).
3.SEQVANCE FUNNEL: IT READ FIRST INPUT FIRST AFTER SECOND INPUT
SECOND LOAD
9.what is the diff between joner and merge stage?
10.what is the diff between joiner and lookup stage?
11.suppose if i have souce data records and lookup data is about 50000 records.
which stage is preferred to used for this requirement?
-> look up
12.suppose if i have 50000 records in source and 100 records in lookup stage,
then what is the stage preferred for this requirement?
->lookup
13.what is the diff between normal transformer and basic transformer?

-> A Basic transformer compiles in "Basic Language" whereas a Normal Transformer
compiles in "C++".
-> Basic transformer does not run on multiple nodes wheras a Normal Transformer can
run on multiple nodes giving better performance.
-> Basic transfomer takes less time to compile than the Normal Transformer.
-> Basic transformer stage can only be used for SMP(Symmetric Multiprocessors)
systems and
not for MPP(Massively Parellel Processing) system or clusters systems.
USAGE :
-> A basic transformer should be used in Server Jobs.
-> Normal Transformer should be used in Parallel jobs as it will run on multiple nodes
here giving better performance.
14. what is the diff between parallel and server jobs?

-> The basic difference between server and parallel jobs is the degree of parallelism
-> Server job stages do not have in built partitoning and parallelism mechanism for
extracting and loading data between different stages.
-> Datastage compiled in to BASIC(interpreted pseudo code) and Parallel compiled to
OSH(Orchestrate Scripting Language).
-> Server Transformer supports basic transforms only, but in parallel both basic and
parallel transforms.
-> In File stages, Hash file available only in Server and Complex falat file , dataset ,
lookup file set avail in parallel only.
-> Parallel Jobs haveing a huge amount of data to maintains.
-> server Jobs haveing a low amount of data to maintains.
-> Parallel Jobs it having a more stages comperies server Jobs .
-> Parallel Jobs ARE supports smp,mpp.
-> server Jobs support smP.
15.what is the stage you have used to load the data into teradata table?
-> teradeta api stage
16.what is the multinode concept in teradata?
17.what is the slowly changing dimension?
-> it explain how to capture the changes in the target over the period of the time.
-> it explain change data capture.
18.how do u capture the changes in slowly changing?
-> Type I: Replace the old record with a new record with updated data there by we lose
the history.
But data warehouse has a responsibility to track the history effectively where
Type I implementation fails.
-> Type II: Create a new additional dimension table record with new value. By this way
we can keep the history.
We can determine which dimension is current by adding a current record flag or
by time stamp on the dimensional row.
-> Type III: In this type of implementation we create a new field in the dimension table
which stores the old value of the dimension.
When an attribute of the dimension changes then we push the updated value to
the current field and old value to the old field.
19.what is the surrogate key?
-> A surrogate key is a system-generated (non-meaningful from a business

perspective).it specified by uniqness
-> It is the system generated key which cannot be edited by the user.
-> It is the primary key of the dimension table in warehouse.
-> It is nothing but the sequence which generates the numbers for primary key column.
20.what is the diff between narmalized model and dimensional model?
-> normalized data is held in a very simple structure. The data is stored in tables.
-> Each table has a primary key and should contain data relating to one entity – so a
normalized customer table contains only data about customers.
-> We need to make logical connections between the entities (for example, this
customer placed these orders).
-> To do this we use a foreign key in the orders table to point to the primary key in the
customer table.
21.what is the star schema?
-> A relational database schema organized around a central table (fact table) joined to
a few smaller tables (dimension tables) using foreign key references.
-> The fact table contains raw numeric items that represent relevant business facts.
-> Star schema is a type of organising the tables such that we can retrieve the result
from the database easily and fastly in the warehouse environment.
Usually a star schema consists of one or more dimension tables around a fact table
which looks like a star,so that it got its name.
22.what is the ods and edw?
-> ods mean Operational Data Store

-> it maintain the transactional date(current data)
-> in ods tha data is volatile(it can be changed)
-> EDW means enterprize datawarehouse
-> it maintain histrocial data for long time
-> in edw data is non-volatile(it cant chane)
23.when do u use ods and edw?
-> ODS can be described as a snap shot of the OLTP system.It acts as a source for
EDW(Enterprise datawarehouse).
-> ODS is more normalised than the EDW.Also ODS doesnt store any
history.Normally the Dimension tables remain at the ODS
(SCD types can be applied in ODS)where as the Facts Flow till the EDW.
-> edw maintain histrocialdata.
-> it is very use ful to decision making for enterprize.
24.WHAT IS DATAMART?
-> A data mart is a subset of an organizational data store, usually oriented to a specific
purpose or major data subject,
that may be distributed to support business needs.
25.if i want to delete dataset file in the unix, how do i do that?
-> YA, ORCHADMIN RM <DATASET NAME>
26.how do u schedule the jobs?
-> using datastahe director-> tools
27.how do you scedule the jobs in job sequence?
-> 1. Open Datastage

-> 2. Open New Job Sequence
-> 3. Save it with some name.
-> 4. A your job sequence is created.
-> 5. Now drag the job sequence stages-->as per your business requirement.
28.what is the normal view and materialized view? when do u used which view?
VIEW
-> A view takes the output of a query and makes it appear like a virtual table. You can
use a view in most places where a table can be used.
-> All operations performed on a view will affect data in the base table and so are
subject to the integrity constraints and triggers of the base table.
-> A View can be used to simplify SQL statements for the user or to isolate an
application from any future change to the base table definition.
-> A View can also be used to improve security by restricting access to a predetermined
set of rows or columns.
MATERLIZED VIEW
-> Materialized views are schema objects that can be used to summarize, precompute,
replicate, and distribute data. E.g. to construct a data warehouse.
-> A materialized view provides indirect access to table data by storing the results of a
query in a separate schema object.
-> Unlike an ordinary view, which does not take up any storage space or contain any
data.
-> The existence of a materialized view is transparent to SQL, but when used for query
rewrites will improve the performance of SQL execution
-> An updatable materialized view lets you insert, update, and delete.
29.what it the buildops?
-> Buildops are good if users need custom coding but do not need dynamic (runtime-
based) input and output interfaces.
-> Buildop provides a simple means of extending beyond the functionality provided by
PX,
but does not use an existing executable (like the wrapper).
Reasons to use Buildop include:
-> Speed / Performance

-> Complex business logic that cannot be easily represented using existing stages, like
advanced lookups across a range of values,
custom surrogate key generation, calculating olling aggregates
-> Build once and reusable everywhere within project, no shared container necessary
-> Can combine functionality from different stages into one
30.In datastage, which language code will be used?
-> OSH(orcistrate shell lang)
31.why do u use copy stage? can do sort in copy stage?
-> to copy tha data in multiple targets

-> Yes we do sort in copy stage.
32.in what other stage we can do copy?
-> transformer
33.what is sort stage? in what other stage we can do sort?
-> soting(ascending or descending) the data based on key.

-> using copy stage we can sort.
34.why transformer stage so costlier in development? how this stage different from
other stage?
-> ref: OSH : Archestrate shell script

-> multi purpose
-> counters and hold values for previous rows to make comparison
-> hold dervvation to be used in multiple field dervations
-> can be used to control execution of constraints
-> it compile on c++ genrated object mode
35.what is modify stage? what are all diff things we can in modify stage?
-> modify column types

-> null handling
-> date/time handling
-> string handling
-> add or drop columns
36.what is column generate stage?
-> it adds column to incoming data and generates mock data for these columns for each
data row processed
37.instead of using column generate stage can we generate a column in any stage?
-> runtime column propagation (RCP)
38.what is RCP?
-> runtime column propagation (RCP)

-> IF Your job counters extra columns that are not defined in the metadata when it
actually runs
it will adopt these extra columns and propagate them through the rest of the job.
39.diff between lookup,join and merge stage
40.can we track rejections in join stage?
-> NO
41.why do we use funnel stage?
-> the funnel stage copies multiple input data(same meta data) to a ingle output dataset
-> the operation is usepul for combining separate datasets into a single large dataset
-> a processing stage that combines datafrom multiple input links to a single output link
42.which one will give good performance between ODBC & Oracle Enterprise stages?
-> oracle enterprize(parell and easy connct)
43.How do we reject the records in sequential file stage?

-> using reject port
44.In what other stage we can reject the records?

-> all except join
45.what is filter stage?
-> to select aportion of data for checkng

-> filter the data based on condtion
46.what is difference between job parameters and environment variables?
-> Basically Environment variable is predefined variable those we can use while
creating DS job.
-> We can set eithere as Project level or Job level.
-> Once we set specific variable that variable will be availabe into the project/job
47.what is the aggregator stage?
-> which is use to calculate the summrized for a group og records

-> columns to be aggregated
48.what is shared container stage?
-> When we go for parallel shared container the logic can be reusable across many jobs
49.Have you used sequencer jobs? what are the different triggers you have used?
-> ok condition,failed warning,custom,userstatus,unconditional
50.Explain about yourself, what is your work experience and how do you rate yourself on
datawaresing?
51.what model we are doing here?

->star shema
52.suppose if we have two flat files and if we want to build one scenario where one fact
table
and two dimensions then how do you build this and what trasformations you use there?
-> one transformer,one lookup,two targets
53.why you use shared container to read read unix flat file why not some other stage?
54.what is that some other stage?
55.what is the teradata version you are using?
56.what is ur target,sources?
-> target:db2
-> source:flatfiles
57.who will create environment variables?

-> admin
58.how do you register a plug-in?
59.suppose take three jobs and if we want to run those jobs using sequencer how do you
use that and
explain.
-> arrange seqvansilly with links
60.how do the jobs run in sequecer? parallelly or sequecially?

-> we can run ant type( both server and parelle)
61.what are different stages involve in the sequencer job and explain them?
62.what is the difference between star & snow flake schemas?

for what purpose u will use them?
63.do you know narmalization techniques?
64.suppose u have build summarised tables and if we need to improve performance what
do u do?
65.what type performance u will get?
66.how do you improve load performance?

->increase memory size
67.what is an aggregator?What it do?
POLARIS-INTERVIEW
68. What is the size of database in your project?
-> 110gb
69. What is the size of data extracted in the extraction process?
-> lee then 1gb
70. How many data marts are there in your project?
-> 2
71. How many fact and dimension tables in your project?
-> 4 fact tables 10 dimension table
72. What is the size of fact table in your project?
73. How many dimension tables did you had in your project and name some dimensions?
-> 3 book,country,counter,subaccount,counterparty,security,legal
entity,intercompany,trnstype,project.
74. How many measures you have created?
-> 45
75. How do you enter into oracle?
JDA
76.What is difference between local & environment varialbles?
77.what is the routine? have you used the routines in your project?
78.what are after and before routines?
79.what are stage variables? where do we create stage variables? why do we use stage
variables?
80.how do we access the oracle database from datastage?
-> orcle enterprize,odbc
81.suppose if there is no client version in your local machine,the oracle is been installed
in unix box,then how do you get access the oracle connection.
82.can create an environment variable from your designer?

->no
83.suppose there is no datastage client components installed in your local machine then
how do you compile and run the job?
84.suppose a single job is accessed in three sequencer jobs, can we run three sequencer
jobs at a time?
-> all
85.what is job controls? when do we specify these controls and where?
86.how sequential file execute during the run time in PX? Either sequential or parallel
mode?
->seqvential
87.how do we make a sequential file stage to run in parallel mode?
-> no of readers for node =2
IBM INTERVIEW QUESTIONS (15TH OCT’07)
88. Have you built separate data marts for financial and manufacturing aspects?
89. Which schema have you used?
90. Wether you were loading data into DWH or a staging area?If loading the date into
DWH, then how do you load the data into target table?
91. How many fact and dimension tables you have used?
92. Tell me about funnel stage
93. Which stage in server jobs is equivalent to funnel stage?
94. Difference between merge stage and funnel stage
95. Explain about SCD type-2?
96. What is the order of execution: stage variable,constraints,derivation?

->yes
97. How you used to get the source?
98. Difference between data set stage and file set stage
99. If we are using SFS as source and hash files as reference, with transformer stage and
pulling data
into target without using any constraints, then which kind of join simulation we are
doing?
100. What is a factless fact? Give example?
101. What is a semi-additive fact. Give example
102. How to go for SCD type 2 in server jobs?
WIPRO INTERVIEW QUESTIONS (26TH OCT’07)

103. Tell me about your project. Was it a Datawarehousing project?
104. How many dimensions and fact tables you have developed?name some of the fact
and dimension tables
105. Have you been a part of the production team?

->no
106. Name some of the stages used by you
107. Difference between join and look-up stage
108. Difference between funnel and merge stage
109. What is the difference between stage variable and parameters?
110.If the look-up table has more than 10,000 records and the source table has
considerably less number of records,which stage will you use?look-up or join
'
-> sprase lokup
111. What is configuration file?
112. What is a node?
113. What are the different types of partitions? Hwat is the default?
114. Difference between same and entire partition
115. What different things can be done by using transformer stage
116. What all things you can do using copy stage
117. If I have a file containing date as one of the columns and I have 5 years of data.
Now I have to load the data at run time according to user requirements which is 6 months
of data. How to go for it?
-> using job parmater
TERRADATA INTERVIEW QUESTIONS (26th Oct ’07)
DATA STAGE
118. What is a configuration file?
119. have you ever created a configuration file

-> no
120. If you have a 4CPU box, what kind of configuration file you suggest i.e, how many
number of processing nodes
121. What is a pipeline parallesism? How is is different from partition parallelism
122. What is a data set?How is it different from SFS?
123. In data stage where do we store jobnames,parameters etc. what is the name of the
place/database where we store them
124. What do you mean by restartability of sequencer
125. What is meant by multiple instances
126. Have you developed any routines or custom transforms
127. What is meant by shared containers? What are the different types of shared
containers?what is the difference between them
128. Difference between look-upstage,join stage and merge stage?
129. How to optimize and tune the parallel jobs for better performance
130. What is an environment variable
ORACLE
131. What is a correlated subquery
132. What is bitmap-indexing
133. What is a materialized view
UNIX
134. What is shell scripting
DWH
135. What is the difference between a fact table and a dimension table
136. What is confimed dimension

137. What is a factless fact
138. What is the best architecture?
139. Explain SCD type III
IBM Interview 14th and 15th June-2007
140. What is invocation id?
141. What is RCP?
142. Difference between stage and environmental variables?
143. What is array size and transaction size?
144. In a sequence of 5 jobs, if aborts at 3rd job. If we run the sequence, from where it
starts processing?
145. Difference between sparse and normal lookup and where we use sparse. Can give
one situation?
146. Data set will be stored in two formats. What are they?
147. How do you invoke the UNIX scripts in Data stage jobs?
148. What is Email task?
149. When do you use separate sort stage and inbuilt sort utility in a stage? What is the
difference between these two?
150. Have you run the jobs with odd number of nodes? If not why?
151. How many nodes have you used in your project?
152. Have you created environmental variables?
153. Difference between sequential file and data set?
154. If a job is aborted. Can you run it with out compiling?
155. How do you committ the records while loading?
156. What is the purpose of check points?

157. What are job Controls?
158. How to call the stored procedures in data stage jobs? Can you specify constraints?
159. How do you compile stored procedures?
160. If a job has 3 stages and nodes=3 then how many process you get?
161. A sequential file has 100 records. How to use first 19 records in the job?
162. How to check the number of nodes while running the job in UNIX environment?
163. Have you used the load option?
164. A transformer stage is running in parallel. How can you change to sequential?
165. What is preserve partitioning?
166. Configuration file is specified at project level and at job level. Which one will
override the other?
-> may be job level

->
167. In UNIX you don’t know some commands. While running the jobs you need the
commands, then how can you get them?
168. What is isolation?
-> see below
170. What is NLS?
-> national lang support
171. What command is used to search a string?
-> grep command:-> is used for finding any string in the file.
172. What commands do you know in UNIX?
-> grep ( searchinf file)

-> echo (display file)
-> chmod ( change mode)
-> rm ( remove)
-> cp(copy)
-> ls (list of files)
173. How do you start UNIX server and stop it?
-> n a UNIX environment, you can also use the pd_start script to manually start and
stop the server processes.
-> syn: # pd_start {start|restart|stop|status}
174. Performance difference between join and merge. Which takes more memory?
-> performance = join

-> join takes more memory
175. What is Email loop activity?
-> Notification Activity - used for sending emails to user defined recipients from
within Datastage
176. A sequential file has 4 duplicate records. How do you get 3rd record?
-> no not posible in seq file
177. What types of joins can be specified in look up stage?
-> inner and outer
178. What routines you wrote?
179. If the job is aborted while running in UNIX, then what error message you get?
180. Command VI is used for what?
-> The default editor that comes with the UNIX operating system is called vi (visual
editor)
-> editing
181. Which version of UNIX you are using?

-> 5.2
182. How do you connect to UNIX?
-> using command line in admin

-> osh commands
-> orchadmin
183. Link collector is present in parallel jobs or not?
-> ya
184. To join 2 sequential files which stage you use?
-> join or merge
185. Performance difference between ODBC, Oracle Enterprise, Dynamic RDBMS?
-> may be oracle enterprize bcoz of native connection
186. What performance techniques do you use while developing jobs?
-> using partioning methods

-> types of parallel( pipeline and partion)
-> using hash key
-> enablin and increase cashe memory
INTERVIEW QUESTIONS
ORACLE Satyam Teynampet 17/04/2004
187. What are inline views?
-> An inline view is a SELECT statement in the FROM-clause of another SELECT

statement.
188. What is a subquery & correlated subquery. Tell with example?
-> In sub query the inner query is executed only once. Depeding upon the results of
inner query outer query is evaluated.
-> ex: select * from emp where deptno=( select deptno from dept where
dname='hyd');
-> In correlated subquery the inner query is evaluated once for each row processed
by the parent statement or outer query.
-> SELECT empnum, name FROM empl e WHERE sal > (SELECT avg(sal)
FROM emp WHERE department = e.department);
189. What is normalisation. Please state all forms with example?
-> Normalization is process for reducing the redency of the data.there 5 normal
forms mainly used in 3 forms4th one boyesscoded normal form.
-> First Normal form (1NF): A relation is said to be in 1NF if it has only single
valued attributes, neither repeating nor
arrays are permitted.
-> Second Normal Form (2NF): A relation is said to be in 2NF if it is in 1NF and
every non key attribute is fully functional
dependent on the primary key.
-> Third Normal Form (3NF): We say that a relation is in 3NF if it is in 2NF and has
no transitive dependencies.
-> Eliminate Columns Not Dependent On Key - If attributes
do not contribute to a description of the key, remove them to a
separate table.
190. If I select from two tables, how do I assure which table is searched first?
->
191. What are triggers. What are Row level and Statement Level triggers?
-> Triggers are an action that is performed is a condition is met within the code.
-> Row-Level Triggers
-> Statement-Level Triggers
-> Before Trigger
-> After Trigger
-> Schema Triggers
-> Database level Triggers
-> row-level trigger: A row-level trigger fires once for each row that is affected by a
triggering event
-> Statement-Level Triggers: fires once per triggering statement regardless of the
number of rows affected by the triggering event
192. Implicit and Explicit Cursors. Does it work for and update statement?
-> PL/SQL declares a cursor implicitly for all SQL data manipulation statements,
including quries that return only one row
-> explicit cursor or use a cursor FOR loop However,queries that return more than
one row.
-> An implicit cursor is used for all SQL statements Declare, Open, Fetch, Close. An
explicit cursors are used to process multirow
SELECT statements An implicit cursor is used to process INSERT, UPDATE,
DELETE and single row SELECT. .INTO statements.
Satyam G.N.Chetty Road 28/03/2004
193. Write a statement using inline views?
-> small example

-> select * from ( select * from dept);
194. Write a sql to fetch the sumof top 5 salaries department wise?
->
195. What is a snapshot to old error?
-> If you have lots of updates, long running SQL and too small UNDO, the ORA-
01555 error will appear.
-> The SNAPSHOT_TOO_OLD erro will come ... the query will goves into
cartesion product or infinite loop
-> Other case is suppose ur updated loarge no of rows at atime without saveing the
records .
In this case u can use commit statement for every 500 records then u can avoid this
problam.
Or ask DBA to extend the table space for this segment.
196. What is the difference between Function and Procedure?
-> 1. Functions must return a value while procedure may or may not return any
value.
-> 2. Function can return only a single value at a time while procedure can return
one, many or none.
-> 3. Function can be called from sql statements while procedures can't.
197. What if I give a return parameter in Function as NULL?
198. How do I complile Pl/Sql procedure from Unix Prompt?
-> x := OSCommand_Run('/home/test/myoscommand.sh')
199. A view has three base tables. One of the column of a table is removed. What
happens to the view.
-> a pattel error occur
200. How do I enable the above view again.
201. How to read a flat file using utl file?
-> "UTL_FILE" allows PL/SQL programs to both read and write to any operating
system files that are accessible from the server
on which your database instance is running.
ex: utl_file_dir = c:sampledata
202. What are Ref cursors?
-> CURSOR IN WHICH QUERY CAN BE CHANGED AT RUNTIME IS

CALLED REF CURSOR
-> ref cusor is a dynamic type of cursor which is used to provide Reusability of
cursor.......
-> Ref Cursors also known as Dynamic cursors cane be associated with as many
SELECT statements you want at different times.
You can associate with different SELECT stataments at dynamic time
-> two types in dynamic cur
1) Strong- Return type
2) weak- No Return type
203. What are the Build in packages you have used?
204. What does a change capture do

->same question below
205. What does a transformer stage do?
-> dervations
-> constrains
-> experssions for constraints and dervations can referance
-> job parameters
-> funcations
-> system varaiables and cinstraints
-> stage variables
-> external routines
206. What is a sequencer job and why do we use it
-> to control the flow of jobs

-> it excutes jobs in seqvantially.
207. Why do we run jobs thru shell scripts
208. What is making a job as of multiple instances ?
-> same question below
209. What is the link ordering in the datastage?
-> -> same question below
210. What is the combinality mode in the teradata stage?
211. What is Hashed file and where do we use it.
235.what is multiple instance of making the job?

236. what is the fast multi node & fast default node
->
237. what is the routine?what it does?
same question below
238. what is the shared container?how & where do u use shared containers?
239. Tell me few runtime errors u struggled with?
-> if transformer used check c++ compiler

-> if buildop errors try bulidop from command line
-> chek enveronment variables settings
-> examine genrated osh
-> some stages not supported rcp
-> null handling errors...........
240. what change capture stage do?
-> change capture stage catch holds of changes from two different datasets and
generates a new column called change
code....change code has values
0-copy
1-insert
2-delete
3-edit/update
241. Tell few stages u used in ur development environment?
-> explain any stages
242. what is the defferences between server job & parallel job & main frame jobs?
-> Server jobs. These are available if you have installed DataStage Server.
They run on the DataStage Server connecting to other data sources as
necessary.
-> Parallel jobs. These are only available if you have installed Enterprise Edition.
Server jobs can be run on SMP MPP machines.Here performance is low
i.e speed is less
-> Parallel jobs can be run only on cluster machines .Here performance is high i.e
speed is high
-> These run on DataStage servers that are SMP MPP or cluster systems. They can
also run on a separate z/OS (USS) machine if required.
243.what is the defferences between server shared container & parallel shared containers?
-> SERVER SHARED CONTAINERS CAN BE USED IN BOTH SERVER &

PARALLEL JOBS,BUT PARRALLEL CONTAINERS CAN NOT BE USED IN
SERVER JOBS.
jobs
-> Server shared containers contain server stage types, parallel shared containers
contain parallel stage types.
244. where is the repository comes into picture?
-> Repository resides in a spcified data base. it holds all the meta data rawdata
mapping information and all the respective mapping information.
-> it contain jobs,routines,tabledefinations etc.......
245. what is the reject file ordering?
246. what are the responsibilities of Datastage Manager?
-> import and export metadata

-> backup and recovery the project
-> use the mmanager to store and manage reusable metadata for the jobs you define in
desiner
-> this meta data includes table and file layouts and routines for transforming extracted
data.
247. what is the Analysis tool? what we do with that?
-> may be usage analysis
248. What do we do with Reporting Assistant?
249. What change capture stage do?
-> change capture stage catch holds of changes from two different datasets and
generates a new column called change
code....change code has values
0-copy
1-insert
2-delete
3-edit/update
250. what is the link ordering in the datastage?
-> U can edit the order of the input and output links from the Link ordering tab in the
stages.
251. compare the informatica & datastage tools in view of performance
-> Main difference lies in parellism Datastage uses parellism concept through node
configuration where Informatica does not
-> Partitioning - Datastage PX provides many more robust partitioning options than
informatica.
You can also re-partition the data whichever way you want.
-> Parallelism - Informatica does not support full pipeline parallelism (although it
claims).
-> File Lookup - Informatica supports flat file lookup but the caching is horrible.
DataStage supports hash files lookup filesets datasets for much more
efficient lookup.
-> Merge/Funnel - Datastage has a very rich functionality of merging or funnelling the
streams.
In Informatica the only way is to do a Union which by the way is always
a Union-all.
252. how many jobs u have been created?
-> 100 + jobs in last six months
253. Explain few of the jobs u created?
-> sort
-> aggrigrater
-> remove duplicate
-> surogate key
-> transformer
254. how do u create stage variables in transformer stage?
-> transformerstage -> stage properties -> varables
255. what is the combinality mode in the teradata stage?
256. what is the datastage debugger? where it runs?
-> it check various locations

-> it check and debug the portion of data or data
-> like peek,head,tail......
257. how do u test your jobs at client place?
-> using datastage director
258. if input & output column datatypes are not matched and not used proper conversion
process
then there is an error like this.
-> pattel error datatypes mis match

-> bad conversion error
259. When checking operator: When binding output schema variable "outRec": When
binding output
interface field "CPSC_CODE" to field "CPSC_CODE": Converting nullable source to
non-nullable
-> result; fatal runtime error could occur (use modify operator to specify value to
which null
should be converted)
260. what is the architecture of datastage?
-> client-server architeck
261. how do you define and use the job parameters?
-> create in job properties

-> used at runtime
262. what is stage variables,system variables,environment varaibles?
-> Stage Variable - An intermediate processing variable that retains value during read
and doesnt pass the value into target column.
-> system variable - System variables have a predefined type and structure that cannot
be changed. When an expression is stored into a system variable,
it is converted to the variable type, if necessary and possible.
263. how to use routines in datastage?
-> write the routine in C or C++ create the object file and place object in lib directory.
-> now open disigner and goto routines configure the path and routine names
-> Routines are stored in the Routines branch of the DataStage Repository, where you
can create, view or edit.
1) Transform functions
2) Before-after job subroutines
3) Job Control routines
264. what is difference between shared-container and local-container?
-> A shared container can be shared between jobs and is great for re-usability.
-> local container can only be used in the job in which it is created,
it is useful for simplifying job design by breaking the job into different sections.
265. how do you connect to oracle ?

-> using odbc connections
266. what is TrimB and TrimF functions in datastage?
-> Trim - removes spaces and tabs (or %stripchar%)

-> TrimB - removes all trailing spaces and tabs
-> TrimF - removes all leading spaces and tabs
267. please explain any ETL process that you have developed?
-> u explain any stages ( including ur job)
268. What is hash file ? Types of hash files.
-> used for look-ups it is like a referance table

-> it is also used in-place od odbc,oci tables fo better performance
-> two types, static hashfile,dynamic hash file
269. If ur are doing any changes in shared-container will it reflect in all the jobs wherever
you used this shared- container?
-> ya
270. have u written any custom routines in ur project? if so explain?
271. how to make read-only routines in datastage?is it possible if so how?
-> ya one aption is there
272. what are different field operators in routines?
273. How do you get log info into a file?
275. What is isolation level and when do you use them?
-> Transactions not only ensure the full completion (or rollback) of the statements that
they enclose but also isolate the data modified by the statements.
-> The isolation level describes the degree to which the data being updated is visible to
other transactions.
276. What is before job subroutine/After job subroutine? When do you use them?
-> Before-stage subroutine Contain the name (and value) of a subroutine that is
executed before the stage starts to process any data.
-> After-stage subroutine Contain the name (and value) of a subroutine that is
executed after the stage has processed the data
277. How do you clear the lock of a job?
->ds dirctor jobs cleanup resources
278. How do you backup and restore the project?
-> using datastage manager ( datastage web sphered 7.5px)

-> using datastage director ( datastahe information server8.1)
279. What is Clear Status File and when do you use it?
-> ds dirctor-> job -> clear status file
280. What is Cleanup Resources and when do you use it?
-> ds dirctor jobs cleanup resources ( we use at when job locking)
281. can i join a flat file and oracle and load into oracle?is this possible?
-> ya
282. how can u do incremental aggregation in datastage?
-> may be using change data capture (cdc)
283. while loading some data into target suddenly thier is a problem loading process
stopped how can u start loading from the records that were left?
-> startin at error stoping( using restatble check points)
......................
284. what are the general problems that u face in datastage?

-> if transformer used check c++ compiler
-> if buildop errors try bulidop from command line
-> chek enveronment variables settings
-> examine genrated osh
-> some stages not supported rcp
-> null handling errors...........
285. Is it possible to rollback a set of jobs when after some jobs are executed, if the latest
job fails?
-> no
286. Could DS generate data dictionary of source database and target database ?
-> ya
287. What are the various reports that could be generated using this tool ?
-> usage analysis
288. Could DS show record length, total db size for source and target based on existing
source data to arrive at target
database sizing?
-> no
289. Does DS support data scrambling?
-> ya
290. Compare ETL features of DS with that of its competitors ? Chart is essential.
-> yes parallel
291. Does DS generate Audit Trail?
292. What other reports could DS provide apart from Impact Analysis Document ? Is
customizable reporting feature available?
-> ya (one option is there)
293. Could DS generate test cases to verify the veracity of mapping etc at design time to
validate the mapping and transforms?
-> ya
294. Is there any mechanism available in Transformer Stage to show the left out
nand un mapped fields from source and target stages?
-> RCP
295. Could DS handle object relational database table structures?
-> yes
296. Does Meta Data export of Copy book and that data in the same available in DS ?
-> yes
297. How is data compression handled in DS ?
-> compare stage
peter.zeglis@ascentialsoftware.com
Learn it from ValueCap. They are ex-developers of PX. They can conduct on-site px
course (with all bell and whistles).
lucy.luzza@ascentialsoftware.com
CSC (Hyderabad) – 01/06/07.
298. How you are getting your source?
-> flat files
299. How many routines you have made so far?

-> 4
300. How will you connect to ODBC?
-> computer controle pannel -> admin tools -> data source -> system dsn-> dsn
name,tsn name, pwd
301. Is ODBC is used to connect the database or to store the data?

-> ya
302. Your output from ETL is used to analyses the data for others? So, what type of
data ware house it is?
-> enterpize datawarehouse
303. Your project is slowly changing dimension, then how frequently you are getting
your data? i.e weekly/monthly
-> weekly or monthly for user rquirement
304. What are different source systems?
-> flatfiles,oracle
WIPRO (Chennai) – 01/06/07.
305. What is surrogate key?
-> A surrogate key is a system-generated (non-meaningful from a business

perspective).it specified by uniqness
-> It is the system generated key which cannot be edited by the user.
-> It is the primary key of the dimension table in warehouse.
-> It is nothing but the sequence which generates the numbers for primary key
column.
306. What is the difference between primary key and surrogate key?
-> primary key is combitnation of uniqe and notnull.

-> surrogate key is system genrated seqvance key
307. Where we use the primary key and where we use surrogate key?
-> primary key is combitnation of uniqe and notnull.( ex: empid,customerid....)

-> surrogate key is system genrated seqvance key ( suppose the primary key repeted
in many times we genrate surrgote key)
308. How you get the timestamp in date type in Oracle?
-> SELECT CURRENT_TIMESTAMP FROM dual;

-> TIMESTAMP datatype which stores all the information that the DATE datatype
stores, but also includes fractional seconds.
309. What is multiple instances?

-> same job run in simultaneously on the same computer.
310. How you compare two tables using multiple instances?
-> If your goal is to compare two tables, send the output of a select * from my_table
order by my_key to a file
for each table and, then, use a compare utility (like UNIX diff).
311. How you control jobs in data stage?
-> ya, using job properties

-> job control lanuage it is used to run more number of jobs at a time with or
without using loops.
312. What is the use of parameters?
-> Dynamic value passing at runtime(while running of job)

-> so ur job wil be dynamic take that parameters and do perform the work.
313. How do you give the parameters? Is it by hard coding or passing?
-> passing
314. If the data in text file, is you use to hard code the parameters or pass ?
315. How you pass the parameters?
-> IN JOBPARAMATERS AT RUN TIME
316. How you trigger the job in windows?
317. If the data is in 10 files and to load in table, which stage will you prefer?
-> same metadata (SEQ FILE SET)
318. What is the difference between dataset and sequential file?
SEQFILE:
-> excute seqvance mode( read and write also seqvance)

-> stored capcity : 2gb stored in hard disk
-> did't support lookupfile
-> support any extention like .txt,.csv....
-> performance less compare to dataset
dataset:
-> excute parall (read and write also parall)

-> srored capacity more syored in logical memory
-> support lookup file
-> support only .ds extention
-> performance high compare to seq file
319. How can you see the dataset in UNIX and windows?
-> orchadmin dump <dataset>.ds
320. How can you delete the dataset in UNIX?
jobs
321. What is the function of shared container?
322. You have the sequential files in funnel stage, the records of first file has to come
first in target
and second file records next and finally third. How you can achieve this?
-> seqvance funnel
323. What is the difference between remove duplicate & sort stage?
-> REMOVE DUPLICATE: just remove the the duplicate data based on key
-> SORT: just sot the data(ascending or desending) based on key. remove duplicates
also posible in sort stage.
324. Is it not possible to delete the duplicate in sort stage?

-> no

DS Faqs

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

DS Faqs

Uploaded by

Copyright:

Available Formats

TCS

4.which stage gives good performance?

-> dataset stage

5.what is the diff between dataset and sequentical stages?

1.extract/load from/to seq file.max==2GB.

1.used as an intermediate stage.

8.what is the funnel stage?

9.what is the diff between joner and merge stage?

10.what is the diff between joiner and lookup stage?

13.what is the diff between normal transformer and basic transformer?

14. what is the diff between parallel and server jobs?

-> teradeta api stage

16.what is the multinode concept in teradata?

17.what is the slowly changing dimension?

18.how do u capture the changes in slowly changing?

19.what is the surrogate key?

-> A surrogate key is a system-generated (non-meaningful from a business

20.what is the diff between narmalized model and dimensional model?

21.what is the star schema?

22.what is the ods and edw?

-> ods mean Operational Data Store

23.when do u use ods and edw?

25.if i want to delete dataset file in the unix, how do i do that?

-> YA, ORCHADMIN RM <DATASET NAME>

26.how do u schedule the jobs?

-> using datastahe director-> tools

27.how do you scedule the jobs in job sequence?

-> 1. Open Datastage

Reasons to use Buildop include:

-> Speed / Performance

30.In datastage, which language code will be used?

-> OSH(orcistrate shell lang)

31.why do u use copy stage? can do sort in copy stage?

-> to copy tha data in multiple targets

32.in what other stage we can do copy?

33.what is sort stage? in what other stage we can do sort?

-> soting(ascending or descending) the data based on key.

-> ref: OSH : Archestrate shell script

-> modify column types

36.what is column generate stage?

-> runtime column propagation (RCP)

-> runtime column propagation (RCP)

39.diff between lookup,join and merge stage

40.can we track rejections in join stage?

41.why do we use funnel stage?

-> oracle enterprize(parell and easy connct)

43.How do we reject the records in sequential file stage?

44.In what other stage we can reject the records?

-> to select aportion of data for checkng

46.what is difference between job parameters and environment variables?

47.what is the aggregator stage?

-> which is use to calculate the summrized for a group og records

48.what is shared container stage?

-> ok condition,failed warning,custom,userstatus,unconditional

51.what model we are doing here?

-> one transformer,one lookup,two targets

55.what is the teradata version you are using?

57.who will create environment variables?

58.how do you register a plug-in?

60.how do the jobs run in sequecer? parallelly or sequecially?

62.what is the difference between star & snow flake schemas?

63.do you know narmalization techniques?