
Q1.

Hi,
I have numeric values embedded in string values, for example:
tokyo(231), sfo(890), etc. I want only the numeric values from the strings.
For tokyo(231) the output should be 231, and for sfo(890) the output should be 890. Do we have
any function or method in DataStage 8.1 (parallel job) to achieve this?
SOL: Hi Rajish, if all your data have the same format as the sample below, i.e. all
the numbers are placed inside ( and ), then you can use the FIELD function.
Data:
tokyo(231)
sfo(890)
FIELD(ColName, '(', 2) - the result would be:
231)
890)
To remove the ), we can use the CONVERT function, so the final expression would be:
CONVERT(')', '', FIELD(ColName, '(', 2))
Output:
231
890
Please note that this is only applicable if all your data have the same format as the
sample data: all numbers are enclosed in () and come after the alphabetic characters.

2SOL) Use nested Convert() functions: the inner call removes the numeric characters from the
original string, leaving only the non-numeric characters; the outer call then removes those
non-numeric characters from the original string, leaving only the digits.
Convert(Convert("0123456789", "", InLink.TheString), "", InLink.TheString)
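For example, assuming InLink.TheString holds the sample value tokyo(231), the nested calls evaluate like this:
Convert("0123456789", "", "tokyo(231)")  ->  "tokyo()"   (inner call: digits removed)
Convert("tokyo()", "", "tokyo(231)")     ->  "231"       (outer call: non-digit characters removed)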

2Q) I had a file which is split into 8 parts due to size constraints, and I want to read all
8 parts into a single table using a single DataStage job. Can
anyone please assist me?
1SOL) Hi. To read all the files as a single source, define a file pattern in the Sequential File
stage, e.g. file*.txt.
Hi, the solution for this problem is:
to read all 8 files in a single job, take one flat file stage and specify the file name like this.
Ex: the file is split into
File1, File2, File3, File4, File5, File6, File7, File8.
Now give the file name as File?.txt and it reads all 8 files in a single job.
2SOL) You put your 8 dataset/sequential file stages in one DataStage job, pass all 8 output links
into a Funnel stage, and take a single output link out of the funnel, which then contains the data
from all 8 files.
3SOL) Or use a single Sequential File stage with the file name in File Pattern mode (all parts have
to follow a pattern in their names).
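As a rough sketch of the stage settings for the pattern approach (property names as they appear in the 8.x parallel Sequential File stage; the file names are just the example ones):
Read Method  = File Pattern
File Pattern = File?.txt      (matches File1.txt ... File8.txt; File*.txt also works)
First Line is Column Names = False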
3Q) Hi,
I have a requirement where I have to fetch a few files of a similar pattern (and the same metadata)
from a folder, apply some transformations to them, and then produce the same number of files as were
fetched from the source folder and place them in the target folder.
Ex: dir1 has 210 files and I need to fetch some 50 files of a similar pattern (which I
specified in the file pattern property of my Sequential File stage). After the transformation is
applied, it needs to produce 50 files in the target folder. My problem is splitting the individual
files that are read from the input folder once the transformation is done (the number of files in
the input folder is not fixed and the number of records in each input file is not fixed either). Can
anyone post me a solution for this?

3SOL)
1. First read all the files that are to be processed.
2. Assign a unique ID or number to each file (sequential).
3. Design a job.
4. Using a loop activity, read each file and process it.
5. Each time one file is processed, generate output for that file only.
4Q) I am trying to design a job sequence to report job status details. I have one job activity with
three output links in the job activity triggers: I set the first link to Failed, the second link to
OK, and the third link to Running. The sequence only takes the success output; when the job fails it
does not trigger the failed link. Is there any order to follow when defining the trigger conditions?
4SOL) Did you add a Terminator activity or an Exception Handler?
Trigger your failed link to a Terminator activity and it will work, or create an Exception Handler
leading to a Terminator activity; that way will also work.
2SOL) I didn't fully understand what you need, but triggers depend on the business
requirement or on your prerequisite jobs.
From the 1st job activity to the 2nd you can define two triggers: one for OK that goes to the next
activity, and the other for Failed that goes to a Terminator activity, which will fail the sequence
if your job fails. Do the same for all your job activities; the last job activity will
have only one link, for Failed.
5QUES) Hi all, I have a file with fixed-length records. Does anybody have an idea how
to read that in a DataStage parallel job? I have 2 solutions, but I am not sure either is proper.
1) I created a server job to read the fixed-length file and convert it to pipe-delimited. 2) I read
each record as a single column and then separated the columns using the substring function, for
example all_rek[1,10] = name, so the first 10 characters go into the name column.
5THSOL) You already have the solution. As you said, you can use a Transformer to do
it with the substring function, as in your example; that is the way I would choose.
If you would like to try another way, you can use a Column Import stage instead of a
Transformer.
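A minimal Transformer sketch of the substring approach (the link name, output column names and positions are only illustrative; all_rek is the single input column from the example):
Name   = TrimB(InLink.all_rek[1,10])
City   = TrimB(InLink.all_rek[11,15])
Amount = InLink.all_rek[26,8]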
7QUES) Hi,
I have a requirement where I have to fetch a few files of a similar pattern (and the same metadata)
from a folder, apply some transformations to them, and then produce the same number of files as were
fetched from the source folder and place them in the target folder. Ex: dir1 has 210 files and I need
to fetch some 50 files of a similar pattern (which I specified in the file pattern property of my
Sequential File stage). After the transformation is applied, it needs to produce 50 files in the
target folder. My problem is splitting the individual files that are read from the input folder once
the transformation is done (the number of files in the input folder is not fixed and the number of
records in each input file is not fixed either).
Can anyone post me a solution for this?
7SOL)
1. First read all the files that are to be processed.
2. Assign a unique ID or number to each file (sequential).
3. Design a job.
4. Using a loop activity, read each file and process it.
5. Each time one file is processed, generate output for that file only.
A) Hi, read all the files from the folder and use the File Name Column property in the Sequential
File stage to generate a file_name column. Do the transformations based on the business rules, then
split the output based on the file_name column; this will give you the same number of output files
as input files. It will only work when each file contains data.
7Q) Hi Pals,
This is the issue to solve: Super Batch - remove the parameter (pEssbaseMonth) from the
INI file and move it to the ETL job. Our standard is to not have the dates (pEssbaseMonth
and pEssbaseYear) pulled from a parameters file, but loaded/entered from the Main
Batch itself.
Q) Hi Sandy, sample job control in job parameters:

Attach and run the following jobs in sequence:
* 10 - Batch::OPC1000CreateHash
* 20 - Batch::OPC2000Unload
* 30 - Batch::OPC3000PostLoads

******** Assign Parameters ********
DEFFUN GetParameterArray(A) Calling "DSU.GetParameterArray"
DEFFUN GetParam(A, B) Calling "DSU.GetParam"
DEFFUN krNotifyEvent(CD,FM,PF,PN,JN,JD,JR,JF,IS,MT,UMT,DT,MI,JI) Calling "DSU.krNotifyEvent"
Deffun DSX.DATEGENERICTOTIMESTAMP(MyDate, Num)

* Initialize Variables *
vEventDetail = ' '
vPartitionStart = 190000
vPartitionStop = 190000
SOL)
*---*
* Set all job parameters from specified ini file into ParamList *
*---*
ParamList = GetParameterArray(pParmFile)  ;* get parameters from the .ini file
If ParamList = "ERROR" Then
   Call DSLogFatal("Error occurred getting parameter array from file - " : pParmFile, "OpPerfCube")
End
pEssbaseYear  = GetParam(ParamList, "pEssbaseYear")
pEssbaseMonth = GetParam(ParamList, "pEssbaseMonth")
pDwhDb        = GetParam(ParamList, "pDwhDb")
pDwhUserid    = GetParam(ParamList, "pDwhUserid")
pDwhPasswd    = GetParam(ParamList, "pDwhPasswd")
8Q) I am getting a warning in my job while converting string to decimal: "Numeric string
expected, got '-1 '. Use default value." Can anyone help me avoid
this? I need the sign value in the decimal field.
8SOL) Please let us know more information.
Possibly you have a column defined as integer or decimal and you are sending it a character value,
which is not correct; or you are sending an integer into a decimal target and the format is not
correct because the '.' (decimal point) is missing. The second point only applies to 7.5; you
will not have this issue on 8.1. Just let me know more about your job and all the warnings you get
and then I can solve it more easily for you.
1SOL) Hi, the source input format is 12-june-2009 but I need the target in the format
12-06-2009. How can we change it? I tried using the StringToDate function in a Transformer stage
but I didn't get the correct output; I am getting the system default output.
What is the actual process? 2) Hi, I think the ICONV or OCONV internal functions can be
used. 3) Hi, we can use the StringToDate function in a stage variable and then rearrange the stage
variable into the desired format:
testdate = StringToDate(DSLink2.Date, "%dd-%mmmm-%yyyy")
finaldate = testdate[9,2] : '-' : testdate[6,2] : '-' : testdate[1,4]
9Q) I have some data where the State column may have a state spelled out or an
abbreviation, or the State column may contain a country. So I do a lookup against a
country table to see if the State is a country and, if so, place it in the country column. I
also do a state lookup to see if the State column has an abbreviation like PA or NJ, and
then I pull the state. Here is the problem: the state lookup table has an integer key to
look up on, and it also has the state abbreviation and the state spelled out.
The abbreviation column is 2 characters. I use this to see if any rows in the State column
of my data contain a state abbreviation. What is happening is that when "Nepal" comes
in as the state and is looked up against the abbreviations, it matches on NE, so I am
getting Nebraska as the state when I don't expect it, despite the fact that Nepal
does not equal NE. I have tried changing the metadata in DataStage so the
abbreviation is read into a 5-character field and it does the same thing. It still thinks
Nepal is equal to NE. Does anyone have any idea what is going on here?
9SOL) Nepal will not match NE unless you take only the first two characters. 2)
Yeah, that's what I would have thought, but DataStage is not doing that. DataStage is
only looking at the first 2 characters, I guess because that's the length of the lookup key.
But I changed the lookup key length to 5 and it still mismatches. I got around this by
putting anything in the State column with len(State) < 5 into a different field to use for
my lookup. It works fine. I can't figure out why DataStage is not acting as expected, but
I see strange things like this all the time. 3) It is a problem with the field width: check
your field width, it is taking only 2 characters and that is why this problem occurs.
10Q) Hi all, I have 10 files which are created by DataStage; basically these are the reject
files from 10 different jobs running in 1 sequencer. Now I want to check for data in each
file and, if data is found, email that file. How can I achieve this requirement?
10SOL) You can do it in Unix or in DataStage. In Unix you can loop through the list of all the file
names and do the following for each one:
if test -s "$i"; then mail -s "Subject" email@removed < "$i"; fi
2) Hi Rocky. In order to send an email, first configure the email
server on your DataStage server. After that you can configure it to send email from
DataStage or from the Linux server. I would write a routine to check whether the file has
records; if it does, then create another routine to send the email with the
attachment. You might be able to use the email stage, but if not you can use the
BASIC command DSSendMail(), which you can customize to send the attachment
that way.
11Q) Hi, is there any limitation in DataStage when passing environment variables as
parameters to an Execute Command activity stage in a sequencer? In my job I am using an
Execute Command activity stage to fetch some values from a file, and the command
I am using is "cat dir/subdir/filename". When I hard-code the entire path it works
fine, but when I pass an environment variable to define my directory and subdirectory,
it throws an error. Is there any other way of achieving this?
11SOL) There is a specific placeholder for parameters. If that does not work, you
can even provide it in the command option itself. Do a cross-check on the value that's
being passed to the parameter.
1) You can verify it from the Director log. What is the error you are getting? Cross-check
the value of the parameter; it should be the same as when you hard-code
it (which in your case is working fine).

2) Hi, I have cross-checked the values of the parameters; there is no issue with the
values. I get the following error: "Activity_execute-command: Executable path
cannot contain parameters". I get this error when I compile my job; as I said
earlier, it works fine when I hard-code the values held in the
parameters. I am using DS 8.0.1 and Windows 2003 OS.
3) You can use a User Variable activity: define a user variable and set its value
equal to the parameter, then use this user variable in the Execute Command activity.
12Q) Hi All,
I always use a Transformer stage for adding a new field with an initial value (e.g. adding a
CreatedDate column with CurrentTimestamp() to each record). But after doing a
stress test, I've found that this could be a problem for performance. Is there another
way to do the same thing without using a Transformer? I tried to use the Modify stage but
unfortunately it can't be done if the column is not derived from an input column.
SOL) You can also use a Column Generator: you add a new column, e.g. mydate, and you
give it your current date. For the Modify stage, you have to define the new column manually
in your output and it will work; you do not need to map your input to your
output, just give the output column the same name as in your specification, and it should
work.
If I use a Column Generator, will it automatically combine the new column with the original record?
I will check it out, thanks for the enlightenment. As far as I know, the Copy stage can only
add a new column with an existing value, not with a new value. Yes Arash, I tried it
with the Column Generator and it works as I want. But it still only accepts a static/hard-coded
value as the generator value. Is it possible to get the value from a job parameter or a
routine? If your target is a database, you don't even have to handle the CreatedDate in
DataStage; you can handle it in the database itself, which will give an even more
appropriate timestamp.
13Q) I have dates like this: 2011-02-21 that need to be converted
to timestamp. What is the best way to convert this date?
SOL) You can concatenate a time onto it: "2011-02-21" : " 00:00:00.000"
Oconv(@DATE,"D-YMD[4,2,2]") : '-' : Oconv(@TIME,"MTS")
StringToTimestamp(DateToString(<date column>, '%yyyy-%mm-%dd') : ' 00:00:00',
"%yyyy-%mm-%dd %hh:%nn:%ss")
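A sketch of the same idea when the input arrives as a string column (the link and column names are illustrative):
OutLink.TS = StringToTimestamp(InLink.DateStr : " 00:00:00", "%yyyy-%mm-%dd %hh:%nn:%ss")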
14Q) Good day everyone,
I have the following data, and I need to validate that every MV comes either
after an SN or after another MV. What would be the best way to validate that? Is there
any way to define a temporary value that holds the value in the previous cell, in order
to compare the value in the previous cell with the one in the current cell?
SN, MV, MV, SN, MV, VN, VN, VN, SN, MV, VN, VN, VN, SN, MV, MV, MV, MV, MV, VN, VN, SN, MV, MV,
SN, MV, MV, SN, MV, SN, MV, MV, VN, VN, VN, SN
SOL) Yes. In a Transformer you can use a stage variable to hold the previous value and
check it against the current record's value. 2) PrevValue = CurrValue; CurrValue = Input.Field; If
CurrValue = 'MV' And (PrevValue = 'MV' Or PrevValue = 'SN') Then 'logic1' Else 'logic2'.
Thanks Deepak, but I still have a little bit of confusion: by doing
PrevValue = CurrValue and CurrValue = Input.Field, it looks like both PrevValue and
CurrValue will always have the same value, which is Input.Field, right? If that's the
case, I'd be comparing the same value over and over, which would always be true. Can you
please explain a little bit?
Hope the following clears your doubt. The order of the derivations is: PrevValue = CurrValue; If
CurrValue = 'MV' And (PrevValue = 'MV' Or PrevValue = 'SN') Then 'logic1' Else 'logic2';
CurrValue = Input.Field. In either case, at the moment PrevValue is assigned, CurrValue is still
holding the previous record's value. Only afterwards is CurrValue assigned
the current record's value, and CurrValue then holds that value until the next
assignment, which only happens on the next record. Let me know if you need any further
explanation.
Thanks so much Deepak for your explanations. I tested it and it works. I'm new to using
this tool, I have a lot to learn, and I appreciate everyone's help on this forum.
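A minimal stage-variable sketch of the pattern described above (the variable, link and column names are illustrative; the variables must be defined in this order, because stage variables are evaluated top to bottom for every row):
svPrevValue = svCurrValue       (still holds the previous record's value at this point)
svCurrValue = InLink.TypeCode   (now takes the current record's value)
svIsValid   = If svCurrValue <> 'MV' Then @TRUE Else (svPrevValue = 'MV' Or svPrevValue = 'SN')
svIsValid can then be used in an output constraint or derivation to flag the rows that break the rule.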
15Q) Hi,
I need to run a multi-instance job 7 times simultaneously (in parallel). I know of
two methods to accomplish this: 1. a script using the dsjob command and a loop; 2. a
job sequence containing as many job activities as the number of parallel runs of the job I
want. I just want to know if there is any other way to accomplish this, preferably not
using a script.

SOL) Hi, I think you can go for a sequence with a Start Loop and End Loop
activity, place the job activity inside the loop and give the details of the job; I hope that works
fine. (You can use a loop in your sequence, it's easier.) Can the looping activity be
independent? Is that what you mean to say? Looping will not be simultaneous, it will
be one run at a time. You would be better off using a sequencer to manage it. Make sure
you follow the rules for multi-instance jobs and parameterize file names, etc.
As far as I know, you can create your single job, enable multiple instances, and then use
it in your loop and run it. I had a similar job before and it worked fine. You can use
different parameters and run the instances independently as you wish. Let me know if you
have more questions and whether you would use it.

16Q) I don't understand why my where clause is not working: NAME_ONE like 'Wells%'.
The job runs to completion but my records are getting rejected. Hi friend, I tried
the same thing on my PC and it's working fine; could you please provide some more
details? A) Thank you for responding. I have found out that the where clause in the
Filter stage is case sensitive, and my file is in all caps, i.e. NAME_ONE like 'Wells%'
doesn't work, but NAME_ONE like 'WELLS%' works. Do you know if there is any way
to make it not case sensitive? Use the UPPER() or LOWER() function on NAME_ONE and
put the value accordingly in the LIKE. Hi, use upper() or lower() to check it; I
have tried it and it works fine. Yes, you can convert to uppercase in your Transformer and
then do your filter; as long as Wells is not WELLS you should be OK.
ANS) By applying a conversion to upper case to NAME_ONE, the LIKE
'WELLS%' will be satisfied even by a column with mixed-case names. Chances are all names
are converted to UPPER case before they are written to the NAME_ONE column. If
they are not all UPPER case and you use UPPER(NAME_ONE) like 'WELLS%', any
index on NAME_ONE could be nullified. Check the table and see if all NAME_ONE
values are upper case. Hello, good morning. Your table values are in CAPS; in the
database, table values are case sensitive (e.g. the emp and dept tables in Oracle).
17Q) I am getting a warning in my job while converting string to decimal: "Numeric
string expected, got '-1 '. Use default value." Can anyone help me
avoid this? I need the sign value in the decimal field.
SOL) You got this because you are sending a character value to a numerical type such as int
or decimal. The other reason could be that you are sending an int to a decimal, in which case
you have to send the correct format, meaning the '.' is missing (this happened only in 7.5 and
not in 8.1). Let me know more about your job and your warning log so I can help. 2)
Nope, I am getting the warning when I try to do it from CHAR to decimal, but
this warning does not come up for varchar to decimal. 3) Just check the lengths of the
char and decimal fields, as char data can be padded with a default value which is not
accepted by the decimal type. This can be one reason, not sure. 4) Saurabh, it is the
same as I said before, and it can have many reasons, but I just gave you
some examples, and here is the explanation for your case: if you are doing a type conversion
to a date or decimal etc., it is always advisable to do it from varchar, because when
you use char, DataStage interprets the pad character as a character on Linux; that is
why you are getting this warning. 5) Try to trim that char field while moving it to a
decimal field. 6) I already did trimming and all before moving it to the decimal field.
Anyway, I was developing it on version 7.5; when I moved to 8.1 it was resolved.
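For reference, one hedged way to make the conversion explicit and drop the padding at the same time (link and column names are illustrative):
OutLink.Amount = StringToDecimal(TrimB(InLink.CharAmount))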

18Q) I have data coming in where the LastName column may also contain a suffix
like Jr, Sr or III. If the last name has a suffix it is separated with a comma,
like "Smith, Jr". I need to extract the Jr or Sr or whatever comes after the comma and
put it in another column called Suffix. I also have another set of data where the users
did not use a comma, and those last names look like "Smith Jr". The last name column
may also contain a second middle name, then the last name, then Jr or Sr, like LastName
= 'Doe Smith Jr' where the name is John Doe Smith Jr. In this case I need to extract
the Jr or Sr or whatever is at the end of the name and put it in another column
called Suffix. I am using DataStage 8.1 on Windows in a parallel job. Appreciate the
help, thanks.
SOL) First check for a comma and separate the suffix using the Field function. If you don't
find any comma, take the last field using space as the separator and check it against a set of
predefined values (Jr, Sr, II, III, etc.). The predefined values need to be built by you for
accurate results. You can always use QualityStage with the European name rule sets to
segregate it. Deepak, thanks for the help, but can you show me the syntax for the
Field function or where to find it? I have not used the Field function and do not see it
in the Parallel Job Developer Guide. Count(<Input.Field>, <Delimiter String>) - this gives you the
total count of the delimiter.
Eg: Count(Input.Field, ",")
Field(<Input.Field>, <Delimiter String>, Occurrence) gives the field based on the
occurrence.
Eg: Field(Input.Field, ",", 2) gives you the 2nd field of the input based on the comma.
Deepak, I have finally tried this and got one file correct: the file with names like "John
Doe, Jr". I managed to separate the Jr from the name with Field(Lnkq.lastname, ",", 1),
which gets me the last name, and Field(Lnkq.lastname, ",", 2), which gets me the suffix. I have a
second source that does not contain a comma, so it is like "John Doe Jr", and some
names have multiple first and last names, like "John Alexander Smith Doe" where
Smith Doe is in the last name field; other times there are three names, like "De La
Garza". I have not been able to figure out how the Count function gets me where I need
to be, and I also need to make sure I am pulling out Jr or II or III and not the last part
of the last name. I'm thinking I need a stage variable to get the count of the
occurrences of the spaces and then check whether the last string in the name is Jr,
Sr, II or III. Does that sound right?
Good practice is to use Information Analyzer to find out the pattern of the last
names, pull out the full set of possible suffixes that you need to extract, and hold
it as a lookup or, if the count is small enough, in stage variables to do the
transformations. That's all good, but I can't seem to remove the Jr, Sr, I, II or III because
all the delimiters are spaces.
B) Let's do it again. First, consolidate the list of suffixes and load it into a lookup
table or into stage variables. Separate out the last field based on space, since
all the words in the last name can be separated on spaces even if they
were originally separated by a comma. Do a lookup against the consolidated list. If you
find a match, take the substring of the name up to the last space, then trim out
the comma if any. If you don't find a match, pass the value on as the last name
itself. You know how to get the last field with the Field function, and you know how
to count the spaces in the name using the Count function. Let
me know if you have any difficulties executing it.
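A rough stage-variable sketch of that approach (link and column names are illustrative; the suffix list is hard-coded here only for the sketch, and in practice would come from the consolidated lookup):
svName     = TrimB(Convert(",", " ", InLink.LastName))      (treat the comma as just another space)
svLastWord = Field(svName, " ", Count(svName, " ") + 1)     (last space-separated word)
svIsSuffix = Index(" JR SR II III IV ", " " : Upcase(svLastWord) : " ", 1) > 0
Output derivations:
Suffix   = If svIsSuffix Then svLastWord Else ''
LastName = If svIsSuffix Then TrimB(svName[1, Len(svName) - Len(svLastWord) - 1]) Else InLink.LastName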
19Q) Hi All,
I have some .csv files with headers in a folder, and I need to remove the header from
the .csv files, convert them to .dat format, and add an extra column with a constant
value of 1.
Ex: i/p: "abc","abcdefgh" (sample record excluding the header); required o/p:
1|abc|abcdefgh. I need to do this with DataStage, or any Unix script would also be helpful.
Experts, I need guidance on this.
SOL) I am not an expert in Unix/DataStage, but I think you should look at an 'nawk'
script to get it done. 2) With a Sequential File reader, set your input file name (under
Source) and set the option First Line is Column Names to True. Use the Format tab of
the Sequential File reader to indicate the field separators. I would recommend using
a schema file to define the file. Link the reader stage to a Transformer stage, create
your new column in the output link of the Transformer, and link the Transformer to
your output stage, whatever you decide to use.
3) And in the output Sequential File stage use the pipe as the delimiter, in addition to
what Joe said. Joe, why do you prefer a schema file for files with a constant layout?
4) Deepak, a schema file is not required here, but if you have a big file to
extract, it is faster to read it with a Column Import stage using a schema file (I am
talking about performance here).
5) May I know in what way you think the performance will be improved (unless you
want to skip a few columns)? A Sequential File stage has a column
import operator by itself.
6) I am not disagreeing with Joe, but regarding my reply, it is what I saw in the Redbook.
7) My suggestion of using a schema is my personal preference. Everything has more
than one way of getting the job done. Guidance was requested and we don't need
egos getting in the way of possibly helpful suggestions. So, Arash, it's a shame
you are not OK with me. Maybe you really meant something else?
8) It's not about ego, my friend! It's all about optimization and performance. A
schema file is generally used when the schema is dynamic or likely
to change. Otherwise it adds the overhead of accessing the schema
file and parsing it on every run.
9) Great, I see you are still following this post.
I am not saying you are wrong; I just suggested what IBM documented as a performance
recommendation. Yes, you are right, there are different ways to build a job, but I just gave
an idea for performance; if it is not of interest, forget it.

20Q) I need to implement the below logic in DataStage: if only 1 record is read with
a particular GRP-ID, then its TYPE is changed to RA. If more than 1 record is read
with the same GRP-ID, then they aren't changed. How do I do this? Any help is really
appreciated.
SOL) I am not very clear on your requirement.
Is your requirement to change the TYPE based on the GRP-ID? If so, use a Sort stage
to add a key-change column based on the GRP-ID. Sort again in
descending order based on the GRP-ID and the key-change column. If you get two
consecutive "1"s then it is RA.
1) Hi Deepak, thanks for the reply. My output now looks like this:
GRP-ID  KeyChange
50025   1
50026   1
50026   0
50027   1
How do I filter both 50026 records? I don't want to set RA for those 2 records, but I
have to set RA for the other records.
ANS) Hi, use an Aggregator stage and group by the GRP-ID key. Also set the aggregation
type as 'count'. At the output of your Aggregator stage you will get only 2
columns: GRP-ID and Count. For getting the rest of the columns you need to do something
like this: a Copy stage with 2 output links, one going to the Aggregator stage
and the other to a Join stage. The output of the Aggregator stage will be given as one of the
inputs to the previously mentioned Join stage, the join key (inner join) being GRP-ID (please see
the attached image).
The output of this join will then be given to a Transformer stage where the actual logic
will be implemented. In the derivation of the TYPE column, use something like
this: if count = 1 then 'RA' else [the existing TYPE value].
Please let me know if this works for you.

Hi. As you can see, when you have a 0 after sorting, that means you have more than 1
record for that GRP-ID, so then you can make your change.
Good that you already got it fixed. But anyway, the solution with your previous
approach would be to sort the records on KeyChange in the opposite order, so
that the 0's come first. Then, for the records whose KeyChange = 1, you use a
stage variable to check whether the previous value is also "1"; if so, you
assign 'RA'.
21Q) Hi, my requirement is that I need to produce, in a field, the value
filename_currenttimestamp.dat.gz. In the derivation of the Transformer I have given
'filename_' : CurrentTimestamp() : '.dat' : '.gz' and I am getting the output as
filename_2011-03-09 14:32:44, but my requirement is filename_20110309-022941,
which I am unable to get. I have even tried to concatenate with a parameter holding the
current timestamp, but this is not possible. Can anyone help me with this ASAP,
as it is urgent? Thanks in advance.
21SOL) Use the TimestampToString function to convert its format.
If you are using DataStage Server, in the server Transformer you can do it easily with
Iconv/Oconv, but if you are using DataStage PX do as Deepak says above: convert it to a
string and then use Convert('-', '', yourstring) and it should be OK.
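A sketch of the derivation with an explicit format string (the format and extension follow the requirement above; the function names are the 8.x parallel ones):
'filename_' : TimestampToString(CurrentTimestamp(), "%yyyy%mm%dd-%hh%nn%ss") : '.dat.gz'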
22Q) Hi, I am working on DataStage v7.5.3. I have 2 jobs, Jb1 and Jb2. In Jb2 I have a
routine which reads records from the src1 sequential file, and the same file is the source
file for the Jb2 job. When I execute Jb1 and immediately execute Jb2, both jobs run
simultaneously and fine. But when I execute Jb2 first and then Jb1, it aborts saying
"the process cannot access the file because it is being used by another process". In
the routine I have used the OPENSEQ and READSEQ commands. Is there a way, through a
BASIC command, to open a file in some mode so that other processes can
simultaneously read from it?
23Q) I get the following warning: "Conversion error calling conversion routine
date_from_ustring; data may have been lost". What is the reason for such a warning?
Is it the use of StringToDate, or some other reason?
23SOL) There could be many reasons. As you have already posted many similar
questions in the past, you should now be able to judge for yourself that it could be
because the string data you are trying to convert to a date does not
match the pattern you have provided in the function.
Deepak is right: if you do not try to find the issue by yourself you will never
learn; it is good to help, but before posting do some investigation yourself.
Normally, if you are converting a string to a date you should get your data like YYYY-MM-DD,
and if that is not the case you will get such a warning. The other thing is, if you are
getting the data as I said above, YYYY-MM-DD, but the date itself is wrong (e.g. 2010-02-31,
which does not exist), you also get that warning.
24Q) Hi All, my question is how to remove duplicates within a record. For example, an
input column holds a string like 00A00B00C00D01A01A01B01B00A00B and I want the
output to be 00A00B00C00D01A01B. Here you have to take two digits and one
letter as one value, e.g. 00A. How can we design the code for the above
requirement?
24SOL) This is along similar lines to what Deepak mentioned. This worked for me; try it.
Define stage variables and their derivations in the following way:
a = DSLink2.input[1,3]
b = If DSLink2.input[4,3] = a Then a Else a : DSLink2.input[4,3]
c = If Index(b, DSLink2.input[7,3], 1) > 0 Then b Else b : DSLink2.input[7,3]
d = If Index(c, DSLink2.input[10,3], 1) > 0 Then c Else c : DSLink2.input[10,3]
and so on. Let us know if you get any errors.
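If you prefer not to hard-code one stage variable per token, a rough BASIC routine sketch of the same idea with a loop (the routine name, argument and the 3-character token size are assumptions):
* DedupTokens(Arg1): keep the first occurrence of each 3-character token,
* e.g. "00A00B00C00D01A01A01B01B00A00B" -> "00A00B00C00D01A01B"
Ans = ""
TokenLen = 3
For i = 1 To Len(Arg1) Step TokenLen
   Token = Arg1[i, TokenLen]
   * same substring test as the stage-variable version above
   If Index(Ans, Token, 1) = 0 Then Ans := Token
Next i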

25Q) Hi All,
We have created a sequence job which has 3 jobs in it. What I want is that, if
any data is rejected and written into the reject file in the first job, then the second
and third jobs should not be executed.
25SOL) What are you using to reject the records? If you are using a Transformer,
then in the constraint you can abort the job after the first reject row. Once the job is
aborted, in the sequencer you can use the job status for any purpose, such as making sure
the second and third jobs are not executed. We have the reject file on a Lookup stage in the job.
What we have to do is: 1. in case there are any reject records, send these
records in a particular format via mail (for which we have written a shell script which
is run after the execution of the first job); 2. the requirement is that, in case this mail is sent
via the shell script, the next job in the sequence should not be executed. The Transformer
constraint has an "Abort after N rows" option.
Use a Job Activity to execute the main job. Use an Execute Command activity to check
the number of lines in the reject file. If the number of lines is greater than zero, send
a return code of "1" (just an example), else 0. Use a Nested Condition with
custom conditions to check the return code; based on that, you can either execute
the Email Notification or execute the other jobs.
Completely agree with Deepak.
In the sequence job, after your first job, use an Execute Command stage and, using the
"wc -l" Unix command, check the number of lines in your reject file.
Then, as said by Deepak, you can decide whether or not to run the remaining
jobs in the sequence depending on the line count that you get as the output of the Execute
Command stage.
26Q) Hi, I am using DataStage version 8.1 and want to develop a parallel job. Here is the
logic I want to implement: a DB2 Enterprise stage (source) which has a date column. I have a
reference table in a DB2 Enterprise stage which I will use to do a lookup and get a few columns.
I want to pass the date from the DB2 source stage to the DB2 reference stage and get the desired
results. The DB2 reference table will have a query which uses the date from the DB2 source stage,
something like:
SELECT S.Col1, R.Col2 FROM Tab1 S, Tab2 R WHERE S.Col1 = R.Col1 AND R.Date < S.Date
I could do all this in a single query using a DB2 Enterprise stage, but I am dealing with huge
data and I am not sure of the performance if I do so. Can
anyone please tell me if there is a better way to implement this?
26ANS) How big are your reference data and source data? It is always better to use
one single query in the DB2/UDB stage, pull out the output, and do the complex
transformations after that. For performance improvement, you need to make sure both
the source and reference tables are properly indexed or partitioned on the right
keys that you are joining on.

27Q) What is a syntax file? Do you mean the compiler? You'll need to map the compiler path in
the environment variables. You need to compile the routine using a C++ compiler,
create an object file, and link the object file from the DataStage routine (if you
choose the object method). You can check the Manager guide on how to link the object
file.
27SOL) Did you get this error during compilation or at runtime? "Not able to open
file" does not necessarily mean that your file is not there!
Hi all, the issue got solved. Actually, instead of providing the directory path, I gave the
directory path plus the file name. But now it gives an error if I pass only one key (join
key) column in the output out of the two files; since the files are joined (inner join) on these
columns, it should allow selecting either key column from either file. Hi Divya, it
was a runtime error; it got resolved, as the path provided was for a file but a
directory path was required. And you don't need the other columns at all? What is the exact error
message?
28Q) Hi All, could anyone kindly tell me the main difference between the
Transformer stage and the BASIC Transformer stage? In addition, I'm using DS 7.5.2 and
the BASIC Transformer stage appears not to be available in this edition.
28SOL) The BASIC Transformer is used in server jobs. It supports one input link, 'n'
output links and only one reject link. In DS
7.5.2 there is no Lookup stage for server jobs, so this Transformer stage also behaves like a
lookup stage. In the BASIC Transformer stage all functions, macros and routines are written in the
BASIC language; it does not support parallelism and partitioning. This
transformer can be called in both server and parallel jobs, but in the PX Transformer the functions
and macros are written in C++. The BASIC Transformer is related to DataStage
Server only. The features are mostly common; a few more advantages exist in the Transformer
stage available in DataStage Parallel Extender, such as additional functions and calling
other stages (as PX has more stages). Transformer stages can have one primary
input link, multiple reference input links, and multiple output links. The link from the
main data input source is designated as the primary input link.
29Q) We are getting fields in a record in the below format:
1. All fields are separated by the | (pipe) delimiter. 2. The first field holds three
different pieces of information (subfields), which are separated by \t (i.e. TAB delimited).
We can separate all the fields based on the | (pipe) delimiter, but how do we separate the
tab-delimited subfields coming in the first field? Ex. input: "3<TAB>Records rejected
due to Invalid CUST_ID value<TAB>5408|Operator|Address|21345|DELHI|
DELHI|....." and the output should be like: 3,Records rejected due to Invalid CUST_ID
value,5408,Operator,Address,21345,DELHI,DELHI,....
29SOL) What do you mean by '\t'? Can you give an example?
It could be done in a Transformer, but normally if it is more complicated we do it with a
shell command/script and run it before running the job. Give more detail so we can
give you more help.
Hi, it is not literally '\t', it is tab-delimited; refer to the first column of the example given
in the question above.
Hello, Srilal! You can use the Column Import stage to split the first field into
subfields; the delimiter should be set to 'tab'.
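If you would rather keep it in a Transformer, a small sketch using the Field function with a tab character (link and column names are illustrative):
svTab = Char(9)
Field(InLink.FirstField, svTab, 1)  ->  3
Field(InLink.FirstField, svTab, 2)  ->  Records rejected due to Invalid CUST_ID value
Field(InLink.FirstField, svTab, 3)  ->  5408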

30Q) Hi All, I am using a sequence job which contains a number of parallel jobs. I
want the sequence job to be aborted if no records are loaded into the target
table specified in one of the parallel jobs inside it. Is there any way in DataStage EE
8.1 to forcefully abort the job based on the result set of an SQL query? We can check
the number of records loaded into the target table through After SQL, but how
can we then abort the job if the result of the query is zero? We are using
DataStage 8.1 in a Windows 2003 server environment.
30SOL) Use the result as a parameter in the query stage and, based on the result
of the query, divert the flow to a Terminator stage. You can include an Email Notification
stage so support can monitor it.
1) 1. You could add an Execute Command stage in your sequence, running a script or
batch file. 2. Make your query result on the target table the exit code for
the script or batch file (e.g. exit code 0 means no record was returned).
3. Link your Execute Command stage to a Terminator stage with the trigger
"ReturnValue - (Conditional) = 0".
2) You can use the Routine Activity stage with RoutineName = UtilityAbortToLog
(Routine > sdk > Utility > UtilityAbortToLog) and Argument = "the abort message that
you want to add to the log".
3) For your specific case, this could be a solution:
1. Create a job that counts the number of records loaded and writes it into a text
file "NameFile.txt".
2. In a sequence job, link the following stages:
2-1. Call the previous job (1.) through a Job Activity.
2-2. Then add an Execute Command stage to read the text file with the command cat NameFile.txt.
2-3. Then add a User Variable Activity stage to get the value contained in the file.
2-4. Then use a Nested Condition with an output link for when the previous value (from the
User Variable Activity) is equal to 0.
2-5. Then add a Routine Activity stage with RoutineName = UtilityAbortToLog
(Routine > sdk > Utility > UtilityAbortToLog) and Argument = "message to add to log".

31Q) My input source date is in varchar(10) and looks like this: 6/13/2001.
31ANS) Use the StringToDate function:
StringToDate('6/13/2001', "%mm/%dd/%yyyy")
Since I need to populate it into a DB2 table, the target format will be YYYY-MM-DD.
32Q) I use the StringToDate function to convert varchar to date. My source
data is like: 2010-12-01.
My code is as follows:
if DSLink418.Rehire_Dt = ''
then ''
else StringToDate(DSLink418.Rehire_Dt, "%yyyy-%mm-%dd")
Error message:
APT_CombinedOperatorController(2),0: Conversion error calling conversion
routine date_from_ustring; data may have been lost
32ANS) There might be null values in your date field; hence, write the function as below
and it should work:
if DSLink418.Rehire_Dt = '' then '' else
StringToDate(NullToValue(DSLink418.Rehire_Dt, '0001-01-01'), "%yyyy-%mm-%dd")
Try this.
33QS) Hi, I have a requirement to implement where I have to use the Slowly
Changing Dimension stage.
We are missing price details for SKUs. For a store, SKU prices can change over the months. We
loaded a historical table with all the price details: date, SKU, shop and price.
We need to find, for a SKU, how many prices are present and on what dates. I have
never used this stage, so I am having trouble implementing this.
ANS) Raj,
I would suggest you use a Lookup stage (if your history table is not huge) or a Join
stage (if your history table is huge, in the millions).
Write the SQL in the DB2 stage: SELECT SKU, COUNT(price), date FROM Product GROUP BY SKU, date.
Then do the lookup on SKU (if using the Lookup stage), or a left outer join on SKU and drop
the records which do not match.
I would prefer an in-database join over anything else. I hope I haven't confused
you. Moreover, you can use the same SQL in the SCD stage also.

33Q)
I have data like this:
MANHATTAN
BASKING RIDGE, N
NEW YORK, NY
NEW YORK NY
NY
PLAINSBORO. NJ
I want to separate city and state in a DataStage job. Can I do this?
ANS) I need to know whether you are just looking for parsing or for cleansing as
well. You can do the former using a Transformer (it may not even be required, if you read
the file with a delimiter); the latter would need the QualityStage stages. Also, the first
record doesn't have a state at all, the second record doesn't have the
proper state name NJ, and your 5th record doesn't have a city name at all.
Deepak, all the records are in double quotes, so a delimiter won't help. I want to just
parse the records. If there is no city then I want to set it to null, and the same for state;
also the last record has a full stop instead of a comma, so a delimiter won't help.
You can always remove the double quotes by preprocessing the data, e.g. using the "tr"
command. But the real trick is how you are going to identify what is missing in the two
records below:
MANHATTAN
NY
Each of these could be either a city or a state. All
you can do is reject those records, saying that one of the two values is missing. If
you want to point out exactly which item is missing (state or city), then
either you need to do a lookup or do the cleansing using QualityStage. For parsing,
you can get the substring of the last two characters without any space and compare it with
your predefined list of states to generate the error report.
Hi Deepak, it is not a problem at all. You can try reading the complete record as a
single column from the file, and you can then use either a Column Import stage or a
Transformer to get the desired result.
Hi, by using a Transformer we can do this; in the Transformer use the Field function:
Field(inputcolumn, ',', 1).
If I try to use the Field function it gives me the city name, for example: JERSEY CITY NJ,
NEW BRUNSWICK NJ, NEW YORK NY, WEST NEW YORK NY. Does anyone have an
idea how to read the string from the right side and use the first delimiter, which would
give NJ or NY?
You can separate an input field into two fields, city and state, through a Transformer
stage as follows, based on your example:
City is derived as Left(inputfield, Index(inputfield, ' ', Count(inputfield, ' ')) - 1)
Count returns the number of times a blank exists in the input field.
Index returns the position of the last blank in the input field.
1 is subtracted because you do not want to include the blank in the city. State is derived as
Ereplace(inputfield, Left(inputfield, Index(inputfield, ' ', Count(inputfield, ' '))), '')
Ereplace substitutes an empty string in place of the city data. This time the -1 is left
off because you want to remove the last blank as well.
34Q) We have a transaction table; it contains the January month transaction data and the February
month transaction data (the current month). If I run the job in January, it
should display the January transaction data. Likewise, if I run it in February, it
should display January + February = 2 months of transaction data, and similarly if I run it
in March, it should display January + February + March = 3 months of transaction data,
and so on.
What is the logic for the above, and how do I create the job? How many targets are
required for that job? Can you please help with the above queries in
DataStage?
34SOL) You can simply select the data with a where clause on the current year.
It looks like you have a live transaction table, so every month you run it you
expect to have all the data up to that month. If you think you might go back and re-run for back
data, you can base the where clause on the month, and whether the month is passed
as a parameter should be decided by your design team. As for the targets, it
depends totally on your business logic. Your job shouldn't look complex, as the
condition can be achieved at the DB query level. 2) A simple design: Oracle -> Transformer ->
dataset (or any target); in the Transformer use if/then/else: if MonthFromDate(CurrentDate())
= 'jan' then jandata, else if it is 'feb' then jandata + febdata, and so on up to
December. In 7.5 or 8.0 this is possible.
2) Inserting a filter (where clause) while extracting from the source would be a little
more efficient than filtering it out after extraction (with a Transformer or Filter stage).
3) In EE, and all parallel versions, it is always better to do as much in the SQL as
possible. The database is designed for the ultimate performance, so it's not
just a little better/faster, it is the most efficient way to develop using
DataStage. It's best to do as much data conversion in the select statement
as possible, including using the where clause to select only the data that's
needed, and only select the columns required at the target. By doing so, you
make each of the required steps in DS easier to code and manage, and you
reduce the number of items on the palette. Use SQL to its fullest in the select
SQL and you will find there is much less demand on DS, increasing efficiency
and throughput and reducing the amount of DS coding that is required.
4) That is not always the case. DataStage is strong at transformations, with dedicated
CPUs and memory, and after all the sole purpose of the ETL tool is to do that. We
need to draw the line based on the resource usage on each of the servers:
overloading highly utilized DB servers while comparatively idle ETL servers sit unused
may not be an advisable option.
5) I disagree with Deepak. When DataStage was designed, and even today, its best
processing is on files. Everything that can be done in the database should be done in
the database. Transformation stages are resource intensive and should be used
sparingly; use other objects which contain the functionality you want when you can,
like the Modify stage for instance. You can read any documents on tuning and see
that things like what rjhcc said are there. Only process what needs to go to
the target, and drop columns as soon as possible (the DB is the best place). There's more,
but not enough time to go through those kinds of details. Unless of course your
database is an Access database...

As I said, it is not the case all the time. If this were 100% true, ELT would be in place
and ETL would have been demolished by now, or used only when the sources are
files and legacy systems. Moreover, ELT is less expensive infrastructure-wise,
and many jobs could have been converted into stored procedures too.
And when you say transformation stages are resource intensive, that is a totally
relative term. Transformers are resource intensive when compared to the other
stages in DataStage; they were never compared to the DB processors. It could
possibly be more when compared properly, but a 90% CPU-utilized DB server might yield
much less throughput with the given transformations than a 10% CPU-utilized ETL
server. We have had circumstances where we needed to extract 6 million rows in a
place where we only needed less than 1 million. As I said, it all depends on the
environment and resource utilization. While I always support the concept of
dropping unnecessary data as early as possible, I also need to take a wise
decision based on the other factors too.
Mathematically and logically, there can never be an instance where it is more
efficient to extract 6 million records where only one million are needed; the extra
network bandwidth alone supports this convention. This is even more significant on
a busy DB server. If a join is causing performance issues, the DB developers
have solutions using the DB to allow better performance. If the join is across
systems, or data is coming from multiple DBs on different servers,
then use an ADS, and that's not the same as the original comment indicated.
In a heavily utilized processing environment, what has been said holds true to an
even greater degree. If you are extracting from an OLTP server that is very busy, then
there are other ways: may I suggest unloading the data to an ADS or some other
pre-processing storage medium before beginning processing? At that point, SQL
is extremely powerful, and pre-processing is a necessity.
Using stored procedures certainly solves certain issues and is recommended when
possible.

DataStage offers solutions for ETL and ELT that normal SQL is not able to perform.
This debate will rage on for years between ETL developers and SQL coders until the
end of time, and is outside the scope of this discussion. If you wish to implement
everything in DataStage, go ahead, but there are trade-offs. The same goes if you wish
to create an entire system with only SQL.
ETL has a place, as does SQL. It is the combined result of both that gives DataStage
its power, among other features. It is the wise use of both that gives the DataStage
developer an advantage. There are always many solutions to a given problem, and
saying that it 'depends' is certainly true based upon the environment,
business requirements, and available equipment/resources. That being said, just
because a solution worked in a given situation does not make it proper, does not
mean that it complies with best practices, does not make it maintainable, nor does
it make it wise. It simply means that it worked once in that particular situation. I've
opened canned foods with a knife; it wasn't very smart when can openers do the
job safely and much more efficiently. The logical explanation would be: the data
was extracted from a highly visible OLTP DB for a miscellaneous purpose (compared to
the actual purpose of the DB itself), like nice-to-have reporting. Hence it was given only
a percentage of the CPU allocation and a short batch window; exceeding the limit would
block the extract from the DB. When the data needed to be extracted after a
couple of heavy aggregations followed by a couple of joins, the CPU and memory
usage and the processing time shot up way beyond the limit,
whatever hints or indexes or partitions we used. It was resolved simply by pulling
the data with a simple select, and performing the aggregation and join in memory on
the ETL server. Mathematically, it satisfied the requirement!
35Q) When running a job, I get the following error: Join_54: When checking
operator: User inserted sort "Sort_58" does not fulfill the sort requirements of the
downstream operator "APT_JoinSubOperatorNC in Join_54". I sort on two columns in
Sort_58 and then use these two columns as keys in Join_54. What is the reason for
such an error?
35ANS) But it doesn't say that; it says you have changed the sort. Or perhaps you
repartitioned the data before the join? If so, make sure it has the same partitioning. Check both
input streams of the Join stage.
I just got this warning and determined the cause was sorting on a column or two
that might have a null value. I changed the sort column to an ID column that will
not be null and the warnings are gone.
Go into Input > Partitioning, choose Hash, then choose the same key for
both inputs of the Join stage and tick the Sort option.
Thanks, there is no NULL value in
the sort key columns,
and after I use hash partitioning and choose the same key for both inputs of the Join
stage to sort, the warnings are still there.
If you have columns with the same name on both input links and don't use these
columns as join keys, the process generates this warning.
Sorry, after checking, except for the two join keys there are no other columns with the
same name on the input links and output link.
36QS) In the Transformer derivation for a field I have this: "00" :
Left(Right(AmountToString,17),14) : Right(AmountToString,2). Can anyone explain
the functioning of this to me?
SOL) 1. Prefix two zeros ("00"). 2. Skip the first 3 characters and take the next 14
characters [Left(Right(AmountToString,17),14)]. 3. Append the last 2
characters [Right(AmountToString,2)].
Let's say the AmountToString value is 0123456789ABCDEFGHIJ (20 characters).
Right(AmountToString,17) = Right(0123456789ABCDEFGHIJ,17) =
3456789ABCDEFGHIJ (take 17 characters from the right-hand side).
Left(Right(AmountToString,17),14) =
Left(3456789ABCDEFGHIJ,14) = 3456789ABCDEFG (take 14 characters starting from
the left-hand side).
Right(AmountToString,2) = Right(0123456789ABCDEFGHIJ,2) = IJ (take 2 characters
from the right-hand side).
Final result: "00" :
Left(Right(AmountToString,17),14) : Right(AmountToString,2) =
003456789ABCDEFGIJ

You might also like