3/17/2014
shakthidhar bommireddy
Ascential Platform
What is DataStage?
o Design jobs for extraction, transformation, and loading (ETL)
o Ideal tool for data integration projects, such as data warehouses, data marts, and system migrations
o Import, export, create, and manage metadata for use within jobs
o Schedule, run, and monitor jobs, all within DataStage
o Administer your DataStage development and execution environments
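The ETL idea behind a DataStage job can be pictured in plain Python. This is a conceptual sketch only, not DataStage code: the sample data, column names, and function names are all invented for the illustration.

```python
import csv
import io

# Conceptual stand-in for a source sequential file; a real job reads
# actual files or databases on the server.
source = io.StringIO("id,name,salary\n1,Ann,50000\n2,Bob,60000\n")

def extract(fh):
    """Extract: read delimited rows using the column names in the header."""
    return list(csv.DictReader(fh))

def transform(rows):
    """Transform: convert a column's type and derive a new column."""
    return [{**r, "salary": int(r["salary"]), "bonus": int(r["salary"]) // 10}
            for r in rows]

def load(rows):
    """Load: return the rows; a real job would write them to a target."""
    return rows

result = load(transform(extract(source)))
```

The same extract-transform-load shape is what you assemble graphically in Designer, stage by stage.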
DataStage Administrator
Use the Administrator to specify general server defaults, add and delete projects, and to set project properties.
- Set job monitoring limits and other Director defaults on the General tab.
- Set user group privileges on the Permissions tab.
- Enable or disable server-side tracing on the Tracing tab.
- Specify a user name and password for scheduling jobs on the Schedule tab.
- Specify hashed file stage read and write cache sizes on the Tunables tab.
Client Logon
DataStage Manager
Use the Manager to store and manage reusable metadata for the jobs you define in the Designer. This metadata includes table and file layouts and routines for transforming extracted data. Manager is also the primary interface to the DataStage repository. In addition to table and file layouts, it displays the routines, transforms, and jobs that are defined in the project. Custom routines and transforms can also be created in Manager.
DataStage Designer
The DataStage Designer allows you to use familiar graphical point-and-click techniques to develop processes for extracting, cleansing, transforming, integrating, and loading data into warehouse tables. The Designer provides a "visual data flow" method to easily interconnect and configure reusable components.
DataStage Director
Use the Director to validate, run, schedule, and monitor your DataStage jobs. You can also gather statistics as the job runs.
Developing in DataStage
- Define your project's properties: Administrator
- Open (attach to) your project
- Import metadata that defines the format of data stores your jobs will read from or write to: Manager
- Design the job: Designer
  - Define data extractions (reads)
  - Define data flows
  - Define data integration
  - Define data transformations
  - Define data constraints
  - Define data loads (writes)
  - Define data aggregations
- Compile and debug the job: Designer
- Run and monitor the job: Director
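Two of the design steps listed above, constraints and aggregations, can be illustrated conceptually. The Python below merely stands in for Designer stages; the row data and logic are hypothetical, not DataStage APIs.

```python
# Invented sample rows standing in for data read from a source stage.
rows = [
    {"region": "East", "amount": 100},
    {"region": "East", "amount": 250},
    {"region": "West", "amount": 40},
]

# Constraint: pass only rows meeting a condition (like a Transformer constraint).
passed = [r for r in rows if r["amount"] >= 50]

# Aggregation: total the amounts per region (like an Aggregator stage).
totals = {}
for r in passed:
    totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]
```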
DataStage Projects
Review
o DataStage Designer is used to build and compile your ETL jobs.
o Manager is used to store and manage reusable metadata and other repository objects.
o Director is used to execute your jobs after you build them.
o Administrator is used to set global and project properties.
Module Objectives
After this module you will be able to:
- Explain how to create and delete projects
- Set project properties in Administrator
- Set EE global properties in Administrator
Project Properties
Projects can be created and deleted in Administrator. Project properties and defaults are set in Administrator.
Recall from module 1: In DataStage all development work is done within a project. Projects are created during installation and after installation using Administrator.
Each project is associated with a directory. The directory stores the objects (jobs, metadata, custom routines, etc.) created in the project.
Before you can work in a project you must attach to it (open it).
You can set the default properties of a project using DataStage Administrator.
Licensing Tab
You can limit the logged events either by number of days or number of job runs.
Environment Variables
Permissions Tab
Use this page to set user group permissions for accessing and using DataStage. All DataStage users must belong to a recognized user role before they can log on to DataStage. This helps to prevent unauthorized access to DataStage projects.
There are three roles of DataStage user:
- DataStage Developer, who has full access to all areas of a DataStage project.
- DataStage Operator, who can run and manage released DataStage jobs.
- <None>, who does not have permission to log on to DataStage.
UNIX note: In UNIX, the groups displayed are defined in /etc/group.
Tracing Tab
This tab is used to enable and disable server-side tracing. The default is for server-side tracing to be disabled. When you enable it, information about server activity is recorded for any clients that subsequently attach to the project. This information is written to trace files. Users with in-depth knowledge of the system software can use it to help identify the cause of a client problem. If tracing is enabled, users receive a warning message whenever they invoke a DataStage client. Warning: Tracing causes a lot of server system overhead. This should only be used to diagnose serious problems.
Tunables Tab
On the Tunables tab, you can specify the sizes of the memory caches used when reading rows from hashed files and when writing rows to hashed files. Hashed files are mainly used for lookups and are discussed in a later module.
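As a rough analogy only (not DataStage internals): a hashed file used for a lookup behaves like a key-indexed table held in a memory cache, so each lookup is a hash probe rather than a file scan. A Python sketch with invented reference data:

```python
# Invented reference data; a real hashed file lives on the server and is
# held in the read cache whose size the Tunables tab controls.
reference = [("10", "Hardware"), ("20", "Software")]

lookup = dict(reference)  # analog of the in-memory read cache

def resolve(dept_id):
    """A lookup is a hash probe on the key column, with a default on miss."""
    return lookup.get(dept_id, "UNKNOWN")
```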
Parallel Tab
You should enable OSH for viewing - OSH is generated when you compile a job.
Module Objectives
After this module you will be able to:
- Describe the DataStage Manager components and functionality
- Import and export DataStage objects
- Import metadata for a sequential file
What Is Metadata?
Metadata is "data about data" that describes the formats of sources and targets. This includes general format information such as whether the record columns are delimited and, if so, the delimiting character. It also includes the specific column definitions.
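As an illustration of the idea (not DataStage's repository format), a table definition can be pictured as plain data that drives the parsing of a record:

```python
# Illustrative structure only: format information plus column definitions.
table_def = {
    "format": {"delimited": True, "delimiter": ","},
    "columns": [
        {"name": "id", "type": "integer"},
        {"name": "name", "type": "varchar"},
    ],
}

def parse_record(line, td):
    """Use the metadata to split a raw record into named columns."""
    values = line.rstrip("\n").split(td["format"]["delimiter"])
    return dict(zip((c["name"] for c in td["columns"]), values))
```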
DataStage Manager
DataStage Manager is a graphical tool for managing the contents of your DataStage project repository, which contains metadata and other DataStage components such as jobs and routines. The left pane contains the project tree. There are seven main branches, but you can create subfolders under each. Select a folder in the project tree to display its contents.
Manager Contents
- Metadata describing sources and targets: table definitions
- DataStage objects: jobs, routines, table definitions, etc.
- Any object in Manager can be exported to a file
- Can export whole projects
- Use for backup
- Sometimes used for version control
- Can be used to move DataStage objects from one project to another
- Use to share DataStage jobs and projects with other developers
Export Procedure
- In Manager, click "Export > DataStage Components"
- Select DataStage objects for export
- Specify the type of export: DSX, XML
- Specify the file path on the client machine
Review Q
1. You can export DataStage objects such as jobs, but you can't export metadata, such as field definitions of a sequential file. (T/F)
2. The directory to which you export is on the DataStage client machine, not on the DataStage server machine. (T/F)
Import Procedure
Import Options
Metadata Import
- Import format and column definitions from sequential files
- Import relational table column definitions
- Imported as "Table Definitions"
- Table definitions can be loaded into job stages
- In Manager, click Import > Table Definitions > Sequential File Definitions
- Select the directory containing the sequential file, and then the file
- Select the Manager category
- Examine the format and column definitions and edit if necessary
1. In Manager, select the category (folder) that contains the table definition.
2. Double-click the table definition to open the Table Definition window.
3. Click the Columns tab to view and modify any column definitions.
4. Select the Format tab to edit the file format specification.
Module Objectives
After this module you will be able to:
- Describe what a DataStage job is
- List the steps involved in creating a job
- Describe links and stages
- Identify the different types of stages
- Design a simple extraction and load job
- Compile your job
- Create parameters to make your job flexible
- Document your job
What Is a Job?
- Executable DataStage program
- Created in DataStage Designer, but can use components from Manager
- Built using a graphical user interface
- Compiles into Orchestrate shell language (OSH)
- In Manager, import metadata defining sources and targets
- In Designer, add stages defining data extractions and loads
- Add Transformers and other stages to define data transformations
- Add links defining the flow of data from sources to targets
- Compile the job
- In Director, validate, run, and monitor your job
Designer Toolbar
Tools Palette
- Stages can be dragged from the tools palette or from the stage type branch of the repository view
- Links can be drawn from the tools palette or by right-clicking and dragging from one stage to another
- Used to extract data from, or load data to, a sequential file
- Specify the full path to the file
- Specify a file format: fixed width or delimited
- Specify column definitions
- Specify the write action
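The difference between the two file formats can be sketched in Python. Both readers are conceptual stand-ins for what the stage does, with invented sample data:

```python
def read_delimited(text, delimiter=","):
    """Delimited format: columns are separated by a delimiter character."""
    return [line.split(delimiter) for line in text.splitlines()]

def read_fixed_width(text, widths):
    """Fixed-width format: each column occupies a known number of characters."""
    rows = []
    for line in text.splitlines():
        pos, row = 0, []
        for w in widths:
            row.append(line[pos:pos + w].strip())
            pos += w
        rows.append(row)
    return rows
```

Either way, the column definitions in the stage tell DataStage how to slice each record into fields.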
Metadata may be dragged from the repository and dropped on a link.
Any required properties that are not completed will appear in red.
You are defining the format of the data flowing out of the stage, that is, to the output link. Define the output link listed in the Output name box.
You are defining the file from which the job will read. If the file doesn't exist, you will get an error at run time. On the Format tab, you specify a format for the source file. You will be able to view its data using the View data button.
Think of a link as a pipe. What flows in one end flows out the other end (at the Transformer stage).
Defining a sequential target stage is similar to defining a sequential source stage. You are defining the format of the data flowing into the stage, that is, from the input links. Define each input link listed in the Input name box.
You are defining the file the job will write to. If the file doesn't exist, it will be created. Specify whether to overwrite or append the data in the Update action set of buttons. On the Format tab, you can specify a different format for the target file than you specified for the source file.
If the target file doesn't exist, you will not (of course!) be able to view its data until after the job runs. If you click the View data button, DataStage will return a "Failed to open ..." error.
The column definitions you defined in the source stage for a given (output) link will appear already defined in the target stage for the corresponding (input) link. Think of a link as a pipe. What flows in one end flows out the other end. The format going in is the same as the format going out.
Transformer Stage
- Used to define constraints, derivations, and column mappings
- A column mapping maps an input column to an output column
- In this module we will just define column mappings (no derivations)
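Conceptually, a Transformer processes one row at a time, applying the constraint, any derivations, and the column mappings. A Python stand-in (the column names are invented for the example):

```python
def transformer(row):
    """Process one row: constraint, then derivation, then column mapping."""
    # Constraint: rows failing the condition do not reach the output link.
    if row["qty"] <= 0:
        return None
    # Derivation: compute an output value from input columns.
    total = row["qty"] * row["price"]
    # Column mapping: input columns -> output columns.
    return {"item": row["name"], "total": total}
```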
Stage variables are used for a variety of purposes:
- Counters
- Temporary registers for derivations
- Controls for constraints
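A stage variable's defining property is that it keeps its value from one row to the next. A Python analogy, using a closure variable as the counter (the names and cutoff are invented):

```python
def make_transformer():
    """Build a transformer whose stage variable persists across rows."""
    row_count = 0  # stage variable acting as a counter

    def transformer(row):
        nonlocal row_count
        row_count += 1
        # Derivation uses the stage variable; the constraint controls output.
        out = {**row, "row_num": row_count}
        return out if row_count <= 2 else None  # pass only the first two rows

    return transformer
```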
Result
- Makes the job more flexible
- Parameters can be:
  - Used in constraints and derivations
  - Used in directory and file names
- Parameter values are determined at run time
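In stage properties such as file paths, a parameter is referenced as #ParamName# and its value is substituted at run time. A minimal Python sketch of that substitution (the parameter names and path are invented for the example):

```python
def expand(path, params):
    """Substitute #Name# references with run-time parameter values."""
    for name, value in params.items():
        path = path.replace("#" + name + "#", value)
    return path

# One path design serves every environment and run date.
resolved = expand("/data/#Env#/cust_#Date#.txt",
                  {"Env": "dev", "Date": "20140317"})
```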
Job Properties
- Short and long descriptions
- Shows in Manager
Annotation stage
- Is a stage on the tool palette
- Shows on the job GUI (work area)
Compiling a Job
Before you can run your job, you must compile it. To compile it, click File>Compile or click the Compile button on the tool bar. The Compile Job window displays the status of the compile. A compile will generate OSH.
If an error occurs:
- Click Show Error to identify the stage where the error occurred. This will highlight the stage in error.
- Click More to retrieve more information about the error. This can be lengthy for parallel jobs.
Module Objectives
After this module you will be able to:
- Validate your job
- Use DataStage Director to run your job
- Set run options
- Monitor your job's progress
- View job log messages
DataStage Director
- Can schedule, validate, and run jobs
- Can be invoked from DataStage Manager or Designer: Tools > Run Director
This shows the Director Status view. To run a job, select it and then click Job > Run Now. Better yet: shift to the Log view from the main Director screen, then click the green arrow to execute the job.
Click the Log button in the toolbar to view the job log. The job log records events that occur during the execution of a job. These events include control events, such as the starting, finishing, and aborting of a job; informational messages; warning messages; error messages; and program-generated messages.
- Schedule a job to run on a particular date/time
- Clear the job log
- Set Director options
  - Row limits
  - Abort after x warnings