
DataStage notes

1. Extraction from operational source or archive systems, which are the primary sources of data.
2. Transformation involves cleansing, filtering, and applying data rules.
3. Load puts the data into the DW or into another DB or application.
4. DataStage ETL segments: jobs accessing source systems, lookup-loading jobs, transformation jobs, loading jobs.
5. Jobs accessing source systems extract data from the source system, with filtering and validation such as trimming blank spaces and filtering out irrelevant data.
6. Lookup-loading jobs have to run before the transformations can run. They load lookup files, prepare surrogate-key mapping files, set up data sequences, and set up parameters.
7. Transformation jobs apply the business rules and shape the data that will be loaded into the DW (dimensions and facts).
8. Load jobs load the data into the database. They load the surrogate keys and populate the dimension and fact tables.
9. Features of DataStage: graphical design tool; extracts data from any type of data source; handles all the metadata definitions required to build the DW; aggregates data.
10. DataStage has server and client components. The server runs on UNIX or Windows; the clients run on Windows.
11. It can extract data from mainframe datasets, ERP applications, flat files, and various other data sources.
12. Designer: used to create DataStage applications, called jobs. A job specifies the data sources, the transformations, and the destinations of the data. Jobs are compiled to create executables that are scheduled by the Director and run on the server.
13. Director: validates, schedules, runs, and monitors jobs, including parallel jobs.
14. Manager: interface to view and edit the contents of the repository.
15. Administrator: interface for administrative work such as creating and deleting projects, setting up users, and setting the purging criteria.
16. The Repository contains all the jobs, containers, routines, and table definitions.
17. The Palette contains the various stages.
18. The Debug menu is available only for server jobs or server shared containers; it gives access to the debugger commands.
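As a rough sketch of the four ETL job segments above (DataStage itself is a graphical tool, so this is purely conceptual; the data, column names, and rules below are invented for illustration):

```python
# Hypothetical sketch of the four DataStage ETL job segments.
# All data, names, and rules here are invented for illustration.

# 1. Source-access job: extract, trim blank spaces, filter irrelevant rows
def extract(rows):
    cleaned = [{k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
               for r in rows]
    return [r for r in cleaned if r.get("cust_id")]  # drop rows with no key

# 2. Lookup-loading job: prepare a surrogate-key mapping before transforms run
def load_lookups(dim_rows):
    return {r["cust_id"]: r["surrogate_key"] for r in dim_rows}

# 3. Transformation job: apply business rules, shape rows for the warehouse
def transform(rows, sk_map):
    return [{"cust_sk": sk_map[r["cust_id"]], "amount": round(r["amount"], 2)}
            for r in rows if r["cust_id"] in sk_map]

# 4. Load job: write the shaped rows into the fact table (a plain list here)
def load(fact_table, rows):
    fact_table.extend(rows)
    return fact_table

source = [{"cust_id": " C1 ", "amount": 10.50}, {"cust_id": "", "amount": 5.0}]
dim = [{"cust_id": "C1", "surrogate_key": 101}]
facts = load([], transform(extract(source), load_lookups(dim)))
```

Note how the lookup mapping must exist before `transform` can run, mirroring the rule that lookup-loading jobs run ahead of the transformation jobs.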
19. ODBC (Open Database Connectivity) stage: an intermediate stage, for example for aggregating data.
20. Hashed files act as an intermediate stage for quick lookups.
21. Sequential File stages load data into operating-system text files.
22. Aggregator and Transformer are active stages.
23. The Inter-process stage provides communication between DataStage processes that are running simultaneously in the same job.
24. Passive stages are used to read or write data from data sources.
25. Server jobs support two types of links: stream and reference.
26. A stream link represents the flow of data and is used by both passive and active stages.
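The hashed-file idea can be shown in miniature (hypothetical data; a hashed file is essentially a key-hashed store, which a Python dict models well):

```python
# Hypothetical sketch: a hashed file behaves like a key-indexed store,
# so lookups during transformation are near constant time.
hashed_file = {"C1": {"region": "EMEA"}, "C2": {"region": "APAC"}}

def lookup(key, default=None):
    # average-case O(1) retrieval by hashed key
    return hashed_file.get(key, default)
```

A missing key such as `"C9"` returns the default instead of failing, much as an unmatched lookup simply produces no reference row.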

27. Reference links are like lookups. They are used by active stages only. They provide information that may affect the way the data is changed, but they do not supply the data to be changed.
28. Stream links are represented by solid lines; reference links are represented by dotted lines.
29. If a job compiles successfully, you can validate, run, schedule, release, and package the job for deployment on another DataStage system.
30. You can run a job from Designer only if the focused job is compiled and saved.
31. Detailed information about a job's status is available in the Director tool.
32. You can validate a job from the Job Run options in DS Director.
33. You can view all the active stages and the CPU utilization of the active stages in DS Director.
34. You can stop a job from DS Director.
35. You can reset a job to return it to a runnable state.
36. The log view in DS Director shows current and previous runs.
37. You can view log event details in DS Director.
38. Jobs can be scheduled through the Schedule option in DS Director.
39. DS Manager provides a project view and a host view.
40. You can create, delete, and manage the categories in the repository.
41. You can create new objects in DS Manager.
42. You can export repository components through the Export menu in DS Manager.
43. You can do usage analysis in DS Manager, viewing the relationships of other items with the currently selected item.
44. You can configure the local host using DS Manager.
45. You can view, delete, and set the properties of a project using DS Administrator.
46. You can add and delete projects using DS Administrator.
47. Environment variables are defined per project.
48. Only the User Defined category allows creation of new variables.
49. You can trace a project using DS Administrator.
50. Trace files are created per project in DS Administrator.
51. Using a scheduler user account, you can schedule jobs in DS Administrator.
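The stream/reference distinction described above can be sketched as follows (invented rows and codes): the stream link carries the rows being processed, while the reference link only supplies lookup information that influences how those rows are changed.

```python
# Hypothetical sketch of stream vs. reference links at a Transformer stage.
stream_rows = [{"id": 1, "code": "A"}, {"id": 2, "code": "X"}]  # stream link: the data flow
reference = {"A": "Active", "B": "Blocked"}                     # reference link: lookup only

def transformer(rows, ref):
    # rows arrive on the stream link; ref only affects how they are changed
    return [{**r, "status": ref.get(r["code"], "Unknown")} for r in rows]

out = transformer(stream_rows, reference)
```

The reference data never becomes output rows of its own; it only decorates the rows flowing on the stream link.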
52. You can set the memory cache size for reading and writing hashed files.
53. You can also enable row buffering and set its type and size in DS Administrator.
54. There are three server-side components: 1) Repository, 2) DS Server, 3) DS Package Installer.
55. Repository: the central store that contains all the information needed to build the DW.
56. DataStage stores all the user accounts, jobs, and metadata in a relational DB.
57. Hashed files can be accessed by issuing SQL commands through DS Administrator.
58. DataStage was previously called Ardent DataStage and then Ascential DataStage; after that it was renamed WebSphere DataStage.
59. DataStage editions: 1) Server edition; 2) Enterprise edition, which includes parallel as well as server jobs; 3) MVS edition, for mainframe systems; 4) DS for PeopleSoft; 5) DataStage TX (Mercator); 6) DS SOA, which can turn a server or parallel job into an SOA service.
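Row buffering, mentioned above, can be illustrated conceptually (buffer size and names are invented): rows are handed between processes in fixed-size batches rather than one at a time.

```python
# Hypothetical sketch of row buffering: rows move between stages in
# fixed-size buffers instead of one at a time, cutting hand-off overhead.
def buffered(rows, buffer_size=3):
    buf = []
    for row in rows:
        buf.append(row)
        if len(buf) == buffer_size:
            yield buf
            buf = []
    if buf:
        yield buf  # flush the final partial buffer

batches = list(buffered(range(7), buffer_size=3))
```

Seven rows arrive as two full buffers of three plus one partial buffer, so the receiving process is woken three times instead of seven.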
