
Front cover

IBM InfoSphere Information Server Administration v9.1

(Course code KM502)

Student Notebook
ERC 1.0

Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International
Business Machines Corp., registered in many jurisdictions worldwide.
The following are trademarks of International Business Machines Corporation, registered in
many jurisdictions worldwide:
DataStage, DB2, IA, Informix, InfoSphere, MVS, QualityStage, WebSphere, z/OS
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or
both.
Windows is a trademark of Microsoft Corporation in the United States, other countries, or
both.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of
Oracle and/or its affiliates.
Other product and service names might be trademarks of IBM or other companies.

December 2012 edition


The information contained in this document has not been submitted to any formal IBM test and is distributed on an "as is" basis without
any warranty either express or implied. The use of this information or the implementation of any of these techniques is a customer
responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. While
each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will
be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.

© Copyright International Business Machines Corporation 2007, 2012.


This document may not be reproduced in whole or in part without the prior written permission of IBM.
Note to U.S. Government Users: Documentation related to restricted rights. Use, duplication, or disclosure is subject to restrictions
set forth in GSA ADP Schedule Contract with IBM Corp.

Contents
Course description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv

Agenda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix

Unit 0. IBM InfoSphere Information Server Administration v9.1 . . . . . . . . . . . . . . . 0-1


Course objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0-2
Course objectives, continued . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0-3
Agenda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0-4
Agenda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0-5
Introductions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0-6

Unit 1. Technical Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
Information Server functional categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3
Hosted products support functional categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-5
Role-based tools with integrated metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-6
Understanding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-7
Blueprint Director . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-8
Information Analyzer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-9
Discovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-10
Business Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-11
Metadata Workbench . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-12
Cleansing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-13
Why data cleansing with QualityStage is needed . . . . . . . . . . . . . . . . . . . . . . . . . 1-14
QualityStage functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-16
Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-18
Using Information Server to transform data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-19
DataStage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-20
FastTrack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-21
Delivery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-22
Information Services Director . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-23
Change Data Delivery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-24
Information Server Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-25
Information Server architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-26
Information Server backbone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-27
Parallel processing engine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-28
Information Server architectural tiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-29
Architecture diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-30
Platform topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-32
Client tier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-33
Services tier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-34
Engine tier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-35
Repository tier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-36
Tier interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-37


Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-38
Exercises Unit 01 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-39
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-40

Unit 2. Overview of Clients used for Administration . . . . . . . . . . . . . . . . . . . . . . . . 2-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Client-Server architecture overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
Information Server client icons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4
Dedicated administrative clients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
Administration within hosted products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7
Dedicated Administrative Clients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8
Information Server Web Console . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9
Logging into the Information Server Web Console . . . . . . . . . . . . . . . . . . . . . . . . 2-10
Information Server Web Console tabs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-11
Web Console functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13
Metadata Asset Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15
Repository Management tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-16
WebSphere Application Server (WAS) console . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17
WAS servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-18
Product Clients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-19
Engine clients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-20
Multi-Client Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-21
DataStage Administrator tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-22
Logging Into Administrator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-23
DataStage Administrator Projects tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-24
DataStage Administrator General Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-25
DataStage environment variables settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-26
Permissions tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-27
Parallel tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-28
Job Sequence defaults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-29
DataStage job log defaults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-30
DataStage Designer administrative tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-31
Logging into Designer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-32
Designer work area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-33
Monitoring a running DataStage job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-34
Performance statistics in Designer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-35
Director client Status View . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-36
Job log messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-37
DataStage and QualityStage Operations Console . . . . . . . . . . . . . . . . . . . . . . . . 2-38
Operations Console . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-39
FastTrack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-40
FastTrack data source configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-41
Business Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-42
Business Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-43
Metadata Workbench (MWB) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-44
Metadata Workbench . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-45
Viewing the Information Server Metadata Model . . . . . . . . . . . . . . . . . . . . . . . . . 2-46
Information Server Console . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-47


Logging on to the Information Server Console . . . . . . . . . . . . . . . . . . . . . 2-48


Information Server Console Home tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-49
Information Server Console System Configuration menu . . . . . . . . . . . . . . . . . . . 2-50
Checkpoint questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-51
Exercises Unit 02 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-52
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-53

Unit 3. Authentication and Suite Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2
IS Authentication Registry Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3
Security administration tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-4
Information Server authentication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-5
Architecture for internal user registry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6
Architecture for an OS user registry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7
Architecture for an LDAP external user registry . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-8
WAS security configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9
IS Web Console User Registry Configuration tab . . . . . . . . . . . . . . . . . . . . . . . . . 3-10
Switching to the local OS user registry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-11
Configuring the local OS user registry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12
Configuring the local OS user registry, continued . . . . . . . . . . . . . . . . . . . . . . . . . 3-13
Switching to the LDAP user registry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14
Configuring the LDAP user registry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-15
Configuring the LDAP user registry, continued . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-16
Switching a user registry for a system in use . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-17
Engine Security Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-18
Engine security configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-19
Shared OS user registry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-20
Shared LDAP user registry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-21
Configuring IS for sharing the user registry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-22
Credential mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-23
Credential mappings diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-24
Information Server User Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-25
Assigning roles for access control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-26
Suite roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-27
Suite Component roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-28
Creating IS users and groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-29
Creating a new group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-30
Selecting group attributes and roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-31
Creating a new user . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-32
Specifying user attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-33
Credential Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-34
Default credential mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-35
Specify the default credential mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-36
User credential mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-37
Individual credential mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-38
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-39
Exercises Unit 03 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-40
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-41


Unit 4. Stopping and Starting Information Server . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
Starting and stopping Information Server (IS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3
Stopping Information Server (1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4
Checking for DataStage processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5
Stopping Information Server (2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6
Example: Stopping the Engine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7
Stopping Information Server (3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-8
Stopping Information Server (4) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9
Starting Information Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10
Starting the ASB agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-11
Starting the DataStage engine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-12
Checking the Engine status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-13
Other checks on the engine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-14
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-15
Exercises Unit 04 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-16
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-17

Unit 5. Session Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-2
Client session management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3
Viewing active client sessions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-4
Global session properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-5
Session details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-6
Disconnecting sessions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-7
Log Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8
Log management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-9
Managing configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-10
DataStage component configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-11
DataStage.ALL configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-12
Log views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-13
Log view messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-14
Creating a new log view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
Example log view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-16
Reporting Administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-17
Reporting administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-18
Creating a report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-19
Selecting the report template . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-20
Editing the report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-21
Running a report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-22
Sample report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-23
Report access control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-24
Information Server Locking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-25
Locking overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-26
Clearing Repository locks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-27
Manually clearing locks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-28
Clearing Engine locks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-29
Clearing locks in Director . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-30


Clearing logs in Director . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-31


Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-32
Exercises Unit 05 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-33
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-34

Unit 6. Engine Tier Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
Traditional batch processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3
Traditional approach to parallel processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-4
Data flow model of application design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5
Data pipelining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
Partition parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-7
Parallel engine combines partition and pipeline parallelism . . . . . . . . . . . . . . . . . . 6-8
Partitioning and collecting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-9
Partitioners . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-10
Collectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-11
Parallel sorting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-12
Parallel Job Compilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-13
Parallel job compilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-14
Generated OSH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-15
Parallel Engine Runtime Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-16
Parallel engine runtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-17
Parallel engine runtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-18
Job execution: the process orchestra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-19
Runtime control and data networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-20
Understanding the job Score . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-21
Viewing the job Score . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-22
Example job Score . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-23
Counting the total number of processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-24
Parallel Job Configuration File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-25
Configuration file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-26
Configuration file nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-27
Sample configuration file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-28
Factors affecting optimal degree of parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-29
Node pools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-30
Node pools example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-31
Disk pools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-32
Sort resource usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-33
Buffer scratch disk pools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-34
Buffer scratch disk pools example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-35
Buffer resource usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-36
Configuration file guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-37
Configuration file - the default.apt file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-38
Configuration file - sizing the number of nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-39
Configuration file tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-40
Configuration file tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-41
Configuration file tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-42
Minimizing resource requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-43


Editing a configuration file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-44


Running a job with a non-default configuration file . . . . . . . . . . . . . . . . . . . . . . . . 6-45
Engine Command Line Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-46
Engine command line interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-47
dsjob command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-48
dsjob command syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-49
dsjob -lprojects command example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-50
dsjob -run command example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-51
dsjob -logsum (log summary) command example . . . . . . . . . . . . . . . . . . . . . . . . 6-52
dsjob -report (job report) command example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-53
dsadmin command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-54
dsadmin command syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-55
dsadmin command examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-56
DSXImportService -List command example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-57
DSXImportService import command example . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-58
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-59
Exercises Unit 06 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-60
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-61

Unit 7. Engine Tier Configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2
DataStage Project Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-3
DataStage project configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-4
Administrator tabs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-5
Administrator Project Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-6
Runtime Column Propagation (RCP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-7
Enabling Runtime Column Propagation (RCP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-8
Enabling RCP at project level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-9
DataStage project user permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-10
Permissions tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-11
Data Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-12
Data sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-13
Data sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-14
Job with Data Set stage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-15
Data Set Management utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-16
Data and schema displayed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-17
Application Data Set usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-18
Using orchadmin command utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-19
"orchadmin ll"command output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-20
Sample orchadmin ll data set report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-21
Environment Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-22
Setting environment variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-23
Environment variable settings in dsenv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-24
Minimum set of environment variables in dsenv . . . . . . . . . . . . . . . . . . . . . . . . . . 7-25
Project level environment variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-26
DSParams file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-27
Operational Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-28
Capturing operational metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-29


Operational metadata option in Administrator . . . . . . . . . . . . . . . . . . . . . 7-30


What is operational metadata? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-31
Configuring Run Import (runimport.cfg) file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-32
Generated XML files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-33
Executing the Run Import utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-34
Job run reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-35
Deleting operational metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-36
Multiple Job Compile Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-38
Multiple job compile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-39
Selection Criteria window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-40
Selection Override window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-41
Compile Process window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-42
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-43
Exercises Unit 07 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-44
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-45

Unit 8. Engine Tier Database Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2
Enterprise Application Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-3
Engine database connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-4
Engine database connectivity, continued . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-5
Information Server connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-6
Information Server supported connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-7
Configuring Database Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-8
Database connectivity software requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-9
Common database software requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-10
File system permission requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-11
Engine environment variable requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-12
Database-specific environment variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-13
Database permission requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-14
Setting LD_LIBRARY_PATH in Administrator . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-15
Operator specific environment variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-16
Setting LD_LIBRARY_PATH in the dsenv file . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-17
dsenv file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-18
ODBC Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-19
ODBC drivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-20
ODBC architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-21
Configuring ODBC connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-22
Sample database settings to add to dsenv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-23
.odbc.ini file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-24
Sample .odbc.ini entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-25
.odbc ODBC data source listing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-26
uvodbc.config . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-27
Sample uvodbc.config file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-28
Testing ODBC connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-29
Running the dssh command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-30
For non-wired ODBC drivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-31
Database Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-32


DB2 DataStage configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-33


DB2 configuration example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-34
Oracle configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-35
Teradata configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-36
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-37
Exercises Unit 08 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-38
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-39

Unit 9. Engine Tier Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
Monitoring DataStage jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-3
Monitoring job sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-4
Job sequence example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5
Monitoring job messages in Director . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-6
Sequence job log . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-7
Operations Console . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-8
Operations Console . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-9
Configuring the Operations Console . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-10
Starting the Operations Console services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-11
Operations Console GUI - Dashboard tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-12
Dashboard GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-13
Operations Console GUI - Projects tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-14
Projects GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-15
Example - Run and monitor a job sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-16
View the job activity on the Dashboard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-17
Job run details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-18
Workload management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-19
Workload Management tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-20
Queue Management tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-21
Performance Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-22
Performance analysis in the past . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-23
Performance Analyzer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-24
Enabling performance data recording . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-25
Example job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-26
Job timeline chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-27
Viewing by partition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-28
Record throughput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-29
Stage CPU usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-30
Displaying selected stages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-31
Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-32
Resource Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-33
Resource Estimation tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-34
Creating a model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-35
Information the model contains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-36
Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-37
Resource Estimation window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-38
Input Projections folder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-39
Job Tree folder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-40


Stages folder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-41


Charts folder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-42
Creating a model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-43
Creating a projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-44
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-45
Exercises Unit 09 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-46
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-47

Unit 10. Metadata Asset Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-2
Asset Interchange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-3
What is asset interchange? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-4
Uses of asset interchange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-5
Invoking the asset interchange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-6
Asset interchange archive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-7
DataStage export / import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-8
Specifying DataStage assets in istool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-9
Security export / import command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-10
Example: Exporting parallel jobs in a project folder . . . . . . . . . . . . . . . . . . . . . . 10-11
Import example for DataStage assets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-12
Example: Exporting security assets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-13
Information Server Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-14
Information Server Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-15
Deploying packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-16
Information Server Manager packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-17
Deploying the package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-18
Incremental builds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-19
Exporting and importing engine assets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-20
Metadata Asset Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-21
Metadata asset management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-22
Common Model and its extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-23
External metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-24
Metadata Workbench Model View tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-25
Data resource metadata asset examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-26
Metadata Asset Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-27
Logging into InfoSphere Metadata Asset Manager (IMAM) . . . . . . . . . . . . . . . . 10-28
Metadata Interchange Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-29
Importing metadata assets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-30
Import settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-31
Creating a new import area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-32
Import parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-33
Select type of import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-34
View results in the staging area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-35
Browsing metadata assets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-36
Browse logical data models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-37
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-38
Exercises Unit 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-39
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-40


Unit 11. Information Services Console Configuration . . . . . . . . . . . . . . . . . . . . . . 11-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
Information Analyzer Product Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-3
Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-4
Post Information Server installation steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-5
ODBC data source connection to IADB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-6
Setting user permissions in the Web Console . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-7
Analysis Engine settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-8
Analysis database settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-9
Data source configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-10
Define source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-11
Define source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-12
Importing table definitions for source tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-13
Creating a project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-14
Associate metadata with the project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-15
Add users to project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-16
Information Analyzer project roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-17
Information Services Director Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-18
Information Services Director (ISD) configuration . . . . . . . . . . . . . . . . . . . . . . . 11-19
ISD users . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-20
Creating an ISD application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-21
Configure an information services connection . . . . . . . . . . . . . . . . . . . . . . . . . . 11-22
Configuring the DataStage service provider . . . . . . . . . . . . . . . . . . . . . . 11-23
Configuring a DB2 service provider . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-24
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-25
Exercises Unit 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-26
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-27

Unit 12. Installation and Deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-2
Information Server Deployment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-3
Deployment models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-4
Deployment models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-5
Deployment models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-6
Deployment models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-7
Linux Installation Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-8
Suite installer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-9
Installation steps - 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-10
Installation steps - 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-11
Installation steps - 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-12
Installation steps - 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-13
Installation steps - 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-14
Installation steps - 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-15
Installation steps - 7 - WAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-16
Installation steps - 9 - Repository database . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-17
Installation steps - 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-18
Installation steps - 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-19
Installation steps - 12 - DataStage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-20


Installation steps - 13 - DataStage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-21


Installation steps - 14 - System Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . 12-22
Client Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-23
Client installation steps - 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-24
Client installation steps - 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-25
Client installation steps - 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-26
Client installation steps - 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-27
Testing the Install . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-28
Version.xml file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-29
Sample server version.xml file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-30
Sample client version.xml file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-31
Client tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-32
Server tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-33
Installing Information Server Fix Packs and Patches . . . . . . . . . . . . . . . . . . . . . 12-34
Information Server updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-35
Information Server update installer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-36
Fix Pack and Patch install prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-37
Patch install workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-38
Verifying the fix pack installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-39
Information Server Backup and Restore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-40
Backing up and restoring Information Server . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-41
Placing Information Server in maintenance mode . . . . . . . . . . . . . . . . . . . . . . . . 12-42
Backup procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-43
Backup and restore wizard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-44
Backup wizard parameters - 01 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-45
Backup wizard parameters - 02 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-46
Restore wizard parameters - 01 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-47
Restore wizard parameters - 02 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-48
Database Capacity Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-49
Repository database (XMETA) database sizing . . . . . . . . . . . . . . . . . . . . . . . . . 12-50
Information Analyzer analysis database (IADB) . . . . . . . . . . . . . . . . . . . . . . . . . 12-51
IADB and XMETA deployments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-52
IADB sizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-53
Engine High Availability Option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-54
Engine High Availability (HA) option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-55
Active-Passive topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-56
HA Active-Passive model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-57
Installation configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-58
Engine HA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-59
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-60
Exercise 12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-61
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-62

Unit 13. Serviceability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-1


Unit objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-2
Audit tracing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-3
Server audit tracing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-4
Project deletion/creation messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-5


Example DSAuditTrace.log file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-6


Client audit tracing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-7
Example client trace log file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-8
ISA Lite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-9
ISA Lite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-10
ISA Lite Sync Project functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-11
Example Sync Project report output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-12
ISA Lite tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-13
ISA Lite window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-14
Sample ISA System Summary report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-15
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-16
Exercises Unit 13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-17
Unit summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-18


Course description


IBM InfoSphere Information Server Administration v9.1

Duration: 4 days

Purpose
IBM InfoSphere Information Server hosts a suite of products designed
for the development and delivery of data integration, data quality, and
data governance jobs. This course describes and discusses
Information Server administrative tasks surrounding the Suite as a
whole, such as security, session management, and backup and
recovery, and administrative tasks related to key Information Server
products such as DataStage and Information Analyzer.

Audience
Information Server administrators who will be supporting developers
for IBM InfoSphere Information Server and IBM InfoSphere
Information Server for z/OS products, including DataStage,
QualityStage, Information Analyzer, FastTrack, Information Services
Director, and Metadata Workbench.

Prerequisites
Those taking this course should have some experience with database
and system configuration. Some experience with Linux is helpful, but
not required.

Objectives
After completing this course, you should be able to:
Identify Information Server functional components, product
modules, and architecture components
Use and administer the Information Server products using their
clients
Configure Information Suite security for users and groups
Start and stop Information Server (IS) components
Manage IS sessions, logging and reporting


Configure and manage IS Engine components including
environment variables, configuration files, data sets, and
operational metadata
Establish database connectivity with IS
Monitor DataStage jobs from the command line
Monitor DataStage jobs and the environment in which they are
running using the DataStage Operations Console
Monitor the performance and resource usage of DataStage jobs
using the Performance Analyzer and Resource Estimator tools
Archive and package metadata assets using istool
Deploy and manage metadata assets using Information Server
Manager
Import, search, and manage metadata assets using Metadata
Asset Manager
Back up and restore IS using the ISRecovery tool
Configure Information Analyzer and Information Services Director
Install and deploy Information Server
Apply patches and fix packs to Information Server
Examine the IS system and its health using audit tracing and ISA
Lite

Contents
Unit 1. Technical Overview
Unit 2. Overview of Clients used for Administration
Unit 3. Authentication and Suite Security
Unit 4. Stopping and Starting Information Server
Unit 5. Session Management
Unit 6. Engine Tier Architecture
Unit 7. Engine Tier Configuration
Unit 8. Engine Tier Database Connectivity
Unit 9. Engine Tier Monitoring
Unit 10. Metadata Asset Management
Unit 11: Information Services Console Configuration
Unit 12: Installation, Deployment, and Recovery


Unit 13: Serviceability



Agenda
Day 1
Unit 0: Welcome and Agenda
Unit 1: Technical Overview
Exercise 01
Unit 2: Overview of Clients used for Administration
Exercise 02
Unit 3: Authentication and Suite Security
Exercise 03
Unit 4: Stopping and Starting Information Server
Exercise 04

Day 2
Unit 5: Session Management
Exercise 05
Unit 6: Engine Tier Architecture
Exercise 06
Unit 7: Engine Tier Configuration
Exercise 07

Day 3
Unit 8: Engine Tier Database Connectivity
Exercise 08
Unit 9: Engine Tier Monitoring
Exercise 09
Unit 10. Metadata Asset management
Exercise 10

Day 4
Unit 11: Information Services Console Configuration
Exercise 11


Unit 12: Installation and Deployment
No exercise for Unit 12
Unit 13: Serviceability
No exercise for Unit 13


Unit 0. IBM InfoSphere Information Server Administration v9.1

What this unit is about


This unit describes the course objectives and agenda.


Course objectives
After completing this course, you should be able to:
Identify Information Server functional components, product
modules, and architecture components
Use and administer the Information Server products using their
clients
Configure Information Suite security for users and groups
Start and stop Information Server (IS) components
Manage IS sessions, logging and reporting
Configure and manage IS Engine components including
environment variables, configuration files, data sets, and
operational metadata
Establish database connectivity with IS

Copyright IBM Corporation 2007, 2012

Figure 0-1. Course objectives KM5021.0

Notes:


Course objectives, continued


After completing this course, you should be able to:
Monitor DataStage jobs from the command line
Monitor DataStage jobs and the environment in which they are running
using the DataStage Operations Console
Monitor the performance and resource usage of DataStage jobs using
the Performance Analyzer and Resource Estimator tools
Archive and package metadata assets using istool
Deploy and manage metadata assets using Information Server Manager
Import, search, and manage metadata assets using Metadata Asset
Manager
Back up and restore IS using the ISRecovery tool
Configure Information Analyzer and Information Services Director
Install and deploy Information Server
Apply patches and fix packs to Information Server
Examine the IS system and its health using audit tracing and ISA Lite

Copyright IBM Corporation 2007, 2012

Figure 0-2. Course objectives, continued KM5021.0

Notes:


Agenda
Day 1
Unit 0: Welcome and Agenda
Unit 1: Technical Overview
Exercise 01
Unit 2: Overview of Clients used for Administration
Exercise 02
Unit 3: Authentication and Suite Security
Exercise 03
Unit 4: Stopping and Starting Information Server
Exercise 04
Day 2
Unit 5: Session Management
Exercise 05
Unit 6: Engine Tier Architecture
Exercise 06
Unit 7: Engine Tier Configuration
Exercise 07

Copyright IBM Corporation 2007, 2012

Figure 0-3. Agenda KM5021.0

Notes:


Agenda
Day 3
Unit 8: Engine Tier Database Connectivity
Exercise 08
Unit 9: Engine Tier Monitoring
Exercise 09
Unit 10: Metadata Asset Management
Exercise 10
Day 4
Unit 11: Information Services Console Configuration
Exercise 11
Unit 12: Installation, Deployment, and Recovery
Exercise 12
Unit 13: Serviceability
Exercise 13
Copyright IBM Corporation 2007, 2012

Figure 0-4. Agenda KM5021.0

Notes:


Introductions
Name
Company
Where you live
Your job role
Current experience with products and technologies
in this course
Databases
ETL (Extraction Transformation Load) tools
Metadata management tools
Data quality technology
Do you meet the course prerequisites?
Some experience with database and system configuration
Class expectations
Copyright IBM Corporation 2007, 2012

Figure 0-5. Introductions KM5021.0

Notes:


Unit 1. Technical Overview

What this unit is about


This unit presents an overview of Information Server functionality and
components. It also discusses the architecture of Information Server.

What you should be able to do


After completing this unit, you should be able to:
List the Information Server functional categories
List the Information Server products and components that support
the Information Server functional categories
List the Information Server software architectural tiers

How you will check your progress


Lab exercises and checkpoint questions.


Unit objectives
After completing this unit, you should be able to:
List the Information Server functional categories
List the Information Server products and components that
support the Information Server functional categories
List the Information Server software architectural tiers

Copyright IBM Corporation 2007, 2012

Figure 1-1. Unit objectives KM5021.0

Notes:


Information Server functional categories

IBM InfoSphere Information Server

Unified Deployment

Understand: Discover, model, and govern information structure and content
Cleanse: Standardize, merge, and correct information
Transform: Combine and restructure information
Deliver: Deliver information and functionality to information consumers

Integrated Metadata Management

Parallel Processing Engine


Rich Connectivity to Applications, Data, and Content

Copyright IBM Corporation 2007, 2012

Figure 1-2. Information Server functional categories KM5021.0

Notes:
Information Server (IS) provides four basic categories of functionality: Understand,
Cleanse, Transform, Deliver. These functional categories support many different types of
enterprise data processing projects, including data integration, data quality, and business
information exchange projects, as well as many other types of enterprise projects.
Information Server hosts various products and components that provide this functionality.
These are discussed on the following pages.
Understanding has to do with functionality that helps you understand your data,
functionality that helps you understand how to accomplish what you want to accomplish,
and functionality that helps you to understand the jobs you are building to accomplish your
goals.
Cleansing functionality is used to correct and standardize the data processed by your jobs.
Transformation functionality is used to combine and restructure the data processed by your
jobs into useful information for your consumers.
Deliver functionality is used to deliver the product of your jobs to consumers.


Metadata produced and consumed by the hosted Information Server products is stored in a
unified, integrated Repository. This enables the produced and consumed metadata to be
shared across the platform of hosted products.
The Information Server functionality is executed using the Information Server parallel
processing engine, which uses parallel technology to process huge amounts of data at
tremendous speeds.


Hosted products support functional categories

IBM InfoSphere Information Server

Understand: Blueprint Director, Information Analyzer, Discovery, Business Glossary, Metadata Workbench
Cleanse: QualityStage
Transform: DataStage, FastTrack
Deliver: Information Services Director, Change Data Delivery

Integrated Metadata Management

Parallel Processing Engine


Rich Connectivity to Applications, Data, and Content

Copyright IBM Corporation 2007, 2012

Figure 1-3. Hosted products support functional categories KM5021.0

Notes:
Information Server (IS) hosts various products that support each of the various functional
categories. This graphic lists the products that apply to each functional category.
Some of these products support more than one functional category. Later pages will
discuss these products in more detail.


Role-based tools with integrated metadata

Roles: Business Analyst, Subject Matter Expert, Architect, Data Analyst, Developers, DBA

Integrated Metadata Management (design and operational metadata)
Simplify integration
Facilitate change management and reuse
Increase trust and confidence in information
Increase compliance to standards

Copyright IBM Corporation 2007, 2012

Figure 1-4. Role-based tools with integrated metadata KM5021.0

Notes:
Different roles are involved in the typical enterprise data integration project, each role
producing and consuming different types of metadata. With IBM Information Server,
metadata is managed across these different roles and functions. Different products are
geared towards different user roles. For example, FastTrack is geared towards business
analysts. DataStage is geared towards developers. As each product creates new
metadata, that metadata is immediately available to others working on the project. This
enables the different user roles to communicate with one another and to work together and
share information.
Integrated metadata management has many benefits including simplified data integration,
change management, reliable information, and increased data governance.


Understanding

Copyright IBM Corporation 2007, 2012

Figure 1-5. Understanding KM5021.0

Notes:


Blueprint Director
Define and manage a blueprint of your data integration project from initial sketches through delivery
Link Information Server metadata assets (files, table definitions, mapping specifications, DataStage ETL jobs) to blueprint icons
Use pre-built templates for usage scenarios, including warehousing projects
(Roles: Business Analysts)

Copyright IBM Corporation 2007, 2012

Figure 1-6. Blueprint Director KM5021.0

Notes:
You use Blueprint Director to create a plan or blueprint of your Information Server project.
The blueprint is created by laying stages on a canvas and linking them together. The
stages represent different types of metadata assets (files, table definitions, mapping
specifications, DataStage jobs, and so on).
Blueprint Director comes with a set of pre-built templates for different, standard project
scenarios. Each step of the project is fully documented.


Information Analyzer
In-depth analysis of existing data systems
Analysis of application, database, and file-based sources for content, quality, and structure
Profiling of fields, and relationship analysis across fields and across sources
Ongoing measurement and baseline reporting of information quality
Analyze source data structures, and monitor adherence to integration and quality rules
Creates metadata that describes where information is managed across systems
Provides an understanding of the fitness of specific sources and highlights data that may need downstream attention
(Roles: Subject Matter Experts, Data Analysts)

Copyright IBM Corporation 2007, 2012

Figure 1-7. Information Analyzer KM5021.0

Notes:
Information Server takes a three-sided approach to understanding, each side leveraging a
different type of metadata. The first is focused on physical metadata: the structure and
contents of the different source systems within your environment.
This is accomplished through data-centric profiling and analysis of source systems,
including column analysis, table analysis, and cross-table analysis, that provide detailed
profiling of the data in each column (cardinality, nullability, range, scale, length, precision).
This activity is typically conducted by data analysts and subject matter experts. The
product that automates this is Information Analyzer. It provides insight into the quality and
usage characteristics of the information. It can also help uncover data relationships across
systems, through foreign key affinity mapping. Profiling is designed to become an ongoing
process, comparing ongoing quality against a baseline, to understand how data quality
changes over time and to ensure that the understanding assumptions are still holding true.


Discovery
Complements Information Analyzer functionality
Discover and validate possible matching keys across multiple
data sources
Discover complex business rules between two structured data
sets
Cross source data preview that enables analysts to see values
that conform to the business rules and anomalies that do not
conform

Copyright IBM Corporation 2007, 2012

Figure 1-8. Discovery KM5021.0

Notes:
Discovery complements some of the functionality of Information Analyzer. Both products
are used to understand the data in project sources and targets. You can use Discovery to
look for and validate possible keys in different sets or sources of data. And you can use it to
look for data that is related by possibly complex business rules.
You can also use Discovery to search for anomalies in the data, that is, data that does not
conform to the business rules used to generate it.


Business Glossary

Facilitate communications between roles by creating and managing a shared vocabulary of categories and terms
Assign and manage stewards who are responsible for metadata assets
Link business terms to metadata assets to facilitate greater understanding and communication of those assets
(Roles: Subject Matter Experts, Data Analysts)

Copyright IBM Corporation 2007, 2012

Figure 1-9. Business Glossary KM5021.0

Notes:
Business Glossary is a web-based tool that enables analysts and subject matter experts to
create, manage, and share a common enterprise vocabulary and classification system.
The terms used in the glossary can be linked to Information Server metadata assets, such
as columns, tables, and DataStage jobs. These terms can be used to clarify and describe
the asset.
Also within Business Glossary, stewards can be assigned to metadata assets. These
stewards are responsible for the assets. They are the ones to go to if there are questions
about the assets.


Metadata Workbench
Graphical exploration of metadata assets generated and consumed by Information Server component applications
Cross-tool graphs describing data lineage, business meaning, and impact dependencies
Ability to extend lineage and impact analysis to applications and assets outside of Information Server
Can apply terms, labels, and stewards developed in Business Glossary to explored assets
Provides IT professionals with a tool for exploring and understanding the assets generated and used by the Information Server suite
(Roles: Data Integration Managers, Developers)
Glossary to explored assets

Copyright IBM Corporation 2007, 2012

Figure 1-10. Metadata Workbench KM5021.0

Notes:
Metadata Workbench provides visual web-based exploration of metadata assets generated
and used by IBM Information Server components. It improves business trust in information
and increases IT responsiveness by tracing and maintaining the relationship paths of
information throughout an integration lifecycle. It visually depicts these relationships from
the sources of information to the places where information is actually used, even across
different tools and technologies. Metadata Workbench describes the complete data lineage
from applications, reports, and data warehouses back to source systems, including the
types of processing that was performed on them along the way. It also visualizes the
impact of any change to any information asset, including databases and services that
would be affected if changes occurred within a DataStage job.


Cleansing

Copyright IBM Corporation 2007, 2012

Figure 1-11. Cleansing KM5021.0

Notes:


Why data cleansing with QualityStage is needed


Lack of information standards
Different formats and structures across different systems
Data surprises in individual fields
Data misplaced in the database
Information buried in free-form fields
Data myopia
Lack of consistent identifiers inhibits a single view
The redundancy nightmare
Duplicate records with a lack of standards

Copyright IBM Corporation 2007, 2012

Figure 1-12. Why data cleansing with QualityStage is needed KM5021.0

Notes:
There are several types of problems within enterprise data stores.
1. The first is a lack of information standards. Names, addresses, part numbers, and other
data are entered in inconsistent ways, particularly across different systems.
2. Another common issue involves data surprises in individual fields. Data in the database
is often misplaced, or fields are used for multiple purposes as where a name field
contains company and address information, a tax ID field contains telephone numbers,
and the telephone field has a variety of mistakes.
3. A third common problem is information buried in free-form fields. In this case valuable
information is hidden away in text fields. Since these fields are difficult to query using
SQL, this information is often not leveraged, although it likely has value to the business.
This type of problem is common in product information and Customer Support case
records.
4. The fourth problem is data myopia, a term for the lack of consistent identifiers across
different systems. Without adequate foreign-key relationships, it is impossible to get a

complete view of information across systems. This example shows three products that
look very different, but are actually the same.
5. The final problem is redundancy within individual tables. This is extremely common,
where data is re-entered into systems because the data entry mechanism is not aware
that the original record is already there.


QualityStage functionality
Provides specialized data quality processing
Ensures clean, standardized, de-duplicated information
Enables a single version of the truth
Supports global postal verification
Provides visual tools for designing quality rules and matching logic (Visual Match Rule Design)
Seamlessly integrated with DataStage
Precisely calibrates matching rules
Allows quality logic to be deployed seamlessly within DataStage Extraction, Transformation, Load (ETL) jobs
Standardize and correct source data fields, and match records together across sources to create a single view
(Roles: Subject Matter Experts, Data Analysts)

Copyright IBM Corporation 2007, 2012

Figure 1-13. QualityStage functionality KM5021.0

Notes:
QualityStage is a product that helps to identify and resolve the data cleansing issues
previously discussed. It provides data quality functions on an easy-to-use,
design-as-you-think flow diagram. This allows data quality to be embedded in any
information integration process.
QualityStage data quality functions include:
Free-form text investigation: Enables you to recognize and parse out individual fields of
data from free-form text
Standardization: Enables individual fields to be made uniform according to your
standards
Address verification and correction: uses postal information to standardize, validate,
and enrich address data
Matching: Enables duplicates to be removed from individual sources, and common
records across sources to be identified and linked

Survivorship: Enables the best data from across different systems to be merged into a
consolidated record.


Transformation

Copyright IBM Corporation 2007, 2012

Figure 1-14. Transformation KM5021.0

Notes:


Using Information Server to transform data


Transformation is key to enabling information to be used in new business contexts
Designed for use by information experts using the understanding provided by the metadata
Transformation and delivery can be reused across multiple mechanisms:
Large volume batch movement
Real-time event-driven response
Service-oriented architecture
Federated query
(Roles: Data Analysts, DBAs, Subject Matter Experts, Data Architects)

Copyright IBM Corporation 2007, 2012

Figure 1-15. Using Information Server to transform data KM5021.0

Notes:
Information Server transforms information from the application-centric context in which it is
currently locked, into entirely new business contexts that are appropriate to new business
opportunities or challenges. This type of transformation is not simply about
format-to-format translation, but is more focused on merging data together. Since
transformation is really focused on the context of information, it requires an understanding
of the information sources, business meaning, and relationships, so it needs to be created
by information experts (data analysts, database administrators, subject matter experts),
using the understanding provided by the metadata.


DataStage
Create codeless, visual design of ETL data flows using built-in transformation components (stages) and links
Use stages to extract data from and load data to data resources, including database tables, sequential files, enterprise resources
Links specify the flow of data from one stage to another
Can create reusable sets of components (shared containers) that can be shared across jobs, projects, and developers
Transform and aggregate any volume of information in batch or real time through visually designed logic
Complete ETL functionality with metadata-driven productivity
Supports team-based development and collaboration
(Roles: Developers, Architects)

Copyright IBM Corporation 2007, 2012

Figure 1-16. DataStage KM5021.0

Notes:
DataStage is the main Information Server product that is focused on transformation and
movement of information. DataStage enables codeless visual design of data flows, and
includes built-in transformation components (stages) and connectors.
DataStage is built around team collaboration and reuse. Everything from individual stages,
to connections, to entire data flows can be reused across different jobs and projects. In
addition, DataStage leverages the shared platform services for parallel processing,
administration, deployment, and connectivity.


FastTrack
Used in conjunction with DataStage
Build mapping specifications that describe and document DataStage ETL jobs
Generate DataStage jobs from the mapping specifications
Reverse-engineer DataStage jobs into mapping specifications
(Roles: Business Users; graphic shows a FastTrack mapping specification and the generated DataStage job)

Copyright IBM Corporation 2007, 2012

Figure 1-17. FastTrack KM5021.0

Notes:
Mapping specifications specify how data is mapped and transformed from source fields to
target fields. Business analysts create mapping specifications, leveraging source analysis,
target models, and metadata to facilitate the mapping process. Prototype DataStage ETL
jobs can be generated from these FastTrack mapping specifications. These mapping
specifications guide the DataStage developers' work and provide them with a head start
in designing and building their DataStage jobs.
DataStage jobs can also be reverse-engineered back into mapping specifications that
document their mappings and transformations.


Delivery

Copyright IBM Corporation 2007, 2012

Figure 1-18. Delivery KM5021.0

Notes:


Information Services Director


Rapid SOA Deployment
Package information integration logic (DataStage jobs) as services
These services can be invoked as Enterprise Java Beans or Web services
Flexibly deploy and manage reusable information services without hand coding
Provides load balancing and fault tolerance for requests across multiple servers
(Roles: Developers, Architects)

Copyright IBM Corporation 2007, 2012

Figure 1-19. Information Services Director KM5021.0

Notes:
Information Services Director is used to deliver functional and component logic as
Enterprise Java Beans or Web Services. Within the Information Server context, this logic
includes database functionality as well as DataStage ETL functionality.
DataStage jobs can include ISD input stages and/or ISD output stages. The ISD input
stages are used in a service to pass values to the job. ISD output stages are used to return
data to the service that can then be passed to the service consumers.
All functions are deployed as shared services within a Service Oriented Architecture
(SOA). This is done consistently, whether you are using DataStage, QualityStage, or DB2.


Change Data Delivery


Provides real time changed-data capture and delivery for dynamic warehousing, eBusiness, synchronization, and replication
Provides high-volume, low-latency replication for business continuity, workload distribution, and business integration scenarios
Minimal impact on production systems
High scalability and end-to-end performance
Wide breadth of RDBMS support
Minimizes impact on performance of production systems
(Roles: Developers, Architects; components shown: IS Change Data Capture, Replication Server, Data Event Publisher, iReflect)

Copyright IBM Corporation 2007, 2012

Figure 1-20. Change Data Delivery KM5021.0

Notes:
Change Data Delivery is used to deliver changed data to consumers of the data. The
changed data can be delivered for data replication or synchronization or for dynamic data
warehousing.
Change Data Delivery can replicate large volumes of data with a minimal impact on
production systems.
Replication is supported for a large number of different relational database systems.


Information Server Architecture

Copyright IBM Corporation 2007, 2012

Figure 1-21. Information Server Architecture KM5021.0

Notes:


Information Server architecture


Provides a unified architecture
Common services for Information Server products and
components
Parallel processing engine
Repository
Service-oriented architecture
Efficiently uses hardware resources
Reduces the amount of development and administrative effort
that is required to deploy an integrated solution

Copyright IBM Corporation 2007, 2012

Figure 1-22. Information Server architecture KM5021.0

Notes:
Information Server provides a unified architecture that works with all types of information
integration. Common services, unified parallel processing, and unified metadata are at the
core of the IS architecture.
The architecture is service-oriented, enabling Information Server to work within an
organization's evolving enterprise service-oriented architectures. A service-oriented
architecture also connects the individual products of Information Server.


Information Server backbone

Hosted products (top): Information Services Director, Business Glossary, Information Analyzer, Metadata Workbench, DataStage, QualityStage, MetaBrokers
Shared services (middle): Metadata Access Services, Metadata Analysis Services
Metadata Server
Parallel Processing Engine

Copyright IBM Corporation 2007, 2012

Figure 1-23. Information Server backbone KM5021.0

Notes:
This graphic shows the Information Server backbone. The hosted applications are at the
top. They all share the same services displayed in the middle. They all share the same
repository displayed at the bottom. The Information Server parallel processing engine is
used by several Information Server applications to run their jobs, including DataStage ETL
jobs, QualityStage data cleansing jobs, and Information Analyzer data analysis jobs.


Parallel processing engine


Supports all hardware configurations (single processor, SMP, MPP, GRID, Cluster)
Scale up by adding processors or nodes with no design change or re-compilation
External configuration file specifies hardware configuration and resources

Copyright IBM Corporation 2007, 2012

Figure 1-24. Parallel processing engine KM5021.0

Notes:
Information Server uses a parallel processing layer (Engine) that is used by DataStage,
QualityStage, Information Analyzer, and other IS products and components. This
architecture enables those products to scale up their processing speeds by adding
additional processors, in several different hardware configurations.
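The external configuration file mentioned on the slide is a plain text file that names the processing nodes the engine can use and the disk and scratch areas assigned to each node. The following is a minimal sketch only: the host name, node names, and directory paths are illustrative assumptions, not values taken from this course.

    {
      node "node1"
      {
        fastname "engine_host"
        pools ""
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
      }
      node "node2"
      {
        fastname "engine_host"
        pools ""
        resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
      }
    }

Adding further node entries (or pointing fastname at additional hosts) is how the same job design is scaled out without any design change or recompilation, which is the point made on the slide.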


Information Server architectural tiers


Four tiers
Client tier (Information Server clients and hosted products
clients)
Services tier
Repository tier
Engine tier
Tiers may be installed on multiple computers
For example: Client tier on one computer, Services and Repository
tiers on a second computer, and engine tier on one or more
additional computers

Copyright IBM Corporation 2007, 2012

Figure 1-25. Information Server architectural tiers KM5021.0

Notes:
Information Server functionality, products, and components are separated into four different
layers or tiers. During Information Server installation you specify which tier or tiers you want
to install on a particular computer system. Different tiers can be installed on the same or
different computers that are network connected.
These different tiers are described and discussed in the following pages.


Architecture diagram

Information Server Platform (diagram overview):
Client 1..N: Administrative Clients and User Clients (desktop and web)
Services 1: Application Server hosting Platform Services, Common Services, and Product-specific Services
Repository 1: Metadata Repository
Engine 1..N: Information Server Engine; Connectors, Packs, QualityStage Modules, ISD Resource Providers; Service Agents and Communication Agents; working areas (DataStage/QualityStage Scratch and Dataset storage, Information Analyzer data, QualityStage Match data)

Copyright IBM Corporation 2007, 2012

Figure 1-26. Architecture diagram KM5021.0

Notes:
Information Server clients include:
- Information Server Web Console (IS administration/reporting)
- DataStage/QualityStage clients (Administrator, Designer and Director)
- FastTrack client
- Metadata Workbench client
- Information Server Console: hosts Information Analyzer and Information Services
Director
- WebSphere Application Server (WAS) client
- Information Server Manager
- Multi-Client Manager
- Information Server Command Line Interface (istools)
Services:

- Uses IBM WebSphere Application Server (WAS) to implement the J2EE services
functionality
Repository:
- DB2, Oracle, and SQL Server
Parallel engine:
- A C++ compiler is required to compile DataStage, QualityStage, and Information
Analyzer jobs into an executable form capable of being run by the parallel engine.
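As a quick illustration of the compiler requirement above, an administrator can confirm that a suitable C++ compiler is present on the engine host before jobs are compiled. This is a sketch only: it assumes a Linux engine host where g++ is the configured compiler, which may not match your installation.

    # Confirm a C++ compiler is installed on the engine host (assumes g++ on Linux)
    g++ --version

The compiler and compiler options that the engine actually uses are held in the DataStage environment variables APT_COMPILER and APT_COMPILEOPT, which can be reviewed per project in the DataStage Administrator client.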


Platform topologies

Two-systems deployment Three-systems deployment


Copyright IBM Corporation 2007, 2012

Figure 1-27. Platform topologies KM5021.0

Notes:
The diagram shows DB2 as the Repository database server, but Oracle and SQL Server
are also supported, as previously noted. Although only one Engine is shown for each
topology, Information Server supports multiple parallel engines on the same or separate
systems.
All tiers should be installed in the same physical LAN, connected by high-speed network
connections.
The Services and Engine platform types must match. The Repository database need not
match the platform type of the Services and Engine.


Client tier
Provides access to both administrative clients and user clients
Administrative clients include Information Server clients as well as clients specific
to Information Server hosted products:
Information Server Web Console
Security
Session maintenance
Logging and reporting management
DataStage Administrator client
DataStage global and project configuration and defaults
DataStage Designer client
Configuration file editing
Other Information Server products have a single client used for both administration and
user tasks
Administrative tasks require product administrator authorization
User clients for specific Information Server products and functional components:
Appropriate interfaces for the type of user (business or technical)
Facilitate the Information Server analysis, cleansing, integration, and delivery functions

Copyright IBM Corporation 2007, 2012

Figure 1-28. Client tier KM5021.0

Notes:
Information Server products and components can be accessed through client components.
The client tier contains both administrative clients and user clients.
Some products and functionality are accessed through a web browser. These are called
thin clients, because the functional components exist on the server but are delivered to
the web browser.
Other clients are called thick clients, because functional components are installed and
exist on the client computer system as well as the server computer system.


Services tier
Set of shared services that centralize core tasks across the platform
Administrative tasks such as security, user administration, logging, and
reporting
Repository services
Shared services allow these tasks to be managed and controlled in one
place, regardless of which product is using the service

Various product components add additional product-specific services to those
that are deployed

Deployed on IBM WebSphere Application Server (WAS)

Copyright IBM Corporation 2007, 2012

Figure 1-29. Services tier KM5021.0

Notes:
The Services tier consists of a set of shared services that centralize core tasks across the
platform.
Some services address functionality that is unique to a specific Information Server product
or component. Other services, such as security services, are used across multiple products
and components.
The services tier is deployed within an IBM WebSphere Application Server (WAS) instance.
The computer system running the WAS instance is referred to as the domain or services
host system.
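Because the services tier runs inside WAS, a common first check when administering the services host is whether the application server is running. The commands below are a sketch only: the installation path, profile name, server name, and user name are assumptions based on a typical default Linux installation and may differ on your system.

    cd /opt/IBM/WebSphere/AppServer/profiles/InfoSphere/bin
    ./serverStatus.sh server1 -username wasadmin -password <password>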


Engine tier
Components
Engine: The high-performance, parallel engine that performs analysis,
cleansing, and transformation
Connectors: Provide common connectivity to external resources such as
DB2, Teradata, Oracle, Sybase, InfoSphere MQ, and others
Packs: provide high-speed connectivity to packaged enterprise applications
QualityStage modules: a set of integrated modules for accomplishing data
cleansing and re-engineering tasks such as Investigating, Standardizing,
Matching and Survivorship
Service Agents: manages bi-directional communication between the engine
processes and the Repository

To deploy the Engine tier to multiple machines, the Information Server engine
installation software is copied or NFS mounted to each engine server

Copyright IBM Corporation 2007, 2012

Figure 1-30. Engine tier KM5021.0

Notes:
The engine tier consists of the following pieces:
- Information Server Parallel Engine: The high-performance, parallel engine that
performs analysis, cleansing and transformation processing
- Connectors: Provide common connectivity to external resources such as DB2,
Teradata, Oracle, Sybase, InfoSphere MQ, and others.
- Packs: provide high-speed connectivity to packaged enterprise applications
- QualityStage Modules: a set of integrated modules for accomplishing data cleansing
and re-engineering tasks such as Investigating, Standardizing, Matching and
Survivorship
- Service Agents: manages bidirectional communication between the engine
processes and the Metadata Repository
To deploy the engine tier to multiple computers, the Information Server engine software is
copied or NFS mounted to each server.


Repository tier
Stores objects and metadata for Information Server and each
of its hosted products
Enables Information Server products to share metadata with
each other throughout the data integration lifecycle
For the Repository database (named XMETA by default), the
Information Server installation package comes with DB2
An existing DB2 instance can also be configured
If another DBMS is used (for example, Oracle), scripts must be run
before the installation to configure the Repository

Copyright IBM Corporation 2007, 2012

Figure 1-31. Repository tier KM5021.0

Notes:
The Information Server Repository stores the objects and metadata produced and
consumed by Information Server hosted products and components. The Repository is
implemented as a database, named XMETA by default. Since all the products hosted by
Information Server use the same XMETA database, metadata produced by one product
can be shared with other Information Server products.
For the XMETA database, DB2 is supported. DB2 can be installed as part of the
Information Server installation or an existing DB2 instance can be used. Other database
systems, such as Oracle, are also supported.
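For orientation, a DB2-based Repository can be examined with ordinary DB2 commands. The example below is a sketch only: it assumes a repository database named XMETA whose tables are held in a schema also named XMETA, and a user with authority to connect; adjust the names for your installation.

    db2 connect to xmeta
    db2 "select count(*) from syscat.tables where tabschema = 'XMETA'"
    db2 connect reset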


Tier interaction

Tiers shown in the diagram: Client, Services (Common Services), Repository (Metadata Repository), and Engine (Information Server Engine).

1. Client logs in to IS Server using the IS Authentication Service (using host and port)
2. Authentication Service retrieves credential information
3. List of DSEngines and mapped credentials for logged user
4. Logs in to Engine Server (TCP/IP) using Credential Mappings and server short name
5. List of DS projects, jobs, and design information
6. DS Job compile and run information
7. Logs in to IS Server using the IS Authentication Service (using host and port provided by user)
8. Access services and data on primary IS Server
Copyright IBM Corporation 2007, 2012

Figure 1-32. Tier interaction KM5021.0

Notes:
DataStage clients log into the IS Server and retrieve the DataStage credentials the users
are mapped to. The DataStage client, using the IS Authentication Service, logs into the IS
Server as follows:
- The host name and port number provided in the DataStage login window are used to
do an HTTP request with the IS server.
- The HTTP request is going to return the JNDI properties needed to establish a
remote EJB session between the client and the IS server. One of these JNDI
properties is the Provider URL which include the hostname and port number (from
the InfoSphere serverindex.xml file). The client uses JNDI lookups to call and work
with IS Services using the retrieved JNDI properties.
- The IS Server returns to the client the mapped credentials for the user. Even if
credential mapping is turned off (shared user registry mode), the credentials needed
to log in to the DataStage Server are returned from the IS Server (in this case, the
credentials will be the same as the ones used to login to the IS server). These will
allow the client to log onto the various DataStage Servers installed.

Copyright IBM Corp. 2007, 2012 Unit 1. Technical Overview 1-37


Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Checkpoint
1. List the four Information Server platform functions.
2. Which IS product or component is used to build ETL (Extract,
Transform, Load) jobs?
3. Name an IS product or component that can be used for
metadata management of the IS shared Repository.
4. List the four IS architecture tiers.

Copyright IBM Corporation 2007, 2012

Figure 1-33. Checkpoint KM5021.0

Notes:
Write your answers here:

1-38 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Exercises Unit 01
In this lab exercise, you will:
Identify Information Server functions and
associated components

Copyright IBM Corporation 2007, 2012

Figure 1-34. Exercises Unit 01 KM5021.0

Notes:

Copyright IBM Corp. 2007, 2012 Unit 1. Technical Overview 1-39


Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Unit summary
Having completed this unit, you should be able to:
Identify Information Server platform functional components
Identify Information Server platform component modules
Identify Information Server software architecture components

Copyright IBM Corporation 2007, 2012

Figure 1-35. Unit summary KM5021.0

Notes:

1-40 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Unit 2. Overview of Clients used for Administration

What this unit is about


This unit presents an overview of the Information Server clients used
for Information Server administration and for accessing Information
Server products and components.

What you should be able to do


After completing this unit, you should be able to:
Log in and explore Information Server dedicated administrative
clients, including:
Information Server Web Console
- WebSphere Application Server (WAS) console
- Metadata Asset Manager
Log in and explore Information Server hosted product clients,
including:
- Console for IBM Information Server
- DataStage clients
- FastTrack
- Business Glossary
- Metadata Workbench

How you will check your progress


Lab exercises and checkpoint questions

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Unit objectives
After completing this unit, you should be able to:
Log in and explore Information Server dedicated administrative
clients, including:
Information Server Web Console
WebSphere Application Server (WAS) console
Metadata Asset Manager
Log in and explore Information Server hosted product clients,
including:
Console for IBM Information Server
DataStage clients
FastTrack
Business Glossary
Metadata Workbench

Copyright IBM Corporation 2007, 2012

Figure 2-1. Unit objectives KM5021.0

Notes:

2-2 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Client-Server architecture overview

Client system, with fat clients and thin clients,
interacting with server systems
Copyright IBM Corporation 2007, 2012

Figure 2-2. Client-Server architecture overview KM5021.0

Notes:
The Information Server clients run on Windows only. Unless the server systems are also
running on Windows, the clients will be accessing the server systems from separate
computers. Typically, this is the case. Information Server includes both fat clients and
thin clients. Fat clients are those that require functionality to be installed on each Client
system. Thin clients do not require this. They provide a client interface to functionality that
is fully installed on the Server system.
In this diagram, the Repository, Services, and Engine tiers are all placed on one computer.
As mentioned earlier, this is just one possible deployment. For example, commonly, the
Engine tier is separated from the Repository and Services tiers.

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-3
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Information Server client icons

Thin clients are accessed using a web browser,
Internet Explorer or Mozilla
Fat client icons shown in the figure:
Information Server Manager
Command Line Interface (istool)
Import Export Manager
Information Server Console
FastTrack
DataStage clients


Copyright IBM Corporation 2007, 2012

Figure 2-3. Information Server client icons KM5021.0

Notes:
Thin clients include the Information Server Web Console, Business Glossary, and
Metadata Workbench. For these clients, no client components are installed on the client
system. Any system that supports a web browser can access these clients.
Fat clients include the Information Server Console (which provides access to Information
Services Director and Information Analyzer), Information Server Manager, Multi-Client
Manager, Information Server Command Line Interface, IBM Import Export Manager,
FastTrack, and the DataStage clients.
The Command Line Interface (istool) and Information Server Manager clients are Engine
tier clients that are discussed in a later unit.
The Import Export Manager is a tool for importing metadata from business intelligence and
modeling tools outside of Information Server into the Information Server Repository.

2-4 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Dedicated administrative clients


Information Server Web Console
Session management
User and group security
Report management
Logging
WebSphere Application Server Web Console
Configure and manage user registry
DataStage and QualityStage Operations Console
Monitor Engine status and job activity
Monitor OS resources
Metadata asset management products and tools
Information Server Manager, istool command line tool
Import / export metadata assets
Package deployment
Metadata Asset Manager
Import metadata assets produced outside of Information Server into the Repository
Manage the Repository
Search and browse metadata assets
Metadata Workbench
Search, browse, and query metadata assets
View and analyze operational metadata
View relationships and dependencies between metadata assets (impact analysis)
Business Glossary
Manage business terms and categories
Link business terms to metadata assets

Copyright IBM Corporation 2007, 2012

Figure 2-4. Dedicated administrative clients KM5021.0

Notes:
Within Information Server, there are a number of different clients used for different types of
administrative purposes. The Information Server Web Console is the primary general
administrative client within Information Server. Use it for configuring security and for
session management, among other tasks.
A WebSphere Application Server instance is used to configure and manage the Information
Server user registry.
DataStage jobs can be monitored using several different clients, including the DataStage
Designer and Director clients and command line utilities. The DataStage and QualityStage
Operations Console provides a web browser interface for monitoring jobs across all engine
systems and all DataStage projects. You can also use it to monitor the use of system
resources while the jobs are running.
Metadata asset management is accessible from several Information Server products, including
Metadata Workbench and Business Glossary. There are also a number of different tools
devoted to metadata management tasks. Information Server Manager is devoted to
DataStage metadata assets. istool is a command-line-driven tool for exchanging assets from

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-5
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

all Information Server products. Metadata Asset Manager can be used to browse and
manage assets produced outside of Information Server, but consumed by Information
Server products.
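For illustration, the istool command line can export metadata assets to an archive (.isx)
file and import them elsewhere. The command below is a sketch only; exact options vary by
release and by product module, and the host, user, password, and project names are
placeholders:

    istool export -domain services_host:9080 -username isadmin -password secret
        -archive "/tmp/dsassets.isx" -datastage ' "engine_host/MyProject/*/*.*" '

A matching istool import command, pointed at the same archive and a target project, loads
the exported assets into another Information Server installation.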

2-6 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Administration within hosted products


Administrative functionality also exists within Information Server hosted
product clients
Data resource connectivity
Metadata management
Metadata import / export
Product projects configuration
Product clients with administrative functionality include:
DataStage / QualityStage: Engine configuration and monitoring, ODBC data
source configuration, metadata import, shared Repository metadata management
Operations Console: Engine monitoring
FastTrack: Data source connections, metadata import
Information Analyzer: Data source connections and configurations, metadata
import, IADB (Information Analyzer Database) configuration
Information Services Director: Including information services deployment

Copyright IBM Corporation 2007, 2012

Figure 2-5. Administration within hosted products KM5021.0

Notes:
As mentioned earlier, some administrative functionality exists within product clients. Within
DataStage, Information Analyzer, and FastTrack, for example, data source connections
can be created and metadata can be imported. In addition, development work within
several products is done within projects. Project configuration is generally done within
product clients.

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-7
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Dedicated Administrative Clients

Copyright IBM Corporation 2007, 2012

Figure 2-6. Dedicated Administrative Clients KM5021.0

Notes:

2-8 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Information Server Web Console


Thin client
Primary access point for Information Server administration
functionality, including:
Session management
Users and groups
Logging management
Reporting management
Engine credential mappings
Provides links to Business Glossary and Metadata Asset
Manager

Copyright IBM Corporation 2007, 2012

Figure 2-7. Information Server Web Console KM5021.0

Notes:
The Information Server Web Console is a thin client. No special installation components
need to be installed on a client system to access the Web Console. All that is needed is a
web browser.
Using the Web Console you can perform a number of tasks, which are discussed later in
this course, including session management, security, logging, reporting, and engine
credential mappings.
Although you can log into Business Glossary and Metadata Asset Manager directly, you
can also open these applications from within the Web Console.

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-9
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Logging into the Information Server Web Console

Information
Server Web
Console address

Information
Server
administrator ID

Copyright IBM Corporation 2007, 2012

Figure 2-8. Logging into the Information Server Web Console KM5021.0

Notes:
To open the Administrative Web Console, open a web browser (Internet Explorer or
Mozilla) and then enter the Web Console address.
The console address is of the form: http://machine:nnnn/ibm/iis/console.
Here machine is the host name of the machine running the Services tier, that is, running
the WebSphere Application Server instance hosting the services.
nnnn is the port address of the console. By default, it is 9080.
The initial Information Server administrator ID and password are specified during
installation. The default administration ID is isadmin. After installation, new administrator
IDs can be specified.
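For example, if the services tier host is named isserver (an assumed host name here) and
the default port is used, the Web Console address would be:

    http://isserver:9080/ibm/iis/console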

2-10 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Information Server Web Console tabs


IS Administration

Link to Metadata
Asset Manager

Reporting

Link to Business
Glossary

Copyright IBM Corporation 2007, 2012

Figure 2-9. Information Server Web Console tabs KM5021.0

Notes:
The Information Server Web Console is an interface to several different administrative
functions.
The Administration tab is where you perform general IS administrative tasks, including
session management, managing users, and logging.
The Reporting tab is where IS reports can be created and managed. Reports related to
specific IS products, such as FastTrack or Metadata Workbench, can also be accessed
and managed within those clients.
The Glossary tab is the Business Glossary (BG) administrative interface where BG
administrators can create and manage terms, categories, and stewards.
The Information Services Catalog can be used to publish Information Services Director
services to the IBM WebSphere Service Registry and Repository application. This
application supports the annotation of services with information that is used to select, start,
govern, and reuse services.

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-11
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

The Repository Management tool can be used to browse all physical data resources and
metadata assets in the Repository. Redundant or unnecessary metadata assets can be
managed or deleted.

2-12 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Web Console functionality


The Administration tab is where you perform general Information Server
administrative tasks, including session management, managing users,
and logging.

The Reporting tab is where Information Server reports can be created


and managed.
Reports related to specific IS product components, such as FastTrack or
Metadata Workbench, can also be accessed and managed within those clients

The Glossary tab provides a link to Business Glossary

The Information Services Catalog can be used to publish Information


Services Director services to the IBM WebSphere Service Registry and
Repository application
This application supports the annotation of services with information that is used
to select, start, govern, and reuse services

The Repository Management tab provides a link to Metadata Asset


Manager
Copyright IBM Corporation 2007, 2012

Figure 2-10. Web Console functionality KM5021.0

Notes:
The Information Server Web Console is an interface to several different administrative
functions.
The Administration tab is where you perform general IS administrative tasks, including
session management, managing users, and logging.
The Reporting tab is where IS reports can be created and managed. Reports related to
specific IS products, such as FastTrack or Metadata Workbench, can also be accessed
and managed within those clients.
The Glossary tab is the Business Glossary (BG) administrative interface where BG
administrators can create and manage terms, categories, and stewards.
The Information Services Catalog can be used to publish Information Services Director
services to the IBM WebSphere Service Registry and Repository application. This
application supports the annotation of services with information that is used to select, start,
govern, and reuse services.

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-13
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

The Repository Management tool can be used to browse all physical data resources and
metadata assets in the Repository. Redundant or unnecessary metadata assets can be
managed or deleted.

2-14 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Metadata Asset Manager


Requires Common Metadata Administrator authorization role
for full functionality
Search and browse physical data resource metadata (PDR)
and business intelligence (BI) metadata in the Information
Server Repository
PDR includes databases, data files, hosts, and so on
BI includes metadata imported into the Repository from business
intelligence tools
Import PDR and BI metadata assets produced outside of
Information Server into the Repository
Managing metadata assets in the Repository:
Delete assets
Manage orphaned assets
Manage duplicate assets
Copyright IBM Corporation 2007, 2012

Figure 2-11. Metadata Asset Manager KM5021.0

Notes:
Metadata Asset Manager is discussed in detail in a later unit. It has three main categories
of functionality. With Metadata Asset Manager (IMAM) you can import business intelligence
(BI) and physical data resource metadata (PDR) into the Information Server Repository.
These types of metadata are consumed by Information Server products. You can also
search and browse these types of metadata within the Repository.
Only a subset of the metadata stored within the Repository is visible within IMAM. To view
all the metadata, log into Metadata Workbench.
You can also manage metadata assets using IMAM. You can delete assets as well as
import assets. And you can search for duplicate or orphaned assets.

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-15
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Repository Management tab

Search metadata
assets

Browse metadata
assets

Manage Repository
assets

Copyright IBM Corporation 2007, 2012

Figure 2-12. Repository Management tab KM5021.0

Notes:
This graphic shows the Repository Management tab in IMAM. Here you can browse and
search through the categories of PDR and BI metadata stored in the Repository. Notice the
categories of metadata assets you can browse listed in the Browse Assets folder.
At the bottom of the Navigation panel, you can search and manage duplicate metadata
assets and disconnected metadata assets.

2-16 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

WebSphere Application Server (WAS) console


Thin client
A WAS instance hosts the Information Server Metadata Server
backbone
Metadata Server provides services to the IS functional components
By default, named server1
Most IS administrative tasks can be done through the Web Console
A few may need to be done through the WAS console, including:
Changing the user registry configuration
Trouble-shooting
Log in through the Integration Solutions Console client for the
WebSphere Application Server
Thin client
Address: http://servername:9060/ibm/console
Replace servername with the name of your services host system
Interface to several different server types, including WebSphere application
servers, WebSphere MQ servers, and Web servers
Log into WAS using WAS administrator ID (wasadmin)
Can log into the Integration Solutions Console using IS administration ID
Copyright IBM Corporation 2007, 2012

Figure 2-13. WebSphere Application Server (WAS) console KM5021.0

Notes:
Like the Information Server Web Console, the WebSphere Application Server (WAS)
console is a thin client. You log into the client using a web browser. Enter the following
address: http://servername:9060/ibm/console. Here, replace servername by the name
of the system where the WAS is installed. This is also known as the services system
because the WAS provides the services to the Information Server products and
components.
A WAS instance may host multiple server instances. The server instance that provides the
services for Information Server is called the Metadata Server component of Information
Server and it is named, by default, server1.
By default the WAS administrator user ID is wasadmin. It is important not to confuse the
WAS administrator with the Information Server administrator, which by default is isadmin.
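If the Metadata Server needs to be stopped or started outside the console, the standard
WebSphere Application Server profile scripts can be used. This is a sketch only; the
profile path below is an assumption and depends on where WAS was installed and which
profile hosts Information Server:

    cd /opt/IBM/WebSphere/AppServer/profiles/InfoSphere/bin
    ./stopServer.sh server1 -username wasadmin -password <was_password>
    ./startServer.sh server1
    ./serverStatus.sh server1 -username wasadmin -password <was_password>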

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-17
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

WAS servers

Applications servers

IS server instance

Copyright IBM Corporation 2007, 2012

Figure 2-14. WAS servers KM5021.0

Notes:
This graphic shows the main window of the Console. The Servers folder lists the servers
hosted by this WAS instance. In this example, only one server named server1 is hosted.
This is the Metadata Server component of Information Server, which provides the services
to Information Server products.

2-18 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Product Clients

Copyright IBM Corporation 2007, 2012

Figure 2-15. Product Clients KM5021.0

Notes:

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-19
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Engine clients
DataStage / QualityStage clients
Administrator client
DataStage / QualityStage administration
Configure DataStage development environment
Configure Engine runtime environment
Designer client
Build DataStage jobs
Run DataStage jobs
Monitor DataStage jobs as they run
Director client
Run and monitor DataStage ETL jobs
Operations Console
Monitor DataStage jobs as they run
Multi-Client Manager
Switch between different DataStage client versions

Copyright IBM Corporation 2007, 2012

Figure 2-16. Engine clients KM5021.0

Notes:
The Information Server Engine system refers to a computer system where DataStage is
installed. It is called the Engine because this is the system where jobs are run that perform
various Information Server tasks. Within an Information Server domain there can be
multiple engine systems.
DataStage actually has two engines: the parallel engine and the server engine. These refer
to two types of DataStage jobs that can be run: parallel jobs and server jobs. When the
word engine is used without qualification, it refers to the parallel engine.
Engine clients refers to the DataStage product clients (Designer, Administrator, Director) as
well as the clients for other products and components associated with DataStage. The
Operations Console is a client used to monitor running DataStage jobs. This client is
discussed in a later unit. The Multi-Client Manager is a client used to switch between
different versions of DataStage.

2-20 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Multi-Client Manager
The Multi-Client Manager allows multiple versions of
DataStage/QualityStage clients to exist on a single Client system.
Only one set/version of clients can be active at any one time.
Multi-Client Manager allows developers to switch between different
client versions
The IS installation wizard detects previous client versions and registers
them with Multi-Client Manager

Multiple versions
would be listed if
they existed. Here
only 9.1 is installed.

Copyright IBM Corporation 2007, 2012

Figure 2-17. Multi-Client Manager KM5021.0

Notes:
The Multi-Client Manager allows multiple versions of InfoSphere DataStage and
QualityStage clients (Designer, Director, and Administrator) to exist on a single Client
system. Only one set and version of clients can be active at any one time.
Multi-Client Manager is needed when the same computer system is being used to connect
to two different versions of DataStage. Different versions of DataStage require different
versions of the clients. You cannot, for example, connect a DataStage Designer v8.2 to a
v9.1 DataStage server.
If the Multi-Client Manager is already installed, the installation wizard detects and registers
the new versions of DataStage clients when they are installed.

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-21
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

DataStage Administrator tasks


Add and delete DataStage projects
Enable / Disable Runtime Column Propagation (RCP)
Environment variable settings. Some examples:
LD_LIBRARY_PATH (General Category)
Specify paths to database libraries
APT_CONFIG_FILE (Parallel Category)
Path to default DataStage configuration file
Operator-specific defaults, e.g., database specific variables like:
APT_DB2INSTANCE_HOME
APT_DBNAME
Reporting information defaults
APT_DUMP_SCORE: Display Score in the job log
OSH_DUMP: Display the OSH in the job log
APT_STARTUP_STATUS: Display DataStage job startup information
Set DataStage user permissions
Set Parallel defaults:
OSH visibility
Format defaults
Sequence defaults
Restart
Logging
Director logging defaults
Auto-purge

Copyright IBM Corporation 2007, 2012

Figure 2-18. DataStage Administrator tasks KM5021.0

Notes:
DataStage developers work with projects. A project stores the objects, such as DataStage
jobs, that the developers build. Multiple DataStage developers can work within the same
project. In order to work within a particular project a user must be authorized. As will be
discussed later, authorization is provided partially within the Information Server Web
Console and partially within the DataStage Administrator client.
The development and runtime environments for a particular DataStage project are specified
within the DataStage Administrator client. In addition, there is a set of environment
variables, configured within the Administrator client, that set the project environment.
These include variables that specify database libraries that DataStage jobs will access
(LD_LIBRARY_PATH) and variables that determine how much information is logged
during a DataStage job run (for example, APT_DUMP_SCORE).
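As an illustration of typical project-level settings (the values are examples only; actual
paths depend on your installation and database client):

    LD_LIBRARY_PATH    = /opt/IBM/db2/V10.1/lib64:$LD_LIBRARY_PATH
    APT_CONFIG_FILE    = /opt/IBM/InformationServer/Server/Configurations/default.apt
    APT_DUMP_SCORE     = True
    APT_STARTUP_STATUS = True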

2-22 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Logging Into Administrator

Host name of
services system

DataStage
administrator ID
and password

Name of DataStage
server system

Copyright IBM Corporation 2007, 2012

Figure 2-19. Logging Into Administrator KM5021.0

Notes:
This graphic shows the log in screen for DataStage/QualityStage Administrator client.
In the Host name of the services tier box, type the name of the system that hosts the services.
This is the system where the WAS instance is installed.
In the User name and Password boxes type the user name and password with DataStage
Administrator role authorization and with DataStage credentials.
Multiple DataStage Servers can exist either on the same or on different systems. In the
Host name of the Information Server engine box, you select the server system that has
the DataStage projects you want to work with.

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-23
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

DataStage Administrator Projects tab

Add / Delete projects
DataStage projects
Specify project properties
Link to Information Server Web Console

Copyright IBM Corporation 2007, 2012

Figure 2-20. DataStage Administrator Projects tab KM5021.0

Notes:
This graphic shows the Projects tab in the Administrator client. It lists all
DataStage/QualityStage projects. Click the Properties button to configure the properties
and environment for the project.
You can also add and delete projects from this window.

2-24 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

DataStage Administrator General Tab

Enable / Disable
Runtime Column
Propagation (RCP)

Environment
variable settings

Copyright IBM Corporation 2007, 2012

Figure 2-21. DataStage Administrator General Tab KM5021.0

Notes:
This graphic displays the Project Properties window for the project selected on the
Projects tab. When it opens you are placed on the General tab.
Runtime Column Propagation (RCP) allows data to flow through DataStage job stages
without being explicitly mapped from input columns to output columns. This is a very
powerful feature which can be used to simplify development and to create flexible
components and jobs. Unless it is carefully managed, however, it can lead to unexpected
errors. It is recommended that, if it is enabled, it is not specified as the default setting for
new Parallel jobs. This is the setting shown in the graphic.
The General tab also provides access to the environment variables. Click the
Environment button to display the environment variables settings.

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-25
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

DataStage environment variables settings

Environment variables
Default values
Export variables and settings to a file

Copyright IBM Corporation 2007, 2012

Figure 2-22. DataStage environment variables settings KM5021.0

Notes:
Click the Environment button on the General tab to specify environment variables. There
are several folders of environment variables. The variables listed under the Parallel branch
apply to Parallel jobs.
You can also specify your own environment variables under the User Defined branch.
These variables can be passed to jobs through their job parameters to provide project level
job defaults.

2-26 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Permissions tab

Assigned role
DataStage users

Add a user

Copyright IBM Corporation 2007, 2012

Figure 2-23. Permissions tab KM5021.0

Notes:
The Permissions tab lists IS users and groups that have a DataStage Administrator role
and users and groups that have a DataStage User role and have been added by a
DataStage Administrator.
When Suite users or groups that have a DataStage Administrator role are added, they are
automatically entered here and assigned the role of DataStage Administrator.
Suite users or groups that have a DataStage User role need to be manually added. To
accomplish this, click the Add User or Group button. Then you need to select the
DataStage user role (Operator, Super Operator, Developer, Production Manager) that this
user ID is to have.

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-27
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Parallel tab

OSH visibility

Format defaults

Copyright IBM Corporation 2007, 2012

Figure 2-24. Parallel tab KM5021.0

Notes:
This graphic shows the Parallel tab. Here you can enable OSH visibility (recommended in
most cases on development platforms) and you can specify standard data type formats for
date, time, and timestamp strings.
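For example, typical format strings (illustrative values only, written in the parallel
engine format notation) might be:

    Date format:      %yyyy-%mm-%dd
    Time format:      %hh:%nn:%ss
    Timestamp format: %yyyy-%mm-%dd %hh:%nn:%ss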

2-28 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Job Sequence defaults

Restart

Logging

Copyright IBM Corporation 2007, 2012

Figure 2-25. Job Sequence defaults KM5021.0

Notes:
This graphic shows the Sequence tab. Here you can specify defaults for job sequences.
Job sequences are DataStage jobs that control batches of other DataStage jobs. You can
use them to run a batch of DataStage jobs (including parallel jobs, server jobs, and other
job sequences) in a particular order and with specified triggers.
A major feature of job sequences is that they are restartable. This means that if a job aborts
after a number of other jobs have successfully run, the job sequence can be restarted
where it left off, with the aborted job. This and other options can be turned on by default.
Regardless of the settings specified here, they can be overridden at the job sequence level.

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-29
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

DataStage job log defaults

Auto-purge

Copyright IBM Corporation 2007, 2012

Figure 2-26. DataStage job log defaults KM5021.0

Notes:
This graphic shows the Logs tab. Here you can specify defaults for the Director job logs
including purging defaults. Job log messages are stored in the Repository. Each time a job is
run, it generates many messages that are stored in the Repository until they are purged.
Here, you can specify purging defaults.
You can also specify filtering defaults for operational repository logging. Operational
logging messages are written to the operational repository, which contains messages
that are available to other Information Server products such as the DataStage and
QualityStage Operations Console. Information Server administrators using the Operations
Console are generally less interested than DataStage developers in the informational and
warning messages that are written to the job log. This option allows a number of these
informational and warning messages to be filtered out of the operational repository.

2-30 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

DataStage Designer administrative tasks


DataStage developers use the Designer client to build, run, and monitor
their DataStage jobs
Administrative tasks within Designer include managing data sets,
managing configuration files, backing up DataStage objects
Data sets
Temporary files used in DataStage jobs
Can be managed using the Designer Data Set Management tool
In Designer, click Tools> Data Set Management
Configuration files
Configuration files specify the degree of parallel-ness (number of nodes) and
other resources used when a job runs
All DataStage parallel jobs run under a specified configuration file
Can be managed using the Designer Configurations tool
In Designer, click Tools>Configurations
Backing up DataStage objects

Copyright IBM Corporation 2007, 2012

Figure 2-27. DataStage Designer administrative tasks KM5021.0

Notes:
In addition to the administrative tasks performed in the DataStage Administrator client,
there are also administrative tasks that can only be performed in the DataStage Designer
client. These tasks, which will be discussed in more detail in later units, include managing
data sets, managing configuration files, and backing up DataStage objects.
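To give a flavor of what a configuration file contains, the following is a minimal two-node
example. The host name and resource paths are placeholders; in practice the file is usually
created and edited with the Designer Configurations tool:

    {
      node "node1"
      {
        fastname "etlhost"
        pools ""
        resource disk "/data/ds/datasets" {pools ""}
        resource scratchdisk "/data/ds/scratch" {pools ""}
      }
      node "node2"
      {
        fastname "etlhost"
        pools ""
        resource disk "/data/ds/datasets" {pools ""}
        resource scratchdisk "/data/ds/scratch" {pools ""}
      }
    }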

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-31
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Logging into Designer

Host name of
services (WAS)
system

DataStage
user ID

Name of DataStage
server system followed
by name of the
DataStage project
Copyright IBM Corporation 2007, 2012

Figure 2-28. Logging into Designer KM5021.0

Notes:
Logging into Designer is like logging into Administrator, except that in Designer you are
logging into a specific DataStage project. You select this project in the Project list. Multiple
DataStage servers can exist either on the same or on different systems. The name of the
project is preceded by the name of the DataStage server that hosts it.
The user ID entered here requires a DataStage Administrator or DataStage Developer role.
These roles are discussed in a later unit.

2-32 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Designer work area


Repository Menus Toolbar

Parallel
canvas

Palette

Copyright IBM Corporation 2007, 2012

Figure 2-29. Designer work area KM5021.0

Notes:
The appearance of the Designer work space is configurable. The graphic shown here is
only one example of how you might arrange the GUI components.
In the right center is the Designer canvas, where you create stages and links. On the top
left is the Repository window. Items in the Repository, such as jobs and table definitions
can be dragged to the canvas area. On the bottom left is the Palette, which contains
stages you can add to the canvas.
Shown on the canvas is an example of a DataStage ETL (Extraction Transformation Load)
job. The stages are functional components of the job. The links are like pipes through
which data flows. This job reads a sequential file, transforms the data, then writes it to DB2
tables using the DB2 Connector stage.

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-33
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Monitoring a running DataStage job


Displays the status of each job at runtime
Displays messages that are generated by each job as it runs
Performance monitor displays runtime statistics on a partition
basis
Row counts per stage (operator) per partition
Performance statistics are also displayed on the Designer canvas as
the job runs
The job log can be viewed within Designer or Director
Designer job log viewing is limited to the job currently open in
Designer

Copyright IBM Corporation 2007, 2012

Figure 2-30. Monitoring a running DataStage job KM5021.0

Notes:
A job can be run from Designer or Director. When it is run from Designer, it displays runtime
statistics on the open job diagram as it runs.
When a job runs, it generates messages that are written to the job log. In both Designer
and Director, a window can be opened to view the job log messages. In Designer, click
View>Job Log to view the messages written by the job opened on the canvas.
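Jobs can also be run and their logs inspected from the engine system with the dsjob
command line utility. The following is a sketch only; host, project, and job names are
placeholders, and option spelling can vary slightly between releases:

    dsjob -domain services_host:9080 -user isadmin -password secret
        -server engine_host -run -jobstatus MyProject MyJob

    dsjob -domain services_host:9080 -user isadmin -password secret
        -server engine_host -logsum MyProject MyJob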

2-34 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Performance statistics in Designer

Copyright IBM Corporation 2007, 2012

Figure 2-31. Performance statistics in Designer KM5021.0

Notes:
When a job runs it collects statistical information. These statistics show up in the job log
and also on the Designer client diagram, if it is open.
In this graphic, a job open on the Designer canvas is running. For each link, through which
data is flowing, row throughput (rows/sec) is provided.
The links also turn colors as the job runs. They turn blue when data begins flowing through.
They turn green when all the rows have been successfully processed through the link. They
turn red if errors occur during the processing of the rows.

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-35
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Director client Status View

Copyright IBM Corporation 2007, 2012

Figure 2-32. Director client Status View KM5021.0

Notes:
Click Tools>Run Director to move from the Designer client to the Director client. This
graphic shows the Director Status View window. Here you see the status of the jobs in the
project: Compiled, Not Compiled, Running, Aborted.

2-36 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Job log messages


Click the open
book icon to view
log messages

Messages.
Double-click to
open
Copyright IBM Corporation 2007, 2012

Figure 2-33. Job log messages KM5021.0

Notes:
Click the Log button in the toolbar to view the job log for a job selected in the Status View.
The job log records events that occur during the execution of a job.
These events include control events, such as the starting, finishing, and aborting of a job;
informational messages; warning messages; error messages; and program-generated
messages.
You can also open a window in Designer to view these messages for an open job, without
having to open the job in Director.

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-37
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

DataStage and QualityStage Operations Console


Thin client
URL: http://server:9080/ibm/iis/ds/console/index.html
Log in using Operations Console user ID
Monitor jobs running on any server in any project in the domain
View job run information, including:
Job run times
Configuration file used
Performance information
Log information
View system resources (CPU, memory) as the jobs are
running

Copyright IBM Corporation 2007, 2012

Figure 2-34. DataStage and QualityStage Operations Console KM5021.0

Notes:
The DataStage and QualityStage Operations Console is a thin client used to monitor
running DataStage jobs. As with the monitoring functionality in DataStage Designer and
Director, you can view the job log messages as a job runs. In addition, you can monitor the
resource usage as the jobs are running.
The Operations Console also displays information about the DataStage environment,
including environment variable settings and project objects.

2-38 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Operations Console
Job activity
Engine status

System
resources

Copyright IBM Corporation 2007, 2012

Figure 2-35. Operations Console KM5021.0

Notes:
In this graphic, you see the Dashboard tab of the Operations Console. The Operations
Console opens to the Dashboard tab, which contains three sections of information. The
Job Activity section shows which jobs are currently running and their statuses within a
time range, for example, last 10 minutes.
The Operating System Resources section displays the CPU usage and free memory that
is currently available within a time range.
The Engine Status section displays the current status of engine services, including the
Operational Console services and WLM (Workload Management).

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-39
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

FastTrack
Fat client
Logon procedure same as for other fat clients
Used to create mapping specifications
Defines mappings, filters, and transformations between source and
target columns
DataStage jobs can be generated from mapping specifications
Administrative tasks
Define source connections
Import metadata of mapping specification sources and targets
FastTrack projects configuration

Copyright IBM Corporation 2007, 2012

Figure 2-36. FastTrack KM5021.0

Notes:
Logging into FastTrack is similar to logging into other fat clients. You specify the services
system and the port used to communicate with it, and you specify a user ID and password
with FastTrack credentials.
FastTrack is a product designed to work with DataStage. With FastTrack you can create
mapping specifications that document the mappings and transformations of a DataStage
job. This mapping specification can be used to document a DataStage job, as well as to
provide a DataStage developer with specifications for building it.
From mapping specifications, prototype DataStage jobs can be generated, which
implement the mappings and transformations specified in the mapping specification.

2-40 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

FastTrack data source configuration

Existing Connection
New Connection
Import Metadata

Copyright IBM Corporation 2007, 2012

Figure 2-37. FastTrack data source configuration KM5021.0

Notes:
One administrative task you may be called on to perform with respect to FastTrack is to
define data resource connections to database tables. These database table definitions are
stored in the Information Server Repository, to be used by FastTrack as well as other
Information Server products, such as Information Analyzer.
After a connection has been defined, developers can import metadata for selected
schemas and tables, to be used in their mapping specifications.

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-41
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Business Glossary
Thin client
URL: http://server:9080/bg
Also accessible from the Information Server Web Console
Create and manage business metadata assets, including:
Terms
A word or phrase that describes a metadata asset in business terms
Stewards
A user or group of users assigned responsibility for a metadata asset
Categories
A specified folder-type object to organize your Glossary content
Link terms and stewards to Repository assets

Copyright IBM Corporation 2007, 2012

Figure 2-38. Business Glossary KM5021.0

Notes:
Business Glossary supports metadata management from the business user's point of view.
With Business Glossary, developers can create a glossary of business terms that
document and explain Information Server assets. These terms can be linked to the assets,
so they are accessible to developers working with the assets.
Stewards can be assigned to specific metadata assets. A steward may be a subject matter
expert with respect to the specific asset, one who can be contacted by others for
information about the asset.

2-42 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Business Glossary

Browse business terms and categories
Assign terms, labels, stewards to assets
Create business terms and categories
Copyright IBM Corporation 2007, 2012

Figure 2-39. Business Glossary KM5021.0

Notes:
This graphic shows the Business Glossary tab where a developer can create and
manage terms and categories, and create and manage data stewards.

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-43
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Metadata Workbench (MWB)


Thin client
Address: http://servername:9080/workbench
Functions:
Browse, search, query Information Server metadata assets
View relationships and dependencies between metadata assets
View the flow of data across metadata assets
Browse the Information Server metadata model
Defines the format of all metadata stored in the IS Repository

Copyright IBM Corporation 2007, 2012

Figure 2-40. Metadata Workbench (MWB) KM5021.0

Notes:
Metadata Workbench is another thin client. It is the primary tool within Information Server
for viewing, monitoring, and analyzing the metadata assets stored in the Information Server
Repository.
With Metadata Workbench you can not only browse and query metadata assets, but you
can view diagrams that document relationships and dependencies between them, and you
can view the flow of data through a set of metadata assets.

2-44 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Metadata Workbench
Administration / Metadata model
Browse
Search and query
Engine asset
DataStage project

Copyright IBM Corporation 2007, 2012

Figure 2-41. Metadata Workbench KM5021.0

Notes:
On the Browse tab you can browse different types of metadata assets. Shown here is an
Engine asset, which includes DataStage project assets.
On the Discover tab you can search and query metadata assets.
On the Advanced tab you can perform MWB administrative functions. For example, you
can run the Automated Metadata Services, which detect and retrieve relationships between
IS metadata assets for analysis.
On the Advanced tab you can also view the Metadata model, which lists and describes all
metadata assets.

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-45
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Viewing the Information Server Metadata Model

Model View

Host asset
details

Metadata Common Model


Copyright IBM Corporation 2007, 2012

Figure 2-42. Viewing the Information Server Metadata Model KM5021.0

Notes:
This graphic shows the Advanced>Model View tab. Here you can browse the metadata
model used for defining and organizing Information Server metadata assets. This model
documents the meaning of the different assets stored within the Information Server
Repository. This model is discussed in more detail in a later unit.

2-46 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Information Server Console


Fat client
Access to Information Analyzer and Information Services
Director (ISD)
Information Analyzer (IA) administrative tasks
Configure a connection to the analysis database
Validate the configuration
Configure an IA project
Create a data store (data sources whose data will be analyzed)
Information Services Director administrative tasks
Configure connections to information components such as DataStage
that will implement services defined in ISD
Configure an ISD project

Copyright IBM Corporation 2007, 2012

Figure 2-43. Information Server Console KM5021.0

Notes:
The Information Server Console provides access to two different Information Server
products: Information Analyzer (IA) and Information Services Director (ISD). (Information
Services Director is also known as WISD, because it used to be a WebSphere product.)
Information Analyzer is used to analyze data in order to determine its quality and formats. It
might be used to analyze the data sourced by DataStage jobs, and it might be used to
analyze the data loaded into a data warehouse by DataStage jobs.
Information Services Director is used to wrap DataStage and QualityStage ISD jobs and
other function components into services that can be delivered to consumers.

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-47
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Logging on to the Information Server Console

Copyright IBM Corporation 2007, 2012

Figure 2-44. Logging on to the Information Server Console KM5021.0

Notes:
This graphic shows the log in screen of the Information Server Console. Here, you specify
the host name of the services tier and a user ID and password for logging into Information
Analyzer or Information Services Director. Although the Information Server Console is used
to access both products, there are separate user authentication roles for each product.
Once you are in the Console, you can open a project specific to either Information Analyzer
or Information Services Director.

2-48 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Information Server Console Home tab

Configure data
source

Create a project

Copyright IBM Corporation 2007, 2012

Figure 2-45. Information Server Console Home tab KM5021.0

Notes:
This graphic shows the Home tab of the Information Server Console.
Click the Home menu for access to configuration tasks. Here you can create and edit
projects. The project you create or open can be either an Information Services Director
project or an Information Analyzer project.

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-49
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Information Server Console System Configuration menu

Home menu
Console dashboard
Import metadata
Define data stores

Copyright IBM Corporation 2007, 2012

Figure 2-46. Information Server Console System Configuration menu KM5021.0

Notes:
This graphic shows the Information Server Console Configuration menu. This is the
menu an administrator would use to configure Information Analyzer data sources and
connections. A later unit discusses this configuration in detail.

2-50 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Checkpoint questions
1. How would you distinguish a thin client from a thick client?
2. Name two Information Server thick clients.
3. What role does WebSphere Application Server (WAS) play in
Information Server?

Copyright IBM Corporation 2007, 2012

Figure 2-47. Checkpoint questions KM5021.0

Notes:

Copyright IBM Corp. 2007, 2012 Unit 2. Overview of Clients used for Administration 2-51
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Exercises Unit 02
In this lab exercise, you will:
Log into and explore the Information
Server Web Console Administration
and Reporting tabs
Log into and explore the Metadata
Asset Manager thin client
Log into and explore the WebSphere
Application Server (WAS) Integrated
Solutions Console
Log into and explore the Information
Server Console
Log into and explore DataStage
client functionality
Log into and explore the DataStage
and QualityStage Operations
Console
Log into and explore the FastTrack
client
Log into and explore Metadata
Workbench
Copyright IBM Corporation 2007, 2012

Figure 2-48. Exercises Unit 02 KM5021.0

Notes:

2-52 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Unit summary
Having completed this unit, you should be able to:
Log in and explore Information Server dedicated administrative
clients, including:
Information Server Web Console
WebSphere Application Server (WAS) console
Metadata Asset Manager
Log in and explore Information Server hosted product clients,
including:
Console for IBM Information Server
DataStage clients
FastTrack
Business Glossary
Metadata Workbench

Copyright IBM Corporation 2007, 2012

Figure 2-49. Unit summary KM5021.0

Notes:


Unit 3. Authentication and Suite Security

What this unit is about


This unit describes how to configure Information Server security for
users and groups.

What you should be able to do


After completing this unit, you should be able to:
Configure the authentication registry
Create Information Server users
Configure Suite Users and Groups
Configure DataStage credentials for Engine users

How you will check your progress


Lab exercises and checkpoint questions


Unit objectives
After completing this unit, you should be able to:
Configure the authentication registry
Create Information Server users
Configure Suite Users and Groups
Configure DataStage credentials for Engine users

Copyright IBM Corporation 2007, 2012

Figure 3-1. Unit objectives KM5021.0

Notes:


IS Authentication Registry Overview

Copyright IBM Corporation 2007, 2012

Figure 3-2. IS Authentication Registry Overview KM5021.0

Notes:


Security administration tasks


Choose and configure the user registry
Stores user account information
User IDs and passwords
User attributes: email address, company
Configured in WebSphere Application Server (WAS) Console
In WAS, click Security>Global security
Create Information Server (IS) user and group accounts and
assign security roles
Configure Engine security

Copyright IBM Corporation 2007, 2012

Figure 3-3. Security administration tasks KM5021.0

Notes:
A user registry stores user account information. This includes IDs and passwords as well
as user attributes, such as email addresses. A default user registry is created during
Information Server installation. After installation, it can be configured in the WebSphere
Application Server (WAS) Console.


Information Server authentication


WebSphere Application Server (WAS) is used for Information Server
authentication and security
Supported user registries include:
Internal registry
Least complex
Suitable for small-scale installations
Stored in IS Repository
Operating System registry
Suitable for small-scale installations, if internal registry is unsuitable
User attributes are still stored in IS Repository
IS Directory Service communicates with the registry through WAS
LDAP
Most complex to configure
Most powerful
Support for features such as password policies
User attributes are still stored in IS Repository
IS Directory Service communicates with the registry through WAS

Copyright IBM Corporation 2007, 2012

Figure 3-4. Information Server authentication KM5021.0

Notes:
Information Server uses WAS for authentication and security. Three types of user registries
are supported. One supported registry is the Information Server internal registry, which is
created and configured by default during Information Server installation. This is the least
complex type of user registry, and is suitable for small-scale installations.
After installation, Information Server can be configured to use either an operating system
(OS) user registry or an LDAP user registry. Even when these alternative registries are
used, user attributes are still stored in the Information Server Repository. The LDAP user
registry is the most powerful, with features such as enforceable password policies.
However, it is also the most complex to configure.


Architecture for internal user registry

Copyright IBM Corporation 2007, 2012

Figure 3-5. Architecture for internal user registry KM5021.0

Notes:
This graphic depicts the architecture when the internal user registry option (the default) is
chosen.
This graphic assumes that Repository and Services (WAS) tiers are both on the same
computer. The top graphic represents a client system, which interacts with the Information
Server Directory service when a user logs into an Information Server product through its
client. The user IDs and passwords, and the user roles they possess, are all stored in the
Repository, along with other user attributes. The Directory service checks the login
information with the information stored in the Repository.


Architecture for an OS user registry

Copyright IBM Corporation 2007, 2012

Figure 3-6. Architecture for an OS user registry KM5021.0

Notes:
This graphic shows the architecture when the operating system user registry option is
chosen.
This graphic assumes that Repository and Services (WAS) tiers are both on the same
computer. The top graphic represents a client system, which interacts with the Information
Server Directory service when a user logs into an Information Server product through its
client. The user IDs and passwords, and the user roles they possess, are all stored in the
local operating system user registry. The other user attributes are stored in the Repository. The
Directory service checks the login information through the WAS, which checks the
information stored in the operating system registry.
Information about the other user attributes is still retrieved directly from the Repository by
the Directory Service.


Architecture for an LDAP external user registry

Copyright IBM Corporation 2007, 2012

Figure 3-7. Architecture for an LDAP external user registry KM5021.0

Notes:
This graphic shows the architecture when the LDAP option is chosen.
This graphic assumes that Repository and Services (WAS) tiers are both on the same
computer. The top graphic represents a client system, which interacts with the Information
Server Directory service when a user logs into an Information Server product through its
client. The user IDs and passwords, and the user roles they possess, are all stored in
the external LDAP user registry. The other user attributes are still stored in the Repository. The
Directory service checks the login information through the WAS, which checks the
information stored in the LDAP registry.
Information about the other user attributes is still retrieved directly from the Repository by
the Directory Service.


WAS security configuration

Custom registry. Select


if implementing an IS
internal user registry

Choose OS or
LDAP user registry

Configure OS or
LDAP user registry
Copyright IBM Corporation 2007, 2012

Figure 3-8. WAS security configuration KM5021.0

Notes:
This graphic depicts how the user registry is selected in WAS. After you log into WAS, click
Security>Global security. The Current realm definitions box identifies the type of user
registry that has been selected. By default, after Information Server installation, the
selection is Standalone custom registry. This is configured as an Information Server
internal user registry.
After installation, the user registry type can be changed. Select the type of user registry,
and then click Configure to configure it.
See the Information Server Administration Guide for more details. The Administration
Guide will point you to the relevant information for configuring WAS.


IS Web Console User Registry Configuration tab


Domain Management>User Registry Configuration page of the
Information Server Web Console Administration tab
Displays two types of user registry usage:
Use the IS internal user registry
Use the WAS J2EEProvider to connect to the user registry specified by the WAS
instance
Configuration here is automatically updated as needed when there is a
change to the configuration in WAS
Occurs after WAS is restarted

Using internal
user registry
Copyright IBM Corporation 2007, 2012

Figure 3-9. IS Web Console User Registry Configuration tab KM5021.0

Notes:
You can determine the current user registry type from within the Information Server Web
Console on the Administration>Domain Management>User Registry Configuration panel.
The type of user registry currently in effect is indicated. (Note that this panel is read-only.)
In particular, it identifies whether the user registry is an Information Server internal user
registry, accessed through the Information Server Directory Service, or whether it is a user
registry the Directory Service connects to through WAS.
In this example, Information Server is configured to use its internal user registry.


Switching to the local OS user registry


This is done after IS installation
Recommend doing this as soon as possible after installation to avoid
issues concerning pre-existing user IDs
Create or choose an OS user for WAS administration directed
to the local user registry
Can be the same as the WAS installation owner
In WAS, click Security>Global Security
Select Local operating system from the Available realm
definitions list and then click the Configure button
Specify user for WAS administration directed to the local user
registry and then save edits
Set your configuration as current
After stopping WAS, run AppServerAdmin command
Propagates WAS administrator user ID to the WAS instance
Restart WAS
Copyright IBM Corporation 2007, 2012

Figure 3-10. Switching to the local OS user registry KM5021.0

Notes:
During installation, Information Server is configured to use its own internal registry. After
installation, this can be changed to a local OS user registry. It is recommended that you do
this as soon as possible after installation to avoid issues concerning IDs created after
installation, but before the switch.
As noted, this configuration change is done in WAS. After the configuration changes are
made, WAS needs to be restarted for the change to take effect.
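The following is a minimal command sketch of the final steps above for a Linux services tier. It assumes the default /opt/IBM/InformationServer installation path, a placeholder OS administrator ID (wasadmin), and the AppServerAdmin options commonly documented for this task; confirm the exact options in the Administration Guide for your release.

# Stop WAS (the Metadata Server) before propagating the new administrator credentials
cd /opt/IBM/InformationServer/ASBServer/bin
./MetadataServer.sh stop

# Propagate the WAS administrator user ID to the WAS instance
# (wasadmin and its password are placeholders for the OS user you created or chose)
./AppServerAdmin.sh -was -user wasadmin -password wasadmin_password

# Restart WAS so that the registry change takes effect
./MetadataServer.sh start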


Configuring the local OS user registry

WAS registry
administrator

WAS registry
administrator

Copyright IBM Corporation 2007, 2012

Figure 3-11. Configuring the local OS user registry KM5021.0

Notes:
This graphic indicates the central properties that need to be edited, if you are configuring a
local operating system user registry in WAS.


Configuring the local OS user registry, continued

Set as current

New registry
configuration
Copyright IBM Corporation 2007, 2012

Figure 3-12. Configuring the local OS user registry, continued KM5021.0

Notes:
After specifying the properties you need to select the new registry configuration and then
click the Set as current button.


Switching to the LDAP user registry


Select Standalone LDAP registry from the Available realm
definitions list and then click the Configure button
Specify a valid user for WAS administration directed to the
LDAP registry
Select the type of LDAP Server and specify its host name and
port
Enter the base distinguished name (DN) to limit scope search
Specify additional settings
Save your configuration
Select configuration as current
After stopping WAS, run AppServerAdmin command
Propagates WAS administrator user ID to the WAS instance
Restart WAS
Copyright IBM Corporation 2007, 2012

Figure 3-13. Switching to the LDAP user registry KM5021.0

Notes:
During installation, Information Server is configured to use its own internal registry. After
installation, this can be changed to an LDAP user registry. It is recommended that you do
this as soon as possible after installation to avoid issues concerning IDs created after
installation, but before the switch.
As noted, this configuration change is done in WAS. After the configuration changes are
made, WAS needs to be restarted for the change to take effect.


Configuring the LDAP user registry

Administrative ID

LDAP Server type

LDAP Server host


Additional server
identity used for Base DN
internal process
communication
Copyright IBM Corporation 2007, 2012

Figure 3-14. Configuring the LDAP user registry KM5021.0

Notes:
This graphic highlights the central properties that need to be specified if you are configuring
an LDAP user registry.


Configuring the LDAP user registry, continued

Set as current

New registry
configuration
Copyright IBM Corporation 2007, 2012

Figure 3-15. Configuring the LDAP user registry, continued KM5021.0

Notes:
After specifying the properties you need to select the new registry configuration and then
click the Set as current button.


Switching a user registry for a system in use


If your IS system has been used for a while by multiple users,
you must clean up the security repository
Not necessary if you switch the user registry immediately after IS
installation
Deletes existing users and groups
On the Services tier computer run the Directory Admin tool
Found in the ASBServer bin directory
Commands:
DirectoryAdmin.sh -delete_groups
DirectoryAdmin.sh -delete_users

Copyright IBM Corporation 2007, 2012

Figure 3-16. Switching a user registry for a system in use KM5021.0

Notes:
Things are more complicated if you switch user registries after the initial registry has been
in use for some time. The problem is with users and groups that were created in the initial
internal registry. These users must be removed before changing to a new user registry.
You can use the DirectoryAdmin.sh -delete_users and -delete_groups commands to delete existing users and
groups. It will be necessary to recreate these users and groups in the new registry.
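As a minimal sketch, assuming the default installation path and the command options shown on the slide, the cleanup would be run on the Services tier computer as follows:

# Run the Directory Admin tool from the ASBServer bin directory
cd /opt/IBM/InformationServer/ASBServer/bin

# Delete the groups, and then the users, defined in the old registry
./DirectoryAdmin.sh -delete_groups
./DirectoryAdmin.sh -delete_users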


Engine Security Configuration

Copyright IBM Corporation 2007, 2012

Figure 3-17. Engine Security Configuration KM5021.0

Notes:


Engine security configuration


The IS Engine (DataStage Engine) performs user
authentication separately from other IS Server components
The user registry used by the Engine is, by default, different
from the user registry used by Information Server
By default, IS uses the internal user registry in the IS Repository
By default, the Engine uses the local OS user registry on the
computer where the Engine is installed
When the Engine user registry is different from the IS user
registry, credentials have to be mapped between them
The Engine user registry can be the same as the IS user
registry if:
They share an OS user registry
Services tier and Engine tier must be on the same computer
They share an LDAP user registry
They cannot share the internal user registry in the IS Repository
Copyright IBM Corporation 2007, 2012

Figure 3-18. Engine security configuration KM5021.0

Notes:
The Information Server engine (also known as the DataStage engine) performs user
authentication separately from other Information Server components. This has to do with
the fact that prior to Information Server v8.0, DataStage was a stand-alone product that
used the local OS user registry on the computer where it was installed. It continues to use
this in Information Server.
If the Engine user registry is different from the Information Server user registry, as it will be
in most cases if the Information Server user registry is not the OS user registry, then user
credentials must be mapped between them.


Shared OS user registry

Shared OS user registry
User attributes stored in internal user registry
Copyright IBM Corporation 2007, 2012

Figure 3-19. Shared OS user registry KM5021.0

Notes:
The Engine user registry can be the same as the IS user registry if they share an operating
system user registry. This graphic depicts that situation. The top graphic depicts a client
system. The lower system depicts the services tier. It is assumed in the graphic that the
engine and repository tiers are also installed on the same system.
When a user logs into DataStage, the Directory Service through the WAS checks the name
within the operating system user registry. If it finds the name and password, it passes the
user ID and password to DataStage, which then attempts to authenticate it. It will
authenticate it, since the user ID is in the operating system registry that DataStage uses.


Shared LDAP user registry

Shared LDAP user registry
User attributes stored in internal user registry
Copyright IBM Corporation 2007, 2012

Figure 3-20. Shared LDAP user registry KM5021.0

Notes:
The Engine user registry can be the same as the IS user registry if they share the same
LDAP user registry. This graphic depicts that situation. The top graphic depicts a client
system. The lower system depicts the services tier. It is assumed in the graphic that the
engine and repository tiers are also installed on the same system.
When a user logs into DataStage, the Directory Service through the WAS checks the name
within the LDAP user registry. If it finds the name and password, it passes the user ID and
password to DataStage, which then attempts to authenticate it. It will authenticate it, since
the user ID is in the LDAP registry that DataStage is using.


Configuring IS for sharing the user registry


Click Domain Management>Engine Credentials
Then select the Engine
Registry sharing is configured separately for each Engine

Share user
registry

Copyright IBM Corporation 2007, 2012

Figure 3-21. Configuring IS for sharing the user registry KM5021.0

Notes:
This graphic depicts how to configure Information Server so that the registry is shared
between Information Server and DataStage. If there is more than one engine on different
systems or on the same system, then this needs to be done for each one.
If the Share User Registry between InfoSphere Information Server and its engine box
is checked, it tells Information Server that the user directory it is configured to use is the
same as the user directory DataStage is configured to use. By default, DataStage is
configured to use the operating system user registry on the system on which it is installed,
but DataStage can be configured to use an LDAP user registry.


Credential mappings
Credential mappings must be created when IS and the IS
Engine do not share the same user registry
This is necessary when IS uses the internal user registry, because the
Engine cannot use this registry
Credential mappings are stored with the internal user registry
in the Repository
Mappings can be either from one Information Server user to
one operating system user, or all Information Server users can
be mapped to the same, default operating system user
If the user registry is shared, Information Server must be
configured through the IS Web Console to indicate this
Click Domain Management>Engine Credentials
Select the Share User Registry option

Copyright IBM Corporation 2007, 2012

Figure 3-22. Credential mappings KM5021.0

Notes:
If Information Server and DataStage do not share the same user registry, then mappings
must be created between Information Server user IDs, having DataStage Administration or
DataStage User roles, and user IDs that exist locally in the operating system registry where
DataStage is installed.
Assume that DataStage is using the operating system user registry. A credential mapping
consists of mapping an Information Server user ID (and password), who has a DataStage
User or Administrator role attached to it, to an operating system user ID (and password).
Alternatively, a single operating system user ID and password can be specified as the
default operating system user ID that all Information Server user IDs are mapped to.


Credential mappings diagram

Engine OS
IS user registry user registry
Copyright IBM Corporation 2007, 2012

Figure 3-23. Credential mappings diagram KM5021.0

Notes:
This diagram depicts credential mappings between the Information Server user registry
and the DataStage user registry, here assumed to be the operating system user registry.
Here the Information Server Repository and the Engine are on the same computer, but this
is not required.
The credential mappings are stored in the Information Server Repository. When a user logs
into DataStage, the Directory Service checks the name within the internal user registry. If it
finds the name and password, it locates the user ID and password it is mapped to, and then
it passes that user ID and password to DataStage, which then attempts to authenticate it. It
will authenticate it, since the user ID is in the operating system registry that DataStage
uses.


Information Server User Configuration

Copyright IBM Corporation 2007, 2012

Figure 3-24. Information Server User Configuration KM5021.0

Notes:


Assigning roles for access control


Three types of roles
Suite roles: Provide access to Suite-level clients, for example, IS Web
Console
Assigned using the IS Web Console
Suite Component roles: Provide access to specific IS product clients
Assigned using the IS Web Console
Project-level roles: Roles defined within a specific IS product
Example: For a specific DataStage project, a user can be assigned the
role of Developer or alternatively of Operator
Assigned using administrative functionality within the specific project
Suite and Suite Component roles can be assigned to users or
groups
Users added to a group inherit the roles of the group

Copyright IBM Corporation 2007, 2012

Figure 3-25. Assigning roles for access control KM5021.0

Notes:
There are three types of roles used to control access to Information Server products and
components. Suite roles control access to suite-level clients such as the Information Server
Web Console. Suite Component roles control access to specific Information Server
products. In addition, some products have additional roles, defined within the product, for
controlling access to its objects.
Roles can be assigned to individual users or to groups of users. Roles assigned to a
group are inherited by all users who are members of the group.


Suite roles
Suite Administrator: Maximum privileges
Suite User: Minimum requirement to access any IS suite or
product client
Common Metadata Administrator
Full functionality within Metadata Asset Manager to browse and
manage metadata assets
Common Metadata Importer
Log into Metadata Asset Manager to import metadata assets
Common Metadata User
Log into Metadata Asset Manager to browse metadata assets

Copyright IBM Corporation 2007, 2012

Figure 3-26. Suite roles KM5021.0

Notes:
There are five Suite roles. Three of the roles apply to the Metadata Asset
Manager product. These are discussed in a later unit.
There are two standard Suite roles: Suite Administrator, Suite User. A Suite Administrator
can log into the Information Server Web Console and perform any task, including creating
user IDs. A Suite User has limited authority within the Information Server Web Console. A
Suite User can, for instance, log into the Web Console and view reports, but cannot create
user IDs.


Suite Component roles


Product Administrator (FastTrack Administrator, Metadata Workbench
Administrator, DataStage Administrator, and so on)
Create and manage projects and users
Perform other administrative tasks depending on product
Product component user (FastTrack User, and so on)
Use product component user functions
Other specialized roles
Business Glossary Author: Create and edit business terms and categories and
assign metadata assets to terms
Business Glossary Basic User: More limited than Business Glossary User, in that
the user cannot examine metadata assets in the Repository
Metadata related roles, including:
Operational Metadata Administrator: Can import operational metadata into the
Repository
Operational Metadata Analyst: Can create and run reports on operational metadata
Roles related to rule sets used by QualityStage, including:
Rule Administrator: Administer who can access and run rules and rule sets

Copyright IBM Corporation 2007, 2012

Figure 3-27. Suite Component roles KM5021.0

Notes:
For each product there is a Suite Component Administrator role and a Suite Component
User role. Some products have additional specialized roles. The nature of these roles
differs depending on the product.
For example, with respect to DataStage a user can be an Administrator or a User. An
Administrator has full authorization, including the ability to specify user project roles. A
User's authorizations are limited to those assigned by a DataStage Administrator.


Creating IS users and groups


Performed on the IS Web Console Administration>Users
and Groups tab
Requires Suite Administrator privileges
Creating a Group
Specify user ID (for example, DEV)
Specify Name (IS Developers)
Specify other attributes: email, organization, and so on
Specify Suite and Suite Component roles
Add users
Users must already exist
Creating a User
Specify ID
Name and other attributes
Specify Suite and Suite Component roles
Copyright IBM Corporation 2007, 2012

Figure 3-28. Creating IS users and groups KM5021.0

Notes:
Security roles can be applied to users or groups. Users in the group inherit the roles
defined for the group.
When creating a user or group, the primary tasks are to specify the name of the user and
group and other attributes, and to specify the Suite and Suite Component roles that apply
to the user or group. Users are also given a password.


Creating a new group


Click Administration>Users and Groups>Groups
Click New Group

New Group
Groups

Copyright IBM Corporation 2007, 2012

Figure 3-29. Creating a new group KM5021.0

Notes:
This graphic shows how to create a new group in the IS Web Console Administration tab.
First click on Users and Groups>Groups on the Administration tab. Then click New
Group. This opens the window where you specify the group attributes, shown on the next
page.


Selecting group attributes and roles

User ID and other attributes
Suite roles
Suite Component roles
Browse for users to add to the Group
Copyright IBM Corporation 2007, 2012

Figure 3-30. Selecting group attributes and roles KM5021.0

Notes:
This graphic shows the page where you specify the attributes of a group. Required
attributes include the group ID and Name. In the Roles panel, select the Suite roles for the
group in the top panel, and select the Suite Component roles for the group in the bottom
panel.
In this example, the group ID is DEV. Two Suite roles have been chosen for the group
(Suite User, Common Metadata Administrator), and one Component role has been
chosen for the group (DataStage and QualityStage User).
Click the Browse button to add users to the group. These users must already be
defined.


Creating a new user


Click Administration>Users and Groups>Users
Click New User

New User
Users

Copyright IBM Corporation 2007, 2012

Figure 3-31. Creating a new user KM5021.0

Notes:
This graphic shows how to create a new user in the IS Web Console Administration tab.
First click on Users and Groups>Users on the Administration tab. Then click New User.
This opens the window where you specify the user attributes, shown on the next page.


Specifying user attributes

User attributes
Member of DEV Group
Add to a Group

Copyright IBM Corporation 2007, 2012

Figure 3-32. Specifying user attributes KM5021.0

Notes:
This graphic shows the page where you specify the attributes of a user. Required attributes
include the User Name and Password. In the Roles panel, select the Suite roles for the
user in the top panel, and select the Suite Component roles for the user in the bottom
panel.
In this example, the user name is dev1. One Suite role has been chosen for the user (Suite
User).
Click the Browse button to add the user to one or more groups. These groups must
already be defined. Additional Suite and Suite Component roles will be acquired through the
user's membership in these groups.
In this example, the user acquires the roles possessed by the DEV group.


Credential Mappings

Copyright IBM Corporation 2007, 2012

Figure 3-33. Credential Mappings KM5021.0

Notes:


Default credential mapping


Click Administration>Domain Management>Engine
Credentials
Select the engine
A single IS domain can contain multiple engines
Click Open Configuration

Engine

Open Configuration
Engine credentials
Copyright IBM Corporation 2007, 2012

Figure 3-34. Default credential mapping KM5021.0

Notes:
Credential mappings are specified in the Information Server Web Console in the Domain
Management>Engine Credentials folder on the Administration tab.
Begin by selecting the engine. In this example, there is only one engine to select, but
multiple engines are possible in a domain. Then click Open Configuration to open the
Engine Credentials window, shown on the next page.


Specify the default credential mapping


Specify a user ID in the Engine user registry
In this example, the Engine user registry is the Engine Server OS
registry
dsadm is a valid OS user
Share User Registry must be unchecked

Engine user
registry user
Copyright IBM Corporation 2007, 2012

Figure 3-35. Specify the default credential mapping KM5021.0

Notes:
A default credential mapping can be specified in the Default Credentials panel,
highlighted in the graphic. Here you specify an operating system user name and password
on the engine system.
This mapping will be applied to DataStage users that have not been given any explicit,
specific mapping. If you leave this blank, then every DataStage user must be explicitly
mapped to an engine system user.


User credential mappings


Click Administration>Domain Management>Engine
Credentials
Select the engine
A single IS domain can contain multiple engines
Click Open User Credentials

Engine

Open User Credentials


Engine credentials
Copyright IBM Corporation 2007, 2012

Figure 3-36. User credential mappings KM5021.0

Notes:
This graphic shows how to map an individual DataStage user to an engine operating
system user ID. After selecting the engine, click Open User Credentials. This opens the
Map User Credential window, shown on the next page.


Individual credential mappings

IS user ID
Engine user ID
Browse for IS user ID
Specify engine system user ID here

Copyright IBM Corporation 2007, 2012

Figure 3-37. Individual credential mappings KM5021.0

Notes:
First click Browse to retrieve the DataStage user ID. Then specify the engine system user
ID and password it is to be mapped to. You must include both the engine system ID and its
associated password. Note that if the engine system ID password changes, the mapping
will no longer work and will have to be updated.
After you specify the engine system user, click Apply to complete the mapping.
In this example, dev1 has been mapped to dsadm. Here, dev1 is a user with DataStage
authorization. dsadm is a user on the engine system.


Checkpoint
1. What client is used to specify DataStage credential
mappings?
2. What two types of authentication roles can be assigned to a
user or group?
3. What client is used to configure the IS user registry?
4. What three types of user registries are supported?

Copyright IBM Corporation 2007, 2012

Figure 3-38. Checkpoint KM5021.0

Notes:


Exercises Unit 03
In this lab exercise, you will:
View the User Registry configuration in
the Information Server Web Console
View WAS user registry configuration
Create Information Server users
Review and create DataStage
credentials

Copyright IBM Corporation 2007, 2012

Figure 3-39. Exercises Unit 03 KM5021.0

Notes:


Unit summary
Having completed this unit, you should be able to:
Configure the authentication registry
Create Information Server users
Configure Suite Users and Groups
Configure DataStage credentials for Engine users

Copyright IBM Corporation 2007, 2012

Figure 3-40. Unit summary KM5021.0

Notes:


Unit 4. Stopping and Starting Information Server

What this unit is about


This unit describes how to stop and start Information Server
components.

What you should be able to do


After completing this unit, you should be able to:
Stop Information Server
Start Information Server
Check for running Information Server processes

How you will check your progress


Lab exercises and checkpoint questions


Unit objectives
After completing this unit, you should be able to:
Stop Information Server
Start Information Server
Check for running Information Server processes

Copyright IBM Corporation 2007, 2012

Figure 4-1. Unit objectives KM5021.0

Notes:


Starting and stopping Information Server (IS)


Stopping Information Server
Stop Engine services:
DataStage DSRPC Server
ASB Agent
Logging agent
Stop WAS Metadata Server (server1)
Stop XMETA, IADB database server
For XMETA (IS Repository)
For IADB (Information Analyzer database)
Starting Information Server: Reverse the process
Start XMETA, IADB database server
Start WAS Metadata Server (server1)
Start Engine services

Copyright IBM Corporation 2007, 2012

Figure 4-2. Starting and stopping Information Server (IS) KM5021.0

Notes:
Starting or stopping Information server involves starting or stopping many individual
Information Server components. These components need to be started or stopped in the
right order. First stop the Engine services. Then stop the domain (WAS) services. At that
point, Information Server will be stopped. You can then, if you choose, stop the Information
Server supporting databases and database systems, including XMETA and IADB.
When you start Information Server, reverse the process. The supporting database systems
and databases must be running before you attempt to start the WAS Metadata Server.


Stopping Information Server (1)


Log into the each Engine computer
Log in as root, unless IS agents have been configured for non-root
administration
Check that no one is using DataStage
Check if there are any DataStage processes running
ps -ef | grep phantom
Phantom processes occur when DataStage jobs are running
ps -ef | grep dsapi
ps -ef | grep dscs
Each client connection initiates both a dsapi process and a dscs process
Check that DSRPC has no established connections
If DSRPC is running, it will return a status of LISTEN
If connections are established, it will return a status of ESTABLISHED
netstat -a | grep dsrpc
Each DataStage client connection will show an ESTABLISHED connection
Copyright IBM Corporation 2007, 2012

Figure 4-3. Stopping Information Server (1) KM5021.0

Notes:
Before you stop DataStage, you may want to check that no one is using it. There are a
number of commands you can use to determine whether DataStage processes are
running.
The ps -ef command displays process statuses. The grep command searches for a
pattern in the output from the ps command. Processes labeled phantom, dsapi, and
dscs are DataStage-related processes that indicate either that DataStage jobs are
running or that DataStage users are logged into DataStage.
The netstat -a | grep dsrpc command displays DataStage network connections.
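A sketch of a pre-shutdown check on the Engine computer, combining the commands above; the grep -v grep filter, which simply hides the grep process itself, is an addition to what the slide shows:

# Look for DataStage phantom processes (present while jobs are running)
ps -ef | grep phantom | grep -v grep

# Look for connected DataStage clients (each connection has a dsapi and a dscs process)
ps -ef | grep dsapi | grep -v grep
ps -ef | grep dscs | grep -v grep

# LISTEN means dsrpcd is up; ESTABLISHED lines indicate active client connections
netstat -a | grep dsrpc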


Checking for DataStage processes

No jobs running

Job running

DataStage
client
connection
Copyright IBM Corporation 2007, 2012

Figure 4-4. Checking for DataStage processes KM5021.0

Notes:
This graphic shows some example output from using the commands discussed previously.
We see output from the commands when DataStage jobs are running, DataStage clients
are running, and client connections are established.
In this example, the ps -ef | grep dscs command is run twice. The first time it is run, no
output other than the process running the command itself is displayed, indicating that no
DataStage client connections are active. The second time it is run, a dscs process owned by dsadm is
displayed, indicating that a DataStage client connection is active.
Towards the bottom, the netstat -a | grep dsrpc command is run. The output indicates that
a DataStage client connection is established.


Stopping Information Server (2)


Stop DataStage services
Change to DataStage home directory
cd /opt/IBM/InformationServer/Server/DSEngine
Run the DataStage dsenv file
. dsenv
Sets the environment
Execute the ./bin/uv -admin -stop command to stop the DataStage Engine
instance
Check that there are no memory segments for tag ade
Check that there is no dsrpcd port activity

Copyright IBM Corporation 2007, 2012

Figure 4-5. Stopping Information Server (2) KM5021.0

Notes:
To stop DataStage services, first run the dsenv file to initialize the DataStage environment.
Then execute the uv -admin -stop command. The default DataStage home directory is
/InformationServer/Server/DSEngine. If you are not sure what the home directory is, the
`cat /.dshome` command will return the DataStage home directory.
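A minimal sketch of the stop sequence, assuming the default installation path; the two checks at the end correspond to the memory segment and port checks listed on the slide:

# Move to the DataStage engine directory and set the environment
cd /opt/IBM/InformationServer/Server/DSEngine
. ./dsenv

# Stop the DataStage Engine instance
./bin/uv -admin -stop

# Verify the shutdown: no shared memory segments tagged ade, no dsrpc port activity
ipcs -m | grep ade
netstat -a | grep dsrpc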


Example: Stopping the Engine

Set the environment

Stop Engine

Check for memory segments
Check dsrpcd port activity

Copyright IBM Corporation 2007, 2012

Figure 4-6. Example: Stopping the Engine KM5021.0

Notes:
In this example, we first change to the DataStage home directory and source the dsenv
file. Then we execute the uv -admin -stop command. The command output
indicates that the DataStage job monitor service, the resource tracking service, and the
Engine are all shut down.
Afterwards, you can run the ipcs and netstat commands shown to check whether there
are any remaining memory segments or dsrpcd port activity.


Stopping Information Server (3)


Stop ASB agent
Establishes communication between the Engine and the Services layer
Change to ASBNode bin directory
cd /opt/IBM/InformationServer/ASBNode/bin
Stop the ASB agent
./NodeAgents.sh stop
Type 'yes' if you receive a message asking about deleting the Agent.pid file
Check that the ASB agent has stopped
ps -ef | grep agent

Stop Agent
Check for Agent processes

Copyright IBM Corporation 2007, 2012

Figure 4-7. Stopping Information Server (3) KM5021.0

Notes:
The ASB agent establishes communication between the Engine and the Services layers,
which is necessary when the layers are installed on different computer systems. To stop
the ASB agent, run the NodeAgents.sh stop script, which is in the
/InformationServer/ASBNode/bin directory.
In the graphic, we first change to the /InformationServer/ASBNode/bin directory. Then
we run the NodeAgents.sh stop command. Afterwards, we check whether any ASB agent processes are still
running.
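A minimal sketch of the same steps, assuming the default installation path:

# Move to the ASBNode bin directory and stop the ASB agent
cd /opt/IBM/InformationServer/ASBNode/bin
./NodeAgents.sh stop      # answer 'yes' if prompted about deleting the Agent.pid file

# Confirm that no agent processes remain
ps -ef | grep agent | grep -v grep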


Stopping Information Server (4)


Stop the Metadata Server (server1)
cd /opt/IBM/InformationServer/ASBServer/bin
Issue the stop command
./MetadataServer.sh stop
Check that there are no Metadata Server processes running
ps -ef | grep server1
Stop server1
Check for server1 processes

Copyright IBM Corporation 2007, 2012

Figure 4-8. Stopping Information Server (4) KM5021.0

Notes:
You can use the MetadataServer.sh stop script to stop the Metadata Server services
layer. The MetadataServer.sh script runs the WAS stopServer.sh server1 script.
In this example, we first change to the /InformationServer/ASBServer/bin directory. Then
we issue the MetadataServer.sh stop script. When you run this command, make a note of
the directory containing the log files. You may want to consult the log files in that directory
to verify that no errors occurred.
Afterwards, we check whether any Metadata Server (server1) processes are still running using the ps -ef command.
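A minimal sketch, again assuming the default installation path:

# Move to the ASBServer bin directory and stop the Metadata Server (server1)
cd /opt/IBM/InformationServer/ASBServer/bin
./MetadataServer.sh stop

# Confirm that no server1 processes remain
ps -ef | grep server1 | grep -v grep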


Starting Information Server


Confirm that the database servers for XMETA and IADB are running
Start the Metadata Server (server1)
cd /opt/IBM/InformationServer/ASBServer/bin
Issue the start command: ./MetadataServer.sh start
Runs the WAS startServer.sh server1 command
Check the WAS startServer log file to verify that server1
was started
Log files located in
/WebSphere/AppServer/profiles/InfoSphere/logs/server1

Copyright IBM Corporation 2007, 2012

Figure 4-9. Starting Information Server KM5021.0

Notes:
Starting Information Server involves starting the components in the opposite order you use
when stopping them. Before attempting to start Information Server, verify that the database
servers for XMETA and IADB are running. Then execute the MetadataServer.sh start
script. Next, start the ASB agent and the DataStage Engine.
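A minimal start sketch, assuming the default installation path and the default InfoSphere WAS profile location; the startServer.log file name is the usual WAS convention and should be confirmed on your system:

# With the XMETA and IADB database servers already running, start the Metadata Server
cd /opt/IBM/InformationServer/ASBServer/bin
./MetadataServer.sh start

# Review the WAS start log to confirm that server1 came up cleanly
tail /opt/IBM/WebSphere/AppServer/profiles/InfoSphere/logs/server1/startServer.log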


Starting the ASB agent


Start ASB agent
Establishes communication between the Engine and the Services layer
Change to ASBNode bin directory
cd /opt/IBM/InformationServer/ASBNode/bin
Start the ASB agent
./NodeAgents.sh start

Copyright IBM Corporation 2007, 2012

Figure 4-10. Starting the ASB agent KM5021.0

Notes:
To start the ASB agent, first change to the /InformationServer/ASBNode/bin directory.
Then run the NodeAgents.sh start command.
This agent must be running if DataStage and WAS are installed on separate systems. The
ASB agent establishes communication between these two Information Server layers.
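A minimal sketch, assuming the default installation path:

# Start the ASB agent so that the Engine and Services tiers can communicate
cd /opt/IBM/InformationServer/ASBNode/bin
./NodeAgents.sh start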


Starting the DataStage engine


Change to DataStage home directory
cd /opt/IBM/InformationServer/Server/DSEngine
Run the DataStage dsenv file
. dsenv
Sets the environment
Execute the ./bin/uv -admin -start command to start
the DataStage Engine instance

Copyright IBM Corporation 2007, 2012

Figure 4-11. Starting the DataStage engine KM5021.0

Notes:
After you start the ASB agent, start the DataStage Engine: change to the
/InformationServer/Server/DSEngine directory, run dsenv to initialize the DataStage
environment, and then run the uv -admin -start command.
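A minimal sketch of the start sequence, assuming the default installation path:

# Set the DataStage environment, then start the engine instance
cd /opt/IBM/InformationServer/Server/DSEngine
. ./dsenv
./bin/uv -admin -start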


Checking the Engine status


Commands to check the Status of the Engine:
cd /opt/IBM/InformationServer/Server/DSEngine
. dsenv
./bin/uv -admin -info

Copyright IBM Corporation 2007, 2012

Figure 4-12. Checking the Engine status KM5021.0

Notes:
The uv -admin -info command can be used to check the status of the Engine. As with any
of the uv commands, first run dsenv to initialize the DataStage environment.
In this example, we run the command. Its output indicates that the engine is running and
that NLS is active.
Notice the reference to the DataStage startup script. This script can be modified, in order to
start additional engine services when the DataStage engine is started. As you will see later,
the Operations Console, which monitors DataStage running jobs, uses additional services.
The command that runs these services can be added to the ds.rc script to start these
services automatically.


Other checks on the engine


Check that the dsrpc daemon is listening: netstat -a | grep dsrpc
Check that the dsrpcd process is running: ps -ef | grep dsrpcd
Check the status of the ASBNode agent: netstat -a | grep 31531
31531 is the default port for the ASBNode agent
dsrpc is
listening

ASBNode agent is
listening

Figure 4-13. Other checks on the engine KM5021.0

Notes:
In the graphic, several commands are executed to verify that the engine services are
running. The netstat command is used to check whether the DataStage dsrpc service is
running. The ps -ef command is used to check whether the DataStage dsrpcd service is
running. Finally, the netstat command is used to check whether the ASBNode agent is
running.
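A sketch of the same checks run in sequence, assuming the ASBNode agent is on its default port of 31531:

# Confirm that the dsrpc daemon is listening and that the dsrpcd process exists
netstat -a | grep dsrpc
ps -ef | grep dsrpcd | grep -v grep

# Confirm that the ASBNode agent is listening on its default port
netstat -a | grep 31531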


Checkpoint
1. Stopping IS involves stopping what?
2. What command would you use to start the DataStage
engine?
3. How do you set the DataStage environment for running this
command?

Copyright IBM Corporation 2007, 2012

Figure 4-14. Checkpoint KM5021.0

Notes:
Write your answers here:


Exercises Unit 04
In this lab exercise, you will:
Check for running engine processes
Stop engine services
Stop the ASB agent
Stop the Metadata Server (server1)
Start the IS Metadata Server
Start the ASB agent and DataStage
engine
Check DataStage status

Copyright IBM Corporation 2007, 2012

Figure 4-15. Exercises Unit 04 KM5021.0

Notes:


Unit summary
Having completed this unit, you should be able to:
Stop Information Server
Start Information Server
Check for running Information Server processes

Copyright IBM Corporation 2007, 2012

Figure 4-16. Unit summary KM5021.0

Notes:


Unit 5. Session Management

What this unit is about


This unit describes how to manage sessions, configure and manage
logging, configure reporting, and release locks.

What you should be able to do


After completing this unit, you should be able to:
Configure and manage sessions
Configure and manage logging
Create, run, and manage reports
Describe Information Server locking

How you will check your progress


Lab exercises and checkpoint questions


Unit objectives
After completing this unit, you should be able to:
Configure and manage sessions
Configure and manage logging
Create, run, and manage reports
Describe Information Server locking

Copyright IBM Corporation 2007, 2012

Figure 5-1. Unit objectives KM5021.0

Notes:


Client session management


Each user connection to Information Server results in the creation
of a client session
Two connections with the same user ID result in the creation of two
sessions
A session has a timeout and expires if not touched
While a client is active, it touches the services tier on a regular
basis to avoid expiration
If a client crashes, the session will expire
A session can be disconnected by an Information Server
administrator
From the Information Server Web Console
No warning sent to the client
Repository services are listening to the session and are notified
when a session disappears
The services then can remove cached objects, locks, and so on

Copyright IBM Corporation 2007, 2012

Figure 5-2. Client session management KM5021.0

Notes:
Each user connection using an Information Server client results in the creation of a session. A user can log into multiple clients at the same time. Each established connection creates another session.
A session will time out and expire if nothing happens in it for an extended period of time. Alternatively, a session will cease if the user closes the client or if an Information Server administrator stops it. The latter can be done in the Information Server Web Console.


Viewing active client sessions


Log into Web Console using a Suite administrator ID (isadmin)
On the Administration tab, click Session Management>Active Sessions
The active client sessions are listed
The address or hostname of the client is provided
Select a client session and then click Open to get more details about the session
User information: user attributes, user security roles
Session duration
Click Global Session Properties to specify general session properties

Screen callouts: Global session properties; Client address; List of active sessions; Client type
Copyright IBM Corporation 2007, 2012

Figure 5-3. Viewing active client sessions KM5021.0

Notes:
User sessions can be managed by an Information Server administrator in the Information
Server Web Console. On the Administration tab, click Session Management>Active
Sessions. The current active sessions are listed.
In this example, there are three active sessions. The Type column identifies the type of
session. The first session was established when the administrator isadmin logged into the
Web Console. The second session was established when a user logged into DataStage
Designer. The third session was established when a user logged into a thick client, such as
FastTrack or Information Analyzer.
The Address column identifies the computer name or IP address of the client system.
To open or disconnect a specific session, select the session and then click the appropriate
link in the right panel.
Click Global Session Properties to specify general session attributes.


Global session properties

Session
properties

Copyright IBM Corporation 2007, 2012

Figure 5-4. Global session properties KM5021.0

Notes:
This graphic shows the Global Session Properties window.
Each session consumes WAS and engine resources. At some point, as more and more sessions are established, performance will begin to deteriorate. You can limit this deterioration by reducing the maximum number of sessions.
The maximum number of sessions determines how many users can log into Information Server applications at one time. Once the maximum has been reached, a user (other than an Information Server administrator logging into the Web Console) will be unable to log into an Information Server client, and will receive a message that the maximum has been reached.
If too many users are bumping into the maximum, you can try reducing the inactive session timeout period. This will free additional sessions.


Session details

Session
properties

User
attributes
Copyright IBM Corporation 2007, 2012

Figure 5-5. Session details KM5021.0

Notes:
Select a session and then click Open to view details about it and the user logged into the session. In this example, a user named dsadm is logged into the session. Information about that user, including the authorization roles the user possesses, is displayed.
Some information about the session is also displayed, including its duration and the number of cached objects, which indicates how many resources the session is consuming.


Disconnecting sessions
To disconnect specific sessions:
From the Active Sessions tab, select the connections you want to
disconnect
Click Disconnect
To disconnect all sessions (including your own session)
Select Disconnect All
Disconnect
all users

Disconnect
selected users

Copyright IBM Corporation 2007, 2012

Figure 5-6. Disconnecting sessions KM5021.0

Notes:
You can disconnect active sessions by selecting the sessions and then clicking
Disconnect. You can also disconnect all sessions by clicking Disconnect All. Note that
this will also disconnect your session in the Web Console as well as all others.


Log Management

Copyright IBM Corporation 2007, 2012

Figure 5-7. Log Management KM5021.0

Notes:


Log management
Logged events are accessed through views
A view filters events based on specified criteria
You can create as many views as you want
Log events are stored in the Repository
The Web Console provides a central place to view logs across all
Information Server components
Click Administration>Log Management
Logging components
Represent Suite components that use the logging service
For example, the DataStage logging component represents DataStage
Logging configurations
Determine which logging messages get saved into the Repository
Each Suite component can have multiple configurations
But only one can be active at a time

Copyright IBM Corporation 2007, 2012

Figure 5-8. Log management KM5021.0

Notes:
Information Server is capable of logging many different types of events, concerning many
different Information Server products and components. An Information Server administrator
can specify the types of events that are to be logged. Logged events are stored into the
Information Server Repository.
Logged events can be accessed through views. These views select a set of the logged
events in the Repository.
There are, then, two main tasks related to logging: Specifying which events are logged, and
creating views to access the stored events.
A logging component represents an Information Server component, such as DataStage, for which events are logged. Logging configurations can be created for each logging component. A logging configuration specifies which logging events are stored for that logging component. There can be multiple configurations, but only one can be active at a time.


Managing configurations

DataStage
logging
component

Open
DataStage
component
configurations

Copyright IBM Corporation 2007, 2012

Figure 5-9. Managing configurations KM5021.0

Notes:
Click Log Management>Logging Components to view the logging components that
exist. Select the component whose configurations you want to manage, for example,
DataStage. Then click Manage Configurations to open the configurations that are related
to DataStage.
Each logging component has a default configuration that is specified when Information Server is installed. Alternative configurations can be created and made active.


DataStage component configurations


Only one configuration can be active at a time
Click New Logging Configuration to create additional
configurations
In this example, DataStage.JOB.RUN is a new configuration that was
created to capture a subset of all the DataStage events, namely those
having to do with running jobs

Default
configuration

New
configuration
Copyright IBM Corporation 2007, 2012

Figure 5-10. DataStage component configurations KM5021.0

Notes:
You can create a new configuration from scratch by clicking New Logging Configuration. Alternatively, you can make a copy of an existing configuration and then modify it.
In this example, the DataStage.ALL configuration was copied and then modified. The modification consisted of reducing the types of logging events that are saved to those having to do with running DataStage jobs.


DataStage.ALL configuration
The configuration lists categories of events
For each category of logging messages, the configuration specifies the severity level
of the messages to retain
Threshold refers to the event warning level floor
For example, Warn includes all events at the warning level and higher: Warn, Error, Fatal

Severity
level for
individual
events

Threshold
severity
level for all
events

Copyright IBM Corporation 2007, 2012

Figure 5-11. DataStage.ALL configuration KM5021.0

Notes:
A configuration lists categories of events whose messages are to be stored. For each category, a threshold severity level for the messages is specified. A threshold indicates a floor. Any messages at the selected level or at a more severe level will be stored. For example, if Warn is selected, then all messages at that level or higher will be stored, namely, warning messages, error messages, and fatal error messages.


Log views
Select messages based on specified criteria
Filters out a select set of the events that are captured into the Repository, based on the active
configurations
Click Administration>Log Management>Log Views
List of existing views
Click View Log to view the messages of a selected View
Click Open to display and edit the log view criteria
Click New Log View to create a new log view
Access can be shared with everyone or remain private to the view creator

Existing log
views
Copyright IBM Corporation 2007, 2012

Figure 5-12. Log views KM5021.0

Notes:
Logging views are created to select a set of messages, based on specified criteria, from those that are stored in the repository.
The Log Views tab lists existing log views. To view the messages of a view, select the log view and then click View Log in the right panel. You can also create new log views.


Log view messages

Start
DataStage
job named
relMultInput

Environment
variable
settings the
job ran under

Copyright IBM Corporation 2007, 2012

Figure 5-13. Log view messages KM5021.0

Notes:
This graphic shows the messages that were selected by an example log view. One message informs us that a DataStage job has been started. Another lists the environment variable settings in effect at the time the job was started. To view the messages, select the log view and then click View Log in the right panel.
The number of messages selected by a log view can be large. You can filter for the messages you are interested in at the top of the window. Expand the Additional Filter Criteria folder to reveal the full set of filtering conditions. A selected subset of messages can then be viewed in a separate window.


Creating a new log view


Name
Severity levels to include
Configuration categories to include
Specify context items
Criteria relevant to a specific category
Specify property and value
For example: DSJob = relMultInput
Specify table columns
Columns of information to include in the message
For example: Message, Timestamp, Severity level

Copyright IBM Corporation 2007, 2012

Figure 5-14. Creating a new log view KM5021.0

Notes:
When you create a log view, you give it a name. Then you specify the criteria for selecting the messages to include. These criteria include the configuration categories of messages to include, the severity levels, and additional context information relevant to a specific category of messages. For example, you could specify that you only want information related to a job named relMultInput.
In addition to specifying the criteria for the information to include, you also need to specify the columns of information to include in the message. A given message contains several columns of information. You choose which columns of information you are interested in.


Example log view

Shared with
all users

Retrieve
messages with
all severity levels

Categories of
messages to
view

Message info
to display

Copyright IBM Corporation 2007, 2012

Figure 5-15. Example log view KM5021.0

Notes:
This graphic shows an example of a newly created log view. It shows where you specify the
criteria and the information to display, as discussed on the previous pages.
At the top is the name given to the log view. In the Access box, Shared has been selected. This means that the user who is creating this log view is willing to share it with all other users. That is, other users can view the log using this log view.
In the Severity Levels panel, you filter the messages to view by severity level. In this
example, all severity levels are selected.
In the Categories panel, you add the categories of log messages to view. Click Browse to
add additional categories. To delete a category, select it and then click Remove.
In the Table Columns panel, you select from the log messages the columns of information
you want to view. In this example, several columns of information, including DSJob (the job
the message applies to) are selected.


Reporting Administration

Copyright IBM Corporation 2007, 2012

Figure 5-16. Reporting Administration KM5021.0

Notes:


Reporting administration
Managed on the Information Server Web Console Reporting
tab
Reports can be created about Suite component activities and
administrative functions
Report formats include: HTML, PDF, RTF, TXT, XML
Access to reports, report templates, and report results can be
restricted
Reports are organized into folders
Folders can only be created by Information Server administrators

Copyright IBM Corporation 2007, 2012

Figure 5-17. Reporting administration KM5021.0

Notes:
Information Server reporting is managed through the Information Server Web Console Reporting tab. The Reporting tab contains a folder of templates to build your reports and a set of folders you can use to store your reports. Access to reports, report templates, and report results can be restricted.
Reports are stored and organized in folders. Folders can only be created by Information
Server administrators.


Creating a report
Select a report template
Report templates are organized by Suite product or component
Example for Administration: List of users
Click New Report
Browse for report folder
Report settings
Name
Parameters
Vary depending on report type
Example: DataStage project, job name
Format: HTML, PDF
Settings include: Expiration, History policy

Copyright IBM Corporation 2007, 2012

Figure 5-18. Creating a report KM5021.0

Notes:
There are a number of pre-built reports that can be run from within Information Server products.
New reports can also be created on the Reporting tab. You begin by selecting a report template. Information Server administrators have access to all of the report templates, but not all templates are available to all users. Then you specify the report settings in the new report.
When you create a report, you specify the folder to store the report in. The folder must already exist at the time you create the report.
Several output formats are supported, including HTML and PDF.


Selecting the report template

New report

Selected template

Copyright IBM Corporation 2007, 2012

Figure 5-19. Selecting the report template KM5021.0

Notes:
In this example, the selected report template is List of users from the Administration>Security folder of templates. After you select the template, click New Report.
Notice that there are administration report templates as well as report templates for specific
Information Server products.


Editing the report

Name

Report folder

Report parameters

Report format
(html) and settings
(hidden)
Copyright IBM Corporation 2007, 2012

Figure 5-20. Editing the report KM5021.0

Notes:
In this List of users example, the Reports folder has been selected for its storage. This is
the root report folder.
Report settings are specific to the type of report being created. In this example, users with
product roles are being selected. The specific product is DataStage.
The output report format is a mandatory parameter. This parameter is not visible in the
graphic, but has been set as HTML.


Running a report
Run reports
Can schedule to run
Access control
View report results
Specify access View results

Selected report
Run report

Copyright IBM Corporation 2007, 2012

Figure 5-21. Running a report KM5021.0

Notes:
After a report is created it can be run or scheduled to run. The Reports>My Reporting
folder lists reports that have recently been created.
The report creator can specify who can run the report and view its results. Click Open
Access Control to specify who can view the report.
Click Run selected reports to run the reports selected in the list. Afterwards, click View
Report Result to view the report information.


Sample report

Figure 5-22. Sample report KM5021.0

Notes:
The graphic shows an example of a List of Users report. In this example, the users and their user attributes are listed. The criteria by which this list of users was chosen are described in the bottom half of the upper panel. In this case, the report selects users who have one or more DataStage product component roles.


Report access control


Browse for users to give access to the report
Specify users' access permissions: Update, Delete, Run,
Administration
Users with Information Server administration credentials can specify
access rights for other users

User
permissions Browse for
user to add
Copyright IBM Corporation 2007, 2012

Figure 5-23. Report access control KM5021.0

Notes:
This window is displayed if you click Open Access Control on the Reports panel. In this
example, isadmin (the user who created the report) and other Suite administrators have
access to the report. There are several layers of access that can be allowed or restricted,
including the ability to read, update, delete, run, and administer the report. In this example,
only isadmin can administer the report, that is, specify access control. Other Suite
administrators can view, delete, and run it, but not administer it.
You can browse for users, groups, and roles to add to the access control list. Then you can
specify what authorizations they have. For example, you can add the DataStage and
QualityStage Developer role to the access list. They will then be able to view report
results. You can give them further authorizations as well, for example, to run reports.


Information Server Locking

Copyright IBM Corporation 2007, 2012

Figure 5-24. Information Server Locking KM5021.0

Notes:


Locking overview
Locking occurs in two places within Information Server:
The Metadata Repository tier for design elements like job objects,
table definitions, mapping specifications, and so on
The Engine tier for files that will be used at job run-time
Exceptions (such as failed network connections or a user
forcefully killing a client application) can result in abandoned
locks
In most cases, if a user experiences a locking error, they
should retry their operation
It can take some time for a lock to be released
In instances where a lock is not cleared immediately,
Information Server provides mechanisms for both the
automatic and manual clearing of these locks

Copyright IBM Corporation 2007, 2012

Figure 5-25. Locking overview KM5021.0

Notes:
Locking occurs in two tiers within Information Server. When design elements are opened in a product, such as a DataStage job open in DataStage Designer, a lock is placed on that object. Locks are also taken at the Engine tier by running DataStage jobs, on objects such as files that they are using.
When the design object is closed or the DataStage job finishes, the locks are released. However, sometimes the locks fail to get released. For example, when DataStage jobs abort, some of their locks may fail to get released.
Information Server has mechanisms for automatically and manually clearing locks. Some of these mechanisms are discussed in the following pages.


Clearing Repository locks


For locks that are tied to existing user sessions, one of these
procedures will clear the lock:
Stop the session
The user can log out, or
The IS Administrator can stop the session in the IS Web Console, or
When a session is inactive for the specified period, Information Server will end
the session and clear associated locks
If Information Server is restarted, all sessions are forced to close which will
also clear all related locks
Sometimes there are dangling locks
These are locks not tied to any existing session and need to be
cleared manually

Copyright IBM Corporation 2007, 2012

Figure 5-26. Clearing Repository locks KM5021.0

Notes:
Most of the time, locks can be cleared by stopping and restarting the session or, more
drastically, by restarting Information Server. Some locks, however, are not tied to any
existing session and need to be cleared manually.


Manually clearing locks


If there is a lock tied to an existing session that needs to be
cleared, and you cannot wait until the session expires from
inactivity:
Log into Information Server Web Console and manually stop the
session

For dangling locks (locks not tied to a session):


For DataStage related locks, you can use the Job / Cleanup
Resources option in DataStage Director
Run the cleanup_abandoned_locks.sh script
Located in /InformationServer/ASBServer/bin
Delete the lock record in the XMETALOCKINFO table in the
Information Server Repository database (XMETA)
Not recommended

Copyright IBM Corporation 2007, 2012

Figure 5-27. Manually clearing locks KM5021.0

Notes:
The locks are stored in the XMETA database in the table XMetaLockInfo.
When a session is left in a disconnected state, its locks can be cleared from the Information Server Web Console. This is done by disconnecting the relevant session using the Administration>Session Management>Active Sessions>Disconnect option.
Alternatively, there is a command line tool called cleanup_abandoned_locks in the /IBM/InformationServer/ASBServer/bin directory that can be used to clean up any abandoned locks.
Restarting Information Server will also clear all locks.
There is also a session inactivity timeout specified in the Web Console. When a session exceeds the timeout, it is ended and its locks are released.
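As an illustration only (the installation path below is an assumption; substitute the path used by your environment, and note that options and prompts vary by version), the abandoned-locks script might be run as follows:
cd /opt/IBM/InformationServer/ASBServer/bin
./cleanup_abandoned_locks.sh     # clears Repository locks that are not tied to any active session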


Clearing Engine locks


Only certain users are allowed to release DataStage locks
DataStage administrator user
Owner of the lock
Ownership of locks is based on the Engine process user ID,
not on the operating system user ID
The DataStage user ID is based on the PID of the client process
(dsapi_slave).
When a client connects to the Engine tier, a new dsapi_slave process
is started
Locks taken will be associated to that process
Each client connection will have a different dsapi_slave process and
therefore a different DataStage User ID

Copyright IBM Corporation 2007, 2012

Figure 5-28. Clearing Engine locks KM5021.0

Notes:
To clear a DataStage lock, you must be either a DataStage administrator or the owner of
the lock.
Ownership of locks is based on the Engine process user ID. When a client connects the
Engine tier, a new client process is started. Locks taken are associated to that process.
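To see which dsapi_slave processes currently exist on the engine (and therefore which process-based DataStage user IDs can own locks), a simple check such as the following can be used (a sketch; the exact ps options and output depend on the operating system):
ps -ef | grep dsapi_slave     # one process per connected client; its PID underlies the DataStage user ID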


Clearing locks in Director


On the Administrator client General tab, enable job
administration in Director
Log into Director as a DataStage administrator
Click Job>Cleanup Resources
In the Processes section, click Show All
Then select a process from the list
In the Locks section, select the Show by Process option, then click
Release All

Copyright IBM Corporation 2007, 2012

Figure 5-29. Clearing locks in Director KM5021.0

Notes:
Engine-held locks can be cleared in Director, if the Enable job administration in Director
option has been enabled in Administrator for the project. In Director, click Job>Cleanup
Resources. This opens the Job Resources window, which displays a list of the Engine
processes that are running and their PIDs. For a selected process, you can view the locks
taken by the process.


Clearing locks in Director

Job process

Show locks by
process

Release locks
for process

Copyright IBM Corporation 2007, 2012

Figure 5-30. Clearing locks in Director KM5021.0

Notes:
The top window displays job processes that are running. The bottom window displays locks
that have been taken by the job processes. You can select and release locks in these
windows, either directly or by logging out of the process.
To log out of a process, select the process and then click Logout. Click Release All to
release all the locks the process has taken.


Checkpoint
1. What client would you use to stop Information Server
sessions?
2. True or False? A logging view determines what logging
messages or events get saved into the Repository.
3. What procedure would you use to clear a lock tied to an
existing user session?
4. What procedure would you use to clear a "dangling" lock, not
tied to an existing user session?

Copyright IBM Corporation 2007, 2012

Figure 5-31. Checkpoint KM5021.0

Notes:
Write your answers here:


Exercises Unit 05
In this lab exercise, you will:
Manage active sessions
Manage logging configurations
Create a log view
View the log
Create an administrative report
Clear abandoned locks

Copyright IBM Corporation 2007, 2012

Figure 5-32. Exercises Unit 05 KM5021.0

Notes:


Unit summary
Having completed this unit, you should be able to:
Configure and manage sessions
Configure and manage logging
Create, run, and manage reports
Describe Information Server locking

Copyright IBM Corporation 2007, 2012

Figure 5-33. Unit summary KM5021.0

Notes:

Unit 6. Engine Tier Architecture

What this unit is about


This unit describes the Information Server Engine (DataStage)
compile and runtime architecture.

What you should be able to do


After completing this unit, you should be able to:
List all components in the Engine architecture
Describe DataStage compile and run time processes
Create and modify parallel configuration files
Use the DataStage job runtime Performance Analysis tool
Use the Resource Estimator tool
Navigate the Engine file hierarchy

How you will check your progress


Lab exercises and checkpoint questions


Unit objectives
After completing this unit, you should be able to:
Describe components in the Engine architecture
Describe DataStage job compile and run time processes
Create and modify parallel job configuration files
Use the Engine command line interface

Copyright IBM Corporation 2007, 2012

Figure 6-1. Unit objectives KM5021.0

Notes:


Traditional batch processing

Write the last record to disk and read the next record from disk before each processing
operation
Sub-optimal utilization of resources
One record is processed at a time
Processing resources sit idle during I/O
Cannot scale up to large data volumes

Copyright IBM Corporation 2007, 2012

Figure 6-2. Traditional batch processing KM5021.0

Notes:
Traditional batch processing consists of a distinct set of steps, defined by business
requirements. Between each step, intermediate results are written to disk.
This processing may exist outside of a database (using flat files for intermediate results) or
within a database (using SQL, stored procedures, and temporary tables).
There are several problems with this approach: Each step must complete and write its
entire result set before the next step can begin. Secondly, landing intermediate results
incurs a large performance penalty through increased I/O. In this example, a single source
incurs 7 times the I/O to process. Thirdly, with increased I/O requirements come increased
storage costs.


Traditional approach to parallel processing

Establish parallelism by:


Manually splitting source data (pre-partitioning) and processing each partition
separately in an independent flow with the same logic
Partitioning remains constant throughout flow
May require landing to disk to change partitioning
Supported in DataStage, but DataStage has additional flexibility

Copyright IBM Corporation 2007, 2012

Figure 6-3. Traditional approach to parallel processing KM5021.0

Notes:
The traditional approach to improve performance is by manually splitting the source data,
and running multiple copies of the same steps against each portion of the source data.
While this brute force approach can work in some instances, it generally has limited
usefulness with complex business requirements, which require related records to be
processed together. This also requires an extensive pre-processing effort to partition the
files properly.


Data flow model of application design

Allows developers to create a sequential data flow visually


No landing of data required between each step/process
Applies, regardless of execution model
Real-time
Batch

Copyright IBM Corporation 2007, 2012

Figure 6-4. Data flow model of application design KM5021.0

Notes:
When developers design their jobs by dragging stages (functional components) onto the
DataStage Designer canvas, they specify the data flow in sequential, non-parallel terms.
The parallelism that DataStage implements is not explicitly specified by the developer, but
is implemented by DataStage during the compile and runtime process.
DataStage employs a data flow model for application design, where data flows in memory between sources, intermediate transformations, and targets without landing to disk. Between operators, special in-memory structures called data sets are used to pass the data. These are similar in structure to the physical data sets that can be created and accessed using the Data Set stage.
This model works in both batch and real-time, service-oriented implementations.


Data pipelining

Think of a conveyor belt moving the records from operator (stage) to operator (stage)
Run each operator in parallel, passing data records from one operator to the next
Transform, Enrich, and Load operators run simultaneously
Eliminates intermediate staging to disk
Keeps all available processors busy
But pipelining alone still limits overall scalability
Copyright IBM Corporation 2007, 2012

Figure 6-5. Data pipelining KM5021.0

Notes:
Data pipelining is the first step toward efficient parallel processing. Instead of waiting for all
rows to be processed by the previous step, records pass from step-to-step in memory just
like a conveyor belt in a factory assembly line moves physical products being built.
All parallel jobs developed with DataStage use data pipelining. It is a core feature of the
parallel framework and is always enabled.
Pipeline parallelism alone is not enough. There is a limit to the number of rows that can be in the pipeline being processed at any one time, no matter how many resources (CPU processors, memory) are available.


Partition parallelism
Divide the incoming stream of
data into subsets to be separately
processed
Subsets are called partitions
Each partition of data is
processed by the same operation
If operation is Transform, each
partition will be transformed in
exactly the same way
Facilitates near-linear scalability
8 times faster on 8 processors
24 times faster on 24 processors
This assumes the data is evenly
distributed

Copyright IBM Corporation 2007, 2012

Figure 6-6. Partition parallelism KM5021.0

Notes:
Partition parallelism, unlike pipeline parallelism, can scale up to take advantage of all
available resources (CPU processors, memory). And it facilitates near-linear scalability. If 8
processors are available, the job can run approximately 8 times faster than with 1
processor.
Partitioning breaks a data set into smaller sets that are each processed separately, in
parallel. This is a key to scalability. However, the data needs to be evenly distributed across
the partitions; otherwise, the benefits of partitioning are reduced.
It is important to note that what is done to each partition of data is the same. How the data
is processed or transformed is the same.


Parallel engine combines partition and pipeline parallelism

Repartitioning occurs automatically


Partitioning occurs on a stage-by-stage basis
No need to repartition data when:
Processors are added
Hardware architecture changes
Broad range of partitioning methods are available

Copyright IBM Corporation 2007, 2012

Figure 6-7. Parallel engine combines partition and pipeline parallelism KM5021.0

Notes:
DataStage combines data pipelining and partition parallelism to scale across all available
resources without landing intermediate results to disk. Within the parallel framework,
pipelining and partitioning are always on.
Data can also be re-partitioned from stage-to-stage, distributing data as required by the
business requirements, without landing to disk. This would be impossible in traditional
hand-coded approaches to parallel processing.


Partitioning and collecting


Partitioners distribute rows of a single link into smaller segments that can be processed independently in parallel
Collectors combine parallel partitions of a single link for sequential processing
ONLY before sequential stages
(Diagram: a stage running in parallel, a partitioner, a stage running in parallel, a collector, and a stage running sequentially)

Copyright IBM Corporation 2007, 2012

Figure 6-8. Partitioning and collecting KM5021.0

Notes:
Within a parallel job, one of two operations is performed before each stage/operator:
Partitioning, or collecting. Partitioners divide data into subsets which are processed
separately, in parallel; collectors merge parallel data streams back into a single stream.
This might be required, for example, when landing data to disk in a single file or when
performing operations that must be performed sequentially, for example a global count of
all the data.
The left graphic shows how partitioning works. A single stream is distributed into multiple
streams. Different algorithms can be used to perform the distribution. The right graphic
shows how collecting works. Multiple streams are collected into a single stream. Different
algorithms can be used to perform the collection.


Partitioners
Partitioners distribute rows of a single link (data set) into smaller segments that can be processed independently in parallel
Partitioners exist before ANY parallel stage. The previous stage may be running:
Sequentially - results in a fan-out operation (and link icon)
In parallel - if the partitioning method changes, data is repartitioned
(Diagram: a stage running sequentially feeding a partitioner into a stage running in parallel; and a stage running in parallel feeding a partitioner into another stage running in parallel)

Copyright IBM Corporation 2007, 2012

Figure 6-9. Partitioners KM5021.0

Notes:
Technically, the parallel framework does not require explicit partitioners before each
parallel stage. Because the Designer GUI makes no such distinction, it is easier to think of
all stages as having partitioners, where AUTO is a type of partitioner (that may or may not
generate a partition operator at runtime).
There are two types of partitioners. For keyless partitioning algorithms, rows are distributed
independently of data values. For keyed partitioning algorithms, rows are distributed based
on values in specified columns.
Icons on the DataStage Designer canvas indicate when partitioning and collecting is
occurring. The fan-out icon indicates that data in a single stream is being distributed into
multiple streams. The lower, butterfly icon indicates that data in multiple streams is being
redistributed across multiple partitions.


Collectors

Collector method is defined before any stage running sequentially when the previous stage is running in parallel
Collectors combine partitions of a data set into a single input stream, then into a sequential mode of execution
(Diagram: a stage running in parallel feeding a collector into a stage running sequentially)

Copyright IBM Corporation 2007, 2012

Figure 6-10. Collectors KM5021.0

Notes:
There are several collector algorithms. Auto eagerly reads any row from any input partition.
The output row order is undefined (non-deterministic). This is the default collector method.
Round Robin picks rows from input partitions in round robin order. This is slower than auto
and rarely used.
Ordered reads all rows from first partition, then the second, and so on. It preserves the
order that exists within partitions.
Sort Merge produces a single (sequential) stream of rows, sorted on specified key
columns, from input sorted on those keys. It does not sort. Row order is not preserved for
non-key columns.


Parallel sorting
Many operations (joining, aggregating, removing duplicates) either
require sorting or perform optimally with sorting
In most cases, there is no need to globally sort data to produce a single
sorted sequence of rows
Instead, sorting is most often used to establish order within individual
partitions of data
Sorting for joining, aggregating, removing duplicates, and so on, can be done in parallel, for
high performance gains!
Global sorts, if desired, can be accomplished after parallel sorting, by collecting
the data into a single partition using the Sort-Merge collector

Copyright IBM Corporation 2007, 2012

Figure 6-11. Parallel sorting KM5021.0

Notes:
It is sometimes thought that parallel sorting, though faster, is not very useful, because each
partition is separately sorting the data within that partition, and not sorting all the data. In
most cases, however, global sorts across all partitions are not needed. And global sorts, if
desired, can be accomplished after parallel sorting by collecting the data using the Sort
Merge collector algorithm.


Parallel Job Compilation

Copyright IBM Corporation 2007, 2012

Figure 6-12. Parallel Job Compilation KM5021.0

Notes:


Parallel job compilation


DataStage Designer generates all code:
Validates link requirements, mandatory stage options, transformer logic, and so on
Generates OSH-script representation of data flow and GUI stages
OSH is a scripting language composed of C++ operators and input/output specifiers between them
GUI stages in the job design are compiled into OSH operators
GUI Transformer stages are compiled into C++ source code, which is then compiled into custom OSH operators
This is why DataStage requires a C++ compiler
DataStage also supports custom C++ stages, called BuildOp stages, that are compiled manually within the GUI, and then compiled into custom OSH operators
(Diagram: the Designer client compiles the job; on the DataStage server the executable job consists of generated OSH plus Transformer components, with C++ compiled for each Transformer)

Copyright IBM Corporation 2007, 2012

Figure 6-13. Parallel job compilation KM5021.0

Notes:
What happens when a DataStage job is compiled? From the GUI design on the Designer
canvas, DataStage generates what is called OSH. OSH is a scripting language
composed of C++ operators and input/output specifiers.
Some stages, like the Transformer stage and Custom Build-Op stages, generate C++
source code that is then compiled into OSH operators. This is why DataStage requires a
C++ compiler on the Engine system.
The OSH code that is generated still represents the data flow as a sequential process. At
runtime, along with the configuration file (discussed later), the OSH is parsed into code that
implements the partition parallelism.


Generated OSH
Enable viewing of generated OSH in Administrator
Screen callouts: Comments; Operator; Schema (schemas describe the format of the input and output data to the OSH operators); Operator properties

Copyright IBM Corporation 2007, 2012

Figure 6-14. Generated OSH KM5021.0

Notes:
You can view generated OSH through DataStage Designer. This provides an overview of the OSH that will be executed. It is important to note, however, that this OSH will go through some additional changes for optimization and execution.
In the top graphic, the Parallel tab in DataStage Administrator is displayed. Developers can only view the OSH if the Generated OSH visible for Parallel Jobs box is checked.
There are several places where the OSH can be viewed. In the lower graphic, the OSH is being viewed on the Generated OSH tab of the Job Properties window in DataStage Designer.
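As a rough, hand-written sketch only (not actual generated code; the layout, identifiers, and virtual data set names that DataStage generates will differ), the OSH for a simple Row Generator to Peek flow has this general shape:
#### STAGE: Row_Generator_0
## Operator
generator
## Operator options
-schema record ( a:int32; )
## Outputs
0> [] 'Row_Generator_0:out.v'
;
#### STAGE: Peek_1
## Operator
peek
## Inputs
0< [] 'Row_Generator_0:out.v'
;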


Parallel Engine Runtime Architecture

Copyright IBM Corporation 2007, 2012

Figure 6-15. Parallel Engine Runtime Architecture KM5021.0

Notes:


Parallel engine runtime


Parallel jobs are independent of the actual hardware and degree of
parallelism used to run the job
The parallel configuration file provides a mapping at runtime between the
compiled job and the actual runtime infrastructure and resources
Processing nodes in the configuration file determine the degree of parallelism
At runtime, the parallel Engine uses the given job design and the
configuration file to compose the job Score
The Score maps operator processes to processing nodes

Copyright IBM Corporation 2007, 2012

Figure 6-16. Parallel engine runtime KM5021.0

Notes:
At compile time the OSH is generated. It is not until runtime that the partition parallelism is
implemented. This is done by a series of start-up processes that occur whenever a parallel
job is run.
Since the parallelism is not implemented until runtime, the same compiled job can be run
with different degrees of parallelism, on different occasions. This is a major benefit of the
way DataStage implements partition parallelism.
The configuration file used to run the job determines the degree of parallelism, and the
resources (processors, disk, memory) used to run it. From this, and the OSH generated at
compile time, the Engine startup processes produce the Score, which specifies which
operators run on which processor nodes, and what resources they use when they do.
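For illustration, a minimal two-node configuration file might look like the following (the host name and directory paths are placeholders; use values appropriate to your installation):
{
  node "node1" {
    fastname "enginehost"
    pools ""
    resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
    resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
  }
  node "node2" {
    fastname "enginehost"
    pools ""
    resource disk "/opt/IBM/InformationServer/Server/Datasets" {pools ""}
    resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" {pools ""}
  }
}
Two node entries on the same physical host give two-way partition parallelism; adding node entries increases the degree of parallelism without recompiling the job.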


Parallel engine runtime


It is only after the job Score and processes are created that processing
begins
Startup overhead of a parallel job
Job processing ends when either:
Last row of data is processed by the final operator
A fatal error is encountered by any operator
The job is manually stopped by an operator

Copyright IBM Corporation 2007, 2012

Figure 6-17. Parallel engine runtime KM5021.0

Notes:
After the Score is produced, data processing begins. There is some overhead as operators are distributed to the various nodes. Processing ends when the last row of data is processed by the job, unless the job aborts or is stopped.
As a job runs, messages are written to the job log. The lower graphic shows the last few messages of a job that ran to completion without errors.


Job execution: the process "orchestra"


Conductor - initial process:
Composes the Score
Creates section leader processes (one per node)
Consolidates messages to the DataStage log
Manages orderly shutdown
Section leader process (one per node):
Forks player processes (one per operator)
Manages up and down communication
Players - the actual processes associated with operators (stages):
Send stderr, stdout to section leaders
Establish connections to other players for data flow
Clean up upon completion
Default communication:
SMP: Shared Memory
Cluster/GRID: Shared Memory (within hardware node) and TCP (across hardware nodes)
(Diagram: a conductor node (C) and processing nodes, each running a section leader (SL) and player processes (P))
Copyright IBM Corporation 2007, 2012

Figure 6-18. Job execution: the process orchestra KM5021.0

Notes:
The graphic displayed summarizes the start-up process that occurs in generating and implementing the Score. One processing node is designated the conductor. This is a node on the computer system where DataStage is installed. The conductor process composes (generates) the Score based on the OSH and the configuration file. It then forks off section leader processes on each processing node specified in the configuration file.
Each section leader process then generates the OSH operator player processes that will run on that node, and sets up the communication between those processes. A player process is an operator (stage) running on a node.
The player processes, which run in parallel on each node, then perform the data processing the job is designed to do.
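While a job is running, these processes can be observed on the engine system; the conductor, section leaders, and players all appear as osh processes (a sketch; the command and output format depend on the operating system):
ps -ef | grep osh     # expect one conductor, one section leader per node, and one player per operator per node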


Runtime control and data networks


(Diagram: the Conductor communicates with Section Leader,0, Section Leader,1, and Section Leader,2 over a control channel (TCP), a stdout channel (pipe), and a stderr channel (pipe) via the APT_Communicator; player processes generator,0/1/2 and copy,0/1/2 run under the section leaders, with data flowing directly between players)
$ osh generator -schema record(a:int32) [par] | roundrobin | copy

Copyright IBM Corporation 2007, 2012

Figure 6-19. Runtime control and data networks KM5021.0

Notes:
Every player process has to be able to communicate with every other player that could
potentially receive some of its output data or provide some of its input data. This is because
data can potentially move from one player process on one node to another player process
on a separate node (possibly on a separate computer).
There are separate communication channels (pathways) for control, messages, errors, and
data. Note that the data channel does not go through the section leader or conductor, as
this would limit scalability. Data flows directly from upstream operators to downstream
operators.
The graphic depicts the communication process. Two player processes are shown on each
node: generator, copy. The dotted lines represent the flow of data. So, for example, data
can flow from generator,0 to copy,0 on the same node or from generator,0 to copy,2 on
another node.
Communication also occurs between the conductor and section leaders, and between section leaders and player processes. These are indicated by the solid lines.


Understanding the job Score


Identifies degree of parallelism and node assignments for each
operator
Details the mappings between functional (stage/operator) and actual
operating system processes
Includes operators automatically inserted at runtime:
Buffer operators to prevent deadlocks and optimize data flow rates between stages
Sort and Partitioner operators that have been automatically inserted to ensure correct
results
Outlines connection topology (data sets) between adjacent operators
and/or persistent data sets
Defines number of actual operating system processes

Copyright IBM Corporation 2007, 2012

Figure 6-20. Understanding the job Score KM5021.0

Notes:
The Score is an in-memory text file that can be viewed in the job log. The Score identifies the degree of parallelism for each operator and the node or nodes assigned to each operator to run on.
It is important to note that the Engine may insert additional operators into the Score (partitioners, sorts) beyond what was generated in the OSH. These include buffer operators to prevent deadlocks and sort operators that are inserted because certain operators require sorted input data.


Viewing the job Score


Turn on the Score for job runs
Value set at the project or job level
The environment variable named $APT_DUMP_SCORE set to true
The Score is displayed in a message in the job log
Identify the message by heading: main program: This step
You won't see the word Score anywhere

Score

Copyright IBM Corporation 2007, 2012

Figure 6-21. Viewing the job Score KM5021.0

Notes:
You can view the Score in the job log, if the $APT_DUMP_SCORE environment variable
has been turned on. Best practice is to have this variable turned on in both development
and production systems. The Score is a major debugging tool for DataStage developers.
And the Score is a major trouble-shooting tool for production teams.
The message in the log does not contain the word Score. Identify the message by looking
for the heading main program: This step has N datasets.
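A minimal sketch, reusing the edserver.ibm.com credentials from the command-line examples
later in this unit (the project name dstage1 is a placeholder, and the -domain value may also
need a port on your installation), showing the variable being set at the project level with the
dsadmin command covered later in this unit:

    $ cd /opt/IBM/InformationServer/Server/DSEngine
    $ . ./dsenv
    $ bin/dsadmin -domain edserver.ibm.com -user student -password student \
          -server edserver.ibm.com \
          -envset APT_DUMP_SCORE -value True dstage1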

6-22 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Example job Score

Job scores are divided into two sections:
Datasets: partitioning and collecting
Operators: node/operator mapping
Both sections identify sequential or parallel processing

Why 9 Unix processes?

Copyright IBM Corporation 2007, 2012

Figure 6-22. Example job Score KM5021.0

Notes:
The Score yields a lot of useful information, including the number of operators (stages) and
the number of input and output data sets.
The Score also lists the number of player processes. In this example there are nine player
processes: one process for the Row Generator operator, which runs sequentially on a single
node; four peek processes running on all four nodes for the first Peek stage; and four peek
processes running on all four nodes for the second Peek stage.
An example Score is displayed in the top right corner for the job shown in the lower-left
graphic. Notice that the two Peek stages/operators each run, in parallel, on four processing
nodes. The Row Generator stage, which runs sequentially, runs on only a single node
(node1).

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-23
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Counting the total number of processes


One for the conductor process
One section leader process for each node
Four nodes = four processes
One player process for each operator running on a node
One operator running (sequentially) on one node = one process
One operator running (in parallel) on four nodes = four processes
Total number of processes = Conductor + Section Leader processes +
Player processes for all operators

Copyright IBM Corporation 2007, 2012

Figure 6-23. Counting the total number of processes KM5021.0

Notes:
Processes consume resources, CPU and memory. The more processes, the greater the
impact on resources. You can determine the total number of processes a job will generate
from the Score. There is one process generated for the conductor. There is one
section leader process generated for each node. Each player process running on a node is
a separate process.
The job Score does not include the runtime startup and overhead processes, since their
number is constant across all jobs.
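For example, for the job shown on the previous page, run with a four-node configuration file:

    1   conductor process
    4   section leader processes (one per node)
    1   player process for the Row Generator operator (runs sequentially)
    4   player processes for the first Peek operator (one per node)
    4   player processes for the second Peek operator (one per node)
    --
    14  operating system processes in total (9 of them player processes)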

6-24 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Parallel Job Configuration File

Copyright IBM Corporation 2007, 2012

Figure 6-24. Parallel Job Configuration File KM5021.0

Notes:

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-25
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Configuration file

What is a configuration file?


Configuration file tells the parallel Engine how to exploit the
underlying computer system
Defines processing nodes and disk space connected to each node
that are allocated for use by the parallel Engine
Parallel Engine first reads the configuration file to determine what
system resources are allocated to it and then distributes the
application to those resources
The configuration file used by a job is specified by
$APT_CONFIG_FILE
There is not necessarily one ideal configuration file because of high
variability between the way different jobs work

Copyright IBM Corporation 2007, 2012

Figure 6-25. Configuration file KM5021.0

Notes:
When a job runs, it runs using a configuration file. The number of nodes in the configuration
file determines the degree of parallelism of the job.
The configuration file tells the parallel Engine how to exploit the underlying computer
system or systems. What processor nodes should it use? What disk resources?
The $APT_CONFIG_FILE environment variable that is in effect for the job at the time the
job runs determines the configuration file that is used by the job. There is a project default
configuration file that is specified. The job, however, may override this default by including
the environment variable as one of its job parameters.
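As a sketch (the paths shown are common install-time defaults and placeholders, not values
taken from this course environment), the project default and a job-level override might look
like:

    Project default:       APT_CONFIG_FILE = /opt/IBM/InformationServer/Server/Configurations/default.apt
    Job parameter value:   $APT_CONFIG_FILE = /opt/configs/4node.apt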

6-26 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Configuration file nodes

A node is a logical processing unit with corresponding resources


Need not match physical CPUs
File system resources include: Data set, scratch disk, buffer disk
Number of nodes does not have to match the number of CPUs in
your system or the number of machines in the configuration
You can define one processing node for multiple physical nodes in
your system or multiple processing nodes for each physical node

Copyright IBM Corporation 2007, 2012

Figure 6-26. Configuration file nodes KM5021.0

Notes:
The number of nodes specified in a configuration file does not have to match the number of
physical CPUs in your system or systems. There can be more or there can be less. The
nodes specified in the configuration file are logical. For example, you can use a 4-node
configuration file when running a job on a computer with a single processor. And the job will
still run in parallel streams. It will not run in true physical parallelism. It will be the kind of
parallelism exhibited by a computer with a single processor running several applications at
one time.
True physical parallelism does not occur unless there are physical CPUs backing it up.
You do not need to map the nodes in the configuration file to physical processors, where
they exist. That assignment occurs automatically, and you have no control over it.

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-27
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Sample configuration file


The callouts in the figure identify: the user-assigned logical node name; the fastname (the
actual host/server name for the high-speed network interface); the pool names for each
node; resource disk (permanent storage location for parallel data sets); and resource
scratchdisk (temporary storage location for processing work space). Four data partitions
(nodes) are defined in this configuration file:

{
    node "node1" {
        fastname "node1_css"
        pool "" "node1" "node1_css" "mnode" "sort"
        resource disk "/dataset/d1" {pools ""}
        resource disk "/dataset/d2" {pools "bigds"}
        resource scratchdisk "/scratch1" {pools "buffer"}
        resource scratchdisk "/scratch2" {pools "sort"}
    }
    node "node2" {
        fastname "node2_css"
        pool "" "node2" "node2_css" "pnode"
        resource disk "/dataset/d2" {pools ""}
        resource disk "/dataset/d1" {pools "bigds"}
        resource scratchdisk "/scratch1" {pools "buffer"}
    }
    node "node3" {
        fastname "node3_css"
        pool "" "node3" "node3_css" "pnode"
        resource disk "/dataset/d3" {pools ""}
        resource scratchdisk "/scratch1" {pools "buffer"}
    }
    node "node4" {
        fastname "node4_css"
        pool "" "node4" "node4_css" "pnode" "sort"
        resource disk "/dataset/d4" {pools ""}
        resource scratchdisk "/scratch1" {pools "buffer"}
        resource scratchdisk "/scratch2" {pools "sort"}
    }
}

Copyright IBM Corporation 2007, 2012

Figure 6-27. Sample configuration file KM5021.0

Notes:
This graphic shows a typical configuration file. The file defines four nodes: node1, node2,
and so on. The names given to the nodes are arbitrary. The fastname, on the other hand, is
not arbitrary: it must match the network name of the computer on which the node runs.
Pools can be applied to nodes and other resources. Individual jobs or stages in a job can
be constrained to use a certain pool of nodes or resources. In this way you can direct the
job or stages in the job to use certain nodes or resources, and not others.
There are several different types of resources. A disk resource is used for storing data sets.
A scratchdisk resource is used by DataStage for temporary work space, for example, by
the sort operator.

6-28 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Factors affecting optimal degree of parallelism


CPU-intensive applications benefit from the greatest possible
parallelism
Stages with large memory requirements (example: Lookup stage)
can benefit from parallelism if they act on data that has been
partitioned and if the required memory is also divided among
partitions
I/O intensive applications that extract data from and load data into
database systems
May need to configure the system to prevent the database from
Redistributing the data (when loading)
Re-partitioning when extracting
Speed of communication among stages
Stages exchanging large amounts of data should be assigned to nodes where they
communicate by either shared memory or a high-speed link
Best overall performance of a parallel job can be achieved with
equal data partitioning
Copyright IBM Corporation 2007, 2012

Figure 6-28. Factors affecting optimal degree of parallelism KM5021.0

Notes:
There are many factors that affect what the optimal configuration file would look like and
how many nodes it would have. The optimal degree depends on the application.
CPU-intensive applications and I/O-intensive applications vary in terms of what is optimal.
For production jobs that will be run repeatedly, you should test the job with different
configuration files. Start with a baseline number of nodes, then add nodes as long as
performance continues to improve. You should also experiment with reducing the number
of nodes. For some jobs, this may actually improve performance.

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-29
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Node pools

Node pools allow association of processing nodes based on their


characteristics
Certain nodes have large amounts of physical memory
You can designate them as compute nodes
Others may connect directly to a mainframe or some form of high-speed I/O
These nodes can be grouped into a node pool for I/O
By default, DataStage executes a parallel stage on all nodes defined
in the default node pool
Default node pool is identified by the syntax "" (empty double quotes)
If a node pool has been defined in the configuration file, you can
constrain a stage (node constraint) to run only on that pool
That is, only on the processing nodes belonging to that node pool

Copyright IBM Corporation 2007, 2012

Figure 6-29. Node pools KM5021.0

Notes:
Node pools in the configuration file can be used to separate processing nodes into different
categories based on their characteristics. These characteristics can include resources such
as memory or disk space or access to specific applications. This enables the job to use the
most efficient processing nodes on which to run its operators (stages).
By default, DataStage uses all the nodes defined in the default node pool. The default node
pool is identified by the syntax of empty double quotes. In a typical configuration file, all
nodes will be in the default pool. In some cases, nodes with special resources will exist
outside of the default pool, as part of a special pool. This would be for nodes that are only
to be used by a job in special circumstances.

6-30 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Node pools example


In this example, there are two named node pools (app1 and app2) plus the default pool
(identified by ""). Nodes n2, n3, and n4 belong to node pool app1; node n1 belongs to
node pool app2. All four nodes (n1, n2, n3, n4) belong to the default node pool.

{
    node "n1" {
        fastname "s1"
        pool "" "n1" "s1" "app2"
        resource disk "/orch/n1/d1" {}
        resource disk "/orch/n1/d2" {"bigdata"}
        resource scratchdisk "/temp" {"sort"}
    }
    node "n2" {
        fastname "s2"
        pool "" "n2" "s2" "app1"
        resource disk "/orch/n2/d1" {}
        resource disk "/orch/n2/d2" {"bigdata"}
        resource scratchdisk "/temp" {}
    }
    node "n3" {
        fastname "s3"
        pool "" "n3" "s3" "app1"
        resource disk "/orch/n3/d1" {}
        resource scratchdisk "/temp" {}
    }
    node "n4" {
        fastname "s4"
        pool "" "n4" "s4" "app1"
        resource disk "/orch/n4/d1" {}
        resource scratchdisk "/temp" {}
    }
}

Copyright IBM Corporation 2007, 2012

Figure 6-30. Node pools example KM5021.0

Notes:
This is an example of a configuration file with defined node pools. In this example, nodes
n2, n3, and n4 all belong to the node pool named app1. Node n1 does not belong to this
pool.
All the nodes belong to the default node pool (identified by ""). All operators can be
assigned to nodes in the default pool.

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-31
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Disk pools

Disk pools indicate the directories of the file systems available to the
node
Defined as options for resource disk and resource scratchdisk
Disks and Scratch disks may be grouped into pools
Disk pools reserve storage for a particular use
Example: holding very large datasets, sorting
Syntax
resource disk "disk_name" {pools "disk_pool"}
resource scratchdisk "s_disk_name" {pools "s_pool"}
Pools defined by disk and scratch disk are not combined
Two pools having the same name and belonging to both resource disk and resource
scratchdisk are defined as two separate disk pools
Each node on which a stage runs must have at least one disk in the
default disk pool

Copyright IBM Corporation 2007, 2012

Figure 6-31. Disk pools KM5021.0

Notes:
Disk pools identify the file directories available to a node. Each node must have at least
one disk directory it can use. Disk pools can be used to reserve storage for a particular use.
For example, a particular disk directory might be reserved for jobs that will be creating very
large data sets.
Since a job's operators always need access to some disk space, each node in the
configuration file must have at least one disk resource in the default pool.
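A minimal sketch of a node entry that reserves a separate directory for very large data sets
(the paths and the pool name bigds are illustrative, following the syntax above):

    node "node1" {
        fastname "server1"
        pools ""
        resource disk "/data/ds" {pools ""}
        resource disk "/bigdata/ds" {pools "" "bigds"}
        resource scratchdisk "/scratch" {pools ""}
    }

Stages directed to the bigds disk pool would then write to /bigdata/ds, while other stages
use the disks in the default pool.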

6-32 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Sort resource usage


By default, each sort uses 20MB per partition as an internal memory buffer
Includes user-defined sorts (in-stage sort, Sort stage) and framework-inserted sorts
A different size can be specified for each Sort stage using the Restrict Memory Usage
option
Increasing this value can improve performance, especially if the entire (or group) data
partition can fit into memory
Decreasing this value may hurt performance, but will use less memory (minimum is
1MB per partition)
This option is unavailable for in-stage sorts
To change the amount of memory used by all Sort stages, set:
$APT_TSORT_STRESS_BLOCKSIZE = [mb]
Note that this overrides the per-stage memory settings
When the memory buffer is filled, sort uses temporary disk space in the
following order:
Scratch disks in the $APT_CONFIG_FILE sort named disk pool
Scratch disks in the $APT_CONFIG_FILE default disk pool (normally all scratch disks
are part of the default disk pool)
The default directory specified by $TMPDIR
The UNIX /tmp directory

Copyright IBM Corporation 2007, 2012

Figure 6-32. Sort resource usage KM5021.0

Notes:
Sorting in DataStage jobs requires both memory resources and disk resources. Disk
resources are needed when there is not enough memory to perform the sort in memory. In
that case, some sorting operations must be done using disk resources.
In the configuration file, you need to specify scratch disk resources for sorting operations.
The sort keyword is used to identify the scratch disk to be used first.
The usage of disk resources can be prioritized. If multiple disk resources are listed, the
order from top to bottom determines their priority. You can prioritize certain disk resources
for sorting purposes by adding the resource to the sort pool.
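A minimal sketch (paths are illustrative): the second scratch disk below is reserved for sort
spill by placing it in the sort pool, and the environment variable raises the per-partition sort
memory to 100 MB, as described on the slide:

    resource scratchdisk "/scratch1" {pools ""}
    resource scratchdisk "/fastdisk/sort_scratch" {pools "sort"}

    APT_TSORT_STRESS_BLOCKSIZE = 100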

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-33
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Buffer scratch disk pools


Under certain circumstances (for example, buffer overflow),
DataStage will use disk storage to buffer some records
Amount of memory defaults to 3MB per buffer per node (partition)
Amount of disk space per node defaults to the amount of available disk space
Specified in the default scratchdisk setting for the node
Scratch disk is used for both buffer overflow to disk and for other
temporary storage uses
When a buffer scratch disk pool is defined (key word buffer) in the
configuration file, DataStage uses that scratch disk pool rather than other
default scratch disks
Other scratch disk pools will then be used for other temporary storage

Copyright IBM Corporation 2007, 2012

Figure 6-33. Buffer scratch disk pools KM5021.0

Notes:
Just like for sorting, buffering also takes place in memory, if there is sufficient memory to
perform the buffering tasks. If there is not enough memory, disk resources will be used. The
buffer pool can be used to prioritize the scratch disk resources in the configuration file to
be used for buffering.

6-34 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Buffer scratch disk pools example


In this example, each processing node has a single scratch
disk resource in the buffer pool (/scratch0)
Buffering will use /scratch0 before /scratch1
If /scratch0 were not in the buffer pool, /scratch0 would be used first
because it is the first listed
Either can be potentially used because both are in the default pool
Default disk pool is identified in this example with { }
The callout in the figure marks /scratch0 on each node as the buffer scratch disk:

{
    node "node1" {
        fastname "node1_css"
        pools "" "node1" "node1_css"
        resource disk "/orch/s0" {}
        resource scratchdisk "/scratch0" {pools "buffer"}
        resource scratchdisk "/scratch1" {}
    }
    node "node2" {
        fastname "node2_css"
        pools "" "node2" "node2_css"
        resource disk "/orch/s0" {}
        resource scratchdisk "/scratch0" {pools "buffer"}
        resource scratchdisk "/scratch1" {}
    }
}

Copyright IBM Corporation 2007, 2012

Figure 6-34. Buffer scratch disk pools example KM5021.0

Notes:
In this example, the buffer pool is used to identify /scratch0 as the priority directory for
buffering operations when they spill over to disk. Since /scratch0 is listed before
/scratch1, it would be used first, even without being in the buffer pool.

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-35
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Buffer resource usage


By default, each buffer operator uses 3MB per partition of
virtual memory
Can be changed through advanced link properties, or globally using
$APT_BUFFER_MAXIMUM_MEMORY

When buffer memory is filled, temporary disk space is used


in the following order:
Scratch disks in the $APT_CONFIG_FILE buffer named disk
pool
Scratch disks in the $APT_CONFIG_FILE default disk pool
The default directory specified by $TMPDIR
The UNIX /tmp directory

Copyright IBM Corporation 2007, 2012

Figure 6-35. Buffer resource usage KM5021.0

Notes:
The environment variable $APT_BUFFER_MAXIMUM_MEMORY determines how much
memory is available for buffering. Some jobs may require more for good performance. In
this case, you can use properties in the job to increase the memory available for specific
operations.
When memory is exhausted, disk space is used for buffering. There is a defined order in
which disk space is used until it is exhausted. Scratch space in the buffer pool is used up
first, after memory is exhausted.
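As a sketch, and assuming the variable takes its value in bytes (so the 3 MB default
corresponds to 3145728), doubling the per-partition buffer memory for a project or job would
look like:

    APT_BUFFER_MAXIMUM_MEMORY = 6291456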

6-36 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Configuration file guidelines


Parallelism (number of nodes) should be optimized rather than
maximized
Increasing parallelism (number of nodes) may better distribute the work load
but it also adds overhead due to increase in number of processes
Prepare multiple configuration files
There is not one ideal configuration file because of high variability between the way
different jobs work
Optimize overall throughput and match job characteristics to overall hardware
resources
Provide relative throttle for runtime resource usage on a per job basis

Copyright IBM Corporation 2007, 2012

Figure 6-36. Configuration file guidelines KM5021.0

Notes:
It may seem that using a configuration file with the maximum number of nodes relative to
the number of available physical CPUs will yield the best performance. But this is not
necessarily true. Each node increases the amount of overhead as it adds additional
processes. And you need to keep in mind the other activity on the system.
The best way to determine the optimal number of nodes is through testing. Run the job
several times on the same set of test data using a variety of configuration files, with
different numbers of nodes and different resource allocations. Compare the results to
determine the optional configuration.

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-37
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Configuration file - the default.apt file


The default.apt file is created when Information Server is
installed
References subdirectories for data sets and scratch disk of the
Information Server install directory
There are two problems with this:
This may create project corruptions if the mount fills
It is likely that the performance on these file systems is not optimal
Consider removing this file and creating separate configuration
files
Can be referenced by the $APT_CONFIG_FILE setting in each
DataStage Project
At a minimum, consider editing the default.apt configuration file
to reference newly-created data and scratch file systems

Copyright IBM Corporation 2007, 2012

Figure 6-37. Configuration file - the default.apt file KM5021.0

Notes:
A default configuration file, named default.apt, is created when Information Server is
installed. Depending on the version of Information Server, this configuration file may have
only one node. And it uses subdirectories of the Information Server install directory for
specified disk resources. At a minimum, you should create a configuration file that specifies
other disk resources. And probably you will want to use a configuration file with multiple
nodes.

6-38 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Configuration file - sizing the number of nodes


The optimal number of nodes (partitions) is dependent on system
configuration, resource availability, job design, and other applications
sharing the server hardware
For example, if a job is highly I/O dependent or dependent on external
sources or targets, it may be appropriate to have more nodes than
physical CPUs
Testing the performance using configuration files with different settings is
recommended
For typical production environments, a good starting point is to set
the number of nodes equal to the number of CPUs
For development environments, which are typically smaller and more
resource-constrained, create configuration files with smaller numbers of
nodes
At minimum, a 2-node configuration file should be used to verify that
job logic and partitioning will work when jobs are running in parallel

Copyright IBM Corporation 2007, 2012

Figure 6-38. Configuration file - sizing the number of nodes KM5021.0

Notes:
This slide offers some guidelines for sizing the optimal number of nodes in a configuration
file. As mentioned earlier, testing your jobs with several different configuration files is
recommended.
And remember, configuration files that work well for one job may not work well for other
jobs, depending on the type of job and whether, for example, it is highly I/O dependent.

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-39
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Configuration file tuning


Use multiple machines to leverage additional resources
Adding more than one fastname to a configuration file can expand
the footprint of the jobs run time environment
Licensing and installation issues need to be considered when doing this
Repartitioning becomes much more costly because it involves data
moving across the network from one computer to another
Node2 is on a different computer (fastname machine2):

{
    node "node1" {
        fastname "machine1"
        pools ""
        resource disk "/disk1/mypath" {pools ""}
        resource scratchdisk "/disk2/mypath" {pools ""}
    }
    node "node2" {
        fastname "machine2"
        pools ""
        resource disk "/disk3/mypath" {pools ""}
        resource scratchdisk "/disk4/mypath" {pools ""}
    }
}

Copyright IBM Corporation 2007, 2012

Figure 6-39. Configuration file tuning KM5021.0

Notes:
DataStage jobs are not limited to running on a single system with its limited number of
CPUs. DataStage can be configured to run jobs on multiple systems networked together.
The fastnames identify the names of the different systems. In this example, there are two
different fastnames (machine1 and machine2). This indicates that node1 and node2 are
on different computers.

6-40 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Configuration file tuning


Vary resource disks and resource scratchdisks across nodes
Including different mounts/spindles in your various disk specifications eliminates I/O
conflict between nodes
Decreases latency and increases throughput
Each location is on a different disk:

{
    node "node1" {
        fastname "machine1"
        pools ""
        resource disk "/disk1/mypath" {pools ""}
        resource scratchdisk "/disk2/mypath" {pools ""}
    }
    node "node2" {
        fastname "machine1"
        pools ""
        resource disk "/disk3/mypath" {pools ""}
        resource scratchdisk "/disk4/mypath" {pools ""}
    }
}

Copyright IBM Corporation 2007, 2012

Figure 6-40. Configuration file tuning KM5021.0

Notes:
Spreading resource disks for nodes across different directories decreases latency and
increases throughput. Notice in this example that the resource disks for node1 and node2
are different. This ensures that node1 disk operations will not contend with node2 disk
operations.

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-41
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Configuration file tuning


Add multiple resource disks per node
When a node references multiple resource disks, data sets will be
alternately written to each one
Distributes writes within a node
If each disk location is on a separate mount/spindle, this can dramatically
increase I/O throughput.
Two separate disk locations can improve the write performance for a data set:

{
    node "node1" {
        fastname "machine1"
        pools ""
        resource disk "/disk1/mypath" {pools ""}
        resource disk "/disk2/mypath" {pools ""}
        resource scratchdisk "/disk3/mypath" {pools ""}
    }
}
Copyright IBM Corporation 2007, 2012

Figure 6-41. Configuration file tuning KM5021.0

Notes:
Even with respect to a single node, resource disk usage can be spread across different
disks to avoid contention. If multiple resource disks are specified, data sets will be written
alternately to each one, in the order in which the resources are listed.
In this example, node1 has two resource disk entries. The first entry refers to disk1 and
the second to a directory on disk2. The first data set will be created on disk1. The second
will be created on disk2. The third will be created on disk1, as the process starts over
again.

6-42 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Minimizing resource requirements


There are times when it is appropriate to minimize the resource
requirements for a given scenario, for example:
Batch jobs that process small volumes of data
Real-time jobs that process data in small message units
Environments running a large number of jobs simultaneously on the same servers
In these instances, a single-node configuration file may be
appropriate
Minimizes job startup time and resource usage requirements without
significantly impacting overall performance
Areas where a single-node configuration are appropriate include:
A small LPAR (logical partition) where DataStage is contending with other processes
for resources
An environment with lots of small jobs that would not benefit (or maybe would suffer)
from data-partition parallelism
Real time jobs

Copyright IBM Corporation 2007, 2012

Figure 6-42. Minimizing resource requirements KM5021.0

Notes:
There are times when a single node configuration file is appropriate and can yield the best
performance. This may be true when you are running a batch of DataStage jobs in a job
sequence, and all the jobs process a small amount of data. The overhead of the additional
nodes will outweigh the benefits of the additional nodes, which are not really needed because
of the small amount of data.
Real-time DataStage jobs process data in small message units and usually get their best
performance using single-node configuration files.
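A minimal single-node configuration file for such cases might look like this (the host name
and paths are placeholders):

    {
        node "node1" {
            fastname "etlserver"
            pools ""
            resource disk "/data/project/ds" {pools ""}
            resource scratchdisk "/data/project/scratch" {pools ""}
        }
    }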

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-43
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Editing a configuration file


In Designer, click Tools>Configurations
Edit and save

Named node
pool

Copyright IBM Corporation 2007, 2012

Figure 6-43. Editing a configuration file KM5021.0

Notes:
Click Tools > Configurations in Designer to create a new configuration file or edit an existing
one. The easiest way to add a node is to copy the first node and paste in copies for the
other nodes. All you are required to change is the name of the node. You may also, as
noted earlier, want to change the resource disks.

6-44 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Running a job with a non-default configuration file


Add APT_CONFIG_FILE as a job parameter
Open Job Properties window for the job
Click Parameters tab
Click Add Environment Variable
Optionally set the default value to another configuration file than the project
default
Parameter
default value

Add variable
Added environment variable

Copyright IBM Corporation 2007, 2012

Figure 6-44. Running a job with a non-default configuration file KM5021.0

Notes:
The $APT_CONFIG_FILE environment variable specifies the default configuration file to
be used by any job running in the project. Not all jobs have to run with that configuration
file. You can add $APT_CONFIG_FILE as a job parameter, so that the configuration file for
the job can be specified at runtime.
This graphic shows the Parameters tab of the Job Properties window. Click Add
Environment Variable to add any environment variable, including $APT_CONFIG_FILE,
as a job parameter. The values specified at runtime override the default values specified in
DataStage Administrator.
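Once $APT_CONFIG_FILE is exposed as a job parameter, an individual run can supply its own
configuration file. A hedged sketch using the dsjob command covered later in this unit
(project, job, and file names are placeholders; the quotes keep the shell from expanding
$APT_CONFIG_FILE):

    $ bin/dsjob -domain edserver.ibm.com -user student -password student \
          -server edserver.ibm.com \
          -run -param '$APT_CONFIG_FILE=/opt/configs/4node.apt' dstage1 myJob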

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-45
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Engine Command Line Interface

Copyright IBM Corporation 2007, 2012

Figure 6-45. Engine Command Line Interface KM5021.0

Notes:

6-46 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Engine command line interface


Executed at the Engine system command line or terminal window
Four groups of commands:
Commands for controlling DataStage jobs: dsjob
Stop and start jobs
Retrieve information about jobs
Access log files
Commands for administering projects: dsadmin
Configure DataStage projects and environment
Retrieve information about DataStage projects and environment
Commands for importing DataStage object (dsx) files: DSXImportService
Import jobs, table definitions, and other DataStage objects
Retrieve information about the contents of import files
Runs on both Client systems and Server systems where DataStage is installed
C:\IBM\InformationServer\ASBNode\bin\DSXImportService.bat
/opt/IBM/InformationServer/ASBNode/bin/DSXImportService.sh
Commands for checking and repairing DataStage projects: SyncProject
Runs on both Client systems and Server systems where DataStage is installed
C:\IBM\InformationServer\ASBNode\bin\SyncProject.bat
/opt/IBM/InformationServer/ASBNode/bin/SyncProject.sh

Figure 6-46. Engine command line interface KM5021.0

Notes:
Commands for administering DataStage, controlling and monitoring DataStage jobs, and
commands importing and exporting DataStage objects and projects can be executed from
the Engine server system from the command line. These commands fall into four groups.
The dsjob command can be used to control DataStage jobs. Jobs can be run from the
command line. And the job log messages generated from the job can be viewed.
The dsadmin command can be used to configure DataStage projects and to retrieve
information about the DataStage environment.
The DSXImportService command can be used to import DataStage object (dsx) files.
This command runs on both the DataStage server as well as DataStage client
systems.
The SyncProject command can be used when DataStage project directories get out of
sync with the Repository. This command runs on both the DataStage server as well as
DataStage client systems.

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-47
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

dsjob command
DataStage user credentials: -domain domainName -user userName
-password password -server engineName
Running a job: -run projectName jobName
Options include:
-mode [ NORMAL or RESET ]
-param parameterName=value
-stop
Use to stop a running job
List projects: dsjob -lprojects
List jobs: dsjob -ljobs projectName
Access job log files: dsjob -logsum projectName jobName
Generate a job report: dsjob -report projectName jobName

Figure 6-47. dsjob command KM5021.0

Notes:
When using the dsjob command, DataStage user credentials need to be specified in all
cases.
Use the -run parameter to run a job. The -run parameter is followed by the name of the
project and the name of a job to run.
You can use the -lprojects parameter to list the projects on the Engine.
You can use the -ljobs parameter to list the jobs in a project.
You can use the -logsum parameter to display the job log messages for a job. The
-logsum parameter is followed by the name of the project and the name of a job.

6-48 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

dsjob command syntax

Figure 6-48. dsjob command syntax KM5021.0

Notes:
This graphic shows the syntax of the dsjob command. At the bottom of the graphic is the
list of command parameters that can be used in the dsjob command. All these options are
preceded by a dash.

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-49
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

dsjob lprojects command example


Change to the /DSEngine directory
Run . dsenv to initialize the Engine environment
Enter the command
You can omit user and password
You will be prompted for their values

Figure 6-49. dsjob -lprojects command example KM5021.0

Notes:
In this example, the dsjob -lprojects command is executed. Before you run the command,
change to the /DSEngine directory, and then initialize the DataStage environment by
running the dsenv script. Then enter the command. The command is located in the
/DSEngine/bin directory.
In the graphic, the dsjob keyword is followed by the authentication credentials. In this
example student/student is used to log into the server (edserver.ibm.com).
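A sketch of that command sequence (the -domain value may also require a port, depending
on your installation):

    $ cd /opt/IBM/InformationServer/Server/DSEngine
    $ . ./dsenv
    $ bin/dsjob -domain edserver.ibm.com -user student -password student \
          -server edserver.ibm.com -lprojects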

6-50 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

dsjob run command example


Following the run command option, specify the project
containing the job followed by the job name
Optionally, specify values to be passed to job parameters
Job parameters not specified run with their default values

Project

Specify value for


job parameter Job

Figure 6-50. dsjob -run command example KM5021.0

Notes:
In this example, the dsjob -run command is executed. Before you run the command,
change to the /DSEngine directory, and then initialize the DataStage environment by
running the dsenv script. Then enter the command. The command is located in the
/DSEngine/bin directory.
In the graphic, the dsjob keyword is followed by the authentication credentials. In this
example student/student is used to log into the server (edserver.ibm.com). The -run
parameter is followed by the -param option, which is used to pass a value to the
NumRows job parameter, defined in the job. This is followed by the name of the project
and job.
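A sketch of such a run (the project name dstage1 and job name genRowsJob are
placeholders; parameters other than NumRows keep their default values):

    $ bin/dsjob -domain edserver.ibm.com -user student -password student \
          -server edserver.ibm.com \
          -run -param NumRows=100 dstage1 genRowsJob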

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-51
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

dsjob logsum (log summary) command example


Returns the messages in the log for the specified job in the
specified project
Messages for multiple job runs are displayed if available
-logsum

Project Job

Job log messages

Figure 6-51. dsjob -logsum (log summary) command example KM5021.0

Notes:
In this example, the dsjob -logsum command is executed. Before you run the command,
change to the /DSEngine directory, and then initialize the DataStage environment by
running the dsenv script. Then enter the command. The command is located in the
/DSEngine/bin directory.
In the graphic, the dsjob keyword is followed by the authentication credentials. In this
example student/student is used to log into the server (edserver.ibm.com). The -logsum
parameter is followed by the name of the project and job.

6-52 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

dsjob report (job report) command example


Returns a status report for the specified job in the specified
project
Report is for the last job run

-report

Project Job

Job status report

Figure 6-52. dsjob -report (job report) command example KM5021.0

Notes:
In this example, the dsjob -report command is executed. Before you run the command,
change to the /DSEngine directory, and then initialize the DataStage environment by
running the dsenv script. Then enter the command. The command is located in the
/DSEngine/bin directory.
The -report parameter returns a report of the last job run.
In the graphic, the dsjob keyword is followed by the authentication credentials. In this
example student/student is used to log into the server (edserver.ibm.com). The -report
parameter is followed by the name of the project and job.

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-53
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

dsadmin command
Create a project: dsadmin -createproject projectName
Set the value of an environment variable: dsadmin -envset
variableName -value value projectName
List projects: dsadmin -listprojects
List environment variables: dsadmin -listenv projectName

Figure 6-53. dsadmin command KM5021.0

Notes:
You can use the dsadmin command to execute various DataStage administrative
functions: create a project, set an environment variable, list projects, list environment
variables.

6-54 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

dsadmin command syntax

Figure 6-54. dsadmin command syntax KM5021.0

Notes:
This graphic shows the syntax of the dsadmin command. At the bottom of the graphic is
the list of command parameters that can be used in the dsadmin command. All these
options are preceded by a dash.

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-55
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

dsadmin command examples

-listprojects

-listenv

Environment
variable
settings

Figure 6-55. dsadmin command examples KM5021.0

Notes:
In this example, the dsadmin -listprojects and the dsadmin -listenv commands are
executed. Before you run these commands, change to the /DSEngine directory, and then
initialize the DataStage environment by running the dsenv script. Then enter the
command. The command is located in the /DSEngine/bin directory.
The -listprojects parameter returns a list of projects.
The -listenv parameter returns a list of environment variables and their current settings.
In the graphic, the dsadmin keyword is followed by the authentication credentials. In this
example student/student is used to log into the server (edserver.ibm.com). The -listenv
parameter is followed by the name of the project.
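A sketch of the two commands (the project name dstage1 is a placeholder):

    $ bin/dsadmin -domain edserver.ibm.com -user student -password student \
          -server edserver.ibm.com -listprojects
    $ bin/dsadmin -domain edserver.ibm.com -user student -password student \
          -server edserver.ibm.com -listenv dstage1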

6-56 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

DSXImportService List command example


The command is located in the
/opt/IBM/InformationServer/ASBNode/bin directory

List contents

Import file type

Listings

Figure 6-56. DSXImportService -List command example KM5021.0

Notes:
This command is located in the /ASBNode/bin directory, on both the Engine server and
client systems. In this example, the DSXImportService keyword is followed by the -List
parameter. Then the type of import file is specified by the -DSXFile parameter. This
distinguishes the import file as a dsx type rather than an xml type. Then the path to the
import file is specified.
Notice that the output lists the type of DataStage object (parameter set, job, etc.) followed
by a list of the objects of that type contained in the input file.
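A sketch of the list operation (the dsx file path is a placeholder):

    $ cd /opt/IBM/InformationServer/ASBNode/bin
    $ ./DSXImportService.sh -List -DSXFile /tmp/myObjects.dsx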

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-57
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

DSXImportService import command example


Use ISHost to specify the Information Server services host
Use ISUser and ISPassword to specify DataStage user ID

Import file

Import file type

Results

Figure 6-57. DSXImportService import command example KM5021.0

Notes:
This command is located in the /ASBNode/bin directory, on both the Engine server and
client systems. In this example, the DSXImportService keyword is followed by parameters
for specifying the domain host, and the user ID and password used to log into the host.
This is followed by the name of the project the file is to be imported into. The -DSXFile
parameter distinguishes the import file as a dsx type rather than an xml type. Then the
path to the import file is specified.
Notice that the output lists the type of DataStage object (parameter set, job, etc.) followed
by the name of the object imported.
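A sketch of such an import (the -DSProject option name, the project name dstage1, and the
file path are assumptions; check them against the usage help printed by DSXImportService
on your system):

    $ cd /opt/IBM/InformationServer/ASBNode/bin
    $ ./DSXImportService.sh -ISHost edserver.ibm.com -ISUser student -ISPassword student \
          -DSProject dstage1 -DSXFile /tmp/myObjects.dsx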

6-58 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Checkpoint
1. What determines the degree of parallelism that a job runs
under?
2. What message in the job log lists the nodes that a stage
(operator) runs on?
3. What two types of parallelism are supported in DataStage
parallel jobs?
4. When you click the Compile button for a DataStage parallel
job, what type of script gets generated?
5. What determines the configuration file a job runs under?

Copyright IBM Corporation 2007, 2012

Figure 6-58. Checkpoint KM5021.0

Notes:
Write your answers here:

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-59
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Exercises Unit 06
In this lab exercise, you will:
Edit a configuration file
Run a DataStage job from the GUI
using the non-default configuration
file
Examine the OSH and Score
Run a job from the command line
Administer the Engine from the
command line
Use the DSXImportService
command to list the contents of a
DataStage import (dsx) file
Use the DSXImportService
command to import a DataStage
import (dsx) file

Copyright IBM Corporation 2007, 2012

Figure 6-59. Exercises Unit 06 KM5021.0

Notes:

6-60 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Unit summary
Having completed this unit, you should be able to:
Describe components in the Engine architecture
Describe DataStage job compile and run time processes
Create and modify parallel job configuration files
Use the Engine command line interface

Copyright IBM Corporation 2007, 2012

Figure 6-60. Unit summary KM5021.0

Notes:

Copyright IBM Corp. 2007, 2012 Unit 6. Engine Tier Architecture 6-61
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

6-62 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty Unit 7. Engine Tier Configuration

What this unit is about


This unit describes Engine tier administrative tasks.

What you should be able to do


After completing this unit, you should be able to:
Configure DataStage projects
Configure Engine environment variables
Manage data sets
Configure the Engine to gather and process operational metadata
Use the Multiple-Job Compile utility to compile batches of
DataStage jobs

How you will check your progress


Lab exercises and checkpoint questions

Copyright IBM Corp. 2007, 2012 Unit 7. Engine Tier Configuration 7-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Unit objectives
After completing this unit, you should be able to:
Configure DataStage projects
Configure Engine environment variables
Manage data sets
Configure the Engine to gather and process operational
metadata
Use the Multiple-Job Compile utility to compile batches of
DataStage jobs

Copyright IBM Corporation 2007, 2012

Figure 7-1. Unit objectives KM5021.0

Notes:

7-2 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

DataStage Project Configuration

Copyright IBM Corporation 2007, 2012

Figure 7-2. DataStage Project Configuration KM5021.0

Notes:

Copyright IBM Corp. 2007, 2012 Unit 7. Engine Tier Configuration 7-3
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

DataStage project configuration


Primary configuration is done in Administrator
Runtime Column Propagation (RCP) settings
DataStage project user permissions
Job sequence settings
Key environment variables
Parameter sets and values files

Copyright IBM Corporation 2007, 2012

Figure 7-3. DataStage project configuration KM5021.0

Notes:
Primary project configuration is done by a DataStage administrator in the DataStage
Administrator client. The DataStage Administrator client contains a number of tabs where
these tasks are performed.
On the General tab, you can configure Runtime Column Propagation (RCP) settings,
default operational metadata handling, and the default workload management (WLM)
queue.
On the Permissions tab, you can specify DataStage user permissions.
On the Parallel tab, you can specify OSH visibility and format defaults.
On the Sequence tab, you can specify job sequence default settings.
On the Logs tab, you can specify job log default settings.

7-4 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Administrator tabs
General tab
Enable job administration in Director
RCP settings
Access to environment variables
Generate operational metadata
Workload management default queue
Permissions: Specify user roles
Tracing: Enable server side tracing
Schedule: Specify user ID for scheduled jobs
Only enabled on Windows
Mainframe: Defaults for mainframe jobs
Tunables: Defaults for Server jobs
Parallel: Defaults for Parallel jobs
Sequence: Defaults for Job Sequences
Remote: Used for job deployment on a USS system
Logs: Logging defaults

Copyright IBM Corporation 2007, 2012

Figure 7-4. Administrator tabs KM5021.0

Notes:
On the General tab, you can configure Runtime Column Propagation (RCP) settings,
default operational metadata handling, environment variable settings, and the default
workload management (WLM) queue.
On the Permissions tab, you can specify DataStage user permissions.
On the Parallel tab, you can specify OSH visibility and format defaults.
On the Sequence tab, you can specify job sequence default settings.
On the Logs tab, you can specify job log default settings.
In addition, there are several tabs for special purpose configuration. The Schedule tab is
used by the DataStage job scheduler. It is only enabled on Windows platforms. The
Mainframe tab is only enabled if support for DataStage mainframe jobs has been installed.
The Tunables tab specifies defaults for DataStage server jobs. The Remote tab specifies
defaults for job deployment on a USS system.

Copyright IBM Corp. 2007, 2012 Unit 7. Engine Tier Configuration 7-5
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Administrator Project Properties

RCP Administrator tabs

Operational
metadata Edit environment
variables
Workload
Management
Copyright IBM Corporation 2007, 2012

Figure 7-5. Administrator Project Properties KM5021.0

Notes:
This graphic shows the Administrator client tabs. The tabs described previously are at the
top. The General tab is selected and displayed.
Click the Environment button to edit environment variables.
If Workload Management is enabled (not enabled in this example), the default Workload
Management (WLM) queue is specified in the Queue box. Workload Management is
discussed in a later unit.

7-6 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Runtime Column Propagation (RCP)


When RCP is turned on:
Columns of data can flow through stages in a DataStage job without
being explicitly defined in the stage
Target columns in a stage need not have any columns explicitly mapped
to them
No column mapping enforcement at design time
Input columns are mapped to unmapped columns by name
How implicit columns get into a job
Read a file using a schema in a Sequential File stage
Read a database table using Select *
Explicitly defined as an output column in a stage earlier in the flow
Benefits of RCP
Job flexibility
Job can process input with different layouts
Ability to create reusable components in shared containers
Component logic can apply to a single named column
All other columns flow through untouched

Copyright IBM Corporation 2007, 2012

Figure 7-6. Runtime Column Propagation (RCP) KM5021.0

Notes:
When RCP is turned on, columns of data can flow through stages in a DataStage job
without being explicitly defined in the stage. Although this can be used to create DataStage
jobs that can process data in more flexible ways, it can also lead to unpredictable results in
DataStage jobs, if not handled carefully.
For this reason, if RCP is to be enabled, it is recommended that you not turn it on by
default. That way, job developers can turn it on, but it will not be turned on without their
explicit decision to do so.

Copyright IBM Corp. 2007, 2012 Unit 7. Engine Tier Configuration 7-7
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

Enabling Runtime Column Propagation (RCP)


Project level
DataStage Administrator General tab
Job level
Job properties General tab
Stage level
Link Output Column tab
Settings at a lower level override settings at a higher level
For example, not turned on at the project level, but turned on
for a given job
For example, enabled at the job level, but not turned on for a
given stage

Copyright IBM Corporation 2007, 2012

Figure 7-7. Enabling Runtime Column Propagation (RCP) KM5021.0

Notes:
RCP can be turned on at any level: project, job, stage. Settings at a lower level override
settings at a higher, more global, level. Therefore, even if RCP is not turned on by default, it
can be turned on at the job level or, even more specifically, at the individual stage level
within a DataStage job.

7-8 Information Server Administration v9.1 Copyright IBM Corp. 2007, 2012
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
V7.0
Student Notebook

Uempty

Enabling RCP at project level

Check to enable
RCP to be used

Check to make
RCP the default for
new jobs

Copyright IBM Corporation 2007, 2012

Figure 7-8. Enabling RCP at project level KM5021.0

Notes:
In this example, RCP has been enabled, but the Enable Runtime Column Propagation
for new links option has been left unchecked. This means that when a new DataStage parallel job
is created, it will not automatically have RCP turned on. Developers can, if they choose,
turn it on for the job or for individual stages of the job.
If the Enable Runtime Column Propagation for Parallel Jobs is not checked, then
developers will not be able to use RCP in any of the jobs they develop.

Copyright IBM Corp. 2007, 2012 Unit 7. Engine Tier Configuration 7-9
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook

DataStage project user permissions


DataStage roles include:
DataStage Administrator
Role assigned in the IS Web Console
Has full access to all areas of a DataStage project including protected projects
DataStage Developer
Has full access to all areas of a DataStage project (except protected projects)
DataStage Production Manager
Has full access to all areas of a DataStage project including protected projects
DataStage Operator
Permission to run and manage DataStage jobs
DataStage Super Operator
Permission to run and manage DataStage jobs and to view objects in Designer (read-
only)
Protected projects
Objects in the project cannot be changed or deleted
Production Managers and DataStage Administrators can import objects into the
project
Use the Protect Project button to protect a project

Copyright IBM Corporation 2007, 2012

Figure 7-9. DataStage project user permissions KM5021.0

Notes:
Another important task of the DataStage administrator is to specify DataStage user
permissions. For any IS user ID given the IS DataStage User role, the DataStage
administrator can specify a DataStage project role. There are several different types of
roles that can be assigned.
The DataStage Administrator, DataStage Production Manager, and DataStage Developer
roles give developers full access to all areas of a DataStage project. DataStage
Developers do not, however, have access to protected projects. A protected project is a
read-only project. Objects imported into the project can neither be edited nor deleted.
The DataStage Operator and Super Operator roles are more limited. Operators can only
log into DataStage Director and run DataStage jobs. They cannot log into DataStage
Designer and view or edit DataStage jobs. Super operators can log into Designer and view
jobs, but cannot modify jobs.


Permissions tab

Added user

DataStage
administrators

Drop-down list
of project roles

Add new
user
Copyright IBM Corporation 2007, 2012

Figure 7-10. Permissions tab KM5021.0

Notes:
DataStage Administrators, created in the Information Server Web Console, show up
automatically in the user list. DataStage users, created in the Information Server Web
Console, can be added to the user list. Then a role can be selected from the User Role list
for the user.
To add a user and assign a role to the user, click the Add User or Group button and
browse for a user to add. Then select the user's role from the User Role list.


Data Sets

Copyright IBM Corporation 2007, 2012

Figure 7-11. Data Sets KM5021.0

Notes:


Data sets
Binary data file
Preserves partitioning
Component data set files are written to each partition
Suffixed by .ds
Referred to by a header file
Managed by:
Data Set Management utility from GUI (Designer, Director)
orchadmin command from the command line
Represents persistent parallel data

Copyright IBM Corporation 2007, 2012

Figure 7-12. Data sets KM5021.0

Notes:
Data sets represent persistent data maintained in the Engine framework internal format.
The key feature of data sets, which distinguishes them from, for example, sequential files,
is that they are partitioned. This makes them very useful as temporary staging files between
multiple jobs. They yield much better performance than sequential files because the data is
not collected, but remains partitioned.
Data sets are created and accessed using the Data Set stage in parallel jobs. Once
created, they are managed using the Data Set Management utility, accessible in DataStage
Designer and DataStage Director, and using the orchadmin command at the command
line on the engine server.


Data sets
Key to good performance for DataStage applications in a set of linked
jobs (possibly in a job sequence)
No import / export conversions are needed
No repartitioning needed
Written to and read from in DataStage jobs using Data Set stages
Implemented with two types of components:
Descriptor file:
contains metadata, data location, but NOT the data itself
Data component files
contain the data
multiple files, one per partition (node)

Copyright IBM Corporation 2007, 2012

Figure 7-13. Data sets KM5021.0

Notes:
As mentioned previously, the key feature of data sets, which distinguishes them from, for
example, sequential files, is that they are partitioned. This makes them very useful as
temporary staging files between multiple jobs. They yield much better performance than
sequential files because the data is not collected, but remains partitioned.
They support this structure through two components: Data component files for each
partition and a descriptor file containing references to the data component files.
The descriptor file does not itself contain any actual data. It just contains pointers to
component files containing the actual data. For this reason you need to be careful when
attempting to delete a data set. If you delete the descriptor file, without also deleting the
component data files, you have deleted only the smallest portion of the data set.
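As a hedged command-line illustration of this point: rather than removing only the descriptor file, a data set can be deleted through orchadmin so that the descriptor and all component data files are removed together. The data set path below is hypothetical, and the delete subcommand is assumed to be available in your orchadmin version (run orchadmin help to confirm).

   # Initialize the DataStage environment first (dsenv is discussed later in this unit)
   cd /IBM/InformationServer/Server/DSEngine
   . ./dsenv
   # Removing only the descriptor (for example with rm) would orphan the component data files.
   # orchadmin removes the descriptor and its component files on every node:
   $APT_ORCHHOME/bin/orchadmin delete /tmp/Testdata.ds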


Job with Data Set stage

Data Set stage

Data Set stage


properties

Copyright IBM Corporation 2007, 2012

Figure 7-14. Job with Data Set stage KM5021.0

Notes:
This graphic shows an example of a DataStage parallel job with a Data Set stage. The
Data Set stage has been opened to reveal its properties. The file path specified is to the
Testdata.ds data set file. Data sets must be created with the .ds extension. The path
shown specifies where the descriptor file component of the data set will be created. The
data component files will be created in the folders specified in the configuration file.


Data Set Management utility

Display schema

Display data

Display record counts


for each data file (one
per node)

Copyright IBM Corporation 2007, 2012

Figure 7-15. Data Set Management utility KM5021.0

Notes:
The Data Set Management utility window is available from both Designer and Director. In
Designer, click Tools>Data Set Management to open this window. Use the icons at the
top to display its schema, which corresponds to a table definition, and its data, by partition.
In addition to viewing the data and format of the data set, you can use the Data Set
Management tool to copy and delete data sets. When used, these functions will
copy/delete all components of the data set, including its descriptor file and its component
data files.


Data and schema displayed

Data Set viewer

Schema describing the


format of the data

Copyright IBM Corporation 2007, 2012

Figure 7-16. Data and schema displayed KM5021.0

Notes:
This graphic shows examples of displaying the data within a data set and displaying its
schema. The schema describes the format of the data within the file, that is, its columns
and their data types.


Application Data Set usage


Used when writing staging results between jobs
Can function as checkpoints
Stored in native internal format
No conversion overhead
Retain data partitioning and sort order
Provides end-to-end parallelism across jobs
Maximum performance through parallel I/O
Not intended for long-term or archival storage
Internal format is subject to change with new DataStage releases
Requires access to named resources
Node names, file system paths, and so on
Binary format is platform-specific
For fail-over scenarios, servers should be able to cross-mount
file systems
Can read a data set as long as your current configuration file defines the
same Node names
Use orchadmin x to recover data from a data set when node names are no
longer available

Copyright IBM Corporation 2007, 2012

Figure 7-17. Application Data Set usage KM5021.0

Notes:
Although the internal format of data sets is subject to change, it should be upward
compatible. That is, jobs built in future releases of DataStage should be able to read data
sets created using earlier versions. Nevertheless, data sets are not recommended for
long-term or archival storage, since they cannot be read outside of DataStage.
A data set is linked to the configuration file used to create it. That is, the number of nodes in
the configuration file determines the number of component data files. And the names of the
nodes and the paths to the data component files are referenced in the descriptor file. This means
that a job using a different configuration file than the one that was used to create the data set
may not be able to read the data in the file.


Using orchadmin command utility


Execute dsenv to initialize the DataStage environment
From the DataStage $DSHOME directory (/IBM/InformationServer/Server/DSEngine)
The $APT_CONFIG_FILE variable needs to be set to the path of the configuration file used to create the data set
This can be done by adding a line to the dsenv file

Execute orchadmin command


In /PXEngine/bin directory
orchadmin help to get documentation on parameters
orchadmin ll datafile.ds lists all the partitioning information, data files, and
schema of datafile.ds

Copyright IBM Corporation 2007, 2012

Figure 7-18. Using orchadmin command utility KM5021.0

Notes:
The orchadmin utility is run on the DataStage Server system. It provides a command-line
interface to data set administration tasks.
Before you run the orchadmin utility you need to initialize the DataStage environment
using dsenv. In addition, the $APT_CONFIG_FILE variable needs to be set to the path of
the configuration file used to create the data set. This can be done by adding a line to the
dsenv file, as shown in the graphic. (The dsenv file, and how to edit it, is discussed in
more detail in a later unit.)
The orchadmin script is located in the /PXEngine/bin directory. It is a very powerful
command with more functionality than the Data Set Management utility in DataStage
Designer. You can use the orchadmin -help command to get documentation on its
parameters.
As an example, the following command lists all the partitioning information, data files, and
schema of a data set named datafile.ds: orchadmin ll datafile.ds
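The following terminal sketch pulls these steps together on a Linux engine tier. The installation path and the data set name are illustrative and assume a default installation directory.

   # Initialize the DataStage environment from $DSHOME
   cd /IBM/InformationServer/Server/DSEngine
   . ./dsenv
   # dsenv should export APT_CONFIG_FILE, for example (illustrative path):
   #   APT_CONFIG_FILE=/IBM/InformationServer/Server/Configurations/default.apt; export APT_CONFIG_FILE
   # List the partitioning information, data files, and schema of a data set
   $APT_ORCHHOME/bin/orchadmin ll /tmp/Testdata.ds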


"orchadmin ll" command output

Run the orchadmin command
Initialize the environment with dsenv

Copyright IBM Corporation 2007, 2012

Figure 7-19. "orchadmin ll" command output KM5021.0

Notes:
This graphic shows an example of using the orchadmin command with the ll parameter.
First the environment is initialized by running dsenv. Then the orchadmin
command is run.
To determine the number of records in a data set, you can also use dsrecords.
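For example, a quick record count might look like the following sketch; the data set path is hypothetical and the session is assumed to have been initialized with dsenv as above.

   $APT_ORCHHOME/bin/dsrecords /tmp/Testdata.ds
   # Illustrative output: 10000 records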


Sample orchadmin ll data set report

Number of file
partitions

Number of
records in the
file partition

Figure 7-20. Sample orchadmin ll data set report KM5021.0

Notes:
This graphic shows an example data set report generated by the orchadmin ll command.
The information includes the number of file partitions, the number of records in each file
partition, and the paths to the data component files of the data set.


Environment Variables

Copyright IBM Corporation 2007, 2012

Figure 7-21. Environment Variables KM5021.0

Notes:


Setting environment variables


Jobs inherit environment variables at runtime based on this order
of evaluation:
Environment variables defined in $DSHOME/dsenv
Shared by all projects on the DataStage server
Project-level environment variables defined by DataStage Administrator
Duplicate variables override $DSHOME/dsenv
Job-level environment variables set in job parameters
Duplicate variables override $DSHOME/dsenv and project-level settings

An extensive list of environment variables for parallel jobs is
found in the Parallel Job Advanced Developer's Guide

Copyright IBM Corporation 2007, 2012

Figure 7-22. Setting environment variables KM5021.0

Notes:
There are three places where environment variable values can be specified. Those
specified in the dsenv file apply to all DataStage projects. Those set in Administrator apply
to a specific project. Those set in the job apply just to the job.
$DSHOME is a variable defined in the dsenv file that specifies the DataStage home
directory. By default, this is /InformationServer/Server/DSEngine.


Environment variable settings in dsenv


The Engine inherits environment variable settings of the user
who starts the Engine and the environment variable settings
in dsenv
The dsenv file:
Used by the DataStage daemon at Engine start-up
The Engine needs to be restarted to apply any changes
Used to set the database and operating system environment for
DataStage jobs to inherit
Provides default settings globally, for all projects

Copyright IBM Corporation 2007, 2012

Figure 7-23. Environment variable settings in dsenv KM5021.0

Notes:
The dsenv file specifies the DataStage environment. It is read by the DataStage daemon
at Engine startup. Environment variable settings in the dsenv file apply globally to all
projects.
The Engine inherits the environment variable settings of the user who starts the Engine and
the environment variable settings in dsenv at the time the Engine is started.
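Because dsenv is read only when the Engine starts, changes to it require an Engine restart. A minimal sketch, assuming a default installation path and a user with the necessary privileges:

   cd /IBM/InformationServer/Server/DSEngine
   . ./dsenv
   # Stop and restart the DataStage Engine so that new dsenv settings take effect
   ./bin/uv -admin -stop
   ./bin/uv -admin -start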


Minimum set of environment variables in dsenv


LD_LIBRARY_PATH
Path includes DSEngine/lib and PXEngine/lib
PATH
Path includes DSEngine/bin and PXEngine/bin
DSHOME
Path to the /DSEngine directory
APT_ORCHHOME
Path to the /PXEngine (Parallel Engine) directory
Add APT_CONFIG_FILE
Path to the default configuration file
Used by some utilities such as orchadmin
Add specific variables that are required by the DBMS client software
See the connectivity documentation on what environment variables are needed
for DBMS clients

Copyright IBM Corporation 2007, 2012

Figure 7-24. Minimum set of environment variables in dsenv KM5021.0

Notes:
This lists some of the main environment variables that need to be set in the dsenv file in
order for DataStage to run.
The DataStage Engine consists of two separate engines: the parallel engine and the server
engine. /DSEngine is the home of the server engine. /PXEngine is the home of the parallel
engine.
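A minimal sketch of the kinds of dsenv entries this implies is shown below. The paths are illustrative and assume a default Linux installation; your installation directories and DBMS client entries will differ.

   # Illustrative excerpt from $DSHOME/dsenv
   DSHOME=/IBM/InformationServer/Server/DSEngine; export DSHOME
   APT_ORCHHOME=/IBM/InformationServer/Server/PXEngine; export APT_ORCHHOME
   APT_CONFIG_FILE=/IBM/InformationServer/Server/Configurations/default.apt; export APT_CONFIG_FILE
   PATH=$DSHOME/bin:$APT_ORCHHOME/bin:$PATH; export PATH
   LD_LIBRARY_PATH=$DSHOME/lib:$APT_ORCHHOME/lib:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH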


Project level environment variables


Overrides environment in dsenv
Specified in Administrator
Use the User Defined section to define new variables
For example, required DBMS client variables not specified in dsenv

Variable
setting

User Defined
variables
Copyright IBM Corporation 2007, 2012

Figure 7-25. Project level environment variables KM5021.0

Notes:
Environment variables defined in Administrator apply to a specific project. They override
any settings in the dsenv file.
The User Defined section can be used to create and set variables that do not exist as part
of the standard system. This might include variables required for data resources or custom
stages.


DSParams file
Stores project level
environment variables
for each DataStage
project
Gets entries from
Administrator
Should not be edited
Can be copied between
projects to deploy the
settings you have
configured

Copyright IBM Corporation 2007, 2012

Figure 7-26. DSParams file KM5021.0

Notes:
The DSParams file is a DataStage system file used by DataStage to keep track of
environment variable settings.
In general, the DSParams file should not be directly edited; appropriate entries are
somewhat complex, and if you make a mistake you can possibly disable DataStage.
However, you can copy this file and then replace it when backing up, deleting, and
restoring a project.


Operational Metadata

Copyright IBM Corporation 2007, 2012

Figure 7-27. Operational Metadata KM5021.0

Notes:


Capturing operational metadata


Operational metadata must be generated in DataStage before it can be
captured
To generate operational metadata for a DataStage job
Run the job with Generate Operational Metadata box checked
Use DataStage Administrator to set this as the default
XML files are generated in the IBM/InformationServer/ASBNode/conf/etc/XmlFiles
directory (default)
To capture operational metadata
Use the Run Import utility
Change to /IBM/InformationServer/ASBNode/bin directory (default)
Execute RunImportStart.sh (RunImportStart.bat on Windows)
Before you run the Run Import utility the first time, it must be configured
Edit runimport.cfg file
Configuration file in /IBM/InformationServer/ASBNode/conf directory (default)
Minimally, specify Operational Metadata Administrator user/password, DataStage Server
host name, port number
Configure other parameters as needed

Copyright IBM Corporation 2007, 2012

Figure 7-28. Capturing operational metadata KM5021.0

Notes:
Operational metadata describes events and processes that occur and objects that are
affected when a DataStage job is run.
Operational metadata must be generated before it can be captured. To generate
operational metadata for a DataStage job, run the job with Generate Operational
Metadata box checked.
Use the Run Import utility to capture the generated metadata. Capturing the metadata
refers here to loading the metadata into the Information Server Repository where it can be
viewed and analyzed using Information Server products and tools, such as Metadata
Workbench.


Operational metadata option in Administrator

Project default

Copyright IBM Corporation 2007, 2012

Figure 7-29. Operational metadata option in Administrator KM5021.0

Notes:
You can specify that operational metadata is generated by default by selecting the
Generate operational metadata box in Administrator, as shown here.


What is operational metadata?


Describes events and processes that occur and objects that are
affected when a DataStage job is run
After the job is run, a variety of information about the job run is
available, including:
Start, stop and elapsed time for a job execution
How many rows were read, written, or referenced
Tables and files that were read from, written to, or referenced
Stages and links in the job
Project the job was in
Parameters used by the job

Copyright IBM Corporation 2007, 2012

Figure 7-30. What is operational metadata? KM5021.0

Notes:
When operational metadata is generated, XML files are created that contain the
operational metadata for the job runs. By default, these XML files are saved to the folder
/IBM/InformationServer/ASBNode/conf/etc/XmlFiles on the drive where you installed
Information Server.
To load the operational metadata in the Information Server Repository, so that it can be
viewed and analyzed, you run the Run Import utility. The Run Import utility imports the
contents of all XML files in the XmlFiles folder into the Repository, and then deletes the files
(or moves them to a folder of your choice).
To study the operational metadata that you imported, you can create a report on the
operational metadata in the Reporting tab of IBM Information Server Web console.
When you no longer need the operational metadata, you can delete it from the Repository.


Configuring Run Import (runimport.cfg) file

IS admin user

Password is
encrypted when the
file is saved

Repository host
name

Copyright IBM Corporation 2007, 2012

Figure 7-31. Configuring Run Import (runimport.cfg) file KM5021.0

Notes:
Before you can execute the Run Import utility to load the generated operational metadata
into the Repository, the utility must first be configured. The runimport.cfg file is used to
configure the utility. The essential properties that need to be configured are highlighted in
this graphic.
The configuration file is located by default in the /InformationServer/ASBNode/conf
directory.
You must specify the user ID and password the utility is to use to access the Information
Server Repository. In this example, isadmin is used. You must also specify the name of the
Repository host system (in this example, EDSERVER.IBM.COM) and the port number
used to connect to it (by default, 9080).


Generated XML files

Directory with
configuration file

Directory with
generated XML
files

XML files with


operational metadata.
One per job, per job
run

Copyright IBM Corporation 2007, 2012

Figure 7-32. Generated XML files KM5021.0

Notes:
This example shows an XML file that was generated when the desRowGenDataSet
DataStage job was run. Each run of a DataStage job produces an XML file. After the XML
file is generated, you can now run the Run Import utility to load this operational metadata
into the Repository.


Executing the Run Import utility

Directory with utility

Run Import
Utility

After the run, you can check whether the /XmlFiles directory is empty
The XML files containing the operational metadata are deleted after they
are imported into the Repository
Copyright IBM Corporation 2007, 2012

Figure 7-33. Executing the Run Import utility KM5021.0

Notes:
The Run Import utility is by default located in the /IBM/InformationServer/ASBNode/bin
directory. First change to the directory containing the utility and then run it, as shown.
Review the messages output by the utility. In this example, the message tells
us that one XML file was successfully loaded into the Repository.
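For reference, the corresponding commands on a Linux engine tier might look like the following; the default installation path is assumed (use RunImportStart.bat on Windows).

   # Change to the directory containing the Run Import utility
   cd /IBM/InformationServer/ASBNode/bin
   # Load all generated XML files from the XmlFiles folder into the Repository
   ./RunImportStart.sh
   # Afterwards, the XmlFiles folder should be empty (imported files are deleted or moved)
   ls ../conf/etc/XmlFiles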


Job run reports


Reports can be created on job runs where operational metadata has
been collected. Reports contain:
Design information
Job start and end times
Job duration
Parameter values the job ran under

Copyright IBM Corporation 2007, 2012

Figure 7-34. Job run reports KM5021.0

Notes:
Reports can be created on job runs after operational metadata has been collected. The
reports contain a variety of information including design information, start and end times,
job duration, and the parameter values the job ran under.
This graphic shows an example of one such report.
In addition, reports and analyses can be generated within Metadata Workbench. These
analyses can show the flow of data through a series of jobs and data resources.


Deleting operational metadata


Operational Metadata accumulates in the repository as jobs are run
and consumes space
Operational Metadata may be deleted for:
Individual job runs
For all job runs within a specified date range
Deletions may be scheduled
Deletions are configured in PurgeJobRuns.sh
Specific jobs are identified by their activity ID
Available in job run reports

Copyright IBM Corporation 2007, 2012

Figure 7-35. Deleting operational metadata KM5021.0

Notes:
A large amount of operational metadata can accumulate in the Repository.
To delete operational metadata from the Repository, do the following. In a text editor, open
the PurgeJobRuns.sh file. This file is in the /opt/IBM/InformationServer/ASBNode/bin
directory. At the end of the text in the file, type the appropriate command to delete
operational metadata for one or more job runs:
To delete operational metadata for a single job run, type the -activityID command
followed by the activity ID of the run in quotation marks, for example -activityID
"multilink 2006-06-19 00:00:03". You can specify only one activity ID.
To delete operational metadata for all jobs that ran in a range of dates, type the
-beginDate command, followed by the beginning date of the range, in the format
YYYY-MM-DD, followed by the -endDate command, followed by the last date in the
range, for example -beginDate 2006-06-07 -endDate 2006-06-20. This command
deletes operational metadata for jobs that ran on the beginning date, ending date, and
all days in the range.

Just before the end of the text in the file, change the values for -user and -password to the
credentials for a user who has the Operational Metadata Administrator role.
From the command line, run the file. The operational metadata for the specified run or runs
will be deleted from the Repository.
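Drawing only on the options described above, the edited line at the end of PurgeJobRuns.sh might look like one of the following sketches. The user name, password, dates, and activity ID are illustrative, and the leading ellipsis stands for the existing command text already in the file.

   # Delete operational metadata for a single job run, identified by its activity ID
   ... -user isadmin -password mypassword -activityID "multilink 2006-06-19 00:00:03"
   # Or delete operational metadata for all job runs in a date range (inclusive)
   ... -user isadmin -password mypassword -beginDate 2006-06-07 -endDate 2006-06-20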


Multiple Job Compile Utility

Copyright IBM Corporation 2007, 2012

Figure 7-36. Multiple Job Compile Utility KM5021.0

Notes:


Multiple job compile


After jobs are moved, through the export/import process, from
one project to another, it is sometimes necessary to recompile
them
Compiling jobs one at a time by opening them up and clicking the
Compile button could be very time-consuming
Multiple Job Compile utility allows you to specify a batch of
jobs to compile
In DataStage Designer, click Tools>Multiple Job Compile to
begin the process

Copyright IBM Corporation 2007, 2012

Figure 7-37. Multiple job compile KM5021.0

Notes:
If you move DataStage jobs from one system to another it is recommended that you
recompile the jobs to make sure that they will run on the new system. This can be very time
consuming if you open and compile one job at a time in Designer. Fortunately, there is a
utility you can use to compile batches of DataStage jobs at one time.
To open the utility, in DataStage Designer, click Tools>Multiple Job Compile to begin the
process.


Selection Criteria window


Specify what types of jobs to compile and other options

Copyright IBM Corporation 2007, 2012

Figure 7-38. Selection Criteria window KM5021.0

Notes:
When you open the Multiple Job Compile utility, the Selection Criteria window is displayed.
Select the types of jobs you want to compile. By default, all types of jobs are selected.
By default, only uncompiled jobs are selected for compile. If you are moving jobs to a new
system, it is a good idea to force a recompile of all jobs, so you should change this default.


Selection Override window


Add and remove specific jobs from the compile

Copyright IBM Corporation 2007, 2012

Figure 7-39. Selection Override window KM5021.0

Notes:
On the Selection Override window you can add or remove specific jobs from the compile
process. The selected jobs are displayed in the Selected items panel. Use the Add> and
<Remove buttons to add or remove jobs from the compile queue.


Compile Process window


Lists jobs queued for compilation
Optionally, generate a report at the end
Click Start Compile to begin compiling

Queued jobs

Start compile
Generate report

Copyright IBM Corporation 2007, 2012

Figure 7-40. Compile Process window KM5021.0

Notes:
On the Compile Process window you see the jobs queued for compile. Click the Start
Compile button to begin processing the queue.
A report is generated when the compile process is complete, identifying which jobs
compiled successfully, and which jobs failed to compile.


Checkpoint
1. What do you need to do to configure a project to collect
operational metadata?
2. What tool can you use to view the data in a data set on a
partition-by-partition basis?
3. What is RCP (Runtime Column Propagation)?
4. What is a DataStage "protected project"?

Copyright IBM Corporation 2007, 2012

Figure 7-41. Checkpoint KM5021.0

Notes:
Write your answers here:


Exercises Unit 07
In this lab exercise, you will:
Configure a DataStage project
View a data set using the Data Set
Management tool
Manage data sets from the command
line
Configure the Engine for operational
metadata collection
Generate operational metadata
View an operational job run report
Use Multiple-Job Compile tool

Copyright IBM Corporation 2007, 2012

Figure 7-42. Exercises Unit 07 KM5021.0

Notes:


Unit summary
Having completed this unit, you should be able to:
Configure DataStage projects
Configure Engine environment variables
Manage data sets
Configure the Engine to gather and process operational
metadata
Use the Multiple-Job Compile utility to compile batches of
DataStage jobs

Copyright IBM Corporation 2007, 2012

Figure 7-43. Unit summary KM5021.0

Notes:


Unit 8. Engine Tier Database Connectivity

What this unit is about


This unit describes how to establish connectivity between Information
Server and databases using direct API connections and ODBC
connections.

What you should be able to do


After completing this unit, you should be able to:
Configure the Engine to connect to databases using direct API
connections
Configure the Engine to connect to databases using ODBC drivers

How you will check your progress


Lab exercises and checkpoint questions


Unit objectives
After completing this unit, you should be able to:
Configure the Engine to connect to databases using direct API
connections
Configure the Engine to connect to databases using ODBC
drivers

Copyright IBM Corporation 2007, 2012

Figure 8-1. Unit objectives KM5021.0

Notes:


Enterprise Application Connectivity

Copyright IBM Corporation 2007, 2012

Figure 8-2. Enterprise Application Connectivity KM5021.0

Notes:


Engine database connectivity


Connectivity to databases is primarily provided in one of two ways:
ODBC connectivity: Wired or non-wired
Wired drivers connect directly to the database server
Do not require database client software
Non-wired drivers connect through the database client software
DBMS-specific API integration
Both share similar configuration requirements
Vendor connectivity software
File system permissions
Environment variables
Database permissions
Configuration

Copyright IBM Corporation 2007, 2012

Figure 8-3. Engine database connectivity KM5021.0

Notes:
Connectivity to databases within a DataStage project and within Information Server
generally is established either through ODBC connectivity or DBMS-specific API
connectivity, configured in the Engine tier.
ODBC connectivity can be wired or non-wired. Connectivity that is wired does not require
database client software to establish the connection. The connection is wired directly to the
database. Non-wired connectivity requires database client software to be installed on the
Engine server system.


Engine database connectivity, continued


DataStage Engine acts as a client to the database
DataStage Engine does not talk directly to the database server
Uses API or ODBC (as configured)
For API configuration and ODBC non-wired connections, database
client software is required and must reside on the DataStage Engine
server
All database specific environment variables must be set up for the
project or in the global environment file (dsenv file)
Environment variables are specific to vendor or ODBC provider software
$ORACLE_HOME, $DB2INSTANCE, and so on
Edit the $DSHOME/dsenv file for global environment variable settings
Enterprise and Connector database stages:
SELECT privileges on system tables (Ex: Oracle, DB2)
Environment variables set up for projects
Set up in DataStage Administrator

Copyright IBM Corporation 2007, 2012

Figure 8-4. Engine database connectivity, continued KM5021.0

Notes:
The main difference between configuring ODBC connectivity and configuring database API
connectivity is in how it is done. API connectivity is set up using environment variables in
the project or in the global dsenv file. ODBC connectivity is set up in configuration files
stored in DataStage directories.
It is important to be aware that the connectivity established does not apply just to
DataStage, but to Information Server as a whole. Connections created in FastTrack and
Information Analyzer, for example, require that the connectivity has been established in
DataStage. DataStage acts as a client to the database for other Information Server
products.


Information Server connectivity


Wide range of sources
Information
InformationSources
Sources&&Targets
Targets Enterprise applications
Mainframe, mini-computer and open systems
Flat files, hierarchical, relational and
proprietary databases
Message queues, EDI
PeopleSoft
XML, and programming languages
Web
Content Broad functionality
Native adapters, and protocols
SAP Multi-byte enabled
Optimized parallel RDBMS interfaces
Oracle
Standards-based
Batch, business objects, and data access
Legacy Common query mechanisms
data Integrates source metadata
Siebel
Extensive Changed Data Capture
Real-time/push and batch/pull
Active and archive log based
Files
Teradata Trigger and time/date stamp based
MQ, TCP/IP & FTP data delivery
Copyright IBM Corporation 2007, 2012

Figure 8-5. Information Server connectivity KM5021.0

Notes:
Information Server supports a wide range of different types of data resources. This graphic
lists some of the main types. Not only does Information Server support connectivity to
database systems, such as Oracle and DB2, but it also supports connectivity to
enterprise applications, such as PeopleSoft and SAP.
Mainframe resources, such as COBOL VSAM files, are supported. Support is provided for
many different types of files, including flat files, hierarchical files, and XML files.


Information Server supported connectivity


RDBMS General Access Standards & Real Time Legacy
DB2 (on Z, I, P or X series) Sequential File InfoSphere MQ Allbase/SQL
Oracle Complex Flat File Java Messaging Services C-ISAM
Informix (IDS and XPS) File / Data Sets (JMS) D-ISAM
Ingres Named Pipe Java Datacom/DB
MySQL FTP XML & XSL-T DS Mumps
Netezza Compressed / Encoded Data EBXML Enscribe
Progress External Command Call Web Services (SOAP) Essbase
RDB Parallel/wrapped 3rd party Enterprise Java Beans (EJB) FOCUS
RedBrick apps EDI IDMS/SQL
SQL/DS EMC InfoMover FIX ImageSQL
SQL Server Web logs SWIFT Infoman
Sybase (ASE & IQ) Unstructured: e-mail, docs, HIPAA KSAM
and so on
Teradata M204
Content Management
Universe Systems CDC / Replication MS Analysis
UniData Life Sciences DB2 (on Z, I, P, X series) Nomad
NonStopSQL Oracle Nucleus
And more.. Enterprise Applications SQL Server RMS S2000
JDE/PeopleSoft Sybase Supra
EnterpriseOne TOTAL
Informix
Oracle Applications TurboImage
IMS
PeopleSoft Enterprise Unify
VSAM
SAS And many more.
ADABAS
SAP R/3 & BI
IDMS
SAP XI
NonStopSQL
Siebel
Enscribe
Salesforce.com
JDA
Ariba
And more Copyright IBM Corporation 2007, 2012

Figure 8-6. Information Server supported connectivity KM5021.0

Notes:
For reference, this graphic gives a detailed list of major supported data sources organized
by type.


Configuring Database Connectivity

Copyright IBM Corporation 2007, 2012

Figure 8-7. Configuring Database Connectivity KM5021.0

Notes:


Database connectivity software requirements


Connecting to the database using the DBMS-specific API requires that
the DBMS client software be available
Software must be available on each server the DataStage Engine is running on
Connecting to the database using ODBC requires that the ODBC driver
be installed
Information Server includes a set of ODBC drivers for many enterprise DBMSs
ODBC wired drivers connect directly to the database server
Do not require additional database client software
ODBC non-wired drivers connect through the database client software
Require database client software

Copyright IBM Corporation 2007, 2012

Figure 8-8. Database connectivity software requirements KM5021.0

Notes:
Connecting to a database using a database API requires client software for the database.
Information Server does not provide this client software.
Connecting to a database using ODBC requires ODBC drivers. Information Server installs
a set of ODBC drivers for many enterprise DBMSs. ODBC wired drivers connect directly to
the database server and do not require any additional client software. ODBC non-wired
drivers do require additional client software, because they use the client software to make
the connection.


Common database software requirements


This table provides an overview of the DBMS software requirements for
many major databases
Database | DBMS software required to use the DBMS API | Alternative ODBC drivers included with Information Server?
DB2 | DB2 client | Yes
Oracle | Oracle database client | Yes
SQL Server | MDAC (client available on Windows only) | Yes (allows access from Windows and UNIX)
Teradata | Teradata tools and utilities (CLIv2 and Parallel Transporter) | Yes
Sybase | Sybase Open Client | Yes
Netezza | Netezza client tools, including the ODBC driver (available from Netezza) | No
Informix | Informix CLI | Yes

Copyright IBM Corporation 2007, 2012

Figure 8-9. Common database software requirements KM5021.0

Notes:
This table provides an overview of the DBMS software requirements for several major
databases. The first column lists the databases. The second column identifies the client
software needed to use direct database connectivity. The third column identifies whether
ODBC drivers are provided in the Information Server installation package for the database.


File system permission requirements


The user ID running a DataStage job or other Information Server
process must have adequate permissions to access the file system
If database client software is required, the user ID must have file
permissions adequate to access the client software
If ODBC drivers are being used to access a database, the user ID must
have permission to access the driver files
Some customers restrict read access to the database file system as a
security measure
This can lead to permission problems
Retaining the permission settings applied by the DBMS installer during database
installation can avoid such problems

Copyright IBM Corporation 2007, 2012

Figure 8-10. File system permission requirements KM5021.0

Notes:
The user ID running a DataStage job or other Information Server process must have
adequate permissions to access the file system. This includes access to data resource
client software and driver files.
Some customers, as a security measure, restrict access to the database file system. Be
aware that this can lead to permission issues that can cause jobs to fail.


Engine environment variable requirements


For all RDBMs: Set $LD_LIBRARY_PATH ($LIBPATH on some
Unix platforms) to the database library path
In addition, there are database-specific environment variables
Based on the DBMS vendor client software instructions
Set the environment variables in the DataStage Engine dsenv file if
you want the setting to apply to all projects
Stored by default in the $DSHOME directory
$DSHOME specifies the DataStage home directory:
/IBM/InformationServer/Server/DSEngine
The DataStage Server must be stopped and restarted for the new
dsenv file settings to take effect

Copyright IBM Corporation 2007, 2012

Figure 8-11. Engine environment variable requirements KM5021.0

Notes:
The primary environment variable requirement for API database connectivity is setting the
$LD_LIBRARY_PATH ($LIBPATH on some UNIX platforms) to the database library path.
In addition, there are often additional database-specific environment variables that need to
be set. Some are optional and some are necessary.
Unless the connectivity will only be used for specific DataStage projects, the required
environment variable settings should be set in the DataStage Engine dsenv file. This file
initializes the Engine environment. It applies to all DataStage projects and sets the Engine
environment for other Information Server products, such as FastTrack and Information
Analyzer.


Database-specific environment variables


Database environment variables
Database | Home Dir | Instance/DB | NLS Setting | Others
DB2 | DB2DIR | DB2INSTANCE / DB2DBDFT | DB2CODEPAGE | INSTHOME
Oracle | ORACLE_HOME | ORACLE_SID | NLS_LANG | TNS_ADMIN (if tnsnames.ora is in a non-standard location)
ODBC / SQL Server | ODBCHOME | n/a | n/a (defined in .odbc.ini) | ODBCINI (path to the .odbc.ini file)
Teradata | TWB_ROOT (for Parallel Transporter) | n/a | n/a | COPERR, COPLIB, TD_ICU_DATA
Sybase | SYBASE | n/a | n/a (defined by the OS locale) | ASDIR (for IQ); SYBASE_OCS (directory under $SYBASE for OCS)
Netezza | NETEZZA | n/a | n/a (defined in the load options) | NZ_ODBC_INI_PATH (points to the .odbc.ini file)
Informix | INFORMIXDIR | INFORMIXSERVER | CLIENT_LOCALE | INFORMIXSQLHOSTS

Copyright IBM Corporation 2007, 2012

Figure 8-12. Database-specific environment variables KM5021.0

Notes:
This table lists some of the environment variables that need to be set for some common
types of database systems. The first column lists the database. The remaining columns list
some of the different types of environment variables that need to be set. There are
environment variables for specifying the database home directory, the database instance
(where applicable), the NLS coding system, and miscellaneous variables specific to the
database.
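As an illustration of how entries from this table end up in dsenv, a hedged sketch for an Oracle client follows. The installation path, SID, and locale are hypothetical; take the actual values from the vendor client documentation for your site.

   # Illustrative Oracle client settings appended to dsenv
   ORACLE_HOME=/u01/app/oracle/product/11.2.0/client_1; export ORACLE_HOME
   ORACLE_SID=ORCL; export ORACLE_SID
   NLS_LANG=AMERICAN_AMERICA.UTF8; export NLS_LANG
   TNS_ADMIN=$ORACLE_HOME/network/admin; export TNS_ADMIN
   LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH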


Database permission requirements


The user ID used to SELECT, INSERT, UPDATE, or LOAD to a
database must have the required database permissions
Authentication rights
Administrative authorities
Object privileges
tables, partitions, indexes, space,
Some DataStage database stages may also require some degree of
database system catalog access
Requirements vary depending on the type of stage and the type of database

Copyright IBM Corporation 2007, 2012

Figure 8-13. Database permission requirements KM5021.0

Notes:
DataStage jobs that access a database must have the required database permissions for
issuing the SQL statement or command used to access the data. Typically, the user ID
used to access the database is specified in the DataStage job stage used to access the
database. The user ID and password can be parameterized, and passwords can be
encrypted.


Setting LD_LIBRARY_PATH in Administrator


On the General tab, click Environment
Select the General folder
Add the database library setting to the $LD_LIBRARY_PATH variable

LD_LIBRARY_PATH

Copyright IBM Corporation 2007, 2012

Figure 8-14. Setting LD_LIBRARY_PATH in Administrator KM5021.0

Notes:
This graphic shows how to set the $LD_LIBRARY_PATH variable in DataStage
Administrator, for a specific project. In DataStage Administrator, open up the Environment
Variables window. The $LD_LIBRARY_PATH variable is located in the General folder.


Operator specific environment variables


Examine APT_DB2INSTANCE_HOME and APT_DBNAME
Variables are specific to DB2
APT_DB2INSTANCE_HOME identifies the DB2 instance home
directory
APT_DBNAME specifies the default DB2 database

Copyright IBM Corporation 2007, 2012

Figure 8-15. Operator specific environment variables KM5021.0

Notes:
There are, similarly, other sets of environment variables specific to the type of database
system. For example, $APT_DB2INSTANCE_HOME and $APT_DBNAME are
environment variables specific to DB2. Generally, these variables are found in the
Operator Specific folder.


Setting LD_LIBRARY_PATH in the dsenv file


The dsenv file is used to initialize the Engine environment
It is executed during the Engine startup
It can be executed at the Engine server command line or terminal
window to initialize the session environment for running Engine
commands
For example, you need to execute dsenv before running the orchadmin
command
Editing the LD_LIBRARY_PATH in the dsenv file makes
these settings available to all DataStage projects and to all
Information Server products and components that use the
Engine settings
Connectors are used in several products (FastTrack, Information
Analyzer) to connect to data sources and to import metadata
These Connectors may use database library settings configured within
dsenv
Copyright IBM Corporation 2007, 2012

Figure 8-16. Setting LD_LIBRARY_PATH in the dsenv file KM5021.0

Notes:
The dsenv file is used to initialize the DataStage Engine environment. It is executed
automatically during the Engine startup. This establishes the environment for all DataStage
projects as well other Information Server products and components that use the Engine.
This file can also be executed at the Engine server command line or terminal window to
initialize the session environment for running Engine commands. For example, you need to
execute dsenv before running the orchadmin command.
Editing the $LD_LIBRARY_PATH in the dsenv file makes these settings available to all
DataStage projects and to all Information Server products and components that use the
Engine settings. Connectors are used in several products (FastTrack, Information
Analyzer) to connect to data sources and to import metadata. These connectors may use
database library settings configured within dsenv.


dsenv file
Located in $DSHOME (/IBM/InformationServer/Server/DSEngine)
Initializes variables: $DSHOME, $APT_ORCHHOME, $ODBCINI,
$LD_LIBRARY_PATH, $APT_CONFIG_FILE
Edit it to add additional variables and database library settings

LD_LIBRARY_PATH

DB2 library
Parallel Engine
library

Global environment
variable setting
Copyright IBM Corporation 2007, 2012

Figure 8-17. dsenv file KM5021.0

Notes:
The dsenv file is located in $DSHOME (/IBM/InformationServer/Server/DSEngine). Part
of its initialization involves setting various environment variables, some of which are shown
here. You can edit this file to add additional environment variable settings.
Be careful when editing this file. DataStage will not run if this file becomes corrupted.
The orchadmin command, which was used in an earlier unit to describe a data set,
requires that $LD_LIBRARY_PATH be set to the parallel engine library path and that the
$APT_CONFIG_FILE variable be set. Before running orchadmin, edit the dsenv file to
include these settings and initialize the command session by running the dsenv file.
Also highlighted in the graphic is the DB2 library path that has been added to
$LD_LIBRARY_PATH.


ODBC Setup

Copyright IBM Corporation 2007, 2012

Figure 8-18. ODBC Setup KM5021.0

Notes:


ODBC drivers
Data Direct ODBC drivers for DataStage are installed as part of the
Information Server installation
Installed in the ODBCDrivers subdirectory
DataDirect documentation on the drivers is in the
IBM/InformationServer/Server/branded_odbc folder
odbcref.pdf has documents all the drivers
Additional information is contained in the other PDFs in the folder

Copyright IBM Corporation 2007, 2012

Figure 8-19. ODBC drivers KM5021.0

Notes:
Data Direct ODBC drivers for DataStage and QualityStage are installed as part of the
Information Server installation. The Data Direct documentation on the drivers is in the
IBM/InformationServer/Server/branded_odbc folder.


ODBC architecture

ODBC architecture (diagram): DataStage Server -> ODBC Driver Manager -> ODBC Driver.
Wired drivers connect from the ODBC driver directly to the Database Server.
Non-wired drivers connect through database client library software (for example, Sybase Open Client or Oracle SQL*Net) to the Database Server.

Copyright IBM Corporation 2007, 2012

Figure 8-20. ODBC architecture KM5021.0

Notes:
This graphic describes the ODBC architecture. DataStage accesses the ODBC driver
through the ODBC driver manager. If the driver is non-wired, then the driver accesses the
database server through the client software. Otherwise, it accesses the database server
directly.


Configuring ODBC connections


Two files need to be set up for ODBC connections
.odbc.ini
Information needed for connecting to the databases
Not needed on Windows systems because Windows Data Source manager
stores this information
uvodbc.config
Entries for ODBC DSNs (Data Source Names)
These files are located by default in the $DSHOME directory
Path to the /InformationServer/Server/DSEngine directory
uvodbc.config is also copied to each project folder
Setup information is different for wired and non-wired ODBC drivers
Non-wired drivers require information about database client software
Environment variables required by the database client software
Database home directory
Database library directory
The PATH environment variable
Wired drivers require information about the database itself
No changes are required to the dsenv file


Figure 8-21. Configuring ODBC connections KM5021.0

Notes:
Two files need to be configured to establish ODBC connections. The .odbc.ini file is
needed for connecting to the databases. The uvodbc.config file contains entries for the
ODBC data source names, so that these are available in drop-down lists within DataStage
and Information Server products and components.
Both configuration files are located in the $DSHOME directory. uvodbc.config is copied to
each DataStage project directory (/InformationServer/Server/Projects/ProjectName)
when the engine is started, so that the settings will apply to all projects. You can also edit
the uvodbc.config files in the project directories.


Sample database settings to add to dsenv

(Screenshots: sample dsenv entries, with callouts for an LD_LIBRARY_PATH setting and its export statement, and for a DB2INSTANCE setting and its export statement.)

Figure 8-22. Sample database settings to add to dsenv KM5021.0

Notes:
Environment variable settings can be specified in the dsenv file. This graphic shows
some examples of how to do this. The top graphic shows some environment variable
settings for Sybase and Informix databases. The bottom graphic shows some environment
variable settings for DB2.
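For illustration only (the instance name and library path are examples that depend on your DB2 client installation), DB2 entries appended to dsenv typically take this form:

    LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/db2inst1/sqllib/lib   # add the DB2 client library
    export LD_LIBRARY_PATH
    DB2INSTANCE=db2inst1                                         # identify the DB2 instance
    export DB2INSTANCE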


.odbc.ini file
For wired drivers, gives information about connecting to the database
server
For non-wired drivers, gives information about connecting to the
database client
Environment variables required by the database client software
Database home directory
Database library directory
The PATH environment variable
Location of the file is specified by the ODBCINI environment variable
By default in dsenv file: ODBCINI=$DSHOME/.odbc.ini

.odbc.ini file contains sample entries for most databases


First make a copy of the entry and then modify it as necessary
Add new data source to data source list at the top of the .odbc.ini file

Figure 8-23. .odbc.ini file KM5021.0

Notes:
For wired drivers, the .odbc.ini file gives information about connecting to the database server.
For non-wired drivers, it gives information about connecting to the database client.
The .odbc.ini file contains sample entries for most databases. First make a copy of the
entry and then modify it as necessary. Also add the new data source name to the list at the
top of the .odbc.ini file.
The location of the .odbc.ini file is specified in the dsenv file. The ODBCINI environment
variable specifies its location. In this example, the location is specified as $DSHOME, that
is, /InformationServer/Server/DSEngine.


Sample .odbc.ini entry

(Screenshot: sample .odbc.ini settings for connecting to the DB2 SAMPLE database using the DB2 wired ODBC driver.)

Figure 8-24. Sample .odbc.ini entry KM5021.0

Notes:
To create this entry, copy and paste the sample entry in the .odbc.ini file headed [DB2
Wire Protocol]. Then modify the text as necessary. In this example, the name of the
database (SAMPLE), the logon ID and password (db2inst1/db2inst1), and the TCP port
number (50000) were specified.
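A sketch of the resulting entry is shown below. Always start from the sample shipped in .odbc.ini, because the attribute names can vary by driver version; the driver library path is abbreviated here, and the host name is an assumption for a local DB2 server:

    [SAMPLE]
    Driver=/IBM/InformationServer/Server/branded_odbc/lib/<DB2 Wire Protocol driver library>
    Description=DataDirect DB2 Wire Protocol driver
    Database=SAMPLE
    IpAddress=localhost
    TcpPort=50000
    LogonID=db2inst1
    Password=db2inst1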


.odbc ODBC data source listing


At the top of the .odbc.ini file is a listing of ODBC data sources
Entries in the list show up in IS client drop-down lists in various places
Add additional entries to this list when you define new data
sources in the .odbc.ini file

(Screenshot: the data source list, with a callout for the entry added for the SAMPLE data source.)

Figure 8-25. .odbc ODBC data source listing KM5021.0

Notes:
At the top of the .odbc.ini file is a listing of ODBC data sources. This list shows up in
drop-down lists in DataStage and Information Server components. Add additional entries to
this list as you define new data sources in the .odbc.ini file.
In this example, the SAMPLE entry has been added.
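The list itself is a short section at the top of the file; a sketch for this example (the text after the equals sign is just a description):

    [ODBC Data Sources]
    SAMPLE=DataDirect DB2 Wire Protocol driver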


uvodbc.config
Contains entries for each DSN to be accessed through Information
Server
There are multiple copies of the uvodbc.config file
One copy is in the $DSHOME directory
A copy can also exist in each project directory
(/InformationServer/Server/Projects)
The project uvodbc.config file, if it exists, takes precedence over the $DSHOME
copy
Entries have the form:
<Data source name>
Must match the name specified in the .odbc.ini file
DBMSTYPE = ODBC


Figure 8-26. uvodbc.config KM5021.0

Notes:
The uvodbc.config file contains entries for each DSN to be accessed through Information
Server. The data source name in the entry must match the name specified in the .odbc.ini
file. For example, recall that on a previous page a data source named [SAMPLE] was
created. The uvodbc.config file must contain a matching entry named <SAMPLE>.
The entry specifies the type of DBMS and the type of network connection used. An
example is provided on the next page.


Sample uvodbc.config file

(Screenshot: a uvodbc.config file, with the ODBC data source names highlighted.)

Figure 8-27. Sample uvodbc.config file KM5021.0

Notes:
The graphic shows an example of a uvodbc.config file. It contains entries for two ODBC
data sources. One is for a Universe database used by DataStage. The other is for the
<SAMPLE> ODBC data source that was defined in the example .odbc.ini file shown
earlier.
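A sketch of such a file is shown below. The <localuv> entry is the form typically shipped with the product for the engine's own UniVerse database and may differ in detail on your system; the <SAMPLE> entry matches the data source defined in .odbc.ini:

    [ODBC DATA SOURCES]
    <localuv>
    DBMSTYPE = UNIVERSE
    network = TCP/IP
    service = uvserver
    host = 127.0.0.1
    <SAMPLE>
    DBMSTYPE = ODBC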


Testing ODBC connections


Execute the dssh command in the $DSHOME/bin directory
The environment needs to be set up
On Unix, execute the dsenv file
On Windows, you will be prompted to prepare the environment when you
execute the dssh command
Execute LOGTO <project name> at the dssh prompt
Log on to the project you want to test
Execute: DS_CONNECT
Retrieve a list of data source names recognized in the project
Execute: DS_CONNECT <data source name>
Test the data source connection


Figure 8-28. Testing ODBC connections KM5021.0

Notes:
There are a number of ways to test the ODBC connections after you have specified them.
On the server, you can use the dssh command. This command allows you to log into a
DataStage project and then connect to a data source. If you can connect, then you
probably configured things correctly.
Before you run the dssh command you must initialize the DataStage environment by
executing the dsenv file. After you execute the dssh command, the dssh prompt is
displayed. At the prompt you can enter the LOGTO and DS_CONNECT commands.


Running the dssh command


(Screenshot of a terminal session: move to $DSHOME, set up the DataStage environment, run dssh, retrieve the list of data sources from uvodbc.config, and connect to a data source.)

Figure 8-29. Running the dssh command KM5021.0

Notes:
This graphic shows an example of running the dssh command. Before you can use it you
have to set up the DataStage environment by running the dsenv file. In the example, we
first changed to the $DSHOME directory and then executed the dsenv file. Then we
executed the dssh command. The dssh prompt (>) is displayed. At the prompt, we logged
into the DataStage project named DSProject. Then we ran the DS_CONNECT command
to connect to the SAMPLE database.
The SAMPLE database prompt is then displayed. This establishes that we have properly
configured the ODBC connection to SAMPLE.
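A transcript of such a session might look roughly like this (the project and data source names are the ones used in this example):

    cd $DSHOME
    . ./dsenv
    ./bin/dssh
    > LOGTO DSProject
    > DS_CONNECT
    > DS_CONNECT SAMPLE

DS_CONNECT with no argument lists the data source names the project recognizes; DS_CONNECT SAMPLE attempts the connection, and the SAMPLE prompt indicates that the ODBC configuration is correct.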


For non-wired ODBC drivers


Ensure that the database client software is installed on the DataStage
Server machine
Make sure that the version of the client software is correct and supported by the ODBC
drivers loaded with Information Server
Test your connection to the database server outside of Information
Server
If the client software cannot connect to the database server, then the non-wired driver
that uses it will not be able to connect


Figure 8-30. For non-wired ODBC drivers KM5021.0

Notes:
Non-wired drivers require the database client software to be installed. Test your client
software connection to the database server outside of Information Server. If the client
software cannot connect to the database server, then the non-wired driver that uses it will
not be able to connect.


Database Connectivity


Figure 8-31. Database Connectivity KM5021.0

Notes:


DB2 DataStage configuration


Grant access to DB2 system tables
Modify DataStage environment variables
dsenv in DataStage Engine or
Project variables
LD_LIBRARY_PATH
Add DB2 library path
APT_DB2INSTANCE_HOME
Path to DB2 home directory
APT_DBNAME
Optionally specify a default database name


Figure 8-32. DB2 DataStage configuration KM5021.0

Notes:
This slide lists the main tasks for specifying DB2 environment connectivity. The user ID
used to connect must have access to the DB2 system tables.
The primary environment variables are listed and described. Use $LD_LIBRARY_PATH to
specify a path to the DB2 library. Use $APT_DB2INSTANCE_HOME to specify the path to
the DB2 home directory. Use $APT_DBNAME to optionally specify a default database name.
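Whether set as project-level variables in DataStage Administrator or appended to dsenv, the settings might look like the following sketch (the instance path and database name are examples only):

    LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/db2inst1/sqllib/lib
    export LD_LIBRARY_PATH
    APT_DB2INSTANCE_HOME=/home/db2inst1
    export APT_DB2INSTANCE_HOME
    APT_DBNAME=SAMPLE
    export APT_DBNAME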


DB2 configuration example

(Screenshot: project environment variable settings in DataStage Administrator, with callouts for the DB2 library path, the DB2 instance home, and the default DB2 database.)

Figure 8-33. DB2 configuration example KM5021.0

Notes:
This graphic shows a DB2 configuration example. It shows example settings for the DB2
environment variables described on the previous page. Here, the variables are being
configured in DataStage Administrator for a specific project. These settings can also be
made in the dsenv file.


Oracle configuration
Grant access to Oracle parallel server
Modify environment variable APT_ORACLE_NO_OPS
Create and set user-defined variable ORACLE_HOME
Create and set user-defined variable ORACLE_SID
Add ORACLE_HOME to PATH
Add the path to the Oracle library to LD_LIBRARY_PATH
Set privileges on certain system tables
See Information Server Planning, Installation, and Configuration
guide for details.


Figure 8-34. Oracle configuration KM5021.0

Notes:
This graphic lists the main considerations in configuring the Oracle environment variables.
The primary environment variables are listed and described. Consult the Information
Server documentation for details.
User-defined variables can be created in DataStage Administrator or in the dsenv file.
They are variables that do not natively exist in DataStage, but can be added for special
purposes. In DataStage Administrator, they are created in the User Defined folder in the
Environment Variables window.
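Added to dsenv, or created as user-defined variables in DataStage Administrator, the Oracle settings might take a form like this sketch (the Oracle home path and SID are examples only):

    ORACLE_HOME=/opt/oracle/product/11.2.0/client_1
    export ORACLE_HOME
    ORACLE_SID=ORCL
    export ORACLE_SID
    PATH=$PATH:$ORACLE_HOME/bin
    export PATH
    LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ORACLE_HOME/lib
    export LD_LIBRARY_PATH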


Teradata configuration
Teradata tools and utilities installed on nodes that run parallel
jobs
Set environment variables in /etc/services
Add same environment variables to dsenv
Create a Teradata user
See Information Server Planning, Installation, and
Configuration Guide for details


Figure 8-35. Teradata configuration KM5021.0

Notes:
This graphic lists some of the main considerations in configuring the Teradata environment
variables, to give you an idea of what is involved. Consult the Information Server
documentation for details.


Checkpoint
1. What two DataStage files do you need to edit to configure
ODBC data source connections?
2. What is the difference between wired ODBC drivers and non-
wired ODBC drivers?
3. What environment variable is used to specify the database
library path?
4. What Information Server client is used to set this
environment variable?


Figure 8-36. Checkpoint KM5021.0

Notes:
Write your answers here:


Exercises Unit 06
In this lab exercise, you will:
Enable a DataStage project to access
DB2
Globally enable access to DB2
Set up ODBC data source connections
Test ODBC connectivity using the dssh
command on the Server
Test ODBC connectivity using
DataStage Designer client import utility


Figure 8-37. Exercises Unit 06 KM5021.0

Notes:


Unit summary
Having completed this unit, you should be able to:
Configure the Engine to connect to databases using direct API
connections
Configure the Engine to connect to databases using ODBC
drivers


Figure 8-38. Unit summary KM5021.0

Notes:



Unit 9. Engine Tier Monitoring

What this unit is about


This unit describes Engine tier monitoring. Monitoring can be
performed in DataStage Designer or Director using the job log. Jobs can
also be monitored using the DataStage and QualityStage Operations
Console.

What you should be able to do


After completing this unit, you should be able to:
Monitor the DataStage job log
Use the DataStage and QualityStage Operations Console
Manage workload
Use the Performance Analyzer tool
Use the Resource Estimator tool

How you will check your progress


Lab exercises and checkpoint questions


Unit objectives
After completing this unit, you should be able to:
Monitor the DataStage job log
Use the DataStage and QualityStage Operations Console
Manage workload
Use the Performance Analyzer tool
Use the Resource Estimator tool


Figure 9-1. Unit objectives KM5021.0

Notes:


Monitoring DataStage jobs


When DataStage jobs and job sequences run, messages are
written to the job log
Log contains error messages, warning messages, and information
messages
Log messages can be monitored from the GUI, using the
DataStage Director client
Messages from a job open in DataStage Designer can also be viewed
in Designer
Information about DataStage jobs, including log messages,
can be retrieved from the command line using the dsjob
command
Information about DataStage jobs, including log messages,
can be retrieved using the Operations Console


Figure 9-2. Monitoring DataStage jobs KM5021.0

Notes:
When DataStage jobs and job sequences run they generate messages that are written to a
job log and stored in the Information Server Repository. These messages include many
different types of information, including error messages, warnings, row processing
statistics, and general information.
There are several ways in which you can view the generated log messages, some in real
time. DataStage Director and DataStage Designer both contain tools for viewing messages
in real time.
Using the Operations Console, you can not only monitor the messages generated by the
job in real time, but you can also monitor its resource usage as it is running.
Log messages can also be retrieved from the command line using the dsjob command and
its various options.
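For example, after sourcing dsenv, commands along these lines retrieve job information from the command line (options abbreviated; depending on how authentication is configured you may also need options such as -domain, -user, -password, and -server, so consult the dsjob reference for the full syntax):

    . $DSHOME/dsenv
    $DSHOME/bin/dsjob -logsum DSProject seqJobs    # summary of the log entries for a job
    $DSHOME/bin/dsjob -jobinfo DSProject seqJobs   # current status and run information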


Monitoring job sequences


A job sequence is a master controlling job that controls the execution of
a set of subordinate jobs
Passes values to the subordinate job parameters
Controls the order of execution (links)
Specifies conditions under which the subordinate jobs get executed (triggers)
Specifies complex flow of control
Performs system activities
Email
Execute system commands, scripts, and applications
To fully monitor a job sequence, it is necessary to monitor both the
sequence and the jobs it controls


Figure 9-3. Monitoring job sequences KM5021.0

Notes:
DataStage runs both individual jobs and organized batches of jobs called job sequences.
Since a job sequence is also a job, it generates log messages just like other DataStage
jobs. But monitoring the messages from a job sequence is more complex, because in order
to fully understand what is going on, it is necessary to view the messages of the jobs
running in the sequence, as well as the messages from the sequence itself.


Job sequence example


(Diagram of a job sequence: wait for a file, run Job 1, Job 2, and Job 3, execute a command, send an email, and handle exceptions.)

Figure 9-4. Job sequence example KM5021.0

Notes:
This graphic displays an example of a job sequence. It contains many of the different types
of available stages, which are highlighted.
In this example, the sequence is running three different DataStage jobs: Job_1, Job_2,
and Job_3. A job sequence can also run other types of activities. In this example, there is a
stage that is executing a system command or running a script file (top right). There is also a
stage that is sending an email.
Monitoring this job sequence would therefore involve monitoring the messages from
Job_1, Job_2, and Job_3.


Monitoring job messages in Director


Status View shows the status of selected jobs
seqJobs is the job sequence
seqJob1, 2, and 3 are jobs controlled by the sequence
seqJobs is running
(Screenshot: the Director Status view, with callouts for jobs that have not started, are running, or have finished running.)

Figure 9-5. Monitoring job messages in Director KM5021.0

Notes:
There are three views that can be selected in Director. This graphic shows the Status view,
in which the status of running jobs and job sequences is displayed. The status can be
Compiled, Finished, Running, and so on.
In this example, notice that the job sequence named seqJobs is running. This job
sequence runs three jobs named seqJob1, seqJob2, and seqJob3. Notice that seqJob2 is
currently running: seqJob1 has already run, and seqJob3 is waiting to run.


Sequence job log


The Log view shows the log messages for the job or job
sequence selected in the Status view
Messages for a job sequence
Show when each job starts and stops
Gives a summary report
(Screenshot: the Log view, with callouts for the message showing the sequence waiting for seqJob2 to start and for the summary report.)

Figure 9-6. Sequence job log KM5021.0

Notes:
Click the Log View icon for a selected job or job sequence to display the job messages it
generates as it runs. In this example, we are looking at the messages generated by the job
sequence, rather than the individual jobs it is running.
Notice that many of the messages indicate when a particular job the sequence is running
starts, when it finishes, and its status when it finishes.
There is a summary message at the end that lists the activities that ran and their statuses.


Operations Console


Figure 9-7. Operations Console KM5021.0

Notes:


Operations Console
Monitor DataStage jobs that are running or have run
Information about the job, job activity, and resource usage
View jobs running on any engine system in the domain
Information is stored in the operations database
Operations Console client
Thin client, accessible from Internet Explorer and Firefox
URL: http://domain:port/ibm/iis/ds/console/login.html
Login with a DataStage user ID
Supported DataStage project roles include: DataStage Operator,
Super Operator, Developer, Administrator
Only information about projects the user ID has access to will be
displayed
DataStage Administrators can view information about all projects on
all engine systems

Figure 9-8. Operations Console KM5021.0

Notes:
With the Operations Console, you can monitor DataStage jobs and job sequences in real
time. In addition to viewing job messages, you can also get job status information, and
information about the system resources available while the job is running, including CPU
usage and free memory.
In the Operations Console, you do not just see jobs running in a single project, like you do
with the DataStage clients. You can get information about jobs running on any engine
system in any project.
You access the Operations Console through a web browser. This web browser can be
running on the servers as well as the clients.


Configuring the Operations Console


By default, the Operations Console database is part of the
Information Server XMETA database
Console database objects use a different schema (default DSODB)
User ID and password are specified during installation
DSODBConnect.cfg file defines the connection
The Operations Console monitoring is configured in the
DSODBConfig.cfg file
Located in /InformationServer/Server/DSODB folder
Set DSODBON=1 to enable monitoring data collection


Figure 9-9. Configuring the Operations Console KM5021.0

Notes:
The operational metadata displayed in the Operations Console is stored in tables in a
database. By default, it is part of the XMETA database, but it uses a different schema.
Operations Console monitoring is configured using the DSODBConfig.cfg file located in
the InformationServer/Server/DSODB folder. There are a number of configuration
options, including whether operational data collection takes place at all. These options are
documented in the configuration file.
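For example, the relevant line in DSODBConfig.cfg is simply:

    DSODBON=1

Setting the value back to 0 turns monitoring data collection off again.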


Starting the Operations Console services


In DSODBConfig.cfg, set DSODBON=1
Run /DSODB/bin/DSAppWatcher.sh start
Use stop to stop the services
Use status to check whether the services are running
DSAppWatcher.sh can be set up to run automatically when
the DataStage engine is started
Command is added to engine startup script
(/DSEngine/sample/ds.rc)


Figure 9-10. Starting the Operations Console services KM5021.0

Notes:
The Operations Console uses several services for collection, monitoring, and display. By
default, these services do not run automatically. To start or stop the services, you run the
DSAppWatcher.sh script. This script can be set up to run automatically when the
DataStage engine is started.
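For example (the path shown is the default installation location; some releases expect the option with a leading dash, such as -start, so check the script's usage message):

    cd /IBM/InformationServer/Server/DSODB/bin
    ./DSAppWatcher.sh start      # start the Operations Console services
    ./DSAppWatcher.sh status     # check whether the services are running
    ./DSAppWatcher.sh stop       # stop the services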


Operations Console GUI - Dashboard tab


Job Activity section: Monitor job activity
Jobs running within the current time range
Summaries of jobs recently completed.
Click on the Last: N minutes links to specify time ranges
Operating System Resources section: Displays CPU usage
and the amount of available memory
Engine Status section: Specifies the status of the engine
services, including the Operations Console services
To the right of each section heading is a refresh icon
Use it to refresh the display for the most current information

Figure 9-11. Operations Console GUI - Dashboard tab KM5021.0

Notes:
The Operations Console opens to the Dashboard tab, which contains three sections of
information. The Job Activity section shows which jobs are currently running and their
statuses within a time range, for example, last 10 minutes.
The Operating System Resources section displays the CPU usage and free memory that
is currently available within a time range.
The Engine Status section displays the current status of engine services, including the
Operational Console services and WLM (Workload Management).


Dashboard GUI
(Screenshot: the Dashboard tab, with callouts for the Job Activity, Engine Status, CPU usage, and free memory sections.)

Figure 9-12. Dashboard GUI KM5021.0

Notes:
This graphic shows the Dashboard tab. The sections described on the previous page are
highlighted.
Notice the Refresh icon located in the top right corner of each section. The information
displayed is updated at a certain interval, which is configurable in the DSODBConfig.cfg
file. Click the Refresh button to manually refresh the display.


Operations Console GUI - Projects tab


Navigation section: Lists projects for the currently selected
engine
If your domain contains multiple engines, you can select which one to
display
You can select which projects to display for the engine
Select a project to display information about the project
Contents statistics
Environment variable settings
Select a specific job to display information about the job
Job runs
Select a job run to view details about it, including its job log messages
Select multiple job runs to compare details about them, including
resource usage and performance

Figure 9-13. Operations Console GUI - Projects tab KM5021.0

Notes:
There are several other tabs in addition to the Dashboard tab. You use the Projects tab to
display information about DataStage projects for a selected engine in the domain. You can
view the contents of the Repository window for each project, which displays the objects
the project contains. You can also get some statistical information about these objects, for
example, number of jobs in the project.
The environment variables and their current settings are also displayed.
You can get additional information about an object, for example a DataStage job, by
selecting the object. The information is then displayed in the right panel.


Projects GUI
(Screenshot: the Projects tab, with callouts for the projects filter, the selected job sequence in the project, the Run button, and the list of previous job runs.)

Figure 9-14. Projects GUI KM5021.0

Notes:
You can also run DataStage jobs from the Operations Console. In this example, the
seqJobs job sequence has been selected. In the bottom panel, the previous job runs are
listed. The top panel provides information about the selected job sequence, including
information about its last job run.
Click the View Job Design button at the top to view the job diagram from the Operations
Console. Click the Run button at the top to run the job from the Operations Console. You
will be prompted to specify the jobs parameters.


Example - Run and monitor a job sequence

On the Projects tab select a job sequence, seqJobs
Click the Run button to display the Run Job window
Edit the job parameter values
Click Run
(Screenshot: the Run Job window, with callouts for the job parameters and the Run button.)

Figure 9-15. Example - Run and monitor a job sequence KM5021.0

Notes:
In this example, we will run the seqJobs job sequence and monitor it as it is running from
the Operations Console. After editing the job parameters as desired, click the Run button
to start the job. Next move to the Dashboard tab to view its activity and its resource usage. This is
shown on the next page.


View the job activity on the Dashboard


View the job activity spike
View the finished job runs
View the resource usage
(Screenshot: the Dashboard tab after the run, with callouts for the job activity spike, the list of jobs, and the CPU spike.)

Figure 9-16. View the job activity on the Dashboard KM5021.0

Notes:
Notice that the activity spiked as the job sequence and the jobs it contains ran. The bar
graph at the bottom of the Job Activity panel indicates that all jobs within the current time
period have finished without errors or warnings. You can click on the Finished link for
details about the jobs that finished.
Notice that the CPU activity also spiked at the times the jobs were running. According to
the graph CPU usage went up to about 12%.
Although it is not visible in this graphic, you can also view the amount of free memory that
was available at the time the jobs ran. The graph depicts both free physical memory as well
as free virtual memory.


Job run details


(Screenshots: the list of finished jobs with its View Details links, and the Run Details window showing the Log Messages tab.)

Figure 9-17. Job run details KM5021.0

Notes:
The top graphic lists the jobs that finished during the current time period. This graphic was
displayed by clicking the Finished link. Click the View Details link next to a job, for
example, seqJobs, to view details about the job run. The Run Details window for
seqJobs is shown in the bottom graphic. The window has several tabs. Shown here is the
Log Messages tab, which displays the job log messages that were generated when the job
ran. The Full Messages box has been checked to display the full set of messages.
The Performance tab displays information similar to what you see on the Dashboard tab,
including CPU and free memory usage.


Workload management
Enabled in the DSODBConfig.cfg file
Set WLMON=1
The maximum number of running jobs can be set
When the maximum number of running jobs is reached, jobs wait in queues until
slots are available
Queues are prioritized:
High priority queues: Jobs in this queue have the highest priority of getting the
next available slot
Medium priority queues
Low priority queues: Jobs in this queue have the lowest priority of getting the next
available slot
Special queues exist for Information Analyzer (IA) and Information Services
Director (ISD)
The priority of jobs running in these queues can be specified: Low, Medium, High
When jobs are run, a priority queue can be selected
The default queue is specified in DataStage Administrator

Figure 9-18. Workload management KM5021.0

Notes:
Workload management (WLM) is also managed through the Operations Console.
Workload management is enabled in the DSODBConfig.cfg file. To enable it, set
WLMON=1.
When WLM is turned on, the maximum number of running jobs can be set and prioritized. If
too many jobs are running at one time, then the resources (CPU, memory) are exhausted,
and none of the jobs run efficiently. By setting the maximum number of jobs low enough,
this situation is prevented.
The maximum number of jobs running can also be constrained by CPU usage and
memory usage. For example, CPU usage can be constrained so that jobs will only run
when CPU usage is below 80%.
Jobs that cannot run because the maximum number has been reached wait in queues until
run slots become available. These queues can be prioritized. Jobs that are waiting in the
high priority queue have the greatest likelihood of getting the next available run slot.
When a job is run, the queue that it will wait in if necessary is selected.
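As with monitoring data collection, turning workload management on is a single line in DSODBConfig.cfg:

    WLMON=1

The maximum job count, the CPU and memory limits, and the queue priorities themselves are then managed from the Workload Management and Queue Management tabs of the Operations Console, as described on the following pages.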


Workload Management tab


Two jobs are waiting in a medium priority queue
In addition to maximum job count, system limits can be placed:
CPU usage
Memory usage
Job start speed
(Screenshot: the Workload Management tab, with callouts for the maximum number of running jobs and the queued jobs.)

Figure 9-19. Workload Management tab KM5021.0

Notes:
This graphic shows the Workload Management tab. In this example the maximum number
of running jobs has been set (artificially low) to 1. This means that only one job can run at a
time. Two jobs are waiting to run in a medium priority queue.
Notice in the graphic the list of available queues. Notice that some of these queues are
special purpose queues. There is a queue for Information Analyzer (IA) jobs, one for
Information Services Director (ISD) job, and one for Data Click jobs, as well as the three
general queues with different priorities.


Queue Management tab


Specify queue priorities: Low, Medium, High
Specify queue priority rule
Priority Weight (default): Priority based on queue priority and time in the queue
Elapsed Time: Maximum time in queue before running
Job Run Ratio: Specified ratios between priority queues
Example: High to Medium = 3 to 1, meaning 3 high priority jobs run for each medium
priority job

Figure 9-20. Queue Management tab KM5021.0

Notes:
You can use the Queue Management tab to specify the queue priorities. Different priority
rules can be used. In this example the queues are weighted according to the Priority
Weight rule. This rule bases priority on queue priority and time in the queue. This means
that if two jobs have been waiting in a queue for the same amount of time, and one of the
jobs is in a Low priority queue and one is a Medium priority queue, then the job in the
Medium priority queue will get the next available job slot.


Performance Analysis


Figure 9-21. Performance Analysis KM5021.0

Notes:


Performance analysis in the past


Use the Director monitor to watch the throughput (rows/sec)
during a job run
Compare job run durations
Turn on APT_PM_PLAYER_TIMING and
APT_PM_PLAYER_MEMORY to report player calls and
memory allocation

How this fails you


Long running jobs could not be watched for record throughput changes throughout the job run
The job monitor didn't allow recording for playback
Job monitor throughput rates included time waiting for data
Could not determine what was happening on the machines

Figure 9-22. Performance analysis in the past KM5021.0

Notes:
The DataStage Director client contains a performance monitoring tool. To run it, select a
job, for example seqJob2, and then click Tools>New Monitor. As the job runs, the monitor
will display row throughput (rows/sec) for each stage in each partition.
There are several difficulties in using the Director Monitor to monitor the performance of
jobs. One major difficulty arises when monitoring long-running jobs. The row throughput may
vary significantly over the course of the job run. It may be high in the beginning, but slow
down dramatically at a later time. It would be nice to have a record of these changes
throughout the job run that could be reviewed.
Another limitation of the Director Monitor is that it does not measure the system resources
while the job is running.


Performance Analyzer
Visualization tool that provides insight into job runtime behavior
Offers several categories of visualizations:
Record throughput (rows/sec)
CPU utilization
Job timing
Job memory utilization
Physical machine utilization
Performance data to be visualized can be:
Filtered in selected ways, including
Hide startup processes
Hide license operators
Hide inserted operators
Isolated to selected stages (operators), partitions, and phases
Charts can be saved and printed


Figure 9-23. Performance Analyzer KM5021.0

Notes:
Performance Analyzer is a visualization tool that provides insight into job runtime behavior.
In addition to record throughput, it measures CPU utilization, job timing, memory utilization,
and physical machine utilization. Several different types of graphs are available for viewing
these statistics.


Enabling performance data recording


Open the job in Designer
Select Record job performance data in Job Properties
Run your job. Performance collection has little impact on overall job performance
To view the results, click the Performance Analysis icon in Designer

Figure 9-24. Enabling performance data recording KM5021.0

Notes:
To measure the performance of a job, open the job in Designer. On the Execution tab of
the Job Properties window, select Record job performance data in Job Properties. This
tells DataStage to collect performance data when the job runs. (This option can also be
selected on the General tab of the Job Run Options window.)
When the job runs, the performance data is collected. This collection has little impact on
the overall performance of the job.
After the job runs click the Performance Analysis icon. This opens the Performance
Analysis window for the job. The job can be run multiple times for comparison. The data
from each run is separately collected and stored.


Example job


Figure 9-25. Example job KM5021.0

Notes:
This shows an example job. It has three input Row Generator stages going to a Funnel
stage, then a Sort stage, then a Remove Duplicates stage, then to a Switch stage to write
the data out to two Data Set stages.


Job timeline chart


(Screenshot: the Job Timeline chart, with callouts for the stages in the job and the time each stage operated.)

Figure 9-26. Job timeline chart KM5021.0

Notes:
This graphic shows the Job Timeline chart.
The Job Timeline chart breaks down the job run in terms of how long job processes take.
Here we see how long each player process takes. A player process is a process
associated with an operator (stage) running on a node (partition).
In this example we are viewing the operators running in partition 0. There are tabs at the
top of the window to toggle from one partition to another.
The timeline covers the total time the job runs. Here we see that some stages ran for the
duration of the job; others ran for a portion of the time. In particular, the three Row
Generator stages ran for just a portion of the job run.


Viewing by partition
Notice that the Row Generator stages are not displayed
Because they are running sequentially only in Partition 0
(Screenshot: the Job Timeline chart for the second partition, with a callout for viewing by partition.)

Figure 9-27. Viewing by partition KM5021.0

Notes:
In this example, the second partition has been selected. Notice that the Row Generator
stages are not displayed. This is because the Row Generator stages run sequentially, and
therefore in only one partition. By contrast, Sort stage operators run in both partitions in
parallel.


Record throughput
Place the mouse cursor over a line at a particular point to
display the name of the stage and its throughput at that point

(Screenshot: the Record Throughput chart, with callouts for the rows-per-second axis and for running the mouse over a line to identify the stage represented.)

Figure 9-28. Record throughput KM5021.0

Notes:
Select the Record Throughput chart to view the record throughput (rows/sec) of each
operator (stage) in each partition. Individual lines represent individual operators. You can
run your mouse over a line to display the name of the stage and the throughput at that point
in time.
Notice that we can view how the throughput of a stage changes over the job run. Some
stages have a fairly constant throughput; others change dramatically over the course of the
job run.


Stage CPU usage


Percentage of CPU relative to each stage
Notice that the Sort stage uses more CPU than any of the other
stages

(Screenshot: a pie chart of CPU usage by stage, with a callout for the Sort stage's CPU usage.)

Figure 9-29. Stage CPU usage KM5021.0

Notes:
There are different types of charts you can use to display the data.
This one shows CPU usage on a pie chart: the amount of CPU used by each stage
as a percentage of the total CPU usage. Notice that in this example the Sort stage uses
more of the CPU than the other stages.
This kind of information is invaluable when attempting to improve the performance of a job
with a different design. Clearly removing unnecessary sorts will have a major impact on
performance.


Displaying selected stages

(Screenshot: the Stages folder and the Job Tree and Partitions tabs, with callouts for selecting stages in a partition to display, selecting partitions to display, and selecting the stages to display.)

Figure 9-30. Displaying selected stages KM5021.0

Notes:
In the Stages folder you can select just the stages whose throughput you want to display.
Here just the Remove Duplicates stage is displayed. Stage selection can be done for any
chart. By default all stages are displayed.
You can also use the Job Tree and Partitions tab to select the results to display. The Job
Tree tab allows you to select stages in partitions to display. The Partitions tab allows you
to select partitions to display.
Similarly, the Phases folder (not shown) allows you to choose which phases of a process to
display or filter out: Initialization, RunLocally(), and Post processing.


Filters
By default, the activity of a number of processes and operators
is hidden
Allows you to focus on the comparable performance of the stages


Figure 9-31. Filters KM5021.0

Notes:
This graphic shows the Filters folder. By default all filters are enabled so that the activity of
a number of startup and overhead processes and operators is hidden.
The performance impact of these startup processes is less for longer running jobs and for
jobs processing large amounts of data. Comparisons of different job runs on different
amounts of data are more accurate if the impact of these processes is hidden.


Resource Estimator


Figure 9-32. Resource Estimator KM5021.0

Notes:


Resource Estimation tool


Estimate and predict resource utilization of parallel job runs
Models
Estimate the system resources for a job
Scratch space
Disk space
CPU
Data set throughput
Two types of models:
Static
Based on a generated data sample from the column definitions in the job design at compile
time
Limited to estimates of scratch and disk space
Dynamic
Based on a sampling of the actual input data at run time
Input projection
Estimates the size of all data sources in a job


Figure 9-33. Resource Estimation tool KM5021.0

Notes:
Use the Resource Estimation tool to estimate and predict resource utilization of parallel job
runs. The tool creates models to estimate the system resources for a job. There are two
types of models: Static and Dynamic. The former is based on a generated data sample
from the column definitions in the job design at compile time. The latter is based on a
sampling of the actual input data at run time.


Creating a model
Open a job in Designer
Open the Resource Estimation window
To create a model, click the Create Resource Model toolbar
button, then specify:
Name
Type of model: static or dynamic
For dynamic models, specify the data sampling method:
Automatic: Based on a set sample size according to stage type
Data range: Based on a specified number of records
You can also look at the actual resource usage for the input used
Called the actual model
Click Generate


Figure 9-34. Creating a model KM5021.0

Notes:
A resource estimation consists of a model of estimated resources. To create a model for a
job, first open the job in Designer. Then open the Resource Estimation window. You can
create either a static model or a dynamic model. After the model is generated, it will be
listed in the Models folder on the left panel of the window.


Information the model contains


Disk space, Scratch space
Static model estimates are based on worst-case scenarios
Maximum values are used: For example, the maximum of a Varchar() field is used in
the calculation
CPU utilization
Not estimated in the static model
Number of output records
Static model estimates are based on best-case scenarios
Assumes no records are dropped anywhere
Input records reach every other stage in the job, that is, aren't filtered out
Dynamic model estimates are based on how records are processed in the sample
Records can get dropped or filtered
Record size
Static model estimates are based on the column definitions
Dynamic estimates are based on the actual record size in the sample


Figure 9-35. Information the model contains KM5021.0

Notes:
The model contains several pieces of resource information. The model estimates both disk
space and scratch space. The static model estimates are based on worst-case scenarios.
For example, suppose the job writes rows of data out to a file. The size of the row that is
physically written may vary depending on the actual data written out in variable length
fields. The static model bases its estimates on the maximum possible size of the data. The
dynamic model, on the other hand, would base its estimates on a sample of the data it
runs.
CPU utilization cannot be determined unless the job is run on a sample of data. So CPU
utilization is not estimated in the static model.
The static model bases its estimates of the number of output records on the best-case
scenario given the size of the input (number of input records). For example, suppose there
are 1000 input records. In an actual job run, some of these records may not make it to the
output file. A constraint in a Transformer might filter some of these rows out. The static
model assumes that every input row makes it through the job. A dynamic model would
base its results on what actually happens during a job run.


Projections
Estimate based on a specified size of the input data sources
within the context of a given model
Projections are applied to all existing models (except the
actual model)
Creating a projection:
Click the Projection button in the Resource Estimation toolbar
Name
Specify the input size
Number of records
Megabytes
Use previous projection numbers


Figure 9-36. Projections KM5021.0

Notes:
Questions like the following often arise: How much disk space will be needed to run this job? How
much will be needed if our current number of input records is multiplied tenfold? Projections
can be used to help answer these questions.
A projection estimates resource usage based on a specified size of the input data sources
within the context of a given model. The variable you can change is the amount of input.
You can specify an input size based on number of records or megabytes of input data.


Resource Estimation window

(Screenshot: the Resource Estimation window, with callouts for the Models folder, the automatically generated static model, and the Auto sampling type, which is based on a standard set for each type of stage.)

Figure 9-37. Resource Estimation window KM5021.0

Notes:
This graphic shows the Resource Estimation window. In the Models folder is the static
model that was automatically generated for the job when the Resource Estimation
window was opened.
The Model Overview window lists the input data size the model is based on. The sampling
type is listed for the three input Row Generator stages. The sampling type is listed as Auto.
Each type of stage has a standard sampling method that is used. This type indicates that
the standard type for the stage was used.


Input Projections folder

(Screenshot: the Input Projections folder, with a callout for the projected number of input records.)

Figure 9-38. Input Projections folder KM5021.0

Notes:
The Input Projections folder contains the generated projections. Here the projection
projects the number of input records that will be processed by each input stage given its
type and property settings.


Job Tree folder

(Screenshot: the Job Tree folder, with callouts for the total usage, the job stages or components, and the stage usage estimates.)

Figure 9-39. Job Tree folder KM5021.0

Notes:
This graphic displays the Job Tree folder. The Job Tree folder lists all the components in
the job and their estimated resource usage.
In this example, the model projects that the Sort stage will consume roughly 175,000 MB of
scratch disk space. The model also projects that the target Data Set stages will each
consume a little over 100,000 MB of disk.
Notice also the reference to DataSet1 and DataSet2 in the stage list. These do not refer to
the target Data Set stages that the job is writing to. These are in-memory data sets that are
used internally by the job. Since they are in-memory, they do not consume any disk
resources.


Stages folder

(Screen capture callouts: select stage; resource estimates by partition; throughput sizes based on data size or number of records)

Figure 9-40. Stages folder KM5021.0

Notes:
In the Stages folder, you can select particular stages for which to view the estimates. In this example, the Sort stage has been selected. The top right panel lists its resource usage
(scratch disk usage) by partition. The lower right panel lists input and output throughput by
partition. In other words, this lists the amount of data the stage processes during input and
during output.


Charts folder

(Screen capture callouts: disk requirements; Data Set stage requirements)


Figure 9-41. Charts folder KM5021.0

Notes:
In the Charts folder, you can select a particular chart that you want to view. Here the Disk
Requirements chart has been selected as an example.


Creating a model
Here we are creating a Dynamic model based on samples of actual data
Auto lets the tool decide the sample
Uncheck to specify your own sample
(Screen capture callouts: model name; model type; clear to specify sampling range; generate)


Figure 9-42. Creating a model KM5021.0

Notes:
Click the Create Resource Model icon in the toolbar to create a new model, either static or
dynamic. In the Model Name folder, specify a name for the new model. Then select its type
(static, dynamic) in the Model Type box.
In this example, the Dynamic model type has been selected. By default, the sampling
method is Auto. Remove the check to manually specify a sampling range. In this example,
the sample input for the third Row Generator stage consists of the first 500 records.


Creating a projection
A projection allows you to estimate resource usage of stages running in a partition based on specified input numbers
(Screen capture callouts: projection name; input units, amount of MB or number of input records)


Figure 9-43. Creating a projection KM5021.0

Notes:
A projection allows you to estimate resource usage based on a projected amount of input
data. To create a projection specify the name of the projection and the input unit type. You
can specify the input units as megabytes or number of records.


Checkpoint
1. What is the difference between a job sequence and an
ordinary DataStage job?
2. What command is used to start the Operations Console
services?
3. If Workload Management is turned on, what determines the
job's priority in taking the next available slot to run?
4. You can view the throughput (rows/sec) of a job on the
Designer canvas as it runs or in Director. What is the
advantage of monitoring the throughput of a job using the
Performance Analyzer tool?


Figure 9-44. Checkpoint KM5021.0

Notes:
Write your answers here:


Exercises Unit 09
In this lab exercise, you will:
Monitor jobs in DataStage Director
Start the Operations Console services
Monitor jobs using the DataStage Operations Console
Explore Workload Manager
Use Performance Analyzer to analyze the performance of a job
Estimate the resources of a job


Figure 9-45. Exercises Unit 09 KM5021.0

Notes:


Unit summary
Having completed this unit, you should be able to:
Monitor the DataStage job log
Use the DataStage and QualityStage Operations Console
Manage workload
Use the Performance Analyzer tool
Use the Resource Estimator tool


Figure 9-46. Unit summary KM5021.0

Notes:


Unit 10. Metadata Asset Management

What this unit is about


This unit describes Information Server tools for managing metadata
assets, including istool, Information Server Manager, and Metadata
Asset Manager.

What you should be able to do


After completing this unit, you should be able to:
Archive and package metadata assets using istool
Deploy and manage metadata assets using Information Server
Manager
Import metadata assets using Metadata Asset Manager
Browse metadata assets using Metadata Asset Manager
Manage duplicate metadata assets using Metadata Asset Manager

How you will check your progress


Lab exercises and checkpoint questions


Unit objectives
After completing this unit, you should be able to:
Archive and package metadata assets using istool
Deploy and manage metadata assets using Information Server
Manager
Import metadata assets using Metadata Asset Manager
Browse metadata assets using Metadata Asset Manager
Manage duplicate metadata assets using Metadata Asset
Manager


Figure 10-1. Unit objectives KM5021.0

Notes:


Asset Interchange


Figure 10-2. Asset Interchange KM5021.0

Notes:


What is asset interchange?


Export of metadata from an Information Server repository followed by
the import of this exported metadata into the same or another repository
You specify a set of related assets in the source repository
Then export them to the file system and create an archive
This archive is used to import the group of related assets into the target repository
Relationships to other assets in the source can be optionally carried over to the
target repository
istool can be used for asset interchange on both Client and Engine tiers
On Client, invoke IS Command Line Interface client
On Server, istool is located in /InformationServer/Clients/istools/cli directory
istool commands are available to export metadata assets produced by
all IS products
istool commands are also available for administrative and management
metadata
Security user / group roles
Reports

Figure 10-3. What is asset interchange? KM5021.0

Notes:
Asset interchange consists of the export of metadata from an Information Server repository
followed by the import of this exported metadata into the same or another repository. You
specify a set of related assets in the source repository to export to an archive file. For the
import you specify a set of related assets to import from an archive file.
The istool can be used to perform the interchange.


Uses of asset interchange


Moving projects from development to test
Moving just a subset of assets, rather than your entire project
Moving projects from test to production
Manage archives in source control applications
You can build the Asset Interchange commands into scripts to
facilitate the routine back-up or movement of large groups of
assets


Figure 10-4. Uses of asset interchange KM5021.0

Notes:
There are many uses for asset interchange. Some major uses are listed here.
The uses can be divided into two categories. One type of use involves moving metadata
assets from one repository to a different repository. These include moving assets from a
test system to a production system or from a development system to a test system.
Another type of use involves moving metadata assets from a repository to a file system and
then later back into the same repository. This might be done to back up a set of assets for
later recovery, or it might be done for archiving or versioning.


Invoking the asset interchange


Command-line interface
Syntax of the istool command is:
<command> <authentication_parameters> <archive> [archive parameters] [generic_params] [command specific_parameters]
istool commands: export, import, build package, deploy package
Generic parameters: -help, -verbose, -silent
Authentication parameters: -domain, -username, -password
GUI interface for DataStage
Information Server Manager


Figure 10-5. Invoking the asset interchange KM5021.0

Notes:
The istool utility is very powerful. It supports four basic commands: export, import, build
package, deploy package. The build package and deploy package functionality has
been captured into the Information Server Manager tool. This tool is discussed later in this
unit. Our focus in this topic is on the import and export functionality.
There are two common parameters in the istool command. You will always need to specify
authentication, that is, the services domain you are logging into and the user ID and
password you are using to do so. Secondly, you will always be specifying a path to the
archive file. The archive file is where the exported assets are or will be stored on the file
system, during an import or export.
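To make the structure concrete, here is a sketch of an export invocation. The host name, credentials, and archive path are placeholders rather than values from this course environment, and exact option spellings should be confirmed against the istool command reference for your release:

istool export -domain edserver.ibm.com:9080 -username isadmin -password mypassword -archive "C:\exports\assets.isx" ...

An import uses the same authentication and archive parameters, with istool import in place of istool export and the command-specific parameters (such as -datastage) appended.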


Asset interchange archive


Uses an archive format called ISX
Java archive that can be viewed with the jar utility provided with the
Java SDK, WinZip, and other archiving tools
Archive contains a manifest file and a set of files containing the
serialized assets


Figure 10-6. Asset interchange archive KM5021.0

Notes:
The istool command uses an archive format called ISX. The archive contains a manifest
file and a set of files containing the serialized assets.
The archive file is a compressed, non-proprietary file. Its contents can be viewed by
standard tools such as WinZip and the Java SDK.
An archive consists of a manifest file, which describes the contents, and a set of files that
contain the assets.
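Because the archive is a standard compressed file, its contents can be listed without importing it. For example, using the jar utility from the Java SDK (the archive name here is only an illustration):

jar -tf assets.isx

The listing shows the manifest file together with the files that hold the serialized assets.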


DataStage export / import


istool export <connection parms> -datastage '<ds_options> <ds_operands>'
istool import <connection parms> -datastage '<target_project_id>'
Comparison with DataStage DSX export/import, invoked in
Designer
ISX maintains shared table relationships
Shared table relationships are lost in DSX exports

ISX can export and import BuildOp executables


BuildOps are custom DataStage stages
DSX only supports the interchange of BuildOp design components
Not executable components


Figure 10-7. DataStage export / import KM5021.0

Notes:
In this unit we will examine the istool import and export commands for DataStage. The
commands will be similar for other IS products. However, different product commands
support different parameters and options.
The -datastage keyword is used when importing and exporting DataStage assets. It is
followed by options and parameters specific to DataStage surrounded by single quotes.
DataStage Designer supports a type of export/import using a proprietary dsx format. In many cases, this type of import is sufficient, but it is only available for DataStage, and istool has some additional options. One limitation is that shared table relationships are lost in dsx imports. Table definitions in DataStage, which describe the format of files and tables, can be stored locally to DataStage or made shared, so that they are available to other Information Server products. Shared table relationships are not preserved across dsx imports and exports.


Specifying DataStage assets in istool


An asset path identifies the assets to be exported
Format: host:portnumber/project/folder/.../folder/asset_name.asset_type
Asset types include:
Parallel job: pjb
Server job: sjb
Table definition: tbd
Parameter Set: pst
Wildcard characters
Use asterisk (*) in element names: 0 or more characters
Examples:
/server/project/folder/*.pjb: All parallel jobs in folder
/server/project/folder/*seq.pjb: All parallel jobs in folder ending with seq
/server/project/folder/*.*: All objects in folder


Figure 10-8. Specifying DataStage assets in istool KM5021.0

Notes:
In the istool export or import commands, you specify an "asset path" to identify the
assets to be exported.
Different keywords are used to identify different types of assets. For example, the pjb
keyword identifies DataStage parallel jobs. The path can also include the asterisk (*) as a
wildcard character. So, for example, *.pjb would refer to all parallel jobs within the path
folder. The path identifies the DataStage server, the project hosted by the server, and a
folder within the project.


Security export / import command


istool export <connection_parameters> [generic parameters] <archive> -security [security specific parameters]
Use to export IS users and groups
Users and groups must be exported using separate commands
Users and groups are exported by name
Can include related metadata such as credential mappings


Figure 10-9. Security export / import command KM5021.0

Notes:
The istool command can also be used to import and export security assets, including
users and groups and their authorization roles. The -security keyword is used in the istool
command to specify users and groups to import or export as part of the archive. Related
metadata such as credential mappings can also be included.


Example: Exporting parallel jobs in a project folder


Exports all parallel jobs in project DSProject found in the _Training_ISAdmin/Jobs folder
*.pjb designates all parallel jobs
(Screen capture callouts: istool export command; export file; all parallel jobs)


Figure 10-10. Example: Exporting parallel jobs in a project folder KM5021.0

Notes:
In this example, the istool command is used to export parallel jobs in a DataStage project
folder named ISAdminFiles. The folder is in a project named DSProject, hosted by the
Engine system edserver.ibm.com. *.pjb identifies all parallel jobs in that project folder.
Here, the command exports to an archive file identified by the -archive parameter. The asset path is specified in the string following the -datastage parameter.
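The full command shown in the screen capture is not reproduced in these notes, but an export of this kind generally has the following shape. The archive path and credentials are placeholders, the asset path follows the format described on the previous page, and quoting conventions differ between Windows and UNIX shells:

istool export -domain edserver.ibm.com:9080 -username isadmin -password mypassword -archive "C:\exports\ISAdminJobs.isx" -datastage '"edserver.ibm.com/DSProject/_Training_ISAdmin/Jobs/*.pjb"'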


Import example for DataStage assets


Imports assets in archive file into the specified DataStage project
Use istool import command
-replace option is used to overwrite if the assets already exist
(Screen capture callouts: istool import command; archive file; -replace option; import project)


Figure 10-11. Import example for DataStage assets KM5021.0

Notes:
In this example, the istool command is used to import an archive file into a DataStage
project. Key parts of the command are highlighted in the graphic.
Here, the command imports from the archive file identified by the -archive parameter. The
DataStage project to import into is specified by the string following the -datastage
parameter.
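A command of roughly this shape performs such an import; the names are placeholders, and the placement of the -replace option should be checked against the istool import reference for your release:

istool import -domain edserver.ibm.com:9080 -username isadmin -password mypassword -archive "C:\exports\ISAdminJobs.isx" -replace -datastage '"edserver.ibm.com/DSProject"'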


Example: Exporting security assets


Exports user student along with his or her credentials and roles
Requires Common Metadata Importer Suite role
-userident values can include wildcards
(Screen capture callouts: istool export command; export file; users to export; security export)


Figure 10-12. Example: Exporting security assets KM5021.0

Notes:
In this example, the istool command is used to export security assets. Key parts of the
command are highlighted in the graphic.
Here, the command is used to export to a file identified by the -archive parameter. The
security assets are specified in the string following the -security parameter. In the
command, the -securityUser -userident identifies the name of the user to be exported.
The related assets include the user's roles and credentials.
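A command along these lines exports a single user. The archive path is a placeholder, and the additional sub-options that pull in roles and credential mappings are not shown here and should be taken from the istool security command reference:

istool export -domain edserver.ibm.com:9080 -username isadmin -password mypassword -archive "C:\exports\security.isx" -security '-securityUser -userident student'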


Information Server Manager


Figure 10-13. Information Server Manager KM5021.0

Notes:


Information Server Manager


Used to move, deploy, and manage DataStage / QualityStage
assets
Create packages of assets in one Repository (Development / Test)
that can be deployed on a different Repository (Production)
Packages can be built and deployed on an iterative basis
Perform export / import of DataStage / QualityStage assets
Select objects for export
Specify archive


Figure 10-14. Information Server Manager KM5021.0

Notes:
The istool command can be used to build and deploy assets. However, for DataStage
assets, Information Server Manager provides a GUI tool for doing this. Using Information
Server Manager, you can create packages of assets in one repository (Development / Test)
that can be deployed on a different repository (Production).
You can also use Information Server Manager to import and export DataStage assets using
the isx format.


Deploying packages
Selecting the assets
Select the domain
To add a domain, right-click in the Repository window
Log into the domain with IS Administrator ID
Right-click over Packages and then click New>Package to open a
new package
Building the package
Select the assets for the package
Drag them to the Package window
Click Build in the Package window
Deploying the package
Click Deploy in the Package window


Figure 10-15. Deploying packages KM5021.0

Notes:
There are two steps involved in deploying a package of DataStage assets: Build the
package, and then deploy the package.
To build the package, you select the assets from the Repository window. Within
DataStage Designer, you only see the assets in a single project. In Information Server
Manager, you can view assets from any projects within the domain.
When you create a build, the set of selected assets are saved and available for
deployment. You can create any number of builds as more assets become available.
Any build can be deployed in any project in any Engine server in the domain. You can also
back out of a deployment by deleting the objects in the project, and then deploying an
earlier build in its place.


Information Server Manager packages


(Screen capture callouts: drag assets to package; Package panel; build package)


Figure 10-16. Information Server Manager packages KM5021.0

Notes:
To add a DataStage domain, right-click in the Repository window. Then log into the domain
with an IS Administrator user ID.
To specify the package, drag the DataStage assets from the Repository window to the
Package window. Notice that the package can include any and all types of DataStage
objects, including jobs, sequences, table definitions, parameter sets, and so on.
After you define the package, click the Build button to add the package to the list of builds.


Deploying the package

(Screen capture callouts: select build; Deploy; select Engine project)


Figure 10-17. Deploying the package KM5021.0

Notes:
To deploy a build, select the build in the list. Click the Deploy button, and then select the
Engine project in which to deploy the package. In this example, the package named
ISAdmin_Build2 is being deployed to a DataStage project named DSProject on the
EDSERVER.IBM.COM engine.


Incremental builds
When a package changes you can create new builds
Any build can be deployed
Can roll back to previous builds
(Screen capture callouts: DataStage project; latest build; earlier build)

Figure 10-18. Incremental builds KM5021.0

Notes:
You may at any time modify an existing package by adding and removing assets and saving it as a new build. You can then deploy the new build or, if needed, roll back to a previous build.
Suppose, for example, that Build1 is working well in production. Some enhancements are
made to some of the jobs and a new build, Build2, is created. When Build2 goes into
production, some problems occur. While those problems are being fixed, you can roll back
production to Build1.


Exporting and importing engine assets


Select assets
Right-click and then click Export
View Export Archive and then click Export button
(Screen capture callouts: select objects; Export; archive contents)


Figure 10-19. Exporting and importing engine assets KM5021.0

Notes:
You can also use Information Server Manager to import and export DataStage assets. Information Server Manager provides a GUI interface to the import/export functionality of istool.
The export process is similar to creating a build. You select the assets for the package from
the Repository window. Then click Export to browse for a file location for the archive file.


Metadata Asset Management


Figure 10-20. Metadata Asset Management KM5021.0

Notes:


Metadata asset management


Information Server metadata assets are stored in the XMETA
Repository (also called the Metadata Repository or Shared
Metadata Repository)
Metadata assets include assets produced and consumed by
Information Server products and components
Produced assets include: DataStage jobs, FastTrack mapping
specifications, Business Glossary terms, Information Server reports
Consumed assets include: table definitions, file descriptions, logical
model entities and attributes, BI tool metadata
Repository metadata stores different types of metadata
Business metadata: business terms, business rule descriptions,
mapping specifications, stewards
Technical metadata: DataStage/QualityStage jobs and their
components
Operational metadata

Figure 10-21. Metadata asset management KM5021.0

Notes:
The Information Server Repository (XMETA) stores several different types of metadata,
including business metadata, technical metadata, and operational metadata. Some of the
metadata is metadata produced by Information Server products, for example, DataStage
jobs, which are produced by DataStage. Other metadata is consumed by Information Server products, such as file descriptions of files read by DataStage jobs.


Common Model and its extensions


Defines the metadata assets that can be stored in the IS
Repository
Common Model is described in Metadata Workbench on the
Advanced>Model View tab
Categories include:
Common Model: Core model
Business Intelligence: Extension
Mapping Project: Extension
Mapping Specification: Extension
Transformation: Extension
Operational Metadata: Extension
Common Model describes both metadata produced by IS
applications and metadata consumed by IS applications

Figure 10-22. Common Model and its extensions KM5021.0

Notes:
There is a metadata model, called the Common Model that defines the metadata assets
that can be stored in the Information Server Repository and their relationships to other
metadata assets.
You can view the Common Model within Metadata Workbench, on the Advanced>Model
View tab. Here, the objects in the Common Model and its extensions are listed and
documented.
The Common Model consists of a core model of objects and a number of extensions to
define and capture objects not found in the Common Model. Some of these extensions are
specific to Information Server products such as DataStage (Transformation model) and
FastTrack (Mapping Specification model). Others, such as the Business Intelligence
model, apply to objects that can be imported into the Repository for consumption by
Information Server products.


External metadata
Common Model describes both metadata produced by IS
applications and external metadata consumed
Integrated with IS-produced metadata following the Common
Model format
Source of external metadata
Many types of external metadata can be imported into the IS
Repository using Metadata Asset Manager
Functionality within IS products
Hosts (systems that manage databases and other data resources) can
be imported into the IS Repository in FastTrack
Databases, database tables, schemas can be imported into the IS
Repository in FastTrack
Data files and structures can be imported into the IS Repository in
DataStage
Business categories and terms can be imported into the IS Repository in
Business Glossary

Figure 10-23. External metadata KM5021.0

Notes:
The Common Model defines the metadata assets that are recognized by Information Server. These include metadata assets that are produced by Information Server and metadata that is imported into the Information Server repository to be consumed by Information Server products.
There are many sources of this external metadata. Some of it can be imported into the Repository using functionality within Information Server products. For
example, Hosts (systems that manage databases) and database objects can be imported
in FastTrack and Information Analyzer. Business categories and terms can be imported in
Business Glossary.
Metadata Asset Manager can also be used to import external metadata, and there are
types of metadata assets that can only be imported using Metadata Asset Manager.


Metadata Workbench Model View tab

(Screen capture callouts: Model View; Common Model; Host asset; details displayed)

Figure 10-24. Metadata Workbench Model View tab KM5021.0

Notes:
This graphic shows the Advanced > Model View tab in Metadata Workbench. In the left
panel you see a list of the Common Model and its extension models. Expand the model
folder to display the metadata assets defined in the model. In this graphic, the Common
Model objects are listed in the left panel. Select an object to display its definition in the right
panel.
In this example, the Host asset has been selected. Its definition is displayed in the right
panel. This includes a description of the class, and a list of its properties and relationships.


Data resource metadata asset examples


Host
Computer that hosts a database or file
Database
A storage collection of data, organized into subsets of data called
schemas
Contains database tables
Includes information about the database and DataStage jobs that
access it
Data File
A storage collection of data organized into data structures of fields
Includes information about the main properties of the data file as well
as information about the DataStage jobs that read from it
BI Report
A two-dimensional formatted report containing business information
Includes information about database tables and other objects the
report is bound to

Figure 10-25. Data resource metadata asset examples KM5021.0

Notes:
To give you an idea of what is in the model, here are a couple of examples of metadata
assets defined in the Common Model. These are examples of assets that are consumed,
not produced, by Information Server products.
A Host is a computer that hosts databases or files. A Database contains database tables. A
Data File is a collection of data organized into data structures of fields. In this respect, Data
Files are similar to database tables. Both of these assets are stored under Hosts, and
consumed by Information Server produced assets, such as DataStage jobs.
A BI Report contains information about physical and logical tables, among other objects.
Like database tables these objects can be consumed by Information Server assets, such
as DataStage jobs.


Metadata Asset Manager


Manage Repository metadata assets
Import metadata assets into the Repository, to be shared with
Information Server products
Metadata assets can be imported using engine Connectors and Bridges
Connectors are defined on the engine server system
Bridges are defined on engine client systems
Metadata Interchange Servers are used to exchange metadata assets between
the engine client and server systems that have the bridges and connectors with
the IS services system
Metadata Interchange Servers are installed and configured when the engine client and
server software is installed
New Metadata Interchange Servers can be added
Search and browse Repository metadata assets
Limited to external metadata assets
Can view all assets in Metadata Workbench
Manage potential duplicates and disconnected assets

Figure 10-26. Metadata Asset Manager KM5021.0

Notes:
InfoSphere Metadata Asset Manager (IMAM) is the primary Information Server product for
managing external metadata assets, those consumed, but not produced, by Information
Server products. Like with Metadata Workbench, you can browse and search metadata
assets in the Repository, but IMAM is limited to external metadata.
IMAM also has import/export capabilities with respect to external metadata assets. In this
respect, it complements Metadata Workbench which does not have these capabilities.


Logging into InfoSphere Metadata Asset Manager (IMAM)

Log into the Information Server Web Console


Open Internet Explorer and enter the IMAM address:
http://edserver.ibm.com:9080/ibm/imam/console
The user ID requires the Common Metadata Administrator, Common Metadata User, or Common Metadata Importer Suite role
(Screen capture callout: common metadata roles)

Figure 10-27. Logging into InfoSphere Metadata Asset Manager (IMAM) KM5021.0

Notes:
To log into Metadata Asset Manager (IMAM), open Internet Explorer and enter the IMAM
address: http://edserver.ibm.com:9080/ibm/imam/console. The user ID used to log into
IMAM must possess either the Common Metadata Administrator role, Common
Metadata User role, or the Common Metadata Importer role.
The Common Metadata User role allows the user to use the search and browse
functionality in IMAM.
The Common Metadata Importer role allows the user to create import areas and to import
metadata into the Repository.
The Common Metadata Administrator role enables the user to do anything in IMAM.


Metadata Interchange Servers


Defined on the Administration tab
Configured during Information Server installation

(Screen capture callouts: engine client with installed bridges; engine server with installed connectors)
Figure 10-28. Metadata Interchange Servers KM5021.0

Notes:
Metadata Interchange Servers are defined on the Administration tab. In this graphic two
Servers are enabled. These Servers were configured when the Information Server Engine
clients were installed. In this example, EDCLIENT is the host name of the client system
and edserver.ibm.com is the name of the Information Server Engine system.
Metadata Interchange Servers are used to exchange metadata assets between the engine
client and server systems that have the bridges and connectors with the IS services
system. This enables BI metadata assets imported on my client system, using bridges and
connectors that only exist on my client system, to be saved into the Repository.


Importing metadata assets


Create an import area
Select metadata interchange server
Then select a bridge or connector
Specify import parameters
Path to import file
File can exist on local system or metadata interchange server system
Select the parameter to display documentation about it
Imported metadata assets can be viewed first in a staging area
before they are shared to the Repository
Called a Managed import
Express imports share without staging first
Depends on import settings

Figure 10-29. Importing metadata assets KM5021.0

Notes:
Metadata assets are first imported into a staging area. To create a new import staging area,
click New Import Area on the Import tab. Specify a name for the import area, and then
select the metadata interchange server you are using to import the metadata. The
metadata assets, and the bridges and connectors available to import the assets, will vary
depending on the metadata interchange server. For example, DB2 connectors
may be installed on one server but not the other. Some engine client systems may have BI
metadata available that is not available on other engine client systems.
After you select the metadata interchange server, select the connector or bridge you will
use to import the metadata assets. For example, select the CA ERwin4 Data Modeler
bridge to import logical data models and physical data models from a CA AllFusion ERwin
4 file.
Click Next to move to the Import Parameters page. Here, in the case of an ERwin file, you
would browse for the file on the metadata interchange server system. Select a parameter to
display documentation about it.


Import settings
Specify staging area requirements, either:
All imports
Imports where assets are merged
When the import contains duplicates
Imports with duplicates can be blocked

(Screen capture callouts: staging area requirements; allow duplicates?)

Figure 10-30. Import settings KM5021.0

Notes:
There are a number of settings that determine how imports will be handled. A Common
Metadata Administrator can change these settings. One setting determines the conditions
under which the user is required to view the metadata assets in the staging area before
they are imported to the repository. In this example, one of the conditions is if the metadata
assets may contain duplicates. This enables the user to examine the possible duplicates
before deciding whether to do the import.


Creating a new import area


Name of import area
Select metadata interchange server
Select bridge or connector
(Screen capture callouts: name of import area; metadata interchange server; bridge)

Figure 10-31. Creating a new import area KM5021.0

Notes:
In the Import area name, specify a name for the new import area. Optionally, add a
description. Then select the metadata interchange server you will be using for the import.
Different sets of metadata assets are accessible to different metadata interchange servers.
Choose the server that has access to the metadata assets you want to import.
In this example, EDCLIENT is the name of the metadata interchange server. This is a
DataStage client system where the BI bridges have been installed, including the CA Erwin
bridge.


Import parameters
Select location of the import file
Specify path to import file
Configure other parameters as needed
(Screen capture callouts: import file location; path to import file)

Figure 10-32. Import parameters KM5021.0

Notes:
In this example, the Erwin metadata assets are contained in an XML file located on the
EDCLIENT metadata interchange server system. The Metadata interchange server radio
button has been selected to indicate this. And a path to the file has been specified in the
File box.
There are a number of additional optional parameters that can be specified. Specify these
as needed.


Select type of import


Express import: Automatically share if import settings
requirements are satisfied
Managed import: Preview metadata assets in a staging area

Figure 10-33. Select type of import KM5021.0

Notes:
On this page you choose the type of import to perform. You can choose either an express
import or a managed import. An express import automatically imports the metadata
assets that have been loaded into the staging area into the Information Server Repository,
if all import settings requirements have been satisfied.
A managed import loads the assets into the staging area for you to preview, before you
decide to import the assets into the Repository. In this example, a managed import has
been selected.


View results in the staging area


Click Analyze to analyze assets
Click Share to Repository to import to Repository
Disabled if import settings requirements are not satisfied; for example,
assets contain potential duplicates

Figure 10-34. View results in the staging area KM5021.0

Notes:
After the metadata assets have been loaded into the staging area, you can perform an
analysis of the assets and preview them. Click the Analyze button to initiate the analysis.
The analysis generates a set of statistics about the assets, displayed in the lower left panel.
At the right panel, you can browse through the assets that have been loaded into the
staging area.
Click the Share to Repository button to import the assets into the Information Server
Repository. This button is not enabled until you perform the analysis and preview.


Browsing metadata assets


Only a subset of the total metadata assets in the Repository
can be viewed in IMAM
Does not include Information Server produced assets, such as
DataStage jobs

Figure 10-35. Browsing metadata assets KM5021.0

Notes:
In addition to importing BI metadata assets into the Repository, you can also browse the BI
metadata assets that are already in the Repository. Be aware that not all metadata assets
that are in the Repository can be viewed in IMAM. For example, DataStage jobs stored in
the Repository cannot be viewed from within IMAM. Only those types of assets that can be
imported using IMAM can be viewed in IMAM. To view all types of assets, use Metadata
Workbench.
The Browse Assets folder lists the types of metadata assets that can be viewed in IMAM.
These assets include BI metadata, data models of data resources, as well as physically
implemented data resources. With respect to the latter, for example, you can connect to a
database system and import metadata for its databases and database tables.


Browse logical data models


Select a folder or asset to display information about it in the right panel
(Screen capture callouts: browsed assets; asset information)

Figure 10-36. Browse logical data models KM5021.0

Notes:
In this example, we are browsing through a logical data model of assets that were
contained in the XML file that was imported earlier. This particular model contains a
number of different entities, for example, an Accounting Unit entity.
Information about the assets you select in the middle panel is displayed in the right panel.


Checkpoint
1. What commands can you invoke with istool?
2. What GUI tools can you use to import and export DataStage
objects?
3. In Metadata Asset Manager, what is a "metadata interchange
server"?
4. In Metadata Asset Manager, what is the difference between
an express import and a managed import?


Figure 10-37. Checkpoint KM5021.0

Notes:
Write your answers here:


Exercises Unit 10
In this lab exercise, you will:
Export DataStage assets using istool
Import assets using istool
Export security assets using istool
Create, build, and deploy a package using Information Server Manager
Export assets using Information Server Manager
View the DataStage assets in an existing archive
Import metadata assets using Metadata Asset Manager (IMAM)
View metadata assets using Metadata Asset Manager (IMAM)
Manage duplicates


Figure 10-38. Exercises Unit 10 KM5021.0

Notes:


Unit summary
Having completed this unit, you should be able to:
Archive and package metadata assets using istool
Deploy and manage metadata assets using Information Server
Manager
Import metadata assets using Metadata Asset Manager
Browse metadata assets using Metadata Asset Manager
Manage duplicate metadata assets using Metadata Asset
Manager


Figure 10-39. Unit summary KM5021.0

Notes:


Unit 11. Information Services Console Configuration

What this unit is about


This unit describes how to configure the Information Server clients accessible through the Information Services Console. This includes Information Analyzer and Information Services Director.

What you should be able to do


After completing this unit, you should be able to:
Configure Information Analyzer
Configure Information Services Director

How you will check your progress


Lab exercises and checkpoint questions


Unit objectives
After completing this unit, you should be able to:
Configure Information Analyzer
Configure Information Services Director


Figure 11-1. Unit objectives KM5021.0

Notes:


Information Analyzer Product Configuration


Figure 11-2. Information Analyzer Product Configuration KM5021.0

Notes:


Architecture
Product Overview
(Architecture diagram callouts: Information Server Console; IS Console; Web Console; Application Server / Domain; InfoSphere DataStage / Agent, Engine; DB2 / Xmeta, IADB; IADB is the Information Analyzer database used by IA)

Figure 11-3. Architecture KM5021.0

Notes:
The Information Server Console is the Information Analyzer and Information Services
Director front-end. The Information Server Web Console gives you access to security
controls for Information Server clients, including Information Analyzer and Information
Services Director.
Information Analyzer uses the DataStage Engine, also known as the Information Server
Engine for this reason, to run data analysis jobs. The resulting analysis data is loaded into
the Information Analyzer database (IADB).
Information Services Director also uses the DataStage Engine as one of its service
providers.
XMETA is also, of course, used by Information Analyzer and Information Services Director
to store their objects.


Post Information Server installation steps


Create ODBC data source connection to IADB
Set Information Analyzer user permissions in the IS Web Console
Three roles:
Information Analyzer Data Administrator
Import metadata, analysis settings, system sources
Information Analyzer Project Administrator
Can configure and administer IA projects: create, delete, modify
Information Analyzer User
Set the analysis options for the Analysis Database (IADB) and the
Analysis Engine (DataStage)


Figure 11-4. Post Information Server installation steps KM5021.0

Notes:
After Information Server, along with Information Analyzer, is installed, some additional
configuration is needed for Information Analyzer. This includes creating an ODBC data
source connection to IADB and configuring Information Analyzer users and groups.
You also need to set the configuration options for the Analysis Database (IADB) and the
Analysis Engine (DataStage).


ODBC data source connection to IADB


Edit .odbc.ini file
Edit uvodbc.config file for ANALYZERPROJECT
This DataStage project is used by Information Analyzer
Created during IS installation

(Screen capture callout: .odbc.ini file entry)

Figure 11-5. ODBC data source connection to IADB KM5021.0

Notes:
An earlier unit discussed how to create ODBC data source connections. The same
procedure described earlier is used to define an ODBC connection to the IADB database.
The graphic shows how the DB2 IADB database entry is specified in the .odbc.ini file. The
main properties to configure are the Database (IADB), the IpAddress (host name of
services tier system), the LogonID and Password properties for connecting to IADB, and
the TcpPort used to connect to DB2 (50000).
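As a rough template, the entry illustrated in the graphic has the following shape. The driver library name and installation path vary by release and platform, and the logon values are placeholders:

[IADB]
Driver=/opt/IBM/InformationServer/Server/branded_odbc/lib/<DB2 Wire Protocol driver library>
Description=IADB analysis database
Database=IADB
IpAddress=edserver.ibm.com
LogonID=iauser
Password=iauser_password
TcpPort=50000

If the file maintains an [ODBC Data Sources] list at the top, add the IADB data source name there as well.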


Setting user permissions in the Web Console


Configuration


Figure 11-6. Setting user permissions in the Web Console KM5021.0

Notes:
Information Server user IDs with Information Analyzer authorization roles are created in the
Information Server Web Console, as discussed in a previous unit. This graphic shows the
applicable roles in the Web Console.


Analysis Engine settings


Configuration

User ID with
DataStage
credentials

Check Settings


Figure 11-7. Analysis Engine settings KM5021.0

Notes:
The Analysis Settings tab contains several sub-tabs. This graphic shows the Analysis
Engine sub-tab. As mentioned earlier, Information Analyzer uses the DataStage parallel
Engine to perform its analyses. Here you specify DataStage credentials for the Engine.
That is, you specify the operating system user ID and password of a user on the Engine
system.
By default, when Information Analyzer is installed a DataStage project named
ANALYZERPROJECT is created. The DataStage jobs used by Information Analyzer are
created in this project.
Click the Validate Settings button afterward to check the settings.


Analysis database settings


Configuration

Check Settings

Check Settings

Figure 11-8. Analysis database settings KM5021.0

Notes:
The Analysis Settings tab contains several sub-tabs. This graphic shows the Analysis
Database sub-tab. Check the values in all the fields to ensure they reflect the actual values
of the system's configuration. In particular, pay attention to User Name, Password, and
Analysis Connector DSN, since these values are the most likely to be changed during
installation. The User Name and Password boxes refer to the DB2 account created to log
into the IADB database.


Data source configuration


The source of the data to be analyzed must be defined for
Information Analyzer
If ODBC is used, create data source name for the source database to
be analyzed
This data source must be available to ANALYZERPROJECT, where
the Analyzer jobs are running
Define an entry for IADB in the project's uvodbc.config file
Within Information Analyzer, import table definitions for source
data tables

Figure 11-9. Data source configuration KM5021.0

Notes:
The IADB database contains tables used to store analysis results. It does not contain the
tables that contain the data to be analyzed. A connection to the source data tables must
also be configured in Information Analyzer.
If an ODBC connection to the source database is to be used, then this ODBC connection
must also be configured, following the same procedure as for IADB. This data source must
also be available to the ANALYZERPROJECT DataStage project, just as for IADB. That is,
an entry must be made in the uvodbc.config file for that project.
Once the ODBC connection is created, a new data source connection within Information
Analyzer can be defined.
Table definitions will also need to be imported in Information Analyzer before the data in those
tables can be analyzed.
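The uvodbc.config addition itself is small. Assuming the ODBC data source for the source database is named SAMPLE (the name used in the example that follows), the entry added to the ANALYZERPROJECT project's uvodbc.config file is simply:

<SAMPLE>
DBMSTYPE = ODBC

The entry for IADB, added earlier, has the same form.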


Define source
Basic Tasks
To connect to the source database, first define where the data is
(Screen capture callouts: data source host; new data store)


Figure 11-10. Define source KM5021.0

Notes:
This graphic shows how to define a new data source (data store) in Information Analyzer.
Click Configuration>Sources in the Home pillar menu to open the Sources tab, shown in
the lower graphic. Select the host that owns the data source. In this graphic,
EDSERVER.IBM.COM is a host that is already defined in the Information Server
Repository. If the host of the data source is not listed, click New Host Computer to add it to
the Repository.
Click New Data Store to define the new source.

Define source

(Screen capture: Basic Tasks view. Callouts identify the connector name, the connector information, the name of the data store in the Repository, and the Check connection button.)

Figure 11-11. Define source KM5021.0

Notes:
In this example, there is a DB2 database named SAMPLE. An ODBC connection to it has
been created. The ODBC connection is also named SAMPLE. Although this ODBC
connection has been created, it is not yet defined within the Information Server Repository.
The name of the data store is the name you want it to be known as in the Information
Server Repository. Best practice suggests that this name should match the physical name
of the database, but this is not required. For this reason, the data store is named SAMPLE
to match the name of the database.
We also need to specify how to connect to the data store. This is done in the middle panel.
The data connection (also called SAMPLE) is defined. It is an ODBC connector and its
connection string (DSN) is SAMPLE.
Metadata defining both the data store and the connector is now loaded into the
Repository. This information will be available to other Information Server products, such as
FastTrack.

Importing table definitions for source tables

(Screen capture: Basic Tasks view, Import metadata. After expanding levels, you can import table definitions for selected tables.)

Figure 11-12. Importing table definitions for source tables KM5021.0

Notes:
Once a data store has been defined, table definitions for tables in it can be imported into
the Repository. This is required before the data in those tables can be analyzed.
To import the table definitions, from the Home pillar menu select Metadata Management,
and then select Import Metadata. Expand the levels of the data source until you reach the
level for import. Select the tables, and then click Import.

Creating a project

(Screen capture: Basic Tasks view. Callouts identify the New Project option and the project type selection.)

Figure 11-13. Creating a project KM5021.0

Notes:
Like many of the Information Server products, before work can be done in Information
Analyzer, an Information Analyzer project must be created to do the work in. Multiple
projects can be created, each accessible by different sets of users.
To create a new project, first click New Project from the My Home tab. Give the project a
name and select its type, that is, Information Analyzer. Recall that the Information Server
Console is an interface to two kinds of projects: Information Analyzer projects and
Information Services Director projects. Be sure you select the correct type.

Associate metadata with the project

(Screen capture: Basic Tasks view, Data Sources tab, where imported metadata is made available to the project)

Figure 11-14. Associate metadata with the project KM5021.0

Notes:
When you create a project, the Project Properties tab is opened with a number of
sub-tabs. On these sub-tabs you can configure the various properties of the project.
On the Data Sources tab you can select which data sources are available to the project. In
this example, the tables imported from the SAMPLE data store have been made available
to the project.

Add users to project

(Screen capture: Users tab. Browse for users to add to the project and specify project roles for those users.)

Figure 11-15. Add users to project KM5021.0

Notes:
On the Users tab you specify the users that have access to the project. These can include
any users that have been given Information Analyzer product roles in the Web Console.
Click the Browse button to add and configure users for the project. In this example, the
user student has been added. In addition to adding users, you can specify their roles
within the project.
Different Information Analyzer users can have different roles within the project. The next
page defines these roles.

Information Analyzer project roles

Business Analyst: Reviews analysis results
Data Steward: Gets read-only views of analysis results
Drill Down User: Can drill down into source data if drill-down security is enabled
Data Operator: Manages data analyses and logs; can run analysis jobs

Figure 11-16. Information Analyzer project roles KM5021.0

Notes:
Different roles have different authorizations. A user can be given multiple roles.


Information Services Director Configuration


Figure 11-17. Information Services Director Configuration KM5021.0

Notes:

Information Services Director (ISD) configuration

Access to ISD is through the Information Server Console
Same as for Information Analyzer
Create an ISD project
Similar procedure as with Information Analyzer, except you select Information Services as the project type
Specify project users and their project roles
Information Services Director Designer: Edit services and operations
Information Services Director Project Administrator: Configure projects, edit applications
Create an ISD application
Click Develop>Information Services Application
Define information services connections

Figure 11-18. Information Services Director (ISD) configuration KM5021.0

Notes:
As with Information Analyzer, access to Information Services Director (ISD) is through the
Information Server Console, and work is done in ISD projects.
Beyond configuring the project, the main tasks are to create ISD applications and to define
the information service connections for each.

ISD users
Click Browse to add users to the project
Select roles for the users

Figure 11-19. ISD users KM5021.0

Notes:
The process of adding users to a project is the same as for Information Analyzer. For each
user, you can select one or more project roles. The Project Administrator role authorizes
the user to create and edit project properties and to create and delete applications.
The Designer role authorizes the user to add, delete, and edit services within an
application.

Creating an ISD application


Click Develop>Information Services Application
Enter the name of the application
An application can contain one or more services

Figure 11-20. Creating an ISD application KM5021.0

Notes:
An application can contain one or more services. Once an application has been created, an
ISD Designer can create, delete, and edit services within the application.

Configure an information services connection


DSServer is created during ISD installation, but it is not
configured
Select DSServer then click Open to edit the connection

Figure 11-21. Configure an information services connection KM5021.0

Notes:
Information services connections are used to connect to service providers. Service
providers implement the logic that the service provides to its consumers. A number of
different service providers can be used, including DB2, Federation Server, and DataStage.
DSServer is created during installation to connect to DataStage. Select the connection and
then click Open to edit the connection.

Configuring the DataStage service provider

For the user, specify a DataStage administrator or developer with DataStage credentials

(Screen capture: the DataStage user ID field)

Figure 11-22. Configuring the DataStage service provider KM5021.0

Notes:
The primary thing needed is to specify a DataStage user ID. This user ID requires
DataStage administrator or developer authorization, and must have DataStage credentials.
DataStage providers consist of a special type of DataStage job, one which has one or both
of an ISD Input stage and an ISD Output stage. The former is used to pass values from the
service to the DataStage job. The latter is used to return output from the job to the service,
to be passed back to the service consumer.

Configuring a DB2 service provider

Select DB2 as the provider type
Specify the services and engine hosts (edserver.ibm.com)
Specify the DB2 database to connect to

(Screen capture: callouts identify the provider type and the DB2 database fields)

Figure 11-23. Configuring a DB2 service provider KM5021.0

Notes:
When you configure a DB2 or Federation Server connection, you specify the type (DB2 or
Federation Server), the database host (edserver.ibm.com), and the database
(SAMPLE). This will enable, for example, DB2 SELECT statements within the SAMPLE
database to be used as service providers.

Checkpoint
1. What client do you log into to gain access to Information Analyzer?
2. What tasks do you need to do after IS installation to configure IA?
3. Name two types of Information Services Director service providers.
4. What makes a DataStage or QualityStage job the type of job that can be used as a service provider?

Figure 11-24. Checkpoint KM5021.0

Notes:
Write your answers here:

Exercises Unit 11
In this lab exercise, you will:
Configure Information Analyzer settings
Configure an Information Analyzer data source
Import table definitions for source data tables
Create an Information Analyzer project
Configure an information services application

Figure 11-25. Exercises Unit 11 KM5021.0

Notes:

Unit summary
Having completed this unit, you should be able to:
Configure Information Analyzer
Configure Information Services Director


Figure 11-26. Unit summary KM5021.0

Notes:

Unit 12. Installation and Deployment

What this unit is about


This unit describes the installation and deployment of Information
Server.

What you should be able to do


After completing this unit, you should be able to:
Install and deploy Information Server
Install fix packs and patches
Back up and restore Information Server
Describe the Engine High Availability option

How you will check your progress


Lab exercises and checkpoint questions


Unit objectives
After completing this unit, you should be able to:
Install and deploy Information Server
Install fix packs and patches
Back up and restore Information Server
Describe the Engine High Availability option

Figure 12-1. Unit objectives KM5021.0

Notes:

Information Server Deployment


Figure 12-2. Information Server Deployment KM5021.0

Notes:


Deployment models
One system for everything (only possible with Windows Server)

(Diagram: the Domain Server, Engine, DB Server, and Windows Client are all on a single machine)

Figure 12-3. Deployment models KM5021.0

Notes:
When Information Server is installed, its tiers (Client, Repository, Services, Engine) can be
deployed in different configurations. This graphic shows one Information Server
deployment option.
All Information Server components are installed on one computer system. This is only
possible on a Windows platform, because the Client tier only runs on Windows.

Deployment models
Metadata Server, Repository, and Engine are on one system

(Diagram: the Domain Server, Engine, and DB Server are on one machine; the Windows Client is on a separate machine)

Figure 12-4. Deployment models KM5021.0

Notes:
In this deployment option, all the tiers are installed on one machine except for the Client
tier, which is installed on a Windows system. The Server system can be either a UNIX or
Windows system.


Deployment models
Different machine for Engine. Same machine for Repository and Services (WAS)

(Diagram: the Engine is on its own machine; the Domain Server and DB Server share another machine; the Windows Client is separate)
Machines must run the same operating system

Figure 12-5. Deployment models KM5021.0

Notes:
In this deployment option, the Engine is separated from the system containing the
Repository and Services tiers. The Client tier must be a Windows system. The system
containing the Repository and Services tiers can be either Unix or Windows.
Shown in this graphic is one Engine on one computer system. Also possible are multiple
Engines on either a single computer system or on separate computer systems.

Deployment models
Multiple Engine machines. Same machine for Repository and Services (WAS)

(Diagram: two Engine machines; the Domain Server and DB Server share another machine; the Windows Client is separate)

Figure 12-6. Deployment models KM5021.0

Notes:
Within a single Information Server domain, there can be multiple Engines. Although this
graphic shows two different computer systems, these multiple Engines can be either on
separate systems or on a single system.


Linux Installation Example


Figure 12-7. Linux Installation Example KM5021.0

Notes:


Suite installer
Installs all the products as part of a single Suite installation
All the tiers (Client, Engine, Repository, Services) are available in the Suite installer
You select which tier or tiers you want to install on the system you are currently on
You can select a subset of the products to install
Supports graphical installer on all platforms
Supports silent installation on all platforms
Supports console based installation on all platforms

Figure 12-8. Suite installer KM5021.0

Notes:
All of the tiers (Client, Engine, Repository, Services) are available in the Suite installer. You
select which tier or tiers you want to install on the system you are currently on. For
example, if you are deploying to two systems, a Windows client system and a Linux server
system, you would run the installer on the Windows system to install the clients, and run
the installer on the Linux system to install the other tiers.


Installation steps - 1
Acquire the Information Server installation package
Copy the package to the computer you are installing on
In this example, there is a Linux Server and a Windows Client
Run the install on the Server first
In a terminal window, move to the location of the uncompressed installation file (is-suite), then open the is-suite folder
Enter the command shown to start the installation script

(Screen capture: starting the install and the install URL that is returned)

Figure 12-9. Installation steps - 1 KM5021.0

Notes:
This and subsequent pages go through the steps of the installation process. Begin by
copying the installation package to the computer you are installing on. In this example, the
Server is Linux and the Client is Windows. All tiers except the Client tier are installed on a
single Linux system.
Then run the setup command. Output from the command is a URL that you paste into a
web browser. The rest of the installation process is done in the browser.
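As a minimal sketch of this step (the directory /tmp/is-suite is illustrative, and the exact URL
and port printed depend on the installation), starting the server-side install from a terminal
looks like this:

   cd /tmp/is-suite
   ./setup
   # The script prints a URL of the form https://<hostname>:<port>/...
   # Paste that URL into a web browser to continue the installation.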

Installation steps - 2
Copy and paste the URL into a Web browser session
Mozilla on Linux GUI used in this example
Click the Login button.
The installation Getting Started window is displayed
Click Next to move to the Firewall Requirement window
Click Next to go to the Early Requirements Check window
Be sure your system passes all requirements
Click Next to go to the Installation Directory window
Click Next to go to the Installation Type Selection window
For this example, we click New installation, the default
Other selections are: Add products, Add tiers


Figure 12-10. Installation steps - 2 KM5021.0

Notes:
The installation wizard then guides you through a set of pages. The first several pages are
listed and described here.


Installation steps - 3
Click Next to go to the Tier Selection window
Select the tiers to be installed on the system
Here, we select all three (non-client) tiers: Metadata repository, Services, and Engine

Figure 12-11. Installation steps - 3 KM5021.0

Notes:
On the Tier Selection window you specify what tiers you want to install on the system you
are running the installation package on. Depending on your deployment option, this could
be one or more tiers. In this example, the Metadata Repository, Services, and Engine tiers
are installed on this one system. The Client tier is not available in this example because it
cannot be installed on a Linux system.

Installation steps - 4
Click Next to move to the Product Selection window
In this example, we have selected all products


Figure 12-12. Installation steps - 4 KM5021.0

Notes:
This graphic shows the Product Selection page where you select the products you want
to install on the current system.
As you can see in this graphic, components of individual products may be installed on
multiple tiers. For example, if you install Metadata Workbench, it has components that get
installed on the Engine tier and the Services tier.


Installation steps - 5
Click Next to move to the Software License Agreement window
Click Next to move to the DataStage Installation Options window
Choose the IBM InfoSphere DataStage option to develop parallel jobs and server jobs

Figure 12-13. Installation steps - 5 KM5021.0

Notes:
The graphic here shows the DataStage installation options. There are three types of jobs
that can be created in DataStage: parallel jobs, server jobs, and mainframe (MVS) jobs. In
this example, both server and parallel jobs can be developed, but not mainframe jobs.

Installation steps - 6
Click Next to move to the High Availability Server Cluster Configuration window
Select Server cluster configuration to deploy a cluster
Specify the virtual host name that will float to the current active server

Figure 12-14. Installation steps - 6 KM5021.0

Notes:
The High Availability options are discussed later in this unit.


Installation steps - 7 - WAS

Click Next to move to the Application Server Options window
Choose to install WAS or use an existing WAS installation
Click Next to specify the WAS directory
Click Next to configure the WAS port assignments
Click Next to specify the WAS administrator user ID (default, wasadmin)
Click Next to specify the Information Server administrator user ID (isadmin)

Figure 12-15. Installation steps - 7 - WAS KM5021.0

Notes:
Given your tier selection, you now specify options for the WebSphere Application Server
(WAS), the database manager, and Information Server. These include user IDs and
passwords and port information.

Installation steps - 9 - Repository database

The next series of pages configure the database manager (default, DB2)
Install DB2 or use an existing installation?
If an existing installation is used, you must have already run the IBM-supplied scripts to create the Information Server databases
Specify the DB2 installation directory
Specify the DB2 instance user (default, db2inst1) and instance port number (default, 50000)
This user ID and other system IDs can be created before the installation, or the installation program can create them
Specify the DB2 fenced user (db2fenc1)
Fenced user-defined functions and stored procedures run under this user
Specify the DB2 administrator (dasusr1)
Specify the XMETA database owner (xmeta)
Specify the owner of the staging area of the XMETA database (xmetasr)
Specify the owner of the DataStage Operations Console tables (dsodb)
By default the XMETA database is used

Figure 12-16. Installation steps - 9 - Repository database KM5021.0

Notes:
The next series of pages are used to configure the database manager, which by default is
DB2. You can use either an existing DB2 installation or the installer can install DB2. Other
existing databases, such as Oracle, are supported.
The Operations Console uses a set of database tables. By default these tables will be
created in the XMETA Repository database. Optionally, you can specify a separate
database for these tables.


Installation steps - 10
Click Next to specify the ASB agent port number and logging agent port number

Figure 12-17. Installation steps - 10 KM5021.0

Notes:
On the Agent Ports Configuration window, you specify the ASB agent port number and
the logging agent port number.

Installation steps - 11
Click Next to specify the Information Analyzer database (iadb) and database owner (iauser)

Figure 12-18. Installation steps - 11 KM5021.0

Notes:
If Information Analyzer is installed, then a database that Information Analyzer uses will also
be installed. On this page, you specify the name of the database (iadb, by default) and the
database owner.


Installation steps - 12 - DataStage

Click Next to specify the DataStage Job Monitor ports
Click Next to specify the ITAG and RPC port numbers for this engine tier
These numbers apply uniquely to this engine
This is only required if you are installing more than one engine tier in the domain
Click Next to specify the DataStage administrator (dsadm)

Figure 12-19. Installation steps - 12 - DataStage KM5021.0

Notes:
The DataStage administrator user ID is by default dsadm. You can either create this user
ID, along with several other user IDs, on the operating system in advance of the
installation, or you can choose to have the installer create this ID.

Installation steps - 13 - DataStage

Click Next to optionally install globalization support
Click Next to optionally install the legacy WebSphere MQ Plug-in
This stage has been replaced by the MQ Connector stage
Click Next to optionally install a legacy SAS configuration
Click Next to install additional DataStage projects
By default one test project (dstage) is installed
Click Next to configure the QualityStage Standardization Rules database and database owner
By default, the XMETA database is used

Figure 12-20. Installation steps - 13 - DataStage KM5021.0

Notes:
Listed here are a series of installer pages used to configure DataStage and QualityStage.
One option to pay attention to here is the globalization support option, since this option
cannot be configured after installation.
By default one DataStage project named dstage is installed. You can optionally choose to
install additional projects. It is, however, not necessary to create additional projects during
installation, since these can be created after installation, in DataStage Administrator.


Installation steps - 14 - System Requirements

Click Next to open the System Requirements Check window
Be sure to address any issues that are raised before continuing the installation

Figure 12-21. Installation steps - 14 - System Requirements KM5021.0

Notes:
Prior to beginning the actual installation, the installation wizard then initiates a number of
tests to check whether the system requirements have been met for installing Information
Server.
If you get warnings, as shown above, open up the messages to see what specifically needs
to be done. You may get warnings about kernel parameter settings. Change these as
necessary. In Linux, you can make changes to kernel parameters by editing the
/etc/sysctl.conf file. Increase the values as suggested in the warning messages. Run
/sbin/sysctl -p to apply the changes.
If the requirements are satisfied, click Next to begin the installation.
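As an illustration only (these parameter names are typical of what the requirements check
flags for DB2 and Information Server, but the values shown are placeholders; always use
the values recommended in the warning messages), the kernel parameter adjustment
looks like this:

   # Entries added to or raised in /etc/sysctl.conf
   kernel.shmmax = 268435456
   kernel.msgmni = 1024
   kernel.sem = 250 256000 32 1024

   # Apply the changes without rebooting
   /sbin/sysctl -p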

Client Installation


Figure 12-22. Client Installation KM5021.0

Notes:


Client installation steps - 1

Acquire the Information Server Windows client installation package. You have two choices:
Use the Windows installation file
This contains both the server and client installation software
If you choose this option you should select only the client tier to install
Use the client-only installation file
This contains only those components needed to install the Information Server clients
This file is smaller than the full installation file
Copy the installation file to the Client system and unzip
It unzips to a folder named is-client

Figure 12-23. Client installation steps - 1 KM5021.0

Notes:
The client installation is similar, but simpler.

Client installation steps - 2

Open the is-client folder and then open setup.exe
The installation program will open a web browser and load a URL which opens to the Login window
Repeatedly click Next to move through the installation windows
Many of the windows are similar to windows you viewed during the Information Server install on the Server
Eventually, you will reach the Product Selection window
Select the clients for any products you installed

Figure 12-24. Client installation steps - 2 KM5021.0

Notes:
Run setup.exe in the installation folder to begin the installation. This loads the installation
URL into a web browser.
Click Next repeatedly to move through the installation windows. Eventually, you will reach
the Product Selection window, shown in the graphic. Select the clients for any products
you installed on the Server.


Client installation steps - 3

Click Next to move to the Software License Agreement window
Click Next to move to the Metadata Interchange Agent Ports Configuration window
Enter the name of the services host system
Enter the Information Server administrator user ID and password

Figure 12-25. Client installation steps - 3 KM5021.0

Notes:
You can optionally choose to register your client system as a Metadata Interchange Agent.
Recall that these agents are used to import business intelligence (BI) metadata into the
Repository in Metadata Asset Manager. In order to perform the registration, the installer
must connect to the services system as an Information Server administrator. On this page,
you specify the name of the host, the port used to communicate with it, and the user ID and
password of the Information Server administrator.

Client installation steps - 4

Click Next to move to the Software License Agreement window
Click Next to move to the Desktop Shortcut Creation Option window
Select Create desktop shortcuts
Click Next to move to the System requirements window
Evaluate any warnings
If possible fix the situation
You also have the option to ignore any warning and continue, but doing this of course is risky
Click Next to move to the Response File Review window
Click Next to start the installation

Figure 12-26. Client installation steps - 4 KM5021.0

Notes:
As with the Server installation, just before the actual installation begins, the installation
package will check that the system requirements have been met. Fix any errors and
evaluate any warnings before continuing with the installation.


Testing the Install


Figure 12-27. Testing the Install KM5021.0

Notes:


Version.xml file
Located in the /IBM/InformationServer directory on client and
server systems
Documents the installation history, the products installed, and
the status of the installation
Look for status=SUCCESS
Look for list of products installed and their versions

Figure 12-28. Version.xml file KM5021.0

Notes:
After you complete the Information Server installation on the client and server, you should
check whether it installed correctly. There are a number of checks that you can do.
First examine the version.xml file on both the server and client systems. This file
documents the products that are installed and gives a status for each. Verify the list of
products installed and verify that they installed successfully.
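For a quick command-line check on a Linux server (assuming the default installation path
named later in this unit; adjust the path to your environment), you can scan the file for the
status entries:

   grep -i "status=" /opt/IBM/InformationServer/Version.xml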


Sample server version.xml file

Figure 12-29. Sample server version.xml file KM5021.0

Notes:
This graphic shows an example of a server version.xml file. Notice that it states that
Information Server 9.1 has been installed and that its status is SUCCESS. Notice also that
it lists the products that were installed.

Sample client version.xml file

Figure 12-30. Sample client version.xml file KM5021.0

Notes:
This graphic shows an example of a client version.xml file. Notice that it states that
Information Server 9.1 has been installed and that its status is SUCCESS. Notice also that
it lists the products and components that were installed.
The lists of installed products can differ between the client and server. Some products,
such as Blueprint Director, only exist on the client. Similarly, some products or components,
such as IS Recovery, exist only on the server. (IS Recovery is discussed later in this unit.)


Client tests
Verify that you can ping the services Server
Confirms that there is connectivity between the client and server systems
Verify that the Information Server (IS) Web Console Login window appears
Test the Engine
In the IS Web Console, create a DataStage administrator user ID
Set up Engine credentials for the DataStage administrator
Verify that you can log into the DataStage test project (dstage1) in the DataStage Designer client

Figure 12-31. Client tests KM5021.0

Notes:
On the client, first verify that you have connectivity with the server. Verify that you can ping
the server.
Next, open the Information Server Web Console. If the Login window does not come up,
then either Information Server is not running or you are not able to connect to it.
It is also important to test the Engine. In the Web Console, create a DataStage
administrator ID and set up Engine credentials for the ID. Then verify that you can log into
DataStage Designer. You might also create a simple DataStage parallel job with a
Transformer stage and see if it compiles. This will test whether the server system has the
correct C++ compiler installed and configured.
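A minimal sketch of these client-side checks (the host name edserver.ibm.com comes from
the course examples; the Web Console port and path shown are common defaults and may
differ in your installation):

   ping edserver.ibm.com

   # Then, in a web browser, open the Web Console login page, for example:
   # https://edserver.ibm.com:9443/ibm/iis/console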

Server tests
If the Client tests fail, it may be because Information Server is not up and running
To test whether the server is up, change to the WAS /InfoSphere/bin directory, then run the serverStatus.sh script
You may be required to enter your WAS administrator user ID and password

Figure 12-32. Server tests KM5021.0

Notes:
If you cannot open the Web Console on the client, it may be that Information Server is not
up and running. To check this, run the serverStatus.sh script on the server. Verify that
server1 is started. If server1 is not started, check the WAS log files to determine what the
problem is. This was discussed in an earlier unit.
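A minimal sketch of this check (the profile path assumes a default WebSphere Application
Server installation with a profile named InfoSphere; adjust to your environment):

   cd /opt/IBM/WebSphere/AppServer/profiles/InfoSphere/bin
   ./serverStatus.sh server1 -username wasadmin -password <password>
   # Look for a message reporting that server1 is STARTED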


Installing Information Server Fix Packs and Patches

Figure 12-33. Installing Information Server Fix Packs and Patches KM5021.0

Notes:

Information Server updates

Base product installs a specific release (for example, 9.1)
Fix packs are a cumulative set of updates for a particular release
Include previous fixes
You only need to install the latest
Download from IBM Fix Central. Select:
Product Group = Information Management
Product = Information Server
Installed Version =
Platform =
For Information Analyzer rollup patches, apply the complete set of product-specific fixes since the last fix pack:
Accumulation of fixes for Information Analyzer only
Shorter release schedule
Does not contain Suite-wide fixes
Fixes for specific issues may be available from IBM Support
Apply the fix using the latest Update Installer
Can download the installer from Fix Central

Figure 12-34. Information Server updates KM5021.0

Notes:
Fix packs are a cumulative set of updates for a particular release. You only need to install
the latest fix pack, as it includes previous fixes. Fix packs are available from IBM Fix
Central.


Information Server update installer

For all patches and fix packs:
Download the latest version of the Update Installer from Fix Central
Documentation is available when you download the patch
http://www.ibm.com/support/docview.wss?uid=swg24024048
Run updateImage.sh to install the latest Update Installer on your current IS installation image
Update Installer consists of:
Native launcher (Updater.exe on Windows, Updater on Unix/Linux)
Update installer for all platforms (updater.jar)
Read Me file
Patches and fix packs are platform-dependent and consist of two files:
Read Me file with instructions
*.ispkg file with contents of the patch package
For older fix packs and Installers, always use the new Update Installer
Do not use the updater.jar bundled with the FixPack or the patch

Figure 12-35. Information Server update installer KM5021.0

Notes:
Be sure to use the latest version of the Update Installer. Since the Update Installer changes
frequently, you should check each time you install a fix pack or patch.
A fix pack consists of two files. The Read Me file provides instructions for installing the
pack. The actual pack consists of an *.ispkg file.

Fix Pack and Patch install prerequisites


Always review fix pack Release Notes
Install instructions
Known issues and workarounds
Log in as root
Update Installer can be run in graphical or command-line mode
Syntax provided in Release Notes


Figure 12-36. Fix Pack and Patch install prerequisites KM5021.0

Notes:
You can run the Installer in either graphical or command-line mode. You should be logged
in as root whenever you install a patch. Be sure to review the Read Me file accompanying
the patch before you perform the install.


Patch install workflow

Shut down IS processes
DataStage daemon
ASBNode
Services (WAS)
Metadata Repository
Back up Information Server environment (all tiers)
Start up IS processes
Metadata Repository
Services (WAS)
ASBNode
DataStage daemon
Ensure no users are active / connected to Information Server
Unless specifically noted in the Release Notes, apply fix packs to all tiers in the following order:
Services
Engine
Client
Verify fix pack installations

Figure 12-37. Patch install workflow KM5021.0

Notes:
It is recommended that you shut down and restart Information Server before applying a fix
pack to ensure that no Information Server processes that could affect the installation are
running. Generally, fix packs are applied to all tiers and should be applied in the order
shown here. If there are exceptions, this will be noted in the Read Me file.

Verifying the fix pack installation

Ensure that all IBM Information Server client applications start and run properly
Verify that the Version.xml file includes Status="Success"
Path: /opt/IBM/InformationServer/Version.xml
When the entry shows Status="PartialSuccess" or there is no entry for the patch that you installed, the patch installation did not succeed
Locate the Version.xml file
In Version.xml, the entry for the last patch installed will be at the end of the history section

Figure 12-38. Verifying the fix pack installation KM5021.0

Notes:
After you install the Fix Pack, you should verify it. Start up each of the clients to verify they
work. Check in the Version.xml file that the pack was installed and that it has a Success
status.


Information Server Backup and Restore


Figure 12-39. Information Server Backup and Restore KM5021.0

Notes:


Backing up and restoring Information Server

Use the isrecovery tool to back up the services tier, the engine tier, and the metadata repository tier
The installation software and patches are not backed up
Can simply be reinstalled
Some components need to be manually backed up
Because of interdependencies between tiers, it is necessary to back up all tiers in one session
All Information Server services and components must be shut down before the backup takes place
Before you back up, ensure that there are no active client connections, and place the server in maintenance mode
Prevents users (other than IS administrators) from logging into Information Server clients
Information Server administrators can still log into the Web Console
The isrecovery tool forces a shutdown
During a recovery, all tiers must be restored in one session (before any attempt to restart Information Server)

Figure 12-40. Backing up and restoring Information Server KM5021.0

Notes:
You can use the isrecovery tool to back up and restore Information Server. It is important to
note that the isrecovery tool does not back up the Information Server software. To restore
Information Server, it would be necessary to re-install Information Server and any fix packs
and patches that have been added before you attempt the restore operation. Additionally, it
is important to note that the isrecovery tool does not back up the Information Server clients.
As discussed earlier, Information Server tiers can be installed on multiple systems. When
attempting to back up Information Server, it is necessary to back up all the tiers in the same
session. While the backup is taking place, there can be no active client connections and
Information Server must be placed in maintenance mode.


Placing Information Server in maintenance mode

Use the SessionAdmin.sh command
Located in /ASBServer/bin directory
The SessionAdmin.sh command can be used to close all user sessions: -kill-user-sessions
The SessionAdmin.sh command can be used to place Information Server in maintenance mode: -set-maint-mode ON
-set-maint-mode OFF turns off maintenance mode
-get-maint-mode returns the current mode
Command syntax: SessionAdmin.sh -user <userName> -password <password> -set-maint-mode ON

Figure 12-41. Placing Information Server in maintenance mode KM5021.0

Notes:
Before you place Information Server in maintenance mode, you should close all user
sessions. You can use the SessionAdmin.sh command with the -kill-user-sessions option
to do this. After all sessions have been closed, you use the -set-maint-mode ON option to
place Information Server in maintenance mode. While Information Server is in
maintenance mode, non-administrative users will not be able to log into Information Server
clients.
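Putting these pieces together, a minimal sketch of the sequence (the installation path and
the isadmin user ID are illustrative; the options are the ones listed on the slide above):

   cd /opt/IBM/InformationServer/ASBServer/bin
   # Close all user sessions
   ./SessionAdmin.sh -user isadmin -password <password> -kill-user-sessions
   # Place Information Server in maintenance mode
   ./SessionAdmin.sh -user isadmin -password <password> -set-maint-mode ON
   # Confirm the current mode
   ./SessionAdmin.sh -user isadmin -password <password> -get-maint-mode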

Backup procedure
Run the SessionAdmin.sh command to stop all Information Server user sessions
Run the SessionAdmin.sh command to put Information Server in maintenance mode
Run isrecovery.sh to open the backup wizard
Follow the instructions in the wizard
Creates a response file
Contains Information Server system information needed for the backup
Documents what is to be backed up
Run isrecovery.sh -resp <responseFile>
Backup must be performed on all domain systems where software tiers are installed

Figure 12-42. Backup procedure KM5021.0

Notes:
After Information Server is in maintenance mode, you can run isrecovery.sh to start the
backup process. Using the isrecovery.sh backup wizard, you first specify how you want to
perform the backup. This information is put into a response file. Afterwards, you can run
isrecovery.sh with the -resp option to initiate the backup.


Backup and restore wizard

Click Get Started under Back Up to begin the backup process
Collects parameters needed to back up IS
Stores parameters in a response file

(Screen capture: buttons that start the backup procedure and the restore procedure)

Figure 12-43. Backup and restore wizard KM5021.0

Notes:
In the GUI, there are two sections: the Back Up section and the Restore section. Click
Get Started in the Back Up section to begin generating a response file for a backup.

Backup wizard parameters - 01

Archive and work directories
Specify paths to archive and work directories
Store archive and temporary files produced by the wizard
Archive directory must be empty
Information Server administrator credentials
Specify IS admin user ID and password
Metadata Repository (XMETA) database options
Choose whether to back up XMETA automatically or manually
If automatically, then provide XMETA database owner ID and password
XMETA database must be on same system as services
If manual, scripts are generated for the backup
Script directory (/Recovery/DatabaseSupport/Metadata) must be empty
Optionally choose whether to back up the Information Analyzer database
If so, choose whether to back it up automatically or manually

Figure 12-44. Backup wizard parameters - 01 KM5021.0

Notes:
As you move through the backup wizard pages, you are prompted to specify different
backup options and to provide information necessary to perform the backup.
Two system folders are used by the IS Recovery tool. Both folders must be empty. The
archive directory is the location of the generated backup archive files. The work directory is
a directory used by the backup process.
Two databases can be backed up: the XMETA repository database and the Information
Analyzer database. You can choose whether to let the tool perform the backups or whether
to allow you to manually perform the backups. If you choose the latter, scripts will be
generated and put into the /Recovery/DatabaseSupport/Metadata folder.


Backup wizard parameters - 02


Engine tier credentials
Provide the operating system user (dsadm) that owns the DataStage engine
Additional files to backup
Provide a list of files to backup
Full paths to files are listed in a text file
Specify path to text file
Additional files might include:
Log files
QualityStage reference files
Source sequential files accessed by DataStage jobs
Response file
Specify name and path of the generated response file
After the response file is generated, you can exit the wizard and run the
isrecovery.sh resp /Recovery/recovery_backup.xml command

Figure 12-45. Backup wizard parameters - 02 KM5021.0

Notes:
The IS Recovery tool backs up the set of crucial Information Server files. You can in
addition have the tool back up additional files you consider important. These might include
log files, QualityStage reference files, and sequential files used by DataStage jobs. The
additional files are listed in a text file. Each line of the text file provides a path to one of the
files. In the IS Recovery wizard, you specify the name and path to this text file.
The IS Recovery tool wizard generates a response file. It does not itself perform the
backup. After the response file is generated, you can exit the wizard and run the
isrecovery.sh -resp /Recovery/recovery_backup.xml command to perform the actual
backup.
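For illustration only (the file name and the paths listed in it are made-up examples; the
response file path matches the one used above), the additional-files list and the backup
command might look like this:

   # additional_files.txt - one full path per line
   /data/qualitystage/reference/postal_codes.txt
   /data/source/customers.csv

   ./isrecovery.sh -resp /Recovery/recovery_backup.xml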

Restore wizard parameters - 01

Click Get Started under Restore to begin the recovery process
Collects parameters needed to restore IS
Stores parameters in a response file
Restore pre-requisites include:
Target computers must have the same operating system and general configuration as the source (backup tier computers)
Same relational database software must be used for XMETA and IADB
Information Server installation must be the same version and have the same fix packs, and so on
Information Server installation must be unconfigured
Information Server deployment topology must be the same
Specify paths to archive and work directories
Specify path for response file

Figure 12-46. Restore wizard parameters - 01 KM5021.0

Notes:
The restore procedure works in a similar way. Click Get Started under Restore to
begin the recovery process. Just as for the backup, the IS Recovery tool wizard generates
a response file. It does not itself perform the restore. After the response file is generated,
you can exit the wizard and run the isrecovery.sh -resp
/Recovery/recovery_restore.xml command to perform the actual restore.
The wizard collects the information needed to perform the restore. Before you perform the
restore, the computers in which the recovery is performed and the Information Server
installation software must match what it was at the time of the initial installation, plus any
additional fix packs and patches that have been installed.


Restore wizard parameters - 02


Specify the directory for the generated scripts for the restoration of
XMETA
This directory must be empty
Specify the directory for the generated scripts for the restoration of IADB
This directory must be empty
Engine tier credentials
Provide the operating system user (dsadm) that owns the DataStage engine
Specify where the DataStage project directories are to be restored
You can specify the installation default or choose another existing directory
Specify the location of the generated restore response file
After the response file is generated, you can exit the wizard and run the
isrecovery.sh resp /Recovery/recovery_restore.xml command

Figure 12-47. Restore wizard parameters - 02 KM5021.0

Notes:
The restoration configures Information Server as it was configured at the time of the
backup, and it restores the objects that were in the XMETA and Information Analyzer
repositories at the time of the backup. Any additional files you listed for backup are also restored.
After the response file is generated, you can exit the wizard and run the isrecovery.sh
resp /Recovery/recovery_restore.xml command to perform the actual restore.


Database Capacity Planning


Figure 12-48. Database Capacity Planning KM5021.0

Notes:


Repository database (XMETA) database sizing


Repository database stores design and operational metadata
Recommendation:
Plan for growth of database
Possibly 40GB or more
Continuously monitor database capacity and adjust as necessary
Using the bundled DB2 database for XMETA
Installed in DB2 instance home, /home/db2inst1
Set to auto-expand but requires adequate file system space
Using a database other than DB2 (Oracle, SQL Server)
Create using scripts
Set to auto-expand but requires adequate file system space
Watch out for logging data
Disable unnecessary logging, and purge as often as possible


Figure 12-49. Repository database (XMETA) database sizing KM5021.0

Notes:
The growth of the Information Server repository databases (XMETA and the Information
Analyzer database) needs to be monitored and planned for.
You should assume that XMETA will continue to grow over time as more and more objects
are created and stored in it. These objects include Information Server-produced objects,
such as DataStage jobs and logging event data, as well as metadata, including operational
metadata and BI metadata imported into the Repository using Metadata Asset Manager.
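One simple way to track XMETA growth when it is hosted on the bundled DB2 database is
the GET_DBSIZE_INFO stored procedure, which returns the current database size and
capacity in bytes. A minimal sketch, assuming the default database name XMETA and that
you run the commands as the DB2 instance owner (for example, db2inst1):

    db2 connect to XMETA
    db2 "CALL SYSPROC.GET_DBSIZE_INFO(?, ?, ?, -1)"
    db2 connect reset

Recording the reported size on a regular schedule gives you the growth trend needed to plan
file system capacity.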


Information Analyzer analysis database (IADB)


Stores the high-volume, detailed analysis results generated during the
Analysis Processes
Column Analysis, Primary Key Analysis, Domain Analysis, and so on
Can be created during IS installation, before installation using the supplied
scripts, or after installation
Initially empty
No tables created by the installation process
All tables, indexes, stored procedures are created at runtime dynamically during Analysis
Processing
Use the Analysis Setting Panel in the Information Server Console to
configure IADB


Figure 12-50. Information Analyzer analysis database (IADB) KM5021.0

Notes:
Information Analyzer generally uses a database separate from XMETA to store its analysis
results. By default, this database is named IADB. Initially, this IADB is empty. Tables to
store the analysis results are created when an analysis is initiated.
It is difficult to predict the growth of the IADB database, since this depends on how
Information Analyzer is used and how much it is used. Regular monitoring of this database is
recommended to determine its growth pattern.


IADB and XMETA deployments

XMETA and IADB can be on the same database server instance but in
different databases
Typical configuration
Default configuration
XMETA and IADB can be on two different database server instances,
one using DB2, the other Oracle
Supported configuration, some customers configure deployment this way
XMETA and IADB are developed using two different application access
designs
XMETA is designed as Object-Relational database
IADB is designed as a 3NF Relational database


Figure 12-51. IADB and XMETA deployments KM5021.0

Notes:
XMETA and IADB can be located in the same database, with different schemas, but this is
not recommended for performance reasons. XMETA and IADB have different
characteristics in terms of sizing, change frequency, and performance.
There are two different design approaches used in table creation for XMETA and IADB.
XMETA is designed as an Object-Relational database. IADB is designed as a 3NF
relational database.


IADB sizing
Size of Information Analysis Database depends on source
system analysis requirements
Sampled vs. actual data
Actual requires more storage
Total size of all analyzed source data
Retention policy for existing analysis results and baselines
Recommendation:
Start with minimum of 300GB
Plan for four times the size of total source data
Detailed IADB sizing formula is available in
Information Server Capacity Planning Overview


Figure 12-52. IADB sizing KM5021.0

Notes:
The size of IADB depends on the source system analysis requirements. If samples of data
can be used instead of the actual data, then less storage will be needed. Another factor is
the retention policy for the analysis results. A longer term retention policy will obviously
require more storage than a shorter term retention policy.
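As a worked example of the guideline above: if you plan to analyze roughly 100GB of source
data using actual data rather than samples, the four-times rule suggests provisioning about
4 x 100GB = 400GB for IADB. With only 50GB of source data the rule gives 200GB, but you
would still start with the recommended 300GB minimum. Treat these figures as starting
points and refine them with the detailed formula in the Information Server Capacity Planning
Overview.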


Engine High Availability Option


Figure 12-53. Engine High Availability Option KM5021.0

Notes:


Engine High Availability (HA) option


Uses redundancy to increase availability
Eliminates single points of failure
IS has HA solutions for each tier
Engine: Active-passive configuration managed by HA cluster
management software, such as IBM Tivoli System Automation for
Multiplatforms
Services: WAS clustering
Repository: Database clustering
DB2 supports clustering
Oracle supports clustering through Oracle Real Application Clusters (RAC)


Figure 12-54. Engine High Availability (HA) option KM5021.0

Notes:
This unit focuses on Engine High Availability (HA) solutions. Information Server also has
HA solutions for the Services and Repository tiers.
HA uses redundancy to increase the availability of the Engine and so eliminates single
points of failure. If an Engine system goes down, an alternative Engine system can take
over. For the system as a whole to go down, multiple Engine systems must fail at the
same time.


Active-Passive topology
IS software is installed on a file system shared by multiple
computers
HA software is used to cluster the computers
Active-Passive model
The active Server hosts the IS Server instance
The passive Server or Servers are started but not running IS
HA software on all Servers maintains a heartbeat
Sent from the active Server to the passive Servers periodically
Indicates to the passive Server that the active Server is still active
When the active Server fails (heartbeat ends), the HA software
restarts IS on the passive Server (which then becomes the new active
Server)


Figure 12-55. Active-Passive topology KM5021.0

Notes:
Information Server software is installed on a file system shared by multiple computers. The
HA software is used to cluster the computers. At any given time, one of the computers is
active, that is, it hosts the running DataStage Server instance. The other computers in the
cluster are passive; they are running but not hosting the DataStage Server instance.
HA software on all the computers in the cluster maintains a heartbeat. The heartbeat
informs the passive computers that the active computer is still active. If the active computer
goes down, the heartbeat is not sent. A passive computer then restarts Information Server,
thereby becoming the new active computer.


HA Active-Passive model

Diagram: an active Server and a passive Server connected by a heartbeat


Figure 12-56. HA Active-Passive model KM5021.0

Notes:
This graphic illustrates an HA cluster. Notice that the active server in this diagram is
running the Engine, Services, and Database software tiers. The passive Server is running
with the HA management software, but the Information Server software is not running on it.
In this configuration, there are only two computers: one active and one passive. Adding
additional passive computers increases the redundancy.


Installation configuration
Host name alias that will always refer to the active Server
Alias moves between the active and passive systems
Clients connect using the alias
IS services are unavailable from the time of the initial active
Server failure until the new Server (formerly passive) is operational
Client connections are broken and need to be reestablished
Running DataStage jobs abort and would need to be reset and
restarted


Figure 12-57. Installation configuration KM5021.0

Notes:
The active Server is referred to by a Host name alias. This alias is always used to refer to
the active Server. If the active Server goes down, the alias is moved to the passive
computer chosen to be the next active computer.
It is important to realize that when the active computer goes down, DataStage stops for a
time, until the new active computer restarts it. This means that any DataStage jobs that
were running at the time of the failure will have aborted. When the cluster comes back up,
they will need to be reset and restarted. The HA solution reduces downtime; it does not
completely eliminate it.
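The exact mechanism behind the alias depends on the HA product, but conceptually the
clients always resolve one stable name that the HA software keeps pointed at whichever
computer is currently active, typically by moving a virtual IP address during failover. A
minimal sketch of the name resolution; the alias, host names, and addresses are all
hypothetical:

    # /etc/hosts (or DNS) entries used by DataStage clients -- illustrative values only
    10.0.0.50    isengine     # virtual IP held by the active Server; HA software moves it on failover
    10.0.0.11    dsserver1    # physical address of Server 1
    10.0.0.12    dsserver2    # physical address of Server 2

Clients are configured to connect to isengine, so they do not need to be reconfigured after a
failover, although broken connections still need to be reestablished.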


Engine HA
DataStage parallel Engine supports distributed job processing
DataStage parallel jobs can run on multiple nodes
Nodes can be associated with processors on different computers
connected over a network (grid)
Resource manager software can be used to dynamically reassign the
nodes used to run a job to those that are active
When jobs fail (because an active Server goes down)
The resource manager creates a new configuration file to run the failed job
only on nodes that are now active
IS supports grid implementations on Red Hat Enterprise Linux only using IBM
LoadLeveler resource management software


Figure 12-58. Engine HA KM5021.0

Notes:
The DataStage parallel Engine supports distributed job processing. That is, DataStage jobs
can be running on multiple nodes associated with multiple physical computer systems.
If a job fails, resource manager software can be used to dynamically reassign the nodes
used to run the job to those that are associated with computers that are running. It does
this by dynamically creating a new configuration file.
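The configuration file that the resource manager regenerates is an ordinary parallel engine
configuration file. A minimal two-node sketch is shown below; the node names, host names,
and resource paths are illustrative. In a managed grid, the resource manager would rewrite
this content so that it references only the compute hosts that are currently up:

    {
        node "node1"
        {
            fastname "compute1"
            pools ""
            resource disk "/data/datasets" {pools ""}
            resource scratchdisk "/scratch" {pools ""}
        }
        node "node2"
        {
            fastname "compute2"
            pools ""
            resource disk "/data/datasets" {pools ""}
            resource scratchdisk "/scratch" {pools ""}
        }
    }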


Checkpoint
1. Can more than one DataStage Server exist in the same
Information Server domain?
2. What HA solutions are available for Information Server?
3. What do you need to install a fix pack?
4. In HA, what is the purpose of the host name alias?
5. What is maintenance mode?
6. What command is used to backup (or restore) Information
Server?


Figure 12-59. Checkpoint KM5021.0

Notes:
Write your answers here:


Exercise 12
In this lab exercise, you will:
Put Information Server into maintenance
mode
Use IS Recovery to backup Information
Server
Use IS Recovery to restore Information
Server
Take Information Server out of
maintenance mode


Figure 12-60. Exercise 12 KM5021.0

Notes:


Unit summary
Having completed this unit, you should be able to:
Install and deploy Information Server
Install fix packs and patches
Backup and restore Information Server
Describe the Engine High Availability option


Figure 12-61. Unit summary KM5021.0

Notes:


Unit 13. Serviceability

What this unit is about


This unit discusses troubleshooting using audit trace files and ISA Lite.

What you should be able to do


After completing this unit, you should be able to:
View audit trace files on the server
View audit trace files on the client
Generate an ISA Lite Basic System summary report
Generate an ISA Lite PX Engine Configuration Test report

How you will check your progress


Lab exercises and checkpoint questions


Unit objectives
After completing this unit, you should be able to:
View audit trace files on the server
View audit trace files on the client
Generate an ISA Lite Basic System summary report
Generate an ISA Lite PX Engine Configuration Test report


Figure 13-1. Unit objectives KM5021.0

Notes:


Audit tracing
Helps determine the action being performed at a point of
failure
When the action occurred
User that initiated the action
Two areas of auditing:
Server Audit Tracing
Includes project creation and deletion
Client Audit Tracing
Includes Client login and logout, compilation, and so on


Figure 13-2. Audit tracing KM5021.0

Notes:
If failures occur, there are several sources of information you can look at for clues. Audit
tracing helps determine the action being performed at a point of failure. There are two
areas of auditing: Server audit tracing and Client audit tracing. Each provides useful
information.


Server audit tracing


Traces when projects are created and deleted
When and by whom
Server tracing is placed in a new file in the DSEngine directory on
the Server
/InformationServer/Server/DSEngine/DSAuditTrace.log
File contains header generated when file is first created:
LOG CREATED: 14:29:52 11 AUG 2012, user=,
from=/opt/IBM/InformationServer/Server/DSEngine,
version=9.1.0.0, platform=LINUX64
File is appended to forever
It is safe to delete it if it gets too large
It will be recreated next time it is needed
Subsequent lines relate to either a project creation or deletion
call

Figure 13-3. Server audit tracing KM5021.0

Notes:
Server audit tracing traces when projects are created and deleted, and it provides
information about each of these events that occurs. The information is contained in the
/InformationServer/Server/DSEngine/DSAuditTrace.log file.
After the file header, which is generated when the audit file is created, each event is
recorded. This file will continue to grow as new events are recorded. You can delete the file
at any time. If you do, the file will be recreated when the next audit event occurs.
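To inspect the audit trace on a Linux Engine tier, ordinary shell tools are sufficient. A small
sketch, assuming the default /opt/IBM/InformationServer installation path:

    # Show the most recent audit events
    tail -20 /opt/IBM/InformationServer/Server/DSEngine/DSAuditTrace.log

    # List only the project creation and deletion events
    grep -i "project" /opt/IBM/InformationServer/Server/DSEngine/DSAuditTrace.log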


Project deletion/creation messages


When a project is deleted, messages similar to the following will be
generated:
Project deletion from xmeta repository started : name=dstage,
host=EDSERVER
Project deletion from xmeta repository finished: name=dstage,
host=EDSERVER,
result=<value usually 0>
Project deletion from server started : name=dstage
Project deletion from server finished: name=dstage,
ErrorMsg=<message if any>
Successful project creation call will generate messages similar to the
following three messages:
Project creation started on server: name=dstage,
path=/opt/IBM/InformationServer/Projects/dstage
Adding project to xmeta repository: name=dstage, host=EDSERVER,
locale=ENU
Project creation finished OK: name=dstage, host=EDSERVER

Figure 13-4. Project deletion/creation messages KM5021.0

Notes:
After the file heading, the file records both project creation and project deletion messages.
Samples of these are shown. A graphic example of the file is displayed on the next page.
The format of the audit messages is displayed here. There are several lines of messages
recorded for each event. The information displayed includes when the DataStage project
was created or deleted, what its name is, the name of the system hosting the project, and
error messages if applicable.


Example DSAuditTrace.log file


Figure 13-5. Example DSAuditTrace.log file KM5021.0

Notes:
This graphic shows part of a sample DSAuditTrace.log file. The first row is the heading. It
identifies the Engine and provides information about its system.
Following the header are project creation messages. Two sets of messages are
highlighted. The first provides information about the creation of the DataStage project
named ANALYZERPROJECT, which is a project created during Information Server
installation for use by Information Analyzer. The second set of highlighted messages
provides information about the creation of a project named DSProject.


Client audit tracing


Covers the main actions the DataStage client performs:
These include login, logout, import, export, and compilation
The client tracing information is output to existing
dstage_wrapper_trace.log files used by the DataStage clients
<USER_HOME>/ds_logs/dstage_wrapper_trace_<n>.log
Example message format:
2012-10-13 10:38:07,933 INFO
com.ibm.datastage.Auditor.log(Auditor.java:100) - [AUDIT
EVENT] <message>


Figure 13-6. Client audit tracing KM5021.0

Notes:
Client audit tracing covers the main actions the DataStage client performs, including login,
logout, import, export, and job compilation. The trace information goes into the existing
dstage_wrapper_trace.log files used by the DataStage clients.
To locate the directory containing the files, start at the Windows home directory of the
DataStage user. For example, if the user is student, on the Client image, in Windows
Explorer, open the Documents and Settings>student>ds_logs folder. The folder
contains a number of log files.
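Because the client logs are plain text, you can search them directly from a Windows
command prompt. A small sketch, assuming the student user and the default log location
described above:

    cd "C:\Documents and Settings\student\ds_logs"
    findstr /C:"AUDIT EVENT" dstage_wrapper_trace_*.log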


Example client trace log file


Figure 13-7. Example client trace log file KM5021.0

Notes:
Shown in this graphic is an example of one of the client trace files. This one is named
dstage_wrapper_trace_20.log. The user on this system in this example is student. The
path to this log file is C:\Documents and
Settings\student\ds_logs\dstage_wrapper_trace_20.log.
From the log file shown here, we can determine that several jobs were opened and
compiled and then closed.


ISA Lite


Figure 13-8. ISA Lite KM5021.0

Notes:


ISA Lite
Provides the ability to gather problem data and diagnose
issues across the Information Server suite
Recommended method of gathering customer problem data
The ISA Lite tool will retrieve information from the DataStage
Server audit trace file:
<IS_HOME>/Server/DSEngine/DSAuditTrace.log
The ISA Lite tool will also retrieve information from any report
archive files generated:
<USER_HOME>\Application Data\IBM\Information
Server\DataStage Client\<client-tag>\Error Reports\*.zip
The ISA Lite tool also incorporates the DataStage SyncProject
tool to aid in determining and resolving DataStage project
inconsistencies


Figure 13-9. ISA Lite KM5021.0

Notes:
ISA Lite provides the ability to gather problem data and diagnose issues across the
Information Server suite. ISA Lite retrieves information from a variety of sources, including
the audit trace files.
ISA Lite can also be helpful during the installation and testing of Information Server. You
can use it to check whether your system has the prerequisites necessary for the
installation. You can use it to verify an installation after it has been performed.
ISA Lite is also used when submitting problems to the IBM Information Server Support
staff. The data generated from ISA Lite can be sent to IBM Support to aid them in
diagnosing and solving the problem.


ISA Lite Sync Project functionality


The existing architecture of DataStage involves the inclusion of
two repositories:
The XMETA Repository for holding the design time assets
The DSEngine Repository for holding the associated runtime assets
The problem with this design is that the data held in the two
Repositories can go out of sync
ISALite will determine the state of projects contained within the
DSEngine Repository
Provides the ability to restore projects that are found to be missing,
incomplete or that contain inconsistencies


Figure 13-10. ISA Lite Sync Project functionality KM5021.0

Notes:
ISA Lite also has functionality for restoring corrupt DataStage projects. The existing
architecture of DataStage involves two repositories, XMETA and the DSEngine repository.
Sometimes these repositories can get out of sync. ISA Lite can be used to test the
repositories and, if necessary, to restore them.


Example Sync Project report output


IS Host = MK-ASHH
IS Port = 9080
IS User = admin
DS Host = MK-ASHH
DS Port = 3158

DataStage Project: dstage3


--------------------------
0 Issues Found.

DataStage Project: dstage4


--------------------------
ISSUE: Unable to lock project.

DataStage Project: dstage5


--------------------------

0 Issues Found.

DataStage Project = dstage9


---------------------------
2 Issues Found.

ISSUE: DS Engine Job testJob is missing.


ISSUE: DS Engine Job testJob2 category incorrectCategory should be correctCategory

Overall Summary
---------------
2 Issues found.


Figure 13-11. Example Sync Project report output KM5021.0

Notes:
This graphic shows an example of a sync project report generated in ISA Lite. In this
example, several DataStage projects were examined by ISA Lite for problems.
Two issues were found in the DataStage project named dstage9. In the first case, the
XMETA repository contains a DataStage job named testJob, but the corresponding
DSEngine repository project is missing that job. In the second case, the category recorded
for the job testJob2 differs between the two repositories.


ISA Lite tool


Located under the IS home directory where Information Server
is installed
/opt/IBM/InformationServer/ISALite
Installed and configured as part of the IS installation
Documentation is in the /ISALite/doc folder
Runs in GUI or command-line mode
Log in as a system administrator (root)
Invoke: ./runISALite.sh from the /ISALite directory


Figure 13-12. ISA Lite tool KM5021.0

Notes:
ISA Lite is opened from the command line. On the Server, open a terminal. Execute the
command to change to the /IBM/InformationServer/ISALite directory, for example: cd
/opt/IBM/InformationServer/ISALite. Then run ISA Lite by executing the following
command: ./runISALite.sh.
You need root authority to use ISA Lite.
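Putting these steps together on the Server, the session looks like the following sketch
(default installation path assumed):

    su - root
    cd /opt/IBM/InformationServer/ISALite
    ./runISALite.sh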


ISA Lite window

Screen capture of the ISA Lite window, with callouts for the data collection option to select, the path to the collection file, and the control that starts collecting data

Figure 13-13. ISA Lite window KM5021.0

Notes:
The ISA Lite opening window lists problems it can collect information about. You first select
the type of problem. In this example, a Basic System Summary report will be generated.
Next you specify the file name for the collected data. The generated file will consist of a
compressed .zip file.
When the tool runs it will prompt you for additional information as needed, such as the
Information Server home directory. You will also have the option of transferring the
information to IBM Support.


Sample ISA System Summary report

Figure 13-14. Sample ISA System Summary report KM5021.0

Notes:
The ISA Lite results zip file contains a summary report file, SYSTEM-SUMMARY.html.
An example of this file is shown here. The report consists of a table of contents with links to
different sections of information.


Checkpoint
1. What information does the DSAuditTrace.log file contain?
2. What tool is the recommended method of gathering customer
problem data?


Figure 13-15. Checkpoint KM5021.0

Notes:
Write your answers here:


Exercises Unit 13
In this lab exercise, you will:
View audit trace files on the Server
View audit trace files on the Client
Generate an ISA Lite Basic System
summary Report
Generate an ISA Lite PX Engine
Configuration Test Report


Figure 13-16. Exercises Unit 13 KM5021.0

Notes:


Unit summary
Having completed this unit, you should be able to:
View audit trace files on the server
View audit trace files on the client
Generate an ISA Lite Basic System summary report
Generate an ISA Lite PX Engine Configuration Test report


Figure 13-17. Unit summary KM5021.0

Notes:
