
ABC

Cook Book

===========================================
===============================================================================

A REAL ABC STREAM STATUS IN BSMR AS VIEWED FROM THE ADMIN UI

PURPOSE
The objective of this book is to make it possible to use the ABC framework in the
best possible way for PMDB. It is extremely important to clearly understand the
ABC framework and the best practices associated with it. Most problems
encountered with the ABC framework are due to a lack of clear
understanding of the framework or improper usage of it. This
book aims to provide all the details required to make the best use of
the ABC framework for PMDB. It is not an objective of this book to explain
all the features of the ABC framework itself.

TERMINOLOGY YOU SHOULD BE AWARE OF


Here is some terminology that you should know about.

ABC:     A framework for modelling and executing work flows. It provides
         the ability to set parent child relationships between the tasks
         to be executed and to group a set of related tasks.
Stream:  A set of tasks related through parent child relationships.
         A child task can have zero, one or more parent tasks.
Step:    A task in a Stream.

ABC OF ABC
A for Audit, B for Balance, C for Control.
The only functionality of ABC that is used in PMDB is the C part, the
Control part. We will not spend any time discussing the Audit and Balance
functionality, which is not used in PMDB.

THE THREE STEPS TO GET YOUR STREAMS WORKING WITH ABC


The following 3 steps are all you need to do to get your streams working with
ABC.
Import Definition: Imports the details of Streams and their associated Steps
into the ABC dictionary tables from the XML files where they are defined.
Command : abcImportDef -xmlfile <filename>

Load Batch: Loads an instance of a Stream for execution, i.e. loads it from
the ABC Stream definition tables into the ABC run time tables. Only the Steps
of Stream instances that are loaded for execution get executed. Here, batch
refers to a Stream instance.
Command : abcLoadNRun -loadBatch <stream Id>
Command : abcLoadNRun -loadBatch -allStreams

Run Steps: The Steps in a Stream are executed one after another according to
the parent child relationship. If a parent Step fails, the child Step will not
get executed. The Run Steps command executes all the Steps that are ready for
execution in all the Stream instances loaded in the run time tables.
Command: abcLoadNRun runSteps

Load Resource Details (Optional): This is an optional step. It should be
executed immediately after Import Definition. The resource loader attaches to
a Step the specific resources to which the Step needs exclusive access. If
exclusive access to a resource is not available, the Step will not be executed
by the framework.
Command : abcResourceLoader file <file name>
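
The order of the operations above can be sketched in Python as follows. This is only an illustration: the function name and the file names are hypothetical, and the command tokens are copied verbatim from the Command lines in this section (some leading dashes may be missing from them). The sketch only builds the command lines; it does not run anything.

```python
from typing import List, Optional


def abc_command_sequence(stream_xml: str,
                         resource_xml: Optional[str] = None) -> List[List[str]]:
    """Return the ABC command lines in the order described in this section."""
    # 1. Import Definition (one-time task, done at content pack installation).
    commands = [["abcImportDef", "-xmlfile", stream_xml]]
    if resource_xml is not None:
        # Optional: load resource details immediately after the import.
        commands.append(["abcResourceLoader", "file", resource_xml])
    # 2. Load Batch: load eligible Stream instances into the run time tables.
    commands.append(["abcLoadNRun", "-loadBatch", "-allStreams"])
    # 3. Run Steps: execute the Steps that are ready for execution.
    commands.append(["abcLoadNRun", "runSteps"])
    return commands


# Hypothetical file names, for illustration only.
for argv in abc_command_sequence("my_stream.xml", "my_resources.xml"):
    print(" ".join(argv))
```

In practice the last two commands are the ones that get scheduled repeatedly, while the first two are run once.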

DEVELOPING YOUR OWN ABC STREAM


This book does not cover stream development in detail. The topic is well
known and there are a number of examples in the PMDB Content Packs.
However, this section does discuss some salient points you need to remember.
There are just two types of XML files you need to write, and one of them is
optional, so it is really easy. The files are (a) the Stream definition XML
file and (b) the Resource definition XML file.
What is present in a Stream definition XML?
The stream XML defines the details of the list of tasks that are part of the
stream. The details include the following.
Job Stream Step Details:
This section has details on the individual steps.

Dwid:                   A unique identifier
Business Name:          An alias name
Catalog:                Always set to BSMRExec
Executable Identifier:  Set to BSMRLauncher or BSMRAuditLauncher
Max exec time:          Maximum execution time in minutes
Max retries:            Maximum number of retries on failure

Job Stream Links:
This section has the details on the parent/child relationship of the steps.

Step Identifier:         Unique identifier of the child step
Parent Step Identifier:  Unique identifier of the parent step
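
To make the two sections concrete, here is a minimal Python sketch of the same information as plain data. It is not the XML schema (which this book does not show): the class, field and function names are hypothetical and simply mirror the attribute names listed above, and the sample steps are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Step:
    dwid: str                    # Dwid: a unique identifier
    business_name: str           # Business Name: an alias name
    executable_identifier: str   # BSMRLauncher or BSMRAuditLauncher
    max_exec_time: int           # maximum execution time in minutes
    max_retries: int             # maximum number of retries on failure
    catalog: str = "BSMRExec"    # always set to BSMRExec


def parents_by_step(steps: List[Step],
                    links: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each step Dwid to its parent Dwids.

    Each link is a (step identifier, parent step identifier) pair.
    """
    parents: Dict[str, List[str]] = {s.dwid: [] for s in steps}
    if len(parents) != len(steps):
        raise ValueError("Dwid values must be unique")
    for child, parent in links:
        if child not in parents or parent not in parents:
            raise ValueError(f"link refers to an unknown step: {child} <- {parent}")
        parents[child].append(parent)
    return parents


# Hypothetical two-step stream: an audit step that runs after a load step.
steps = [
    Step(dwid="load_fact", business_name="Load fact table",
         executable_identifier="BSMRLauncher", max_exec_time=180, max_retries=2),
    Step(dwid="audit_load", business_name="Audit the load",
         executable_identifier="BSMRAuditLauncher", max_exec_time=5, max_retries=3),
]
links = [("audit_load", "load_fact")]
print(parents_by_step(steps, links))
```

Checking that every link points at a defined step before importing is a cheap way to catch a mistyped identifier.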

What is present in a Resource definition XML?
The resource definition XML has the details of the resources attached to a
particular Step in a stream. The details include the following.

Resource DwId:  Unique identifier for the resource
Resource Type:  The type of resource. The only type supported is Table.
Pool Count:     Maximum number of Steps that can access a resource
                concurrently
Score:          The start score, always set to 0. This is used internally
                to avoid starvation of ABC Steps that need exclusive
                resource access. Don't worry about this, just set it to 0.

A SAMPLE STREAM
Given below is a sample ABC stream. It is provided to illustrate the
different Step statuses and how the stream behaves. In reality most job
Streams are much simpler than this one, often consisting of a single sequence
of Steps and sometimes a couple of parallel paths.
The main points that you should remember here are these:
When a Step fails, no further child Steps will get executed. For
example, in the diagram, Steps M and P will not get executed because
Step J failed.
The child Steps get executed only after the parent Step has finished
successfully. For example, in the diagram, Step K will get executed only
after Step H has completed successfully.
It is possible that some Stream paths complete successfully even if
other Stream paths fail. For example, the stream path that ends
with Step O will run to completion even if the stream path that ends with
Step P fails to complete successfully.
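
The three points above can be reproduced with a small Python simulation. This is a sketch, not the framework's scheduler. The graph is an assumed reading of the diagram (H is the parent of J and K, J of M, M of P, K of O). Retries are ignored, and a Warning parent (which would also let the stream continue) is not modelled.

```python
from typing import Dict, List, Set


def run_stream(parents: Dict[str, List[str]], failing: Set[str]) -> Dict[str, str]:
    """Simulate repeated run-steps passes until no further Step can run."""
    status: Dict[str, str] = {}
    progressed = True
    while progressed:
        progressed = False
        for step, step_parents in parents.items():
            if step in status:
                continue  # already executed
            # A Step runs only after all of its parents finished successfully.
            if all(status.get(p) == "Success" for p in step_parents):
                status[step] = "Error" if step in failing else "Success"
                progressed = True
    # Anything left over was blocked by a failed ancestor.
    for step in parents:
        status.setdefault(step, "Not executed")
    return status


# Assumed shape of the sample stream; Step J is the one that fails.
sample = {"H": [], "J": ["H"], "K": ["H"], "M": ["J"], "P": ["M"], "O": ["K"]}
for step, result in run_stream(sample, failing={"J"}).items():
    print(step, result)
```

With J failing, M and P are never executed, while the H, K, O path still runs to completion.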

STATE TRANSITION
This section builds on the previous section and provides some additional
details about the different States a Step can take and the state transitions
allowed.
In fact there are two things here:
1. Status:
This is based on the exit code for a Step. It indicates how the task
associated with a Step completed its execution.
2. State:
This is the transition state a Step goes through during its life time.

The picture below shows the different statuses a Step can move to during
the course of its execution. Some points to note are these:

When a Stream is in Error state, loading a new instance of the
same stream using the load batch operation is not possible. Load
batch will succeed in all other cases, including the Aborted state.

When a Step exceeds its maximum execution time, the Step will be
displayed in Red in the Admin UI, indicating that this is similar
to an Error state. However, in this case the process is still running
and is not killed by the framework.
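
The maximum execution time check amounts to a simple comparison, sketched below in Python. How the framework performs the check internally is not described in this book, so the function name and the strict "greater than" comparison are assumptions. The point is that the check only flags the Step and does not terminate the process.

```python
from datetime import datetime, timedelta


def exceeded_max_exec_time(started_at: datetime, now: datetime,
                           max_exec_minutes: int) -> bool:
    """True when a running Step has been running longer than its limit."""
    return now - started_at > timedelta(minutes=max_exec_minutes)


started = datetime(2024, 1, 1, 8, 0)
# With a limit of 240 minutes the Step is flagged only after 12:00.
print(exceeded_max_exec_time(started, datetime(2024, 1, 1, 11, 59), 240))  # False
print(exceeded_max_exec_time(started, datetime(2024, 1, 1, 12, 1), 240))   # True
```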

STEP STATUS CHANGE

STREAM STATUS CHANGE

STEP STATE TRANSITION


Given below are the state changes of a Step during its life time. Once a Step
moves to the Finished State without an Error Status or a Max execution time
exceeded Status, the next Step will get executed.

STREAM STATE TRANSITION


Load batch will not succeed if an Active instance of a Stream is present. If the
previous instance of Stream is in Aborted or Finished state, load batch
operation will succeed.
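
The load batch rule can be written as a one-line check. The sketch below is illustrative only: the function names and stream ids are hypothetical, and the state names Active, Aborted and Finished are the ones used in this section.

```python
from typing import Dict, Iterable, List


def can_load_batch(existing_states: Iterable[str]) -> bool:
    """A new instance can be loaded only if no existing instance is Active."""
    return all(state != "Active" for state in existing_states)


def eligible_streams(instances: Dict[str, List[str]]) -> List[str]:
    """Streams that a load batch for all streams would pick up."""
    return [sid for sid, states in instances.items() if can_load_batch(states)]


# Hypothetical stream ids mapped to the states of their loaded instances.
instances = {
    "stream_a": ["Finished"],
    "stream_b": ["Finished", "Active"],
    "stream_c": ["Aborted"],
    "stream_d": [],
}
print(eligible_streams(instances))  # ['stream_a', 'stream_c', 'stream_d']
```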

USER INTERFACE TO THE ABC FRAMEWORK


The PMDB Administration UI provides a very nice interface for monitoring
the state of streams graphically. The UI also allows editing certain Stream
configuration parameters, such as Maxretries and Maxexectime (discussed
later). All the administration functionality not supported by the UI can be
performed through the CLI described in the next section. When other
administration functions are added to the UI, this cook book will be updated
with those details as well.

MANAGING THE ABC FRAMEWORK


This section provides you with all the details you will ever need to manage ABC
in the PMDB context.

I. IMPORTING THE STREAM DEFINITIONS


Once you have finished writing the stream definition XML file, you
are ready to import the definition into the ABC dictionary tables. This step is
typically performed at the time of content pack installation, so it is a
one-time task. The command to do this is below.

abcImportDef -xmlfile <filename>

Debugging:
Any errors or log messages generated during stream import are logged
in PMDB_HOME/log/importdefs.log

II. LOADING THE STREAMS TO RUN TIME TABLES


Unlike importing the definitions, this is not a one-time task. The import
definition task places the stream definition into certain dictionary tables
internal to ABC. However, to execute a stream, these definitions must be
loaded into a set of ABC run time tables. This task should be executed
periodically, using a scheduler, if you want the Steps in the Stream to get
executed periodically. The commands to load a Stream definition into the ABC
run time tables are the ones below.
To load a specific stream:
abcLoadNRun -loadBatch streamId <Stream Id>
To load all the streams that are eligible: (Recommended option)
abcLoadNRun -loadBatch -allStreams
Some points to note:
A new instance of a stream cannot be loaded if an Active instance of the
stream is present in the run time tables.
The abcLoadNRun -loadBatch -allStreams command will load all the
streams that are eligible (i.e. those without an Active instance in the run
time tables) into the run time tables. This means that it is not
possible to control the loading of streams individually. Each stream will
be loaded for execution again as soon as it finishes, subject to the
frequency at which the load batch command is invoked for all streams.
The command is called loadBatch because it loads a batch of Steps,
which is nothing but a Stream.
Debugging:
The errors and log messages generated during the load batch operation are
logged in dw_abclauncher.log in the PMDB_HOME/log folder.

III. RUNNING THE STEPS

The load batch operation takes care of getting the Stream definitions into the
run time tables. However, this by itself does not ensure that the Steps will
get executed.

The Steps will get executed only when the ABC Run Steps command is invoked.
The command line is as below:
abcLoadNRun runSteps
The run steps command checks all the streams in the run time tables,
looking for steps that are in the Waiting state. It executes the eligible
steps in no specific order, but executes them all.
One other point to remember here is this: if a Step under execution has
requested exclusive access to a resource that is also required by another
Step which is eligible for running, the latter Step will not get invoked by
the framework.
Debugging:
The errors and log messages generated during the run steps operation are
logged in dw_abclauncher.log in the PMDB_HOME/log folder.

IV. IMPORTING RESOURCE DEFINITIONS


This is an optional step. It is possible to specify the list of resources to
which a Step needs exclusive access. The resource definitions can be imported
after the stream that contains the Steps has been imported. Once the resource
definition for a Step is imported, any future RUN STEPS command will take the
resource into consideration. If a resource required by a Step is already held
by another Step, i.e. if another Step with the same resource requirement is
running, then this particular Step will not get executed. It will remain
in the Waiting state.
The command to import a resource definition is as below:
abcResourceLoader file <Resource file name>
Note: The Resource definition framework is very powerful. Though the only
resource type supported is Table, no check is actually performed to see
whether the resource really is a Table. It is just a namespace with a string
value. Using this feature it is possible to achieve two different objectives:

1. Provide exclusive access to resources and help resolve locking issues with
critical resources. If the pool count, explained earlier, is set to 1, then
access to the resource becomes exclusive.
2. Control the number of processes launched. This feature is not currently
used but it is possible. For example, if I set a pool count of 5 for a
resource named Summary and attach the resource to all the trend_sum
jobs, then a maximum of 5 instances of trend_sum can run at any
point of time. This way the number of processes running can be
controlled.
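
The effect of the pool count can be illustrated with a small counter in Python. This is a sketch of the idea only, not the framework's implementation: the class and method names are hypothetical, and the Score field is not modelled. A pool count of 1 means at most one holder, i.e. exclusive access. A larger pool count caps the number of concurrent holders.

```python
from collections import Counter
from typing import Dict, List


class ResourcePools:
    """Tracks how many running Steps currently hold each named resource."""

    def __init__(self, pool_counts: Dict[str, int]) -> None:
        self.pool_counts = dict(pool_counts)
        self.in_use: Counter = Counter()

    def try_acquire(self, resources: List[str]) -> bool:
        """Reserve all resources for a Step, or reserve nothing and return False."""
        if any(self.in_use[r] >= self.pool_counts[r] for r in resources):
            return False  # the Step stays in the Waiting state
        for r in resources:
            self.in_use[r] += 1
        return True

    def release(self, resources: List[str]) -> None:
        """Give the resources back when the Step finishes."""
        for r in resources:
            self.in_use[r] -= 1


# Pool count of 5 on a resource named Summary, attached to every trend_sum job.
pools = ResourcePools({"Summary": 5})
started = [pools.try_acquire(["Summary"]) for _ in range(7)]
print(started.count(True), "started,", started.count(False), "left waiting")
```

Seven trend_sum jobs are requested here; five start and two stay in the Waiting state until a running job releases the resource.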

ADDITIONAL COMMAND LINE TOOLS


This section provides details of certain command line tools which you will
never use under ideal conditions. However, conditions are not always
ideal and you may need these. This section also explains certain tools that can
be used to inspect the ABC database. So please read this section carefully.

I. MANAGING THE STEPS


The command that allows you to manage the steps is abcStatusSetter. The name,
however, is a misnomer. It is actually a State setter and not a Status setter.
This command allows you to change the state of a Step as allowed by the
supported state transitions discussed in an earlier section. The state can
only be set in the forward direction and not in the reverse direction.
The state of the Steps is normally managed by the framework itself and you
will normally never have to use the status setter. It becomes useful only in
unusual circumstances when you want to manage the state of a Step yourself.
The usage is as below:
abcStatusSetter -processId -running | -waiting | -success | -warning
| -error | -final [-info "status info"]

options:
-error                     update DWABC DB with error status for this job
-final                     update DWABC DB with worst status of all audit
                           metrics for this job
-help                      print this help message
-info <"status info">      Optional status info
-processId <"processId">   Stream step processId
-running                   update DWABC DB with running status for this job
-success                   update DWABC DB with success status for this job
-waiting                   update DWABC DB with waiting status for this job
-warning                   update DWABC DB with warning status for this job

Another command that is used to manage the Steps is the abcRetryStep
command. The abcRetryStep command has to be used to get out of the
situation where a Step stays in Error state even after the maximum re-tries are
exhausted. The syntax of the command is as below:
abcRetryStep -processId <processId>
The value of processId can be obtained from the UI by hovering the mouse over
the Step you are interested in.
Debugging:
The errors and log messages generated by these Step management commands are
logged in dw_abclauncher.log in the PMDB_HOME/log folder.

II. MANAGING THE STREAMS


The command that allows you to manage a Stream is abcBatchControl. The
command is called batch control because a Stream is nothing but a batch of
steps. The abcBatchControl command allows you to change the status of a
Stream, as per the allowed state transitions. This command will most often be
used to manually Abort a stream, so that a new instance of the Stream gets
loaded during the next load batch operation. The syntax of the command is as
below.
abcBatchControl <options> <streamId>
<streamId>: the id of the stream on which to operate.

options:
-abort            abort the specified stream (kill all currently running
                  constituent processes)
-all              operate on all loaded streams
-help             print this help message
-resume           resume the specified stream
-streamId <arg>   perform the action on the named stream
-suspend          suspend the specified stream (allow any running process to
                  complete)

Note: Please do not use the Suspend and Resume functionality, as it has not
been tested.
Debugging:
The errors and log messages generated by the abcBatchControl command are
logged in dw_abclauncher.log in the PMDB_HOME/log folder.

INTERPRETING THE STEP STATUS


This section briefly describes how to interpret the status of a Step as seen
in the stream monitoring UI. You typically will not be worried about the
Stream status and will focus only on the Step status. Here are the
possible status values and details on how to interpret them.
Warning: If a Step is in Warning status, the flow of the Stream
will continue, but not everything is working fine. The child Step of a
parent Step that is in Warning status can still be in Success status.
CAUTION: It is very important not to ignore warnings. A warning
message could imply that there was a serious problem with the data processing
performed by the process that the Step executed.
Error: If a Step is in Error status, the task associated with
the Step exited with an Error status. It also means that its child Step will
not get executed after maxretries. A Step that is in Error status after
maxretries will leave the Stream Active but blocked. Manual intervention is
then needed to proceed further.
Running: This is a State and not a Status. The UI actually uses a
combination of State and Status to give the user the right picture. A Step in
the Running State will complete with one of the supported Status values for a
Step.
Success: A Green success state indicates that everything went as
planned and there are no issues with the execution of the Step. The child
Step, if any, will get executed on the next run of RUN STEPS.

GETTING OUT OF COMMON ERROR CONDITIONS


There are two common error conditions that occur with ABC:
1. A Step failed to execute after maximum re-tries
If a Step has failed to execute after maximum re-tries, you can try the
following:
a. If you think the Step will succeed on re-try, use the abcRetryStep
command to retry.
b. If you are not sure what went wrong, please follow the details in
the next section and take appropriate action. After correcting the
problem, use the Step and Stream control commands to get the Stream
back to execution state.
2. A Step failed to complete after maximum execution time
a. If you think the command is genuinely taking more than the
allotted time to complete, wait for the process to complete and then
use the Stream/Step control commands to get the Stream back to
execution state.
b. If you think the command has hung, please follow the details in the
next section to find out what went wrong and take appropriate
action.

FINDING OUT WHAT WENT WRONG


ABC is the engine that runs most of the data warehouse tasks in PMDB.
Knowing how the ABC streams are performing is hence the most important
task in monitoring the status of the data warehouse operations performed in
PMDB. ABC provides multiple mechanisms through which the Stream status
can be monitored.

UI
This is the first and most popular means of monitoring ABC
streams. The internal monitoring UI for ABC Streams, available in the
PMDB Admin UI, displays the status of the last run of a Stream
graphically (similar to the sample stream in a section above). The UI
currently cannot display historical stream status
(status information for streams is preserved for 30 days in the ABC run
time tables); however, this will be added going forward. The Stream
monitoring UI lets you navigate to and view the Stream of your
choice.
The second mechanism the UI provides to know the Stream status is
Alerts. ABC has an execution_log which stores the status of all the Steps
executed. The Alert window picks up and displays any Fatal errors, so
that they are immediately brought to the attention of the PMDB administrator.

Log File
There are two log files of interest: (a) importdefs.log and
(b) dw_abclauncher.log, both found in the PMDB_HOME/log folder. These
have already been discussed in earlier sections.
Another important log mechanism is the execution_log table. Here the exit
log associated with every Step is logged, and the log messages are tied to
the unique identifier of the Step in the run table. Currently there is no
mechanism to display these logs in the UI; however, this will be added in
this release itself.

Examining the ABC DB tables
The execution_log table, mentioned in the previous section,
provides the exit status log for each Step executed. An experienced
user can also get all the details needed about the Stream execution
status by looking at the relevant dictionary and run time tables. The
discussion of the ABC dictionary table schema is not in the scope of this
book.

In future: Command line utilities
A suite of command line utilities which look up the ABC dictionary
tables and dump useful information will be developed going forward.
These will provide all the information required to understand what is
going on. There will also be additions to the management CLI. This
section will be updated with those details once they are ready.

FOLLOW RECOMMENDATIONS
As with any framework, not using the ABC framework correctly can result in a
lot of frustration and undesirable functionality. Here is a set of
recommendations which, if followed, will enable you to use ABC in the PMDB
context in the best possible way. This section provides specific
recommendations on certain PMDB processes and also some general guidelines on
how to use ABC effectively.
There are two parameters that you should know exactly how to handle. These
directly impact the ABC Stream execution functionality.
1. Maxexectime: This is the maximum execution time for a
command in minutes. Setting a very high value here will result in a
delay in identifying hung processes. Setting a low value here will
result in the Step getting marked as Error.
2. Maxretries: This parameter tells ABC the number of times
to retry until execution is successful. A Step will be re-tried only if it
has failed in the previous execution. The Step will be re-tried
the next time the RUN STEPS command runs, as scheduled.
The Maxretries option should be set to a positive value only if a
retry could result in successful execution. This option is most
commonly used to overcome resource unavailability issues. The value
should be set only after careful examination of the use
case. It can be edited through the PMDB Administration UI.
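
The effect of Maxretries can be sketched in Python as below. This is an illustration under one assumption that the text does not spell out: the first execution is not counted as a retry, so a Step gets at most Maxretries + 1 attempts, one per RUN STEPS invocation. The function name is hypothetical.

```python
from typing import List, Tuple


def run_with_retries(outcomes: List[bool], max_retries: int) -> Tuple[str, int]:
    """Return the final status and the number of attempts made.

    outcomes[i] says whether the i-th attempt (one per RUN STEPS
    invocation) would succeed.
    """
    attempts = 0
    # Assumption: one initial attempt plus up to max_retries retries.
    for succeeded in outcomes[: max_retries + 1]:
        attempts += 1
        if succeeded:
            return "Success", attempts
    # Retries exhausted: the Step stays in Error and blocks the Stream.
    return "Error", attempts


# A Step that only succeeds on its third attempt.
print(run_with_retries([False, False, True], max_retries=2))  # ('Success', 3)
print(run_with_retries([False, False, True], max_retries=1))  # ('Error', 2)
```

With Maxretries set to 1 the same Step ends in Error and would need abcRetryStep or other manual intervention.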
Given below are some specific recommendations that you SHOULD follow. The
best way to arrive at the values of the configuration parameters is to
understand the use case, test, and arrive at an optimum value. This should be
done by the Stream developer. If this is not possible, the next best thing to
do is to follow the recommendations given below.

TREND SUM:
Re-try recommendation:

Does re-try make sense? Mostly no. There is never a scenario where a re-try
will make an already errored-out trend sum task succeed. The scenario of
trend sum succeeding on retry due to system resource availability is rare. The
IQ concurrency issues with trend sum should be handled through the ABC
Resource Loader feature. Two trend sums will typically never run in parallel
for the same set of destination tables. Since only key_id based summarization
is supported, we never insert into the dimension tables.
Recommended value of retry: 2
Max execution time recommendation:
This depends on two things: 1) the data in the table and 2) the type of
summarization (for example, forecast summarization takes a long time). There
is no single recommendation possible here. If it is not possible to arrive at
an appropriate value, it is recommended that you set a value of 4 hours for
all levels of summarization.
Recommended value of maxexec time: 4 x 60 = 240
Resource locking recommendations:
The only available resource type as of now is Table. This is typically used to
avoid the Sybase multi writer problem. However, with trend sum in BSMR there is
typically no scenario where the multi writer problem exists. There is no need
to set up any resource locking for trend sum at all.
Recommendation: No resource locking required

LOADER:
Re-try recommendation:
When a number of loader processes for different facts but with common
dimensions are launched concurrently, there could be locking issues. These can
be resolved to an extent using the re-try option. Re-try for system resources
is another use case. The number of re-tries to configure is a function of the
number of loader jobs that share common dimension tables and are planned to be
launched concurrently. Retry could be required for dimension tables as
well, for snowflake and conformed dimensions. An important point to
remember here is that Loader has its own retry logic as well. Considering all
of the above, the recommended re-try value is as below.

Recommended value of re-tries = 2


Max execution time recommendation:
This depends to a large extent on the data. It is safer to set it to about 3 hours.
Recommended value = 3 x 60 = 180
Resource locking recommendations:
Loader tries to lock the dimension table for a minimum amount of time. It also
has its own internal retry logic (try up to 12 times at 20 sec intervals). But
despite this, when loader processes are run concurrently there is a chance that
not all the required resources are available with exclusive access. This
should be controlled intelligently using the ABC resource locking mechanism.
Recommendation: Analyze the use case and

AUDIT STEPS:
Re-try recommendation:
A re-try makes sense for audit steps only in the case of resource unavailability.
Recommended value = 3
Max execution time recommendation:
This is an audit step that only involves a JMX lookup. This should complete
really fast.
Recommended value = 5
Resource locking recommendations:
There are no resource locking requirements for the audit steps.
Recommendation = No resource locking required.
NOTE: An important point to note is that there is no need to run the audit
steps in sequence. The auditing is for an event in the past, so all the
audit steps can be executed in parallel. This is recommended, but not
mandatory.
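
For quick reference, the recommended values from the three sections above can be collected in one place, as in the Python sketch below. The dictionary layout and key names are hypothetical. The numbers are the ones recommended in this book, with hours converted to minutes because Maxexectime is specified in minutes.

```python
# Recommended Maxretries and Maxexectime (minutes) per process type.
RECOMMENDED = {
    "trend_sum": {"max_retries": 2, "max_exec_minutes": 4 * 60},
    "loader":    {"max_retries": 2, "max_exec_minutes": 3 * 60},
    "audit":     {"max_retries": 3, "max_exec_minutes": 5},
}

# Loader's own internal retry: up to 12 tries at 20 second intervals.
LOADER_INTERNAL_RETRY_WINDOW_SECONDS = 12 * 20

for process, values in RECOMMENDED.items():
    print(process, values["max_retries"], values["max_exec_minutes"])
print("loader internal retry window (minutes):",
      LOADER_INTERNAL_RETRY_WINDOW_SECONDS // 60)
```

The Loader's internal retry covers a window of only about 4 minutes, which is why an ABC-level retry can still be useful for loader Steps.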

FAQ
1. Is there a way to know if max re-tries are exceeded for a Step in error?
Currently no. The user interface will be enhanced to support this going
forward. If you are interested, you can query the ABC dictionary tables
to figure this out.
2. More to be added.

A BIT OF ARCHITECTURE
This is at the end because you really don't need to read it. Read on only if
you are curious.
This section is yet to be filled in.
