
DB2 CHECKPOINT/RESTART USING SMART/RESTART AND SMART/CAF
FROM RELATIONAL ARCHITECTS INTERNATIONAL
The replacement for Quickstart.

Confidential, unpublished property of CIGNA. Do not duplicate or distribute. Use and distribution limited solely to authorized personnel. (c) Copyright 2001 by CIGNA.

Updated on 7/20/2012 11:56:00 AM

TABLE OF CONTENTS
SMART/RESTART and SMART/CAF .......... 5
BASIC CONCEPTS of CHECKPOINT/RESTART .......... 6
  The Problem .......... 6
  The Goal .......... 6
  The Solution .......... 6
  SMART/RESTART - An Overview .......... 7
  SMART/CAF - An Overview .......... 7
STEPS TO IMPLEMENT SMART/RESTART & SMART/CAF .......... 8
CODING YOUR APPLICATION FOR RESTARTABILITY: .......... 11
PROGRAM PREPARATION .......... 12
  Endevor .......... 12
  Precompile .......... 13
  Link-edit .......... 14
BIND SMART/CAF COLLECTION INTO PLAN .......... 14
BATCH JOB EXECUTION .......... 15
  JCL Changes .......... 15
  Run-time Parameters .......... 17
  Production Migration .......... 18
CONVERTING FROM QUICKSTART .......... 19
INTEROPERATION WITH OTHER PRODUCTS: .......... 19
XPEDITOR .......... 20
COBOL INFORMATION .......... 20
JOB RUN INFORMATION .......... 20
COMMON PROBLEMS AND ERROR MESSAGES .......... 21
APPENDIX A - UNDERSTANDING THE CALL ATTACH FACILITIES (CAF) .......... 23
  Link-edit .......... 24
APPENDIX B - Sample Execution JCL .......... 27

SMART/RESTART AND SMART/CAF

Smart/RESTART and Smart/CAF are products from Relational Architects International. These products may be incorporated into batch jobs which access DB2 data. They enable a batch job which has abended to be restarted from the last commit point, rather than having to restart it from the beginning of the step. This saves time and money, and enables an application to have a smaller batch window, providing for higher availability. These products replace Quickstart, which is being sunset by CIGNA, with an effective sunset date of June 30, 2002. Smart/RESTART should be considered for use in long-running batch DB2 jobs. This document gives the details of implementing Smart/RESTART, the various options available, how to use it with other products (especially Xpeditor), how to convert a Quickstart application, and a more in-depth discussion of how Smart/RESTART works. For more information please refer to the Smart/RESTART manuals which reside on the CIGNA Intranet, at http://enterpriseproducts.sys.cigna.com/Tps_Library_Startup.htm.

BASIC CONCEPTS OF CHECKPOINT/RESTART

The Problem
When a long-running DB2 batch job abends, it can take a significant amount of time to prepare it for restart, because the job must be restarted from the beginning. This affects the batch window dramatically, and it also consumes a considerable amount of staff time and CPU time.

For example, consider a simple case: PROGRAM1 reads an input file, updates three DB2 tables, and writes two output files. It does not issue DB2 COMMITs. It abends after the 3500th record on input, having updated the DB2 tables and written to output. All of those updates are backed out by DB2 immediately. This may take a long time. Then the job must be restarted from the beginning, and all of the work previously done is lost. In addition, the CPU costs for that first job must be paid for, and the batch jobs may run into the online day.

Another example is similar to the previous one; however, the program issues DB2 COMMITs after each LUW. To restart in this scenario, the DB2 tables need to be restored before the job is restarted from the beginning. This lengthens the down time and the batch window even further.

The Goal
To make our batch jobs run as efficiently as possible, we should implement them with restart capability. This capability would allow the batch job to restart at the point where it performed the last commit rather than at the beginning.

The Solution
In the past, there have been three possible solutions: to code restartability into the programs, to use a homegrown utility, or to implement Quickstart. However, coding requires time, and Quickstart is a sunset product at CIGNA. A new solution resides in Smart/RESTART. This product provides Checkpoint/Restart capability to batch COBOL DB2 programs. It effectively replaces Quickstart, and is easier to use.

SMART/RESTART - AN OVERVIEW
Smart/RESTART takes an execution-time snapshot of your application, so there is no need for complicated restart logic or cumbersome operational procedures. It restores this picture of your application at restart time, so that the process may continue where it left off. Smart/RESTART automatically:
- Saves working storage at checkpoint time and restores it at restart time
- Repositions sequentially accessed files during restart
- Guarantees that working storage contents and sequential file position remain in sync with the unit-of-work managed by DB2, VSAM, IMS and/or MQSeries
- Informs the application whether an initial run or restart run is in progress

Smart/RESTART receives control implicitly, whenever an EXEC SQL statement is issued or a read of a driver file is done. You can tell Smart/RESTART via run-time parameters to trigger a checkpoint based on either a COMMIT or the records in the driver file.

When the SQL statement is a COMMIT, Smart/RESTART determines if a physical checkpoint should be taken. If so, Smart/RESTART writes the contents of working storage to a checkpoint file. It also writes the positions of the files which it is monitoring. Finally, it issues the DB2 COMMIT.

If you have told Smart/RESTART to trigger a checkpoint based on what is read from a specific flat file, then Smart/RESTART intercepts each read and determines if it is time to take a checkpoint. You tell it, via a run-time parameter, whether the checkpoint should be based on the record count from the file or on a change in the values in certain columns in the record (a control break). When a checkpoint is due, Smart/RESTART writes the contents of working storage to a checkpoint file. It also writes the positions of the files which it is monitoring. Finally, it issues the DB2 COMMIT.

Upon restart, Smart/RESTART restores working storage and both input and output sequential file position to their state at the last checkpoint time. This state is consistent with the state of the DB2 resources. Smart/RESTART then returns control to your application, telling it via the special register RETURN-CODE that it is a restart.
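The snapshot-and-restore cycle described above can be pictured with a small model. This is only a sketch in Python, used here as neutral pseudocode; it is not how Smart/RESTART is implemented. The function run_job, the checkpoint dictionary and the fail_at argument are all hypothetical names invented for the illustration. The point it shows is that "working storage" (a running total) and the input file position are saved together at each commit, so a restart resumes with both in sync.

```python
def run_job(records, checkpoint, fail_at=None):
    """Hypothetical model: process records, checkpointing after each LUW.

    `checkpoint` is a dict standing in for the checkpoint dataset.
    `fail_at` simulates an abend just before processing that record index.
    """
    # On restart, restore "working storage" and the file position together.
    restarted = bool(checkpoint)
    state = dict(checkpoint) if restarted else {"position": 0, "total": 0}

    for i in range(state["position"], len(records)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated abend")
        state["total"] += records[i]      # the unit of work
        state["position"] = i + 1         # next record to read
        # Checkpoint: working storage and file position saved as one unit.
        checkpoint.clear()
        checkpoint.update(state)

    checkpoint.clear()  # comparable in spirit to clearing on success
    return state["total"], restarted
```

In this model, a first run that abends at the fourth record leaves a checkpoint holding the total of the first three records, and the second run picks up from the fourth record instead of from the beginning.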

SMART/CAF - AN OVERVIEW
Smart/CAF is a product which provides the call attach capability to the application process. It does the same thing as the following modules: DSNHLI, DSNELI, and DSNALI. It receives the EXEC SQL calls and passes them to DB2. In addition, it determines when a COMMIT has been issued and passes control to Smart/RESTART. Smart/RESTART then determines if it is time to take a physical checkpoint.

Smart/CAF was originally developed to produce job step completion codes which really show the results of the batch job (which are hidden by IKJEFT01). It also allows SMF records to show the batch program name instead of IKJEFT01. It presents more meaningful messages, and is required to use SQL Monitor.

STEPS TO IMPLEMENT SMART/RESTART & SMART/CAF


Smart/RESTART is relatively easy to implement. Here is an overview of the steps required:
1. Code programs for restartability
2. Program Preparation: Precompile programs with the Smart/RESTART Precompiler
3. Program Preparation: Link the programs with the Smart/CAF call attach facility
4. Bind the Smart/RESTART collection into your plan
5. Create execution JCL

Note that the Program Preparation steps 2 and 3 may be implemented via Endevor.

Please use the following chart to help you with the implementation.

Implementation Checklist - Smart/RESTART and Smart/CAF

TASK: Code Program Properly
DETAIL: Code program to work properly in a restart situation:
1. Process in logical-units-of-work and determine end of LUW (COMMIT or read of an input driver file).
2. Design processing so that restart begins on an LUW.
3. Only initialize working storage in Procedure Division when it is an initial run of the program.
4. Code DB2 cursors with host variables in the WHERE clause. Use of the WITH HOLD option is allowed.
5. Code main program to contain the working storage to be saved for restart (recommended but not required).
MORE INFO: Page 11

TASK: Get Endevor Processor Groups Set Up
DETAIL: The Endevor processor groups will run your program through the Smart/RESTART precompiler and/or linkedit in Smart/CAF, in addition to the regular functions.
MORE INFO: Page 12

TASK: Bind the Plan
DETAIL: Add the Smart/RESTART package into the plan, by issuing the following DB2 command:
BIND PLAN(planname) PKLIST(SMART_RESTART.*,.........)
Include the application packages in the PKLIST also.
MORE INFO: Page 14

TASK: Create Execution JCL
DETAIL: Copy TTAP.TS2.DB2.SYSJCL(SMARTRES) into your library, and change the values as listed in the comments at the beginning.
MORE INFO: Page 15

TASK: Create Input Parms Member
DETAIL: Copy TTAP.TS2.DB2.SYSIN(SMARTRES) into your library, and change the values as listed in the comments at the beginning. Refer to page 17 for more info on the parameter values.
MORE INFO: Page 17

TASK: Test program with Xpeditor (optional)
DETAIL: Change two parms in RAINPUT, to test under Xpeditor:
ESPIE(OFF) and IEXIT(OFF)
In Xpeditor, you may need to issue SET STATIC(ON).
MORE INFO: Page 20

TASK: Test batch job
DETAIL: Run the JCL created above. Be sure the RAINPUT parms changed for Xpeditor are changed back.
MORE INFO: Page 15

TASK: Implement into production
DETAIL: Modify the JCL to meet production standards. Restart instructions will state to restart the job on the abending job step. Pay special attention to how your application manages dataset disposition.
MORE INFO: Page 18

TASK: Problems and Errors
DETAIL: Common errors are listed in this document. Please refer to it first. The products' error messages are documented in the Messages manuals, located at
http://enterpriseproducts.sys.cigna.com/Tps_Library_Startup.htm
MORE INFO: Page 21

TASK: Converting from Quickstart
DETAIL: To convert a job currently using Quickstart to use Smart/RESTART and Smart/CAF, please read the chapter on this.
MORE INFO: Page 19

CODING YOUR APPLICATION FOR RESTARTABILITY:


Your program(s) must be coded properly, to allow for restarting at a checkpoint. To be able to use any checkpoint/restart process, you must be sure that the following are in place in your programs:
1. Divide the application processing into logical-units-of-work, with COMMITs on an LUW boundary. A more detailed description of a logical-unit-of-work is below.
2. Design the application mainline processing so that when your application restarts, it will resume on a unit-of-work boundary.
3. Do not initialize working storage in the Procedure Division, unless you check for a restart condition first, and only initialize on a first time run. Also, be aware of any processing which may need to be done to working storage fields on a restart.
4. Code the DB2 cursors with a predicate using host variables. These host variables will be saved at checkpoint time. Code the predicates to bring back the appropriate answer set on restart. Use of the WITH HOLD option is allowed. Always process through the cursors sequentially.
5. If subprograms are involved, then it is recommended that the main module contain the main line logic, the storage to save, and the commit/rollback requests. This is recommended but not required.

A logical breaking point, called a logical-unit-of-work or LUW, must be available in the program. Checkpoints must be taken when an LUW completes, when the data is in a state of consistency. This philosophy is inherent in control-break logic. A simple example would be: read the transaction from the flat file, update the DB2 tables, write out a report line, commit, and read the next transaction.

With Smart/RESTART, you can take a checkpoint either at DB2 COMMIT or upon read of a driver file. Either of these indicates the end of the LUW. You tell Smart/RESTART which method you are using at run-time. For more information on programs coded around a logical unit of work, please read Chapter 2 of the Smart/RESTART User Guide, called Developing Restartable Applications.

To determine if a restart is underway, your program may test RETURN-CODE at the beginning of the Procedure Division. A value of 2001 indicates that this is a restart, and that working storage has already been restored from the checkpoint dataset (to the values which were in place as of the last checkpoint before the abend in the previous run). A value of 0 indicates an initial run of the jobstep. You may test for 0 and then initialize fields in working storage. A value other than 0 or 2001 means the call to Smart/RESTART failed.
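The RETURN-CODE test described above amounts to a three-way decision. The sketch below expresses that decision in Python purely as an illustration; in a real program the test is coded in COBOL at the start of the Procedure Division. The function names classify_start and startup, and the working-storage field names, are hypothetical. Only the values 0 and 2001 come from this document.

```python
INITIAL_RUN = 0      # value documented above for an initial run
RESTART_RUN = 2001   # value documented above for a restart


def classify_start(return_code):
    """Map the start-up return code to 'initial' or 'restart' (hypothetical helper)."""
    if return_code == INITIAL_RUN:
        return "initial"
    if return_code == RESTART_RUN:
        return "restart"
    # Any other value means the call to Smart/RESTART failed.
    raise RuntimeError(f"restart facility call failed, RETURN-CODE={return_code}")


def startup(return_code, working_storage):
    """Initialize working storage only on an initial run.

    On a restart the fields have already been restored from the
    checkpoint, so they must be left alone.
    """
    mode = classify_start(return_code)
    if mode == "initial":
        working_storage.update({"records_read": 0, "records_written": 0})
    return mode
```

The design point is that initialization is guarded by the initial-run test, so counters and accumulators restored from the checkpoint are never overwritten on restart.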


PROGRAM PREPARATION
This section reviews the way to prepare your program to run with Smart/RESTART and Smart/CAF. It consists of running the program through a precompiler, and linking it with a different Call Attach Facility module. Programs which contain working storage to be saved for restart must be precompiled with the Smart/RESTART precompiler. Programs which issue DB2 SQL must be linked with the Smart/CAF module. Endevor processor groups make the details of the precompile and link steps fairly transparent to you. Standard processor groups are available as models for setting up the ones for your application.
Please note: The use of the Smart/RESTART API (EXEC SRS calls) is not supported at CIGNA. The use of Object Transparency mode is also not supported at CIGNA.

Endevor
The following sections entitled Precompile and Link-edit explain how this process works. However, because they are implemented via Endevor, there is no effort on your part to implement them. Endevor processor groups contain the precompile, compile and link steps described below. The use of Endevor will save you from having to worry about program preparation JCL and other parameters.

To use Endevor, simply inform your Endevor support person of the following information for each program:
1. Whether the program contains working storage to save
2. Whether the program contains DB2 SQL statements

The Endevor support team can determine which processor groups to use based on this information. You may then proceed to developing your run JCL.

There are special considerations for the following:
1. If you do not want to save all of the working storage in the program, then you will need to specify what storage to save, via a parameter to the precompiler (passed via the RAINPUT dd member). Refer to the Precompile section for more information.
2. If you do not want to use Smart/CAF, then you need to link in the default DB2 call attach facility (refer to Appendix A for more information).
3. If a subprogram issues DB2 SQL and is called by multiple programs (some Smart/RESTART-enabled and some not), then you need to pick up the call attach facility at run time rather than at link edit (refer to Appendix A for more information).

The Endevor support team is prepared to handle these special considerations. The following sections show the steps required to prepare a program for use with Smart/RESTART.


Precompile
The Smart/Precompiler is executed after the DB2 Precompiler and before the COBOL compile step. The Smart/Precompiler adds the fields and code needed to completely automate the restart process. Smart/RESTART provides three ways to implement Smart/RESTART; however, only one is supported at CIGNA. This method is called Precompiler Mode, and is implemented by precompiling your program with the Smart/RESTART precompiler.

The precompile step only needs to be done for programs whose working storage is to be saved for use upon restart. The program with the DB2 calls in it does not need to be precompiled, unless it has working storage to save. In the majority of cases, you only want to precompile the main program. Precompile subprograms only if they have working storage to save. They can only be called by Smart/RESTART-enabled main programs.

Note that a commit can be done in any module; at the checkpoint, Smart/RESTART saves the working storage from each module active at that time. On restart, it restores each Smart/RESTART-enabled subprogram's storage the first time the module is called. In order to restart successfully, these subprograms must be called in the same order as they were called in the previous run.

Precompile JCL is part of the Endevor processor groups. It looks like this:
//SRESTART EXEC PGM=SRSPC,PARM='DBMS(DB2),CAF(SMARTCAF)'
//STEPLIB  DD DSN=TTAP.TS2.SRESTART.DCALOAD,DISP=SHR
//SRSIN    DD DSN=&DSNHOUT,DISP=(OLD,DELETE)
//SRSOUT   DD DSN=&SRSOUT,DISP=(NEW,PASS)
//SRSLIB   DD DSN=TTAP.TS2.SRESTART.DCACNTL,DISP=SHR
//SRSPRINT DD SYSOUT=*
//RAINPUT  DD DSN=TTAP.END.TPROD.SYSIN(SRSPPT),DISP=SHR

Precompiler parameters are put into a parmlib member, pointed to by the RAINPUT dd statement. These are the default parameters provided in the standard SRSCTL member:
MAR(8,72)
HOST(COB2)
FLAG(I)
SOURCE
DBMS(DB2)
CAF(SMARTCAF)
REOPEN(NO)
IO_CONVERT(NONE)
OPTIONS
TRACE(NO)
PRECOMPILE(YES)
STORAGE(ALL)

The parameters not described here are documented in the Smart/RESTART User's Guide. The use of COB2 is not a problem: both COB2 and COB370 work with LE/370. The default parameters are fine for most applications. However, they may be overridden, if desired. A standard member for the system may also be set up to enable this for all programs in the application. The precompiler parameters which may be of interest to change are listed below, with the default noted in each description:


CAF(SMARTCAF|TSO|IMS|OTHER) specifies the call attach facility to use to connect to DB2 and execute the SQL statements. Smart/CAF is the default. Refer to Appendix A for more information.

STORAGE(ALL|dataname1,dataname2,...) identifies the names of data items which should be saved at checkpoint time and restored at restart time. If ALL is specified, then all storage areas defined with the source module will be saved. ALL is the default, and this will be fine for most programs.

STORAGE_RANGE(NONE|ALL|dataname1,dataname2) allows specification of a range of data items which should be saved and restored. Two values are specified, consisting of a starting data item and an ending data item. All data between and inclusive of those data items is saved. If ALL is specified, then all of the storage areas in the module are saved; NONE identifies the special case when no storage needs to be saved or restored. ALL is the default, and this will be fine for most programs.

A note of warning: if you specify STORAGE(datanames) or STORAGE_RANGE(datanames), future modifications to the program will have to be aware of where to define their working storage, so that it will be saved for restart.

The Smart/RESTART default is to save ALL program working-storage. However, if you wish to have Smart/RESTART save only that portion of working-storage necessary for a successful restart, then you can add an ENDEVOR element type SRSCTL card containing the SRSPC (Smart/RESTART Precompiler) override parm STORAGE_RANGE(field1, field2). This parm tells Smart/RESTART to save only those working-storage fields found between the specified field names.
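The warning about STORAGE_RANGE is easier to see with a small model of the inclusive range. This is a sketch in Python for illustration only; the function fields_to_save and the WS- field names are hypothetical, and the real selection is done by the precompiler against the program's working storage, not by code like this.

```python
def fields_to_save(declared_fields, storage_range="ALL"):
    """Hypothetical model of which fields a range setting would cover.

    declared_fields: field names in the order they are defined.
    storage_range:   "ALL", "NONE", or a (first, last) pair of field names.
    """
    if storage_range == "ALL":
        return list(declared_fields)
    if storage_range == "NONE":
        return []
    first, last = storage_range
    start = declared_fields.index(first)
    end = declared_fields.index(last)
    if start > end:
        raise ValueError("starting field must be defined before ending field")
    # Inclusive of both end points, as described in the text.
    return declared_fields[start:end + 1]
```

In this model, a field added later after the ending field falls outside the saved range, which is exactly the maintenance risk the note of warning describes.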

Link-edit
For programs with DB2 SQL statements, a module needs to be linked into the program. This module is called the Call Attach Facility (CAF), and all DB2 programs require one. The EXEC SQL statements in a program are translated by the DB2 precompiler into calls to DSNHLI, which is an entry point name in the call attach facility module. The Endevor processor groups with Smart/RESTART are set up to statically link the Smart/CAF into the modules which access DB2. For more information on the call attach facility options (Smart/CAF and DB2 CAF), and on the option of dynamically linking in the call attach facility, please refer to Appendix A.

BIND SMART/CAF COLLECTION INTO PLAN


Bind the Smart/CAF collection into the DB2 plan. This is done once per plan, in each DB2 subsystem, and everyone who does a plan bind must know about this. Include the application packages in the package list in the Bind command (PKLIST). The syntax is:
BIND PLAN(planname) PKLIST(xxxx.*,SMART_RESTART.*,.....)


BATCH JOB EXECUTION


JCL Changes
There is skeleton JCL in TTAP.TS2.DB2.SYSJCL(SMARTRES), which you may copy and modify. Please follow the comments in the JCL. This JCL is shown in Appendix B. Here is some sample JCL:
//*
//* Allocate the datasets in this step (Output and Checkpoint datasets)
//*
//ALLOC    EXEC PGM=IEFBR14
//*
//SRSCHECK DD DSN=your.checkpoint.dataset.name,DISP=(NEW,CATLG,CATLG),
//         DCB=DSORG=DA,UNIT=SYSDA,VOL=SER=xxxxxx,SPACE=(TRK,60)
//*
//OUTPUT   DD DSN=your.output.dataset.name,DISP=(NEW,CATLG,CATLG),
//         DCB=ddddd,UNIT=SYSDA,VOL=SER=vvvvvv,SPACE=(xxxxxxx)
//*
//*
//* Run your main program here
//*
//SRSSAMP8 EXEC PGM=GHXC9310,REGION=xM
//*
//STEPLIB  DD DSN=TTAP.TS2.SRESTART.DCALOAD,DISP=SHR
//         DD DSN=TTAP.TS2.DB2.APFLIB,DISP=SHR
//         DD DSN=your.loadlib,DISP=SHR
//         DD DSN=SYS1.SCEERUN,DISP=SHR
//*
//SRSCHECK DD DSN=your.checkpoint.dataset.name,DISP=(OLD,KEEP,KEEP)
//*
//OUTFILE  DD DSN=your.output.dataset.name,DISP=OLD
//*
//INFILE   DD DSN=your.input.dataset.name,DISP=OLD
//*
//SYSOUT   DD SYSOUT=*
//SYSPRINT DD SYSOUT=*
//*
//RAINPUT  DD DSN=your.sysin.dataset.name(membername),DISP=SHR

Run-time JCL should be structured as follows:
- EXEC statement - Program executed is the application program.
- STEPLIB should have TTAP.TS2.SRESTART.DCALOAD before TTAx.TS2.DB2.APFLIB.
- SRSCHECK DD - add this ddname, pointing to a pre-allocated checkpoint dataset for use by Smart/RESTART. The attributes should be DCB=DSORG=DA,UNIT=SYSDA,VOL=SER=vvvvvv,SPACE=(TRK,60) without an LRECL or BLKSIZE.
- SRSPRINT DD - add this for Smart/RESTART messages, send to SYSOUT.
- Input and output files should be pre-allocated, unless you want CA-11 to take care of disposition on restart (SYSIN, SYSOUT and DUMMY files are not managed by Smart/RESTART). In the job run, allocate them as DISP=(OLD,KEEP,KEEP). If it is tape, then allocate as DISP=(MOD,CATLG,CATLG). Flat files must be allocated with an LRECL or BLKSIZE parm.
- RAINPUT DD - add this ddname, with the run-time parameters for this program. These include parms to control checkpoint frequency or time, input file repositioning on restart, which files to reposition on restart (instead of all of them), or establishing an LUW external to the program (driver file oriented). See list of parms below.

The SYSTSIN DD is not used. If there are parms on the DSN RUN statement which are passed to the program, you can now pass them via the PARM parameter on the EXEC statement:
//STEP01 EXEC PGM=TESTPGM,PARM=(xxxxx=yyy,zzzzz,rrrr)

SYSIN, SYSOUT and DUMMY files are not managed by Smart/RESTART. If you need them repositioned at restart time, please refer to the Smart/RESTART User Guide, Chapter 4 Program Execution and Restart, item 4.1.4 SYSIN, SYSOUT and DUMMY files.

CHECKPOINT DATASET: For the checkpoint dataset, do not use DISP=(MOD,DELETE,DELETE). You need the checkpoint dataset saved for restart in case of an abend. Allocate the checkpoint dataset in a step before the program you are running. You may also use the same dataset over and over again, by specifying CLEAR_CKPT(AFTER) in the run-time parameters. This clears the checkpoint dataset upon successful completion of the job. Code the disposition as DISP=(OLD,KEEP,KEEP). Do not code RLSE. The vendor very specifically recommends creating the checkpoint dataset in a prior step, and then using DISP=OLD. This is the cleanest approach. However, if CA-11 is being used, you may create the checkpoint dataset in the actual program step, using NEW,CATLG,CATLG. CA-11 will change the disposition to OLD on restart.

INPUT AND OUTPUT SEQUENTIAL FILES: It is strongly recommended that output files be preallocated before the program step runs. They would then be allocated as DISP=OLD in the program step. This avoids many issues with dataset disposition on restart. Some applications prefer to use MOD,CATLG,CATLG. This causes a problem when the step abends before the first physical checkpoint is taken: after restart, there will be duplicate data in the files allocated MOD. If use of this disposition is required, please contact the Smart/RESTART support person for details on how to arrange the restart so that it will work properly. The DCB parm "RECFM" must be specified for any output files that Smart/RESTART is intended to reposition on a restart.

RESTART: Changing the JCL for a restart is the responsibility of the production support personnel. Tasks such as changing GDG numbers and dataset disposition exist for all restart situations. There are no special restart instructions for Smart/RESTART.


Run-time Parameters
Run-time parameters are passed to Smart/RESTART and Smart/CAF via the dataset specified in the RAINPUT DD. These parameters include:
PLAN(xxxxxxxx)
SYSTEM(DB2x)
CKPT_FREQ(50)
MSG_DISPLAY(VERBOSE)
REPOSITION(EXCEPT(ABNLTERM))
CLEAR_CKPT(AFTER)
ESPIE(ON)
IEXIT(ON)
VERIFY_JID(OFF)
SQL_ERROR(WARNING)
SQL_TRAP(ON,911)

These reside in TTAP.TS2.DB2.SYSIN(SMARTRES). You may copy these and use them, putting in the proper plan name and DB2 subsystem name. You must also determine if the logical unit of work will be indicated by a DB2 COMMIT or a read of a driver file. You may want to change some parameters, especially during testing. Refer to the Smart/RESTART Reference manual and Smart/CAF User's Guide for a list and description of all of the profile parameters available. A description of some useful parms is included here:

CKPT_FREQ: There is a difference between logical checkpoints and physical checkpoints. This difference also existed in Quickstart. You tell Smart/RESTART the checkpoint frequency desired for this job run. The checkpoint frequency tells how many logical checkpoints (or SQL commits) are issued before a physical checkpoint is actually done. Physical checkpoints include issuing the commit, completing I/O, and writing all working storage and checkpoint information into the checkpoint dataset. This frequency can be changed dynamically during a job run. The checkpoint frequency allows you to manage your resources, choosing either speed (less frequent checkpoints) or concurrency (more frequent checkpoints). Note that restart always occurs from a physical checkpoint. You can choose a checkpoint frequency of 1, which would do a physical checkpoint at every COMMIT. If you choose 20, a physical checkpoint would be done every 20th COMMIT. The 19 COMMITs issued before then would not actually cause a commit to be done. CKPT_FREQ also accepts a time value, written CKPT_FREQ(rate,time), which operates the same as CKPT_TIME.

REPO_STYLE: For each sequential input file, you can specify via run-time parameters which record in the file you want to read on restart. You can choose between the record after the last record read at checkpoint, or the same record as the one read at checkpoint.
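The relationship between logical and physical checkpoints under CKPT_FREQ can be worked through with a little arithmetic. The following Python sketch is only a model of the counting rule described above (every Nth COMMIT becomes a physical checkpoint). The function names are hypothetical, and the model ignores the time-based trigger and any dynamic change of frequency during the run.

```python
def physical_checkpoints(commit_count, ckpt_freq):
    """Return the COMMIT numbers (1-based) that become physical checkpoints."""
    if ckpt_freq < 1:
        raise ValueError("checkpoint frequency must be at least 1")
    return [n for n in range(1, commit_count + 1) if n % ckpt_freq == 0]


def luws_redone_on_restart(commits_before_abend, ckpt_freq):
    """Logical units of work that must be reprocessed after an abend.

    Restart always resumes from the last physical checkpoint, so the
    logical checkpoints issued since then are repeated.
    """
    if ckpt_freq < 1:
        raise ValueError("checkpoint frequency must be at least 1")
    return commits_before_abend % ckpt_freq
```

For example, with a frequency of 20 and an abend after the 45th COMMIT, physical checkpoints were taken at COMMITs 20 and 40, and the 5 units of work after COMMIT 40 are redone on restart. A lower frequency value shortens the rework but increases checkpoint overhead.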


CLEAR_CKPT: This parameter clears the checkpoint dataset. BEFORE says clear it before invoking the user program; AFTER clears it upon successful completion of the program. BEFORE is a useful option when you want to rerun your program from the beginning over and over again as part of testing. AFTER tells Smart/RESTART to read the checkpoint dataset for information, determine if it is a restart, and then set up the environment to look like it did as of the last checkpoint (restore working storage, reposition the files).

EXT_COMMIT: This external commit run-time parameter allows you to specify a unit-of-work external to the application. You can tell Smart/RESTART to issue a checkpoint based on the record count from an input driver file, or based on the values in various fields in the input driver file (pass the column and length, and a checkpoint is triggered whenever the contents of that field change). It works with up to 6 columns, and requires cursors to be coded WITH HOLD.

CKPT_TIME: Causes the checkpoint to be done after a certain amount of time has passed. It still requires a commit to have been issued.
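The control-break form of EXT_COMMIT (a checkpoint whenever the contents of a given field change) can be illustrated as follows. This Python sketch is a model only: the function name control_break_points is hypothetical, the column is assumed to be 1-based, and a single key field is used although the text says up to 6 columns are supported. It also assumes the checkpoint falls just before the first record of the new group, which the source does not state explicitly.

```python
def control_break_points(records, column, length):
    """Return indexes of records whose key field differs from the previous record.

    records: fixed-format driver file records, as strings.
    column:  1-based starting column of the key field (assumption).
    length:  length of the key field.
    """
    breaks = []
    previous_key = None
    for index, record in enumerate(records):
        key = record[column - 1:column - 1 + length]
        # A change in the key marks the end of the previous unit of work.
        if previous_key is not None and key != previous_key:
            breaks.append(index)
        previous_key = key
    return breaks
```

With driver records keyed by an account number in columns 1 to 4, all records for one account form a single unit of work, and a checkpoint would be considered each time the account number changes.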

ESPIE(OFF): Use only when testing under Xpeditor, to allow Xpeditor to intercept abends.

IEXIT(OFF): Use only when testing under Xpeditor.

SQL_TRAP: Tells Smart/RESTART to intercept certain abends, and automatically restart the application. For example, to prevent your job from abending due to a 911, you could have the following parm in your RAINPUT:
SQL_TRAP(ON,911,AUTO,5)
This tells Smart/RESTART to intercept the 911, roll back the application to the last checkpoint taken, and restart the application from there (restoring working storage and resetting file positions). It will do that 5 times, and after that the job will abend.
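The retry behaviour described for SQL_TRAP(ON,911,AUTO,5) can be modelled as a bounded retry loop. The Python sketch below is only an illustration of that counting rule; the Deadlock exception, the run_with_trap function and the step callable are hypothetical stand-ins, and the real product performs the rollback and restore itself rather than through application code.

```python
class Deadlock(Exception):
    """Hypothetical stand-in for a -911 (deadlock or timeout) condition."""


def run_with_trap(step, max_restarts):
    """Run `step`, restarting it from its last checkpoint on Deadlock.

    step:         callable taking the restart count; assumed to resume
                  from the last checkpoint each time it is called.
    max_restarts: number of automatic restarts allowed before giving up.
    Returns (result, restarts_used).
    """
    restarts = 0
    while True:
        try:
            return step(restarts), restarts
        except Deadlock:
            if restarts >= max_restarts:
                raise  # retries exhausted: the job abends
            restarts += 1
```

Under this model, a limit of 5 allows the initial attempt plus five automatic restarts; a sixth 911 is allowed to surface as an abend.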

SQL_ERROR: The recommendation is WARNING. Keep this value unless you want to have the step abend if any negative SQL codes were encountered during program execution. Read about it before changing the value.

TEST_ABEND(physical,logical): Requests a user abend after NNNN physical checkpoints or NNNN logical checkpoints, whichever occurs first. TEST_ABEND enables you to test abend/recovery/restart processing without modifying the source program.

Production Migration
To implement the job into production, please follow the normal rules for migrating into production. These rules include naming standards and dataset dispositions. It is highly recommended that the checkpoint dataset and sequential output files be allocated in a step prior to the program step. This removes any confusion over dataset disposition in an initial and in a restart run. Please remember to change the RAINPUT in the run JCL, to be ready for production. This includes being sure the following parameters are set to these settings:
- CLEAR_CKPT(AFTER)
- CKPT_FREQ set to a reasonable value for production
- ESPIE(ON)
- IEXIT(ON)
- MSG_DISPLAY(TERSE)
- TEST_ABEND - delete this parm

CONVERTING FROM QUICKSTART


Change the PGM= from QSATTACH to the name of the main program, and include the PARM= statement for any parameters that need to be passed to the main program. Delete the PARM= statement which was being used for Quickstart.
Change the STEPLIB library from TTAP.TS2.QSTART.ISPLLIB to TTAP.TS2.SRESTART.DCALOAD.
Change the QSTART DDname to SRSCHECK, and replace the DSN information for your checkpoint dataset. It is recommended that you pre-allocate the checkpoint dataset and specify DISP=(OLD,KEEP,KEEP). For more detail see Chapter 4, item 4.1.2, Defining the Smart/RESTART Checkpoint Dataset, on page 4-4 of the User Guide.
Change the QSCTRL1 DDname to RAINPUT.
Change the QSREPT DDname to SRSPRINT.
Edit the member referenced by the RAINPUT DD: delete any Quickstart parms no longer needed, and add any new run-time profile parameters used by the Smart/RESTART and Smart/CAF utilities (refer to the section of this document titled BATCH JOB EXECUTION, bullet RAINPUT DD, for more detail as well as references to the complete listing of profile parms found in the Smart/RESTART manuals).
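The conversion steps above can be pictured as a before-and-after of the job step. The sketch below is reconstructed only from the names given in those steps; the Quickstart PARM contents, the dispositions on the old DD statements, and all lower-case dataset names are placeholders, not taken from a real job.

```jcl
//*  BEFORE (QUICKSTART), SHOWN AS COMMENTS. RECONSTRUCTED FROM
//*  THE STEPS ABOVE; LOWER-CASE NAMES ARE PLACEHOLDERS.
//*STEP01   EXEC PGM=QSATTACH,PARM='quickstart-parms'
//*STEPLIB  DD DISP=SHR,DSN=TTAP.TS2.QSTART.ISPLLIB
//*QSTART   DD DISP=SHR,DSN=quickstart-checkpoint-dataset
//*QSCTRL1  DD DISP=SHR,DSN=application-parmlib(qsmember)
//*QSREPT   DD SYSOUT=*
//*
//*  AFTER (SMART/RESTART). OTHER STEPLIB LIBRARIES AND THE
//*  APPLICATION INPUT/OUTPUT DDS STAY AS THEY WERE.
//STEP01   EXEC PGM=yourpgm,PARM='program-parms'
//STEPLIB  DD DISP=SHR,DSN=TTAP.TS2.SRESTART.DCALOAD
//         DD DISP=SHR,DSN=application-loadlib
//SRSCHECK DD DISP=(OLD,KEEP,KEEP),
//            DSN=application-checkpoint-dataset-name
//RAINPUT  DD DISP=SHR,DSN=application-parmlib(membername)
//SRSPRINT DD SYSOUT=*
```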

Please refer to the .xls document named SRS Conversion (Quick-Start to Smart-Restart). The tab labeled Parm Conv gives example parms converted for the Pricing System. The tab labeled Detail for QSCTRL1 shows every Quickstart profile member, the parms contained in it, and the value(s) of those parms. This tab also contains a key at the bottom describing the specific Quickstart checkpoint frequency (which is governed by several variables) used for each member. With this understanding you should be able to convert generically to the much simpler Smart/RESTART checkpoint frequency parm, CKPT_FREQ(rate,time). See above for detailed information on the CKPT_FREQ( ) parm. The tab labeled RUNJCL, PROC & SYSIN (QSCTRL1) simply lists all of the Pricing System's Quickstart profile members and the RUNJCL / PROCs which call them.

INTEROPERATION WITH OTHER PRODUCTS:


CA-11: transparent, especially if auto setup is specified for the job in CA-11.

Smart/RESTART considerations on file space issues: If a batch job abends with an SB37 on a file being managed by Smart/RESTART, then you need to coordinate your JCL and space changes with CA-11. CA-11 changes the JCL on restart to allocate the +0 generation rather than +1. The fix is either to reallocate the output file yourself with more space, copy the existing data into it, delete the old file and rename the new one, or to tell CA-11 that it is a new run and to start fresh. The Smart/RESTART manual says to allocate a new file with more space, copy the old one into it, and restart. Smart/RESTART requires the new file to be on the same device type and to have the same dataset attributes, which shouldn't be a problem. From Smart/RESTART's point of view, the name of the file could be different, but because of CA-11 you would rename the new, larger file to the +0 generation name.

There is a run-time parameter called REST_NEWDSN which controls what Smart/RESTART does if the name of a repositionable file is changed between runs. You can specify REST_NEWDSN(ABEND) or REST_NEWDSN(RETURN); RETURN is the default and is preferred. RETURN issues a warning message and allows the job to continue.
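The reallocate, copy, delete and rename sequence described above might be sketched with standard utilities as follows. All dataset names and space values are hypothetical. LIKE= is used to carry over the old file's attributes and requires SMS; without it, code the same DCB attributes and device type as the old file by hand. For a GDG, the target name would be the absolute name of the +0 generation, and the catalog handling for that case is site-specific and not shown.

```jcl
//*  SKETCH ONLY. DATASET NAMES AND SPACE VALUES ARE
//*  HYPOTHETICAL. LIKE= NEEDS SMS; OTHERWISE CODE THE SAME
//*  DCB ATTRIBUTES AND DEVICE TYPE AS THE OLD FILE BY HAND.
//COPY     EXEC PGM=IEBGENER
//SYSPRINT DD SYSOUT=*
//SYSIN    DD DUMMY
//SYSUT1   DD DISP=SHR,DSN=APPL.OUTPUT.FILE
//SYSUT2   DD DSN=APPL.OUTPUT.FILE.NEW,
//            DISP=(NEW,CATLG,DELETE),
//            LIKE=APPL.OUTPUT.FILE,
//            SPACE=(CYL,(100,50))
//*
//*  RUNS ONLY IF THE COPY ENDED WITH RC=0: DELETE THE OLD
//*  FILE, THEN GIVE THE LARGER FILE THE OLD NAME.
//RENAME   EXEC PGM=IDCAMS,COND=(0,NE)
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DELETE APPL.OUTPUT.FILE
  ALTER  APPL.OUTPUT.FILE.NEW NEWNAME(APPL.OUTPUT.FILE)
/*
```

Renaming back to the original name keeps the job JCL and CA-11 unchanged, so REST_NEWDSN does not come into play on the restart.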


XPEDITOR
When testing with Xpeditor, use ESPIE(OFF) and IEXIT(OFF) in RAINPUT, and SET STATIC(ON) in Xpeditor. Both Smart/RESTART and Xpeditor replace the LE/370 ESPIE, and there can only be one ESPIE active at a time. Smart/RESTART will not disable anybody's interrupt handlers. Therefore Xpeditor handles the interrupts before Smart/RESTART does, and you should turn off Smart/RESTART's interrupt handler via ESPIE(OFF). These interrupts include S0C7, divide by zero, user abends, etc. IEXIT(OFF) disables the Smart/RESTART implicit exit, which is the only command that happens at the beginning of the invocation. It also competes with Xpeditor. The second thing to remember is to remove the ESPIE(OFF) and IEXIT(OFF) parms from RAINPUT when you are not testing through Xpeditor.
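For an Xpeditor session, the RAINPUT settings named above would look something like this sketch. The one-parm-per-line layout and the in-stream DD are assumptions; SET STATIC(ON) is entered in Xpeditor itself and is not a RAINPUT parm.

```jcl
//*  SKETCH ONLY, FOR XPEDITOR TESTING. ONE PARM PER LINE IS
//*  AN ASSUMED LAYOUT. REMOVE BOTH PARMS (OR SET THEM TO ON)
//*  WHEN THE JOB IS NO LONGER RUN THROUGH XPEDITOR.
//RAINPUT  DD *
ESPIE(OFF)
IEXIT(OFF)
/*
```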

COBOL INFORMATION
Here are the DB2 precompiler values for the HOST parm:
HOST(COBOL) = for OS/VS COBOL
HOST(COB2) = for VS COBOL II
HOST(IBMCOB) = for COBOL/370 and IBM COBOL for MVS

Here are the Smart/RESTART precompiler values for the HOST parm:
HOST(COB2) = for VS COBOL II, COBOL/370 and COBOL/390
This is only needed to tell the Smart/RESTART precompiler that the program is not OS/VS COBOL. The COB2 value works for all other versions of COBOL.

This was the status of the different COBOLs at CIGNA:
COBOL/MVS/370 = not supported
OS/VS COBOL = not supported
COBOL II = supported until April 2001
COBOL/390 = supported

The COBOL compiler option DYNAM causes separately compiled programs invoked through the CALL literal statement to be loaded dynamically at run time. Any CALL identifier statements that can't be resolved in your program are also handled as dynamic calls. All of the Endevor process groups have DYNAM specified in the compiler options.

JOB RUN INFORMATION


You may use Smart/MONITOR to view job run information. As an alternative, you may select rows from SRS.CKPT_SYNCH. At job start-up, SRS reads the checkpoint dataset for the JOB ID. If there isn't a matching row in SRS.CKPT_SYNCH, then it considers the run an initial job run. Otherwise, it identifies the run as a restart and proceeds accordingly.


COMMON PROBLEMS AND ERROR MESSAGES


Here is a list of some errors which have been encountered, and some colloquial wisdom regarding them. For complete descriptions of messages from Smart/RESTART or Smart/CAF, please refer to the Messages manuals at http://enterpriseproducts.sys.cigna.com/Tps_Library_Startup.htm.

Error: SQL Code -805
Action: -805 on the plan, with reference to SRSDBRM.....xxxxx, means the Smart/RESTART collection is not in the plan.

Error: SQL Code -911 and DSNT408I
Action: -911 = DCA096I = DSNT408I = the current UOW has been rolled back due to deadlock or timeout. Reason 00C9008E, type of resource 00000D01, and resource name 00000mmm.00000nnn: this is the classic deadlock/timeout contention message. The object on which the contention occurred is sometimes referenced by internal ID rather than name. (This occurs with both the DB2 CAF and Smart/CAF.) To obtain the table name, run the following select statement:
SELECT CREATOR, NAME FROM SYSIBM.SYSTABLES WHERE DBID = mmm AND OBID = nnn

Error: SQL Code -927
Action: -927 = the Language Interface was called when the connecting environment was not established; the program should be invoked under the DSN command. This means that a main program has one CAF and a subprogram has another (you can't mix DSNHLI (IBM's call attach) and DCAHLI (Smart/CAF)). It usually indicates that Smart/CAF is in the main program and DSNHLI is in the subprogram. See S0C1 below for another manifestation of this. This problem may be masked by Xpeditor (it does not occur in Xpeditor but then occurs in batch).

Error: Abend S0C1
Action: You can receive this if IKJEFT01 is loaded and then Smart/CAF is loaded. Smart/CAF could be statically linked into the main (or sub) program, or dynamically loaded. Check links and STEPLIB concatenation.

Error: Abend U3000 or U4000
Action: These abend codes are from LE/370, not Smart/RESTART.

Error: Restarts at beginning instead of last checkpoint
Action: This occurs when the checkpoint dataset does not have any checkpoint information in it. Two scenarios where this can occur on restart are: the CLEAR_CKPT(BEFORE) parm is specified, or the checkpoint dataset used on restart is a new one (for example, the JCL specified a +1 generation for both the initial and the restart run).

Error: Duplicate data in output file after restart
Action: This can occur if the abend occurred before any checkpoints were taken and the dataset disposition for the output file is set to MOD. The recommendation is to never use MOD,CATLG,CATLG.


Error: Reason Code 1001
Action: There was an attempt to initialize the Smart/RESTART environment, but it was unsuccessful. Notes say to specify VERIFY_JID(OFF).

Error: Abend U2005
Action: This can occur after the program has run to completion, if it received a few negative SQL codes while it was executing. The U2005 is a way of saying, yes, there were errors encountered during the run. Most jobs do NOT need to know this. The fix is to specify SQL_ERROR(WARNING) on input.

Error: Abend SB37
Action: If a batch job abends with an SB37 on an input or output file being managed by Smart/RESTART, coordinate your JCL and space changes with CA-11, as described under CA-11 in INTEROPERATION WITH OTHER PRODUCTS. CA-11 changes the JCL on restart to allocate the +0 generation rather than +1. Either reallocate the file yourself with more space (same device type, same dataset attributes), copy the existing data into it, delete the old file and rename the new one to the +0 generation name, or tell CA-11 that it is a new run and to start fresh. The run-time parameter REST_NEWDSN controls what Smart/RESTART does if the name of a repositionable file changes between runs: REST_NEWDSN(ABEND) or REST_NEWDSN(RETURN), the default and preferred value, which issues a warning message and allows the job to continue. If an SB37 occurs on the checkpoint dataset, restart is not possible.


APPENDIX A - UNDERSTANDING THE CALL ATTACH FACILITIES (CAF)


There are two CAFs which may be used. One is the CAF which comes with DB2; the other is the CAF which comes with Smart/RESTART. We recommend Smart/CAF as the call attach facility for Smart/RESTART-enabled job steps. This is the default in Endevor. The differences between Smart/CAF and the DB2 CAF are discussed below. The CAF module can be resolved at link-edit or at run time. Here is a quick reference to the physical differences:
DB2 CAF = module DSNELI in TTAx.TS2.DB2.APFLIB; it has an entry point named DSNHLI
Smart/CAF = module DCAHLI in TTAP.TS2.SRESTART.DCALOAD; it also has an entry point named DSNHLI

Smart/CAF receives the EXEC SQL request and, if it is a COMMIT or ROLLBACK, decides whether it is time to take a physical checkpoint. If so, it invokes Smart/RESTART to save working storage and file positions. Then a commit is issued and information about the checkpoint is logged in a DB2 table. If it is not a COMMIT or ROLLBACK request, Smart/CAF simply passes the request on to DB2. Smart/CAF is a separate utility from Smart/RESTART.

Smart/CAF provides many capabilities. Because the execution JCL specifies the main application program name rather than IKJEFT01, the job step completion codes accurately reflect the results of the program run (which are hidden by IKJEFT01). Smart/CAF provides much more meaningful messages than running through the regular DB2 CAF (batch TSO and the DSN command processor). Finally, Smart/CAF is required in order to use Smart/MONITOR, a front-end ISPF interface which shows the disposition of batch jobs and allows modification of the run-time parameters.

The DB2 CAF also receives the EXEC SQL requests, and passes them directly to DB2. Using the DB2 CAF allows you to make fewer changes to your current batch job JCL, and the JCL structure is more familiar to production support personnel. You can also use it in the following situation: you have subprograms called by many different existing main programs, you don't want to change all of the job JCL, and you are willing to do without 911 retry logic.

Static vs. Dynamic Links: By default, Endevor statically links in the CAF. You may want to pick up the CAF dynamically in the following scenario: if the program is a subprogram called by many main programs, and Smart/RESTART is not used by all of those main programs, then the CAF should be picked up at run time to control which CAF is used.
This may be accomplished by changing the Endevor process group to not include DSNHLI, and by concatenating the Smart/RESTART load library ahead of the DB2 APFLIB in jobs which use Smart/RESTART. (How to do this in Endevor?) If the Smart/CAF-Smart/RESTART load library is ahead of DB2's but Smart/RESTART is not being used, that is okay, because Smart/CAF checks for Smart/RESTART and, if it isn't there, just passes control to DB2. An error will occur in the following case: PGM=IKJEFT01 in the JCL, but DCAHLI is picked up from STEPLIB by mistake or linked into the program.


You can include the CAF either at link-edit time or at run time. In either case, the module which gets picked up follows this chart:
INCLUDE DSNHLI with TTAP.TS2.DB2.APFLIB first in steplib ==> uses DSNELI
INCLUDE DSNHLI with TTAP.TS2.SRESTART.DCALOAD first in steplib ==> uses DCAHLI
INCLUDE DSNELI with TTAP.TS2.DB2.APFLIB first in steplib ==> uses DSNELI
INCLUDE DSNELI with TTAP.TS2.SRESTART.DCALOAD first in steplib ==> uses DSNELI if APFLIB in steplib
INCLUDE DCAHLI with TTAP.TS2.DB2.APFLIB first in steplib ==> uses DCAHLI if DCALOAD in steplib
INCLUDE DCAHLI with TTAP.TS2.SRESTART.DCALOAD first in steplib ==> uses DCAHLI

If you want to use Smart/CAF, include the DSNHLI from DCALOAD, which is an alias of DCAHLI in DCALOAD. It may be included during link-edit or picked up at run time. To pick it up at run time, DCALOAD should be concatenated ahead of the DB2 APFLIB in the STEPLIB DD. The best approach is to pick up the CAF at run time.

Smart/CAF receives control on every SQL request. When it receives a COMMIT, it calls the checkpoint service to save an execution-time snapshot of working storage and sequential file positions, which will be in sync with the committed DB2 resources. The checkpoint service does the saves and the commit based on its checkpoint frequency.

There is a limitation which forces you to use either all Smart/CAF or all TSO attach for the main program and subprograms. This is implemented via a knowledgeable mix of static and dynamic links.

Link-edit
At this point, you need to decide which call attach facility (CAF) you would like to use. You can choose between the DB2 TSO call attach and Smart/CAF. The CAF intercepts the EXEC SQL statements in a program and passes the requests on to DB2. How Smart/CAF handles COMMIT and ROLLBACK requests, and the capabilities it provides, are described at the start of this appendix; the remaining differences are discussed below.


SMART/CAF is implemented by linking a module named DCAHLI. DCAHLI has an entry point called DSNHLI.

Use of the DB2 CAF allows you to make fewer changes to your current batch job JCL, and the JCL structure is more familiar to production support personnel. DSNHLI is the DB2 call attach facility stub which intercepts DB2 calls from a program. It is an alias of DSNELI. Currently, Endevor process groups link-edit DSNELI into batch COBOL DB2 programs.

CIGNA recommends the use of Smart/CAF.

After deciding between the two call attach facilities, you must decide between dynamically and statically linking the CAF into the program modules. The program has a call to DSNHLI inserted by the DB2 precompiler, which can be resolved at link-edit or at run time. During link-edit, Endevor has an explicit include statement for DSNELI. To include the Smart/RESTART module and the Smart/CAF module, you can either code explicit includes for them or add the Smart/RESTART loadlib first in the SYSLIB concatenation. It is much better to link the module dynamically rather than statically, because the program modules remain reusable. See the notes below.

Library = TTAD.TS2.SRESTART.V6R2M0.DCALOAD
Member = DCAHLI (which has an entry point named DSNHLI)

Best approach: the main program and subprogram each dynamically call DSNHLI, with the SMART/RESTART loadlib concatenated ahead of the DB2 APFLIBs in STEPLIB.
Second best approach: DCAHLI is statically linked into the main program, but each dynamically called subprogram dynamically links to DSNHLI and picks up DCAHLI from STEPLIB.
It is okay for both the main program and the subprograms to have DCAHLI statically linked into them.
It is futile to mix DCAHLI and DSNHLI between main programs and subprograms.

Example 1: Subpgm B (no SMART/RESTART precompile) does a commit, has a dynamic call to DSNHLI, and picks up DCAHLI from the SMART/RESTART loadlib. DCAHLI checks to see if SMART/RESTART was initialized in the main module, sets up Smart/CAF, and looks for SMART/RESTART. SMART/RESTART is established from the main program. (???)

Example 2: Subpgm B (with SMART/RESTART precompile) with working storage to save cannot be called by a main program which doesn't have SMART/RESTART. The main program must have SMART/RESTART precompiled into it for subprograms to save storage.

If a subprogram has commits or other SQL but no working storage to save, and is therefore not run through the SMART/RESTART precompiler, then it is okay for it to be called either by a main program which has SMART/RESTART or by one which hasn't. Subprograms which need working storage saved can be called only by a program compiled with SMART/RESTART.

For Smart/CAF it is best to do a dynamic call of DSNHLI, and to control the use of Smart/CAF by coding the SMART/RESTART library ahead of the DB2 library in the STEPLIB concatenation.
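To make the static-link option concrete, here is a minimal sketch of a standalone link-edit step with an explicit include of DCAHLI. At CIGNA the link-edit is performed by Endevor process groups, so this step is illustrative only. The DD names OBJLIB and CAFLIB, the LE library SYS1.SCEELKED, the PARM values, and all lower-case names are assumptions.

```jcl
//*  SKETCH ONLY. THE LINK-EDIT IS NORMALLY DONE BY ENDEVOR.
//*  OBJLIB/CAFLIB DD NAMES, SYS1.SCEELKED AND LOWER-CASE
//*  NAMES ARE ASSUMPTIONS OR PLACEHOLDERS.
//LKED     EXEC PGM=IEWL,PARM='LIST,MAP,XREF'
//SYSPRINT DD SYSOUT=*
//SYSUT1   DD UNIT=SYSDA,SPACE=(CYL,(1,1))
//SYSLIB   DD DISP=SHR,DSN=SYS1.SCEELKED
//OBJLIB   DD DISP=SHR,DSN=application-objlib
//CAFLIB   DD DISP=SHR,DSN=TTAD.TS2.SRESTART.V6R2M0.DCALOAD
//SYSLMOD  DD DISP=SHR,DSN=application-loadlib
//SYSLIN   DD *
  INCLUDE OBJLIB(yourpgm)
  INCLUDE CAFLIB(DCAHLI)
  NAME yourpgm(R)
/*
```

For the dynamic approach, the INCLUDE for DCAHLI would be left out, so that the call to DSNHLI is resolved at run time from whichever library comes first in STEPLIB.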


If SMART/RESTART is not being used, that is okay, because Smart/CAF checks for SMART/RESTART and, if it isn't there, just passes control to DB2. The vendor suggests we consider using Smart/CAF for non-SMART/RESTART programs. Smart/CAF is separate from Smart/RESTART.

If a subprogram using Smart/CAF is called by a main program with Smart/RESTART but no DB2 in it, then this is okay if the RAINPUT to the job has CAF(SMARTCAF) in it, or if that is the default (it is the default at CIGNA). Note that Smart/RESTART has DCAHLI linked into it; however, it uses DSNHLI if the RAINPUT parm specifies CAF(TSO).

An error will occur in the following case: PGM=IKJEFT01 in the JCL, but DCAHLI is picked up from STEPLIB by mistake. Refer to the Endevor Process Group documentation for more information on how to implement.

If using DSNHLI from IBM for all programs, then you must pass the following parms via RAINPUT:
Precompiler = CAF(TSO) or CAF(OTHER)? (Check this.)
Runtime = RMI(SRSDBRMT)
Bind = include collection Smart_Restart_DSNH in the plan

For programs with DB2 SQL statements, a module needs to be linked into the program. This module is called the Call Attach Facility (CAF), and all DB2 programs require one. The EXEC SQL statements in a program are translated by the DB2 precompiler into calls to DSNHLI, which is an entry point name in the call attach facility. For subprograms, you must link-edit in Smart/CAF. (give details)

USING DB2 CAF INSTEAD OF SMART/CAF


If you decide not to use Smart/CAF, but to use the call attach facility which comes with DB2, then you should make the following changes to the program preparation and run-time procedures:

PRECOMPILER
The Smart/RESTART precompiler must be run for each program with an EXEC SQL COMMIT in it, even if there isn't any working storage to save. The following parm must be in the RAINPUT for the precompiler:
CAF(OTHER) = this replaces CAF(SMARTCAF)

LINK
Include DSNHLI from the DB2 APFLIB.

BIND
The bind for the plan must include the collection SMART_RESTART_DSNH(*) rather than SMART_RESTART(*).

RUN-TIME
The parms passed to Smart/RESTART at run time must include the following in RAINPUT:
CAF(TSO) = this overrides the installation default of CAF(SMARTCAF)
RMI(SRSDBRMT) = this overrides the installation default of SRSDBRM
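The BIND and RUN-TIME changes above might look like the following sketch. The subsystem ID ssid, the plan name yourplan and the application collection yourcoll are placeholders; the STEPLIB for the DSN command and the one-parm-per-line RAINPUT layout are assumptions. Note that ACTION(REPLACE) replaces the plan's whole package list, so the application's own collections must be listed as well.

```jcl
//*  SKETCH ONLY. ssid, yourplan AND yourcoll ARE PLACEHOLDERS.
//BIND     EXEC PGM=IKJEFT01
//STEPLIB  DD DISP=SHR,DSN=TTAx.TS2.DB2.APFLIB
//SYSTSPRT DD SYSOUT=*
//SYSTSIN  DD *
  DSN SYSTEM(ssid)
  BIND PLAN(yourplan) -
       PKLIST(SMART_RESTART_DSNH.*, yourcoll.*) -
       ACTION(REPLACE)
  END
/*
//*  RAINPUT FOR THE PROGRAM STEP WHEN THE DB2 CAF IS USED
//*  (BELONGS IN THE RUN STEP; ONE PARM PER LINE IS ASSUMED)
//RAINPUT  DD *
CAF(TSO)
RMI(SRSDBRMT)
/*
```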


APPENDIX B - SAMPLE EXECUTION JCL


Sample member: TTAP.TS2.DB2.SYSJCL(SMARTRES)

//*
//*--------------------------------------------------------------------
//* PLEASE CHANGE THE FOLLOWING VARIABLES:
//*   application-checkpoint-dataset-name
//*   vvvvvv
//*   yourpgm
//*   application-loadlib
//*   application-parmlib(membername)
//*
//*
//*--------------------------------------------------------------------
//ALLOC    EXEC PGM=IEFBR14
//*
//SRSCHECK DD DSN=application-checkpoint-dataset-name,
//            DISP=(NEW,CATLG,CATLG),UNIT=SYSDA,
//            DCB=DSORG=DA,SPACE=(TRK,60),VOL=SER=vvvvvv
//*
//*
//STEP01   EXEC PGM=yourpgm,REGION=8M
//STEPLIB  DD DISP=SHR,DSN=TTAP.TS2.SRESTART.DCALOAD
//         DD DISP=SHR,DSN=TTAx.TS2.DB2.APFLIB
//         DD DISP=SHR,DSN=application-loadlib
//         DD DISP=SHR,DSN=SYS1.SCEERUN
//*
//SYSABOUT DD SYSOUT=*
//SYSOUT   DD SYSOUT=*
//*
//SRSCHECK DD DISP=SHR,DSN=application-checkpoint-dataset-name
//*
//SRSPRINT DD SYSOUT=*
//*
//* PROFILE PARAMETERS WILL OVERRIDE CORRESPONDING SITE DEFINED
//* DEFAULTS
//*
//RAINPUT  DD DISP=SHR,DSN=application-parmlib(membername)
//*
//* ====> ADD THE INPUT AND OUTPUT FILES HERE <====
//*
//*
