
UNIT 1 RISK ANALYSIS AND DISASTER PLANNING

Structure

1.0 Introduction
1.1 Objectives
1.2 Risk Analysis

1.2.1 Initial Planning


1.2.2 Role of a Risk Manager
1.2.3 The Need for Backup and Recovery
1.2.4 Preparing Procedures
1.2.5 Requirement of Critical Jobs
1.2.6 Evaluating Alternate Response
1.2.7 Compiling the Package

1.3 Disaster Recovery Planning

1.3.1 Disaster Recovery Planning Tasks


1.3.2 Disaster Recovery Plan Components

1.4 Summary

1.0 INTRODUCTION
Information has now come to be treated at par with other vital resources by most organisations. Inadvertent
or malicious loss, misuse or destruction of data can lead to consequences as disastrous as loss of men,
material or money.

Traditionally, the armed forces have been very sensitive to leakage of plans or information on dispositions.
Financial institutions too have paid attention to building checks and balances to guard against fraud or
misappropriations.

Currently, the need for safeguarding Corporate Information has become more acute. This is due to the wide
dispersal of data within the organisation and the sophisticated means available for tapping into the
databases. An ostrich-like attitude towards the security of data can only result in disasters; it is therefore
better to be aware of security measures and to implement them.

1.1 OBJECTIVES
At the end of this unit you should be able to:

• explain and appreciate the need for risk analysis;
• define initial planning and the role of a risk manager;
• understand the need for a backup and recovery process;
• explain disaster recovery planning;
• define the various phases of disaster recovery planning.

1.2 RISK ANALYSIS


The purpose of risk analysis is to determine the probability of problems occurring, the cost of each possible
disaster, the areas of vulnerability and the preventive measures to adopt as part of a contingency plan.
Thus, what is required is risk management.

Risk management has been described as that element of managerial action that is concerned with the
identification, measurement and control of uncertain events. It is used to make decisions regarding the
costs (monetary as well as other) of protecting against possible events endangering the organisation.
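
As a minimal sketch of how probability and cost can be combined in such a decision, the expected yearly loss from a hazard may be taken as its probability multiplied by its cost, and then compared with the yearly cost of protection. The hazards, probabilities and amounts below are hypothetical illustrations, not figures from this unit, and the rule considers monetary costs only.

```python
from dataclasses import dataclass


@dataclass
class Hazard:
    name: str
    annual_probability: float  # assumed chance of occurrence in one year
    loss_if_occurs: float      # assumed cost of the disaster, in monetary units
    protection_cost: float     # assumed yearly cost of the safeguard

    def expected_annual_loss(self) -> float:
        # Expected loss = probability of the event x cost of the event.
        return self.annual_probability * self.loss_if_occurs

    def worth_protecting(self) -> bool:
        # Control the risk only if the safeguard costs less than the expected loss.
        return self.protection_cost < self.expected_annual_loss()


# Hypothetical example figures.
hazards = [
    Hazard("fire", 0.01, 5_000_000, 20_000),
    Hazard("power failure", 0.50, 40_000, 30_000),
]

for h in hazards:
    decision = "control" if h.worth_protecting() else "tolerate"
    print(f"{h.name}: expected loss {h.expected_annual_loss():.0f}, {decision}")
```

In practice the non-monetary costs mentioned above would also have to be weighed, so a comparison of this kind is only a starting point for the decision.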

In subsequent sections let us look into several aspects relating to Risk Management.

1.2.1 Initial Planning

While carrying out the initial planning, considerable thought should be given to the following:

• Estimated cost and availability of funds to perform an analysis.


• Value of the physical installation.
• Worth of data to the organisation and to others.
• Existing safeguards.
• Impact of data processing on the organisation's mission or goals.

From this summary, management could then determine those risks that could be tolerated by the
organisation and those which require some control. Those requiring control then could be assessed
clinically for risk avoidance.

1.2.2 Role of a Risk Manager

Creation of a position of risk manager is strongly recommended, because the system is not likely to succeed
without one knowledgeable individual who is responsible for decision making and supervision, for overall
control of the technical and analytical activities in the process, and for its continuity.

In a small organisation, the position could be assumed as a collateral one to a top level management
official. In a large and complex entity, however, a separate position that is sufficiently high in the
organisation, should be established for a risk manager, with authority for data processing security across
the organisational lines. Some requisites for a top level risk management position are:

• Knowledge of short and long range goals of the organisation;

• Awareness of users' security needs and priorities for the establishment and maintenance of an
appropriate level of security;

• Awareness of new technology in security;


• Authority to make, or assist in making, policy decisions on security programs and procedures;
• Authority, with management approval, to implement security measures, deemed feasible from
a risk analysis;

• Ability to follow through, periodically, on security policies and practices in action: checking
actual performance and results, and taking corrective action and, if necessary, punitive action.

It is advisable to take up this work along with the Data Base Administration of the organisation.

At the start of the contingency planning project, a team of 3-4 managers from various functional areas is
formed. The approach normally followed is to base the contingency plans on rational economic analysis
and to avoid problems of internal politics of the organisation. The objectives of the project team generally
include the following:

• Conservation of assets upon exposure to a major hazard, whether fire, storm, sabotage or other
hazard;

• Assurance that the corporation will survive even if the computer facilities are disabled or
destroyed;

• Specific action plans that a 'prudent man' should take while in charge of the organisation's most
vital asset: data.

Generally this activity is a pioneering effort; therefore, preparation of a detailed project plan is recommended.
A typical estimate for the total effort of developing the contingency plan is 275 man-days. The break-up of
effort by task is given in Table 1.

Table 1
Project Outline

Sl. No.  Task                                                 Applied effort (man-days)
1.       Plan the project                                       11
2.       Establish current status of backup and recovery         8
3.       Prepare procedures, lists and forms                     9
4.       Establish loss due to delay*                          136
5.       Specify critical applications                          26
6.       Evaluate alternate responses                           18
7.       Document the recommended plans                         17
8.       Creation of emergency procedures notebook              22
9.       Document the information required to reconstruct       18
10.      Complete project 'package'                             10
         Total                                                 275

Remarks
*Establishing the losses resulting from delays in processing is the most difficult part of contingency planning.
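
The weight of task 4 in the overall effort can be checked with a short calculation. The sketch below simply re-enters the man-day figures from Table 1; the variable names are illustrative.

```python
# Applied effort in man-days, copied from Table 1.
effort = {
    "Plan the project": 11,
    "Establish current status of backup and recovery": 8,
    "Prepare procedures, lists and forms": 9,
    "Establish loss due to delay": 136,
    "Specify critical applications": 26,
    "Evaluate alternate responses": 18,
    "Document the recommended plans": 17,
    "Creation of emergency procedures notebook": 22,
    "Document the information required to reconstruct": 18,
    "Complete project 'package'": 10,
}

total = sum(effort.values())
largest = max(effort, key=effort.get)   # task with the greatest effort
share = effort[largest] / total         # its fraction of the whole project

print(total)
print(f"{largest}: {share:.0%} of total effort")
```

Establishing the loss due to delay accounts for roughly half of the estimated effort, which is consistent with the remark that it is the most difficult part of the project.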

1.2.3 The Need for Backup and Recovery


The hazards that could disable the computer operations are generally categorised as follows:

• Hardware and software failures;

• Environmental failures involving electric power, air conditioning, building integrity etc.;

• Accidents like fire, smoke, water, storms;

• Vandalism, sabotage, rioting;

• Operational errors, probably the most frequent cause of inability to operate, often with the most
severe consequences;

• Non-availability of personnel, whether due to strike, disease, communications breakdown or
disruption of transportation.

For any of the first five categories, the effect would be partial or total inoperability, or perhaps the
destruction of facilities, data, programs and files. The duration of the effect could range from a temporary
interruption to a permanent loss. Hence, there is a definite need for a proper system of backup and
recovery. The sixth category, the unavailability of personnel, would result in temporary interruption.

1.2.4 Preparing Procedures

The form in Table 2 is used as a tool for uniformly recording and evaluating the data showing the potential
losses to the organisation if a hazard makes it impossible for the computers to produce outputs on time.

Table 2
"Criticality Evaluation"
Application/Program    Loss if delay is
12 hrs.  24 hrs.  2 days  4 days  7 days  2 weeks  1 month
System A 1m 5m
Subsystem A3 0m 175m
Program A3N 50m 70m
Program A3M 2m 5m
Subsystem A4 15m 25m
System 3m 200m

where m is the monetary unit

Note : The object of the contingency plan is to discover which applications/programs are most critical in
terms of losses incurred.
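
Once the form has been filled in, the ranking of applications by criticality can be sketched as a simple sort on the loss recorded for a chosen length of delay. The application names and loss figures below are hypothetical and are not taken from Table 2.

```python
# Hypothetical losses (in monetary units m), keyed by length of delay in hours.
losses = {
    "Payroll": {24: 1, 168: 5},
    "Order entry": {24: 50, 168: 70},
    "Billing": {24: 15, 168: 25},
}


def rank_by_loss(losses, delay_hours):
    """Return application names, most critical first, for one length of delay."""
    return sorted(losses, key=lambda app: losses[app][delay_hours], reverse=True)


print(rank_by_loss(losses, 24))
print(rank_by_loss(losses, 168))
```

The ranking may differ from one length of delay to another, which is why the form records the loss for several delay periods rather than a single figure.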

In many cases it is discovered that the cost to the organisation (if it were unable to produce the outputs on
time) is of such magnitude that both the organisation and the users agree that under no circumstances
would the organisation tolerate such losses. Hence a dual evaluation is undertaken for those application
systems with extremely high loss potential. First, as usual, the loss to the organisation, if unable to produce
the output on time, is calculated; then, for comparison, the steps that would be required under the worst
conditions, regardless of cost, to prevent this major loss from ever happening are worked out.

The detailed analysis is normally done by the user group itself, with assistance and guidance
from the project team members. Getting the user group involved in the analysis is found to be of high
value, because it forces the users to think through what they will have to do in case of an emergency. It also
compels them to make an economic analysis of the value of their work, in a corporate sense, rather than
from the usual parochial point of view.

1.2.5 Requirement of Critical Jobs


Upon identifying the critical jobs, the requirements of these jobs are established as follows:

• Timing: Schedules, acceptable delays, minimum/maximum for normal processing.
• Equipment: Core, tape (density, track), discs, printers, special features.
• Data: Files, generation, data group, catalogue and procedures.
• Software: Special programs, protection, passwords.
• Preprocessing: User interface, inputs, data preparation, error handling, prerequisite runs.
• Personnel: User contacts, data preparation, data controller, distribution, supervision, support.
• Post-processing: Distribution lists, control.
• Others: Documentation (block diagrams, record layouts, source program listings), procedures (operating
instructions, check points and restarts), security, sabotage, forms, supplies.

1.2.6 Evaluating Alternate Response

With critical systems satisfactorily identified, what should be the responses to an 'accident' or 'catastrophe'?
Let us first summarise the essential elements of any form of response to an unwanted event which could
lead to delay in data processing operations:

• Obviously one must evaluate the situation and estimate the consequences, including a recognition
of the time at which the accident occurred. If it occurs on a weekend, some specific steps
must be taken. At what point in the processing cycle are we when the operation is brought to
a sudden halt?

• Probably the most neglected response element is communication with all of the affected parties.
One should not hide the fact that a significant emergency has occurred. Mechanisms (including
responsibilities and authorisations) must be set up in advance to communicate with the users,
suppliers, personnel and all others in any way involved.

• As quickly as possible, the selected response actions should be initiated. Operations in the
back-up mode should be activated on the basis of the contingency plans developed, and by those made
responsible in the plan. Necessary check points and controls, including extra security safeguards,
should not be overlooked. It should be remembered that everything will be under abnormal
conditions; for instance, transportation problems may become severe.

• Actions to restore normalcy should be started. During the emergency, data processing is limited
in scope. Even once routine operation has resumed, normalcy has not yet been reached. Time will be
needed on equipment, and overtime for most of the personnel, to restore master files and bring them
up to a current status. Those files and systems that were temporarily processed in a contingency
mode will require much updating. Additional checking of files and supplementary audits must be
undertaken to assure that normalcy is indeed restored.

Let us now examine some of the alternative responses :


• Accept the delay. Just do nothing and wait, if one can afford to. This is the simplest response.

• Attempt to remove or minimize the cost of delay.

• Change, immediately, the schedules of operation and process only what is critical, using as a basis
the economic analysis of critical jobs. By reducing the scope of operation one will concentrate on
only the true essentials.

• Go off-site whether locally or remotely. This may require running extra hours for the main
processing, and again subsequently to help catch up with the backlog of systems to be updated.
For any processing off-site, appropriate concern must be shown for configuration and software
compatibilities. Cash advances or credit should be handy to provide air tickets for personnel to fly
out suddenly. Communications, work-flow, controls, and security will become important items
requiring attention.
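
The third response above, processing only what is critical, can be sketched as a selection of jobs under reduced capacity. The example below is one possible approach rather than a prescribed method: it ranks jobs by loss avoided per processing hour and takes them in that order until the available hours are used up. The job names, hours and loss figures are hypothetical.

```python
def reduced_schedule(jobs, available_hours):
    """Pick jobs in order of loss avoided per processing hour until capacity runs out."""
    ranked = sorted(
        jobs,
        key=lambda job: job["loss_if_delayed"] / job["hours"],
        reverse=True,
    )
    chosen, used = [], 0
    for job in ranked:
        # Skip any job that no longer fits in the remaining hours.
        if used + job["hours"] <= available_hours:
            chosen.append(job["name"])
            used += job["hours"]
    return chosen


# Hypothetical jobs with processing time and loss if not run on time.
jobs = [
    {"name": "payroll", "hours": 4, "loss_if_delayed": 200},
    {"name": "billing", "hours": 6, "loss_if_delayed": 120},
    {"name": "statistics", "hours": 3, "loss_if_delayed": 3},
]

print(reduced_schedule(jobs, 10))
```

With ten hours available, the two jobs with the highest loss potential are run and the low-value job is deferred until normal operation resumes.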

The emergency procedures notebook, like the whole contingency plan, is designed to limit losses. It
should be available to console operators, shift supervisors and operations managers. Included in it should
be sections dealing with fire, water, flood, bomb threats, smoke, dirt, storms, electric problems,
air-conditioning failure, building hazards, communication facility problems, hardware malfunctions,
evacuation of the building and entry procedures, and other emergency situations. (The section on other
situations could deal with radar interference, magnets, backup tapes, situations involving off-site data
storage vaults, lack of supplies and forms, vandalism, theft and fraud.)

Much of the information incorporated in these sections already exists within the organisation in various
shapes and forms, and in various degrees of completeness. By consolidating all the information and by
assembling the best from each source, it would be possible to produce a useful reference.

In an emergency, things usually go from bad to worse. Taking hasty steps, by-passing normal
precautionary measures and making faulty responses aggravate the situation. But the emergency
procedures notebook will certainly help to avoid this.

Assigning specific contingency responsibilities in advance of a major emergency will do much
towards the elimination of chaos and confusion. In the plan, and in the emergency procedures notebook, each
response action should be spelt out in detail, after being thought out under calm conditions.

1.2.7 Compiling the Package

In order to achieve ultimate restoration of the data processing operation, one needs to be able to replace
damaged or destroyed facilities. This calls for an up-to-date package of records containing complete
specifications and purchasing information for all resources necessary in the operation. It should include data
for hardware, communication equipment, system software, operating procedures, run instructions and
various logs.

Also to be included are data needed for the reconstruction of files, and for updating, testing and debugging
of programs. One should be certain that the environmental services such as air conditioning and electric
power, as well as paper stock, tapes, discs, printer ribbons, forms and general supplies, are all taken care
of.

Conclusions
Contingency planning is not easy, and it can take a great deal of time for sophisticated installations. But
planning for emergencies is well within the state of the art. The methodology listed here could be of help to
those who wish to take advantage of it.

1.3 DISASTER RECOVERY PLANNING


What follows is relevant, possibly, to any disaster as much as to a cataclysmic event in an Information
System. To convey the idea rather than to offer a standard specimen, only the framework has been sketched
in; it is to be interpreted as applicable to a particular organisation or situation.

This section deals with two aspects: the tasks in planning, and the components of a Plan.

1.3.1 Disaster Recovery Planning Tasks

Disaster Recovery Planning tasks can be visualised as falling into six major phases, listed below and detailed later:

• Definition Phase;
• Functional Requirements Phase;
• Design Phase;
• Implementation Phase;
• Testing and Activation Phase;
• Maintenance Phase.

Phase I :

In this phase the parameters of all that is to be included are assessed and put in perspective. It would consist
of things like:

• The objectives;
• Terms of reference;
• Planning perspective;

Phase II :

Is possibly the most critical phase, and would include such sections and activities as:

• Making an inventory of resources to be included, e.g. hardware, software, telecommunications
components and data life cycles; also actions or movements like data conversion and movement,
physical as well as electrical; also procedures and Standard Operating Practices;
• Critical appraisal of the applications and the installation against recovery objectives;
• Deciding what is to be covered in the Plan;
• Establishing priorities based on the criticality of time frames, threats and the organisation's performance.

Phase III :
Is particularly significant in a plan being prepared for the first time (Note the reference to equipment
alternatives) and would include such things as:

• Identify design alternatives;
• Specify the alternatives in detail, e.g. hardware, software, telecommunication, staffing, procedures
etc.;
• Identify potential vendors, if purchase is necessary;
• Analyse the risks in the various alternatives, as well as the costs involved in improving them to desired levels;
• Select the acceptable design, including financial approvals.

Phase IV :

Would put into action the desired and designed Plan and would be made up of:

• Acquisition of land, building, utilities, hardware, telecommunications lines etc.;
• Negotiation and signing of contracts with vendors and consultants;
• Writing of Manuals of Procedure;
• Training of personnel;
• Site preparation;
• Development of a Test Plan;
• Development of a Maintenance Plan.

Phase V :

Is equivalent to a system test run in computer jargon. It will consist of three Segments as below :

Segment 1 : Paralleling;

Segment 2 : Live Testing;

Segment 3 : Maintenance Testing.

Segment 1 :

In this segment, all activities in the Disaster Recovery Plan that are external to the complex are tested or triggered, such as:

(a) scheduling all 'on call' personnel and practising the drill;

(b) triggering arrangements external to the complex;

(c) practising the 'back-up' arrangements by invoking them and running a job;

(d) validate adequacy of back-up by comparing with a live job selected at random;

(e) correct errors in plan if any;

(f) repeat (d), (e) till snags, complications and so on are removed and simple, streamlined
Standard Operating Procedures emerge.

Segment 2 :

"Set the dogs free" and simulate a breakdown; the following actions will be included:

(g) attempt to run using the plan only;


(h) correct defects, if detected: retrain if necessary and review Standard Operating
procedures;

(i) repeat (c) to (h) till drill is free of all bugs, and is simple, reliable, economical and
effective.

Segment 3 :

Is in the best traditions of Management Science and is invoked on two occasions : as a routine, and
whenever there is a change as below :

(j) repeat (g) to (i) annually;

(k) repeat (c) to (f) whenever there is a revision to Plan.

Phase VI :

This phase is not strictly a part of the planning, as no planning tasks are performed. Its philosophies are
developed during the implementation phase and applied as an on-going activity. The stress will be on the
software and, in this connection, two books must be maintained: software change authorisations and
software packages. Other items needing constant maintenance are:

• Names, titles, Media etc.;


• Back-up library (data systems, Application Software etc.);
• Documentation and Standard Operating Procedures.

1.3.2 Disaster Recovery Plan Components

In this Section, the sub-divisions to the plan manual are purely recommendatory and are guides only.
Twelve facets are identified :

Section I :

Statement of Purpose

• Objectives
• Scope, constraints
• Priorities

Section II :

Would describe all Hardware in use

• CPU/S;
• Peripherals etc.

Section III :

Could be devoted to a description of the Telecommunications component of the complex and would include

• Message switching;
• Multiplexors, concentrators and the like;
• Diagnostic devices;
• Modems;
• Terminals and the like;
• Protocols;
• Lines, channels and circuits.

Section IV :

Off-line devices like data conversion, Data entry, plotters etc.

Section V : All "firm-ware" should be described.

Section VI : Should describe all software like:

• Operating Systems;
• Compilers;
• Utilities;
• Data Base Management and Communications Management;
• Full details of Applications - Source, Object, etc.

Section VII : Every form used in the complex should be covered

• Flat packs;
• Checks;
• Turn-around Documents;
• Input forms;
• Coding sheets;
• Forms used to invoke Back-up etc.

Section VIII :

This will elaborate on procedures, areas with potential for jeopardy to the System and include:

• Operation at Back-up installation;


• Critical procedures for Manual Operations;
• Controls on Data, software etc.;
• Training.

Section IX : Could elaborate on the policy on space e.g.

• Hardware deployment;
• Storage;
• Terminals;
• Off-line devices;
• Clerical areas;
• Forms and Stationery;
• Input/Output controls;
• Repair and maintenance;
• Security.

Section X :
All aspects of the Utilities: Water, electric power, Air conditioning etc.
Section XI :
Personnel aspects and assignment of duties in the various stages of the System

• Recovery Management;

• Site preparations;

• Selections

• Construction.

• Hardware installation

• Telecom installation.

• Stores

• Support Services like typing, reprography etc.

• Administration

Applications Management-

• The Manager and his responsibility;

• System maintenance;

• System development and review;

• System reconstruction;

• Data Base reconstruction;


• Transaction processing principles, supervision (like transaction authorisation, input preparation,
including conversion/entry, output control error correction etc.);

• Staffing and training.

Data Centre Recovery-

• Installation Management;

• Shift Organisation and supervision;

• Console Operation;

• Scheduling;

• Terminal Access;

• Media Library;
• System Programming;

• Input/Output Control etc.

Plan Maintenance-

• Overall responsibility;

• Applications responsibility;

• Installation responsibility;

• Plan testing and review.

1.4 SUMMARY
This unit introduces you to the techniques of risk analysis and disaster planning. With examples, it explains
the various components of risk analysis and their usefulness. It also briefly discusses disaster recovery
planning and its requirements.
