Excerpted from the book Disaster Recovery Testing: Exercising
Your Contingency Plan, available from The Rothstein Catalog On Disaster
Recovery. Copyright 1994 and 1996, Rothstein Associates Inc. This article
also appeared in InfoSecurity News Magazine, March/April, 1995.
It should be no surprise to anyone in the business
world that disaster recovery testing is susceptible to the inter-personal
and inter-departmental give-and-take which characterize most any significant
organizational endeavor. In that the degree of maneuvering, posturing
and negotiating seems to correlate with the importance of (and effort required
for) most any corporate undertaking, it should not be surprising that
disaster recovery testing engenders more than its share of controversy.
Moreover, the all-too-common perception that disaster recovery testing
(let alone planning) is a discretionary investment adds to the difficulty
in traversing the minefield of corporate politics. It should also be noted
that many of the issues addressed in this article are not unlike those
faced in developing and implementing the continuity or recovery plan in
the first place.
Commitment and Motivation
It is absolutely essential to the process that the commitment to recovery
testing be authentic and clearly communicated prior to beginning the testing
process. All too often, the commitment is either shallow or implied, thereby
dooming the testing program, and likely the entire disaster recovery capability,
to failure. The direct result is almost certain to be failure of the disaster
recovery program when it is needed most, that is, as a critical tool during
an actual disruption.
A critical distinction between effective and ineffective
recovery testing programs can be observed by answering the question:
Is the disaster recovery testing program designed
as a tool to be used during an actual recovery, or as evidence?
The incompetent recovery testing program is, as often
as not, a direct result of a powerful, organizational incentive to produce
tangible documentation rather than to produce less visible (but far more
critical) changes to the organization. In other words, given the task
of putting together a testing program, a manager or staff member is most
likely to be motivated to deliver an impressive paper document. Of course,
the same incentives apply to recovery plan development.
Such a document may or may not be important to the organization during an actual
disruption. Far more likely to be vital to recovery is the team experience
and process shakedown from exercising the actual testing program. Therefore,
one should be conscious of this common, underlying pressure to produce
evidence to satisfy management, auditors, regulators or others: evidence
which appears to meet the stated demands, but which in fact is unlikely
to work when it is most needed.
Where the contingency testing program
(and presumably the underlying contingency plan) falls into this 'evidence'
trap, this is a clear symptom of a weak or absent commitment from top
management. The factors which motivate top management to commit to recovery
testing are closely related to the motivating factors for implementation
of the recovery program in the first place. Top management must be made
to understand that (1) an untested contingency plan is unlikely to succeed
in an actual recovery; (2) testing (and, for that matter, plan maintenance)
is an integral part of the plan development and implementation process,
and not an option; and, (3) an untested contingency plan could, in an
actual disruption, turn out to be dangerous as a result of unverified
The factors motivating an organization
to implement a continuity exercise program often do not carry much weight
with business partners, vendors or other departments. Motivating these
outside parties to participate even minimally in a testing program could
take more effort than any other aspect of the test planning and management
process. With outside vendors or business partners who do not respond
to the obvious arguments for their active participation in the testing
program, the simplest method to overcome resistance or apathy is typically
to point to competitors who would be eager to participate. With other
departments or divisions, pointing out the potential direct, tangible
impact on them of a failure of the testing department or business unit
to recover from disruption is usually sufficient. If all else fails, an appeal
to top management is prudent, provided, of course, that top management
is truly committed to continuity testing in the first place.
The second organizational issue to tackle in the
process of implementing and operating a disaster recovery testing program
is usually to determine the relative priorities and sequence of business
areas, functions, locations or processing applications to be tested. Theoretically,
the earliest test subjects should be those which are deemed most critical.
In practice, visibility and the potential for embarrassment are far more
likely to be motivating factors for choosing (or avoiding) a functional
area than is criticality.
On the other hand, the earliest test
participants, if even modestly successful, can serve as valuable role
models (if not outright advocates), to encourage other business areas
to understand and appreciate the value and benefits of testing. Therefore,
some initial consideration should be given to areas which are likely to
benefit significantly and directly from the testing process and to have
a positive testing experience, even if these areas are not the most critical.
Consideration should also be given to those managers who are "on the fence,"
likely to be won over and to become enthusiastic supporters of recovery testing.
An additional consideration under
the heading of "embarrassment" is that the first few tests are most likely
to be the most awkward and cumbersome. In other words, the first areas
tested should be aware that they are "guinea pigs." This can be used to
advantage: these early participants are, in effect, being asked for their
assistance in developing and implementing a workable testing process.
Therefore, the pressure on the business area (and potential for embarrassment)
is easily deflected. Table-top exercises and structured walk-throughs
are ideal for these first few tests and, with little extra effort, can
even be made quite enjoyable as well as productive for the participants.
Specific, tangible and realistic
objectives should always be established for each test cycle. Further,
intangible objectives which are likely to motivate participants should
also be considered. At a minimum, the knowledge that their functional area
has been tested first, and is therefore in better shape to withstand a
disruption than other business areas, should be appealing, at least in theory.
Especially during the initial rounds of recovery
testing, line managers and other test participants are likely to be acutely
conscious of the potential to be put on the spot without being sufficiently
knowledgeable or prepared. For a testing program to be successful, test
participants should feel comfortable being open and uninhibited, especially
in identifying weaknesses or shortcomings as well as opportunities for
improvement. Clearly, it is inherently uncomfortable for most people to
acknowledge their shortcomings. Further, given that these participants
probably have had little, if any, prior experience conducting disaster
recovery tests, this comfort level is likely to be quite low.
One approach to increase this comfort
level, and thereby increase the productivity of the initial tests, is
to assure the participants of some limited degree of confidentiality.
With the concurrence of their upper management, the initial recovery testing
process can be positioned explicitly as a learning tool (which, of course,
it is anyway). If the participants are assured of the confidentiality
of the appropriate aspects (i.e., the most potentially embarrassing aspects)
of their participation in the testing process, they are far more likely
to participate fully and willingly. After the initial round(s) of testing,
of course, the need for, as well as the desirability of, maintaining this
level of confidentiality should diminish.
Confidentiality may also be a factor
when an exercise program crosses organizational, divisional or departmental
boundaries, and even for participating client or vendor organizations.
Negotiated, carefully defined ground rules may be needed to address confidentiality
during the design of the testing program. Obviously, simulated or modified
test data should be employed wherever practical, providing that this substitution
does not materially alter the process or outcome of the exercise. The
exercise process should also include an explicit procedure for destroying
any confidential test data at its conclusion.
For any disaster recovery testing program to prove
effective, it must be ongoing and consistent. The reality of most organizations
is that planning and testing for a potential event which may never occur
can easily slip down in the priority list when stacked against day-to-day
urgencies. Therefore, it is generally advisable to address consistency
and discipline early in the disaster recovery testing program. It is a
pretty safe bet that management's focus will shift away sooner or later,
even if testing starts out as a high priority.
One useful method to at least maintain
an ongoing level of awareness and discipline is to document and budget
a recovery testing program up front as an ongoing, multi-year program.
When presented with a well thought-out, continuing plan, top management
(as well as management of affected business or functional areas) is much
more likely to stick with the program.
One specific technique which has
proved useful in many organizations is to regularly refer to tests or
test phases with terms which avoid the inference of completion. Such terms
as "interim," "strawman," or "trial" may be used to logically set the stage
for the next iteration of testing.
Another technique which works well
to focus attention on continuation of the testing process is to specifically
define each test as a step in the overall testing scheme, that is, to
spell out a long-term testing program. Of course, the contingency planner
should stay aware of the risk of blindly following the approved testing
program without periodic reviews and retuning. Alternatively, each step can be
expressed as a percentage of a "complete" testing cycle, where the end of
one cycle becomes the beginning of the next cycle - the classic analogy
of the bridge painter starting over again after reaching the end of the
bridge may be appropriate.
Continuity / recovery testing is not immune from
politics or personalities, and any contingency planner who assumes otherwise
will fail - and, as often as not, be out of a job. The failure of many
contingency planners is less often in their awareness of politics and
personalities than in their willingness to apply the management and interpersonal
skills necessary to overcome resistance.
Copyright (c)1997-2003, Rothstein Associates Inc. All