by Philip Jan Rothstein, FBCI
ING Canada Property and Casualty, part of the International
Netherlands Group, successfully thawed out from the ice storm which paralyzed
dozens of other Canadian and New England organizations in January, 1998.
This article originally appeared in INFORMATION SECURITY MAGAZINE,
March, 1998; Copyright 1998, Rothstein Associates Inc.
Frozen Assets
Snow storms and ice storms are not uncommon in New England or in much
of Canada. The massive ice storm that blasted parts of New England, northern
New York State, Quebec and Ontario in mid-January of 1998 might very well
be one for the record books. Early estimates of overall business and residential
losses are on the order of $1 billion in Canada and $200 million in the
U.S., according to Computerworld Magazine. Extended
power outages in these areas have been particularly painful for I.T. facilities
coping with the frosting. ING Canada Property and
Casualty, a part of International Netherlands
Group headquartered in Montreal, was especially hard-hit. Operation
of their data center in Saint-Hyacinthe, Quebec has been outsourced to
Computer Sciences Corporation (CSC); ING Canada I.T. staff also work out
of the same building. A hot site recovery agreement is in place with Comdisco
Disaster Recovery Services (CDRS). The Saint-Hyacinthe data center supports
all of ING's Canadian operations. The ING Canada Property and Casualty
Group is an industry leader. With more than half a million customers,
customer service and satisfaction are top priorities.
Uh-Oh...
The data center lost utility power quickly when the ice storm hit, at
around 10:00 A.M. on Wednesday, January 7, 1998. Fortunately, an uninterruptible
power system (UPS) and backup generator effectively took over the short-term
power load for the mainframe serving ING. Not so fortunately, this was
not to be a short-term power outage: the ice storm impacted operations
for two weeks. Three major risk factors were identified, according
to Robert Proulx, First Vice President of Technology and Systems for ING
Canada Property & Casualty: (1) power, (2) the telecommunications
network serving all of Canada, and (3) people. "An important
characteristic," Proulx noted, "was that we were facing a major
issue, because of downed poles and wires in the road, getting people to
the data center." His first assessment was that the disruption would
be a matter of weeks, not days, with the added realization that they could
not run for weeks on one generator even if they could get enough
fuel. In short order, Proulx became aware that Bell Canada's serving
central offices and other essential facilities were also running on generators,
presenting another weak link. Although data center operations staffing
is ordinarily lean (two operators were on duty at the time), "even
if we could drive people in, how would we feed and house them? What about
their homes, spouses, children, with no heat or power and with water in
the basement?" fretted Proulx.
Phase I
Proulx advised CDRS Wednesday night they were formally declaring a disaster;
CDRS immediately activated their Toronto/Mississauga, Ontario recovery
center. Although the Saint-Hyacinthe data center was still up and running
on backup power, the situation was tense and uncertain. Martin Goulbourn,
CDRS Vice President, Business Continuity, Western and Canadian Regions,
explained, "Their main concern, and the reason they declared a disaster,
was that they didn't know how long their generator would operate
properly, and if there was an outage on the generator, they wanted to
be prepared so that there would be no business time lost." An added
worry was that the generator fuel tank was only half full, with a run
time of under 24 hours at the time of the power failure. Fuel delivery
within 24 hours was by no means assured.
Phase II
In the midst of the internal I.S. disruption, business was by no means
as usual. ING's property and casualty claims processing exceeded
five times normal volume. ING elected to continue claims operations straight
through the weekend, compounding the already high stress level on I.S.
Even though the generator and UPS were working well and the fallback capability
was in place, ING prudently elected to bring in a second generator to
back up the first. The second generator was cut in over the weekend to
ensure that it would work if the first generator failed. Proulx and his
staff faced a tough choice: should they continue running live at Saint-Hyacinthe,
or bite the bullet and make an orderly move to Toronto? When the added
risks associated with cutting over to the hot site and back again were
weighed, Proulx determined it would be safer to continue production operations
at Saint-Hyacinthe as long as he had the hot site fallback primed
and ready. "During the twelve days at the recovery site, they were running
in parallel at both Saint-Hyacinthe and Toronto. On a nightly basis,
they took backups and refreshed the system at CDRS so they were never
more than about eight hours out of synch with their home systems,"
noted CDRS' Goulbourn. By the following weekend, utility power was
returning to Saint-Hyacinthe. Nevertheless, the data center remained on
generator power. Proulx worried that utility power would continue to be
unstable for some time as continuing repairs added load to the power grid,
and as weakened or damaged power supply components failed once power was
reapplied. At his Montreal office, Proulx was unnerved by four power
drops in a single afternoon.
The People Rest... And Eat... And Bathe
ING's Proulx never stopped worrying about the people who were making
the recovery work. Long hours, tremendous workloads and unreasonable stress
were only part of the problem. Housing, feeding and caring for hundreds
of employees, many displaced from their homes and dealing with personal
crises, was essential, and the involvement of ING's Human Resources was
critical. "We were serving over 800 lunches, 700 dinners and 700
breakfasts each day [at Saint-Hyacinthe]. We even had to install showers.
Many of our people were working fourteen or fifteen hours a day at five
degrees Celsius." Employees were assured their wages were continuing.
Psychologists were brought in to counsel and sustain employees. ING's
proactive, supportive attitude paid off. "Up to this, our employees
have been proud and happy. We were succeeding as a result of our people,
especially a handful of key people who really came through."
Plan Ahead
Fortunately for ING, they had recognized the potential impact of a data
center disruption and had the foresight to develop a data center contingency
plan. With the exception of network switching, a comprehensive exercise
had most recently taken place in May, 1997. The exercised recovery plan
was to play a crucial role in the continuing operation and recovery. Network
switching was the only aspect which had not been exercised. ING Canada's
systems are deployed in hundreds of insurance brokers' offices throughout
Quebec. Dial-up modems were quickly deployed to all of these locations
shortly after the storm hit. Thanks to a combination of advance planning,
extensive testing and fast footwork in the clinch to deal with last-minute
revelations, communications were successfully rerouted and, remarkably,
ING Canada Property & Casualty never stopped doing business with their
customers throughout the Ice Age of 1998. What would Proulx do differently,
having gone through his first trial by ice less than three months
after joining ING Canada? "I would get every
detail really well planned," admits Proulx. The overall strategy
was sound, but those niggling details certainly made the recovery more
difficult. Rigorously maintained recovery data (equipment, network,
contacts) can save a lot of grief. "I'm going to work
on improving communications even more," adds Proulx. No matter how
timely and effective the communications channels, they could always be
better. "Don't assume anything,
don't take anything for granted,"
urges CDRS' Goulbourn.
Exercise... Exercise... Exercise
Even a well-exercised contingency plan can have glitches. CDRS' Goulbourn
noted, "They had tested with us previously, and from our perspective
testing is very important. We understood the requirements, we understood
how they were going to execute the recovery and where we could provide
assistance. They determined they needed a very new Cisco router for which
they did not have a spare, they had not included in the recovery contract,
it was not included in the hot site contract." An up-to-the-minute
inventory might have averted this oversight, but Goulbourn notes, "...you
can't do that every week or every month. You should have specific
time points or change management checkpoints." While Proulx certainly
will remember that Cisco router next time, he has little doubt that there
will be other details to handle on the fly. Any contingency plan should
be flexible enough to accommodate these last-minute glitches or oversights.
Goulbourn admonishes, "Testing is absolutely
critical. While everyone pays that lip service, where it becomes very
critical is in situations like this where you build the relationship and
rapport between the organizations so that when the disaster happens, the
supplier can provide useful support. Exercising pays a lot
of dividends." Building strong relationships and rapport through
mutual exercises is the best way to ensure suppliers can provide useful
support in a pinch. The senior executive focus also paid off. "We
were having regular conference calls twice a day with ING and CSC. ING's
Proulx was the one driving those conference calls. In the whole recovery
process he understood what was going on, he was the sponsoring executive
for business continuity in the organization. The fact that it went up
that high in the organization very much showed up in the execution of
the plan."
Footnote: According to
The New York Times (8/17/98), "Niagara Mohawk Power Corporation has
spent the most by far [recovering from the January ice storm] - more than
$125 million to repair lines to 120,000 customers who in some cases were
left without power for weeks... The House of Representatives recently
passed a $2.9 billion emergency spending bill intended to help upstate
New York and New England recover from the storm." (Disclosure: Niagara
Mohawk is a consulting client of Rothstein Associates.)
For you maple syrup fans, "... an estimated 380,000 taps were lost
in northern New York as a result of the storm."
Copyright (c)2003, Rothstein Associates Inc. All Rights
Reserved.