To prepare for natural disasters, catastrophes or other events that could disrupt operations or cause loss of data, NAVOCEANO reengineered infrastructure, procedures and processes to mitigate the impact of disaster-related events on its computing infrastructure …
Mix a Category 4 hurricane with network equipment, and you have a less-than-ideal environment for mission-essential oceanographic data processing. But as recent events have shown, operations can continue with proper planning, preparation, teamwork, improvisation and a little bit of luck.
Supporting Fleet Operations
The Naval Oceanographic Office (NAVOCEANO), located at Stennis Space Center, Miss., is responsible for collecting and processing real-time oceanographic data to provide up-to-date environmental information in support of U.S. Navy fleet operations. To accomplish this, NAVOCEANO has a fleet of seven ships to conduct oceanographic survey operations.
NAVOCEANO has two primary information technology (IT) components: N6, the NAVOCEANO Engineering and Systems Department, which directly supports the office's primary oceanographic production capability, and N7, the NAVOCEANO Major Shared Resource Center (MSRC), which is the major Navy component of the Department of Defense (DoD) High-Performance Computing (HPC) Modernization Program. The MSRC fulfills NAVOCEANO's supercomputing requirements. Both departments work together to form a complete IT capability for NAVOCEANO.
The Oceanographic Information System (OIS) constitutes NAVOCEANO's scientific computing enterprise. The maintenance, sustainment and operations of the OIS are conducted by the Information Systems Division. To prepare for a disruption to operations that could be caused by natural disasters or other catastrophic events, the Information Systems Division instituted redundancy for its computing infrastructure.
Several years ago, the OIS underwent major reengineering to migrate the majority of its storage capacity to a storage area network, a technology that moves storage from individual workstations or servers to centrally managed areas that have built-in redundancy and backup.
DoD regulations and directives require that owners of information systems prepare a disaster recovery (DR) and continuity of operations plan (COOP) for the possibility of a destroyed or disabled IT infrastructure. While our DR and COOP are not yet complete, NAVOCEANO has placed significant emphasis on this area over the past few years and has made great strides.
In fiscal year 2001, with assistance from the National Technology Alliance (NTA), the office conducted an evaluation of internal IT capability. Included in the study was an analysis of data recovery and restoration as well as recommendations regarding practices and procedures. NAVOCEANO combined recommendations from this study with DoD directives, and developed appropriate practices and standardized operating procedures.
This advance planning and preparation were crucial, enabling NAVOCEANO to restore IT capability quickly during and immediately after Hurricane Katrina.
Hurricane Katrina struck the Louisiana and Mississippi Gulf Coast on Monday, Aug. 29, 2005. The timing and path of the storm were troublesome, given that the storm did not take dead aim at the Central Gulf region until Aug. 26. This left little time for preparation.
Management executed the first stage of preparations Aug. 27, with phone calls going out to all IT personnel to ensure employee safety and sufficient manning at the site. Phone calls were made again on Sunday to ensure all the bases were covered.
As a result, NAVOCEANO had 20 percent of its IT staff on-site during the hurricane and confirmed that most of the remaining personnel had left the area for their safety. At this stage, the National Hurricane Center predicted that Katrina would hit late morning Aug. 29. The IT staff, who would ride out the storm on-site, arrived with their families on Sunday afternoon, settled in and prepared for the next day.
IT capabilities need three things: power, communications and a cool, dry environment. Unfortunately, we were about to lose all three.
At 0705 the first major squall line passed through the Stennis Space Center area and immediately took out the commercial power feeds to our facility. Although NAVOCEANO is equipped with a number of backup generators, the transfer switch of one unit failed. As Murphy’s Law would dictate, this unit supplied power to the N6 servers and communications devices, and its loss resulted in a complete IT systems failure.
Something we refer to as a “hard crash.”
Generator power was soon reestablished.
Then the IT team had to reestablish capability. We had adequate staffing on hand but not our full scope of expertise. At this stage, documentation and procedures were critical. Over the next 16 hours, the IT staff repaired file systems and brought 80 servers and more than 100 terabytes of data back online, along with all internal communications capabilities.
During this time we discovered that we were taking on water. Upon further examination, we found that water was coming in from a door on the floor above us. The door, an emergency fire exit, was bearing the full brunt of Hurricane Katrina’s winds and rain. The water was coming in through the gaps around the door as if a fire hose were directed at it from the outside, and there was no stopping it! If you can’t stop it, mop it, and we did, for the remainder of the storm.
The MSRC provides supercomputing and petabyte-scale storage for the Naval Meteorology and Oceanography Command and for the DoD research and development community. MSRC support includes operational modeling for both NAVOCEANO and the Fleet Numerical Meteorology and Oceanography Center.
This critical facility was reengineered in two phases several years ago to “harden” it (hardening, in this instance, refers to what must be done to secure a system or facility) against natural disasters and other catastrophes.
In the first phase, all permanent data and archival storage capabilities were moved to a hardened facility capable of withstanding Category 5 hurricanes and tornados.
In the second phase, the MSRC relocated a subset of the computational systems and high-speed networks that could back up any losses in the primary center. Uninterruptible power supplies and diesel generators support the hardened facilities.
Is anybody out there?
As the eye of Hurricane Katrina approached, we steadily lost our ability to talk with the outside world. In addition to losing our external network communications, we lost telephone service. Enter the MSRC. During Katrina, the MSRC operated without failure. The only weak area was the loss of external commercial carrier interexchange facilities from Gulfport to New Orleans.
Within 48 hours of Katrina’s landfall, the MSRC obtained mobile satellite communications facilities that were used to provide phone services and data connectivity to the Defense Research and Engineering Network and the Internet.
Within 96 hours, our organization resumed general network connectivity and interim phone communications via mobile satellite. We were back in business.
NAVOCEANO is currently working with NASA and regional telecommunications carriers to install and activate diversely routed local and interexchange carrier services for Navy activities at Stennis Space Center to avoid complete dependence on the Gulf Coast commercial communications infrastructure.
NAVOCEANO is the Navy’s center for ocean modeling, and it executes a large and tightly coupled suite of databases and applications that work together.
These include systems that ingest ocean observations from deployed sensors and satellites and process them through data assimilation and numerical modeling, such as the following:
• Modular Oceanographic Data Assimilation System (MODAS), the Navy’s real-time primary source of sound speed fields;
• Altimetry Data Fusion Center (ADFC), the collection point of satellite altimetry data to drive the ocean models;
• Navy Coastal Ocean Model (NCOM) and Navy Layered Ocean Model (NLOM), the world’s only operational global ocean models;
• Shallow Water Analysis and Forecast System (SWAFS), the Navy’s only operational three-dimensional data assimilation ocean model.
All of these systems (and others) work together to produce a comprehensive global picture of the ocean environment.
Individuals from the Oceanography Department and its supporting contractors, as well as their colleagues from the Engineering Department, MSRC and Warfighting Support Center, worked to bring the Navy’s daily ocean model forecasts back to operational status in the days immediately following Hurricane Katrina.
Their expertise in areas ranging from real-time satellite and in situ data feeds to computer systems and ocean modeling was essential in making sure that the interruption to Navy consumers of daily analyses and forecasts of sound speed, currents and waves was minimal.
The keys to the success of the recovery were found in the human capital of the oceanographers and physical scientists, whose combined understanding of the diverse fields of science and the computing software base led to the resumption of essential operations.
While a complete, configuration-managed software baseline is necessary for recovery, it is not sufficient. Also required are the boundary conditions for the models, and the skill to select and calibrate the proper starting conditions and to monitor the models’ progress as they reconverge to an accurate state.
What is noteworthy is that many of these individuals worked tirelessly with the full knowledge that their homes and possessions had been destroyed by the storm.
The full story of our recovery will be documented in the context of lessons learned. But without preparation, we would not have been able to restore our capabilities easily, even with a full staff. Adapting and adjusting to fit the situation is possible only when you can quickly assess your existing status and know your options.
In the Future
While we do not look forward to the next hurricane, we do take pride and satisfaction in our team and our ability to perform under harsh conditions.
Many of our employees lost their homes, and as an organization, we are focused not just on the full recovery of our production levels, but on the people and family that we refer to as NAVOCEANO.
For more information about the Naval Oceanographic Office (NAVOCEANO) go to http://www.navo.navy.mil/. For more information about the Naval Meteorology and Oceanography Command go to http://pao.cnmoc.navy.mil/.