In the Beginning
The Distributed Information Systems Experimentation (DISE) team at the Naval Postgraduate School uses knowledge management (KM) to plan and execute Department of Defense experiments. For example, the annual Trident Warrior series involves year-round coordination with geographically disparate organizations and personnel using both classified and unclassified networks.
During the planning, execution and post-experiment stages of Trident Warrior (TW), data is collected through online feedback forms. The analysis of this and other collected data culminates in multiple decision-informing reports.
DISE has been at the forefront of using KM for nearly a decade. The initial DISE team implemented a KM solution called FIRE (FORCEnet Innovation and Research Enterprise) in 2004 with the Oracle Collaboration Suite (OCS) for Trident Warrior planning. OCS consists of a “shrink-wrapped” server with portal and collaboration applications. Before the advent of Web 2.0 collaboration and the flexibility of service-oriented architecture (SOA), FIRE was considered somewhat advanced, and it served the needs of KM well.
The core back end of OCS is the database server, which made Oracle a good choice. The "out-of-the-box" software was both a blessing and a curse. For those just beginning KM and collaboration, the out-of-the-box approach simplified many aspects of maintenance, but it could also serve as a straitjacket because it could not be customized to meet ever-changing user requirements. Nevertheless, the simplified approach allowed extensive portal post-development work that included the development of forms and reports needed by the experimenters.
Over time the FIRE solution grew into an enterprise-wide product. With growth came production challenges, including the need for hardware upgrades; software updates and security patches; backup and recovery decisions; and a requirement for help desk support. In addition, because DoD-wide systems must be certified and accredited according to information assurance policies to operate on a defense network, we had to satisfy security requirements.
Many of the collaborative and data collection tools on FIRE were limited and not flexible enough to expand capability and functionality when needed. The options for backup and recovery solutions were limited. In addition, our servers were reaching the end of their five-year life cycle, and there was an opportunity to migrate to faster and cheaper servers to meet the new demands.
Since the heart of the system is the database, we decided to stay with an industry leader, so we chose Oracle Database 11g. The added bonus to using Oracle 11g is that the Navy has an Oracle Database Enterprise License which provides significant benefits, including substantial cost avoidance for the DON. (See page 66 for more information about the Navy's Oracle Enterprise License Agreement which requires mandatory use for Navy programs and activities covered by the agreement.)
For designing the next-generation architecture to build around the database, we determined that the system must:
• Work in three enclaves (unclassified, secret and top secret) with minimum IA work.
• Be easily maintained by a small team of IT personnel and faculty but still be scalable.
• Minimize the number of single points of failure and offer a robust backup and recovery capability.
• Leverage current developers' knowledge and our development investment.
• Take advantage of new technologies, such as SOA and Web 2.0, while maintaining a bridge from the legacy solution to the new.
• Include developer tools for simple applications used to import Microsoft Excel and Access files.
• Use open source standards as much as possible.
• Work with a wide range of operating systems, from Windows 7 to Snow Leopard to Red Hat Linux. The platform must support a wide range of developer tools and languages, such as Java, JavaServer Faces and C#, and be able to integrate third-party apps, such as Microsoft SharePoint.
• Minimize licensing costs.
• Incorporate strong help desk and vendor support.
Because our team is small, and we are risk averse, we took an evolutionary approach. There are many good KM collaboration products, but we used a conservative approach while creating options to add best-of-breed products when users required additional capability.
We stayed with Oracle products to minimize the costs of licensing and of training developers and maintainers, and because we concluded that Oracle’s features best met our requirements.
While the products are Oracle-specific, they are Java-based; therefore, the chance of vendor lock-in is reduced, and we could use other products, for example, from IBM or Microsoft, with the new KM portal. Because of our conservative approach, we reluctantly ruled out virtualization for this KM portal iteration, but it will remain a goal for future upgrades.
The Oracle solution consists of two major parts, each connecting to an Oracle database. The first part is Oracle Fusion Middleware, which consists of WebLogic Server and Oracle Portal 11g; the latter supports knowledge management efforts with seeded portlets, forms and reports.
Oracle Portal also comes with a suite of developer tools, including JavaServer Faces and Application Express (APEX), which allow custom development.
The second part of the new solution is collaboration and social networking capability using Oracle Beehive for chat, team workspaces, tasks, wikis, discussion groups, content management and Web conferencing. These features are tied together with Oracle Single Sign-On and Oracle Internet Directory (SSO/OID) which provide a single sign-on, not only for Oracle products, but any third-party products, such as SharePoint, that you might decide to add later.
Research and Learning
Even if your staff is experienced in IT management and programming, reaching a solution requires extensive research, learning about options and analysis. Prior to proposing an architecture to sponsors, a requirements and feasibility study should be conducted with a review of technology candidates that includes how easily a proposed technology can be bridged to your legacy system.
Technology is always in a state of flux, so at some point you have to commit to the best technology available at the time. Great care should be taken to explain back-end improvements from a management perspective, because users naturally concentrate on user-facing enhancements and not necessarily on the management issues related to selecting enterprise software.
Production systems cannot just be thrown away and replaced by a new system, so decision makers must carefully consider how a new technology can be implemented without risking the whole enterprise.
Both Oracle and Red Hat products can be downloaded and used at no charge if you are testing concepts. Oracle also provides many of its applications as pre-built virtual appliances for VirtualBox (www.virtualbox.org/), including Oracle 11g, WebLogic, TimesTen, APEX, SQL Developer Data Modeler and JDeveloper. VirtualBox offers a quick way to explore software candidates without extensive installation or new hardware.
Once the basic architecture and requirements are determined in the critical research and learning stage, you will need a test lab. Our plan was to first build a lab to test various vendor software and hardware candidates, and on successful completion, move them to the production environment.
To build the lab we salvaged old server hardware and restored hard drives using SpinRite, a disk recovery tool. For some older servers we had to upgrade memory and central processing units. We then rack-mounted all the servers and added a network switch, a keyboard, video and mouse (KVM) switch, and an uninterruptible power supply so that the lab simulated the production environment.
Our small lab was not only a test environment but a learning one as well. We reinstalled test systems so many times that we could do it almost without thinking about it. We made many mistakes, but they were not costly because they were made in the lab and not in production. Installation mistakes can be fatal in a production environment, but in a test environment, they are instructional.
While you do not need to test the exact same architecture, you must mimic the general design of the production system. We emphasize that you must have test servers available even after moving into production.
Manpower and IA Challenges
Hardware and licensing are but a small part of the cost of any KM system. Initially, one person managed FIRE. But, with growth in the number of users, FIRE went through a major upgrade in 2007 using industry consultants working with DISE team members. By 2008, due to budget constraints, DISE faculty team members learned to run FIRE by attending Oracle and Red Hat classes.
The effort to modernize FIRE was spearheaded by DISE team members who began by researching new collaboration solutions at Oracle OpenWorld 2008. Most DISE team members are faculty who teach two to four classes a year, as well as working on other projects, so we had to identify additional team members for the project.
NPS students were unavailable because most were working full-time on master’s or doctoral degrees. We eventually decided to recruit computer science interns from local colleges to assist. The resulting team consisted of two faculty members and two to four interns. Although we all worked part-time on the project, we worked round-the-clock to ensure rapid progress.
Before going live on operational networks, all our servers went through a certification process conducted by an NPS information assurance team. Because DISE members are typically researchers who work on KM, and not IA professionals, this was particularly challenging. Approximately 80 percent of manpower costs may be related to IA requirements.
Anyone fielding a system in a classified environment should be careful not to underestimate the level of effort required. Instead of a standard Red Hat Enterprise Linux installation, which requires extensive post-installation rework to comply with IA security requirements, we installed a mini version of Red Hat using a custom configuration file known as Kickstart. Kickstart manages the operating system installation process to ensure required files and correct security and network settings are used.
Our Kickstart approach is based on the Defense Information Systems Agency’s DoD Bastille, one of the projects of Forge.mil (www.forge.mil/), a DISA collaboration activity designed to improve the ability of the DoD to rapidly deliver dependable software, services and systems in support of net-centric operations and warfare.
Forge.mil capitalizes on concepts proven in open source software development that have already reaped tremendous benefits for software and technology development communities. Along with accelerating technology development and fostering innovation, Forge.mil can also enable early and continuous collaboration and information sharing among all stakeholders in a secure development environment.
Bastille integrates the specific security, technical and implementation guidelines required by DoD. We further modified Kickstart so that the Red Hat installation would also meet Oracle technical requirements. The Kickstart file can be reused for other OS installations with minimum modification, thus ensuring consistency and simplicity that would automatically implement our security and technical best practices.
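As a rough illustration of what such a file can look like, the Python sketch below renders a deliberately incomplete Kickstart file. The directives are standard Kickstart syntax (Red Hat Enterprise Linux 6 style, which closes sections with %end), but the host name, the package selection and the kernel parameter are assumptions made for this example. They are not the settings DISE or DoD Bastille used, and partitioning and the root password are omitted.

```python
def render_kickstart(hostname, sysctl_settings):
    """Return the text of a minimal Kickstart file.

    hostname and sysctl_settings are hypothetical inputs for this sketch;
    a real file would also need partitioning, a root password and the
    site's own security settings.
    """
    lines = [
        "# Illustrative Kickstart sketch; not the DISE or DoD Bastille file",
        "install",
        "text",
        "lang en_US.UTF-8",
        "keyboard us",
        f"network --bootproto=dhcp --hostname={hostname}",
        "firewall --enabled --ssh",   # firewall on, allow SSH only
        "selinux --enforcing",
        "",
        "%packages",
        "@core",                      # smallest package group
        "%end",
        "",
        "%post",
    ]
    # Kernel parameters (for example, those an Oracle install guide asks for)
    # are appended to /etc/sysctl.conf after the packages are installed.
    for key, value in sorted(sysctl_settings.items()):
        lines.append(f'echo "{key} = {value}" >> /etc/sysctl.conf')
    lines.append("%end")
    return "\n".join(lines) + "\n"
```

Calling render_kickstart("km-lab01", {"fs.file-max": 6815744}) returns text that can be saved and reused for each server, which is where the consistency described above comes from. The kernel value shown is only an example; the Oracle installation guide for your release is the authority.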
Using Kickstart was only a first step; we then had to run DISA’s approved Security Readiness Review (SRR) and the Retina Network Security Scanner. The SRR produces a detailed report and classifies alerts by severity as Category 1, 2 or 3, with CAT-1 the most severe. Retina reports are more detailed and provide more specific fixes. Because the two software tools detect virtually all possible security holes, the reports are extensive, and each IA alert had to be corrected one by one.
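Working through a long report is easier when the open alerts are ordered by category. The following Python sketch shows one way to do that; the Finding record and its check identifiers are hypothetical and do not reflect the actual SRR or Retina report formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    check_id: str        # hypothetical identifier, not a real SRR ID
    category: int        # 1 = CAT-1 (most severe), 2 = CAT-2, 3 = CAT-3
    fixed: bool = False


def open_findings_by_severity(findings):
    """Return unresolved findings, CAT-1 first, then CAT-2, then CAT-3."""
    for finding in findings:
        if finding.category not in (1, 2, 3):
            raise ValueError(f"unknown category for {finding.check_id}")
    still_open = [f for f in findings if not f.fixed]
    return sorted(still_open, key=lambda f: (f.category, f.check_id))
```

Within a category the findings are ordered by identifier only so that the work list is stable from one run to the next.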
We resolved most of the more complex alerts with help from DISA documents and NPS faculty members who were Linux experts. At times, the IA fixes made our system unstable, so we had to reverse some changes. For this we used a server with a RAID 1 mirrored pair and always maintained a system backup so that we could revert to the original version. RAID, an acronym for redundant array of independent disks, is a technology that provides increased storage functions and reliability through redundancy.
Oracle server software uses a wide range of ports that may not be compatible with DoD network policy so it is important to consult with your network administrator and IA office. We recommend that you illustrate your architecture using Microsoft Visio, or a similar program, so that all parties have a clear understanding of the proposed architecture and how it fits into the overall infrastructure.
Oracle default ports can be changed, but this should be done prior to installation. The installation may need a reverse proxy; we used Squid, an open source high-performance caching proxy server designed to run on Unix systems.
Opening a port is complicated and tedious, involving a lot of time and paperwork, so the Squid reverse proxy server was a quick and adequate solution. Our solution involved four physical servers residing on the DMZ and the internal network.
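The idea behind a reverse proxy is that clients connect to one approved port, and the proxy forwards each request to whichever back-end server and port actually handles it. The Python sketch below shows only that routing decision. It is not Squid configuration, and the host names, paths and port numbers are made up for the example.

```python
# Hypothetical routing table: URL path prefix -> (back-end host, port).
BACKENDS = {
    "/portal": ("portal.example.internal", 7778),
    "/collab": ("collab.example.internal", 7777),
}


def route(path, backends=BACKENDS):
    """Pick the back end for a request path using the longest matching prefix.

    Returns a (host, port) tuple, or None when no prefix matches.
    """
    matches = [
        prefix for prefix in backends
        if path == prefix or path.startswith(prefix + "/")
    ]
    if not matches:
        return None
    return backends[max(matches, key=len)]
```

Because only the proxy's own port has to be reachable from outside, the back-end ports can stay on their defaults without a separate approval for each one.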
A DMZ, or demilitarized zone, is a subnetwork that exposes an organization’s external services to a larger untrusted network, usually the Internet; it also adds an additional layer of security. We recommend that the database and Oracle Single Sign-On and Oracle Internet Directory reside on internal networks.
The Oracle Portal and Beehive should reside in the DMZ. To get this configuration to work properly, we had to construct unique port configurations. Finally, all Web connections had to be encrypted connections (HTTPS), which required testing DoD root certificates in a development lab to ensure trusted site identification for users accessing FIRE, and generating new certificates for production servers.
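The placement recommended above is simple enough to check automatically before a deployment diagram goes to the IA office. This is a minimal Python sketch; the component keys and zone labels are names chosen for the example.

```python
# Recommended zone for each component, as described in the text.
RECOMMENDED_ZONE = {
    "database": "internal",
    "sso_oid": "internal",
    "portal": "dmz",
    "beehive": "dmz",
}


def placement_violations(deployment):
    """Return the components that are missing or sit in the wrong zone.

    deployment maps a component name to the zone it is planned for.
    """
    return sorted(
        name
        for name, zone in RECOMMENDED_ZONE.items()
        if deployment.get(name) != zone
    )
```

An empty list means the planned layout matches the recommendation.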
Handling Database Growth
Originally, the system ran on a single server with approximately 300 gigabytes of storage. We quickly realized the storage disks were reaching capacity, and we were routinely deleting files to increase available disk space.
In our lab we tested a Dell disk storage array dedicated to the database, which gave us more room to work with and expand. We are also exploring Oracle Automatic Storage Management (ASM) to seamlessly grow Oracle database storage. Changing RAID configurations to allow growth might be difficult or impossible after installation, so it's better to get it right the first time.
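A simple projection can show how soon a disk will fill, which helps in sizing storage before installation. The Python sketch below assumes steady linear growth. Only the 300-gigabyte capacity comes from our experience; the usage and growth figures in the usage note are invented for illustration.

```python
def days_until_full(capacity_gb, used_gb, growth_gb_per_day):
    """Estimate the days left before storage is full, assuming linear growth.

    Returns None when usage is not growing.
    """
    if growth_gb_per_day <= 0:
        return None
    free_gb = max(capacity_gb - used_gb, 0)
    return free_gb / growth_gb_per_day
```

For example, days_until_full(300, 270, 1.5) gives 20.0: with 30 gigabytes free and 1.5 gigabytes of new data a day, the disk fills in about three weeks.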
Selecting the right hardware and software greatly depends upon the quality of support available for your system. For Oracle support we purchased a plan for 24/7, 365 days of online and telephone support. The Oracle support team has experts for each of its applications, and we talked to them whenever we had a problem to resolve.
In some cases, Oracle support staff would call us or set up a Web conference if we were dealing with an issue that was critical to our mission. They were also great teachers if we didn’t understand a concept. For this support we chose a pricing model based on number of users as opposed to number of processors. This worked well within our budget, and it is the cheapest way to go if you have fewer than 300 users.
In addition to the savings we obtained by using the Navy Oracle Database Enterprise License Agreement, we used contract models, like the GSA Schedule, to bring costs down.
Testing Your System and Going Live
Even with a well-planned system, it is vital that you test thoroughly, and not only within your own environment. We tested our system on the Navy Marine Corps Intranet and found additional port problems, so having users access the system via different networks is important for eliminating problems.
We made a major effort to respond quickly to our representative users and resolve any connectivity problems, but it may take several weeks to resolve site-specific network or client problems. The success of these efforts begins with a well-written test plan.
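A test plan can include a quick check, run from each user network, that the required ports are reachable at all. The following Python sketch uses only the standard library. The host and port you pass in are your own; none are implied by our configuration.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

A False result tells you only that the connection failed from that network, not why; a firewall, a proxy or a stopped service all look the same, so the result is a starting point for the network administrator.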
With little developer work required, Oracle Beehive provides collaboration and social networking tools right out of the box. Some development work is required for the Web portal to provide users with dynamic content and an interactive experience. Portal 11g offers developers a variety of tools and seeded portlets to capture and display data and reports.
We chose the Linux operating system to run Oracle products, and we used the NPS Information Technology and Communications Services (ITACS) contract with Red Hat, Inc. Red Hat’s Web and phone support is comparable to the support that Oracle and Dell offer.
On the hardware side, we chose Dell servers. ITACS has a contract with Dell, and Dell support is provided around-the-clock. We could also get replacement parts within one workday.
The architecture leaves a small footprint, just 5U (U is a rack unit of measurement), consisting of five rack-mounted PowerEdge R610 servers featuring reduced power consumption. The R610s offer room to grow with two processor sockets and up to 48 gigabytes of memory. The servers have room for six 2.5-inch Serial Attached SCSI (SAS) drives, which allowed us to configure two hard drives as a RAID 1 mirror for the operating system and four hard drives as a RAID 5 striped set for the databases and Web servers. This configuration provides some fault tolerance in case of hard drive failure.
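The fault tolerance comes at the cost of usable space: a RAID 1 mirror stores one drive's worth of data, and RAID 5 gives up one drive's worth to parity. The Python sketch below works through that arithmetic. The 146-gigabyte drive size in the usage note is an assumption for the example, not the size we installed.

```python
def usable_capacity_gb(raid_level, drives, drive_gb):
    """Return usable capacity for a RAID 1 or RAID 5 set of equal-size drives."""
    if raid_level == 1:
        if drives < 2:
            raise ValueError("RAID 1 needs at least two drives")
        return drive_gb                      # every drive holds the same data
    if raid_level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (drives - 1) * drive_gb       # one drive's worth holds parity
    raise ValueError("only RAID 1 and RAID 5 are handled in this sketch")
```

With hypothetical 146-gigabyte drives, the two-drive mirror yields 146 gigabytes for the operating system and the four-drive RAID 5 set yields 438 gigabytes for data. Either set keeps running after a single drive fails.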
SAS is a computer bus used to move data to and from computer storage devices. In computer data storage, data striping is the technique of segmenting logically sequential data, such as a file, so that consecutive segments are stored on different physical storage devices.
Striping is useful when a processing device requests access to data more quickly than a storage device can provide access. By performing segment accesses on multiple devices, multiple segments can be accessed concurrently. This provides more data access throughput, which avoids causing the processor to idly wait for data accesses. Striping is used across disk drives in RAID storage, network interfaces in grid-oriented storage, and RAM in some systems.
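The mapping from a logical segment to a physical device is plain round-robin arithmetic, sketched here in Python. The sketch covers simple striping only; it ignores the parity blocks that RAID 5 rotates across the drives.

```python
def locate_segment(segment_number, num_disks):
    """Map a logical segment to (disk index, position on that disk).

    Consecutive segments land on consecutive disks, wrapping around after
    the last disk, so neighboring segments can be read in parallel.
    """
    if num_disks < 1:
        raise ValueError("need at least one disk")
    return segment_number % num_disks, segment_number // num_disks
```

With four disks, segments 0 through 3 go to disks 0 through 3 at position 0, and segment 4 wraps back to disk 0 at position 1.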
The DISE and ITACS teams signed a memorandum of understanding that enabled us to host servers in a production grade network operations center. The temperature and dust-controlled NOC has high-speed connectivity to the Internet and to military (.mil) networks. In addition, it provides power backup using batteries and generators.
Backup and Recovery
Because we are running several different servers, we had to implement separate backup strategies. For the Oracle Database 11g, we used information gleaned from the database training class and ran the database in archive log mode, which enables hot backup (also called a dynamic backup), point-in-time recovery and scheduled daily backup.
A hot backup can be performed on data even though it is actively accessible to users and may be in a state of being updated. Hot backups can provide a convenient solution in multi-user systems because they do not require downtime, as does a conventional cold backup.
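Point-in-time recovery combines a backup with the changes logged after it. The toy Python model below shows the principle only: it restores a backup and replays logged changes up to a chosen time. It is not how Oracle implements recovery, and the data layout (a dictionary plus a list of timestamped changes) is invented for the illustration.

```python
def restore_to(backup_state, backup_time, archived_changes, target_time):
    """Rebuild the data as it stood at target_time.

    backup_state     -- dict copied at backup_time
    archived_changes -- iterable of (timestamp, key, value) logged changes
    """
    if target_time < backup_time:
        raise ValueError("target time is earlier than the backup")
    state = dict(backup_state)               # start from the backup copy
    for timestamp, key, value in sorted(archived_changes, key=lambda c: c[0]):
        # Replay only changes made after the backup and up to the target.
        if backup_time < timestamp <= target_time:
            state[key] = value
    return state
```

Because every change after the backup is kept in the log, the backup itself can be taken while users are working, and a bad change made later can be left out by choosing an earlier target time.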
We use a Dell storage enclosure to store backups on a single array. The servers are connected to the single storage array with multiple disk controllers for redundancy. We also produce large capacity tape drive backups for off-site storage.
We believe that a small team can create an effective enterprise KM system if the team plans well and addresses the issues we discussed. Critical in all of the efforts is a robust test environment and strong support from your IA team.
Nothing we did was extremely technical. If your team is willing to do some “homework” and has the patience to persevere through occasional setbacks, the reward could be a KM portal that will specifically meet the needs of small groups doing important work.
Arijit Das is a faculty member in the computer science department at the Naval Postgraduate School.
Tony Kendall is a faculty member in the information sciences department at the Naval Postgraduate School.