Meticulous planning, constant testing ensure optimum IT system uptime

Jul 11, 2007

Three years ago, LifeBridge Health commenced a major initiative to completely computerize the four facilities under its banner and end its dependency on film and paper.

The Baltimore-based healthcare system already had installed a PACS and, among other plans, wanted to add computerized physician order entry (CPOE) across its campus. To ensure continuity and support for its current and future radiology and business needs, LifeBridge needed optimum, yet realistic, goals for uptime to derive the most benefit from its investments.

LifeBridge includes four healthcare facilities with a total of approximately 1,200 beds. Sinai Hospital is the largest institution, with 467 acute care beds, while Northwest Hospital offers 195 acute care beds. Also under the LifeBridge banner are Levindale Hebrew Geriatric Center and Hospital, a 292-licensed-bed facility that specializes in comprehensive nursing home care, and Jewish Convalescent & Nursing Home, which has 147 licensed beds.

LifeBridge's chief financial officer (CFO) was involved from the start to help determine the investment needed to establish optimum uptime goals, relative to disaster recovery and business continuity, "because that would drive the price of the infrastructure we were going to invest in," recalled Chris Panagiotopoulos, director of information technology at LifeBridge. "The majority of planned or unplanned downtimes are created by ourselves, whether they are upgrades, database corruption, or component failure."

After negotiation, the target became 99.6% uptime. While that figure may appear low -- considering that many healthcare facilities strive for 99.99% uptime -- it still represents roughly 8,724 hours of uptime and only about 36 hours of downtime over a full year.
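
The arithmetic behind an uptime target is easy to check. The short Python sketch below converts an uptime percentage into an annual downtime allowance, assuming a 365-day year of 8,760 hours; the function name is illustrative and not drawn from LifeBridge's tooling. Under that assumption, 99.6% works out to just over 35 hours, in line with the rounded 36-hour figure, while 99.99% leaves less than an hour.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming a non-leap year


def downtime_budget_hours(uptime_percent: float,
                          hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Return the annual downtime, in hours, that an uptime target allows."""
    return hours_per_year * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for target in (99.6, 99.99):
        budget = downtime_budget_hours(target)
        print(f"{target}% uptime allows {budget:.2f} hours "
              f"({budget * 60:.0f} minutes) of downtime per year")
```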

Productive downtimes

"The point is, we knew we had 36 hours for upgrades, preventive maintenance (PM), and unplanned downtimes throughout the course of the year," Panagiotopoulos said at the 2007 Society for Imaging Informatics in Medicine (SIIM) meeting in Providence, RI. "It was up to us to use that time as we saw fit. It made us more efficient when it came to upgrades and PM downtimes, because every time we did, it started to chew away at those 36 hours."

In addition, LifeBridge wanted to hold downtime to no more than six hours for its critical business, clinical, and registration systems -- every system necessary to keep the business operating smoothly. "From a disaster recovery perspective, the expectation was 100% capacity production," Panagiotopoulos said. "So, in the event we had to move over to our disaster recovery site, the organization did not want to have degraded capacity on its servers."

LifeBridge evaluated the capabilities of its existing technologies and infrastructure. At the time, the healthcare system had what Panagiotopoulos described as "two fairly robust servers for our clinical system, and we had it at high availability." There were 5 TB of storage capacity, and LifeBridge copied data off its tapes every day for backup.

Early tests to determine how long it would take to restore the system did not fare well. Initially, the IT team figured it would take eight to 10 hours. The first several tests took 20 hours to restore what at the time was a 500-GB database.
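
To put those test results in perspective, the Python sketch below works out the effective restore rate implied by a 500-GB database restored in 20 hours, and the rate that a six-hour recovery would require. It assumes restore time scales linearly with database size and uses decimal units (1 GB = 1,000 MB); both are simplifications, and the function names are illustrative only.

```python
def restore_rate_gb_per_hour(database_gb: float, restore_hours: float) -> float:
    """Average restore throughput in GB per hour."""
    return database_gb / restore_hours


def gb_per_hour_to_mb_per_second(rate_gb_per_hour: float) -> float:
    """Convert GB/hour to MB/second using decimal units (1 GB = 1,000 MB)."""
    return rate_gb_per_hour * 1000 / 3600


if __name__ == "__main__":
    observed = restore_rate_gb_per_hour(500, 20)  # rate seen in the early tests
    required = restore_rate_gb_per_hour(500, 6)   # rate implied by a six-hour recovery
    for label, rate in (("observed", observed), ("required", required)):
        print(f"{label}: {rate:.1f} GB/hour "
              f"({gb_per_hour_to_mb_per_second(rate):.1f} MB/second)")
```

On these assumptions, the early tests restored about 25 GB per hour, less than a third of the roughly 83 GB per hour that a six-hour recovery of the same database would need.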

To help solve the problem, LifeBridge upgraded its infrastructure with two new servers, which immediately provided more storage capacity on demand. "As the business grew and we had to add CPU and memory, it did require some downtime, but the capacity was already in the new servers," Panagiotopoulos said.

The servers contained prebuilt memory (32 bit) and CPU (64 bit), so LifeBridge had additional capacity in place to expand. "We are now running at 32 (bit) and 64 (bit). We are 'maxing' out, because we continued to grow at a fairly good pace," Panagiotopoulos said. "If we had to stop to add memory and CPU, this easily would have represented five to six hours of downtime easily."

LifeBridge also increased its storage capacity from 5 TB to 30 TB by acquiring a Symmetrix storage system from EMC of Hopkinton, MA, along with a second unit for data replication and backup.

The healthcare system also upgraded its network from a 100-Mbps line to a 1-Gbps line through Comcast. It also installed a 1-Gbps line from Verizon as a backup and to help share the load. "The year after we signed the contract (with Comcast) … somebody cut a line," Panagiotopoulos said. "The line was cut and the users never saw anything." All data traffic was automatically rerouted to the backup line, and users experienced no interruption in service.

While those actions reduced recovery time to eight hours, that was still short of the six-hour target. After further evaluation and two weeks of practice testing, LifeBridge refined its documentation process to the point where recovery time was down to two hours.

While all the upgrades and installations were taking place, LifeBridge was also building a new data center and creating a read-only database for its clinical system. With the read-only copy, users could still access data during planned or unexpected downtime and operate with minimal disruption to patient care.

PACS participation

With regard to its PACS, LifeBridge has three servers -- two at Sinai and one at Northwest -- and a total of 20 PACS workstations to acquire, process, and store images in the database. Once again, the multiserver configuration proved its worth. "Once we implemented our initial PACS, within the first six months, we had some challenges and some system issues," Panagiotopoulos said.

If and when one system failed, another server covered the demand to minimize, if not eliminate, disruption in patient care.

LifeBridge's read-only capabilities also proved beneficial for its PACS in case of downtime. The healthcare system replicates data to two different servers, alternating from one to the other every two hours.
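
Copies taken on an alternating two-hour cycle amount to a series of recovery points. As a rough illustration of how such a rotation could be used once a problem is discovered, the Python sketch below builds a schedule of copies and picks the most recent one taken before a given corruption time. The server names, timestamps, and function names are hypothetical and do not describe LifeBridge's actual setup.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional, Sequence


class Snapshot(NamedTuple):
    server: str          # which of the alternating servers holds this copy
    taken_at: datetime   # when the copy was taken


def build_schedule(start: datetime, count: int,
                   interval: timedelta = timedelta(hours=2),
                   servers: Sequence[str] = ("copy-a", "copy-b")) -> List[Snapshot]:
    """Create `count` snapshots, alternating between servers at each interval."""
    return [Snapshot(servers[i % len(servers)], start + i * interval)
            for i in range(count)]


def latest_clean_snapshot(snapshots: Sequence[Snapshot],
                          corrupted_at: datetime) -> Optional[Snapshot]:
    """Return the newest snapshot taken strictly before the corruption time."""
    clean = [s for s in snapshots if s.taken_at < corrupted_at]
    return max(clean, key=lambda s: s.taken_at) if clean else None


if __name__ == "__main__":
    schedule = build_schedule(datetime(2007, 1, 1, 0, 0), count=4)
    choice = latest_clean_snapshot(schedule, datetime(2007, 1, 1, 5, 30))
    print(choice)  # the 04:00 copy on "copy-a" in this made-up example
```

In this made-up example, corruption at 05:30 means falling back to the 04:00 copy, so at most two hours of changes would need to be re-entered or replayed.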

"The good news -- or bad news -- about replication is that if you have corruption on the production side, there is instantaneous corruption on the DR side," Panagiotopoulos noted. "If you go back two hours or four hours and you have a copy of that database on your spinning drives, you can recover quickly."

As for the confirmed savings, Panagiotopoulos said LifeBridge has saved $400,000 on film costs since implementing its initiative.

The next target is to increase uptime to the holy grail of 99.99%.

By Wayne Forrest
AuntMinnie.com staff writer
July 12, 2007
