How to build computers that don't crash
March 15, 1999
by Stan Miastkowski
(IDG) -- Where do you want to reboot today? You don't? Well, it's not exactly a secret that you may have no choice. Every desktop version of Windows locks up from time to time, necessitating the old "three-fingered salute" (Ctrl-Alt-Del) or a complete system reboot. Most of us just live with this inconvenience, but that's not an option for servers running critical applications such as a nationwide telecommunications network, an assembly line on a factory floor, or life-critical medical equipment. Highly reliable redundant computer hardware has been available for years, but creating truly "crashless" computers has required very expensive hardware, specially modified operating systems, and carefully designed proprietary applications. However, Texas Micro says it's developing a virtually crashless server that uses unmodified Windows NT server software and off-the-shelf apps.
The company, which got its start in the eighties building ruggedized PCs for the rigors of oil exploration, says its servers will offer "Five-Nines" availability, meaning they'll be up and running 99.999 percent of the time -- which translates to a total of roughly 5 minutes of downtime per year (0.001 percent of the 525,600 minutes in a year is about 5.3 minutes).
A three-fold path
Texas Micro, which intends to ship its servers by year end, is employing a three-tiered approach to reach its goal.
The first tier is the company's own high-reliability PC hardware, which includes a ruggedized design and redundant components such as power supplies, and puts almost everything in individual hot-swappable modules that can be changed without powering down the machine.
Next is the Intelligent Platform Management Interface (IPMI), a system management tool that combines hardware and software to automatically and continuously monitor server hardware. IPMI is designed to anticipate impending failures, as well as detect and recover from many failures as they occur.
Third, and the heart of Texas Micro's design, is System Directed Checkpointing (SDC) technology. A typical SDC system consists of two identical Windows NT servers, each equipped with a special communications board and connected by a cable separate from the main network. The second server's hard drives continuously mirror those of the master server. Such mirroring is common in dual-server systems. But SDC goes way beyond drive mirroring: It actually keeps the contents of the two servers' memory identical. Critical system parameters that can't easily be mirrored, such as the contents of the server cache and the state of the processor's internal registers, are stored in "snapshots" (checkpoints) that are made 20 times a second and sent to the second server. If the second server stops receiving these snapshots, it concludes that something's wrong and becomes the primary server within a fraction of a second, taking over from the point before the error occurred.
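The takeover logic described above can be illustrated with a short Python sketch. It is not Texas Micro's implementation, which the article does not detail. The class names, the three-missed-snapshots threshold, and the `simulate` helper are all assumptions made for illustration; only the 20-snapshots-per-second rate comes from the article.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CHECKPOINT_INTERVAL_MS = 50   # 20 snapshots per second, the article's figure
MISSED_BEFORE_FAILOVER = 3    # assumption: tolerate a few late snapshots


@dataclass
class Checkpoint:
    sequence: int   # hypothetical snapshot counter
    state: dict     # stand-in for register and cache contents


@dataclass
class StandbyServer:
    last_checkpoint: Optional[Checkpoint] = None
    last_seen_ms: Optional[int] = None
    is_primary: bool = False

    def receive(self, checkpoint: Checkpoint, now_ms: int) -> None:
        """Record a snapshot sent by the primary over the dedicated link."""
        self.last_checkpoint = checkpoint
        self.last_seen_ms = now_ms

    def poll(self, now_ms: int) -> bool:
        """Promote this server if snapshots have stopped arriving."""
        if self.is_primary or self.last_seen_ms is None:
            return self.is_primary
        silence_ms = now_ms - self.last_seen_ms
        if silence_ms > MISSED_BEFORE_FAILOVER * CHECKPOINT_INTERVAL_MS:
            # Take over; work would resume from self.last_checkpoint.
            self.is_primary = True
        return self.is_primary


def simulate(primary_fails_at: int, total_ticks: int) -> Tuple[bool, Optional[int]]:
    """Run a toy timeline in which the primary goes silent at a given tick.

    Returns whether the standby took over and the sequence number of the
    last snapshot it holds.
    """
    standby = StandbyServer()
    for tick in range(total_ticks):
        now_ms = tick * CHECKPOINT_INTERVAL_MS
        if tick < primary_fails_at:
            standby.receive(Checkpoint(tick, {"counter": tick}), now_ms)
        standby.poll(now_ms)
    held = standby.last_checkpoint.sequence if standby.last_checkpoint else None
    return standby.is_primary, held
```

In this toy timeline, `simulate(10, 20)` has the primary fall silent after ten snapshots; the standby promotes itself a few intervals later and holds snapshot 9 as its restart point. A real system would also need to handle a link failure that looks like a server failure, which this sketch ignores.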
Catastrophe-proof?
Of course, the system isn't fully redundant again until the problem that caused the first server to fail is corrected. And what happens if a software problem that caused the first server to fail crashes the second server immediately? A Texas Micro spokesperson says the company's extensive research shows that truly catastrophic server failures of this type are extremely rare, and claims that most failures are extremely short-lived timing problems that are self-correcting. In cases like these, the entire redundant system repairs itself and can be back to normal in seconds. Some software failures, as well as hardware failures, do require human intervention, however. The monitoring system sends alarms to system managers whenever problems occur.
Not for the mainstream -- yet
The company can't yet say how much SDC will add to the cost of a fully redundant dual-server setup, but hastens to add that the hardware SDC requires is relatively inexpensive. The high-reliability PC hardware needed, however, is another story. The company declined to comment on the additional cost of its ruggedized PCs over off-the-shelf servers. Although Texas Micro will initially market "Five-Nines" systems to its traditional customer base, the company hopes to work with major PC makers to eventually make the technology available to servers in small-to-medium businesses. Meanwhile, you and I need to back up our desktop PCs -- and be prepared for the three-fingered salute.
© 2001 Cable News Network. All Rights Reserved.