Building a better bio-supercomputer
By Jennifer Couzin
(IDG) -- In December, an Atlanta company called NuTec Sciences acquired the fastest supercomputer in commercial use, an IBM machine capable of 7.5 trillion calculations per second. Only a few computers on the planet possess such speed, and many of those were built to simulate nuclear explosions and test the viability of atomic arsenals. But NuTec is using its supercomputer for biology, not bombs.
The mapping of the human genome has triggered an explosion in data about the nature of life. As pharmaceutical and biotech research companies like NuTec struggle to understand the workings of tens of thousands of genes and the hundreds of thousands of proteins they produce, biology is overtaking nuclear weapons as the field demanding the most sophisticated computers. Every other day seems to bring a new discovery -- like last week's announcement that a biotech company located a gene thought to be responsible for heart-attack-inducing cholesterol. NuTec will deploy its supercomputer to analyze cancer patients' individual genetic profiles to find the most effective treatments for their particular disease.
To capitalize on such advances, technology companies like IBM, Compaq and Sun Microsystems are pouring money into their life-sciences divisions and courting the biotech industry. Compaq and IBM recently launched dueling bio-supercomputer projects that will result in two of the world's most powerful computers. Like genomic research, bio-supercomputing is a leap into the unknown: It remains unclear exactly how these staggeringly expensive machines will be used and who will be willing to pay for them. But finding a market for bio-supercomputers is almost beside the point; for these companies, the quest is driven less by sales than by the urge to push the boundaries of biological computing.
"Who knows what will come next," says Caroline Kovac, chief of IBM's life-sciences division. "That's the kind of high risk but incredibly high value" of the bio-supercomputer.
The biological and computing challenges posed by mapping the genome are daunting: What is the function of the more than 30,000 human genes decoded last summer? How many human proteins exist? (Scientists think there are at least a million.) Which proteins keep the heart beating; which ones repair damaged tissue; which ones help digest food? Just as important, how does a protein behave when a disease like Alzheimer's or colon cancer takes root? And which genes are generating those flawed proteins?
Some of these tasks, such as determining the function of different proteins, can be tackled with existing computer power -- albeit lots of it. Others, such as understanding the role of proteins in promoting disease, can require a computer of a different magnitude. A bio-supercomputer is not radically different from a conventional supercomputer, but it does require certain strengths, including the ability to recognize complex chemical and biological patterns and carry out complicated data searches.
That is where Compaq's Red Storm and IBM's Blue Gene come in. Currently under development, these bio-supercomputers aim to perform far faster than any existing supercomputer. "What we're seeing," says Kovac, "is a trend that the fastest computers in the world are starting to be driven by people looking at life sciences as opposed to physics calculations."
IBM was the first to announce its bio-supercomputer plan. When completed in 2004, Blue Gene will be capable of performing 1,000 trillion calculations per second, or 1,000 teraflops in supercomputer lingo. That's a quantum leap over the company's current champion, a supercomputer housed at Lawrence Livermore National Laboratory in California that operates at the relatively leisurely pace of 12 trillion calculations per second. In fact, Blue Gene is faster than the top 500 computers in the world put together. The machine will also be deceptively compact. While the Livermore Lab supercomputer compares in size to two basketball courts filled with refrigerators, Blue Gene will be contained in two fridge-size units that could be stored in a walk-in pantry.
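To put the speeds quoted in this story side by side, here is a back-of-envelope sketch in Python. It uses only the figures reported above (1 teraflop = 1 trillion calculations per second); the variable names are ours, not the manufacturers'.

```python
# Quoted speeds, in calculations per second.
TERAFLOP = 1e12

blue_gene = 1000 * TERAFLOP  # IBM Blue Gene, planned for 2004
red_storm = 100 * TERAFLOP   # Compaq/Celera/Sandia Red Storm
livermore = 12 * TERAFLOP    # IBM's current champion at Lawrence Livermore
nutec = 7.5 * TERAFLOP       # NuTec Sciences' IBM machine

print(blue_gene / red_storm)         # 10.0 -- the tenfold gap cited below
print(round(blue_gene / livermore))  # ~83x the Livermore machine
print(round(blue_gene / nutec))      # ~133x NuTec's machine
```

The arithmetic makes the scale of IBM's ambition concrete: Blue Gene would outrun the fastest commercial supercomputer of early 2001 by more than two orders of magnitude.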
IBM, which is spending at least $100 million on Blue Gene, also has chosen to build it for a narrowly defined purpose: mathematically predicting the changing shape proteins take -- a process known as protein folding -- before they go about their business regulating the human body. Misshapen proteins are thought to be a major trigger of disease. Determining the shape of a protein is crucial to developing drugs that can latch on to a damaged protein and prevent it from causing illness.
Competitor Compaq is taking a different path. In January, the company announced plans to develop a 100-teraflop bio-supercomputer dubbed Red Storm in partnership with Celera Genomics, the Rockville, Md., company that mapped the human genome, and Sandia National Laboratories in Albuquerque, N.M. Although Blue Gene will be 10 times faster than Red Storm, a Celera executive stresses that the company's machine could eventually match IBM's speed.
Unlike Blue Gene, though, Red Storm is being designed for a broader array of life-science experiments and may be used to conduct nuclear research. The supercomputer, set to begin operating in 2004, will cost an estimated $125 million to $150 million to build.
The structures of Red Storm and Blue Gene are quite different. Red Storm, says Marshall Peterson, Celera's chief of infrastructure technology, is taking "reasonably standard components and putting them together in somewhat creative ways." The architecture for Blue Gene, on the other hand, will be built largely from scratch by replicating many identical chips, which altogether will contain 1 million processors.
Compaq and IBM aren't necessarily counting on their machines being widely used. Rather, the supercomputers are intended to further computer technology as much as biology, and they'll act as prototypes for future bio-supercomputers. "We look at Blue Gene and say it's a research experiment," says Peter Ungaro, head of IBM's high-performance-computing division. "This is not a machine that will have broad-base commercial use." Peterson says he wouldn't be surprised if Celera never used the Compaq machine at all, instead applying knowledge gained from the project to design its own.
Is it good business to devote four years and $100 million to a computer that may never attract many paying customers? Profit was not, for the most part, what drove companies like IBM, Compaq and Hitachi to construct supercomputers for national labs like Lawrence Livermore and Los Alamos. Still, owning the world's most powerful supercomputer isn't bad for business. Its novelty alone could attract life-science clients that don't require supercomputing at all. And as microbiologists and genetic scientists increasingly rely on information science, such business is growing -- IBM predicts a $40 billion market for life-science technology by 2004.
Blue Gene already has piqued the interest of at least one company, MDS Proteomics. The Toronto-based biotech firm chose IBM to help it analyze protein data and establish Blueprint, a nonprofit company that will create a protein database. "They've committed $100 million to build the world's fastest computer," says MDS Chief Executive Frank Gleeson. "That caught our interest."
Morningstar analyst Joseph Beaulieu says that while bio-supercomputer projects may not be immediately profitable, they raise their sponsors' profiles for high-end biological computing. "They're priming the pump," he notes of Compaq and IBM. These two tech giants are not alone in aggressively courting the biology market. Though not currently involved in a big bio-supercomputer project, Sun Microsystems has been working with biotech and pharmaceutical firms to develop hardware and software for genomic research. Sia Zadeh, head of Sun's life-sciences group, says the bio-supercomputers being built by his company's rivals are more attention-grabbing ploys than sound business strategy. "We do not believe in the creation of special-purpose machines," adds Zadeh, referring to Blue Gene.
Indeed, IBM's bet on Blue Gene, though admired, has raised questions about how specialized bio-supercomputers should be. Because hardware tends to become quickly outdated, some scientists question the logic of designing an extraordinarily powerful machine to tackle just one type of problem. Even Compaq's Red Storm -- designed to "switch-hit between different kinds of operations," as one company executive puts it -- can't necessarily meet the needs of a broad swath of genetic researchers.
Still, bio-supercomputers are an attractive option to scientists like Jill Mesirov, director of bioinformatics and computational biology at the Massachusetts Institute of Technology's Whitehead Center for Genome Research. The Whitehead Center was one of the participants in the Human Genome Project, the government-supported effort that competed with Celera to sequence the human genome. When Mesirov arrived at Whitehead four years ago from IBM, genetic research was chugging rather than flying. The Whitehead Center's computers sat on a wire rack in a room without air conditioning, the servers crashing in the summer heat. As the need for ever-more computing power escalates with the pace of genomic research, Mesirov says she's given up trying to squeeze even one more computer into rooms already crammed with a hundred of them. But whether MIT would buy time on a bio-supercomputer like Blue Gene or Red Storm depends on the computer's capabilities and its compatibility with MIT's software.
Even as Red Storm and Blue Gene take supercomputing to a new level of complexity, Celera's Peterson is impatient for something even more mind-boggling: systems that can analyze biological data even more intelligently than the bio-supercomputers that are still years from completion. "What we really need is something a lot more dramatic," he says, pointing to current research to create DNA-based computers as one possibility.
Just as biology is pressing the limits of computing, so too is computing pressing the limits of biology.