NASA to archive earthly data
(IDG) -- You think it's tough making sense of corporate data systems? Try managing the ultimate global database.
That's the challenge facing the IT group at NASA, which is responsible for much of what's known about the world. The space agency is a repository for earth-science data gathered by satellites, aircraft, ground-based systems, space shuttles and floating ocean buoys over the past 30-plus years. Streams of data filter daily into NASA to be sorted, cataloged and added to the knowledge we have of the earth, sea and sky. Caring for all that information is a task that challenges every network and data management discipline.
And managing the gigantic database is about to get a lot more complicated. For several years, NASA has been developing the Earth Observing System Data and Information System (EOSDIS), a data collection effort that, over the course of 15 years, is intended to establish a baseline of integrated land, sea and climatic data covering everything under our piece of the sun. Key to the success of this project is the agency's new Interactive Image SpreadSheet (IISS), which will allow researchers to simultaneously sort through scientific imagery from multiple archives with a single intuitive browser.
Scheduled for deployment this fall, EOSDIS is the largest undertaking of its kind. Grand intent aside, it could quickly become a real database and network nightmare. A pair of satellites will generate a staggering amount of information, and the challenge of managing all this raw data in a way that allows the next generation of researchers to extract meaningful information is a mission that goes beyond mere rocket science.
The big blue marble
NASA's entire earth-science cache currently measures about 175 terabytes (one terabyte equals 1,000G bytes). The data is distributed among eight major archive centers around the U.S., but it is fragmented: gathered from different missions with different objectives, it provides only a piecemeal view of the earth. There is no definitive, comprehensive baseline of earth observations from the many cameras that have been sent up over the years to survey the planet.
EOSDIS is poised to resolve that shortcoming. Within six months of the initial satellite launch, EOSDIS will double the amount of earth-science data that NASA has collected to this point. When it starts orbiting early this fall, the Ante Meridian (a.m.) satellite will beam down nearly one terabyte of surface data daily during its morning scans. Its companion Post Meridian (p.m.) satellite, to be launched two years later, will add another 650G bytes of daily traffic from its afternoon rounds.
When NASA adds a couple more satellites flying related missions, the earth scan could accumulate more than two terabytes of data every day.
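The six-month doubling claim is easy to sanity-check against the figures above. The Python sketch below is a rough estimate only: it assumes the a.m. satellite delivers a flat one terabyte per day (the article says "nearly one") with no gaps in collection, and the function name is illustrative.

```python
# Rough check of the "double within six months" claim.
# Assumption: the a.m. satellite delivers a flat 1 terabyte per day
# (the article says "nearly one"), with no gaps in collection.

ARCHIVE_TB = 175          # existing earth-science holdings cited in the article
DAYS_PER_MONTH = 30.44    # average calendar month


def days_to_match_archive(daily_tb, archive_tb=ARCHIVE_TB):
    """Days of collection needed to equal the existing archive."""
    return archive_tb / daily_tb


days = days_to_match_archive(1.0)
print(f"{days:.0f} days, or about {days / DAYS_PER_MONTH:.1f} months")
```

Under that assumption the new data matches the existing archive in just under six months, which is consistent with the stated timeline. At two terabytes a day, the same volume would arrive in about half the time.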
To put this in perspective, consider that the 32-volume Encyclopaedia Britannica contains 65,000 articles and 44 million words, which equates to about 300M bytes of information. If EOSDIS collects two terabytes of data each day, that's the equivalent of 6,667 Encyclopaedia Britannica sets per day, or one 300M-byte set every 13 seconds. Think of it as a full-motion video stream measured in kilometers squared, not pixels, and transmitted at a frame rate of some three miles per second.
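The arithmetic behind that comparison can be reproduced in a few lines. The Python sketch below assumes the decimal units the article uses (one terabyte equals 1,000G bytes, and one gigabyte equals 1,000M bytes); the function and constant names are illustrative only.

```python
# Back-of-the-envelope check of the Britannica comparison.
# Assumes decimal units: 1 terabyte = 1,000G bytes = 1,000,000M bytes.

MB_PER_TB = 1_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def britannica_equivalent(daily_tb, set_mb=300):
    """Return (sets per day, seconds per set) for a daily data volume."""
    sets_per_day = daily_tb * MB_PER_TB / set_mb
    seconds_per_set = SECONDS_PER_DAY / sets_per_day
    return sets_per_day, seconds_per_set


sets_per_day, seconds_per_set = britannica_equivalent(2)
print(f"{sets_per_day:,.0f} sets per day, one every {seconds_per_set:.0f} seconds")
# prints: 6,667 sets per day, one every 13 seconds
```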
And that's just the data from one program at one agency. NASA is partnered with the National Oceanic and Atmospheric Administration (NOAA) and other international organizations, all of which will contribute data to the information system.
Once the data starts pouring in, the EOSDIS mission shifts from data collection to distribution. Thousands of researchers, scientists and policy makers must have access to the information. They'll need to know what information is available and how to find it. The system must also be future-proof: Ten years from now, researchers will turn to EOSDIS to answer questions that its designers today can't imagine. EOSDIS must become a living repository, not a data dump, and be engineered to accept contributions from new types of sensors and other devices that have yet to be conceived.
EOSDIS will shape our understanding of the earth beyond purely scientific terms. For example, the system will eventually link socioeconomic readings and environmental data, giving researchers an opportunity to quantify the human consequences of earth's physical changes. Standing between the EOSDIS launch this fall and our earthly enlightenment, however, are a shrinking federal budget and a bureaucracy that often runs like molasses when sizable funds are at stake.
The EOSDIS ground-based systems alone will cost $1 billion, and that infrastructure is being built around big contracts and big-business protocol, says Gordon Knoble, the EOSDIS Mission Systems network manager at NASA's Goddard Space Flight Center in Greenbelt, Md.
"It's a challenge to try and be flexible," Knoble says. "Turnaround is six to nine months on most things, so most of our time is spent trying to identify requirements far enough ahead. If we can't do something when it needs to happen, then its timeliness is gone and we've missed the opportunity."
The T-1/T-3 circuits and OC-3 ATM links that funnel NASA's wide-area data are provided by the NASA Integrated Services Network (NISN), which evolved from one of the early versions of the Internet. NASA operates NISN, purchasing bandwidth from carriers around the country to provide backbone connectivity for research facilities, universities and NASA's own command/control centers. Every NASA mission, such as EOSDIS, must requisition WAN services from NISN.
EOSDIS net requirements are concerned mainly with getting information into the eight regional Distributed Active Archive Centers (DAAC). Each DAAC houses multiple RAID arrays, with Fibre Channel connections to an FDDI backbone. End users access the data at these sites through the Internet, mostly using T-1 lines connected to their universities or research organizations. Currently, researchers can access each site in NASA's archives one at a time through a Web browser. EOSDIS will push Web browsing to its limits.
A better browser
"You're talking about a terabyte of data coming in every day. How do you make it available?" Knoble says. "Just knowing what to look for is a challenge, then you have to retrieve it and pull other pieces together. Even with bandwidth going to OC-12 across the backbone, it's no luxury for thousands of people trying to browse through terabytes to find what they need."
For example, an EOSDIS user may want to retrieve all atmospheric data relating to volcanic smog (vog) downwind of a long, slow eruption. Instruments aboard a spacecraft take readings in various wavelengths, each of which is its own data set. A user may also need to scan separate reams of wind and cloud data from NOAA. Today, there's no easy correlation among such data sets. You can't readily mix information from different instruments or different agencies - combine ground-based measurements with satellite imagery, for example - to generate pandimensional views of earth phenomena. The cataloged data sets that exist can be interrogated and filtered, but they can't be tapped more than one at a time or answer questions they weren't designed to field. So researchers have to find each separate data set and zero in on the particulars.
To solve that problem, EOSDIS is developing the IISS. This distributed visualization browser is designed to make working with EOSDIS archives more intuitive and visual. Similar to what a conventional spreadsheet does for numbers and text, NASA's interactive spreadsheet will organize and display the relationship among scientific imagery gathered by EOSDIS. IISS will allow users to simultaneously examine data from multiple archives. The relationship between vog and downwind acid rain, for example, would appear visually on screen as users browse for the information they need.
"[IISS] does two things for us," Knoble says. "It becomes a search engine for the sort of data we have to handle, and it also becomes a next-level visualization tool that lets people start seeing things differently than we do now. This is the sort of data tool that people have been thinking about for years, but the technology is just now getting into the ballpark where we can begin to implement it."
And IISS is a net-killer, for sure. "A user might hit three or four sites, draw down 10G bytes of data, and say, 'Oops, that's not what we're looking for' - then start all over," Knoble says. "So while we don't have tens of thousands of transactions and different services going on like they do in corporate networks, when you get a few hundred or a thousand researchers pulling data like that every day, our challenge becomes pretty interesting."
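To see why a 10G-byte mistake hurts, it helps to work out how long such a pull takes on the link types mentioned in this article. The Python sketch below is a best-case estimate, not a measurement: it uses the nominal line rates for T-1, T-3, OC-3 and OC-12, assumes decimal gigabytes, ignores protocol overhead and assumes a single user has the whole link. The names are illustrative.

```python
# Best-case time to move a data set over common link types.
# Assumptions: nominal line rates, decimal units (1G byte = 10**9 bytes),
# no protocol overhead, and one user with the entire link to themselves.

LINE_RATES_MBPS = {
    "T-1": 1.544,
    "T-3": 44.736,
    "OC-3": 155.52,
    "OC-12": 622.08,
}


def transfer_seconds(gigabytes, rate_mbps):
    """Seconds needed to send `gigabytes` at `rate_mbps` megabits per second."""
    bits = gigabytes * 10**9 * 8
    return bits / (rate_mbps * 10**6)


for name, rate in LINE_RATES_MBPS.items():
    minutes = transfer_seconds(10, rate) / 60
    print(f"{name:>5}: {minutes:8.1f} minutes")
```

On the T-1 lines most end users rely on, a single 10G-byte retrieval works out to more than 14 hours even under these generous assumptions, while the same transfer on a dedicated OC-12 would take a little over two minutes.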
How will the team handle this challenge? "If we could have anything we wanted, it would be ATM-based bandwidth-by-the-yard," says EOSDIS Deputy Project Manager John Dalton. "ATM is not so important in getting data down from the satellites because that's a fairly steady, constant stream. But the end-user requirements are going to be extremely bursty, and that's where ATM's bandwidth on demand could potentially change the way this kind of science is done."
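Dalton's "steady, constant stream" can be expressed as an average bit rate. The Python sketch below converts the daily volumes quoted earlier into megabits per second; it assumes decimal units and a perfectly even flow over 24 hours, which real downlinks will not match exactly, and the function name is illustrative.

```python
# Average bit rate implied by a steady daily data volume.
# Assumes decimal units (1 terabyte = 10**12 bytes) and a perfectly even
# flow over 24 hours, which real downlinks will not match exactly.

SECONDS_PER_DAY = 24 * 60 * 60


def average_mbps(terabytes_per_day):
    """Convert a daily volume in terabytes to an average rate in Mbit/s."""
    bits_per_day = terabytes_per_day * 10**12 * 8
    return bits_per_day / SECONDS_PER_DAY / 10**6


volumes = [("a.m. satellite", 1.0), ("p.m. satellite", 0.65), ("full program", 2.0)]
for label, tb in volumes:
    print(f"{label}: about {average_mbps(tb):.0f} Mbit/s")
```

One terabyte a day averages out to roughly 93M bit/sec, and two terabytes a day to roughly 185M bit/sec, which is more than the nominal 155.52M bit/sec of a single OC-3 link. A steady stream of that size can be provisioned for in advance; the bursty end-user traffic Dalton describes cannot.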
Dalton will get his wish thanks to the National Science Foundation (NSF), which provides ATM connectivity to MCI Communications Corp.'s very high-speed Backbone Network Service (vBNS). A cooperative venture between MCI and the NSF, the vBNS was launched in 1995 to create a high-speed ATM backbone supporting scientific research across the country.
Nonetheless, bandwidth remains just a small part of the picture. When it comes to managing terabytes of earthly data, the real issue is how to turn raw snapshots into a readable catalog of the world. NASA has been chewing on the problem long enough to realize it will take another 15 years to digest.
Csenger is a freelance writer in the Chicago area. He can be reached at firstname.lastname@example.org.
© 2000 Cable News Network. All Rights Reserved.