Review: Tips to find everything faster online

By Michael Gowan and Scott Spanbauer

(IDG) -- Got a second? That's probably all the time you want to spend searching on the Web. Along with its billions of pages of information, the Web now serves up video and other media -- resources so diverse that you need a team of sites to find what you want: an omnibus search engine, plus specialized search services for specific tasks. We tested both types: general search engines and specialty search sites.

Much has changed since our examination of Web search services exactly one year ago (see "How to Stop Searching and Start Finding"). Consolidation is the word you hear most often from search engine experts about trends in the industry. "We're likely to have fewer search engines and search portals," says editor Danny Sullivan. "If the search engines don't figure out a way to make money, they're just going to quit." Recent casualties include Magellan and InfoSeek, two of the Web's earliest search services.

Sullivan and other analysts see more search services following Iwon and Go in adopting a pay-for-placement model. "Every major search engine has some form of paid placement," Sullivan says. He adds that users should be concerned if they find it difficult to distinguish between the search results listings that are paid for and those that aren't.

Search technology has advanced, too. For example, Google now performs full-text searching within files in Adobe Systems' Portable Document Format -- used to post many government documents and corporate white papers. Chris Sherman, associate editor, believes the inclusion of PDF files signifies that the "invisible Web" is shrinking. The invisible Web refers to pages that engines -- for whatever reason -- don't index, like sports scores and other information of momentary interest, as well as databases and other non-HTML content.

Technology improvements also mean that search results are less likely to include dead links (links to pages that no longer exist or that have been moved), and they are more likely to place the relevant link at or near the top of the first page.

But while some sites have improved, others seem to have gotten worse. The Open Directory and MetaCrawler, our favorite directory and metasearch engine last year, didn't perform as well in our tests this year -- in fact, the Open Directory did so poorly that it didn't even make it onto our chart of the top 12 services. Google remains our Best Bet; it's our flat-out favorite of all search tools. Close behind is Fast. And leading the directory pack are longtime favorites Yahoo and Lycos.

In our tests, we evaluated search sites based on five criteria: relevance, advanced features (such as the AND, NOT, and OR Boolean operators; see "The Tricks of the Search Trade" for information about these), the site's ease of use, its percentage of dead links, and the freshness of the results it returned (based on how well a site did at returning pages about a current topic). And because a good search site makes it easy for you to pose your query and get to relevant Web pages quickly and simply, we awarded extra points to sites that gave us a correct page within the first five links. For our complete test results, see the chart.
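Two of these measures, the percentage of dead links and the bonus for a correct page within the first five links, are simple enough to sketch in code. This is only an illustration and not the scoring actually used in the tests: the result records, the `is_dead` and `is_correct` flags, and the function names are all assumptions made for the example.

```python
from typing import Optional

def dead_link_percentage(results: list[dict]) -> float:
    """Share of returned links that are dead, as a percentage."""
    if not results:
        return 0.0
    dead = sum(1 for r in results if r["is_dead"])
    return 100.0 * dead / len(results)

def first_correct_rank(results: list[dict]) -> Optional[int]:
    """1-based position of the first correct page, or None if there is none."""
    for rank, r in enumerate(results, start=1):
        if r["is_correct"]:
            return rank
    return None

def earns_top_five_bonus(results: list[dict]) -> bool:
    """True when a correct page appears within the first five links."""
    rank = first_correct_rank(results)
    return rank is not None and rank <= 5

# Hypothetical results page, hand-labeled by a tester.
results = [
    {"url": "http://example.com/a", "is_dead": False, "is_correct": False},
    {"url": "http://example.com/b", "is_dead": True, "is_correct": False},
    {"url": "http://example.com/c", "is_dead": False, "is_correct": True},
    {"url": "http://example.com/d", "is_dead": False, "is_correct": False},
]

print(dead_link_percentage(results))
print(first_correct_rank(results))
print(earns_top_five_bonus(results))
```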

We looked for data in five categories: product information (reviews of Pioneer DVD-Recordable drives), business results (Cisco's first-quarter 2001 revenue), technology specifications (the maximum data transfer rate of the Universal Serial Bus 2.0 standard), regional data (room rates at the Agate Cove Inn bed and breakfast in Mendocino, California), and obscure facts (the author of an out-of-print 1920s sci-fi novel, Eater of Darkness). We tested the freshness of the results by asking a question that was topical at the time of our tests.

Relevance and ease of use proved a mixed bag: Some of the services with the cleanest interface (MetaCrawler, for example) didn't score well in relevance, while some that overflowed with ads and options returned a relevant result among the first five links. Overall, we were pleased with the advanced search tools and techniques the engines and directories offered; for many searches it pays to put a little effort into your query. Keep in mind, however, that no one test can serve as a definitive basis for ranking competing search engines. Your experience will vary depending on search topic.

Engines vs. Directories

In their simplest form, search engines rely on machines to gather responses to searches; directories, on the other hand, are created by humans. While search engines may index more than a billion Web pages, directories rarely include references to more than a million.

Google, Fast, AltaVista, and other search engines use a program known as a spider to scan and record the contents of Web pages. The spider collects a page's title and other information stored in its HTML code, and then it follows links on the page to gather information about those pages as well. Spiders are set to exclude certain words (such as articles and prepositions), and they "time out" after a specific period to avoid being trapped on a single Web page or site. That timing out means large Web sites are rarely completely spidered.
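The spider's routine can be sketched as a small program. The version below is a toy under stated assumptions, not any engine's actual crawler: it walks an in-memory dictionary instead of fetching real pages, and the page data, the stop-word list, and the `crawl` function are invented for illustration.

```python
import time
from collections import deque

# Words the spider skips (articles and prepositions); an illustrative list.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "for"}

# Toy "Web": URL -> (title, body text, outgoing links). All hypothetical.
PAGES = {
    "site/home": ("Home", "the guide to DVD drives", ["site/reviews", "site/about"]),
    "site/reviews": ("Reviews", "a review of recordable DVD drives", ["site/home"]),
    "site/about": ("About", "about the authors", []),
}

def crawl(start, pages, time_budget_s=1.0):
    """Visit pages breadth-first until none remain or the time budget runs out."""
    deadline = time.monotonic() + time_budget_s
    queue = deque([start])
    seen = {start}
    records = {}
    while queue and time.monotonic() < deadline:
        url = queue.popleft()
        if url not in pages:
            continue  # dead link: nothing to record
        title, body, links = pages[url]
        words = [w for w in body.lower().split() if w not in STOP_WORDS]
        records[url] = {"title": title, "words": words}
        for link in links:
            if link not in seen:  # follow each link only once
                seen.add(link)
                queue.append(link)
    return records

records = crawl("site/home", PAGES)
for url, record in records.items():
    print(url, record)
```

With a very small time budget the loop stops before the queue is empty, which is the toy equivalent of a large site being only partly spidered.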

The information the spider collects is compiled into an index -- something akin to a library's card catalog. When you enter a query at a search engine site, the engine searches its index and returns links to pages that seem to match your query. Of course, what the engine deems relevant may not match what you think is relevant.
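One common way to build such an index is an inverted index, which maps each word to the pages that contain it. The sketch below assumes that structure for illustration; the article does not say how any particular engine stores its data, and the page names and functions here are hypothetical.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical pages collected by a spider.
pages = {
    "a.html": "Ford Mustang history",
    "b.html": "Mustang horse breeds",
    "c.html": "Ford truck reviews",
}
index = build_index(pages)

print(sorted(search(index, "ford mustang")))
print(sorted(search(index, "mustang")))
```

The second query shows the gap between a match and a relevant match: a page about horses contains the word just as surely as a page about cars.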

Web directories, such as Yahoo!, Lycos, Excite, and LookSmart, collect sites with a spider or receive them as submissions from site owners. Then directory editors sort through the sites and put them in a database arranged by topic. You can either search the database using the site's search engine or click through the site's category tree until you find the topic you want.

Directories usually include query results from one or more search engines in their results page. (The reverse is also true: Google offers results from the Open Directory.) For example, Yahoo's search results combine pages from its directory with results on the same topic from Google's search engine, though you won't get all of Google's results on a topic at Yahoo. Lycos mixes the results from its directory with Fast's search engine results. Metasearch engines, such as Dogpile and MetaCrawler, send your query to several directories and search engines, and then aggregate the results.
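Aggregating results from several sources might look something like the sketch below. The merging rule used here, which adds up the reciprocal of each link's rank across sources, is an assumption chosen for simplicity; it is not a description of how Dogpile or MetaCrawler actually combine results. The source names and URLs are made up.

```python
from collections import defaultdict

def aggregate(result_lists):
    """Merge ranked lists; links ranked high by several sources rise to the top."""
    scores = defaultdict(float)
    for results in result_lists.values():
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / rank  # first place adds 1, second adds 1/2, ...
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical ranked results from two engines and one directory.
result_lists = {
    "engine_a": ["x.com", "y.com", "z.com"],
    "engine_b": ["y.com", "x.com"],
    "directory_c": ["y.com", "w.com"],
}
merged = aggregate(result_lists)
print(merged)
```

Each link appears once in the merged list, however many sources returned it.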

Because search engines index so many more pages than directories do, a search engine is more likely to return hundreds of results when you search for something general, such as Ford Mustang. A rule of thumb: Use directories when you're looking for general information or when you're not sure where to begin, and use search engines when you're looking for a specific piece of information. If you use a directory that is integrated with a search engine, or if you use an engine that includes a directory, you don't necessarily get the best of both worlds because each can water down the strength of the other.

Everything Is Relevance

Regardless of how many pages a search site indexes, the relevance of the first results is what matters. Each search site determines relevance differently, so even if two engines produce the same link in their results, the link may not appear at the top of the page in both lists. Most search engines base relevance partly on where your search term appears on a page: A position high on the page translates into a higher relevancy score.
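The position rule can be illustrated with a simple scoring function. The formula below is an assumption made for the example (a score of 1.0 when the term is the first word on the page, falling toward 0 as it appears later); real engines combine position with other signals that this sketch leaves out.

```python
def position_score(page_text, term):
    """Score from 0.0 to 1.0; higher when the term first appears earlier."""
    words = page_text.lower().split()
    term = term.lower()
    if term not in words:
        return 0.0
    first = words.index(term)
    return 1.0 - first / len(words)

def rank(pages, term):
    """Return URLs of matching pages, best position score first."""
    scored = [(position_score(text, term), url) for url, text in pages.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [url for score, url in ordered if score > 0]

# Hypothetical pages.
pages = {
    "top.html": "cisco reports first quarter revenue",
    "bottom.html": "networking news this week includes cisco",
    "none.html": "weather forecast for mendocino",
}
print(rank(pages, "cisco"))
```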

Both Google and Fast rank a page's relevance, in part, on the number of other sites that link to it. They theorize that if many sites link to a particular page, it must have greater value to people looking for information on that topic. Judging from our test results, we think Google and Fast are right.
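The basic idea of counting inbound links can be sketched as follows. This is a deliberately simplified illustration: it counts linking pages rather than sites and treats every link as equal, whereas real engines use more elaborate link analysis. The link graph and function names are hypothetical.

```python
from collections import Counter

def inbound_link_counts(link_graph):
    """Count how many other pages link to each page."""
    counts = Counter()
    for source, targets in link_graph.items():
        for target in set(targets):  # a page linking twice still counts once
            if target != source:     # a page's links to itself are ignored
                counts[target] += 1
    return counts

def rank_by_links(candidates, counts):
    """Order candidate pages by inbound links, most linked first."""
    return sorted(candidates, key=lambda url: (-counts[url], url))

# Hypothetical link graph: page -> pages it links to.
link_graph = {
    "a": ["c", "c", "b"],
    "b": ["c"],
    "c": ["c"],
    "d": ["b", "c"],
}
counts = inbound_link_counts(link_graph)
print(rank_by_links(["a", "b", "c"], counts))
```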

All is not perfect in search engine land, however. The refinements don't guarantee that you'll find what you're looking for right off the bat. For example, when we entered the relatively straightforward query cisco first quarter revenue 2001 to retrieve links to pages listing Cisco Systems' first-quarter revenues in 2001 (which were $6.5 billion), we saw many pages that reported earnings from other companies (most of which did business with Cisco), plus several older and newer Cisco revenue reports. Only half of the search sites linked us to a page with the answer in the first 20 results, which we consider a reasonable number of links to peruse.

In our obscurity test, we searched to find the author of Eater of Darkness. We got links to pages about The Hasheesh Eater by Fitz Hugh Ludlow, and Confessions of an English Opium Eater by Thomas de Quincey. We also got links to several Dr. Who fan pages and to a vampire site or two. If you try this search at Excite, you may want to send the kids to bed first -- the word eater seems to appear on an awful lot of X-rated adult sites. The correct answer, found by both Google and Yahoo!, is Robert Myron Coates.

Directories such as LookSmart mix fee-paying sites with nonpaying results. This practice is in addition to the "featured links" box at the top of the results page. LookSmart and others let sites pay to have more of their pages included in the listings. LookSmart claims that this gives researchers more resources to draw on, but the site's poor scores in our relevance tests indicate that doing so hinders your chance of getting relevant links for your queries. LookSmart did well in our technology specifications category, however.

Advanced Search Pays Off

You can help search engines do their job by using advanced search features. Some search sites automatically apply these techniques for you. For example, some search engines add the Boolean operators OR and AND automatically when you enter a multiword search.
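Boolean operators map neatly onto set operations, which makes it easy to see what an automatically inserted AND or OR does to a multiword search. The sketch below assumes a tiny hand-built index of word-to-page-number sets; the index contents and function names are invented for illustration, and no specific engine's behavior is implied.

```python
def docs_with(index, word):
    """Pages containing the word (an empty set if the word is unknown)."""
    return index.get(word.lower(), set())

def op_and(a, b):
    return a & b  # pages in both sets

def op_or(a, b):
    return a | b  # pages in either set

def op_not(a, b):
    return a - b  # pages in the first set but not the second

def implicit_query(index, query, default_op="AND"):
    """Join the words of a plain query with one operator, as some engines do."""
    sets = [docs_with(index, word) for word in query.split()]
    if not sets:
        return set()
    result = set(sets[0])
    for s in sets[1:]:
        result = op_and(result, s) if default_op == "AND" else op_or(result, s)
    return result

# Hypothetical index: word -> page numbers.
index = {"ford": {1, 3}, "mustang": {1, 2}, "horse": {2}}

print(implicit_query(index, "ford mustang", "AND"))
print(implicit_query(index, "ford mustang", "OR"))
print(op_not(docs_with(index, "mustang"), docs_with(index, "horse")))
```

An implied AND narrows the results while an implied OR widens them, so the same two words can return very different lists at two sites.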

Several engines offer forms and drop-down menus for honing your results. For example, Excite, Fast, Google, HotBot, Lycos, and Yahoo! let you choose to view only pages containing your exact search phrase, pages that have all your phrase's words in any order, or pages with any of the words in your search phrase. AltaVista lets you sort your results by ranking the words in your search phrase.

In our tests, most search sites returned better results when we used advanced functions. The exceptions were Google (which couldn't improve because it recorded perfect scores in our basic tests) and Yahoo! (which also scored well in the basic tests). LookSmart and Ask Jeeves do not support advanced searching.

It's important to use the right advanced technique, however. When we looked for Eater of Darkness, only Google and Yahoo! returned a correct result within our test parameters (usually the first 20 links returned). The other sites failed to find it even when we inserted ANDs between the search terms. When we searched for the exact phrase eater of darkness, however, every site except Northern Light and MetaCrawler produced a correct result, generally within the top five links.
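The difference between the two techniques is easy to see in a small sketch. An AND search only checks that each word appears somewhere on the page, while an exact-phrase search requires the words to appear together and in order. The page texts and function names below are made up for illustration, and the matching ignores punctuation handling for brevity.

```python
def matches_all_words(text, query):
    """AND-style match: every query word appears somewhere in the text."""
    words = set(text.lower().split())
    return all(word in words for word in query.lower().split())

def matches_phrase(text, phrase):
    """Exact-phrase match: the words appear consecutively, in order."""
    words = text.lower().split()
    target = phrase.lower().split()
    if not target:
        return False
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

# Hypothetical page texts.
pages = {
    "novel.html": "the eater of darkness is a novel from the 1920s",
    "other.html": "confessions of an opium eater facing the darkness",
}

query = "eater of darkness"
for url, text in pages.items():
    print(url, matches_all_words(text, query), matches_phrase(text, query))
```

Both pages pass the AND test, but only the first passes the phrase test, which is why the phrase search cuts through pages that merely mention the same words.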

Looks Count

While the relevance of results is crucial, a site's interface and help features also matter. The best sites offer such features as a drop-down list of search delimiters ('.mp3s only,' for example) without cluttering the page. Sites that returned the most-relevant answers also scored well for their interface. Google and Fast use no-muss, no-fuss designs: Their home pages show little more than a search field and a link to the results. On the other hand, Yahoo's cluttered interface combines a search field, directory categories, news, shopping links, and more.

Not all directories are that distracting, however. For example, the small amount of extra information that appears on Lycos's home page is presented clearly. Similarly, among the metasearch sites, MetaCrawler is helped by its simple-yet-functional graphical interface, while HotBot's neon green will have visitors rushing to their Back button.

The true test of a site's interface is its results page. Ask Jeeves and MetaCrawler hide their results among ads and other clutter, while LookSmart and Excite fail to provide such useful, basic information as the full URL of the site that the included link leads to.

The best results pages balance clean interfaces with worthwhile extras, such as a search field that allows you to refine your search results by adding or deleting words. Google provides a link to a cached copy of the page so you can view it if the link to the live page isn't working.

