XML: The right tool for odd jobs


April 24, 2000
Web posted at: 1:13 p.m. EDT (1713 GMT)

(IDG) -- XML (Extensible Markup Language) has a good shot at becoming a truly pervasive technology, one that may bring changes to every part of the computing community.

XML looks quite a bit like HTML. The languages employ very similar syntax, though XML is distinctly pickier about compliance. Fundamentally, however, HTML and XML have rather different goals and some corresponding design differences.

The goal of a typical HTML file is to present information to a human reader, mediated by a browsing program. Although HTML documents can be written in a very abstract way, coders often use tricks to specify the exact appearance of the resulting document.

This makes Webpages interesting to look at, to be sure, but it greatly increases the difficulty of writing programs to read the data. Worse, the HTML organization of a page may change at any time, subject to the whims of the developing organization.

XML documents, in contrast, are optimized for processing by computer programs. Their tight rules of syntax allow both consistency checking by the creator and ease of access by arbitrary clients.
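
As a rough illustration of that consistency checking, here is a minimal Python sketch that uses the standard library's xml.etree.ElementTree parser to test whether a string is well-formed XML. The helper name is_well_formed and the sample markup are invented for this example; the check covers syntax only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as well-formed XML, else False."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The parser rejects unclosed or mismatched tags outright.
        return False
    return True


# Hypothetical sample markup, not taken from any real catalog.
good = "<book><title>Example Title</title></book>"
bad = "<book><title>Example Title</book>"  # <title> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

A typical HTML browser would quietly render something for the second string; an XML parser refuses it, which is what lets a creator catch the mistake before any client sees the file.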

Further, the semantics (and high-level syntax) of XML files can be defined by a document type definition (DTD) file or an XML schema. These can provide a programmer (or a particularly deft program) with a guide to reading and parsing the actual XML file. (HTML has DTDs, as well, but they tend not to be as detailed.)

XML has many other interesting characteristics, but these should get us started. Let's explore some possible XML-based applications.

Book catalogs

Consider the task of parsing publishers' Webpages to generate a comprehensive list of books in a given topic area. Each publisher uses a different format, of course, and every time a publisher rearranges a page or adds a feature, some programmer must figure out (again) how to parse the format.

Big companies such as Amazon simply step around this problem, requiring publishers to give them listings in a specified format. Unfortunately, this means that each publisher now has to generate a different listing for each online reseller.

Wouldn't it be more reasonable for publishers and resellers to agree on a single listing format? The pages could be transmitted privately or posted on the World Wide Web for more general access. In either case, however, the target audience would be programs, rather than humans.

If the format were well documented, special-purpose search programs could be hacked up in Perl, etc. As an occasional book reviewer, I would love to have a program that could generate lists of books on specified topics!

XML is aimed at precisely this kind of problem. Publishers and resellers could easily (from a technical perspective) define a common vocabulary and structure for XML-based catalogs.
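
To make the idea concrete, here is a small Python sketch of a topic search over an XML catalog. The vocabulary (catalog, book, title, topic, and the isbn attribute), the publisher name, and the book entries are all invented for illustration; no such industry format is assumed to exist.

```python
import xml.etree.ElementTree as ET

# A hypothetical catalog in an invented vocabulary.
CATALOG = """\
<catalog publisher="Example Press">
  <book isbn="0-000-00000-1">
    <title>Example XML Primer</title>
    <topic>XML</topic>
    <topic>Web</topic>
  </book>
  <book isbn="0-000-00000-2">
    <title>Example Perl Cookbook</title>
    <topic>Perl</topic>
  </book>
</catalog>
"""


def titles_on_topic(catalog_xml, topic):
    """Return the titles of all books tagged with the given topic."""
    root = ET.fromstring(catalog_xml)
    wanted = topic.lower()
    return [
        book.findtext("title")
        for book in root.iter("book")
        # Compare topics case-insensitively, ignoring stray whitespace.
        if any((t.text or "").strip().lower() == wanted
               for t in book.findall("topic"))
    ]


print(titles_on_topic(CATALOG, "xml"))
```

Because every publisher's file would share the same element names, the same few lines would work against any catalog, with no per-publisher parsing code.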

Although this could be accomplished by a prose description, a DTD or XML schema really should be used to specify the exact format. Existing DTDs (e.g., BiblioML and MARC) cover very similar problems, so the publishing community could probably adopt (or adapt) one for its own use.

Once agreement has been reached on the DTD, each publisher must find a way to convert its local catalog format into (and out of) the XML format. This is a relatively trivial effort, however, compared with generating formats for an arbitrary (and steadily increasing) number of resellers.
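
The conversion step might look something like the following Python sketch, which assumes a publisher keeps its local catalog as comma-separated text. The column names, the XML element names, and the sample rows are all hypothetical, chosen only to show the shape of the job.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A hypothetical local catalog; a real publisher's format would differ.
LOCAL_CATALOG = """\
isbn,title,topic
0-000-00000-1,Example XML Primer,XML
0-000-00000-2,Example Perl Cookbook,Perl
"""


def csv_to_catalog_xml(csv_text, publisher):
    """Convert CSV rows into one XML catalog document (as a string)."""
    root = ET.Element("catalog", publisher=publisher)
    for row in csv.DictReader(io.StringIO(csv_text)):
        book = ET.SubElement(root, "book", isbn=row["isbn"])
        ET.SubElement(book, "title").text = row["title"]
        ET.SubElement(book, "topic").text = row["topic"]
    return ET.tostring(root, encoding="unicode")


xml_text = csv_to_catalog_xml(LOCAL_CATALOG, "Example Press")
print(xml_text)
```

The publisher writes this converter once; every reseller that accepts the shared format can then consume the same output.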

I would love to be able to tell you that the publishing industry is well on the way to having such a system in place. Sadly, even publishers that have myriad books about XML haven't (yet) published their catalogs in XML form. I predict that it will happen, however, and probably sooner rather than later.

Software building and distribution

In the Unix community, software builds are commonly controlled by a version of the make utility. Make files describe dependency relationships between files (e.g., "foo is built from foo.c and foo.h"), using a largely declarative syntax supplemented by snippets of shell code.

Because make is a very flexible language, wizards can cause it to do spectacular things. The FreeBSD Project's Jordan Hubbard, for instance, has created a 2,500+ line make file as the basis for the FreeBSD Ports Collection.

In concert with a small specification file for each package, Jordan's make file automates the downloading, patching, building, and installation of given open source packages. About 3,000 of these specification files currently exist, covering a very wide range of packages.

Unfortunately, the system depends heavily on Berkeley-style make and has a variety of FreeBSD dependencies. Consequently, adapting the system to support Solaris (let alone Linux) might be a challenge.

I have speculated about the possibility of using XML as the basis for a rewritten system. In the new system, the description files would be both abstract (no OS dependencies) and totally declarative (no embedded snippets of code).
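
A sketch of what such a description file might look like, and how a tool could read it, follows. Everything here is an assumption made for illustration: the element names (package, source, patch, depends), the example.org URL, and the plan and dependencies helpers are invented, and they do not reflect the FreeBSD Ports Collection or any existing packaging format.

```python
import xml.etree.ElementTree as ET

# A hypothetical, purely declarative package description:
# no shell snippets and no OS-specific paths.
PACKAGE = """\
<package name="example-tool" version="1.0">
  <source url="http://example.org/example-tool-1.0.tar.gz"/>
  <patch file="fix-paths.diff"/>
  <depends name="example-lib"/>
</package>
"""


def dependencies(package_xml):
    """Return the names of the packages this one depends on."""
    pkg = ET.fromstring(package_xml)
    return [dep.get("name") for dep in pkg.findall("depends")]


def plan(package_xml):
    """Turn the description into an ordered list of abstract steps."""
    pkg = ET.fromstring(package_xml)
    steps = [("fetch", src.get("url")) for src in pkg.findall("source")]
    steps += [("patch", p.get("file")) for p in pkg.findall("patch")]
    steps.append(("build", pkg.get("name") + "-" + pkg.get("version")))
    steps.append(("install", pkg.get("name")))
    return steps


for step in plan(PACKAGE):
    print(step)
```

The description says only what the package is made of; how each step is carried out on FreeBSD, Solaris, or Linux would be left to the tool that reads it.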

Looking around a bit, I discovered that I was not alone in considering this approach. The Open Software Description, developed by folks at Marimba and Microsoft, proposes XML as the foundation for a complete software packaging and distribution system.

Apple is also reported to be making heavy use of XML in the software build and distribution mechanisms for Mac OS X. And, of course, XML plays a large role in Apple's WebObjects system.

Oh, yes, Webpages

Although I have discounted the use of XML for Webpages, there are some really interesting possibilities here, as well. In an effort to make Webpages more interesting and dynamic, programmers are stuffing all sorts of executable code (e.g., Java, JavaScript, Perl, and Tk) into HTML pages.

This makes me more than a bit twitchy, as I have no way of knowing the real intentions (or, for that matter, simple competence) of the programmers who wrote the code. So, I tend to leave these facilities off in my browser, missing pizzazz in return for a bit more safety.

Instead of sending executable code, however, programmers could send declarative descriptions of items, along with possible presentation modes. These modes, defined by style sheets, can support interactive graphics, multimedia, and more. What they do not do is pump arbitrary code into the viewer's machine.

Although I suspect that evildoers could find ways to subvert even XML, the opportunities are more limited. So I look forward to upcoming uses of XML that will take advantage of "trusted" presentation code to give me both pizzazz and safety.

© 2001 Cable News Network. All Rights Reserved.