Working With Java And XML
What Is XML?
XML is a text-based markup language that is fast becoming the standard for data interchange on the Web. As with HTML, you identify data using tags (identifiers enclosed in angle brackets, like this: <...>). Collectively, the tags are known as "markup".
But unlike HTML, XML tags identify the data, rather than specifying how to display it. Where an HTML tag says something like "display this data in bold font" (<b>...</b>), an XML tag acts like a field name in your program. It puts a label on a piece of data that identifies it (for example: <message>...</message>).
Note: Since identifying the data gives you some sense of what it means (how to interpret it, what you should do with it), XML is sometimes described as a mechanism for specifying the semantics (meaning) of the data.
In the same way that you define the field names for a data structure, you are free to use any XML tags that make sense for a given application. Naturally, though, for multiple applications to use the same XML data, they have to agree on the tag names they intend to use.
Here is an example of some XML data you might use for a messaging application:
  <message>
    <to>you@yourAddress.com</to>
    <from>me@myAddress.com</from>
    <subject>XML Is Really Cool</subject>
    <text>
      How many ways is XML cool? Let me count the ways...
    </text>
  </message>
Note: In this tutorial, we use boldface text to highlight things we want to bring to your attention. XML does not require anything to be in bold!
The tags in this example identify the message as a whole, the destination and sender addresses, the subject, and the text of the message. As in HTML, the <to> tag has a matching end tag: </to>. The data between the tag and its matching end tag defines an element of the XML data. Note, too, that the content of the <to> tag is entirely contained within the scope of the <message>..</message> tag. It is this ability for one tag to contain others that gives XML its ability to represent hierarchical data structures.
And again, as with HTML, whitespace between tags is essentially irrelevant, so you can format the data for readability and yet still process it easily with a program. Unlike HTML, however, in XML you could easily search a data set for messages containing "cool" in the subject, because the XML tags identify the content of the data, rather than specifying its representation.
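To make the searching point concrete, here is a minimal sketch using the JDK's built-in DOM parser (javax.xml.parsers). The class name MessageSearch and the method subjectContains are hypothetical, and the addresses are placeholders. Because the subject is labeled by its own tag, the program can go straight to it instead of guessing where it sits in free-form text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MessageSearch {
    // The sample message from the text (placeholder addresses).
    static final String SAMPLE =
        "<message>"
        + "<to>you@yourAddress.com</to>"
        + "<from>me@myAddress.com</from>"
        + "<subject>XML Is Really Cool</subject>"
        + "<text>How many ways is XML cool? Let me count the ways...</text>"
        + "</message>";

    // Parses XML text into an in-memory DOM tree.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // True if the message's <subject> element mentions the word (case-insensitive).
    static boolean subjectContains(String xml, String word) throws Exception {
        NodeList subjects = parse(xml).getElementsByTagName("subject");
        if (subjects.getLength() == 0) {
            return false; // no <subject> element at all
        }
        String subject = subjects.item(0).getTextContent();
        return subject.toLowerCase().contains(word.toLowerCase());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(subjectContains(SAMPLE, "cool")); // prints true
    }
}
```

The same lookup would match "cool" in the body text of an HTML page just as readily as in its subject line; here the tag tells the program which part is the subject.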
Tags and Attributes
Tags can also contain attributes -- additional information included as part of the tag itself, within the tag's angle brackets. The following example shows an email message structure that uses attributes for the "to", "from", and "subject" fields:
  <message to="you@yourAddress.com" from="me@myAddress.com" subject="XML Is Really Cool">
    <text>
      How many ways is XML cool? Let me count the ways...
    </text>
  </message>
As in HTML, the attribute name is followed by an equal sign and the attribute value, and multiple attributes are separated by spaces. Unlike HTML, however, in XML commas between attributes are not ignored -- if present, they generate an error.
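A program reads attributes through the element that carries them. This is a small sketch with the JDK's DOM API; AttributeDemo and its methods are hypothetical names, and the addresses are placeholders.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AttributeDemo {
    // The attribute-based message from the text (placeholder addresses).
    static final String SAMPLE =
        "<message to=\"you@yourAddress.com\" from=\"me@myAddress.com\""
        + " subject=\"XML Is Really Cool\">"
        + "<text>How many ways is XML cool? Let me count the ways...</text>"
        + "</message>";

    // Parses the text and returns its root element.
    static Element root(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Copies every attribute of an element into a name -> value map, sorted by name.
    static Map<String, String> attributes(Element element) {
        Map<String, String> result = new TreeMap<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            result.put(attr.getNodeName(), attr.getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Element message = root(SAMPLE);
        System.out.println(message.getAttribute("subject")); // XML Is Really Cool
        System.out.println(attributes(message));
    }
}
```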
Since you could design a data structure like <message> equally well using either attributes or tags, it can take a considerable amount of thought to figure out which design is best for your purposes. The last part of this tutorial, Designing an XML Data Structure, includes ideas to help you decide when to use attributes and when to use tags.
One really big difference between XML and HTML is that an XML document is always constrained to be well-formed. There are several rules that determine when a document is well-formed, but one of the most important is that every tag has a closing tag. So, in XML, the </to> tag is not optional. The <to> element is never terminated by any tag other than </to>.
Note: Another important aspect of a well-formed document is that all tags are completely nested. So you can have <message>..<to>..</to>..</message>, but never <message>..<to>..</message>..</to>. A complete list of requirements is contained in the list of XML Frequently Asked Questions (FAQ) at http://www.ucc.ie/xml/#FAQ-VALIDWF. (This FAQ is on the W3C "Recommended Reading" list.)
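Because even a non-validating parser rejects input that breaks these rules, asking it to parse is a quick well-formedness test. A minimal sketch with the JDK's DOM parser; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // True if a (non-validating) parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser reported a well-formedness error
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e); // setup or I/O problem, not a markup problem
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<message><to>x</to></message>"));   // true
        System.out.println(isWellFormed("<message><to>x</message></to>"));   // false: overlapping tags
        System.out.println(isWellFormed("<message><to>x</message>"));        // false: </to> missing
        System.out.println(isWellFormed("<message to=\"a\", from=\"b\"/>")); // false: comma between attributes
    }
}
```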
Sometimes, though, it makes sense to have a tag that stands by itself. For example, you might want to add a "flag" tag that marks the message as important. A tag like that doesn't enclose any content, so it's known as an "empty tag". You can create an empty tag by ending it with /> instead of >. For example, the following message contains such a tag:
  <message to="you@yourAddress.com" from="me@myAddress.com" subject="XML Is Really Cool">
    <flag/>
    <text>
      How many ways is XML cool? Let me count the ways...
    </text>
  </message>
Note: The empty tag saves you from having to code <flag></flag> in order to have a well-formed document. You can control which tags are allowed to be empty by creating a Document Type Definition, or DTD. We'll talk about that in a few moments. If there is no DTD, then the document can contain any kinds of tags you want, as long as the document is well-formed.
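To a parser, <flag/> and <flag></flag> describe the same thing: an element with no content. A small sketch (hypothetical class name) that checks this with the JDK's DOM parser:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class EmptyTagDemo {
    // Returns the first <flag> element in the document, or null if there is none.
    static Node findFlag(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getElementsByTagName("flag")
                .item(0);
    }

    public static void main(String[] args) throws Exception {
        Node shortForm = findFlag("<message><flag/></message>");
        Node longForm = findFlag("<message><flag></flag></message>");
        // Both spellings yield an element with no children.
        System.out.println(shortForm.hasChildNodes()); // false
        System.out.println(longForm.hasChildNodes());  // false
    }
}
```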
Comments in XML Files
XML comments look just like HTML comments:
  <message to="you@yourAddress.com" from="me@myAddress.com" subject="XML Is Really Cool">
    <!-- This is a comment -->
    <text>
      How many ways is XML cool? Let me count the ways...
    </text>
  </message>
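Comments are part of the document, so a parser may hand them to your program. With the JDK's DOM parser they show up as comment nodes unless you ask the factory to drop them. A sketch with a hypothetical class name:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CommentDemo {
    static final String SAMPLE =
        "<message><!-- This is a comment --><text>Hello</text></message>";

    // Counts comment nodes directly under the root element.
    static int countComments(String xml, boolean ignoreComments) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(ignoreComments); // false is the default
        Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        NodeList children = root.getChildNodes();
        int count = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.COMMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countComments(SAMPLE, false)); // 1
        System.out.println(countComments(SAMPLE, true));  // 0
    }
}
```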
The XML Prolog
To complete this journeyman's introduction to XML, note that an XML file always starts with a prolog. The minimal prolog contains a declaration that identifies the document as an XML document, like this:
  <?xml version="1.0"?>
The declaration may also contain additional information, like this:
  <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
The XML declaration plays a role similar to the HTML header, <html>, except that it uses <?..?> and it may contain the following attributes:
- version: Identifies the version of the XML markup language used in the data. This attribute is not optional.
- encoding: Identifies the character set used to encode the data. "ISO-8859-1" is "Latin-1", the Western European and English language character set. (The default is UTF-8, an encoding of Unicode.)
- standalone: Tells whether or not this document references an external entity or an external document type definition (see below). If there are no external references, then "yes" is appropriate.
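A parser records these declaration attributes, and the DOM exposes them on the Document object (getXmlVersion, getXmlEncoding, getXmlStandalone in the JDK's org.w3c.dom API). This sketch, with a hypothetical class name, parses from bytes so the parser itself applies the declared encoding.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class PrologDemo {
    static final String SAMPLE =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" standalone=\"yes\"?>"
        + "<message/>";

    // Encodes the text as Latin-1 bytes and lets the parser decode it per the declaration.
    static Document parse(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println(doc.getXmlVersion());    // the version attribute
        System.out.println(doc.getXmlEncoding());   // the encoding attribute
        System.out.println(doc.getXmlStandalone()); // the standalone attribute
    }
}
```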
The prolog can also contain definitions of entities (items that are inserted when you reference them from within the document) and specifications that tell which tags are valid in the document, both declared in a Document Type Definition (DTD) that can be defined directly within the prolog, as well as with pointers to external specification files. But those are the subject of later tutorials. For more information on these and many other aspects of XML, see the Recommended Reading list of the W3C XML page.
Note: The declaration is actually optional. But it's a good idea to include it whenever you create an XML file. The declaration should have the version number, at a minimum, and ideally the encoding as well. That practice simplifies things if the XML standard is extended in the future, and if the data ever needs to be localized for different regions.
Everything that comes after the XML prolog constitutes the document's content.
An XML file can also contain processing instructions that give commands or information to an application that is processing the XML data. Processing instructions have the following format:
  <?target instructions?>
where the target is the name of the application that is expected to do the processing, and instructions is a string of characters that embodies the information or commands for the application to process.
Since the instructions are application-specific, an XML file could have multiple processing instructions that tell different applications to do similar things, though in different ways. The XML file for a slideshow, for example, could have processing instructions that let the speaker specify a technical or executive-level version of the presentation. If multiple presentation programs were used, the file might need multiple versions of the processing instructions (although it would be nicer if such applications recognized standard instructions).
Note: The target name "xml" (in any combination of upper- or lowercase letters) is reserved for XML standards. In one sense, the declaration is a processing instruction that fits that standard. (However, when you're working with the parser later, you'll see that the method for handling processing instructions never sees the declaration.)
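With a SAX parser, processing instructions arrive through the handler's processingInstruction(target, data) callback. The sketch below uses the JDK's javax.xml.parsers.SAXParser; the target name "presenter", its instruction text, and the class name are all made up for illustration. Note that the <?xml ...?> declaration is not reported to the callback.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PiDemo {
    // "presenter" is a hypothetical target for an imaginary slideshow program.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
        + "<?presenter version=\"executive\"?>"
        + "<slideshow/>";

    // Collects "target: instructions" for every processing instruction the parser reports.
    static List<String> processingInstructions(String xml) throws Exception {
        List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void processingInstruction(String target, String data) {
                found.add(target + ": " + data);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return found;
    }

    public static void main(String[] args) throws Exception {
        // One entry only: the XML declaration is not passed to the handler.
        System.out.println(processingInstructions(SAMPLE));
    }
}
```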
Why Is XML Important?
There are a number of reasons for XML's surging acceptance. This section lists a few of the most prominent.
Because XML is not a binary format, you can create and edit files with anything from a standard text editor to a visual development environment. That makes it easy to debug your programs, and makes it useful for storing small amounts of data. At the other end of the spectrum, an XML front end to a database makes it possible to efficiently store large amounts of XML data as well. So XML provides scalability for anything from small configuration files to a company-wide data repository.
XML tells you what kind of data you have, not how to display it. Because the markup tags identify the information and break up the data into parts, an email program can process it, a search program can look for messages sent to particular people, and an address book can extract the address information from the rest of the message. In short, because the different parts of the information have been identified, they can be used in different ways by different applications.
When display is important, the stylesheet standard, XSL, lets you dictate how to portray the data. For example, the stylesheet for:
  <to>you@yourAddress.com</to>
can say:
- Start a new line.
- Display "To:" in bold, followed by a space.
- Display the destination data.
The result is:
  To: you@yourAddress.com
Of course, you could have done the same thing in HTML, but you wouldn't be able to process the data with search programs and address-extraction programs and the like. More importantly, since XML is inherently style-free, you can use a completely different stylesheet to produce output in PostScript, TeX, PDF, or some new format that hasn't even been invented yet. That flexibility amounts to what one author described as "future-proofing" your information. The XML documents you author today can be used in future document-delivery systems that haven't even been imagined yet.
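The three stylesheet steps for the <to> element can be sketched in XSLT, the transformation part of XSL, and run with the JDK's javax.xml.transform API. The stylesheet and class name are written for this example only; the stylesheet emits HTML so that "bold" has a concrete meaning, and a different stylesheet could target another output format without touching the data.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StyleDemo {
    // For each <to> element: start a new line, write "To:" in bold
    // followed by a space, then write the destination address.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\" indent=\"no\"/>"
        + "<xsl:template match=\"to\">"
        + "<br/><b>To:</b><xsl:text> </xsl:text><xsl:value-of select=\".\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML text and returns the serialized output.
    static String render(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render("<to>you@yourAddress.com</to>"));
    }
}
```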
One of the nicer aspects of XML documents is that they can be composed from separate entities. You can do that with HTML, but only by linking to other documents. Unlike HTML, XML entities can be included "in line" in a document. The included sections look like a normal part of the document -- you can search the whole document at one time or download it in one piece. That lets you modularize your documents without resorting to links. You can single-source a section so that an edit to it is reflected everywhere the section is used, and yet a document composed from such pieces looks for all the world like a one-piece document.
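Entity inclusion is often done with external entities stored in separate files; to stay self-contained, this sketch declares an internal entity instead. The entity name "company", its value, and the class name are hypothetical. After parsing with the JDK's DOM parser, the reference has been replaced by its text, so the program sees one continuous document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class EntityDemo {
    // "company" is a hypothetical entity declared in the document's internal DTD.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE message [<!ENTITY company \"Example Corp\">]>"
        + "<message><text>Welcome to &company;!</text></message>";

    // Returns the text of the first <text> element with entity references expanded.
    static String messageText(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setExpandEntityReferences(true); // the default, stated explicitly
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getElementsByTagName("text")
                .item(0)
                .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(messageText(SAMPLE)); // Welcome to Example Corp!
    }
}
```

Editing the entity declaration once changes the text everywhere &company; is used, which is the single-sourcing idea in miniature.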
Thanks to HTML, the ability to define links between documents is now regarded as a necessity. The next section of this tutorial, XML and Related Specs, discusses the link-specification initiative. This initiative lets you define two-way links, multiple-target links, "expanding" links (where clicking a link causes the targeted information to appear inline), and links between two existing documents that are defined in a third.
As mentioned earlier, regular and consistent notation makes it easier to build a program to process XML data. For example, in HTML a <dt> tag can be delimited by </dt>, another <dt>, <dd>, or </dl>. That makes for some difficult programming. But in XML, the <dt> tag must always have a </dt> terminator, or else it must be written as an empty <dt/> tag. That restriction is a critical part of the constraints that make an XML document well-formed. (Otherwise, the XML parser won't be able to read the data.) And since XML is a vendor-neutral standard, you can choose among several XML parsers, any one of which takes the work out of processing XML data.
XML documents benefit from their hierarchical structure. Hierarchical
document structures are, in general, faster to access because you can
drill down to the part you need, like stepping through a table of
contents. They are also easier to rearrange, because each piece is
delimited. In a document, for example, you could move a heading to a
new location and drag everything under it along with the heading,
instead of having to page down to make a selection, cut, and then paste
the selection into a new location.
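Drilling down a hierarchy can be expressed as a path, much like following a table of contents. This sketch uses the JDK's javax.xml.xpath API on a small made-up document; the element names and the class name are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DrillDown {
    // A small hierarchical document: a book made of chapters made of sections.
    static final String BOOK =
        "<book>"
        + "<chapter title=\"Introduction\"><section>What Is XML?</section></chapter>"
        + "<chapter title=\"Details\">"
        + "<section>Tags</section><section>Attributes</section>"
        + "</chapter>"
        + "</book>";

    // Evaluates an XPath expression against the document and returns the result as a string.
    static String select(String xml, String path) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(path, doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(select(BOOK, "/book/chapter[2]/section[2]")); // Attributes
        System.out.println(select(BOOK, "/book/chapter[1]/@title"));     // Introduction
    }
}
```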
How Can You Use XML?
There are several basic ways to make use of XML:
- Traditional data processing, where XML encodes the data for a program to process
- Document-driven programming, where XML documents are containers that build interfaces and applications from existing components
- Archiving -- the foundation for document-driven programming, where the customized version of a component is saved (archived) so it can be used later
- Binding, where the DTD or schema that defines an XML data structure is used to automatically generate a significant portion of the application that will eventually process that data
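For the archiving case, the JDK includes java.beans.XMLEncoder and XMLDecoder, which save a JavaBean's customized property values as an XML document and rebuild the bean from it. This is one existing mechanism for the idea, not necessarily what any particular document-driven system uses; the Settings bean and the class name are hypothetical.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class ArchiveDemo {
    // A hypothetical component following JavaBeans conventions:
    // public class, public no-argument constructor, get/set property pairs.
    public static class Settings {
        private String title = "";
        private int fontSize = 12;

        public Settings() {}

        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
        public int getFontSize() { return fontSize; }
        public void setFontSize(int fontSize) { this.fontSize = fontSize; }
    }

    // Archives the bean's customized state as XML text.
    static String archive(Settings settings) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(bytes)) {
            encoder.writeObject(settings);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    // Rebuilds the bean from an archive. XMLDecoder can instantiate arbitrary
    // classes, so only decode archives from trusted sources.
    static Settings restore(String xml) {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(bytes))) {
            return (Settings) decoder.readObject();
        }
    }

    public static void main(String[] args) {
        Settings settings = new Settings();
        settings.setTitle("Quarterly Review");
        settings.setFontSize(18);
        String xml = archive(settings);
        System.out.println(xml); // the archived <java> document
        Settings restored = restore(xml);
        System.out.println(restored.getTitle() + " / " + restored.getFontSize());
    }
}
```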
Traditional Data Processing
XML is fast becoming the data representation of choice for the Web. It's terrific when used in conjunction with network-centric Java-platform programs that send and retrieve information. So a client/server application, for example, could transmit XML-encoded data back and forth between the client and the server.
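On the sending side, a program can build the message as a DOM tree and serialize it to text with the JDK's identity Transformer; the serializer escapes characters such as & for you. The class and method names are hypothetical, and the network transport itself is left out.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class MessageBuilder {
    // Builds a <message> document and serializes it to a string ready to send.
    static String build(String to, String from, String subject, String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element message = doc.createElement("message");
        doc.appendChild(message);
        addChild(doc, message, "to", to);
        addChild(doc, message, "from", from);
        addChild(doc, message, "subject", subject);
        addChild(doc, message, "text", text);

        // An identity transform writes the DOM tree out as XML text.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Appends <name>value</name> to parent; special characters are escaped on output.
    private static void addChild(Document doc, Element parent, String name, String value) {
        Element child = doc.createElement(name);
        child.setTextContent(value);
        parent.appendChild(child);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build("you@yourAddress.com", "me@myAddress.com",
                "Q&A about XML", "How many ways is XML cool?"));
    }
}
```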
In the future, XML is potentially the answer for data interchange in all sorts of transactions, as long as both sides agree on the markup to use. (For example, an email program has to know in advance which tag names to expect for each part of a message.) The need for common standards will generate a lot of industry-specific standardization efforts in the years ahead. In the meantime, mechanisms that let you "translate" the tags in an XML document will be important. Such mechanisms include projects like the RDF initiative, which defines "meta tags", and the XSL specification, which lets you translate XML tags into other XML tags.
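Tag translation with XSL is commonly written as an XSLT "identity" stylesheet that copies everything unchanged except the tags being renamed. In this sketch, <recipient> stands in for a hypothetical partner's vocabulary, and the class name is made up; it runs on the JDK's javax.xml.transform API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TagTranslator {
    // Copies every node and attribute as-is, except that <to> becomes <recipient>.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"to\">"
        + "<recipient><xsl:apply-templates select=\"@*|node()\"/></recipient>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the translation stylesheet and returns the rewritten XML text.
    static String translate(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(translate(
                "<message><to>you@yourAddress.com</to><subject>Hi</subject></message>"));
    }
}
```

The more specific match="to" template takes priority over the catch-all copy template, which is what makes the rename work.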
Document-Driven Programming (DDP)
The newest approach to using XML is to construct a document that describes how an application page should look. The document, rather than simply being displayed, consists of references to user interface components and business-logic components that are "hooked together" to create an application on the fly.
Of course, it makes sense to use the Java platform for such components. Both JavaBeans™ for interfaces and Enterprise JavaBeans™ for business logic can be used to construct such applications. Although none of the efforts undertaken so far are ready for commercial use, much preliminary work has already been done.
The Java programming language is also excellent for writing XML-processing tools that are as portable as XML. Several visual XML editors have been written for the Java platform. For a listing of editors, processing tools, and other XML resources, see the "Software" section of Robin Cover's SGML/XML Web Page.
8 June, 2006