eXtensible Markup Language
The eXtensible Markup Language (XML) is used to describe data and its structure.
Its purpose is to provide a method for packaging data and for transmitting it
between computers. In one sense XML is no different in purpose from text files
and databases -- it is used to create data stores. Its significance, though,
is in establishing a common data formatting standard that is recognizable and
shareable among widely diverse computer systems. In addition, it is text based;
therefore, it permits easy data exchange across the Web using the standard HTTP protocol.
XML Markup
XML is a data markup language that, like HTML, uses tags to describe the structure
of data. Unlike the HTML Web page markup language, it does not have predefined tags
with special meanings. XML tags are created by the data designer to fit the purpose
of the data structure. Different sets of tags would be used, for instance, to describe
a memo, a letter, a book chapter, a financial statement, an email message, a legal
document, a restaurant menu, a personnel record, a course catalog, a driver's license,
a television listing, or a thousand other data collections.
As the above list suggests, a wide variety of data structures can be
represented in XML. It is not limited to sequential lines of text, as is common in text files;
neither is it limited to relational information organized into the rows and columns of
database tables. Virtually any structure of data can be represented by XML. Common to
all of these data structures, however, are the XML standards that make them
accessible by and transmittable between any computers on the Internet.
XML Data Structures
Much of the information that needs to be shareable between computers does, of course,
reside in databases. The following table shows the structure of information in a
"Personnel" table containing "Employee" records, each of which is composed of a "SSN"
field, a "FirstName" field, a "LastName" field, a "Salary" field, and a "Department"
field.
Personnel (table)
SSN | FirstName | LastName | Salary | Department
111-11-1111 | Ann | Adams | 65000.00 | Accounting
222-22-2222 | Beth | Baker | 55000.00 | Marketing
333-33-3333 | Cecil | Carleton | 60000.00 | Information Technology
444-44-4444 | David | Davis | 59000.00 | Information Technology
555-55-5555 | Ellen | Edwards | 62000.00 | Accounting
This information is easily accessible within an internal processing environment composed
of similar hardware and software. Processing routines use known access methods to get to
the known location of the database through established permissions. However, problems arise
when this information is needed externally. Users at remote locations, especially those
outside the organization, may not have the knowledge or permissions to access it. Access
is private to the owner, yet the client needs the data. The issue, then, is how to make this
information available to outside clients without giving them access to the internal system.
The solution is an XML data structure. Internal processing routines can privately
select and gather the needed information, package it as a simple text-based data structure
while retaining the data relationships, and ship the package of information to the client
across the Web. The client has access to the information without needing knowledge of or
permissions to the original data store, even if the client uses incompatible hardware and
software. The XML data is in a common format accessible through the common protocols of
the Web.
It is easy to represent the above database table as an XML data structure. As shown below,
the database table name <Personnel> is used as the enclosing tag for the entire structure,
each record of which is identified by an <Employee> tag. Database field names have been
applied as the XML tag names <SSN>, <FirstName>, <LastName>, and <Department> to identify
the data elements corresponding to the database fields that are permissible for outside
viewing.
<Personnel>
  <Employee>
    <SSN>111-11-1111</SSN>
    <FirstName>Ann</FirstName>
    <LastName>Adams</LastName>
    <Department>Accounting</Department>
  </Employee>
  <Employee>
    <SSN>222-22-2222</SSN>
    <FirstName>Beth</FirstName>
    <LastName>Baker</LastName>
    <Department>Marketing</Department>
  </Employee>
  <Employee>
    <SSN>333-33-3333</SSN>
    <FirstName>Cecil</FirstName>
    <LastName>Carleton</LastName>
    <Department>Information Technology</Department>
  </Employee>
  <Employee>
    <SSN>444-44-4444</SSN>
    <FirstName>David</FirstName>
    <LastName>Davis</LastName>
    <Department>Information Technology</Department>
  </Employee>
  <Employee>
    <SSN>555-55-5555</SSN>
    <FirstName>Ellen</FirstName>
    <LastName>Edwards</LastName>
    <Department>Accounting</Department>
  </Employee>
</Personnel>
Listing 1-1. XML data structure.
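The packaging step described above, in which an internal routine selects the permitted fields and ships them as XML, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ROWS` list stands in for records read from the internal database, and the names `PUBLIC_FIELDS` and `build_personnel_xml` are hypothetical, not part of any tutorial code.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for rows read from the internal Personnel table.
ROWS = [
    {"SSN": "111-11-1111", "FirstName": "Ann", "LastName": "Adams",
     "Salary": "65000.00", "Department": "Accounting"},
    {"SSN": "222-22-2222", "FirstName": "Beth", "LastName": "Baker",
     "Salary": "55000.00", "Department": "Marketing"},
]

# Fields permitted for outside viewing; Salary is deliberately left out.
PUBLIC_FIELDS = ["SSN", "FirstName", "LastName", "Department"]


def build_personnel_xml(rows):
    """Package the public fields of each row as an <Employee> element."""
    root = ET.Element("Personnel")
    for row in rows:
        employee = ET.SubElement(root, "Employee")
        for field in PUBLIC_FIELDS:
            ET.SubElement(employee, field).text = row[field]
    # Serialize the tree to a plain text string ready for transmission.
    return ET.tostring(root, encoding="unicode")


xml_text = build_personnel_xml(ROWS)
print(xml_text)
```

The client receives only the text in `xml_text`; it never sees the Salary values or needs to know how the original data store is organized.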
This data structure can be saved as a simple text document to make it available to any
Web pages that need to retrieve this information. In fact, it has been saved as file
Personnel.xml in the same directory as this Web page. When the document is opened directly
in a browser window, elements are preceded by "+" and "-" symbols for expanding and
collapsing the structure. The output isn't particularly exciting, granted, but the
structured information is now available to any computer on the World Wide Web simply by
requesting the associated URL.
It was mentioned above that XML can represent a wide variety of data structures, not
just those that represent relational data. A slightly more complex structure results
when a portion of this Web page is itself coded in XML.
It is important to know that XML tags represent the structure of the information they
contain rather than its layout or styling. In the XML code for this Web page, <Head1>,
<Head2>, and <Paragraph> tags, along with the enclosed <Personnel> information, identify
sections of content; they do not imply how the content looks or even whether the information
will become a Web page. The XML structure only identifies information content and its internal
relationships. Perhaps this content will become a Web page; perhaps it will be written to
CD-ROM, input to a typesetter for printing, or used simply to extract subsets of information
for subsequent processing. The information is in a format suited to a wide variety of uses.
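As a rough illustration of one marked-up document serving several uses, the sketch below parses a small fragment of page content and extracts two different subsets from it. The `<Page>` root element and the exact markup are assumptions made for the example; only the `<Head1>`, `<Head2>`, and `<Paragraph>` tag names come from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for a fragment of this page. The <Page> root element
# is an assumption; the text values are taken from the tutorial itself.
PAGE_XML = """
<Page>
  <Head1>eXtensible Markup Language</Head1>
  <Paragraph>The eXtensible Markup Language (XML) is used to describe data and its structure.</Paragraph>
  <Head2>XML Markup</Head2>
  <Paragraph>XML is a data markup language that, like HTML, uses tags to describe the structure of data.</Paragraph>
</Page>
"""

root = ET.fromstring(PAGE_XML)

# Use 1: collect only the headings, in document order, to build an outline.
outline = [el.text for el in root if el.tag in ("Head1", "Head2")]

# Use 2: collect only the body text, for example for indexing or printing.
body = [el.text for el in root.findall("Paragraph")]

print(outline)
print(len(body), "paragraphs")
```

Neither use depends on how the headings or paragraphs would eventually be styled; the tags identify what each piece of content is, and each consumer decides what to do with it.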
Certainly, much XML processing involves formatting the information for Web page display.
Applying a style sheet to the marked-up information, for example, transforms the above
XML structure into a formatted Web page.
A later tutorial describes how to format XML data for Web page display. At present, just
keep in mind that XML information represents structured content that can be transformed
and used for a variety of purposes.
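To make the idea of transformation concrete before style sheets are introduced, here is a minimal Python sketch that turns a shortened copy of the Personnel structure into an HTML table. It is a stand-in for the style-sheet approach, not the technique the later tutorial describes, and the function name `personnel_to_html` is hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Shortened copy of the Personnel structure from Listing 1-1.
PERSONNEL_XML = """
<Personnel>
  <Employee>
    <SSN>111-11-1111</SSN>
    <FirstName>Ann</FirstName>
    <LastName>Adams</LastName>
    <Department>Accounting</Department>
  </Employee>
  <Employee>
    <SSN>222-22-2222</SSN>
    <FirstName>Beth</FirstName>
    <LastName>Baker</LastName>
    <Department>Marketing</Department>
  </Employee>
</Personnel>
"""


def personnel_to_html(xml_text):
    """Render each <Employee> element as one row of an HTML table."""
    root = ET.fromstring(xml_text)
    employees = root.findall("Employee")
    # Column names come from the child tags of the first record.
    columns = [child.tag for child in employees[0]] if employees else []
    header = "".join(f"<th>{escape(name)}</th>" for name in columns)
    lines = ["<table>", f"  <tr>{header}</tr>"]
    for employee in employees:
        cells = "".join(
            f"<td>{escape(employee.findtext(name, default=''))}</td>"
            for name in columns
        )
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


html_text = personnel_to_html(PERSONNEL_XML)
print(html_text)
```

The XML itself is unchanged by this step; the same document could be passed to a different routine to produce a different presentation.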
XML Processing
The usefulness of an XML data structure is in the kinds of information processing activities
that can be applied to it. Although it is important to be able to transmit data in a common
format among different computer systems, the enclosed information must also be accessible
to computer programs to input the structure, search it for needed information, extract data
items for processing, update the structure with changed information, and display selected
information on a Web page or in other formats.
All of these typical information processing activities are available for application against
XML data structures. This processing can take place at the browser or at the server using
an XML processor, or parser, to navigate the data structure and extract data values from
it.
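The activities listed above (input, search, extract, update) can be sketched with the parser in Python's standard library. This is one possible illustration, not the browser or server code used in these tutorials; the department change applied to Beth Baker is invented purely to show an update.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the Personnel structure from Listing 1-1.
PERSONNEL_XML = """
<Personnel>
  <Employee>
    <SSN>222-22-2222</SSN>
    <FirstName>Beth</FirstName>
    <LastName>Baker</LastName>
    <Department>Marketing</Department>
  </Employee>
  <Employee>
    <SSN>333-33-3333</SSN>
    <FirstName>Cecil</FirstName>
    <LastName>Carleton</LastName>
    <Department>Information Technology</Department>
  </Employee>
  <Employee>
    <SSN>444-44-4444</SSN>
    <FirstName>David</FirstName>
    <LastName>Davis</LastName>
    <Department>Information Technology</Department>
  </Employee>
</Personnel>
"""

# Input: parse the text into a navigable tree of elements.
root = ET.fromstring(PERSONNEL_XML)

# Search: find the employees in one department.
it_staff = [
    e for e in root.findall("Employee")
    if e.findtext("Department") == "Information Technology"
]

# Extract: pull individual data values out of the matching records.
it_names = [f"{e.findtext('FirstName')} {e.findtext('LastName')}" for e in it_staff]

# Update: change one value in place (a made-up change for illustration).
for e in root.findall("Employee"):
    if e.findtext("SSN") == "222-22-2222":
        e.find("Department").text = "Accounting"

# Output: serialize the modified tree back to text.
updated_xml = ET.tostring(root, encoding="unicode")
print(it_names)
```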
Browser-based XML Processing
At the browser, XML data can be embedded on a Web page for delivery to the client, or it can
be accessed through a URL that links to an external XML document. Browser scripts written in
JavaScript can perform XML processing, or XML data can be formatted for display with special
style sheets compatible with the data structures.
Server-based XML Processing
At the server, XML data can be created and consumed with server-side technologies such as PHP or
ASP.NET. Both environments allow you to work with XML documents and their data, performing
common read, write, search, update, and format transformations against the information. A
full complement of software classes permits processing of XML data in much the same way as
you would process file and database structures.
These tutorials explore the variety of ways to create, access, process, and output
XML-formatted information. The focus is on both browser-based processing and server-based
processing. The tutorials assume you have familiarity with HTML, CSS style sheets, and
JavaScript programming in the browser.