eXtensible Markup Language

The eXtensible Markup Language (XML) is used to describe data and its structure. Its purpose is to provide a method for packaging data and transmitting it between computers. In one sense XML is no different in purpose from text files and databases: it is used to create data stores. Its significance, though, lies in establishing a common data formatting standard that is recognizable and shareable among widely diverse computer systems. In addition, because it is text based, it permits easy data exchange across the Web using the common HTTP protocol.

XML Markup

XML is a data markup language that, like HTML, uses tags to describe the structure of data. Unlike the HTML Web page markup language, it does not have predefined tags with special meanings. XML tags are created by the data designer to fit the purpose of the data structure. Different sets of tags would be used, for instance, to describe a memo, a letter, a book chapter, a financial statement, an email message, a legal document, a restaurant menu, a personnel record, a course catalog, a driver's license, a television listing, or a thousand other data collections.
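Because XML has no predefined vocabulary, a program can build a document using whatever tag names suit the data. The sketch below uses Python's standard xml.etree.ElementTree module purely for illustration (these tutorials otherwise work in JavaScript and server-side languages) to build a memo. The tag names <Memo>, <To>, <From>, <Subject>, and <Body> are hypothetical, invented for this example.

```python
import xml.etree.ElementTree as ET

def build_memo(to: str, sender: str, subject: str, body: str) -> str:
    """Build a memo using tag names chosen by the data designer (hypothetical)."""
    memo = ET.Element("Memo")
    # Each child element's tag name is invented to fit the memo's content.
    for tag, text in (("To", to), ("From", sender), ("Subject", subject), ("Body", body)):
        ET.SubElement(memo, tag).text = text
    # Serialize to a Unicode string rather than bytes.
    return ET.tostring(memo, encoding="unicode")

print(build_memo("Beth Baker", "Ann Adams", "Budget", "Please review the figures."))
# <Memo><To>Beth Baker</To><From>Ann Adams</From><Subject>Budget</Subject><Body>Please review the figures.</Body></Memo>
```

A different data collection, such as a restaurant menu, would simply use a different set of tag names with the same API.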

As you might recognize from the list above, data structures of all kinds can be represented in XML. It is not limited to linear lines of text, as is common in text files; neither is it limited to relational information organized into the rows and columns of database tables. Virtually any structure of data can be represented by XML. Common to all of these data structures, however, are the XML standards that make them accessible to, and transmittable between, any computers on the Internet.
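To illustrate a structure that does not fit neatly into rows and columns, the following Python sketch parses a small, hypothetical course catalog in which each course holds a variable number of sections, and walks the resulting tree recursively. The element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: courses contain differing numbers of nested sections.
CATALOG = """<Catalog>
  <Course>
    <Code>CS101</Code>
    <Title>Intro to Programming</Title>
    <Section><Days>MWF</Days><Time>09:00</Time></Section>
    <Section><Days>TR</Days><Time>13:00</Time></Section>
  </Course>
  <Course>
    <Code>CS201</Code>
    <Title>Data Structures</Title>
    <Section><Days>MWF</Days><Time>11:00</Time></Section>
  </Course>
</Catalog>"""

def outline(element: ET.Element, depth: int = 0) -> list[str]:
    """Return one indented line per element, reflecting the tree's nesting."""
    label = element.tag
    if element.text and element.text.strip():
        label += ": " + element.text.strip()
    lines = ["  " * depth + label]
    for child in element:  # recurse into each child element in document order
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(ET.fromstring(CATALOG))))
```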

XML Data Structures

Much of the information that needs to be shareable between computers does, of course, reside in databases. The following table shows the structure of information in a "Personnel" table containing "Employee" records, each of which is composed of an "SSN" field, a "FirstName" field, a "LastName" field, a "Salary" field, and a "Department" field.

**Personnel** (table)
SSN	FirstName	LastName	Salary	Department
111-11-1111	Ann	Adams	65000.00	Accounting
222-22-2222	Beth	Baker	55000.00	Marketing
333-33-3333	Cecil	Carleton	60000.00	Information Technology
444-44-4444	David	Davis	59000.00	Information Technology
555-55-5555	Ellen	Edwards	62000.00	Accounting

This information is easily accessible within an internal processing environment composed of similar hardware and software. Processing routines use known access methods to reach the known location of the database through established permissions. However, problems arise when this information is needed externally. Users at remote locations, especially those outside the organization, may not have the knowledge or permissions to access it. Access is private to the data's owner, yet the information is needed by the client. The issue, then, is how to make this information available to outside clients without giving them access to the internal system.

One solution is an XML data structure. Internal processing routines can privately select and gather the needed information, package it as a simple text-based data structure while retaining the data relationships, and ship the package of information to the client across the Web. The client has access to the information without needing knowledge of, or permissions to, the original data store, even if the client uses different hardware and software. The XML data is in a common format accessible through the common protocols of the Web.
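The internal routine described above can be sketched in Python, using the standard sqlite3 module as a stand-in for the organization's database (an assumption; any database access layer would serve). It selects only the fields permissible for outside viewing, leaving Salary out, and packages each row as an <Employee> element inside a <Personnel> root.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Fields the owner has decided outside clients may see (Salary is excluded).
PUBLIC_FIELDS = ("SSN", "FirstName", "LastName", "Department")

def personnel_to_xml(conn: sqlite3.Connection) -> str:
    """Select the permissible Personnel columns and package them as XML text."""
    root = ET.Element("Personnel")
    # The column list comes from a fixed constant, not from user input.
    query = f"SELECT {', '.join(PUBLIC_FIELDS)} FROM Personnel ORDER BY SSN"
    for row in conn.execute(query):
        employee = ET.SubElement(root, "Employee")
        for field, value in zip(PUBLIC_FIELDS, row):
            ET.SubElement(employee, field).text = str(value)
    ET.indent(root)  # pretty-print with two-space indentation (Python 3.9+)
    return ET.tostring(root, encoding="unicode")

# Build an in-memory copy of part of the Personnel table for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Personnel (SSN TEXT, FirstName TEXT, LastName TEXT, "
             "Salary REAL, Department TEXT)")
conn.executemany("INSERT INTO Personnel VALUES (?, ?, ?, ?, ?)", [
    ("111-11-1111", "Ann", "Adams", 65000.00, "Accounting"),
    ("222-22-2222", "Beth", "Baker", 55000.00, "Marketing"),
])
print(personnel_to_xml(conn))
```

The resulting text has the same shape as the listing that follows, so it could be written to a file or sent over HTTP without exposing the database itself.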

It is easy to represent the above database table as an XML data structure. As shown below, the database table name <Personnel> is used as the enclosing tag for the entire structure, each record of which is identified by an <Employee> tag. Database field names have been applied as the XML tag names <SSN>, <FirstName>, <LastName>, and <Department> to identify the data elements corresponding to the database fields that are permissible for outside viewing. The Salary field is deliberately left out because it is not meant to be seen by outside clients.

<Personnel>
    <Employee>
        <SSN>111-11-1111</SSN>
        <FirstName>Ann</FirstName>
        <LastName>Adams</LastName>
        <Department>Accounting</Department>
    </Employee>
    <Employee>
        <SSN>222-22-2222</SSN>
        <FirstName>Beth</FirstName>
        <LastName>Baker</LastName>
        <Department>Marketing</Department>
    </Employee>
    <Employee>
        <SSN>333-33-3333</SSN>
        <FirstName>Cecil</FirstName>
        <LastName>Carleton</LastName>
        <Department>Information Technology</Department>
    </Employee>
    <Employee>
        <SSN>444-44-4444</SSN>
        <FirstName>David</FirstName>
        <LastName>Davis</LastName>
        <Department>Information Technology</Department>
    </Employee>
    <Employee>
        <SSN>555-55-5555</SSN>
        <FirstName>Ellen</FirstName>
        <LastName>Edwards</LastName>
        <Department>Accounting</Department>
    </Employee>
</Personnel>

Listing 1-1. XML data structure.

This data structure can be saved as a simple text document, making it available to any Web page that needs to retrieve the information. In fact, it has been saved as the file Personnel.xml in the same directory as this Web page, and opening that file in a browser displays the XML document directly.

In a browser's default view of the data, elements may be preceded by "+" and "-" symbols for expanding and collapsing the structure; the exact display varies by browser. The output isn't particularly exciting, granted, but the structured information is now available to any computer on the World Wide Web simply by requesting the associated URL.
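Any program that can issue an HTTP request can retrieve the document in the same way a browser does. A minimal Python sketch, assuming the file is reachable at some URL (the address in the comment is a placeholder, not a real location), fetches it with urllib.request and parses it with ElementTree.

```python
import urllib.request
import xml.etree.ElementTree as ET

def fetch_last_names(url: str) -> list[str]:
    """Download an XML document from a URL and return every <LastName> value."""
    with urllib.request.urlopen(url) as response:
        root = ET.fromstring(response.read())  # parse the raw bytes
    # ".//LastName" matches LastName elements at any depth below the root.
    return [element.text for element in root.iterfind(".//LastName")]

# Placeholder URL: substitute the actual location of Personnel.xml.
# print(fetch_last_names("https://example.com/Personnel.xml"))
```

urlopen also accepts file: URLs, which is convenient for trying the function against a local copy of the document.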

It was mentioned above that XML can represent a wide variety of data structures, not just those that represent relational data. A slightly more complex example is a portion of this Web page itself coded in XML.

It is important to know that XML tags represent the structure of the information they contain rather than its layout or styling. In the XML code for this Web page, <Head1>, <Head2>, and <Paragraph> tags, along with the enclosed <Personnel> information, identify sections of content; they do not imply how the content looks or even whether the information will become a Web page. The XML structure only identifies information content and its internal relationships. Perhaps this content will become a Web page; perhaps it will be written to CD-ROM, input to a typesetter for printing, or used simply to extract subsets of information for subsequent processing. The information is in a format suitable for a wide variety of uses.
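Because the tags identify content rather than appearance, a program can pull out just the parts it needs. The Python sketch below assumes a hypothetical fragment of this page marked up with the <Head1>, <Head2>, and <Paragraph> tags named above (the enclosing <Page> tag and the placeholder paragraph text are invented) and builds a table of contents from the headings alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML version of part of this page.
PAGE = """<Page>
  <Head1>eXtensible Markup Language</Head1>
  <Paragraph>(paragraph text)</Paragraph>
  <Head2>XML Markup</Head2>
  <Paragraph>(paragraph text)</Paragraph>
  <Head2>XML Data Structures</Head2>
  <Paragraph>(paragraph text)</Paragraph>
  <Personnel>
    <Employee><FirstName>Ann</FirstName><LastName>Adams</LastName></Employee>
  </Personnel>
</Page>"""

def table_of_contents(page_xml: str) -> list[str]:
    """Return the headings in document order, indenting Head2 under Head1."""
    root = ET.fromstring(page_xml)
    entries = []
    for element in root:  # only the direct children of <Page>
        if element.tag == "Head1":
            entries.append(element.text)
        elif element.tag == "Head2":
            entries.append("  " + element.text)
    return entries

print("\n".join(table_of_contents(PAGE)))
```

The paragraphs and the embedded <Personnel> data are simply ignored; another program could just as easily extract only the employee records from the same document.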

Certainly, much of XML processing involves formatting the information for Web page display. For example, applying a style sheet to the marked-up information can transform the XML version of this page into a formatted Web page.
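A style sheet performs this kind of mapping declaratively. Python's standard library has no XSLT processor, so the sketch below hand-codes an equivalent mapping instead, turning the hypothetical <Head1>, <Head2>, and <Paragraph> tags into HTML <h1>, <h2>, and <p> elements. It illustrates the idea of a transformation and is not a substitute for a real style sheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from content tags to HTML presentation tags.
TAG_MAP = {"Head1": "h1", "Head2": "h2", "Paragraph": "p"}

def to_html(page_xml: str) -> str:
    """Transform content-oriented XML into a simple HTML body."""
    source = ET.fromstring(page_xml)
    body = ET.Element("body")
    for element in source:
        html_tag = TAG_MAP.get(element.tag)
        if html_tag is None:
            continue  # elements without a mapping are left out of this view
        ET.SubElement(body, html_tag).text = element.text
    return ET.tostring(body, encoding="unicode", method="html")

sample = "<Page><Head1>XML</Head1><Paragraph>Structured content.</Paragraph></Page>"
print(to_html(sample))
# <body><h1>XML</h1><p>Structured content.</p></body>
```

Swapping in a different TAG_MAP would produce a different presentation from the same content, which is the point of separating structure from styling.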

A later tutorial describes how to format XML data for Web page display. At present, just keep in mind that XML information represents structured content that can be transformed and used for a variety of purposes.

XML Processing

The usefulness of an XML data structure lies in the kinds of information processing activities that can be applied to it. Although it is important to be able to transmit data in a common format among different computer systems, the enclosed information must also be accessible to computer programs so that they can input the structure, search it for needed information, extract data items for processing, update the structure with changed information, and display selected information on a Web page or in other formats.
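Each of these activities takes only a few lines of code once the document is parsed into a tree. The following Python sketch uses ElementTree on a trimmed copy of Listing 1-1; the department change it makes is invented for illustration.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of Listing 1-1 (two of the five employees).
PERSONNEL = """<Personnel>
  <Employee>
    <SSN>333-33-3333</SSN><FirstName>Cecil</FirstName>
    <LastName>Carleton</LastName><Department>Information Technology</Department>
  </Employee>
  <Employee>
    <SSN>555-55-5555</SSN><FirstName>Ellen</FirstName>
    <LastName>Edwards</LastName><Department>Accounting</Department>
  </Employee>
</Personnel>"""

root = ET.fromstring(PERSONNEL)  # input the structure

# Search: find employees in a given department.
it_staff = [e for e in root.iter("Employee")
            if e.findtext("Department") == "Information Technology"]

# Extract: pull individual data items out of the matching elements.
names = [f"{e.findtext('FirstName')} {e.findtext('LastName')}" for e in it_staff]
print(names)  # ['Cecil Carleton']

# Update: change a value in place (hypothetical transfer of Ellen Edwards).
for e in root.iter("Employee"):
    if e.findtext("SSN") == "555-55-5555":
        e.find("Department").text = "Marketing"

# Output: serialize the modified structure back to text.
print(ET.tostring(root, encoding="unicode"))
```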

All of these typical information processing activities can be applied to XML data structures. This processing can take place at the browser or at the server, using an XML processor, or parser, to navigate the data structure and extract data values from it.
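A parser turns the text into a tree of nodes that a program can navigate. The W3C DOM interface that browser scripts use is also available in other languages; the sketch below uses Python's xml.dom.minidom, whose getElementsByTagName, firstChild, and nodeValue carry the same names as their browser JavaScript counterparts, although the surrounding code differs.

```python
from xml.dom import minidom

XML_TEXT = ("<Personnel><Employee><SSN>111-11-1111</SSN>"
            "<LastName>Adams</LastName></Employee>"
            "<Employee><SSN>222-22-2222</SSN>"
            "<LastName>Baker</LastName></Employee></Personnel>")

def ssn_by_last_name(xml_text: str) -> dict[str, str]:
    """Navigate the parsed DOM tree and map each LastName to its SSN."""
    document = minidom.parseString(xml_text)  # the parser builds a node tree
    result = {}
    for employee in document.getElementsByTagName("Employee"):
        # Each element's value is held in its first child, a text node.
        last = employee.getElementsByTagName("LastName")[0].firstChild.nodeValue
        ssn = employee.getElementsByTagName("SSN")[0].firstChild.nodeValue
        result[last] = ssn
    return result

print(ssn_by_last_name(XML_TEXT))  # {'Adams': '111-11-1111', 'Baker': '222-22-2222'}
```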

Browser-based XML Processing

At the browser, XML data can be embedded on a Web page for delivery to the client, or it can be accessed through a URL that links to an external XML document. Browser scripts written in JavaScript can perform XML processing, or XML data can be formatted for display with style sheets designed for the data structures.

Server-based XML Processing

At the server, XML data can be created and consumed with server-side technologies such as PHP or ASP.NET. Both environments allow you to work with XML documents and their data, performing common read, write, search, update, and format transformations against the information. A full complement of software classes permits processing of XML data in much the same way as you would process files and database structures.
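PHP and ASP.NET each provide their own XML classes, which are not shown here. As a rough analogue, the following Python sketch writes an XML document to disk with an XML declaration and reads it back, much as a server program might persist and reload a data store. The file name and record fields are illustrative assumptions.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def save_personnel(path: Path, employees: list[dict[str, str]]) -> None:
    """Write employee records to an XML file, including an XML declaration."""
    root = ET.Element("Personnel")
    for record in employees:
        employee = ET.SubElement(root, "Employee")
        for field, value in record.items():
            ET.SubElement(employee, field).text = value
    ET.indent(root)  # readable indentation (Python 3.9+)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

def load_departments(path: Path) -> list[str]:
    """Read the XML file back and return each employee's department."""
    tree = ET.parse(path)
    return [e.findtext("Department") for e in tree.getroot().iter("Employee")]

with tempfile.TemporaryDirectory() as folder:
    xml_file = Path(folder) / "Personnel.xml"
    save_personnel(xml_file, [
        {"SSN": "111-11-1111", "LastName": "Adams", "Department": "Accounting"},
        {"SSN": "222-22-2222", "LastName": "Baker", "Department": "Marketing"},
    ])
    print(xml_file.read_text(encoding="utf-8").splitlines()[0])
    print(load_departments(xml_file))  # ['Accounting', 'Marketing']
```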

These tutorials explore the variety of ways to create, access, process, and output XML-formatted information. The focus is on both browser-based processing and server-based processing. The tutorials assume you are familiar with HTML, CSS style sheets, and JavaScript programming in the browser.