XML Documents

XML is a tag-based language used to describe and structure data. There are, however, no prescribed tags to use. As the data designer it is your job to create the necessary tags to bring order to a set of data.

Hierarchical Data Structures

For the most part, sets of data can be visualized as a hierarchy of data items. That is, the overall structure of data is composed of logical subunits which, in turn, are composed of lower-level subunits, working down the hierarchy to the lowest level of individual data items. If you have worked with databases, for example, you can recognize a hierarchy in the structure of a database table. A table is a three-level hierarchy in which the "table" is composed of "records" which, in turn, are composed of "fields" of data. Consider the following database table structure with three records and three data fields.

Table
Field 1     Field 2     Field 3
data 1.1    data 1.2    data 1.3
data 2.1    data 2.2    data 2.3
data 3.1    data 3.2    data 3.3

This table can be visualized as a hierarchical data structure in the following way, with data fields contained within (and subordinate to) records, and with records contained within (and subordinate to) the overall table itself.

Table
    Record
        Field1
        Field2
        Field3
    Record
        Field1
        Field2
        Field3
    Record
        Field1
        Field2
        Field3

Listing 1-2. Hierarchical data structure.

This structure can be visualized graphically as a tree structure (more accurately an inverted tree with the "trunk" at the top and "leaves" at the bottom) as shown below. The principle is that data items and their combinations are contained within a hierarchy of logical relationships that describe the structure of information content.

Figure 1-1. Tree structure.

XML Data Structures

An XML data structure describes this content and these relationships in a similar manner, with XML tags, or elements, used to name and organize the data. The following XML data structure is a tree of data elements representing the hierarchical organization of the previous database table.

<Table>
    <Record>
        <Field1>data 1.1</Field1>
        <Field2>data 1.2</Field2>
        <Field3>data 1.3</Field3>
    </Record>
    <Record>
        <Field1>data 2.1</Field1>
        <Field2>data 2.2</Field2>
        <Field3>data 2.3</Field3>
    </Record>
    <Record>
        <Field1>data 3.1</Field1>
        <Field2>data 3.2</Field2>
        <Field3>data 3.3</Field3>
    </Record>
</Table>

Listing 1-3. Hierarchical organization.
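The mapping from table rows to nested elements can also be produced by a program. The following is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module; the rows list and the table_to_xml function are names made up for this illustration, not part of the tutorial.

```python
import xml.etree.ElementTree as ET

# The three records of the example table, one tuple per record (illustrative data).
rows = [
    ("data 1.1", "data 1.2", "data 1.3"),
    ("data 2.1", "data 2.2", "data 2.3"),
    ("data 3.1", "data 3.2", "data 3.3"),
]


def table_to_xml(rows):
    """Return a <Table> element holding one <Record> per row."""
    table = ET.Element("Table")
    for row in rows:
        record = ET.SubElement(table, "Record")
        # Name the fields Field1, Field2, ... in column order.
        for number, value in enumerate(row, start=1):
            field = ET.SubElement(record, f"Field{number}")
            field.text = value
    return table


table = table_to_xml(rows)
ET.indent(table, space="    ")  # pretty-printing only; needs Python 3.9 or later
print(ET.tostring(table, encoding="unicode"))
```

Each level of the loop corresponds to one level of the hierarchy: the outer loop creates records inside the table, and the inner loop creates fields inside a record.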

As mentioned above, there are no predefined XML tags for identifying and packaging data. It is up to the data designer to create XML elements and to organize them to best describe a data structure. If a data structure were to be designed for, say, a set of personnel records, it would make sense to create elements that name the data items and describe their relationships, as in the following <Personnel> example.

<Personnel>
    <Employee>
        <SSN>111-11-1111</SSN>
        <FirstName>Ann</FirstName>
        <LastName>Adams</LastName>
        <Salary>65000.00</Salary>
        <Department>Accounting</Department>
    </Employee>
    <Employee>
        <SSN>222-22-2222</SSN>
        <FirstName>Beth</FirstName>
        <LastName>Baker</LastName>
        <Salary>55000.00</Salary>
        <Department>Marketing</Department>
    </Employee>
    <Employee>
        <SSN>333-33-3333</SSN>
        <FirstName>Cecil</FirstName>
        <LastName>Carleton</LastName>
        <Salary>60000.00</Salary>
        <Department>Information Technology</Department>
    </Employee>
    <Employee>
        <SSN>444-44-4444</SSN>
        <FirstName>David</FirstName>
        <LastName>Davis</LastName>
        <Salary>59000.00</Salary>
        <Department>Information Technology</Department>
    </Employee>
    <Employee>
        <SSN>555-55-5555</SSN>
        <FirstName>Ellen</FirstName>
        <LastName>Edwards</LastName>
        <Salary>62000.00</Salary>
        <Department>Information Technology</Department>
    </Employee>
</Personnel>

Listing 1-4. Personnel hierarchical organization.

Here, a <Personnel> element is composed of <Employee> elements, each of which contains the person's <SSN>, <FirstName>, <LastName>, <Salary>, and <Department> elements. The tags clearly describe the content and organization of the structure of data. They can be visualized as the following hierarchical tree structure.

Figure 1-2. Tree structure.
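Because the tags name each data item, a program can read the structure back much as it would read database records. The following is a rough sketch in Python using the standard-library xml.etree.ElementTree module; the document is shortened to two employees to save space, and the variable names are chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the <Personnel> document (first two employees only).
personnel_xml = """
<Personnel>
    <Employee>
        <SSN>111-11-1111</SSN>
        <FirstName>Ann</FirstName>
        <LastName>Adams</LastName>
        <Salary>65000.00</Salary>
        <Department>Accounting</Department>
    </Employee>
    <Employee>
        <SSN>222-22-2222</SSN>
        <FirstName>Beth</FirstName>
        <LastName>Baker</LastName>
        <Salary>55000.00</Salary>
        <Department>Marketing</Department>
    </Employee>
</Personnel>
"""

personnel = ET.fromstring(personnel_xml)

# Turn each <Employee> into a dictionary keyed by the names of its child elements.
employees = [
    {field.tag: field.text for field in employee}
    for employee in personnel.findall("Employee")
]

for employee in employees:
    # Element text is always a string; convert the salary before doing arithmetic.
    print(employee["LastName"], employee["Department"], float(employee["Salary"]))
```

Each dictionary plays the role of one record, with the element names serving as field names.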

The above data structure is fairly simple. Some structures can become very complex, and XML is designed to bring order to that complexity. However, in most day-to-day Web activity, simple structures like the above predominate. After all, a large part of XML processing has to do with converting database tables to XML structures, since most Web-processed data reside in databases. In fact, modern database management systems include features for reading and writing XML data for just this purpose.

Parents and Children

XML element tags and the data items they enclose are often called XML nodes, packages of data that have structural relationships among one another. The full set of nodes and their hierarchical relationships is called an XML document.

When working with XML, the relationships between data elements and the logical units to which they belong are described as parent and child node relationships. Referring back to the previous <Personnel> document, the <Employee> element is a parent node; it contains the child nodes <SSN>, <FirstName>, <LastName>, <Salary>, and <Department>. Likewise, the <Employee> element is a child node relative to the overall <Personnel> parent node. In other words, a node in the data structure is a child if it is a member of a higher-level organizing node; a node is a parent if it contains lower-level nodes. The terms "parent" and "child" are simply shorthand ways of describing the hierarchical relationships among data elements.

The term sibling refers to nodes which have the same parent. In the above example, the child nodes <SSN>, <FirstName>, <LastName>, <Salary>, and <Department> of a given employee are siblings (brothers and sisters) since they share the same <Employee> parent element. Likewise, the <Employee> nodes are siblings of one another since they share the same <Personnel> parent node.
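The parent, child, and sibling relationships can be explored in code. The following is a minimal sketch in Python with the standard-library xml.etree.ElementTree module. Its elements do not keep a reference to their parent, so the sketch first builds a child-to-parent lookup; the names parent_of and siblings are invented for this illustration, and the document is abbreviated.

```python
import xml.etree.ElementTree as ET

# An abbreviated <Personnel> document with two employees and two fields each.
doc = ET.fromstring(
    "<Personnel>"
    "<Employee><SSN>111-11-1111</SSN><LastName>Adams</LastName></Employee>"
    "<Employee><SSN>222-22-2222</SSN><LastName>Baker</LastName></Employee>"
    "</Personnel>"
)

# Map every child element to the element that directly contains it.
parent_of = {child: parent for parent in doc.iter() for child in parent}


def siblings(node):
    """Return the other nodes that share this node's parent."""
    parent = parent_of.get(node)
    if parent is None:  # the root node has no parent, hence no siblings
        return []
    return [other for other in parent if other is not node]


first_employee = doc[0]
ssn = first_employee.find("SSN")

print(parent_of[ssn].tag)                         # Employee
print([n.tag for n in siblings(ssn)])             # ['LastName']
print([n.tag for n in siblings(first_employee)])  # ['Employee']
```

Note that the sibling of the first <Employee> node is the other <Employee> node, while <Personnel> is their common parent.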

Every XML document must contain exactly one root node. The root element is the tag encompassing the entire data tree; it is the overall parent element. In the above example, <Personnel> is the root node since it contains all other parent and child nodes.
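The single-root rule is enforced by XML parsers. As a small sketch, assuming Python's standard-library xml.etree.ElementTree parser, the helper below (is_well_formed is a name made up for this example) reports whether a string parses as an XML document.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as an XML document, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


one_root = "<Personnel><Employee/><Employee/></Personnel>"
two_roots = "<Employee/><Employee/>"  # no enclosing root element

print(is_well_formed(one_root))   # True
print(is_well_formed(two_roots))  # False
```

The second string is rejected because its two <Employee> elements are not enclosed by a common root element.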

The above example shows how a typical database table can be reformulated as an XML data structure. However, XML structures can represent more complex data relationships, even those that cannot be conveniently packaged as relational database tables. One of the benefits of XML is that it is not limited to representing structured information. Semi-structured information -- containing optional, missing, or alternate data -- can be coded as XML. For these tutorials, though, primacy is given to structured information since it is the most common occurrence.
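As an illustration of semi-structured data, the sketch below assumes a hypothetical variant of the <Personnel> document in which the second employee has no <Department> element. It uses Python's standard-library xml.etree.ElementTree module, whose findtext method accepts a default value for a missing child; the "(unassigned)" placeholder is an arbitrary choice for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant: the second <Employee> omits the optional <Department>.
doc = ET.fromstring("""
<Personnel>
    <Employee>
        <LastName>Adams</LastName>
        <Department>Accounting</Department>
    </Employee>
    <Employee>
        <LastName>Baker</LastName>
    </Employee>
</Personnel>
""")

departments = []
for employee in doc.findall("Employee"):
    # findtext returns the default when the child element is absent.
    department = employee.findtext("Department", default="(unassigned)")
    departments.append(department)
    print(employee.findtext("LastName"), department)
```

A relational table would need an empty or null column value for the second record, whereas the XML document simply leaves the element out.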