XML Documents
XML is a tag-based language used to describe and structure data. There are, however,
no prescribed tags to use. As the data designer it is your job to create the
necessary tags to bring order to a set of data.
Hierarchical Data Structures
For the most part, sets of data can be visualized as a hierarchy of data items.
That is, the overall structure of data is composed of logical subunits which, in
turn, are composed of lower-level subunits, working down the hierarchy to the lowest
level of individual data items. If you have worked with databases, for example, you
can recognize a hierarchy in the structure of a database table. A table is a three-level
hierarchy in which the "table" is composed of "records" which, in turn, are composed
of "fields" of data. Consider the following database table structure with three records
and three data fields.
Table (table)
Field 1  | Field 2  | Field 3
data 1.1 | data 1.2 | data 1.3
data 2.1 | data 2.2 | data 2.3
data 3.1 | data 3.2 | data 3.3
This table can be visualized as a hierarchical data structure in the following way, with data
fields contained within (and subordinate to) records, and with records contained within
(and subordinate to) the overall table itself.
Table
    Record
        Field1
        Field2
        Field3
    Record
        Field1
        Field2
        Field3
    Record
        Field1
        Field2
        Field3
Listing 1-2. Hierarchical data structure.
This structure can be visualized graphically as a tree structure (more accurately an inverted
tree with the "trunk" at the top and "leaves" at the bottom) as shown below. The principle is
that data items and their combinations are contained within a hierarchy of logical relationships
that describe the structure of information content.
Figure 1-1. Tree structure.
XML Data Structures
An XML data structure describes this content and these relationships in a similar manner, with XML
tags, or elements, used to name and organize the data. The following XML data structure is a
tree of data elements representing the hierarchical organization of the previous database table.
<Table>
    <Record>
        <Field1>data 1.1</Field1>
        <Field2>data 1.2</Field2>
        <Field3>data 1.3</Field3>
    </Record>
    <Record>
        <Field1>data 2.1</Field1>
        <Field2>data 2.2</Field2>
        <Field3>data 2.3</Field3>
    </Record>
    <Record>
        <Field1>data 3.1</Field1>
        <Field2>data 3.2</Field2>
        <Field3>data 3.3</Field3>
    </Record>
</Table>
Listing 1-3. Hierarchical organization.
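To see that the XML tree carries the same information as the original table, the listing can be read back into rows. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the names TABLE_XML and table_to_rows are illustrative choices, not part of the tutorial.

```python
import xml.etree.ElementTree as ET

# The <Table> document from Listing 1-3, held in a string for illustration.
TABLE_XML = """
<Table>
    <Record>
        <Field1>data 1.1</Field1>
        <Field2>data 1.2</Field2>
        <Field3>data 1.3</Field3>
    </Record>
    <Record>
        <Field1>data 2.1</Field1>
        <Field2>data 2.2</Field2>
        <Field3>data 2.3</Field3>
    </Record>
    <Record>
        <Field1>data 3.1</Field1>
        <Field2>data 3.2</Field2>
        <Field3>data 3.3</Field3>
    </Record>
</Table>
"""


def table_to_rows(xml_text):
    """Return each <Record> as a list of its field values, in document order."""
    root = ET.fromstring(xml_text.strip())
    # Level 1 is <Table>, level 2 is each <Record>, level 3 is each field.
    return [[field.text for field in record] for record in root.findall("Record")]


rows = table_to_rows(TABLE_XML)
for row in rows:
    print(" | ".join(row))
```

Each printed line corresponds to one row of the database table, which suggests that nothing is lost when the table is expressed as a tree.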
As mentioned above, there are no predefined XML tags for identifying and packaging data.
It is up to the data designer to create XML elements and to organize them to best describe
a data structure. If a data structure were to be designed for, say, a set of personnel records,
it would make sense to create elements that name the data items and describe their relationships,
as in the following <Personnel> example.
<Personnel>
    <Employee>
        <SSN>111-11-1111</SSN>
        <FirstName>Ann</FirstName>
        <LastName>Adams</LastName>
        <Salary>65000.00</Salary>
        <Department>Accounting</Department>
    </Employee>
    <Employee>
        <SSN>222-22-2222</SSN>
        <FirstName>Beth</FirstName>
        <LastName>Baker</LastName>
        <Salary>55000.00</Salary>
        <Department>Marketing</Department>
    </Employee>
    <Employee>
        <SSN>333-33-3333</SSN>
        <FirstName>Cecil</FirstName>
        <LastName>Carleton</LastName>
        <Salary>60000.00</Salary>
        <Department>Information Technology</Department>
    </Employee>
    <Employee>
        <SSN>444-44-4444</SSN>
        <FirstName>David</FirstName>
        <LastName>Davis</LastName>
        <Salary>59000.00</Salary>
        <Department>Information Technology</Department>
    </Employee>
    <Employee>
        <SSN>555-55-5555</SSN>
        <FirstName>Ellen</FirstName>
        <LastName>Edwards</LastName>
        <Salary>62000.00</Salary>
        <Department>Information Technology</Department>
    </Employee>
</Personnel>
Listing 1-4. Personnel hierarchical organization.
Here, a <Personnel> element is composed of <Employee> elements, each of which contains the
person's <SSN>, <FirstName>, <LastName>, <Salary>, and <Department> elements. The tags clearly
describe the content and organization of the structure of data. They can be visualized as the
following hierarchical tree structure.
Figure 1-2. Tree structure.
The above data structure is fairly simple. Some structures can become very complex, and XML is
designed to bring order to that complexity. However, in most day-to-day Web activity simple
structures like the above predominate. After all, a large part of XML processing has to do with
converting database tables to XML structures since most Web-processed data reside in databases.
In fact, modern database management systems include features for reading and writing XML data
just for this purpose.
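The conversion from a database table to an XML structure can also be done by hand. The sketch below is one possible approach, not the built-in XML feature of any particular database system: it assumes an in-memory SQLite table named Personnel whose columns match the element names used above, and the helper table_to_xml is a hypothetical name.

```python
import sqlite3
import xml.etree.ElementTree as ET

COLUMNS = ["SSN", "FirstName", "LastName", "Salary", "Department"]


def table_to_xml(connection):
    """Build a <Personnel> tree with one <Employee> element per table row."""
    root = ET.Element("Personnel")
    query = (
        "SELECT SSN, FirstName, LastName, Salary, Department "
        "FROM Personnel ORDER BY SSN"
    )
    for row in connection.execute(query):
        employee = ET.SubElement(root, "Employee")
        # Each column becomes a child element named after the column.
        for name, value in zip(COLUMNS, row):
            ET.SubElement(employee, name).text = str(value)
    return root


# Hypothetical table holding two of the employees from the listing above.
# Salary is stored as text here so that "65000.00" keeps its formatting.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Personnel "
    "(SSN TEXT, FirstName TEXT, LastName TEXT, Salary TEXT, Department TEXT)"
)
connection.executemany(
    "INSERT INTO Personnel VALUES (?, ?, ?, ?, ?)",
    [
        ("111-11-1111", "Ann", "Adams", "65000.00", "Accounting"),
        ("222-22-2222", "Beth", "Baker", "55000.00", "Marketing"),
    ],
)

personnel = table_to_xml(connection)
xml_text = ET.tostring(personnel, encoding="unicode")
print(xml_text)
```

The table name supplies the root element, each row supplies a second-level element, and each column supplies a third-level element, mirroring the three-level hierarchy described earlier.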
Parents and Children
XML element tags and the data items they enclose are often called XML nodes,
packages of data that have structural relationships among one another. The full set of nodes
and their hierarchical relationships is called an XML document.
When working with XML, the relationships between data elements and the logical units to which
they belong are described as parent and child node relationships.
Referring back to the previous <Personnel> document, the <Employee> element is a parent node;
it contains the child nodes <SSN>, <FirstName>, <LastName>, <Salary>, and <Department>.
Likewise, the <Employee> element is a child node relative to the overall <Personnel> parent
node. In other words, a node in the data structure is a child if it is a member of a
higher-level organizing node; a node is a parent if it contains lower-level nodes. The terms
"parent" and "child" are simply shorthand ways of describing the hierarchical relationships
among data elements.
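The parent and child relationships can be inspected in code. The following is a minimal sketch with Python's standard xml.etree.ElementTree module, using a one-employee copy of the <Personnel> document; the variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# A one-employee copy of the <Personnel> document, for illustration.
PERSONNEL_XML = """<Personnel>
    <Employee>
        <SSN>111-11-1111</SSN>
        <FirstName>Ann</FirstName>
        <LastName>Adams</LastName>
        <Salary>65000.00</Salary>
        <Department>Accounting</Department>
    </Employee>
</Personnel>"""

root = ET.fromstring(PERSONNEL_XML)

# ElementTree elements do not store a link to their parent, so build a
# lookup table that maps every child element to the element containing it.
parent_of = {child: parent for parent in root.iter() for child in parent}

employee = root.find("Employee")

# Iterating over an element yields its child nodes in document order.
children = [child.tag for child in employee]
print("Children of <Employee>:", children)

# <Employee> is itself a child of <Personnel>.
print("Parent of <Employee>:", parent_of[employee].tag)
```

In this sketch the same <Employee> element plays both roles: it is a parent of five elements and a child of <Personnel>.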
The term sibling refers to nodes which have the same parent. In the above example,
the child nodes <SSN>, <FirstName>, <LastName>, <Salary>, and <Department> are siblings
(brothers and sisters) since they share the same <Employee> parent element. Likewise, the
<Employee> nodes are siblings of one another since they share the same <Personnel> parent node.
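Sibling relationships follow directly from the parent relationship: two nodes are siblings when the same node contains both. The helper below, siblings, is a hypothetical function written for this sketch, and the document is a shortened two-employee copy of the <Personnel> example.

```python
import xml.etree.ElementTree as ET

# A shortened two-employee copy of the <Personnel> document, for illustration.
PERSONNEL_XML = """<Personnel>
    <Employee>
        <SSN>111-11-1111</SSN>
        <FirstName>Ann</FirstName>
        <LastName>Adams</LastName>
    </Employee>
    <Employee>
        <SSN>222-22-2222</SSN>
        <FirstName>Beth</FirstName>
        <LastName>Baker</LastName>
    </Employee>
</Personnel>"""


def siblings(root, node):
    """Return the other children of node's parent; the root has no siblings."""
    for parent in root.iter():
        if any(child is node for child in parent):
            return [child for child in parent if child is not node]
    return []


root = ET.fromstring(PERSONNEL_XML)
first_employee = root.find("Employee")
first_ssn = first_employee.find("SSN")

ssn_siblings = [node.tag for node in siblings(root, first_ssn)]
employee_siblings = [node.findtext("FirstName") for node in siblings(root, first_employee)]

print("Siblings of the first <SSN>:", ssn_siblings)
print("Sibling of Ann's <Employee> node belongs to:", employee_siblings)
```

Note that the <SSN> of one employee is not a sibling of the <SSN> of another, because the two nodes have different <Employee> parents.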
Every XML document must contain exactly one root node. The root element is the element
encompassing the entire data tree; it is the overall parent element. In the above example,
<Personnel> is the root node since it contains all other parent and child nodes.
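The single-root rule is enforced by XML parsers. As a small illustration, Python's standard xml.etree.ElementTree parser raises ParseError for text that has two top-level elements; the function name is_well_formed and the two sample strings are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as one complete document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# One root element that contains everything else: accepted.
one_root = "<Personnel><Employee/><Employee/></Personnel>"

# Two top-level elements and no enclosing root: rejected by the parser.
two_roots = "<Employee/><Employee/>"

root_tag = ET.fromstring(one_root).tag
print(root_tag, is_well_formed(one_root), is_well_formed(two_roots))
```

Wrapping the loose <Employee> elements in a <Personnel> element is what turns the second string into a valid document.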
The above example shows how a typical database table can be reformulated as an XML data structure.
However, XML structures can represent more complex data relationships, even those that cannot be
conveniently packaged as relational database tables. One of the benefits of XML is that it is not
limited to representing structured information. Semi-structured information -- containing optional,
missing, or alternate data -- can be coded as XML. For these tutorials, though, primacy is given
to structured information since it is the most common occurrence.
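As a brief illustration of semi-structured data, the sketch below uses a hypothetical variation of the <Personnel> document in which one employee has no <Department> element. It relies on the findtext method of Python's standard xml.etree.ElementTree module, whose default argument supplies a value when an element is missing; the placeholder text "(none)" is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the second employee has no <Department> element.
SEMI_STRUCTURED_XML = """<Personnel>
    <Employee>
        <SSN>111-11-1111</SSN>
        <FirstName>Ann</FirstName>
        <LastName>Adams</LastName>
        <Department>Accounting</Department>
    </Employee>
    <Employee>
        <SSN>222-22-2222</SSN>
        <FirstName>Beth</FirstName>
        <LastName>Baker</LastName>
    </Employee>
</Personnel>"""

root = ET.fromstring(SEMI_STRUCTURED_XML)

# findtext returns the default when the requested child element is absent,
# so optional data can be read without special-casing each record.
departments = [
    employee.findtext("Department", default="(none)")
    for employee in root.findall("Employee")
]
print(departments)
```

A relational table would need a null value in the missing column, whereas the XML document simply omits the element.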