Introduction to DTD
A Document Type Definition (DTD) is a document that specifies the legal building blocks
for an XML file. The DTD defines the structure of an XML file and contains a list of
elements and attributes that can make up the XML document.
Since XML allows users to structure, store, and transport data from all sorts of
applications, it is important to ensure that the shared data is in a format that
can be used and properly handled by all parties. The use of a DTD is important
because it enables independent groups of people to agree to use a standard DTD for
interchanging data. Also, applications can use a standard DTD to verify that data
received from the outside world is valid.
A DTD specifies, in effect, the syntax of an XML file. XML files that are well formed
and follow the syntax rules specified by a DTD are said to be valid.
Recall, an XML document can contain any number of elements or tags. An XML element
is everything from the element's start tag (<element>)
to the element's end tag (</element>). An element can
contain other elements, simple text or a mixture of both. Elements can also have
attributes. In order to define the legal elements that an XML file can contain,
the DTD ELEMENT declaration is used.
In a DTD, elements are declared using the following syntax:
<!ELEMENT element-name (element-content)>
Listing 2-1. DTD element declaration.
element-name refers to the name of the XML element, and element-content refers to
the contents of the element. The content can include text or child elements.
An example is shown below:
Listing 2-2. DTD elements.
The XML file above includes an Employee element.
The Employee element contains five child elements.
Each child element contains text data (a specific first name, last name,
salary, and so on). The DTD below is used to validate this document:
<!ELEMENT Employee (SSN,FirstName,LastName,Salary,Department)>
<!ELEMENT SSN (#PCDATA)>
<!ELEMENT FirstName (#PCDATA)>
<!ELEMENT LastName (#PCDATA)>
<!ELEMENT Salary (#PCDATA)>
<!ELEMENT Department (#PCDATA)>
Listing 2-3. Employee child elements.
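As a rough illustration, a document that these declarations would accept might look like the sketch below. All element values are hypothetical placeholders and are not taken from the original listing. The declarations are placed in an internal DOCTYPE subset only so that the snippet is self-contained, and a Department declaration is included because Employee lists Department as a child.

```xml
<?xml version="1.0"?>
<!-- Hypothetical sample: every value below is a placeholder -->
<!DOCTYPE Employee [
  <!-- Parent element: the five children must appear in this order -->
  <!ELEMENT Employee (SSN,FirstName,LastName,Salary,Department)>
  <!-- Each child holds text only -->
  <!ELEMENT SSN (#PCDATA)>
  <!ELEMENT FirstName (#PCDATA)>
  <!ELEMENT LastName (#PCDATA)>
  <!ELEMENT Salary (#PCDATA)>
  <!ELEMENT Department (#PCDATA)>
]>
<Employee>
  <SSN>000-00-0000</SSN>
  <FirstName>Jane</FirstName>
  <LastName>Doe</LastName>
  <Salary>50000</Salary>
  <Department>Engineering</Department>
</Employee>
```

Swapping two children (for example, putting LastName before FirstName) would leave the document well formed, but it would no longer be valid against this content model.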
For root or parent elements (those elements that contain other elements),
element-content is specified as a comma-separated list of the child
elements, in the order in which they must appear inside of the parent element.
element-content is defined as #PCDATA
(parsed character data) when only text is found between the element's start tag and
end tag. The parser examines #PCDATA text for markup and entity references, so
literal < and & characters in the text must be written as &lt; and &amp;.
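Because #PCDATA is parsed, special characters in the text have to be written as entity references. The sketch below assumes a Department element whose value contains an ampersand. The value itself is hypothetical, and the internal DOCTYPE subset is used only to keep the snippet self-contained.

```xml
<?xml version="1.0"?>
<!DOCTYPE Department [
  <!-- Text-only element: its content is parsed character data -->
  <!ELEMENT Department (#PCDATA)>
]>
<!-- The parser replaces &amp; with a literal "&" in the element's text -->
<Department>Research &amp; Development</Department>
```

Writing a bare & in place of &amp; here would make the document not well formed, since the parser would try to read what follows as an entity reference.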
When declaring child elements, it is also possible to specify the number of times
the element can occur within the parent or root element. The "*" sign is used to
declare that a child element can occur zero or more times inside of a parent element.
The "?" sign declares that the child element can occur zero or one time inside of
the parent element. The "+" sign states that the child element must occur one or
more times inside of the parent element. A child element listed without any of
these signs must occur exactly once.
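The three occurrence signs can be sketched together in one small DTD. Each sign is written immediately after the name of the child element it applies to. The Company, MiddleName, and Phone elements and all values below are hypothetical, introduced only for this illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE Company [
  <!-- "+" : a Company must contain one or more Employee elements -->
  <!ELEMENT Company (Employee+)>
  <!-- "?" : MiddleName may appear zero or one time -->
  <!-- "*" : Phone may appear zero or more times -->
  <!-- FirstName and LastName carry no sign, so each appears exactly once -->
  <!ELEMENT Employee (FirstName,MiddleName?,LastName,Phone*)>
  <!ELEMENT FirstName (#PCDATA)>
  <!ELEMENT MiddleName (#PCDATA)>
  <!ELEMENT LastName (#PCDATA)>
  <!ELEMENT Phone (#PCDATA)>
]>
<Company>
  <!-- No MiddleName and no Phone: still valid -->
  <Employee>
    <FirstName>Jane</FirstName>
    <LastName>Doe</LastName>
  </Employee>
  <!-- One MiddleName and two Phone elements: also valid -->
  <Employee>
    <FirstName>John</FirstName>
    <MiddleName>Q</MiddleName>
    <LastName>Public</LastName>
    <Phone>555-0100</Phone>
    <Phone>555-0101</Phone>
  </Employee>
</Company>
```

A Company with no Employee children, or an Employee with two MiddleName elements, would fail validation against these declarations.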