XML Schemas
What is a Schema
Like a DTD, a Schema is used to define the legal building blocks for an XML file.
Unlike DTDs, Schemas are based on XML. Schemas can be viewed as an alternative to
DTDs. Because of their flexibility, it is likely that Schemas will replace DTDs in
the future: unlike DTDs, they are richer and more powerful, they are written in
XML, they support data types, and they are extensible to future additions.
The XML Schema language is also referred to as XML Schema Definition (XSD). This
section will introduce the major elements that make up a Schema Definition and
demonstrate how to use a Schema with an XML file.
The <schema> Element
The <schema> element is the root element of every XML Schema Definition (XSD).
A sample Schema definition is shown below:
sample.xsd
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Schema content... -->
</xsd:schema>
Listing 3-1. Sample Schema definition.
The <schema> tag includes a namespace attribute. The namespace indicates that
the elements and data types used in the schema come from the
"http://www.w3.org/2001/XMLSchema" namespace. It also specifies that elements
and data types from that namespace are written with the xsd prefix. It is a
convention to use xsd or xs as the prefix for the XML Schema namespace, but the
choice is up to the author. A prefix such as ABC for the XML Schema namespace is
legal, but it tells the reader nothing. Using meaningful namespace prefixes adds
clarity to the XML document.
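To illustrate that the prefix is only a local label, here is a minimal sketch of
the schema from Listing 3-1 rewritten with the xs prefix. The content placeholder
is kept as in the original listing; nothing else is assumed.

```xml
<?xml version="1.0"?>
<!-- Same namespace as Listing 3-1; only the prefix differs (xs instead of xsd). -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Schema content... -->
</xs:schema>
```

A namespace-aware processor should treat both forms the same way, because it
compares the namespace name, not the prefix.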
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
Listing 3-2. XML Schema namespace.
In the sample code above, xmlns acts like a reserved word that is used only to
declare a namespace. In other words, xmlns is used for binding namespaces and
cannot be used as an ordinary prefix. Therefore, the above example is read as
binding the prefix "xsd" to the namespace "http://www.w3.org/2001/XMLSchema".
The prefix is then used to identify all of the elements that make up the schema.
One of the primary motivations for defining an XML namespace is to avoid naming
conflicts when using and re-using multiple vocabularies. XML Schema is used to
create a vocabulary for an XML instance, and uses namespaces heavily. Thus,
having a sound grasp of the namespace concept is essential for understanding
XML Schema and instance validation overall.
Referencing a Schema
After a Schema definition is created, it can be associated with an XML file.
Such a file is referred to as an instance of the Schema. A sample XML document,
including a Schema reference, is shown below:
<?xml version="1.0" ?>
<TreeInventory xmlns="http://www.trees.com"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.trees.com trees.xsd">
<Tree tid="1">
<ItemNumber>1047</ItemNumber>
<TreeName>
<CommonName>Colorado Blue Spruce</CommonName>
<ScientificName>Picea pungens</ScientificName>
</TreeName>
<Description>A magnificent sight of silver blue-green spruce.</Description>
<Price>1.99</Price>
<Quantity>50</Quantity>
<Type>Evergreen</Type>
<Picture>1047.jpg</Picture>
</Tree>
</TreeInventory>
Listing 3-3. Referencing a Schema.
The reference to the Schema file is added to the opening tag of the root element
of the XML file. The first part, xmlns="http://www.trees.com", is the default
namespace declaration. This declaration tells the schema validator that all the
elements used in this XML document are declared in the "http://www.trees.com"
namespace. XML namespaces are used for providing uniquely named elements and
attributes in an XML instance. They are defined by a W3C recommendation called
Namespaces in XML. An XML instance may contain element or attribute names from
more than one XML vocabulary. If each vocabulary is given a namespace, then the
ambiguity between identically named elements or attributes can be resolved.
In the example above, the XML instance contains references to a tree name and a
type. Both the TreeName and the Type element could have a child element "id".
References to the element "id" would therefore be ambiguous unless the two
identically named but semantically different elements were brought under
namespaces that would differentiate them.
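The idea can be sketched as follows. The two namespace names, the prefixes names
and types, and the id values are hypothetical and chosen only for illustration;
they do not appear in the tree inventory example.

```xml
<?xml version="1.0"?>
<!-- Hypothetical namespaces: one vocabulary for tree names, one for tree types. -->
<Tree xmlns:names="http://www.trees.com/names"
      xmlns:types="http://www.trees.com/types">
  <TreeName>
    <!-- This id belongs to the "names" vocabulary (made-up value). -->
    <names:id>N-1047</names:id>
    <CommonName>Colorado Blue Spruce</CommonName>
  </TreeName>
  <Type>
    <!-- Same local name, different namespace, so there is no clash (made-up value). -->
    <types:id>T-3</types:id>
  </Type>
</Tree>
```

Both elements have the local name id, but each is qualified by a different
namespace, so a namespace-aware processor can tell them apart.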
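The second part, xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance", binds
the xsi prefix to the XML Schema instance namespace, which makes the
xsi:schemaLocation attribute available. The third part, xsi:schemaLocation,
holds a pair of values: the first value is the namespace, and the second value
is the location of the XML schema to use for that namespace:
xsi:schemaLocation="http://www.trees.com trees.xsd"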
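The file trees.xsd itself is not shown in this section. As a rough sketch only,
a schema that matches the instance in Listing 3-3 might look like the following.
The type name TreeType, the chosen data types, and the use of
elementFormDefault="qualified" are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical trees.xsd; the actual file is not shown in the source. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.trees.com"
            xmlns="http://www.trees.com"
            elementFormDefault="qualified">

  <!-- Root element: one or more Tree entries. -->
  <xsd:element name="TreeInventory">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Tree" type="TreeType" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!-- Assumed structure and data types for a single Tree. -->
  <xsd:complexType name="TreeType">
    <xsd:sequence>
      <xsd:element name="ItemNumber" type="xsd:positiveInteger"/>
      <xsd:element name="TreeName">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="CommonName" type="xsd:string"/>
            <xsd:element name="ScientificName" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <xsd:element name="Description" type="xsd:string"/>
      <xsd:element name="Price" type="xsd:decimal"/>
      <xsd:element name="Quantity" type="xsd:nonNegativeInteger"/>
      <xsd:element name="Type" type="xsd:string"/>
      <xsd:element name="Picture" type="xsd:string"/>
    </xsd:sequence>
    <xsd:attribute name="tid" type="xsd:positiveInteger" use="required"/>
  </xsd:complexType>
</xsd:schema>
```

The targetNamespace value is the same string that appears as the first value of
xsi:schemaLocation in the instance, which is how the two files are tied together.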
In simple cases, an XML Schema is not required to have a namespace. To specify the
location for an XML Schema that does not have a target namespace, use the
noNamespaceSchemaLocation attribute. The XML Schema referenced in this attribute
cannot have a target namespace. Because this attribute does not take a list of
URLs, you can only specify one schema location.
<?xml version="1.0" ?>
<TreeInventory xsi:noNamespaceSchemaLocation="Trees.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Tree tid="1">
<ItemNumber>1047</ItemNumber>
<TreeName>
<CommonName>Colorado Blue Spruce</CommonName>
<ScientificName>Picea pungens</ScientificName>
</TreeName>
<Description>A magnificent sight of silver blue-green spruce.</Description>
<Price>1.99</Price>
<Quantity>50</Quantity>
<Type>Evergreen</Type>
<Picture>1047.jpg</Picture>
</Tree>
</TreeInventory>
Listing 3-4. Specifying Schema location.
In the example above, no target namespace is specified. The first part,
xsi:noNamespaceSchemaLocation, specifies the location of the Schema file to be
associated with the XML document.
According to the World Wide Web Consortium (W3C) XML Schema Recommendation,
XML instance documents can have both xsi:schemaLocation and
xsi:noNamespaceSchemaLocation attributes specified.
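As a minimal sketch, an instance that carries both attributes could look like
this. The file name untyped.xsd is hypothetical and stands for an assumed schema
with no target namespace; the Tree content is abbreviated from Listing 3-3.

```xml
<?xml version="1.0"?>
<!-- Hypothetical: untyped.xsd is an assumed schema without a target namespace. -->
<TreeInventory xmlns="http://www.trees.com"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.trees.com trees.xsd"
               xsi:noNamespaceSchemaLocation="untyped.xsd">
  <!-- Elements in the http://www.trees.com namespace point a validator to trees.xsd;
       any elements in no namespace would point it to untyped.xsd. -->
  <Tree tid="1">
    <ItemNumber>1047</ItemNumber>
  </Tree>
</TreeInventory>
```

Both attributes act as hints about where schemas can be found, so how they are
used may vary between validators.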