Multi-Schema Project:
Zero, One, or Many Namespaces?
(A Collectively Developed Set of Schema Design Guidelines)
Table of Contents
Issue:
In a project where multiple schemas are created, should we give each schema a different targetNamespace, or should we give all the schemas the same targetNamespace, or should some of the schemas have no targetNamespace? Introduction
In a typical project many schemas will be created. The schema designer is then confronted with this issue: "shall I define one targetNamespace for all the schemas, or shall I create a different targetNamespace for each schema, or shall I have some schemas with no targetNamespace?" What are the tradeoffs? What guidance would you give someone starting on a project that will create multiple schemas?Here are the three design approaches for dealing with this issue:
To describe and judge the merits of the three design approaches it will be useful to take an example and see each approach "in action".
Example: XML Data Model of a Company
Imagine a project which involves creating a model of a company using XML Schemas. One very simple model is to divide the schema functionality along these lines:"A company is comprised of people and products."Here are the company, person, and product schemas using the three design approaches.
[1] Heterogeneous Namespace Design
This design approach says to give each schema a different targetNamespace. Below are the three schemas designed using this design approach. Observe that each schema has a different targetNamespace.Product.xsd
Person.xsd
Company.xsd
Note the three namespaces that were created by the schemas:[2] Homogeneous Namespace Design
This design approach says to create a single, umbrella targetNamespace for all the schemas. Below are the three schemas designed using this approach. Observe that all schemas have the same targetNamespace.Product.xsd
Person.xsd
Company.xsd
Note that all three schemas have the same targetNamespace:Also note the mechanism used for accessing components in other schemas which have the same targetNamespace: <include>. When accessing components in a schema with a different namespace the <import> element is used, as we saw above in the Heterogeneous Design.
[3] Chameleon Namespace Design
This design approach says to give the "main" schema a targetNamespace, and the "supporting" schemas have no targetNamespace. In our example, the company schema is the main schema. The person and product schemas are supporting schemas. Below are the three schemas using this design approach:Product.xsd (no targetNamespace)
Person.xsd (no targetNamespace)
Company.xsd (main schema, uses the no-namespace-schemas)
There are two things to note about this design approach:First, as shown above, a schema is able to access components in schemas that have no targetNamespace, using <include>. In our example, the company schema uses the components in Product.xsd and Person.xsd (and they have no targetNamespace).
Second, note the chameleon-like characteristics of schemas with no targetNamespace:
- The components in the schemas with no targetNamespace get namespace-coerced. That is, the components "take-on" the targetNamespace of the schema that is doing the <include>
- For example, ProductType in Products.xsd gets implicitly coerced into the company targetNamespace. This is the reason that the Product element was able to reference ProductType in the default namespace using type="ProductType". Ditto for the PersonType in Person.xsd.
"Chameleon effect" ... This is a term coined by Henry Thompson to describe the ability of components in a schema with no targetNamespace to take-on the namespace of other schemas. This is powerful!
Impact of Design Approach on Instance Documents
Above we have shown how the schemas would be designed using the three design approaches. Let's turn now to the instance document. Does an instance document differ depending on the design approach? All of the above schemas have been designed to expose the namespaces in instance documents (as directed by: elementFormDefault="qualified"). If they had instead all used elementFormDefault="unqualified" then instance documents would all have this form:It is when the schemas expose their namespaces in instance documents that differences appear. In the above schemas, they all specified elementFormDefault="qualified", thus exposing their namespaces in instance documents. Let's see what the instance documents look like for each design approach:[1] Company.xml (conforming to the multiple targetNamespaces version)
Note that:- there needs to be a namespace declaration for each namespace
- the elements must all be uniquely qualified (explicitly or with a default namespace)
[2] Company.xml (conforming to the single, umbrella targetNamespace version)
Since all the schemas are in the same namespace the instance document is able to take advantage of that by using a default namespace.[3] Company.xml (conforming to the main targetNamespace with supporting no-targetNamespace version)
Both of the schemas that have no targetNamespace take on the the company targetNamespace (ala the Chameleon effect). Thus, all components are in the same targetNamespace and the instance document takes advantage of this by declaring a default namespace.
<redefine> - only Applicable to Homogeneous and Chameleon Namespace Designs
The <redefine> element is used to enable access to components in another schema, while simultaneously giving the capability to modify zero or more of the components. Thus, the <redefine> element has a dual functionality:- it does an implicit <include>. Thus it enables access to all the components in the referenced schema
- it enables you to redefine zero or more of the components in the referenced schema, i.e., extend or restrict components
Example. Consider again the Company.xsd schema above. Suppose that it wishes to use ProductType in Product.xsd. However, it would like to extend ProductType to include a product ID. Here's how to do it using redefine:Now the <Product> element in instance documents will contain both <Type> and <ID>, e.g.,The <redefine> element is very powerful. However, it can only be used with schemas with the same targetNamespace or with no targetNamespace. Thus, it only applies to the Homogenous Namespace Design and the Chameleon Namespace Design.
Default Namespace and the Chameleon Namespace Design
If a schema is going to <include> a no-namespace schema (Chameleon schema) then it must have specified the targetNamespace as the default namespace. This is discussed fully in DefaultNamespace.html. Avoiding Name Collisions of Chameleon Components
Name collisions
When a schema uses Chameleon components those components become part of the including schema's targetNamespace, just as though the schema author had typed the element declarations and type definitions inline. If the schema <include>s multiple no-namespace schemas then there will be a chance of name collisions. In fact, the schema may end up not being able to use some of the no-namespace schemas because their use results in name collisions with other Chameleon components. To demonstrate the name collision problem, consider this example:Suppose that there are two schemas with no targetNamespace:
Schema 1 creates no-namespace elements A and B. Schema 2 creates no-namespace elements A, and C. Now if schema 3 <include>s these two no-namespace schemas there will be a name collision:This schema has a name collision - A is defined twice. [Note: it's not an error to have two elements in the same symbol space, provided they have the same type. However, if they have a different type then it is an error, i.e., name collision.]Namespaces are the standard way of avoiding such collisions. Above, if instead the components in 1.xsd and 2.xsd resided in different namespaces then 3.xsd could have <import>ed them and there would be no name collision. [Recall that two elements/types can have the same name if the elements/types are in different namespaces.]
How do we address the name collision problem that the Chameleon design presents? That's next.
Resolving Namespace Collisions using Proxy Schemas
There is a very simple solution to the namespace collision problem: for each no-namespace schema create a companion namespaced-schema (a "proxy schema") that <include>s the no-namespace schema. Then, the main schema <import>s the proxy schemas. Here's an example to demonstrate this approach:With this approach we avoid name collisions. This design approach has the added advantage that it also enables the proxy schemas to customize the Chameleon components using <redefine>.Thus, this approach is a two-step process:
- Create the Chameleon schemas
- Create a proxy schema for each Chameleon schema
The "main" schema <import>s the proxy schemas.The advantage of this two-step approach is that it enables applications to decide on a domain (namespace) for the components that it is reusing. Furthermore, applications are able to refine/customize the Chameleon components. This approach requires an extra step (i.e., creating proxy schemas) but in return it provides a lot of flexibility.
Contrast the above two-step process with the below one-step process where the components are assigned to a namespace from the very beginning:
This achieves the same result as the above two-step version. In this example, the components are not Chameleon. Instead, A, B, and C were hardcoded with a namespace from the very beginning of their life. The downside of this approach is that if main.xsd wants to <redefine> any of the elements it cannot. Also, applications are forced to use a domain (namespace) defined by someone else. These components are in a rigid, static, fixed namespace.
Creating Tools for Chameleon Components
Tools for Chameleon Components
We have seen repeatedly how Chameleon components are able to blend in with the schemas that use them. That is, they adopt the namespace of the schema that <include>s them. How do you write tools for components that can assume so many different faces (namespaces)?Consider this no-namespace schema:
Suppose that we wish to create a tool, T, which must process the two Chameleon components A and B, regardless of what namespace they reside in. The tool must be able to handle the following situation: imagine a schema, main.xsd, which <include>s 1.xsd. In addition, suppose that main.xsd has its own element called A (in a different symbol space, so there's no name collision). For example:How would the tool T be able to distinguish between the Chameleon component A and the local A in an instance document?Chameleon Component Identification
One simple solution is that when you create Chameleon components assign them a global unique id (a GUID). The XML Schema spec allows you to add an attribute, id, to all element, attribute, complexType, and simpleType components. Note that the id attribute is purely local to the schema. There is no representation in the instance documents. This id attribute could be used by a tool to "locate" a Chameleon component, regardless of what "face" (namespace) it currently wears. That is, the tool can open up an instance document using DOM, and the DOM API will provide the tool access to the id value for all components in the instance document.Question: What if the network is down and the DOM is not able to access the schema to bring in the schema information? There is no difference than with today's tools where there are DTDs. For example, suppose that your instance document has a DOCTYPE declaration which asserts that the instance document conforms to a DTD. If you run a tool like an XSL Processor (e.g., XT) and the DTD is not accessible then XT will crash. The solution of course is to make a local copy of the DTD. Likewise, for our Chameleon component tool the solution is to make a local copy of the instance document's schema and all the schemas that it imports/includes.
Best Practice
Above we explored the "design space" for this issue. We looked at the three design approaches in action, both schemas and instance documents. So which design is better? Under what circumstances?When you are reusing schemas that someone else created you should <import> those schemas, i.e., use the Heterogeneous Namespace design. It is a bad idea to copy those components into your namespace, for two reasons: (1) soon your local copies would get out of sync with the other schemas, and (2) you lose interoperability with any existing applications that process the other schema's components.
The interesting case (the case we have been considering throughout this discussion) is how to deal with namespaces in a collection of schemas that you created. Here's our guidelines for this case:Use the Chameleon Design
- with schemas which contain components that have no inherent semantics by themselves,
- with schemas which contain components that have semantics only in the context of an <include>ing schema.
- when you don�t want to hardcode a namespace to a schema, rather you want <include>ing schemas to be able to provide their own application-specific namespace to the schema
Example. A repository of components - such as a schema which defines an array type, or vector, linked list, etc - should be declared with no targetNamespace (i.e., Chameleon).
As a rule of thumb, if your schema just contains type definitions (no element declarations) then that schema is probably a good candidate for being a Chameleon schema.
Use the Homogeneous Namespace Design
- when all of your schemas are conceptually related
- when there is no need to visually identify in instance documents the origin/lineage of each element/attribute. In this design all components come from the same namespace, so you loose the ability to identify in instance documents that "element A comes from schema X". Oftentimes that's okay - you don't want to categorize elements/attributes differently. This design approach is well suited for those situations.
Use the Heterogeneous Namespace Design
- when there are multiple elements with the same name. (Avoid name collision)
- when there is a need to visually identify in instance documents the origin/lineage of each element/attribute. In this design the components come from different namespaces, so you have the ability to identify in instance documents that "element A comes from schema X".
Lastly, as we have seen, in a schema each component can be uniquely identified with an id attribute (this is NOT the same as providing an id attribute to an element in instance documents. We are talking here about a schema-internal way of identifying each schema component.) Consider identifying each schema component using the id attribute. This will enable a finer degree of traceability than is possible using namespaces. The combination of namespaces plus the schema id attribute is a powerful tandem for visually and programmatically identifying components.