2.5 Datatype dichotomies

It is useful to categorize the datatypes defined in this specification along various dimensions, forming a set of characterization dichotomies.

Atomic vs. list vs. union datatypes

The first distinction to be made is that between atomic, list and union datatypes.

  • Atomic datatypes are those having values which are regarded by this specification as being indivisible.

  • List datatypes are those having values each of which consists of a finite-length (possibly empty) sequence of values of an atomic datatype.

  • Union datatypes are those whose value spaces and lexical spaces are the union of the value spaces and lexical spaces of one or more other datatypes.

For example, a single token which matches [Nmtoken] from [XML] could be the value of an atomic datatype ([NMTOKEN]); while a sequence of such tokens could be the value of a list datatype ([NMTOKENS]).
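As a rough illustration of that distinction, the sketch below declares one attribute of each kind. The element name widget and the attribute names color and tags are hypothetical, chosen only for this example; the schema fragment and the instance belong to separate documents.

```xml
<!-- Schema fragment: NMTOKEN is atomic, NMTOKENS is a list of NMTOKEN. -->
<xsd:element name='widget'>
  <xsd:complexType>
    <xsd:attribute name='color' type='xsd:NMTOKEN'/>
    <xsd:attribute name='tags' type='xsd:NMTOKENS'/>
  </xsd:complexType>
</xsd:element>

<!-- Instance: color holds one token; tags holds a sequence of three. -->
<widget color='red' tags='small light cheap'/>
```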

Atomic datatypes

Atomic datatypes can be either primitive or derived. The value space of an atomic datatype is a set of "atomic" values which, for the purposes of this specification, are not further decomposable. The lexical space of an atomic datatype is a set of literals whose internal structure is specific to the datatype in question.

List datatypes

Several type systems (such as the one described in [ISO11404]) treat list datatypes as special cases of the more general notions of aggregate or collection datatypes.

List datatypes are always derived. The value space of a list datatype is a set of finite-length sequences of atomic values. The lexical space of a list datatype is a set of literals whose internal structure is a whitespace-separated sequence of literals of the atomic datatype of the items in the list (where whitespace matches [S] in [XML]).

The atomic datatype that participates in the definition of a list datatype is known as the itemType of that list datatype.

NOTE: 
<simpleType name='sizes'>
  <list itemType='decimal'/>
</simpleType>
<cerealSizes xsi:type='sizes'> 8 10.5 12 </cerealSizes>

A list datatype can be derived from an atomic datatype whose lexical space allows whitespace (such as [string] or [anyURI]). In such a case, regardless of the input, list items will be separated at whitespace boundaries.

NOTE: 
<simpleType name='listOfString'>
  <list itemType='string'/>
</simpleType>
<someElement xsi:type='listOfString'>
this is not list item 1
this is not list item 2
this is not list item 3
</someElement>

In the above example, the value of the someElement element is not a list of length 3; rather, it is a list of length 18.

When a datatype is derived from a list datatype, the following constraining facets apply:

  • length

  • maxLength

  • minLength

  • enumeration

  • pattern

  • whiteSpace

For each of length, maxLength and minLength, the unit of length is measured in number of list items. The value of whiteSpace is fixed to the value collapse.
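To make the unit of length concrete, here is a minimal sketch that restricts the list type sizes from the earlier example. The type name threeSizes is hypothetical and is not defined by this specification.

```xml
<!-- Hypothetical restriction of the list type 'sizes' defined earlier. -->
<simpleType name='threeSizes'>
  <restriction base='sizes'>
    <!-- length counts list items, not characters -->
    <length value='3'/>
  </restriction>
</simpleType>

<!-- Three items: valid against threeSizes -->
<cerealSizes xsi:type='threeSizes'> 8 10.5 12 </cerealSizes>

<!-- Two items: not valid against threeSizes -->
<cerealSizes xsi:type='threeSizes'>8 10.5</cerealSizes>
```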

The [Canonical Lexical Representation] for the list datatype is defined as the lexical form in which each item in the list has the canonical lexical representation of its itemType.

Union datatypes

The value space and lexical space of a union datatype are the union of the value spaces and lexical spaces of its memberTypes. Union datatypes are always derived. Currently, there are no built-in union datatypes.

NOTE: 

A prototypical example of a union type is the [maxOccurs attribute] on the [element element] in XML Schema itself: it is a union of nonNegativeInteger and an enumeration with the single member, the string "unbounded", as shown below.

  <attributeGroup name="occurs">
    <attribute name="minOccurs" type="nonNegativeInteger"
               default="1"/>
    <attribute name="maxOccurs">
      <simpleType>
        <union>
          <simpleType>
            <restriction base='nonNegativeInteger'/>
          </simpleType>
          <simpleType>
            <restriction base='string'>
              <enumeration value='unbounded'/>
            </restriction>
          </simpleType>
        </union>
      </simpleType>
    </attribute>
  </attributeGroup>

Any number (greater than 1) of atomic or list datatypes can participate in a union type.

The datatypes that participate in the definition of a union datatype are known as the memberTypes of that union datatype.

The order in which the memberTypes are specified in the definition (that is, the order of the <simpleType> children of the <union> element, or the order of the [QName]s in the memberTypes attribute) is significant. During validation, an element or attribute's value is validated against the memberTypes in the order in which they appear in the definition until a match is found. The evaluation order can be overridden with the use of [xsi:type].

NOTE: 

For example, given the definition below, the first instance of the <size> element validates correctly as an [integer], the second and third as [string].

  <xsd:element name='size'>
    <xsd:simpleType>
      <xsd:union>
        <xsd:simpleType>
          <xsd:restriction base='integer'/>
        </xsd:simpleType>
        <xsd:simpleType>
          <xsd:restriction base='string'/>
        </xsd:simpleType>
      </xsd:union>
    </xsd:simpleType>
  </xsd:element>
  <size>1</size>
  <size>large</size>
  <size xsi:type='xsd:string'>1</size>
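
The same union can also be expressed with the memberTypes attribute, in which case the order of the QNames plays the role of the order of the <simpleType> children. The following is a minimal sketch; the type name sizeType is hypothetical.

```xml
<xsd:simpleType name='sizeType'>
  <!-- integer is tried first, then string -->
  <xsd:union memberTypes='xsd:integer xsd:string'/>
</xsd:simpleType>

<xsd:element name='size' type='sizeType'/>
```

Reversing the order to 'xsd:string xsd:integer' would let string match first, so a value such as 1 would validate as a string, because every literal accepted by integer is also in the lexical space of string.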

The [Canonical Lexical Representation] for a union datatype is defined as the lexical form in which the values have the canonical lexical representation of the appropriate memberTypes.

NOTE: 

A datatype which is atomic in this specification need not be an "atomic" datatype in any programming language used to implement this specification. Likewise, a datatype which is a list in this specification need not be a "list" datatype in any programming language used to implement this specification. Furthermore, a datatype which is a union in this specification need not be a "union" datatype in any programming language used to implement this specification.

Primitive vs. derived datatypes

Next, we distinguish between primitive and derived datatypes.

  • Primitive datatypes are those that are not defined in terms of other datatypes; they exist ab initio.

  • Derived datatypes are those that are defined in terms of other datatypes.

For example, in this specification, [float] is a well-defined mathematical concept that cannot be defined in terms of other datatypes, while an [integer] is a special case of the more general datatype [decimal].

There exists a conceptual datatype, whose name is anySimpleType, that is the simple version of the ur-type definition from [structural-schemas]. anySimpleType can be considered as the base type of all primitive types. The value space of anySimpleType can be considered to be the union of the value spaces of all primitive datatypes.

The datatypes defined by this specification fall into both the primitive and derived categories. It is felt that a judiciously chosen set of primitive datatypes will serve the widest possible audience by providing a set of convenient datatypes that can be used as is, as well as providing a rich enough base from which the variety of datatypes needed by schema designers can be derived.

In the example above, [integer] is derived from [decimal].
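
One way such a derivation might be written is as a restriction of [decimal] that allows no fractional part. The sketch below uses the hypothetical name wholeNumber and should not be read as the normative definition of [integer].

```xml
<xsd:simpleType name='wholeNumber'>
  <xsd:restriction base='xsd:decimal'>
    <!-- restricts the value space to values with no fractional part -->
    <xsd:fractionDigits value='0'/>
  </xsd:restriction>
</xsd:simpleType>
```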

NOTE: 

A datatype which is primitive in this specification need not be a "primitive" datatype in any programming language used to implement this specification. Likewise, a datatype which is derived in this specification need not be a "derived" datatype in any programming language used to implement this specification.

As described in more detail in [XML Representation of Simple Type Definition Schema Components], each user-derived datatype must be defined in terms of another datatype in one of three ways: 1) by assigning constraining facets which serve to restrict the value space of the user-derived datatype to a subset of that of the base type; 2) by creating a list datatype whose value space consists of finite-length sequences of values of its itemType; or 3) by creating a union datatype whose value space consists of the union of the value spaces of its memberTypes.

Derived by restriction

A datatype is said to be derived by restriction from another datatype when values for zero or more constraining facets are specified that serve to constrain its value space and/or its lexical space to a subset of those of its base type.

Every datatype that is derived by restriction is defined in terms of an existing datatype, referred to as its base type. Base types can be either primitive or derived.
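
As a sketch of how this looks in practice, the two hypothetical types below (percentage and smallPercentage are illustrative names only) show a restriction of a built-in type followed by a further restriction of the resulting derived type.

```xml
<!-- Restricts the built-in type integer to the range 0..100. -->
<simpleType name='percentage'>
  <restriction base='integer'>
    <minInclusive value='0'/>
    <maxInclusive value='100'/>
  </restriction>
</simpleType>

<!-- The base type here is itself derived: the range narrows to 0..10. -->
<simpleType name='smallPercentage'>
  <restriction base='percentage'>
    <maxInclusive value='10'/>
  </restriction>
</simpleType>
```

Each step can only narrow the value space of its base type, so every value of smallPercentage is also a value of percentage and of integer.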

Derived by list

A list datatype can be derived from another datatype (its itemType) by creating a value space that consists of a finite-length sequence of values of its itemType.
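
Besides naming the itemType with the itemType attribute, a <list> element can carry an anonymous <simpleType> child that defines the item type in place. A minimal sketch, with the hypothetical type name smallSizes:

```xml
<simpleType name='smallSizes'>
  <list>
    <!-- anonymous item type: decimals no greater than 20 -->
    <simpleType>
      <restriction base='decimal'>
        <maxInclusive value='20'/>
      </restriction>
    </simpleType>
  </list>
</simpleType>

<!-- Every item satisfies the item type, so the list is valid. -->
<cerealSizes xsi:type='smallSizes'>8 10.5 12</cerealSizes>
```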

Derived by union

One datatype can be derived from one or more datatypes by unioning their value spaces and, consequently, their lexical spaces.

Built-in vs. user-derived datatypes

  • Built-in datatypes are those which are defined in this specification, and can be either primitive or derived;

  • User-derived datatypes are those derived datatypes that are defined by individual schema designers.

Conceptually there is no difference between the built-in derived datatypes included in this specification and the user-derived datatypes which will be created by individual schema designers. The built-in derived datatypes are those which are believed to be so common that if they were not defined in this specification many schema designers would end up "reinventing" them. Furthermore, including these derived datatypes in this specification serves to demonstrate the mechanics and utility of the datatype generation facilities of this specification.

NOTE: 

A datatype which is built-in in this specification need not be a "built-in" datatype in any programming language used to implement this specification. Likewise, a datatype which is user-derived in this specification need not be a "user-derived" datatype in any programming language used to implement this specification.