2 Sequence Normalization

The XQuery 1.0 and XPath 2.0 Data Model is richer and less constrained than XML. There are valid instances of the data model that have no direct analog in XML. In particular, instances of the data model can contain typed values, sequences, and sequences of typed values. And whereas XML deals only with "documents", instances of the data model can have as their root any node type, simple value, or sequence and may even be empty.

This section describes how to convert an arbitrary instance of the data model into one of several simplified forms. We then describe how these forms are serialized. This greatly simplifies the sections that follow. A serializer is not REQUIRED to implement serialization of arbitrary instances of the data model in this way, provided it produces the same results as this conceptual model.

  1. If the instance of the data model contains any typed or untyped atomic values, or sequences that contain typed or untyped atomic values, convert them to strings: obtain the lexical representation of each value by casting it to an xs:string and replace the value with its string representation. If the value cannot be cast to xs:string, serialization of the instance of the data model is undefined.

  2. If adjacent strings occur in a sequence, replace both values with their concatenation separated by a single space.

  3. If empty sequences occur, replace them with the empty string.

  4. To complete the simplification, perform the following steps iteratively until a simplest form is reached:

    1. If the instance of the data model has as its root an attribute or namespace node, or a QName value, or if it has as its root a sequence which contains one of these items, serialization is undefined.

    2. If the instance of the data model has as its root a single document node, or an element, processing instruction, comment, or text node, or a sequence of only element, processing instruction, comment, and text nodes, it is already in its simplest form.

    3. If the instance of the data model has as its root a sequence of document nodes, or a sequence which contains document nodes, replace each document node with its children in document order.

    4. If the instance of the data model has as its root a string value, or a sequence which contains one or more string values, replace each string value with a text node that contains the same string.

If there are any remaining string values among the children of elements in the instance of the data model, replace them with text nodes that contain the same string values and merge adjacent text nodes.

An instance of the data model that is input to the serialization process is a sequence. Prior to serializing a sequence using any of the output methods whose behavior is specified by this document ([serparam]), the serializer MUST first compute a normalized sequence for serialization; it is the normalized sequence that is actually serialized. The purpose of this sequence normalization step is to create a sequence that can be serialized as a well-formed XML document or external general parsed entity, and that also reflects the content of the input sequence to the extent possible.

The normalized sequence for serialization is constructed by applying all of the following rules in order, with the initial sequence being input to the first step, and the sequence that results from any step being used as input to the subsequent step. For any implementation-defined output method, it is implementation-defined whether this sequence normalization process takes place.

Where the process of converting the input sequence to a normalized sequence indicates that a value MUST be cast to xs:string, that operation is as defined in [casting-to-string] of [FANDO]. The steps in computing the normalized sequence are:

  1. If the sequence that is input to serialization is empty, create a sequence S1 that consists of a zero-length string. Otherwise, copy each item in the sequence that is input to serialization to create the new sequence S1.

  2. For each item in S1, if the item is atomic, obtain the lexical representation of the item by casting it to an xs:string and copy the string representation to the new sequence; otherwise, copy the item, which will be a node, to the new sequence. The new sequence is S2.

  3. For each subsequence of adjacent strings in S2, copy a single string to the new sequence equal to the values of the strings in the subsequence concatenated in order, each separated by a single space. Copy all other items to the new sequence. The new sequence is S3.

  4. For each item in S3, if the item is a string, create a text node in the new sequence whose string value is equal to the string; otherwise, copy the item to the new sequence. The new sequence is S4.

  5. For each item in S4, if the item is a document node, copy its children to the new sequence; otherwise, copy the item to the new sequence. The new sequence is S5.

  6. It is a serialization error if an item in S5 is an attribute node or a namespace node. Otherwise, construct a new sequence, S6, that consists of a single document node and copy all the items in the sequence, which are all nodes, as children of that document node.

S6 is the normalized sequence.
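The six steps can be modeled outside XSLT or XQuery. The following is a minimal Python sketch, not part of the specification: the `Node` class, the `SerializationError` exception, and the `cast_to_string` helper are hypothetical stand-ins for data model nodes, the serialization error, and the xs:string cast, and plain Python values stand in for atomic values. Items are reused rather than deep-copied, which is a simplification of the "copy" wording in the steps.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a data model node."""
    kind: str                      # "document", "element", "text", "attribute", ...
    value: str = ""                # string value of a text node, or a name otherwise
    children: List["Node"] = field(default_factory=list)


class SerializationError(Exception):
    """Hypothetical stand-in for the serialization error of step 6."""


def cast_to_string(value) -> str:
    # Simplified stand-in for the xs:string cast; only covers a few Python types.
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def normalize(seq) -> Node:
    # Step 1: an empty input becomes a sequence holding one zero-length string.
    s1 = list(seq) if seq else [""]

    # Step 2: cast every atomic value to a string; nodes pass through.
    s2 = [i if isinstance(i, Node) else cast_to_string(i) for i in s1]

    # Step 3: join each run of adjacent strings with single spaces.
    s3, run = [], []
    for item in s2:
        if isinstance(item, str):
            run.append(item)
            continue
        if run:
            s3.append(" ".join(run))
            run = []
        s3.append(item)
    if run:
        s3.append(" ".join(run))

    # Step 4: turn each remaining string into a text node.
    s4 = [Node("text", i) if isinstance(i, str) else i for i in s3]

    # Step 5: replace each document node with its children.
    s5 = []
    for item in s4:
        if item.kind == "document":
            s5.extend(item.children)
        else:
            s5.append(item)

    # Step 6: reject attribute and namespace nodes, then wrap in a document node.
    for item in s5:
        if item.kind in ("attribute", "namespace"):
            raise SerializationError(f"cannot serialize a top-level {item.kind} node")
    return Node("document", children=s5)
```

In this model, `normalize([1, 2, Node("element", "a"), "x"])` returns a document node whose children are a text node with value `1 2`, the element, and a text node with value `x`.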

The tree rooted at the document node that is created by the final step of this sequence normalization process is the instance of the data model to which the rules of the appropriate output method are applied. If the sequence normalization process results in a serialization error, the serializer MUST signal the error.

NOTE: 

The sequence normalization process for a sequence $seq is equivalent to constructing a document node using the XSLT instruction:

<xsl:result-document>
  <xsl:copy-of select="$seq"/>
</xsl:result-document>

or the XQuery expression:

document {
  for $s in $seq return
    if ($s instance of document-node())
    then $s/child::node()
    else $s
}

and then serializing the document node as described in [xml-output], [xhtml-output], [html-output], [text-output], or in an implementation-defined manner.

This process will fail with certain sequences, for example sequences containing parentless attribute and namespace nodes, or atomic values of types that cannot be cast to a string, such as xs:QName and xs:NOTATION. Such a failure results in a serialization error; the serializer MUST signal the error.
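The failure cases named in this note can be checked up front. The following Python sketch is illustrative only: the `(category, name)` pair encoding of items and the `offending_items` function are assumptions made for the example, and the two sets simply record the node kinds and atomic types that this note lists as causing a failure.

```python
from typing import List, Tuple

# Node kinds and atomic types that this note names as causing a failure.
BAD_NODE_KINDS = {"attribute", "namespace"}
UNCASTABLE_TYPES = {"xs:QName", "xs:NOTATION"}


def offending_items(seq: List[Tuple[str, str]]) -> List[int]:
    """Return the positions of top-level items that would trigger an error.

    Each item is a hypothetical (category, name) pair, for example
    ("node", "element") or ("atomic", "xs:integer").
    """
    bad = []
    for index, (category, name) in enumerate(seq):
        if category == "node" and name in BAD_NODE_KINDS:
            bad.append(index)
        elif category == "atomic" and name in UNCASTABLE_TYPES:
            bad.append(index)
    return bad
```

A serializer modeled this way would signal the serialization error whenever the returned list is non-empty.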