2 Sequence Normalization
An instance of the data model that is input to the serialization
process is a sequence. Prior to serializing a sequence using any of
the output methods whose behavior is specified by this document
([serparam]), the serializer MUST first compute a normalized sequence
for serialization; it is the normalized sequence that is actually
serialized. The purpose of this sequence normalization step is
to create a sequence that can be serialized as a
well-formed XML document or external general parsed entity, and that
also reflects the content of the input sequence to the extent
possible. The normalized sequence for serialization is constructed by
applying all of the following rules in order, with the initial
sequence being input to the first step, and the sequence that results
from any step being used as input to the subsequent step.
For any implementation-defined output method, it is
implementation-defined whether this sequence normalization
process takes place.
Where the process of converting the input sequence
to a normalized sequence indicates that a value MUST be cast to
xs:string, that operation is as
defined in [casting-to-string] of
[FANDO].
The steps in computing the normalized sequence
are:
-
If the sequence that is input to serialization is
empty, create a sequence S1 that consists of a
zero-length string. Otherwise, copy each item in the sequence that is
input to serialization to create the new sequence S1.
-
For each item in S1, if the item is atomic, obtain the
lexical representation of the item by casting it to an xs:string
and copy the string representation to the new sequence; otherwise, copy the
item, which will be a node, to the new sequence.
The new sequence is S2.
-
For each subsequence of adjacent strings in S2,
copy a single string to the new sequence equal to the values of the
strings in the subsequence concatenated in order, each separated by a
single space. Copy all other items to the new sequence. The new
sequence is S3.
-
For each item in S3, if the item is a string,
create a text node in the new sequence whose string value is equal to
the string; otherwise, copy the item to the new sequence. The new
sequence is S4.
-
For each item in S4, if the item is a document node,
copy its children to the new sequence; otherwise, copy the item to the new
sequence. The new sequence is S5.
-
It is a serialization error if an item in S5 is an
attribute node or a namespace node. Otherwise, construct a new sequence,
S6, that consists of a single document node and
copy all the items in the sequence, which are all nodes, as children of
that document node.
S6 is the normalized sequence.
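
The six steps above can be sketched in Python as an illustration only. This is not part of the specification: the names Node, SerializationError, cast_to_string, and normalize are hypothetical, the node model is a deliberately tiny stand-in for the data model, and cast_to_string only approximates the xs:string cast defined in [FANDO].

```python
from dataclasses import dataclass, field
from itertools import groupby
from typing import List, Union


class SerializationError(Exception):
    """Hypothetical error type standing in for a serialization error."""


@dataclass
class Node:
    # Hypothetical minimal node model: kind is "document", "element",
    # "text", "comment", "attribute", "namespace", and so on.
    kind: str
    value: str = ""
    children: List["Node"] = field(default_factory=list)


Item = Union[Node, str, int, bool]


def cast_to_string(value) -> str:
    # Assumption: a rough stand-in for the xs:string cast; the real
    # lexical representations are defined in [FANDO].
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def normalize(seq: List[Item]) -> List[Node]:
    # Step 1: an empty input becomes a single zero-length string.
    s1 = list(seq) if seq else [""]
    # Step 2: atomic values become strings; nodes pass through.
    s2 = [i if isinstance(i, Node) else cast_to_string(i) for i in s1]
    # Step 3: each run of adjacent strings becomes one space-separated string.
    s3 = []
    for is_string, run in groupby(s2, key=lambda i: isinstance(i, str)):
        s3.extend([" ".join(run)] if is_string else run)
    # Step 4: strings become text nodes.
    s4 = [Node("text", i) if isinstance(i, str) else i for i in s3]
    # Step 5: document nodes are replaced by their children.
    s5 = []
    for i in s4:
        s5.extend(i.children if i.kind == "document" else [i])
    # Step 6: attribute and namespace nodes are errors; otherwise wrap
    # everything in a single new document node. (This sketch reuses the
    # node objects rather than copying them as the text describes.)
    for i in s5:
        if i.kind in ("attribute", "namespace"):
            raise SerializationError(f"cannot serialize {i.kind} node")
    return [Node("document", children=s5)]
```

Under these assumptions, normalize([1, "a", 2]) returns a one-item sequence whose document node has a single text node child with the value "1 a 2", and normalize([Node("attribute", "v")]) raises SerializationError.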
The tree rooted at the document node that is
created by the final step of this sequence
normalization process is the
instance of the data model to which the rules of the appropriate
output method are applied. If the sequence
normalization process results
in a serialization error, the serializer
MUST signal the error.
NOTE:
The sequence normalization process for a sequence $seq is equivalent
to constructing a document node using the
XSLT instruction:
<xsl:result-document>
<xsl:copy-of select="$seq"/>
</xsl:result-document>
or the XQuery expression:
document {
  for $s in $seq return
    if ($s instance of document-node())
    then $s/child::node()
    else $s
}
This process results in a serialization error
with certain sequences,
for example sequences containing parentless attribute and namespace
nodes, or atomic values of types that cannot
be cast to a string, such as xs:QName
and xs:NOTATION; the serializer
MUST signal the error.