5 XML Output Method
The xml output method outputs the normalized sequence as an XML entity that MUST satisfy the rules for either a well-formed XML document entity or a well-formed XML external general parsed entity, or both. A serialization error results if the serializer is unable to satisfy those rules, except for content modified by the character expansion phase of serialization, as described in [serphases], which could result in the serialized output being not well-formed but will not result in a serialization error. If a serialization error results, the serializer MUST signal the error.
Many of the requirements for the serialized form of the instance of the data model with the xml output method are described using the verb "should"; the serializer might not be able to meet the requirements of the xml output method due to:
- serialization errors;
- specification of character mapping, as determined by the use-character-maps parameter, whose expansion results in XML that is not well-formed; or
- disabled output escaping that results in XML that is not well-formed.
In all other circumstances, the serialized form MUST comply with the requirements described for the xml output method.
If the document node of the normalized sequence has a single element node child and no text node children, then the serialized output is a well-formed XML document entity, and the serialized output MUST conform to the appropriate version of the XML Namespaces Recommendation [XMLNAMES] or [XMLNAMES11].
If the normalized sequence does not take this form, then the serialized output is a well-formed XML external general parsed entity, and the serialized output MUST be an entity which, when referenced within a trivial XML document wrapper like this:
<?xml version="version"?>
<!DOCTYPE doc [
<!ENTITY e SYSTEM "entity-URI">
]>
<doc>&e;</doc>
where entity-URI is a URI for the entity, and the value of the version pseudo-attribute is the value of the version parameter, produces a document which MUST itself be a well-formed XML document conforming to the corresponding version of the XML Namespaces Recommendation [XMLNAMES] or [XMLNAMES11].
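The following minimal Python sketch illustrates the two rules above. It assumes a hypothetical model in which the document node's children are given as a list of kind strings. The names entity_kind and parse_in_wrapper are illustrative and are not defined by this specification. The sketch approximates the wrapper test by inlining the entity text inside <doc> instead of resolving a SYSTEM entity. That approximation holds only when the entity has no text declaration. Python's ElementTree is built on expat, which implements XML 1.0 only, so the sketch always writes version="1.0".

```python
import xml.etree.ElementTree as ET
from typing import List

def entity_kind(child_kinds: List[str]) -> str:
    """Classify output from the document node's children.

    Hypothetical model: one kind string per child, e.g. "element",
    "text", "comment" or "processing-instruction".
    """
    if child_kinds.count("element") == 1 and "text" not in child_kinds:
        return "document entity"
    return "external general parsed entity"

def parse_in_wrapper(entity_text: str) -> ET.Element:
    """Approximate the <doc>&e;</doc> wrapper test.

    The entity text is inlined rather than referenced through a SYSTEM
    entity (assumes no text declaration). Raises ET.ParseError if the
    result is not well-formed, including when a prefix is unbound.
    """
    wrapper = '<?xml version="1.0"?>\n<doc>' + entity_text + "</doc>"
    return ET.fromstring(wrapper)
```

For example, the sequence "hello <a/> world <b/>" is not a document entity, but it parses inside the wrapper with an extra doc element as the document element.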
In addition, the output MUST be such that if a new tree were constructed by parsing the XML document and converting it into an instance of the data model as specified in [DataModel], then the new sequence would be the same as the starting normalized sequence that resulted from the sequence normalization process described in [serdm], with the following possible exceptions:
- If the document was produced by adding a document wrapper, as described above, then it will contain an extra doc element as the document element.
- The order of attribute and namespace nodes in the two trees MAY be different.
- The base URIs of nodes in the two trees MAY be different.
- The following properties of corresponding nodes in the two trees MAY be different:
  - the base-uri property of document nodes and element nodes;
  - the document-uri and unparsed-entities properties of document nodes;
  - the type-name and typed-value properties of element and attribute nodes;
  - the nilled property of element nodes;
  - the content property of text nodes, due to the effect of the indent and use-character-maps parameters.
- The new tree MAY contain additional attributes and text nodes resulting from the expansion of default and fixed values in its DTD or schema.
- The type annotations of the nodes in the two trees MAY be different. Type annotations in a result tree are discarded when the tree is serialized. Any new type annotations obtained by parsing the document will depend on whether the serialized XML document is assessed against a schema, and this MAY result in type annotations that are different from those in the original result tree.
NOTE:
In order to influence the type annotations in the instance of the data model that would result from processing a serialized XML document, the author of the XSLT stylesheet, XQuery expression or other process might wish to create the instance of the data model that is input to the serialization process so that it makes use of mechanisms provided by [XMLSCHEMA], such as xsi:type and xsi:schemaLocation attributes. The serialization process will not automatically create such attributes in the serialized document if those attributes were not part of the result tree that is to be serialized.
Similarly, it is possible that an element node in the instance of the data model that is to be serialized has the nilled property with the value true, but no xsi:nil attribute. The serialization process will not create such an attribute in the serialized document simply to reflect the value of the property. The value of the nilled property has no direct effect on the serialized result.
- Additional namespace nodes MAY be present in the new tree if the serialization process did not undeclare one or more namespaces, as described in [xml-undeclare-NS], and the starting instance of the data model contained an element node with a namespace node that declared some prefix, but a child element of that node did not have any namespace node that declared the same prefix.
Additional namespace nodes MAY also be present in the new tree if the serialization process had to add namespace declarations for attribute or element content of type xs:QName. The original tree MAY contain namespace nodes that are not present in the new tree, as the process of creating an instance of the data model MAY ignore namespace declarations in some circumstances. See [const-infoset-element] and [const-psvi-element] of [DataModel] for additional information.
Ed. Note:
We've talked about this a couple of times now, but I'm still concerned
about the preceding paragraph. This text was added in response to issue
qt-Feb0059-01.
Today, XSLT describes a
namespace fixup procedure that ensures namespaces exist
for values of type xs:QName.
Is the resolution for this issue requiring
the same thing for serialization? Or something else?
- If the indent parameter has the value yes:
  - additional text nodes consisting of whitespace characters MAY be present in the new tree; and
  - text nodes in the original tree that contained only whitespace characters MAY correspond to text nodes in the new tree that contain additional whitespace characters that were not present in the original tree.
  See [xml-indent] for more information on the indent parameter.
- Additional nodes MAY be present in the new tree due to the effect of character mapping in the character expansion phase, and the values of attribute nodes and text nodes in the new tree MAY be different from those in the original tree, due to the effects of URI expansion, character mapping and Unicode normalization in the character expansion phase of serialization.
NOTE:
The use-character-maps parameter can cause arbitrary characters to be inserted into the serialized XML document in an unescaped form, including characters that would be considered to be part of XML markup. Such characters could result in arbitrary new element nodes, attribute nodes, and so on, in the new tree that results from processing the serialized XML document.
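One rough way to exercise the round-trip property is to serialize with Python's ElementTree, reparse the output, and compare the two trees while ignoring attribute order. This is only an approximation. ElementTree does not expose namespace nodes, base URIs, or type annotations. The helpers same_tree and round_trips are hypothetical. The indent flag calls ElementTree.indent (Python 3.9+) to mimic indent="yes", which shows why additional whitespace-only text appears in the list of permitted differences.

```python
import copy
import xml.etree.ElementTree as ET

def same_tree(a: ET.Element, b: ET.Element) -> bool:
    """Structural comparison; dict equality ignores attribute order."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if (a.text or "") != (b.text or "") or (a.tail or "") != (b.tail or ""):
        return False
    if len(a) != len(b):
        return False
    return all(same_tree(x, y) for x, y in zip(a, b))

def round_trips(original: ET.Element, indent: bool = False) -> bool:
    """Serialize, reparse, and compare against the original tree."""
    tree = copy.deepcopy(original)  # ET.indent mutates in place
    if indent:
        ET.indent(tree)  # adds whitespace-only text, like indent="yes"
    reparsed = ET.fromstring(ET.tostring(tree, encoding="unicode"))
    return same_tree(original, reparsed)
```

With indent=True, a tree such as <a><b/></a> no longer compares equal. The only cause is the added whitespace text, which is one of the exceptions listed above.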
A consequence of this rule is that certain whitespace characters MUST be output as character references, to ensure that they survive the round trip through serialization and parsing. Specifically, CR, NEL and LINE SEPARATOR characters in text nodes MUST be output respectively as "&#xD;", "&#x85;", and "&#x2028;", or their equivalents; while CR, NL, TAB, NEL and LINE SEPARATOR characters in attribute nodes MUST be output respectively as "&#xD;", "&#xA;", "&#x9;", "&#x85;", and "&#x2028;", or their equivalents. In addition, the non-whitespace control characters #x1 through #x1F and #x7F through #x9F in text nodes and attribute nodes MUST be output as character references.
For example, an attribute with the value "x" followed by "y" separated by a newline will result in the output "x&#xA;y" (or with any equivalent character reference). The XML output cannot be "x" followed by a literal newline followed by a "y" because after parsing, the attribute value would be "x y" as a consequence of the XML attribute normalization rules.
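The following minimal Python sketch applies the escaping rules above to text and attribute content. It assumes attribute values are delimited by double quotes. It always escapes ">", which is permitted but not required. The helper names escape_text, escape_attr, and parsed_attr are hypothetical. parsed_attr uses ElementTree to show the attribute normalization described in the example.

```python
import xml.etree.ElementTree as ET

# Whitespace characters that MUST become character references.
# NEL is #x85 and LINE SEPARATOR is #x2028.
_TEXT_WS = {"\r", "\x85", "\u2028"}
_ATTR_WS = {"\r", "\n", "\t", "\x85", "\u2028"}
_MARKUP = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}

def _is_control(cp: int) -> bool:
    """Non-whitespace control characters #x1-#x1F and #x7F-#x9F."""
    return (0x1 <= cp <= 0x1F and cp not in (0x9, 0xA, 0xD)) or 0x7F <= cp <= 0x9F

def _escape(s: str, ws: set, extra: dict) -> str:
    out = []
    for ch in s:
        cp = ord(ch)
        if ch in _MARKUP:
            out.append(_MARKUP[ch])
        elif ch in extra:
            out.append(extra[ch])
        elif ch in ws or _is_control(cp):
            out.append(f"&#x{cp:X};")  # hexadecimal character reference
        else:
            out.append(ch)
    return "".join(out)

def escape_text(s: str) -> str:
    return _escape(s, _TEXT_WS, {})

def escape_attr(s: str) -> str:
    # Assumes the value is written between double quotes.
    return _escape(s, _ATTR_WS, {'"': "&quot;"})

def parsed_attr(raw: str) -> str:
    """Parse <a v="raw"/> and return the attribute value after normalization."""
    return ET.fromstring('<a v="' + raw + '"/>').get("v")
```

A literal newline passed to parsed_attr comes back as a space. The escaped form from escape_attr preserves the newline, which matches the "x&#xA;y" example above.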
NOTE:
XML 1.0 did not permit an XML processor to normalize NEL or LINE SEPARATOR characters to a LINE FEED character. However, if a document entity that specifies version 1.1 invokes an external general parsed entity with no text declaration or a text declaration that specifies version 1.0, the external parsed entity is processed according to the rules of XML 1.1. For this reason, NEL and LINE SEPARATOR characters in text and attribute nodes MUST always be escaped using character references, regardless of the value of the version parameter.
XML 1.0 permitted control characters in the range #x7F through #x9F to appear as literal characters in an XML document, but XML 1.1 requires such characters, other than NEL, to be escaped as character references. An external general parsed entity with no text declaration or a text declaration that specifies a version pseudo-attribute with value 1.0 that is invoked by an XML 1.1 document entity MUST follow the rules of XML 1.1. Therefore, the non-whitespace control characters in the ranges #x1 through #x1F and #x7F through #x9F, other than NEL, MUST always be escaped, regardless of the value of the version parameter.
It is a serialization error to request the output of a document type declaration, or of a standalone parameter, if the instance of the data model contains text nodes or multiple element nodes as children of the root node. The serializer MUST either signal the error, or recover by ignoring the request to output a document type declaration or standalone parameter.
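The following Python sketch shows the choice between signaling the error and recovering. Several parts are hypothetical: the list of child kinds, the want_doctype and want_standalone flags, the recover switch, and SerializationError. A real serializer would derive these from its own parameters and its own error-reporting mechanism.

```python
from typing import List, Tuple

class SerializationError(Exception):
    """Hypothetical stand-in for 'signal the error'."""

def resolve_prolog_request(child_kinds: List[str], want_doctype: bool,
                           want_standalone: bool,
                           recover: bool) -> Tuple[bool, bool]:
    """Return (emit_doctype, emit_standalone) for the document node's children."""
    has_text = "text" in child_kinds
    many_elements = child_kinds.count("element") > 1
    if (want_doctype or want_standalone) and (has_text or many_elements):
        if not recover:
            raise SerializationError(
                "document type declaration or standalone requested "
                "for output that is not a single-element document")
        return (False, False)  # recover by ignoring the request
    return (want_doctype, want_standalone)
```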
The result of serialization using the XML output method is not guaranteed to be well-formed XML if character maps have been specified (see [character-maps]), or if nodes in the instance of the data model contain characters that are invalid in XML (introduced, perhaps, by calling a user-written extension function: this is an error, but the serializer is not REQUIRED to signal it).