5 XML Output Method

The xml output method outputs the normalized sequence as an XML entity that MUST satisfy the rules for either a well-formed XML document entity or a well-formed XML external general parsed entity, or both. A serialization error results if the serializer is unable to satisfy those rules. The exception is content modified by the character expansion phase of serialization, as described in [serphases], which could make the serialized output not well-formed but will not result in a serialization error. If a serialization error results, the serializer MUST signal the error. The serializer might not be able to meet the requirements of the xml output method due to:

serialization errors;
specification of character mapping, as determined by the use-character-maps parameter, whose expansion results in XML that is not well-formed; or
disabled output escaping that results in XML that is not well-formed.

In all other circumstances, the serialized form MUST comply with the requirements described for the xml output method.

If the document node of the normalized sequence has a single element node child and no text node children, then the serialized output is a well-formed XML document entity, and the serialized output MUST conform to the appropriate version of the XML Namespaces Recommendation [XMLNAMES] or [XMLNAMES11]. If the normalized sequence does not take this form, then the serialized output MUST be a well-formed XML external general parsed entity which, when referenced within a trivial XML document wrapper like this:


<?xml version="version"?>
<!DOCTYPE doc [
<!ENTITY e SYSTEM "entity-URI">
]>
<doc>&e;</doc>

where entity-URI is a URI for the entity, and the value of the version pseudo-attribute is the value of the version parameter, produces a document which MUST itself be a well-formed XML document conforming to the corresponding version of the XML Namespaces Recommendation [XMLNAMES] or [XMLNAMES11].
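The distinction between the two kinds of output can be checked roughly with a parser. The following is a minimal sketch, not part of this specification: `classify_output` is a hypothetical helper. It assumes the output carries no XML or text declaration, and it approximates the trivial wrapper by inlining the entity text as the content of `doc` instead of declaring it as an external entity. Python's ElementTree uses expat, which reads XML 1.0 and rejects unbound namespace prefixes, so this only approximates the Namespaces check.

```python
import xml.etree.ElementTree as ET


def classify_output(serialized: str) -> str:
    """Classify xml-method output (hypothetical helper)."""
    # A well-formed document entity parses on its own. expat also rejects
    # unbound namespace prefixes, approximating the Namespaces check.
    try:
        ET.fromstring(serialized)
        return "document-entity"
    except ET.ParseError:
        pass
    # Approximate the trivial wrapper: inline the entity text as the
    # content of <doc> rather than referencing it as an external entity.
    # Assumption: the output has no XML or text declaration.
    try:
        ET.fromstring("<doc>" + serialized + "</doc>")
        return "external-parsed-entity"
    except ET.ParseError:
        return "not-well-formed"
```

For example, `classify_output("<a/><b/>")` reports an external general parsed entity, because two top-level elements are not a document entity but are acceptable content for `doc`.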

In addition, the output MUST be such that if a new tree was constructed by parsing the XML document and converting it into an instance of the data model as specified in [DataModel], then the new instance of the data model would be the same as the starting instance of the data model that resulted from the sequence normalization process described in [serdm], with the following possible exceptions:

If the document was produced by adding a document wrapper, as described above, then it will contain an extra doc element as the document element.
The order of attribute and namespace nodes in the two trees MAY be different.
The following properties of corresponding nodes in the two trees MAY be different:
- the base-uri property of document nodes and element nodes;
- the document-uri and unparsed-entities properties of document nodes;
- the type-name and typed-value properties of element and attribute nodes;
- the nilled property of element nodes;
- the content property of text nodes, due to the effect of the indent and use-character-maps parameters.
The new tree MAY contain additional attributes and text nodes resulting from the expansion of default and fixed values in its DTD or schema.
The type annotations of the nodes in the two trees MAY be different. Type annotations in a result tree are discarded when the tree is serialized. Any new type annotations obtained by parsing the document will depend on whether the serialized XML document is assessed against a schema, and this MAY result in type annotations that are different from those in the original result tree.

NOTE:
In order to influence the type annotations in the instance of the data model that would result from processing a serialized XML document, the author of the XSLT stylesheet, XQuery expression or other process might wish to create the instance of the data model that is input to the serialization process so that it makes use of mechanisms provided by [XMLSCHEMA], such as xsi:type and xsi:schemaLocation attributes. The serialization process will not automatically create such attributes in the serialized document if those attributes were not part of the result tree that is to be serialized.

Similarly, it is possible that an element node in the instance of the data model that is to be serialized has the nilled property with the value true, but no xsi:nil attribute. The serialization process will not create such an attribute in the serialized document simply to reflect the value of the property. The value of the nilled property has no direct effect on the serialized result.

Additional namespace nodes MAY be present in the new tree if the serialization process did not undeclare one or more namespaces, as described in [xml-undeclare-NS], and the starting instance of the data model contained an element node with a namespace node that declared some prefix, but a child element of that node did not have any namespace node that declared the same prefix.

Additional namespace nodes MAY also be present in the new tree if the serialization process had to add namespace declarations for attribute or element content of type xs:QName. The original tree MAY contain namespace nodes that are not present in the new tree, as the process of creating an instance of the data model MAY ignore namespace declarations in some circumstances. See [const-infoset-element] and [const-psvi-element] of [DataModel] for additional information.

Ed. Note: We've talked about this a couple of times now, but I'm still concerned about the preceding paragraph. This text was added in response to issue qt-Feb0059-01. Today, XSLT describes a namespace fixup procedure that ensures namespaces exist for values of type xs:QName. Is the resolution for this issue requiring the same thing for serialization? Or something else?
If the indent parameter has the value yes,
- additional text nodes consisting of whitespace characters MAY be present in the new tree; and
- text nodes in the original tree that contained only whitespace characters MAY correspond to text nodes in the new tree that contain additional whitespace characters that were not present in the original tree.
See [xml-indent] for more information on the indent parameter.
Additional nodes MAY be present in the new tree due to the effect of character mapping in the character expansion phase, and the values of attribute nodes and text nodes in the new tree MAY be different from those in the original tree, due to the effects of URI expansion, character mapping and Unicode normalization in the character expansion phase of serialization.

NOTE:
The use-character-maps parameter can cause arbitrary characters to be inserted into the serialized XML document in an unescaped form, including characters that would be considered to be part of XML markup. Such characters could result in arbitrary new element nodes, attribute nodes, and so on, in the new tree that results from processing the serialized XML document.

A consequence of this rule is that certain whitespace characters MUST be output as character references, to ensure that they survive the round trip through serialization and parsing. Specifically, CR, NEL and LINE SEPARATOR characters in text nodes MUST be output respectively as "&#xD;", "&#x85;", and "&#x2028;", or their equivalents; while CR, NL, TAB, NEL and LINE SEPARATOR characters in attribute nodes MUST be output respectively as "&#xD;", "&#xA;", "&#x9;", "&#x85;", and "&#x2028;", or their equivalents. In addition, the non-whitespace control characters #x1 through #x1F and #x7F through #x9F in text nodes and attribute nodes MUST be output as character references.
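The escaping rules above can be sketched as two small functions, one per node kind. This is an illustrative sketch only, not a conforming serializer. `escape_text` and `escape_attribute` are hypothetical names, attribute values are assumed to be delimited by double quotes, and encoding, character maps and characters that are invalid in XML (such as #x0) are out of scope. Hexadecimal references are one choice among the "equivalents" the rules allow.

```python
# Non-whitespace control characters that MUST always become character
# references: #x1-#x1F other than TAB, LF and CR, plus #x7F-#x9F.
_CONTROLS = {chr(c) for c in range(0x01, 0x20) if chr(c) not in "\t\n\r"}
_CONTROLS |= {chr(c) for c in range(0x7F, 0xA0)}

# Whitespace characters that MUST be written as references, per node kind.
_TEXT_REFS = {"\r", "\x85", "\u2028"}
_ATTR_REFS = {"\r", "\n", "\t", "\x85", "\u2028"}

# Markup characters escaped with entity references; attribute values are
# assumed to be delimited by double quotes.
_TEXT_NAMED = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}
_ATTR_NAMED = {**_TEXT_NAMED, '"': "&quot;"}


def _escape(value: str, refs: set, named: dict) -> str:
    parts = []
    for ch in value:
        if ch in named:
            parts.append(named[ch])
        elif ch in refs or ch in _CONTROLS:
            parts.append("&#x%X;" % ord(ch))  # e.g. CR becomes "&#xD;"
        else:
            parts.append(ch)
    return "".join(parts)


def escape_text(value: str) -> str:
    """Escape the content of a text node (hypothetical helper)."""
    return _escape(value, _TEXT_REFS, _TEXT_NAMED)


def escape_attribute(value: str) -> str:
    """Escape the value of an attribute node (hypothetical helper)."""
    return _escape(value, _ATTR_REFS, _ATTR_NAMED)
```

For instance, `escape_text("line1\r\nline2")` yields `line1&#xD;` followed by a literal newline and `line2`: the LF in a text node may stay literal, but the CR must not, because a parser would otherwise fold the CR LF pair into a single LF.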

For example, an attribute with the value "x" followed by "y" separated by a newline will result in the output "x&#xA;y" (or with any equivalent character reference). The XML output cannot be "x" followed by a literal newline followed by a "y" because after parsing, the attribute value would be "x y" as a consequence of the XML attribute normalization rules.
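The effect described in this example can be observed with any conforming XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`. `parsed_attribute` is a hypothetical helper that wraps a serialized attribute value in a one-element document and returns what the parser reports.

```python
import xml.etree.ElementTree as ET


def parsed_attribute(serialized_value: str) -> str:
    """Parse <e a="..."/> and return the attribute value the parser sees."""
    return ET.fromstring('<e a="' + serialized_value + '"/>').get("a")


# Literal newline: attribute-value normalization turns it into a space.
literal = parsed_attribute("x\ny")
# Character reference: the newline survives the round trip.
escaped = parsed_attribute("x&#xA;y")

print(repr(literal))  # 'x y'
print(repr(escaped))  # 'x\ny'
```

The same applies to a literal TAB, which also comes back as a space, while `&#x9;` is preserved.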

NOTE:
XML 1.0 did not permit an XML processor to normalize NEL or LINE SEPARATOR characters to a LINE FEED character. However, if a document entity that specifies version 1.1 invokes an external general parsed entity with no text declaration or a text declaration that specifies version 1.0, the external parsed entity is processed according to the rules of XML 1.1. For this reason, NEL and LINE SEPARATOR characters in text and attribute nodes MUST always be escaped using character references, regardless of the value of the version parameter.

XML 1.0 permitted control characters in the range #x7F through #x9F to appear as literal characters in an XML document, but XML 1.1 requires such characters, other than NEL, to be escaped as character references. An external general parsed entity with no text declaration, or with a text declaration that specifies a version pseudo-attribute with value 1.0, that is invoked by an XML 1.1 document entity MUST follow the rules of XML 1.1. Therefore, the non-whitespace control characters in the ranges #x1 through #x1F and #x7F through #x9F, other than NEL, MUST always be escaped, regardless of the value of the version parameter.

It is a serialization error to request the output of a document type declaration, or of a standalone parameter, if the instance of the data model contains text nodes or multiple element nodes as children of the root node. The serializer MUST either signal the error, or recover by ignoring the request to output a document type declaration or standalone parameter.
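The choice between signalling and recovering can be sketched as follows. Everything here is hypothetical: `Node` is a minimal stand-in for a data model node, `SerializationError` is an invented exception, and `recover` models an implementation's choice of behaviour. The check follows the condition stated above literally: any text node child (including a whitespace-only one) or more than one element child of the root node.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Minimal stand-in for a data model node (hypothetical)."""
    kind: str  # "document", "element", "text", "comment", ...
    children: List["Node"] = field(default_factory=list)


class SerializationError(Exception):
    """Hypothetical exception used to signal the error."""


def prolog_conflict(root: Node) -> bool:
    # Error condition: text node children, or multiple element children.
    kinds = [child.kind for child in root.children]
    return "text" in kinds or kinds.count("element") > 1


def resolve_prolog_request(root: Node, doctype: bool, standalone: bool,
                           recover: bool) -> Tuple[bool, bool]:
    """Return (emit_doctype, emit_standalone) after applying the rule."""
    if not (doctype or standalone) or not prolog_conflict(root):
        return doctype, standalone
    if recover:
        # Recovery: ignore the requests and serialize without them.
        return False, False
    raise SerializationError(
        "document type declaration or standalone requested, "
        "but the root node has text or multiple element children")
```

A root node with one element child and a comment passes through unchanged. A root node with two element children either raises or, when recovering, drops both requests.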

The result of serialization using the XML output method is not guaranteed to be well-formed XML if character maps have been specified (see [character-maps]), or if nodes in the instance of the data model contain characters that are invalid in XML (introduced, perhaps, by calling a user-written extension function: this is an error, but the serializer is not REQUIRED to signal it).
