4 Phases of Serialization
Following the sequence normalization process described in [serdm],
serialization can be regarded as involving three phases of processing.
For an
implementation-defined output method,
any of these phases MAY be skipped or MAY be performed in a different
order than is specified here.
For the output methods defined in this
specification,
these phases are carried out sequentially as follows:
-
Markup generation produces the character representation of those parts
of the serialized result that describe the structure of the normalized
sequence. This phase is influenced by the parameters method,
doctype-system, doctype-public, include-content-type, indent,
omit-xml-declaration, standalone, undeclare-namespaces and version.
In the cases of the XML, HTML and XHTML output methods, this phase
produces the character representations of the following:
-
the document type declaration;
-
start tags and end tags (except for attribute values, whose
representation is produced by the character expansion phase);
-
processing instructions; and
-
comments.
In the cases of the XML and XHTML output methods,
this phase also produces the following:
In the case of the text output method, this phase has no effect.
-
Character expansion is concerned with the
representation of characters appearing in text and attribute nodes in
the normalized sequence. The
substitution processes that apply are listed below, in priority
order: a character that is handled by one process in this list will
be unaffected by processes appearing later in the list,
except that a character affected by Unicode
normalization MAY be affected by creation of CDATA sections and by
character escaping:
-
URI escaping (in the case of URI-valued attributes in the
HTML and XHTML output methods), as determined by the
escape-uri-attributes parameter.
-
Character mapping, as determined by the
use-character-maps parameter.
Text nodes that are children of elements
specified by the cdata-section-elements parameter are not
affected by this step.
-
Unicode normalization, if requested by the
normalization-form parameter.
Unicode normalization is applied to the character stream that results
after all markup generation and character expansion have taken place.
For the definitions of the various normalization forms, see [CHARMOD].
The meanings associated with the possible values of the
normalization-form parameter are as follows:
-
NFC specifies that the serialized result will be
in Unicode Normalization Form C.
-
NFD specifies that the serialized result will be
in Unicode Normalization Form D.
-
NFKC specifies that the serialized result will be
in Unicode Normalization Form KC.
-
NFKD specifies that the serialized result will be
in Unicode Normalization Form KD.
-
fully-normalized specifies that the serialized result
will be in fully normalized form.
-
none specifies that no Unicode normalization will
be applied.
-
An implementation-defined value
has an implementation-defined
effect.
NOTE:
Any characters produced under the effect of the use-character-maps
parameter are not subject to Unicode normalization. If the
normalization-form parameter has a value other than none and the
use-character-maps parameter is not empty, the whole of the serialized
document might not be in the normalization form specified by the
normalization-form parameter.
-
Creation of CDATA sections, as determined by the
cdata-section-elements parameter. Note that this is also
affected by the encoding parameter, in that characters
not present in the selected encoding cannot be represented in a CDATA
section.
-
Escaping according to XML or HTML rules of special characters and of
characters that cannot be represented in the selected encoding.
For example, replacing < with &lt;.
-
Encoding, as controlled by the encoding parameter. This converts the
character stream produced by the previous phases into a byte stream.
NOTE:
Serialization is only defined in terms of encoding the result as a
stream of bytes. However, a serializer MAY provide an option that allows
the encoding phase to be skipped, so that the result of serialization is
a stream of Unicode characters. The effect of any such option is
implementation-defined, and a serializer is not REQUIRED to support such
an option.
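The three phases can be pictured as a small pipeline. The following is a minimal Python sketch under stated assumptions, not the algorithm of this specification: it uses a hypothetical Element class (a name, attributes, and children), handles only elements and text (no comments, processing instructions, or document type declaration), assumes element and attribute names are representable in the encoding, and reduces character expansion to XML-style escaping (no URI escaping, character maps, Unicode normalization, or CDATA sections).

```python
# Minimal sketch of the three serialization phases for a hypothetical,
# simplified node model; not the specification's own algorithm.
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Element:
    """Hypothetical element node: a name, attributes, and children."""
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # Element or str


def expand_characters(text: str, encoding: str, in_attribute: bool = False) -> str:
    """Character expansion (escaping only): replace special characters,
    and characters the encoding cannot represent, with references."""
    out = []
    for ch in text:
        if ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")  # not always required, but always allowed
        elif ch == '"' and in_attribute:
            out.append("&quot;")  # attribute values are delimited by "
        else:
            try:
                ch.encode(encoding)
                out.append(ch)
            except UnicodeEncodeError:
                out.append(f"&#x{ord(ch):X};")  # character reference
    return "".join(out)


def generate_markup(node: Union[Element, str], encoding: str) -> str:
    """Markup generation: emit start and end tags; attribute values and
    text are handed to character expansion. Names are assumed encodable."""
    if isinstance(node, str):
        return expand_characters(node, encoding)
    attrs = "".join(
        f' {name}="{expand_characters(value, encoding, in_attribute=True)}"'
        for name, value in node.attributes.items()
    )
    content = "".join(generate_markup(child, encoding) for child in node.children)
    return f"<{node.name}{attrs}>{content}</{node.name}>"


def serialize(root: Element, encoding: str = "ascii") -> bytes:
    """Run markup generation plus character expansion, then the encoding
    phase, which converts the character stream into a byte stream."""
    characters = generate_markup(root, encoding)
    return characters.encode(encoding)


if __name__ == "__main__":
    doc = Element("p", {"title": 'say "hi"'}, ["x < y & caf\u00e9"])
    print(serialize(doc))
```

Because encoding runs last, a character in text or an attribute value that the target encoding cannot represent never reaches the encoder: character expansion has already replaced it with a character reference.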
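URI escaping can be sketched as follows. As an illustrative assumption, the sketch follows the rule of the XPath function fn:escape-html-uri: every character outside printable US-ASCII (U+0020 to U+007E) is replaced by the %HH escapes of its UTF-8 bytes. The precise rule for each output method is given in that output method's definition; the helper name escape_uri_attribute is hypothetical.

```python
def escape_uri_attribute(value: str) -> str:
    """Hypothetical helper: percent-encode each character outside
    printable US-ASCII (U+0020..U+007E) as %HH for its UTF-8 bytes."""
    out = []
    for ch in value:
        if 0x20 <= ord(ch) <= 0x7E:
            out.append(ch)  # printable ASCII is left for later processes
        else:
            out.append("".join(f"%{byte:02X}" for byte in ch.encode("utf-8")))
    return "".join(out)


if __name__ == "__main__":
    print(escape_uri_attribute("http://example.com/caf\u00e9?a=1&b=2"))
```

Under the priority order above, a character rewritten by URI escaping is not touched again, but an ASCII character such as & is left alone by this step, so it is still escaped as &amp; by the later escaping step.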
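The mapping from values of the normalization-form parameter to Unicode normalization forms can be sketched with Python's standard unicodedata module, whose normalize function accepts the form names NFC, NFD, NFKC and NFKD. This sketch is deliberately incomplete: it does not implement fully-normalized (defined in [CHARMOD] and not provided by unicodedata), and it normalizes the whole string, so it does not exempt characters produced by character maps as the note on use-character-maps requires. The function name is hypothetical.

```python
import unicodedata


def apply_normalization_form(characters: str, normalization_form: str) -> str:
    """Apply a normalization-form parameter value to a character stream."""
    if normalization_form == "none":
        return characters  # no Unicode normalization is applied
    if normalization_form in {"NFC", "NFD", "NFKC", "NFKD"}:
        return unicodedata.normalize(normalization_form, characters)
    # "fully-normalized" (see [CHARMOD]) and implementation-defined
    # values are outside the scope of this sketch.
    raise NotImplementedError(f"unsupported normalization-form: {normalization_form}")


if __name__ == "__main__":
    decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT
    print(apply_normalization_form(decomposed, "NFC") == "\u00e9")
```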
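Creation of CDATA sections interacts with the encoding parameter as described above: a character that the selected encoding cannot represent has to be written outside any CDATA section. The following minimal sketch assumes a hypothetical helper that receives the text of an element named in cdata-section-elements. It also relies on the XML rule that a CDATA section cannot contain the string ]]>, so that string is split across two sections.

```python
def cdata_sections(text: str, encoding: str) -> str:
    """Hypothetical helper: wrap text in CDATA sections, closing a section
    to emit a character reference for any character the encoding cannot
    represent, and splitting "]]>" across two sections."""
    out = []
    pending = []  # encodable characters waiting to be wrapped

    def flush() -> None:
        if pending:
            body = "".join(pending).replace("]]>", "]]]]><![CDATA[>")
            out.append(f"<![CDATA[{body}]]>")
            pending.clear()

    for ch in text:
        try:
            ch.encode(encoding)
            pending.append(ch)
        except UnicodeEncodeError:
            flush()  # the character cannot appear inside a CDATA section
            out.append(f"&#x{ord(ch):X};")
    flush()
    return "".join(out)


if __name__ == "__main__":
    print(cdata_sections("caf\u00e9 < bar", "ascii"))
```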