7.1 The Influence of Serialization Parameters upon the HTML Output Method
HTML Output Method: Markup for Elements
The html output method MUST NOT output an element
differently from the xml output method unless the
expanded-QName of the element has a null namespace URI; an element
whose expanded-QName has a non-null namespace URI MUST be output as
XML. If the expanded-QName of the element has a null namespace URI,
but the local part of the expanded-QName is not recognized as the name
of an HTML element, the element MUST be output in the same way as a
non-empty, inline element such as span. In particular:
-
If the result tree contains namespace nodes for namespaces other than the
XML namespace, the HTML output method
MUST represent these namespaces using
attributes named xmlns or xmlns:prefix
in the same way as the XML output method would represent them when the
version parameter is set to 1.0.
-
If the result tree contains elements or attributes whose names have a
non-null namespace URI, the HTML output method
MUST generate
namespace-prefixed QNames for these nodes in the same way as the XML output
method would do when the version parameter is set to 1.0.
-
Where special rules are defined later in this section for
serializing specific HTML elements and attributes, these rules
MUST NOT be
applied to an element or attribute whose name has a non-null
namespace URI. However, the generic rules for the HTML output method
that apply to all elements and attributes, for example the rules for
escaping special characters in the text and the rules for indentation,
MUST be used also for namespaced elements and attributes.
-
When serializing an element whose name is not defined in the
HTML specification, but that is in the null namespace, the HTML output
method MUST
apply the same rules (for example, indentation rules) as
when serializing a span element. The descendants of such
an element MUST be serialized as if they were descendants of a
span element.
-
When serializing an element whose name is in a non-null
namespace, the HTML output method MUST apply the same rules (for
example, indentation rules) as when serializing a div
element. The descendants of such an element MUST be serialized as if
they were descendants of a div element.
The html output method MUST NOT output an end-tag
for empty elements. For HTML 4.0, the empty elements are
area, base, basefont, br, col, frame, hr, img, input,
isindex, link, meta and param. For example, an element written as
<br/> or <br></br> in an XSLT stylesheet MUST be output as <br>.
The html output method MUST recognize the names of
HTML elements regardless of case. For example, elements named
br, BR or Br MUST all be
recognized as the HTML br element and output without an
end-tag.
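The two rules above (no end-tag for empty elements, and case-insensitive name recognition) can be sketched as follows. This is a minimal illustration, not a serializer: the function names are hypothetical, and the sketch assumes an element in the null namespace with no attributes and no children.

```python
# Empty elements listed for HTML 4.0 in the text above.
HTML4_EMPTY_ELEMENTS = frozenset({
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
})


def is_empty_html_element(local_name):
    """Return True if the name matches an HTML 4.0 empty element, ignoring case."""
    return local_name.lower() in HTML4_EMPTY_ELEMENTS


def serialize_childless_element(local_name):
    """Serialize an attribute-less, childless element (hypothetical helper)."""
    if is_empty_html_element(local_name):
        # Empty HTML elements get a start-tag only.
        return "<%s>" % local_name
    # Any other element keeps its end-tag.
    return "<%s></%s>" % (local_name, local_name)
```

Under these assumptions, br, BR and Br are all written as a bare start-tag, while an element such as span keeps its end-tag.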
The html output method MUST NOT perform escaping for
the content of the script and style
elements.
For example, a script element
created by an XQuery direct element constructor or an XSLT
literal result element, such as:
<script>if (a &lt; b) foo()</script>
or
<script><![CDATA[if (a < b) foo()]]></script>
MUST be output as
<script>if (a < b) foo()</script>
A common requirement is to output a script element
as shown in the example below:
<script type="text/javascript">
document.write ("<em>This won't work</em>")
</script>
This is illegal HTML, for the reasons explained in section B.3.2 of
the HTML 4.01 specification. Nevertheless, it is possible to output
this fragment, using either of the following constructs:
Firstly, by use of a script element
created by an XQuery direct element constructor or an
XSLT literal result element:
<script type="text/javascript">
document.write ("<em>This won't work</em>")
</script>
Secondly, by constructing the markup from ordinary text characters:
<script type="text/javascript">
document.write ("&lt;em&gt;This won't work&lt;/em&gt;")
</script>
As the HTML specification points out, the correct way to write this
is to use the escape conventions for the specific scripting language.
For JavaScript, it can be written as:
<script type="text/javascript">
document.write ("<em>This will work<\/em>")
</script>
The HTML 4.01 specification also shows examples of how to write
this in various other scripting languages. The escaping MUST be done
manually; it will not be done by the serializer.
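Because the serializer leaves script content untouched, the escaping has to happen before the text reaches it. A minimal sketch of one way to do that for JavaScript is shown here; the helper name is hypothetical, and the sketch assumes every "</" in the input sits inside a JavaScript string literal, where "<\/" is read as "</".

```python
def escape_end_tag_open_for_javascript(script_text):
    """Rewrite every "</" as "<\\/" so the HTML parser does not see an end-tag.

    Hypothetical helper: only safe when each "</" occurs inside a
    JavaScript string literal.
    """
    return script_text.replace("</", "<\\/")


original = 'document.write ("<em>This will work</em>")'
escaped = escape_end_tag_open_for_javascript(original)
```

The result is the text that would then be placed inside the script element. Other scripting languages need their own escape convention.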
HTML Output Method: Writing Attributes
The html output method
MUST NOT escape
"<" characters occurring in attribute values.
If the indent parameter has the value
yes, then the html output method MAY add or
remove whitespace as it outputs the instance of the data model, so long as it does
not change how an HTML user agent would render the output.
If the escape-uri-attributes parameter has the value yes,
the html output method MUST
escape non-ASCII characters in URI attribute values using the
method defined by Section 5.4 of
[XLINK], except that relative URIs MUST NOT be absolutized.
NOTE:
This escaping is deliberately confined to non-ASCII characters,
because escaping of ASCII characters is not always appropriate, for
example when URIs or URI fragments are interpreted locally by the HTML
user agent. Even in the case of non-ASCII characters, escaping can
sometimes cause problems. More precise control of URI escaping is
therefore available by setting escape-uri-attributes to
no, and controlling the escaping of URIs by means of the
fn:escape-uri function defined in [FANDO].
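The restriction to non-ASCII characters can be illustrated with a short sketch. It assumes the usual scheme of converting each non-ASCII character to UTF-8 and writing each resulting byte as %HH; the function name is hypothetical, and the sketch does not resolve relative URIs.

```python
def escape_non_ascii_in_uri(value):
    """Percent-encode non-ASCII characters only; ASCII is copied unchanged."""
    out = []
    for ch in value:
        if ord(ch) < 128:
            # ASCII characters, including "#", "?", "%" and spaces, are left alone.
            out.append(ch)
        else:
            # Each UTF-8 byte of the character becomes one %HH escape.
            out.extend("%%%02X" % byte for byte in ch.encode("utf-8"))
    return "".join(out)
```

A relative reference such as "café.html#top" stays relative: only the é changes, becoming %C3%A9.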
The html output method
MUST output boolean
attributes (that is, attributes with only a single allowed value that
is equal to the name of the attribute) in minimized form.
For example, a start-tag created
using the following XQuery direct element constructor or XSLT
literal result element
<OPTION selected="selected">
MUST be output as
<OPTION selected>
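A minimal sketch of attribute minimization follows. The set of boolean attribute names is an assumption taken from the HTML 4.01 DTDs and does not come from this section; the function name is hypothetical, and escaping of the attribute value is deliberately left out.

```python
# Assumed list of HTML 4.01 boolean attributes (not given in this section).
HTML4_BOOLEAN_ATTRIBUTES = frozenset({
    "checked", "compact", "declare", "defer", "disabled", "ismap",
    "multiple", "nohref", "noresize", "noshade", "nowrap", "readonly",
    "selected",
})


def serialize_attribute(name, value):
    """Write one attribute, minimizing boolean attributes (hypothetical helper)."""
    is_boolean = name.lower() in HTML4_BOOLEAN_ATTRIBUTES
    if is_boolean and value.lower() == name.lower():
        # Minimized form: the attribute name alone.
        return name
    # Value escaping is omitted from this sketch.
    return '%s="%s"' % (name, value)
```

With these assumptions, selected="selected" is written as selected, while an ordinary attribute such as value="x" is written in full.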
The html output method MUST NOT escape an
& character occurring in an attribute value
immediately followed by a { character (see Section
B.7.1 of the HTML 4.0 Recommendation).
For example, a start-tag created
using the following XQuery direct element constructor or XSLT
literal result element
<BODY bgcolor='&amp;{{randomrbg}};'>
MUST be output as
<BODY bgcolor='&{randomrbg};'>
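The attribute escaping rules in this section (leave "<" alone, and leave "&" alone when it is immediately followed by "{") can be sketched as below. The function name is hypothetical. Writing a double quote as &quot; is an assumption made so that the value can be emitted inside double quotes; it is not stated in this section.

```python
import re


def escape_html_attribute_value(value):
    """Escape an attribute value for the html output method (sketch)."""
    # Escape "&" unless it is immediately followed by "{".
    escaped = re.sub(r"&(?!\{)", "&amp;", value)
    # Assumption: the value will be written between double quotes.
    escaped = escaped.replace('"', "&quot;")
    # "<" is intentionally left unescaped.
    return escaped
```

Applied to the value &{randomrbg}; the function returns it unchanged, whereas a value such as a<b & c becomes a<b &amp; c.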
HTML Output Method: Indentation
If the indent attribute has the value
yes, then the html output method MAY add or
remove whitespace as it outputs the result tree, so long as it does
not change the way that a conforming HTML user agent would render the output.
The
default value is yes.
NOTE:
This rule can be satisfied by observing the
following constraints:
Whitespace MUST only be added before or after an element,
or adjacent to an existing whitespace character.
Whitespace MUST NOT be added or removed adjacent to an inline element.
The inline elements are those included in the %inline
category of any of the HTML 4.01
DTDs,
as well as the INS and
DEL elements if they are used as inline elements
(i.e., if they do not contain element children).
Whitespace MUST NOT be added or removed inside a formatted element,
the formatted elements being pre, script,
style, and textarea.
Note that the HTML definition of whitespace is different from the XML definition:
see section 9.1 of the HTML 4.01 specification.
HTML Output Method: Writing Character Data
The html output method MAY output a character using a
character entity reference in preference to using a numeric character
reference, if an entity is defined for the character in the version of
HTML that the output method is using. Entity references and character
references SHOULD be used only where the character is not present in
the selected encoding, or where the visual representation of the
character is unclear (as with &nbsp;, for
example).
When outputting a sequence of whitespace characters in the
instance of the data model, within an element where whitespace is treated normally
(but not in elements such as pre and
textarea), the html output method
MAY
represent it using any sequence of whitespace that will be treated
in the same way
by an HTML user agent.
See section 3.5 of [xhtml-mod] for some additional information
on handling of whitespace by an HTML user agent.
Certain characters, specifically the control characters #x7F-#x9F,
are legal in XML but not in HTML. It is a
serialization error to use the HTML
output method when such characters appear in the instance of the data model. The
serializer
MAY signal the error, but is not REQUIRED to do so. If it
does not signal the error, it MAY copy the offending characters into
the serialized output, creating invalid HTML.
The html output method MUST terminate processing
instructions with > rather than
?>.
HTML Output Method: Encoding
The encoding parameter specifies the
encoding to be used.
Serializers are
REQUIRED to support values of UTF-8 and
UTF-16. A serialization error occurs if an output
encoding other than UTF-8 or UTF-16 is
requested and the serializer
does not support that encoding. The
serializer
MUST signal the error.
If there is a HEAD element,
and
the include-content-type parameter
has the value
yes,
the html output method
MUST add a META element
immediately after the start-tag
of the HEAD element specifying the character encoding
actually used.
For example,
<HEAD>
<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
...
The content type MUST
be set to the value given for the
media-type parameter; the default value is
text/html.
If the instance of the data model includes a head
element that has a meta element child, the
serializer
SHOULD replace any content attribute of the meta
element, or add such an attribute, with the value as described above,
rather than output a new meta element.
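As a small illustration of how the media-type and encoding values combine in the content attribute, the following sketch builds the META element shown in the example above. The function name is hypothetical, and the attribute layout simply copies that example.

```python
def content_type_meta(encoding, media_type="text/html"):
    """Build the META element carrying the content type and charset (sketch)."""
    content = "%s; charset=%s" % (media_type, encoding)
    return '<META http-equiv="Content-Type" content="%s">' % content
```

Calling it with EUC-JP and the default media type reproduces the META element from the example.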
It is possible that the instance of the data model will contain a character that
cannot be represented in the encoding that the
serializer
is using for
output. In this case, if the character occurs in a context where HTML
recognizes character references, then the character MUST be output
as a character entity reference or decimal numeric character
reference; otherwise (for example, in a script or
style element or in a comment), the
serializer
MUST
signal a serialization error.
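The fallback described above can be sketched with Python's standard codec error handlers: "xmlcharrefreplace" writes an unencodable character as a decimal numeric character reference, and "strict" raises an exception. The function name and the references_allowed flag are hypothetical; a real serializer would know the context from the element being written.

```python
def encode_for_output(text, encoding, references_allowed=True):
    """Encode text, falling back to character references where HTML allows them."""
    if references_allowed:
        # Unencodable characters become decimal references such as &#233;.
        return text.encode(encoding, errors="xmlcharrefreplace")
    try:
        # In script or style content, or in a comment, no fallback exists.
        return text.encode(encoding, errors="strict")
    except UnicodeEncodeError as exc:
        raise ValueError("serialization error: unencodable character") from exc
```

With an ASCII target encoding, é in ordinary text comes out as a decimal reference, while the same character inside a script element raises the error.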
HTML Output Method: Document Type Declaration
If the doctype-public or doctype-system
parameters are specified, then the html output method
MUST
output a document type declaration immediately before the first
element. The name following <!DOCTYPE
MUST be
HTML or html. If the
doctype-public parameter is specified, then the output
method MUST output PUBLIC
followed by the specified
public identifier; if the doctype-system parameter is
also specified, it MUST also output the specified
system identifier
following the public identifier. If the doctype-system
parameter is specified but the doctype-public parameter
is not specified, then the output method
MUST output
SYSTEM followed by the specified system identifier.
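The three cases (public only, public plus system, system only) can be sketched as follows. The function name is hypothetical. Wrapping the identifiers in double quotes and using the uppercase name HTML are assumptions of this sketch; identifiers that themselves contain a double quote are not handled.

```python
def doctype_declaration(doctype_public=None, doctype_system=None):
    """Build the document type declaration, or "" if neither parameter is set."""
    if doctype_public is None and doctype_system is None:
        return ""
    parts = ["<!DOCTYPE HTML"]
    if doctype_public is not None:
        parts.append('PUBLIC "%s"' % doctype_public)
        if doctype_system is not None:
            # The system identifier follows the public identifier.
            parts.append('"%s"' % doctype_system)
    else:
        parts.append('SYSTEM "%s"' % doctype_system)
    return " ".join(parts) + ">"
```

The returned string would be written immediately before the first element.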
HTML Output Method: Unicode Normalization
The
normalization-form
parameter is applicable for the
html output method.
The values NFC and
none MUST be supported by the
serializer.
A serialization error results if the value of the normalization-form
parameter specifies a normalization form that is not supported by the
serializer;
the
serializer
MUST signal the error.
HTML Output Method: Other Parameters
The media-type parameter is applicable for the
html output method.
See [serparam] for more
information.
The use-character-maps parameter is applicable for the
html output method.
See [character-maps] for more
information.
The byte-order-mark parameter is
applicable for the html output method. See
[serparam] for more information.