Stylus Studio XML Editor

3.1 URI Reference Encoding and Escaping

The set of characters allowed in xml:base attributes is the same as for XML, namely [Unicode]. However, some Unicode characters are disallowed in URI references, so processors must encode and escape these characters to obtain a valid URI reference from the attribute value.

The disallowed characters include all non-ASCII characters, plus the excluded characters listed in Section 2.4 of [RFC2396], except for the number sign (#) and percent sign (%) characters and the square bracket characters re-allowed in [RFC2732]. Disallowed characters must be escaped as follows:

  1. Each disallowed character is converted to UTF-8 [RFC2279] as one or more bytes.

  2. Any bytes corresponding to a disallowed character are escaped with the URI escaping mechanism (that is, converted to %HH, where HH is the hexadecimal notation of the byte value).

  3. The original character is replaced by the resulting character sequence.
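
The three steps above can be sketched in Python. This is a minimal illustration, not a normative implementation: the function name `escape_uri_reference` and the constant `_DISALLOWED_ASCII` are hypothetical. The ASCII set reflects one reading of the excluded characters in Section 2.4.3 of [RFC2396] (control characters, space, and the "delims" and "unwise" characters), with `#`, `%`, `[` and `]` left unescaped as described above. The sketch emits uppercase hexadecimal digits, which is a choice made here rather than something the text requires.

```python
# Excluded ASCII characters as read from RFC 2396 section 2.4.3 (assumption):
# space, delims and unwise characters, minus '#', '%', '[' and ']',
# plus the control characters 0x00-0x1F and 0x7F.
_DISALLOWED_ASCII = (
    set('<>"{}|\\^` ')
    | {chr(code) for code in range(0x20)}
    | {"\x7f"}
)


def escape_uri_reference(value: str) -> str:
    """Escape disallowed characters in an xml:base attribute value."""
    pieces = []
    for ch in value:
        if ord(ch) > 0x7F or ch in _DISALLOWED_ASCII:
            # Step 1: convert the character to UTF-8 bytes.
            # Step 2: escape each byte as %HH.
            # Step 3: the escaped sequence replaces the original character.
            pieces.append("".join(f"%{byte:02X}" for byte in ch.encode("utf-8")))
        else:
            pieces.append(ch)
    return "".join(pieces)
```

For example, `escape_uri_reference("http://example.org/a b")` gives `http://example.org/a%20b`, and a value such as `#frag%20[x]` is returned unchanged. Because `%` is not escaped, a value that already contains %HH sequences is not escaped a second time.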