  • To: xml-dev@l...
  • Subject: Re: BOM requirement in UTF-16
  • From: Richard Tobin <richard@c...>
  • Date: Sun, 16 Mar 2003 18:31:41 GMT
  • Cc:
  • In-reply-to: <UXVca.124745$qi4.62176@rwcrnsc54>
  • Organization: HCRC, University of Edinburgh
  • References: <Xns933E11D6E632Bgustafl@127.0.0.1> <qKzca.89518$3D1.3540@sccrnsc01> <b50b5q$1eku$1@p...>

>If a BOM appears, it determines the encoding.

According to which standard?  Unicode says (section 13.6):

  Where the character set information is explicitly marked, such as in
  UTF-16BE or UTF-16LE, then all U+FEFF characters, even at the very
  beginning of text, are to be interpreted as zero width no-break
  spaces.
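
As an illustration only: Python's built-in codecs happen to follow the
same distinction, so they make the rule easy to see. This is a sketch of
one implementation's behaviour, not a claim about what any XML parser
does; the variable names are made up for the example.

```python
# Bytes FE FF followed by 'a', all in big-endian UTF-16.
data = b"\xfe\xff\x00a"

# Byte order explicitly marked (UTF-16BE): the leading U+FEFF is kept
# as an ordinary character in the decoded text.
explicit = data.decode("utf-16-be")

# Byte order not marked (plain UTF-16): the leading U+FEFF is consumed
# as a byte order mark and does not appear in the decoded text.
unmarked = data.decode("utf-16")

print([hex(ord(c)) for c in explicit])  # ['0xfeff', '0x61']
print([hex(ord(c)) for c in unmarked])  # ['0x61']
```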

>XML's whitespace vocabulary is very limited.  Such a character is not
>allowed in an XML document, so the document would not be well-formed.

You're right: it would not be allowed at the start of a document,
because it is not an XML whitespace character.  (It is allowed in text
content, however.)
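
A minimal sketch of that distinction, assuming the S and Char
productions of XML 1.0; the helper names is_xml_whitespace and
is_xml_char are hypothetical, not taken from any library:

```python
def is_xml_whitespace(ch):
    # XML 1.0 production S: #x20 | #x9 | #xD | #xA
    return ord(ch) in (0x20, 0x9, 0xD, 0xA)


def is_xml_char(ch):
    # XML 1.0 production Char:
    # #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


zwnbsp = "\ufeff"
print(is_xml_whitespace(zwnbsp))  # False: not whitespace, so not allowed before the root element
print(is_xml_char(zwnbsp))        # True: a legal character, so allowed in text content
```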

-- Richard
