
  • From: Jesús Quiroga <jquiroga@p...>
  • To: Tim Bray <tbray@t...>
  • Date: Fri, 16 Mar 2001 23:13:38 +0100

At 14:10 16/03/01, you wrote:

>Hmmm, I wonder if current perl includes U+0085 in what
>matches \s?  Etc.....


Eventually, \s is expected to match all Unicode separator characters when
searching in UTF-8 strings (Camel III, p. 168).

Perl 5.6.0 doesn't include U+0085 in \s yet.
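
For anyone who wants to run the same kind of check on their own regex engine,
here is a minimal sketch. It uses Python 3 rather than Perl, purely as an
illustration, and the variable names are my own. It tests whether \s matches
U+0085 in the default Unicode-aware mode and again with the re.ASCII flag,
which restricts \s to [ \t\n\r\f\v]. It says nothing about how any particular
Perl release behaves.

```python
import re

NEL = "\u0085"  # NEXT LINE (NEL), the character discussed in this thread

# For str patterns, Python 3 treats \s as Unicode-aware by default.
unicode_match = re.fullmatch(r"\s", NEL) is not None

# With re.ASCII, \s is limited to the ASCII whitespace set [ \t\n\r\f\v].
ascii_match = re.fullmatch(r"\s", NEL, flags=re.ASCII) is not None

print("Unicode-aware \\s matches U+0085:", unicode_match)
print("ASCII-only \\s matches U+0085:", ascii_match)
```

The two results should differ, which is the same gap Tim is asking about: the
answer depends on whether the engine consults Unicode properties or a fixed
ASCII list.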


>Also, unlike (almost?) all the other XML errata, changing this
>would actively break pretty well every deployed piece of XML
>software in the world.  -Tim


This is not an error in the XML 1.0 spec, IMHO. Apparently, U+0085 was
assigned in Unicode 3.0, and XML 1.0 is based on Unicode 2.0.

XML 1.0 could not possibly comply in 1998 with a standard published in 2000.

The difficult question is whether any change in Unicode should trigger an
immediate XML revision. IBM thinks it should.

Unfortunately, if U+0085 is included as whitespace in the XML spec, it won't be
XML 1.0 anymore.
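
To make the compatibility point concrete, here is a small sketch of the
XML 1.0 whitespace production, S ::= (#x20 | #x9 | #xD | #xA)+, written in
Python. The names XML10_WHITESPACE and is_xml10_whitespace are hypothetical
and not taken from any parser. A processor that follows this production
cannot treat U+0085 as whitespace.

```python
# The four characters allowed by the XML 1.0 production S:
# space, tab, carriage return, line feed.
XML10_WHITESPACE = frozenset("\u0020\u0009\u000D\u000A")


def is_xml10_whitespace(text: str) -> bool:
    """Return True if text is a non-empty run of XML 1.0 whitespace."""
    return len(text) > 0 and all(ch in XML10_WHITESPACE for ch in text)


print(is_xml10_whitespace(" \t\r\n"))  # only characters from S
print(is_xml10_whitespace("\u0085"))   # NEL is not in S
```

Adding U+0085 to that set changes what counts as whitespace, so documents and
processors built against the four-character definition would no longer agree
with the spec.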



