Re: Gag me with a blunt 

From: Tim Bray <tbray@t...>
To: xml-dev@l...
Date: Sun, 18 Mar 2001 17:49:37 -0800

At 06:00 PM 16/03/01 -0500, John Cowan wrote:
>XML 1.0 chose not to implement non-ASCII whitespace characters from Unicode 2.0.

To be honest, I don't think we ever articulated the principle,
or proceeded from it.

However, the discussion over Ideographic Space, U+3000, was long 
and agonized, so the decision to omit it from the production for 
"S" was not lightly taken at all.  It's hard to see how you could 
let in U+0085 without letting in ideospace and a bunch more 
characters that have in some respect the characteristics of 
white-space-ness.  Someone sent me a note offline giving a long 
list of such items, and it's pretty clear that letting in U+0085 
could start us down a slippery slope.  
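For reference, the XML 1.0 production in question is [3] S ::= (#x20 | #x9 | #xD | #xA)+. Below is a minimal Python sketch of that production, set against Python's own str.isspace(), which (following Unicode) does treat U+0085 and U+3000 as white space. The helper names is_xml_s and match_s are made up for illustration.

```python
# XML 1.0 production [3]: S ::= (#x20 | #x9 | #xD | #xA)+
XML_S_CHARS = frozenset("\u0020\u0009\u000D\u000A")


def is_xml_s(ch: str) -> bool:
    """True if ch is one of the four characters allowed in XML 1.0's S."""
    return ch in XML_S_CHARS


def match_s(text: str, pos: int = 0) -> int:
    """Return the end index of the run of S characters starting at pos
    (pos itself if there is none)."""
    end = pos
    while end < len(text) and is_xml_s(text[end]):
        end += 1
    return end


if __name__ == "__main__":
    # Characters Unicode counts as white space in some respect, but S excludes.
    for cp in (0x0085, 0x3000):
        ch = chr(cp)
        print(f"U+{cp:04X}: str.isspace()={ch.isspace()}, XML S={is_xml_s(ch)}")
```

Run as a script, it reports both characters as white space to Python but not to XML's S.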

Note (although no processor other than Lark ever did this as
far as I know) that if you want to build a DFA-based XML
processor, you can use the trick of recognizing all the syntax
characters with a 7-bit state table, and only a remarkably small
amount of clever sidestepping is required to deal with all
the non-ASCII characters.  -Tim
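A rough Python sketch of the trick, assuming a toy grammar that only recognizes a bare start tag of the form "<" Name S? ">". It is not Lark's code; the class names, state names and is_simple_start_tag are hypothetical. Every XML syntax character is ASCII, so a 128-entry table classifies them, and a non-ASCII character only ever needs a range check to decide whether it may appear in a name. The non-ASCII ranges are taken from the NameStartChar [4] and NameChar [4a] productions of XML 1.0 Fifth Edition (2008), which postdates this message.

```python
# Character classes for the DFA. ASCII is classified by a 128-entry
# (7-bit) lookup table; non-ASCII code points fall through to a range check.
OTHER, LT, GT, SPACE, NAMESTART, NAMECHAR = range(6)
N_CLASSES = 6

CLASS_TABLE = [OTHER] * 128
CLASS_TABLE[ord("<")] = LT
CLASS_TABLE[ord(">")] = GT
for c in " \t\r\n":  # exactly the four characters of XML's S production
    CLASS_TABLE[ord(c)] = SPACE
for c in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_:":
    CLASS_TABLE[ord(c)] = NAMESTART
for c in "0123456789-.":
    CLASS_TABLE[ord(c)] = NAMECHAR

# Non-ASCII NameStartChar ranges, XML 1.0 Fifth Edition production [4].
NONASCII_NAMESTART = (
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F), (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
)
# Extra non-ASCII ranges that NameChar adds, production [4a].
NONASCII_NAMECHAR_EXTRA = ((0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040))


def in_ranges(cp: int, ranges) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)


def char_class(ch: str) -> int:
    cp = ord(ch)
    if cp < 128:
        return CLASS_TABLE[cp]
    # The sidestep: no non-ASCII character is a syntax character, so the
    # only question is whether it may appear in a name.
    if in_ranges(cp, NONASCII_NAMESTART):
        return NAMESTART
    if in_ranges(cp, NONASCII_NAMECHAR_EXTRA):
        return NAMECHAR
    return OTHER  # e.g. U+0085, U+3000


# States for the hypothetical toy grammar  "<" Name S? ">"  (not all of XML).
START, AFTER_LT, IN_NAME, AFTER_NAME, ACCEPT, ERROR = range(6)
TRANSITIONS = [[ERROR] * N_CLASSES for _ in range(6)]  # default: reject
TRANSITIONS[START][LT] = AFTER_LT
TRANSITIONS[AFTER_LT][NAMESTART] = IN_NAME
TRANSITIONS[IN_NAME][NAMESTART] = IN_NAME
TRANSITIONS[IN_NAME][NAMECHAR] = IN_NAME
TRANSITIONS[IN_NAME][SPACE] = AFTER_NAME
TRANSITIONS[IN_NAME][GT] = ACCEPT
TRANSITIONS[AFTER_NAME][SPACE] = AFTER_NAME
TRANSITIONS[AFTER_NAME][GT] = ACCEPT


def is_simple_start_tag(text: str) -> bool:
    """Run the DFA over text; True only if it ends in the ACCEPT state."""
    state = START
    for ch in text:
        state = TRANSITIONS[state][char_class(ch)]
        if state == ERROR:
            return False
    return state == ACCEPT
```

Because U+3000 and U+0085 fall outside every name range, a string such as "<a\u3000>" is rejected rather than having the ideographic space treated as a separator, which is consistent with leaving it out of S.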

Follow-Ups:
- Re: Gag me with a blunt 
  - From: John Cowan <jcowan@r...>

References:
- Re: Gag me with a blunt 
  - From: Jesús Quiroga <jquiroga@p...>
- Re: Gag me with a blunt 
  - From: John Cowan <cowan@m...>