
  • To: "'xml-dev'" <xml-dev@l...>
  • Subject: RE: Character Entities: An XML Core WG View
  • From: "Jelks Cabaniss" <jelks@j...>
  • Date: Fri, 1 Nov 2002 01:31:50 -0500
  • Importance: Normal
  • In-reply-to: <3DC206EA.9050408@t...>

Tim Bray wrote:

> I see your point, but there are all these people out there
> who keep saying they want a way to give funny characters
> human-readable names and don't want to use elements because
> they think structure and content are different.  No matter
> how many times they are told that they shouldn't really need
> the names and that if they did they should use elements,
> they keep refusing to take our word for this, so we're gonna
> have to do something.  Sigh.

> The WG's approach does at least have the virtue that it works with 
> existing software.  

Indeed.

> I despise entities in general more and more with each passing year, 
> but it's pretty clearly character entities that are the bit that 
> just won't go away; I seem to recall weeping with James Clark over 
> this into our 18th or 19th glasses of red wine at the last XML 
> conference.

Because they don't round trip after parsing?  Or because of having to
expand the entities before you can use them?
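The round-trip loss can be sketched in a few lines. This is only an illustration: it assumes Python's standard xml.etree.ElementTree (not something anyone in this thread named) and a hypothetical &Tse; declaration in the internal subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: &Tse; is declared in the internal subset.
SOURCE = '<!DOCTYPE doc [<!ENTITY Tse "&#x426;">]><doc>&Tse;</doc>'

root = ET.fromstring(SOURCE)

# By the time the application sees the text, the parser has already
# expanded the reference into the character itself.
parsed_text = root.text

# Serializing writes the character back out; the name "Tse" is gone.
reserialized = ET.tostring(root, encoding="unicode")

print(repr(parsed_text))
print(reserialized)
```

With this parser the name survives only in the source file, so anything that reads and rewrites the document drops it.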

> I know I don't when I'm in rdhead or oweenie mode - &#xbabe; does the 
> job fine - 

It does, but &#xnnn;'s scattered throughout a document are hard to
proof.  That's the only reason people want names (and not as
elements!:).
 
> but people who want to edit XML by hand really want to be able to use 
> &euro; and the like.

Yes.  In fifteen or so years, when purely ASCII/ANSI/ISO-* editors are
history, I doubt anyone will care, but I don't see the point in axing
the internal subset now.  I'm not sure I see the point of axing it in
the future either.

> Once again, sigh.  I haven't seen a better idea, but one would be 
> welcome.  Hmm, has anyone suggested
> 
> &#uCYRILLIC-CAPITAL-LETTER-TSE; (aka &#x426;) or
> &#uPARTIAL-DIFFERENTIAL; (aka &#x2202;)

Again, why exactly -- except for "round-tripping" -- is a huge built-in
Unicode character reference database (that changes with every rev of
Unicode) better than having the convenience of being able to declare
&Tse; and the few others you might want in the internal subset?  
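A rough sketch of the two options side by side, assuming Python's standard library (xml.etree.ElementTree and unicodedata, both chosen here purely for illustration). The entity names Tse and part are hypothetical, and the lookup by Unicode name stands in for the proposed &#u...; syntax, which no parser implements.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Option 1: declare the handful of names you want in the internal subset.
# "Tse" and "part" are hypothetical names picked for this example.
INTERNAL_SUBSET = (
    '<!DOCTYPE doc ['
    '<!ENTITY Tse "&#x426;">'
    '<!ENTITY part "&#x2202;">'
    ']>'
)
named = ET.fromstring(INTERNAL_SUBSET + "<doc>&Tse; &part;</doc>")

# The same content written with numeric character references.
numeric = ET.fromstring("<doc>&#x426; &#x2202;</doc>")
same_text = named.text == numeric.text

# Option 2: resolve by Unicode character name, which needs the full
# name database of whichever Unicode version the tool ships with.
by_name = " ".join(
    unicodedata.lookup(name)
    for name in ("CYRILLIC CAPITAL LETTER TSE", "PARTIAL DIFFERENTIAL")
)
database_version = unicodedata.unidata_version

print(same_text, by_name == named.text, database_version)
```

Both routes yield the same two characters. The difference is where the name table lives: in a couple of declarations inside the document, or in a versioned database every processor would have to carry.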


/Jelks

