FW: UTF-8+names


To: "[Public XML-DEV]" <xml-dev@l...>
Subject: FW: UTF-8+names
From: "Alessandro Triglia" <sandro@m...>
Date: Mon, 20 Oct 2003 15:56:40 -0400
Importance: Normal

I wrote:
> 
> Another fact that I think has been overlooked is the following.
> 
> The following fragment of XML (encoded in UTF-8+names but 
> displayed as if it were encoded in UTF-8) contains exactly 18 
> Unicode characters:
> 
> 	<a>one&nbsp;two&lt;</a>
> 
> because   &nbsp;   counts as one character and   &lt;   
> counts as 4 characters.
> 
> The UTF-8+names encoding of this fragment of XML occupies 23 
> bytes.  The UTF-8 encoding occupies 19 bytes.

... and, by the way, the following fragment of XML is different from the one above (although it *looks* the same in this email) and contains 23 Unicode characters instead of 18:

	<a>one&nbsp;two&lt;</a>

The UTF-8 encoding of this fragment of XML occupies 23 bytes.  The UTF-8+names encoding is longer, because the first ampersand must be encoded as the three ASCII bytes   & & ;   so that the XML entity reference   &nbsp;   is not mistaken for the pseudo-entity   &nbsp;   when the bytes are decoded.
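
The counts for both fragments can be checked with a short Python sketch. It is only an illustration, not the UTF-8+names proposal itself: the decode_utf8_names helper and the one-entry NAMES table are hypothetical, the only pseudo-entity assumed is nbsp (mapped to U+00A0), and the ampersand escape needed to write the second fragment in UTF-8+names is not modeled.

```python
# Hypothetical table of pseudo-entity names; only nbsp is assumed here.
NAMES = {"nbsp": "\u00a0"}


def decode_utf8_names(data: bytes) -> str:
    """Toy decoder: expand known &name; sequences, leave all other text alone."""
    text = data.decode("utf-8")
    for name, char in NAMES.items():
        text = text.replace(f"&{name};", char)
    return text


# The 23 bytes shown in this email.
wire = b"<a>one&nbsp;two&lt;</a>"

# First fragment: the bytes are UTF-8+names, so &nbsp; is one character.
# &lt; is not in the table and stays as four characters.
first = decode_utf8_names(wire)

# Second fragment: the same bytes are plain UTF-8, so &nbsp; is six characters.
second = wire.decode("utf-8")

print(len(wire))                      # 23 bytes in UTF-8+names
print(len(first))                     # 18 characters
print(len(first.encode("utf-8")))     # 19 bytes in UTF-8
print(len(second))                    # 23 characters
print(len(second.encode("utf-8")))    # 23 bytes in UTF-8
```

The 19 comes from U+00A0 taking two bytes in UTF-8 while the other 17 characters take one byte each.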

Alessandro
