I wrote:

> Another fact that I think has been overlooked is the following.
>
> The following fragment of XML (encoded in UTF-8+names but
> displayed as if it were encoded in UTF-8) contains exactly 18
> Unicode characters:
>
>   <a>one&nbsp;two&lt;</a>
>
> because &nbsp; counts as one character and &lt;
> counts as 4 characters.
>
> The UTF-8+names encoding of this fragment of XML occupies 23
> bytes. The UTF-8 encoding occupies 19 bytes.

... and, by the way, the following fragment of XML is different from
the one above (although it *looks* the same in this email) and contains
23 Unicode characters instead of 18:

  <a>one&nbsp;two&lt;</a>

The UTF-8 encoding of this fragment of XML occupies 23 bytes. The
UTF-8+names encoding is longer than that because the first ampersand
must be encoded as the three ASCII bytes & & ; so that the XML entity
reference is not mistaken for the pseudo-entity.

Alessandro
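
The counts above can be checked with a short Python sketch. It is only an illustration, not the UTF-8+names definition: the one-entry name table `NAMES`, the function `encode_utf8_names`, and the reading of "& & ;" as the byte sequence `&&;` are assumptions made for this example. It also assumes that an ampersand is escaped only when it would otherwise be read as a known pseudo-entity, which is consistent with &lt; counting as 4 bytes in the first fragment.

```python
# Hypothetical name table: only the one pseudo-entity used in the message.
NAMES = {"\u00a0": "nbsp"}


def encode_utf8_names(text: str) -> bytes:
    """Toy encoder (assumption, not the real specification)."""
    out = []
    for i, ch in enumerate(text):
        if ch in NAMES:
            # A named character is written as the pseudo-entity &name;
            out.append("&" + NAMES[ch] + ";")
        elif ch == "&" and any(
            text.startswith(name + ";", i + 1) for name in NAMES.values()
        ):
            # Assumed escape: a literal ampersand that would otherwise
            # look like a pseudo-entity is written as the three bytes &&;
            out.append("&&;")
        else:
            out.append(ch)
    return "".join(out).encode("utf-8")


# First fragment: a real NO-BREAK SPACE (U+00A0) between "one" and "two".
fragment1 = "<a>one\u00a0two&lt;</a>"
# Second fragment: the six literal characters & n b s p ; instead.
fragment2 = "<a>one&nbsp;two&lt;</a>"

for frag in (fragment1, fragment2):
    print(len(frag), len(frag.encode("utf-8")), len(encode_utf8_names(frag)))
```

Under these assumptions the first fragment gives 18 characters, 19 UTF-8 bytes and 23 UTF-8+names bytes, matching the figures quoted above. The second gives 23 characters and 23 UTF-8 bytes, and its UTF-8+names form comes out longer than 23 bytes; the exact excess depends on the assumed escape.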