Hi Folks,
You are, of course, familiar with the ASCII character encoding scheme and with the UTF-8 character encoding scheme.
Perhaps you are less familiar with the encoding scheme called windows-1252.
You can create XML documents that use windows-1252:
<?xml version="1.0" encoding="windows-1252"?>
In the windows-1252 encoding scheme the Euro sign (€) is hex 80.
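As a quick check of that byte value, here is a minimal Python sketch (Python is my choice for illustration; the variable names are made up). It encodes the Euro sign in windows-1252 and, for comparison, in UTF-8:

```python
# The Euro sign as a Unicode character (code point U+20AC).
euro = "\u20ac"

# Encoded as windows-1252 it is the single byte hex 80.
cp1252_bytes = euro.encode("windows-1252")
print(cp1252_bytes.hex())  # 80

# Encoded as UTF-8 the same character takes three bytes.
utf8_bytes = euro.encode("utf-8")
print(utf8_bytes.hex())  # e282ac
```

The point is that hex 80 is a property of the windows-1252 byte encoding only; the character itself has a different Unicode code point.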
Suppose you want to have this data in your XML document:
€43.00
Instead of using the actual Euro character, you choose to use a numeric character reference, like so:
&#x80;43.00
Here's your XML document:
<?xml version="1.0" encoding="windows-1252"?>
<Transaction>
<Amount>&#x80;43.00</Amount>
</Transaction>
Next, you save the XML document to your hard drive, open a browser, and drag and drop the XML document into the browser. What will the browser display? Will it display this:
€43.00
Scroll down for the answer ....
Answer: The browser will display this:
43.00
You will not see the Euro sign.
Why?
This is very important:
Numeric character references (such as &#x80;)
are interpreted as Unicode characters, no matter
what encoding you use for your document.
So &#x80; is not referencing a windows-1252 character; rather it is referencing a Unicode character. And in Unicode hex 80 corresponds to a control character.
Yikes!
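You can see the same behavior outside a browser. The following is a rough sketch using Python's standard xml.etree.ElementTree parser (my choice of tool, not something the browser test depends on). It assumes the parser accepts the windows-1252 declaration, which it normally does for single-byte encodings:

```python
import unicodedata
import xml.etree.ElementTree as ET

# The document from above, declared as windows-1252, using &#x80;.
doc = (
    b'<?xml version="1.0" encoding="windows-1252"?>'
    b"<Transaction><Amount>&#x80;43.00</Amount></Transaction>"
)

amount = ET.fromstring(doc).findtext("Amount")
first = amount[0]

# The reference resolved to Unicode code point hex 80, not to the Euro sign.
print(hex(ord(first)))              # 0x80
print(unicodedata.category(first))  # Cc (a control character)

# By contrast, the raw byte hex 80 decoded as windows-1252 is the Euro sign.
print(b"\x80".decode("windows-1252") == "\u20ac")  # True
```

The declared encoding governs how the bytes of the file are decoded; it has no effect on what a numeric character reference points to.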
If you want the Euro sign in that windows-1252 encoded XML document, then you must use the Unicode numeric character code for the Euro sign (in Unicode the Euro sign is hex 20AC):
<?xml version="1.0" encoding="windows-1252"?>
<Transaction>
<Amount>&#x20AC;43.00</Amount>
</Transaction>
If you drag and drop that into a browser you will see the desired result:
€43.00
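For completeness, here is a small sketch (again in Python with xml.etree.ElementTree, with variable names of my own invention) comparing the two ways that should give you a Euro sign in a windows-1252 document: the Unicode reference &#x20AC;, or the literal byte hex 80 in the file itself:

```python
import xml.etree.ElementTree as ET

# Variant 1: numeric character reference to the Unicode code point 20AC.
with_reference = (
    b'<?xml version="1.0" encoding="windows-1252"?>'
    b"<Transaction><Amount>&#x20AC;43.00</Amount></Transaction>"
)

# Variant 2: the actual Euro character, stored as the windows-1252 byte hex 80.
with_raw_byte = (
    b'<?xml version="1.0" encoding="windows-1252"?>'
    b"<Transaction><Amount>" b"\x80" b"43.00</Amount></Transaction>"
)

amount_from_reference = ET.fromstring(with_reference).findtext("Amount")
amount_from_raw_byte = ET.fromstring(with_raw_byte).findtext("Amount")

print(amount_from_reference == amount_from_raw_byte)  # True
print(hex(ord(amount_from_reference[0])))             # 0x20ac
```

Both variants end up as the same Unicode character once parsed; only the raw byte depends on the encoding declaration.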
I learned the above from reading Richard Ishida's outstanding paper:
Using character escapes in markup and CSS
http://www.w3.org/International/questions/qa-escapes
/Roger