Hi,
I've run into an interesting problem with the JDK's (1.4 and 1.5)
TransformerHandler and surrogate pairs:
Consider:
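import java.io.ByteArrayOutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

public void testOut() throws Exception {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    SAXTransformerFactory stf =
        (SAXTransformerFactory) SAXTransformerFactory.newInstance();
    TransformerHandler th = stf.newTransformerHandler();
    th.getTransformer().setOutputProperty(
        OutputKeys.OMIT_XML_DECLARATION, "yes");
    th.setResult(new StreamResult(out));

    // Emit <foo> containing the two surrogate code units.
    th.startDocument();
    th.startElement("", "foo", "foo", new AttributesImpl());
    char c[] = "\udc00\ud800".toCharArray();
    th.characters(c, 0, c.length);
    th.endElement("", "foo", "foo");
    th.endDocument();

    // Dump each output byte as a signed value and as a char.
    byte bytes[] = out.toByteArray();
    for (int i = 0; i < bytes.length; i++) {
        System.out.println(i + ": " + bytes[i] + " " + ((char) bytes[i]));
    }
}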
This yields:
0: 60 <
1: 102 f
2: 111 o
3: 111 o
4: 62 >
5: -19 ?
6: -80 ?
7: -128 ?
8: -19 ?
9: -96 ?
10: -128 ?
11: 60 <
12: 47 /
13: 102 f
14: 111 o
15: 111 o
16: 62 >
That is, the surrogate pair has been serialized as two separate Unicode
characters: each surrogate is written as its own three-byte sequence
(ED B0 80 and ED A0 80). This problem seems to be an old one (see
<http://issues.apache.org/jira/browse/XALANJ-2132>), so why does it
still occur in recent JDKs?
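
For comparison, here is a minimal sketch of what a well-formed pair looks like in UTF-8, independent of any XML serializer. The class name SurrogateDemo and its helper methods are made up for illustration, and it assumes Java 7 or later for StandardCharsets. Note that the literal in testOut() lists the low surrogate (\udc00) before the high one (\ud800); Character.isSurrogatePair only reports a valid pair for the high-then-low order.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateDemo {

    // Format bytes as unsigned, space-separated hex (e.g. "ED B0 80").
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // UTF-8 encoding of a single code point, as hex.
    static String utf8Hex(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return hex(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        char high = (char) 0xD800;
        char low = (char) 0xDC00;

        // High surrogate followed by low surrogate forms U+10000.
        System.out.println(Character.isSurrogatePair(high, low)); // true
        System.out.println(
            Integer.toHexString(Character.toCodePoint(high, low))); // 10000

        // The reversed order is not a valid pair.
        System.out.println(Character.isSurrogatePair(low, high)); // false

        // Well-formed UTF-8 for U+10000 is a single four-byte sequence.
        System.out.println(utf8Hex(0x10000)); // F0 90 80 80

        // The signed bytes from the dump above, shown as hex.
        System.out.println(hex(new byte[] {-19, -80, -128})); // ED B0 80
        System.out.println(hex(new byte[] {-19, -96, -128})); // ED A0 80
    }
}
```

With a pair in high-then-low order, a conforming UTF-8 serializer would be expected to emit the four bytes F0 90 80 80 rather than two three-byte sequences.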
Best regards, Julian