At 01:46 PM 16/03/01 +0700, James Clark wrote:
>> Has anyone seen this thing?
>> http://www.w3.org/TR/newline
>> I have a horrid suspicion that it's actually correct.
>
>I'm not convinced. The XML spec says that Unicode character #x85 is not
>a whitespace characters. It appears from the Note that EBCDIC text
>files on IBM mainframes represent newline by a byte with code 0x85. The
>solution appears obvious to me: the EBCDIC encoding table used by the
>XML parser should map byte 0x85 to Unicode character 0xA.

This feels much better. And upon reflection, the thought of XML files
which have been through a mainframe starting to percolate around the
system with U+0085 embedded inside start tags makes me nervous; I can
see a lot of people sitting in front of Windows and Unix boxes looking
baffled because their existing program broke in response to a
human-invisible stimulus. Hmmm, I wonder if current perl includes
U+0085 in what matches \s? Etc.....

Also, unlike (almost?) all the other XML errata, changing this would
actively break pretty well every deployed piece of XML software in
the world.

-Tim
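
To make the suggestion above concrete, here is a rough Python sketch of the idea of turning U+0085 into U+000A before an XML parser ever sees it. It is only an approximation: James's proposal puts the mapping in the EBCDIC encoding table itself, whereas this does it as a separate step on text that has already been decoded. The names normalize_nel and is_xml_whitespace are made up for illustration and come from no particular parser.

```python
# Sketch only: hypothetical helper names, not any parser's real API.

NEL = "\u0085"  # Unicode NEXT LINE, what mainframe newlines can turn into
LF = "\u000A"   # LINE FEED, which XML 1.0 does treat as whitespace

# The four characters allowed by the XML 1.0 whitespace production "S".
XML_WHITESPACE = frozenset({"\u0020", "\u0009", "\u000D", "\u000A"})


def is_xml_whitespace(ch: str) -> bool:
    """Return True if ch is whitespace as far as XML 1.0 is concerned."""
    return ch in XML_WHITESPACE


def normalize_nel(text: str) -> str:
    """Replace every U+0085 with U+000A in already-decoded text.

    Assumption: this stands in for the decoder-side table mapping;
    a real fix would live in the encoding table, not in a post-pass.
    """
    return text.replace(NEL, LF)


# A start tag with an invisible U+0085 between the name and the attribute.
raw = '<doc' + NEL + 'id="1">'
clean = normalize_nel(raw)
```

With the mapping applied, the separator inside the start tag is an ordinary line feed, which every deployed parser already accepts; without it, the tag contains a character that is not XML 1.0 whitespace.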