Liam, hello.

On 2010 Dec 9, at 16:49, Liam R E Quin wrote:

> On Thu, 2010-12-09 at 12:13 +0000, Norman Gray wrote:
> [...]
>> ...and later, Liam Quinn wrote:
>
> Actually it was me (Liam Quinn is someone else)

Ah, apologies. (several someone elses, by the look of it)

>>> The most frequent change request I hear is to remove the strict syntax
>>> requirements and make every XML implementation include some sort of
>>> HTML-like expert system to do the parsing, automatically "correcting"
>>> errors like missing quotes off attribute values.
>
> Please note, I'm *not* advocating such a change, but rather saying that
> it's the request I hear most often.

It didn't sound to me like you _were_ advocating it; sorry for not making that clearer.

> [...]
>> If 'XML-bis' were defined using lexer events, with strings defined as
>> sequences of unicode code points, then a JIS-encoded document with
>> missing quotes could be (required to be) handled by the lexer,
>> entirely transparently. In other words, why is file/wire encoding
>> anything to do with XML?
> Because XML is about file interchange.
>
> If your XML processor won't read my XML document, we've failed.

I think that separating out the lexing makes this easier, not harder. I could imagine a standard declaring that an XML parser shall process a stream of unicode codepoints. The standard might note that this does imply that there's some sort of shim between the XML and the file I/O, but declare that what's in this shim is none of its concern. The obvious content of that shim would of course be nothing more than the platform's UTF-8 reading support, but if someone wanted to be funky and support something else, in a context where all the necessary information was available (for example, from an HTTP header), then the XML standard isn't about to stop them.
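To make the layering concrete, here is a minimal sketch of such a shim, assuming Python's standard `codecs` module. The names `codepoint_shim` and `count_open_angle` are hypothetical and purely illustrative; the second function is only a stand-in for a real lexer. The point is that the consumer sees nothing but code points, while the encoding is supplied from outside (say, from an HTTP header) and defaults to UTF-8.

```python
import codecs
from typing import Iterable, Iterator, Optional


def codepoint_shim(chunks: Iterable[bytes],
                   encoding: Optional[str] = None) -> Iterator[int]:
    """Turn raw byte chunks into a stream of Unicode code points.

    `encoding` is assumed to come from out-of-band information such as
    an HTTP header; when it is absent, UTF-8 is used.
    """
    decoder = codecs.getincrementaldecoder(encoding or "utf-8")(errors="strict")
    for chunk in chunks:
        # The incremental decoder buffers a character split across chunks.
        for ch in decoder.decode(chunk):
            yield ord(ch)
    # Flush; raises UnicodeDecodeError if the input ended mid-character.
    for ch in decoder.decode(b"", final=True):
        yield ord(ch)


def count_open_angle(codepoints: Iterable[int]) -> int:
    """Stand-in for a lexer: it consumes code points and never sees bytes."""
    return sum(1 for cp in codepoints if cp == 0x3C)  # U+003C is '<'


if __name__ == "__main__":
    text = "<a>日本</a>"
    sjis = text.encode("shift_jis")
    # Split in the middle of a two-byte character to exercise the buffering.
    stream = codepoint_shim([sjis[:4], sjis[4:]], encoding="shift_jis")
    print(count_open_angle(stream))
```

Under these assumptions the lexer-side code is identical whether the bytes were UTF-8 or Shift_JIS; only the shim's `encoding` argument changes.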

I'm not _necessarily_ advocating this as a vital ingredient, but it would surely short-circuit a certain amount of agonising about which UTF-* variants to accommodate, and separates parsing layers quite naturally.

[A more out-there position is to define XML in terms of a sequence of SAX events, or equivalent, but that obviously stops being a file-interchange standard]

All the best,

Norman

-- 
Norman Gray : http://nxg.me.uk



