On 16/12/2010 04:32, Henri Sivonen wrote:
> On Dec 14, 2010, at 05:17, David Carlisle wrote:
>
>> I've no complaint with html5 having defined fixup rules to give
>> consistent error recovery from overlapping markup and other
>> horrors, but I think the fact that it parses well formed XML and
>> produces different trees is just wrong.
>
> The easiest proof why it has to be this way is: There are Web pages
> that rely on the <html> tag getting implied per HTML 4 when not
> present in the source text. Therefore, the HTML5 parsing algorithm
> always outputs a tree whose root element is html. There are XML
> documents whose root element is not html. Therefore, it has to be
> that there are well-formed XML documents that parse into different
> trees using an XML parser and an HTML parser.

Yes, I nearly mentioned those cases as an exception :-)

But you give the example that's almost reasonable
(html/head/body/tbody implication) while not responding to the cases
that actually cause the problems, since they affect the parsing of
arbitrarily small fragments: namely /> and the different handling of
end tags for individual void elements.

It would have been possible to also stop implying html start tags if
you had been prepared to have a "more standards mode" implied by (say)
<!doctype html>. There were reasons for not doing that, but it's a
choice made, not an absolute rule that it would have been impossible
to have a sensible grammar for html.

David
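
To make the small-fragment cases concrete, here is a rough Python sketch. It is not part of the original message. The XML side uses the standard xml.etree.ElementTree parser. The HTML side is a toy tree builder written on top of the standard html.parser tokenizer; it is not the HTML5 parsing algorithm and models only two of its rules: a trailing "/" on a non-void start tag is ignored, and a </br> end tag is treated as a <br> start tag. The names ToyHtmlTree, html_tree, xml_tree and the nested-list tree format are invented for this illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Void elements in HTML: they never have content or an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ToyHtmlTree(HTMLParser):
    """Toy model only: builds nested lists of the form [tag, child, ...]."""

    def __init__(self):
        super().__init__()
        self.root = ["#fragment"]
        self.stack = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        node = [tag]
        self.stack[-1].append(node)
        if tag not in VOID:
            self.stack.append(node)  # non-void elements stay open

    def handle_startendtag(self, tag, attrs):
        # In HTML the "/" in "<span/>" does not close a non-void element,
        # so treat it exactly like a plain start tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag == "br":
            # "</br>" is handled as if it were "<br>".
            self.handle_starttag("br", [])
            return
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].append(data)


def html_tree(src):
    """Return the first node of the fragment as parsed by the toy builder."""
    parser = ToyHtmlTree()
    parser.feed(src)
    parser.close()
    return parser.root[1]


def xml_tree(src):
    """Parse with a real XML parser and convert to the same list format."""
    def convert(el):
        node = [el.tag]
        if el.text:
            node.append(el.text)
        for child in el:
            node.append(convert(child))
            if child.tail:
                node.append(child.tail)
        return node
    return convert(ET.fromstring(src))


fragment = "<p><span/>text</p>"
print(xml_tree(fragment))   # ['p', ['span'], 'text']
print(html_tree(fragment))  # ['p', ['span', 'text']]
```

The same well-formed fragment yields an empty span followed by text under XML rules, but a span containing the text under the modelled HTML rule. The per-element end tag handling shows up in html_tree("<p>a</br>b</p>"), which produces a br element, whereas an XML parser rejects that input as not well-formed.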