[Robert Mena]
> Hi, I am developing an application that will have to
> build a DOM tree of html pages.
>
> I'll use such DOM trees to perform some
> analysis/comparisons.
>
> Since most of the time I'll find ill-formed documents
> I'd like to know if there are any parsers out there
> that "accept" this flaws and builds the tree anyway.
>
> I've tried domxml (php) with no luck.

The usual answer is to preprocess with Tidy - see

http://www.w3.org/People/Raggett/tidy/

You may also want to look at NekoHTML, at

http://www.apache.org/~andyc/

It processes HTML, including fixing up some problems, and uses the Xerces XNI (Xerces Native Interface), so you can build a DOM.

Cheers,

Tom P
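
To illustrate what "build the tree anyway" can look like, here is a minimal sketch in Python using only the standard-library html.parser module. It is not Tidy or NekoHTML, and the names Node, LenientTreeBuilder, build_tree and outline are made up for this example. It assumes two simple recovery rules: a stray end tag is ignored, and an end tag closes any elements left open inside it. It does not apply HTML's implied-end-tag rules (for example a second <p> closing the first), which a real cleanup tool such as Tidy does.

```python
from html.parser import HTMLParser

# Elements that never have content, so the builder must not descend into them.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    """Hypothetical minimal tree node: a tag, its attributes, and children."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []  # Node objects or plain text strings


class LenientTreeBuilder(HTMLParser):
    """Builds a tree from HTML without failing on unbalanced tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag name.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            # Found it: this also closes anything left open inside it.
            self.current = node.parent
        # Otherwise it is a stray end tag and is ignored.

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data)


def build_tree(html):
    builder = LenientTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the element structure as a list of indented tag names."""
    lines = []
    for child in node.children:
        if isinstance(child, Node):
            lines.append("  " * depth + child.tag)
            lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    # Ill-formed input: <b> is never closed and </i> has no matching start tag.
    print("\n".join(outline(build_tree("<p>one<b>two</i></p><br>three"))))
```

For analysis or comparison work, the outline() list gives a quick structural fingerprint of a page; a production setup would more likely run Tidy first and hand the cleaned output to a real DOM parser.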



