

[Robert Mena]

> Hi, I am developing an application that will have to
> build a DOM tree of html pages.
>
> I'll use such DOM trees to perform some
> analysis/comparisons.
>
> Since most of the time I'll find ill-formed documents
> I'd like to know if there are any parsers out there
> that "accept" these flaws and build the tree anyway.
>
> I've tried domxml (php) with no luck.

The usual answer is to preprocess with Tidy - see

http://www.w3.org/People/Raggett/tidy/

You may also want to look at NekoHTML, at

http://www.apache.org/~andyc/

This project parses HTML, fixing up some common problems along the way, and
plugs into the Xerces Native Interface (XNI) so you can build a DOM.
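
To give a rough idea of what "accept the flaws and build the tree anyway"
means in practice, here is a minimal sketch in Python, using only the
standard library (html.parser and xml.dom.minidom). It is not how Tidy or
NekoHTML work internally; the names TolerantDOMBuilder and build_dom, and
the wrapper element called "root", are made up for illustration. The sketch
ignores stray closing tags and closes anything left open at the end of
input, but it does not apply HTML's implicit-close rules (a second <p>
simply nests inside the first), which the real tools handle far better.

```python
from html.parser import HTMLParser
from xml.dom.minidom import Document

# Elements that never have content, so they are never pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class TolerantDOMBuilder(HTMLParser):
    """Builds a minidom tree from possibly ill-formed HTML (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.doc = Document()
        # Hypothetical wrapper element so the tree always has a single root.
        self.root = self.doc.createElement("root")
        self.doc.appendChild(self.root)
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = self.doc.createElement(tag)
        for name, value in attrs:
            # Valueless attributes (e.g. "checked") arrive with value None.
            node.setAttribute(name, value if value is not None else "")
        self.stack[-1].appendChild(node)
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element, along with anything
        # left open inside it. The wrapper at index 0 is never closed.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                return
        # No matching open element: a stray closing tag, silently ignored.

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text
            self.stack[-1].appendChild(self.doc.createTextNode(data))


def build_dom(html):
    """Parse an HTML string and return a minidom Document."""
    builder = TolerantDOMBuilder()
    builder.feed(html)
    builder.close()
    return builder.doc


if __name__ == "__main__":
    # Unclosed <p>, stray </i>, unclosed <b>: still yields a usable tree.
    doc = build_dom("<p>one<p><b>two</i></p><br>three")
    print(doc.toprettyxml(indent="  "))
```

The resulting Document supports the usual DOM calls (getElementsByTagName,
childNodes and so on), so two pages can be compared by walking their trees.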

Cheers,

Tom P


