If you want to scrape everything (transforming the HTML into XML), then Tidy is the right way to go, as mentioned in a previous posting.

If you want to extract only SOME of the HTML information and map it to XML, then you should look at W4F (http://db.cis.upenn.edu/W4F/). There are a couple of online examples that show how to build XML gateways that transform HTML into XML on the fly; the XML can then be used by other applications:

http://db.cis.upenn.edu/W4F/Examples/XML-Gateway/

There is also an interesting related article in JavaWorld:

http://www.javaworld.com/javaworld/jw-03-2001/jw-0316-webdb.html

Regards,
Arnaud
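
To make the "extract only SOME HTML information and map it to XML" idea concrete, here is a minimal sketch using only the Python standard library. It is not W4F and does not use W4F's rule language; the class name `LinkExtractor`, the function `html_links_to_xml`, and the `<links>`/`<link>` output vocabulary are hypothetical names chosen for illustration. It picks out just the hyperlinks from a page and ignores everything else.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for <a href=...> elements; ignore the rest."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href leave _href as None and are skipped.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def html_links_to_xml(html):
    """Map the links found in an HTML string to a small XML document."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    root = ET.Element("links")  # hypothetical output vocabulary
    for href, text in parser.links:
        link = ET.SubElement(root, "link", href=href)
        link.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    page = '<p>See <a href="http://db.cis.upenn.edu/W4F/">W4F</a>.<br></p>'
    print(html_links_to_xml(page))
```

The same pattern extends to other selected pieces of a page (table cells, headings) by changing which tags the parser reacts to; a full HTML-to-XML conversion is better left to Tidy, as noted above.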