HTML Tidy

HTML Tidy is a program originally created by Dave Raggett for turning HTML into something that can be parsed as XML. This capability is extremely useful because it allows XSLT and XQuery programs to fetch HTML pages and act upon them, expanding the realm of reachable documents to include the vast number of HTML pages available on the internet. Within Stylus Studio, HTML Tidy is used in two places: the HTML-to-XSLT Wizard and the HTML Tidy Converter.

HTML to XSLT Wizard

HTML Tidy is used within the HTML-to-XSLT Wizard (File|Document Wizards|XSLT Editor|HTML to XSLT) to create a stub XSLT file that would generate the given HTML file, suitable for splicing in your own code for generating the details. This way, if you have an existing HTML template or a custom-created one, you can use it as the basis for a report or other output and put your transformation or reporting code right in the middle.

The HTML Tidy Converter

This is likely the far more interesting case. Using this converter, any piece of reachable HTML can be used as a source for XSLT, XQuery, or even more complex operations like the XML Publisher. The trick is to prepend the converter scheme to the URL, like this: converter:HTMLTidy?http://..... Now, anywhere we reference the converter URL in place of the original URL, our process will see XML instead of HTML, thanks to the on-the-fly conversion from HTML to XML. Let's put this into practice with some demonstrations.

HTML Tidy and XSLT

Suppose that you wanted to wrap a web query, such as the ten-day forecast on The Weather Channel website:

http://www.weather.com/weather/tenday/your zip code here

So, for the area where Stylus Studio's headquarters is, we'd issue http://www.weather.com/weather/tenday/01730. (Note that since The Weather Channel website changes, as does the weather itself, here is a cached copy of just the HTML without any images.) But how would we automate it?
To fetch the HTML and turn it into XML, we'll use the converter trick from above and set the converter URL as the input source to plain old XSLT. Feeding converter:HTMLTidy?http://www.weather.com/weather/tenday/01730 to an XSLT stylesheet will yield new HTML built from the content of the original page.
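As a rough illustration of what such a stylesheet might look like, here is a minimal sketch. It is not the stylesheet used for the weather report. It assumes only that the tidied page contains HTML table rows, and it matches elements by local name because the source does not say whether the converter's output carries the XHTML namespace.

```xslt
<?xml version="1.0"?>
<!-- Hypothetical sketch: copies the text of every table row found in the
     tidied page into one plain HTML table. The real page structure is an
     assumption; adjust the select expressions to the markup you actually see. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/">
    <html>
      <head>
        <!-- Reuse the title of the source page, if it has one -->
        <title>
          <xsl:value-of select="normalize-space((//*[local-name()='title'])[1])"/>
        </title>
      </head>
      <body>
        <table border="1">
          <!-- local-name() matches tr elements with or without a namespace;
               rows of nested tables are flattened into the same output table -->
          <xsl:for-each select="//*[local-name()='tr']">
            <tr>
              <xsl:for-each select="*[local-name()='td' or local-name()='th']">
                <td><xsl:value-of select="normalize-space(.)"/></td>
              </xsl:for-each>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

With the converter URL set as the input source, a stylesheet along these lines never sees raw HTML. It sees only the XML that HTML Tidy produces, so ordinary XPath expressions work against the page.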
Congratulations! You've just scraped existing Web content, using the HTML Tidy Converter to produce new HTML via XML! But you could do anything with the source data once it's in XML form. Download a copy of Stylus Studio® XML Enterprise Suite and try this with your own location, or investigate other web sites. Mine your own company's intranet for information that rightly belongs in other locations. The possibilities are limited only by the number of HTML pages on the internet!

HTML Tidy and XQuery

The equivalent XQuery program for the weather report uses the same HTML Tidy converter URL as its source.
What's notable in the XQuery version is that the converter URL can be embedded directly in the source program. It can also be passed in as context or as a parameter, giving you maximum flexibility. The output of this sample is identical to that of the XSLT and HTML Tidy example above.

To learn more about the advanced XSLT and XQuery tools, as well as the other options for deploying XML applications, see the various pages on this site or download and run a free evaluation copy of Stylus Studio® right now.