
Subject: Re: Converting HTML to plain text
From: Larry Kollar <kollar@xxxxxxxxxx>
Date: Tue, 22 Jun 2004 20:41:17 -0400

> I can constrain HTML pages to be valid XML. So, the hard part is solved.
> But still I don't know of a good solution to convert it to plain text. ...

If tables weren't an issue, I think "lynx -dump file.html" would work for you.
To deal with tables, you could try converting to groff format and using
groff's "tbl" pre-processor to format your tables.
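
Since the pages are valid XML, the table half of that idea can be sketched in a few lines of Python. This is only a rough illustration, not a tested converter: the function names (local, table_to_tbl) are made up for the example, it ignores nested tables and row/column spans, and it assumes cell text contains no backslashes and no row starts with a dot (both are special to troff).

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Strip an XML namespace such as {http://www.w3.org/1999/xhtml}."""
    return tag.rsplit("}", 1)[-1]

def table_to_tbl(table):
    """Render one XHTML <table> element as tbl source (.TS ... .TE).

    Assumptions for this sketch: no nested tables, no rowspan/colspan,
    and no backslashes or leading dots in the cell text.
    """
    rows = []
    for tr in table.iter():
        if local(tr.tag) != "tr":
            continue
        # Collapse runs of whitespace so each cell fits on one line.
        cells = [" ".join("".join(cell.itertext()).split())
                 for cell in tr if local(cell.tag) in ("th", "td")]
        if cells:
            rows.append(cells)
    if not rows:
        return ""
    width = max(len(r) for r in rows)
    # One left-aligned column per cell; the format section ends with a dot.
    lines = [".TS", "allbox;", " ".join(["l"] * width) + "."]
    for r in rows:
        padded = r + [""] * (width - len(r))
        lines.append("\t".join(padded))  # tbl's default separator is a tab
    lines.append(".TE")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    sample = ("<html><body><table>"
              "<tr><th>Name</th><th>Qty</th></tr>"
              "<tr><td>Apples</td><td>3</td></tr>"
              "</table></body></html>")
    for elem in ET.fromstring(sample).iter():
        if local(elem.tag) == "table":
            print(table_to_tbl(elem), end="")
```

Saving the output to a file and running something like "groff -t -Tascii table.tbl" should then draw the table in plain text; the -t option runs tbl first. Expect some blank page padding in the result that you may want to trim.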
--
Larry Kollar k o l l a r @ a l l t e l . n e t
"The hardest part of all this is the part that requires thinking."
-- Paul Tyson, on xml-doc


