On 17/01/2025 23:16, dvint@xxxxxxxxx wrote:
> First off, is anyone aware of a good way to merge a bunch of HTML techdoc
> pages into a single HTML so a PDF file can be generated with something
> like Prince or Weasyprint?

If I understand you right, you want to catenate the contents of the <body> elements and create a new HTML file. Assuming the HTML files share a common <head> (i.e. you only want it once):

1. Make sure they are all well-formed XHTML/HTML5 (use Tidy)
2. Copy the Document Type Declaration, the <html> start-tag, the whole of the <head> element, and the <body> start-tag into your target.html
3. for f in *.html; do lxgrep 'body/*' "$f" >>target.html; done
4. Append </body></html> to target.html

lxgrep is part of the LTxml2 utilities from https://www.ltg.ed.ac.uk/software/ltxml2/

Peter
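For anyone without LTxml2 to hand, the four steps above can be roughly sketched in Python using only the standard library. This is not the lxgrep approach itself but a stand-in for it, and it rests on several assumptions. The inputs must already be well-formed XML (for instance after Tidy) and must not use named entities such as &nbsp; that ElementTree does not know. The <head> of the first file is taken as the shared one. An HTML5 doctype is written in place of the original declaration, which ElementTree drops. The names merge_html and find_body are made up for illustration. If target.html sits in the same directory, it would also match *.html, so the sketch skips it.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

XHTML_NS = "http://www.w3.org/1999/xhtml"


def find_body(root):
    """Locate <body>, with or without the XHTML namespace."""
    body = root.find(f"{{{XHTML_NS}}}body")
    return body if body is not None else root.find("body")


def merge_html(sources, target):
    """Append the <body> children of every later file to the first file's <body>."""
    ET.register_namespace("", XHTML_NS)  # serialize XHTML without ns0: prefixes
    target = Path(target)
    # Skip the output file itself in case it matches the same glob as the inputs.
    paths = [Path(s) for s in sources if Path(s).resolve() != target.resolve()]
    merged = ET.parse(paths[0]).getroot()  # supplies <html>, <head>, first <body>
    body = find_body(merged)
    for path in paths[1:]:
        other = find_body(ET.parse(path).getroot())
        body.extend(list(other))  # child elements only, like selecting body/*
    text = ET.tostring(merged, encoding="unicode", method="html")
    # Assumption: an HTML5 doctype is acceptable in place of the original one.
    target.write_text("<!DOCTYPE html>\n" + text, encoding="utf-8")


if __name__ == "__main__":
    merge_html(sorted(Path(".").glob("*.html")), "target.html")
```

The files are merged in sorted filename order, so name them accordingly or pass an explicit list if the chapter order matters.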