Hi Folks,
Thank you for your recommendations on how to check a bunch of XHTML files for
well-formedness. Here's what I found:
1. I was unable to obtain an EXE for the XML parser that Richard Tobin
created, RXP. This page
http://www.cogsci.ed.ac.uk/~richard/rxp.html
has a link to an EXE of RXP:
ftp://ftp.cogsci.ed.ac.uk/pub/richard/rxp.exe
However, that link does not work.
Anyone know where I can get the EXE of RXP?
2. Next, I tried xmlwf. I discovered that you must first download and install
EXPAT:
https://libexpat.github.io/
That results in downloading: expat-win32bin-2.2.10.exe
Next, double click on it and expat will be installed on your system. Find the
folder where expat was installed. In there is a bin folder and in the bin
folder is xmlwf.exe
I ran xmlwf on a folder that contains 10,000 XHTML files. Wow! It checked all
of them in a couple of seconds. However, the error messages are poor. For
example, here is one of the error messages:
xhtml\htmloutput10.xhtml:206:2: mismatched tag
Compare that to the error message I get when I run my super-simple XSLT
program on the XHTML file:
Error on line 206 column 3 of htmloutput10.xhtml:
SXXP0003 Error reported by XML parser: The element type "input" must be
terminated by the matching end-tag "</input>".
I find the latter error message to be more helpful.
Perhaps there is a flag that can be set in xmlwf to output more verbose/useful
error messages?
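
Failing such a flag, one possible workaround is sketched below. It is a rough
illustration, not a tested replacement for xmlwf. Python's standard
xml.parsers.expat module is built on the same Expat library, so a small script
can remember which elements are open and name the innermost one when Expat
reports an error such as "mismatched tag". The function name
check_well_formed and the command-line folder argument are my own
illustrative choices, and the script will be slower than xmlwf.

```python
import sys
from pathlib import Path
from xml.parsers import expat


def check_well_formed(data):
    """Return None if data (bytes) is well-formed XML, else an error message."""
    open_elements = []  # (element name, line where it was opened)
    parser = expat.ParserCreate()

    def on_start(name, attrs):
        open_elements.append((name, parser.CurrentLineNumber))

    def on_end(name):
        open_elements.pop()

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end

    try:
        parser.Parse(data, True)
    except expat.ExpatError as err:
        message = f"{err.lineno}:{err.offset}: {expat.ErrorString(err.code)}"
        if open_elements:
            # The element that was still open when Expat gave up.
            name, line = open_elements[-1]
            message += f' (innermost open element: "{name}", opened on line {line})'
        return message
    return None


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_wf.py xhtml
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for path in sorted(folder.glob("*.xhtml")):
        problem = check_well_formed(path.read_bytes())
        if problem is not None:
            print(f"{path}:{problem}")
```

The output keeps xmlwf's file:line:column prefix and adds the name of the
unclosed element, which is the piece of information I was missing.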
/Roger
-----Original Message-----
From: Liam R. E. Quin liam@xxxxxxxxxxxxxxxx
<xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
Sent: Tuesday, February 16, 2021 8:52 PM
To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
Subject: [EXT] Re: Use XSLT to check a bunch of XHTML files
for well-formedness?
On Tue, 2021-02-16 at 21:42 +0000, Martin Honnen martin.honnen@xxxxxx
wrote:
> On 16.02.2021 22:10, Liam R. E. Quin liam@xxxxxxxxxxxxxxxxx wrote:
> > On Tue, 2021-02-16 at 21:04 +0000, Martin Honnen
> > martin.honnen@xxxxxx
> > wrote:
> > >
> > > In theory I think that should check with doc-available if the file
> > > is well-formed or not. Haven't tested however.
> >
> > It catches some problems, but will try to load the DTD.
>
> I thought Saxon has all the important W3C DTDs internalized.
It might, but last time I did this I was testing files with other DTDs,
including JATS (various different versions, too, each needing a different
catalogue file).
--
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/ XSL/XQuery/Web/Text
Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations: http://www.fromoldbooks.org