> On 11 Oct 2016, at 21:00, Bridger Dyson-Smith bdysonsmith@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> <?xml version="1.0" encoding="iso-8859-1"?>
> <documents>
> <document>The reality of the effect of natural ventilation in a residential
> attic cavity has been the topic of many debates and scholarly reports since
> the 1930â€™s.</document>
> </documents>
It looks very much like
1) in the XML header you claim the document is ISO-8859-1 encoded, while
2) it really is not. I can see that one character, the ’, was decoded as
three (â€™). Had the document really been encoded in ISO-8859-1, each
character would have been a single byte and decoded as at most one
character, because ISO-8859-1 is a single-byte encoding with no multibyte
sequences. One character turning into three points to a multibyte encoding,
most likely UTF-8.
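The mechanism is easy to reproduce. Here is a minimal Python sketch. It assumes the file was saved as UTF-8 and that whatever displayed it used a Windows-1252 (cp1252) decoder, which is where the € and ™ glyphs come from. Strict ISO-8859-1 also yields three characters, but maps the last two bytes to invisible control characters:

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, as in "1930's" typed with a curly quote.
apostrophe = "\u2019"

# UTF-8 stores this single character as three bytes.
utf8_bytes = apostrophe.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\x99'

# A single-byte decoder turns every byte into its own character,
# so the one character comes back as three.
as_latin1 = utf8_bytes.decode("iso-8859-1")
as_cp1252 = utf8_bytes.decode("cp1252")
print(len(as_latin1))  # 3
print(as_cp1252)       # â€™

# Decoding with the encoding the bytes were actually written in restores it.
print(utf8_bytes.decode("utf-8") == apostrophe)  # True
```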
Try replacing ‘iso-8859-1’ in the XML header with ‘utf-8’. Does that
work?
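If you want to check the effect of the header change outside your XSLT setup, here is a small Python sketch. It uses the standard-library ElementTree parser, which honours the encoding named in the XML declaration. The sample text is shortened and assumed to be stored as UTF-8. `parse_with` is a hypothetical helper written for this illustration. I am assuming your XSLT processor treats the declaration the same way, since any conforming XML parser should:

```python
import xml.etree.ElementTree as ET

# Document body stored as UTF-8 bytes (assumed; shortened from the original post).
body = "<documents><document>since the 1930\u2019s.</document></documents>"
payload = body.encode("utf-8")

def parse_with(declared):
    """Hypothetical helper: parse the same bytes under a given encoding
    declaration and return the text of the <document> element."""
    header = f'<?xml version="1.0" encoding="{declared}"?>'.encode("ascii")
    root = ET.fromstring(header + payload)
    return root.find("document").text

wrong = parse_with("iso-8859-1")  # each UTF-8 byte read as its own character
right = parse_with("utf-8")       # declaration now matches the bytes

print(len(wrong) - len(right))  # 2 (the apostrophe became three characters)
print(right)                    # since the 1930’s.
```

Only the declaration differs between the two calls; the bytes are identical, which is exactly the mismatch described above.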
Regards, Soren