Subject: RE: possible workarounds to process files with invalid character encoding ...
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Fri, 12 Dec 2008 21:26:38 -0000
If you're capable of writing a Java Reader that will process this file into
a stream of characters, then you can get Saxon to use this Reader by
nominating a custom UnparsedTextURIResolver.
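
A minimal sketch of such a Reader, assuming the file uses a single-byte
encoding that can be described by a 256-entry lookup table. The class name
TableDecodingReader, the example table, and the decodeAll helper are
hypothetical and stand in for whatever the real encoding requires. The
Saxon-side wiring through UnparsedTextURIResolver is deliberately left out,
since the exact interface depends on the Saxon version in use.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical Reader that decodes a single-byte encoding via a lookup table. */
final class TableDecodingReader extends Reader {
    private final InputStream in;
    private final char[] table; // table[b] is the Unicode character for byte value b (0..255)

    TableDecodingReader(InputStream in, char[] table) {
        if (table.length != 256) {
            throw new IllegalArgumentException("table must have 256 entries");
        }
        this.in = in;
        this.table = table.clone();
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        byte[] buf = new byte[len];
        int n = in.read(buf, 0, len);
        if (n < 0) {
            return -1; // end of stream
        }
        // One byte maps to exactly one character in this assumed encoding.
        for (int i = 0; i < n; i++) {
            cbuf[off + i] = table[buf[i] & 0xFF];
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Illustrative table (an assumption): Latin-1 identity, except byte 0x80 is the euro sign. */
    static char[] exampleTable() {
        char[] t = new char[256];
        for (int i = 0; i < 256; i++) {
            t[i] = (char) i;
        }
        t[0x80] = '\u20AC';
        return t;
    }

    /** Convenience helper: decodes a whole byte array into a String. */
    static String decodeAll(byte[] data, char[] table) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try (Reader r = new TableDecodingReader(new ByteArrayInputStream(data), table)) {
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

A resolver would open the file as an InputStream and return a
TableDecodingReader wrapped around it, so that unparsed-text() receives
characters that are already valid Unicode.
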
Alternatively, I suspect you can do it at the Java level by registering an
encoding name for the encoding and associating it with a decoder for that
encoding - but I'm not familiar with the details.
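
For reference, the JDK mechanism for this is java.nio.charset.spi.CharsetProvider.
The following is a sketch only: the charset name "X-Example-Custom", the class
names, and the mapping table are all hypothetical, and the decoder assumes a
single-byte encoding. Whether a given Saxon release picks up such a charset
for unparsed-text() has not been verified here.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.spi.CharsetProvider;
import java.util.Collections;
import java.util.Iterator;

/** Hypothetical decode-only charset for a single-byte custom encoding. */
final class ExampleCharset extends Charset {
    private static final char[] TABLE = new char[256];
    static {
        // Assumed mapping: Latin-1 identity, except byte 0x80 is the euro sign.
        for (int i = 0; i < 256; i++) {
            TABLE[i] = (char) i;
        }
        TABLE[0x80] = '\u20AC';
    }

    ExampleCharset() {
        super("X-Example-Custom", null); // canonical name, no aliases
    }

    @Override
    public boolean contains(Charset cs) {
        return cs == this;
    }

    @Override
    public CharsetDecoder newDecoder() {
        // Average and maximum of one char per byte.
        return new CharsetDecoder(this, 1.0f, 1.0f) {
            @Override
            protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                while (in.hasRemaining()) {
                    if (!out.hasRemaining()) {
                        return CoderResult.OVERFLOW;
                    }
                    out.put(TABLE[in.get() & 0xFF]);
                }
                return CoderResult.UNDERFLOW;
            }
        };
    }

    @Override
    public boolean canEncode() {
        return false; // this sketch only decodes
    }

    @Override
    public CharsetEncoder newEncoder() {
        throw new UnsupportedOperationException("decode-only charset");
    }
}

/**
 * Provider that exposes the charset by name. To be discovered by
 * Charset.forName, it must be a public class in its own file and be listed
 * in META-INF/services/java.nio.charset.spi.CharsetProvider on the class path.
 */
final class ExampleCharsetProvider extends CharsetProvider {
    private static final Charset CS = new ExampleCharset();

    @Override
    public Iterator<Charset> charsets() {
        return Collections.singletonList(CS).iterator();
    }

    @Override
    public Charset charsetForName(String name) {
        return CS.name().equalsIgnoreCase(name) ? CS : null;
    }
}
```

Once the provider is registered, the encoding name can be passed wherever a
Java charset name is accepted, for example as the second argument of
unparsed-text(), provided the processor resolves encoding names through the
JDK.
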
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: Matthias Einbrodt [mailto:matthias.einbrodt@xxxxxxxxxxxxx]
> Sent: 12 December 2008 21:14
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: possible workarounds to process files with
> invalid character encoding ...
>
> Hello,
>
> I'm trying to transform a text file with XSLT using the
> unparsed-text and tokenize functions. Unfortunately the text
> file consists of characters which are encoded with a
> non-Unicode-compliant encoding scheme. So, as expected, my Saxon
> processor (version 9.1.0.3 Basic) throws a
> MalformedInputException when I try to read the file.
>
> Now my question is if there are any "workarounds" to make
> Saxon process the file anyway. Maybe by:
>
> (1) Writing a sort of plugin that lets Saxon also support
> non-Unicode-compliant encodings;
>
> (2) Adding metadata to the input file in some way which
> Saxon or another XSLT processor can handle and that specifies a
> mapping of the character encodings used to the appropriate
> code points of a Unicode-compliant encoding.
>
> And if such a workaround exists, is it even worth trying
> to implement it, or would one be better off preprocessing
> the file with a custom Java program, or even trying to
> modify the program that creates such text files so
> that it uses a Unicode-compliant encoding scheme rather than
> its own custom one?
>
> What are your opinions?
>
> Best regards
>
> Matthias Einbrodt