  • To: xml-dev@l...
  • Subject: Sax filter encoding problem
  • From: leo zhu <leozhuca@y...>
  • Date: Fri, 13 Feb 2004 09:26:16 -0800 (PST)

I am trying to write a simple SAX filter in Java to
experiment with splitting a large XML file into smaller
ones, but I found that I can't get the same content as
in the original XML file.

For example, I have the following XML file, which is
encoded as UTF-8:

<?xml version="1.0" encoding="utf-8"?>
<Root>
<Record>“Multipurpose“ abcd</Record>
<Record>the “banana oil” test</Record>
</Root>

It looks OK when I browse it in IE, but after I use it
as the input file and run my SAX program (which just
uses the SAX API to write the same content to an output
file), the content changes to the following:

<Root>
<Record>“Multipurpose“ abcd</Record>
<Record>the “banana oil” test</Record>
</Root>

I checked the respective bytes: "e2 80 9c" changed to
"93" and "e2 80 9d" changed to "94"! That's not what I
wanted, and I also get an error when I try to browse
the output in IE!
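
For reference, the byte values reported above are what the two
quotation marks look like in UTF-8 versus windows-1252. The
sketch below is only an illustration of that mapping; the class
name QuoteBytes and the helper hex are made up for this example,
and it assumes Java 7 or later and a JDK that ships the
windows-1252 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuoteBytes {
    // Render a byte array as space-separated lowercase hex, e.g. "e2 80 9c".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String open = "\u201c";   // left double quotation mark
        String close = "\u201d";  // right double quotation mark

        System.out.println(hex(open.getBytes(StandardCharsets.UTF_8)));   // e2 80 9c
        System.out.println(hex(open.getBytes(cp1252)));                   // 93
        System.out.println(hex(close.getBytes(StandardCharsets.UTF_8)));  // e2 80 9d
        System.out.println(hex(close.getBytes(cp1252)));                  // 94
    }
}
```

So a change from "e2 80 9c" to "93" would be consistent with the
characters being parsed correctly and then written out in
windows-1252 instead of UTF-8.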

In this first attempt, I used

parser.parse(new InputSource(new File(input_file_name).toURL().toString()));

in my program.

Then I tried another way:

FileReader in_file = new FileReader(input_file_name);
parser.parse(new InputSource(in_file));

After running my program, the output looks like:

<Root>
<Record>“Multipurpose“ abcd</Record>
<Record>the “banana oilâ€? test</Record>
</Root>

In terms of bytes, "e2 80 9c" is now OK, but "e2 80 9d"
was still changed to "94". It is also an illegal
character when I browse the output in IE.
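
One possible explanation for the asymmetry between the two
quotes, sketched under an assumption the post does not state:
that the platform default encoding was windows-1252, so the
FileReader decoded the UTF-8 bytes as windows-1252 and the output
was encoded the same way. The class name Cp1252RoundTrip and its
methods are hypothetical names for this illustration.

```java
import java.nio.charset.Charset;

class Cp1252RoundTrip {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode raw bytes as windows-1252 (what a reader using that
    // default would do), then encode the resulting text back to
    // windows-1252 (what a writer using that default would do).
    public static byte[] roundTrip(byte[] raw) {
        String misdecoded = new String(raw, CP1252);
        return misdecoded.getBytes(CP1252);
    }

    // Render a byte array as space-separated lowercase hex.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] open = {(byte) 0xe2, (byte) 0x80, (byte) 0x9c};   // UTF-8 for U+201C
        byte[] close = {(byte) 0xe2, (byte) 0x80, (byte) 0x9d};  // UTF-8 for U+201D

        // 0xe2, 0x80 and 0x9c all have windows-1252 mappings, so the
        // three bytes survive the round trip unchanged.
        System.out.println(hex(roundTrip(open)));

        // 0x9d has no windows-1252 mapping; the decoder substitutes
        // U+FFFD, which cannot be encoded and comes back out as '?'.
        System.out.println(hex(roundTrip(close)));
    }
}
```

Under that assumption, the opening quote passes through intact by
accident, while the closing quote loses its last byte, which
would match the "â€?" shown in the output above.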

Can anybody tell me how to handle this problem? And
which is the best way to wrap the input file in an
InputSource? Any reply would be appreciated!

My test program looks like this:

public class parseXML extends DefaultHandler {

    public void startElement(java.lang.String namespaceURI,
            java.lang.String localName, java.lang.String qName,
            Attributes atts)
    {
        ......
    }

    public void characters(char[] ch, int start, int length)
    {
        for (int i = 0; i < length; i++) {
            System.out.print(ch[start + i]);
        }
        .......
    }
    ......
}
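
Since System.out encodes text with the platform default charset,
one thing that may be worth trying is to hand the parser a byte
stream (so it can honor the encoding declaration itself) and to
write the character data through a Writer with an explicit UTF-8
encoding. The following is a minimal sketch of that idea, not a
drop-in replacement for the program above: the class name
Utf8EchoHandler and the method echoText are hypothetical, the
input comes from an in-memory byte array to keep the example
self-contained, and it echoes text content only (no markup, no
escaping of & or <).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class Utf8EchoHandler extends DefaultHandler {
    private final Writer out;

    Utf8EchoHandler(Writer out) {
        this.out = out;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        try {
            // Write chars to a Writer whose encoding was chosen explicitly,
            // instead of System.out with the platform default.
            out.write(ch, start, length);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    // Parse the XML bytes and return the text content encoded as UTF-8.
    public static byte[] echoText(byte[] xml) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            // A byte-stream InputSource lets the parser detect the encoding.
            InputSource src = new InputSource(new ByteArrayInputStream(xml));
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(src, new Utf8EchoHandler(w));
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<Root><Record>the \u201cbanana oil\u201d test</Record></Root>";
        byte[] result = echoText(doc.getBytes(StandardCharsets.UTF_8));
        // Show whether the quotes came through as the original UTF-8 text.
        System.out.println(new String(result, StandardCharsets.UTF_8)
                .equals("the \u201cbanana oil\u201d test"));
    }
}
```

For a real file, the same pattern would apply with a
FileInputStream for the input and a FileOutputStream wrapped in
the OutputStreamWriter for the output.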


Thanks.

Leo


