- From: Michael Glavassevich <mrglavas@c...>
- To: xml-dev@l...
- Date: Fri, 20 May 2011 21:28:23 -0400
John Cowan <cowan@c...> wrote on 05/20/2011 06:59:04 PM:
> Mike Sokolov scripsit:
>
> > BOM in UTF-8 seems to cause problems with some XML parsers
> > (incl. Xerces 2.9.1). They seem to believe it is white space in the
> > prolog. To deal with this, we have had to insert a processor prior to
> > our parser which checks for BOM and strips it out.
>
> Support for the 8-BOM was not explicitly required until the XML 1.0
> Third Edition of 2004. Xerces 2.9.1 may be out of date.
What doesn't work? Xerces has known how to handle the UTF-8 BOM for much longer than that. All releases since 2003 [1] have supported it.
Note that you need to let the parser use its own encoding support for the InputStream.
Don't pass in a UTF-8 Reader from the JDK: the JDK's UTF-8 InputStreamReader [2] apparently doesn't recognize the BOM and perhaps never will.
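A minimal sketch of the difference, assuming the JDK's bundled JAXP parser (itself derived from Xerces) rather than a standalone Xerces 2.9.1 jar. `BomDemo`, `firstChar`, `parsesFromInputStream`, and `skipUtf8Bom` are hypothetical names for illustration only. The first call shows the JDK UTF-8 Reader passing U+FEFF through as an ordinary character, as [2] describes; `skipUtf8Bom` is one possible shape for the BOM-stripping pre-processor mentioned above, for cases where a Reader truly cannot be avoided.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class BomDemo {
    // "<root/>" preceded by U+FEFF, which UTF-8 encodes as the bytes EF BB BF.
    static final byte[] DOC = "\uFEFF<root/>".getBytes(StandardCharsets.UTF_8);

    // Decode through the JDK's UTF-8 InputStreamReader and return the first char.
    static int firstChar(InputStream in) throws IOException {
        try (Reader r = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            return r.read();
        }
    }

    // Hand the parser raw bytes so it performs its own encoding detection,
    // which is where the UTF-8 BOM is recognized and skipped.
    static boolean parsesFromInputStream(byte[] bytes) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Hypothetical pre-processor: drop a leading UTF-8 BOM so that a Reader
    // layered on top never sees U+FEFF. Non-BOM bytes are pushed back intact.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) { // a single read() may return fewer bytes than asked
            int r = pb.read(head, n, 3 - n);
            if (r < 0) break;
            n += r;
        }
        boolean isBom = n == 3 && head[0] == (byte) 0xEF
            && head[1] == (byte) 0xBB && head[2] == (byte) 0xBF;
        if (!isBom && n > 0) {
            pb.unread(head, 0, n); // not a BOM: restore the bytes we consumed
        }
        return pb;
    }

    public static void main(String[] args) throws IOException {
        // JDK Reader passes the BOM through as a character: prints U+FEFF
        System.out.printf("JDK Reader first char:   U+%04X%n",
            firstChar(new ByteArrayInputStream(DOC)));
        // With the BOM stripped first: prints U+003C ('<')
        System.out.printf("After skipUtf8Bom:       U+%04X%n",
            firstChar(skipUtf8Bom(new ByteArrayInputStream(DOC))));
        // Parser given the raw bytes: prints true
        System.out.println("Parsed from InputStream: " + parsesFromInputStream(DOC));
    }
}
```

With JDK 11 or later this can be run directly as `java BomDemo.java`. The simpler fix, per the advice above, is the `parsesFromInputStream` path: give the parser bytes and let it detect the encoding, rather than adding a stripping layer.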
[1] http://svn.apache.org/viewvc/xerces/java/trunk/src/org/apache/xerces/impl/XMLEntityManager.java?r1=318934&r2=318940&diff_format=h
[2] http://bugs.sun.com/view_bug.do?bug_id=4508058
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@c...
E-mail: mrglavas@a...