On Tue, Dec 05, 2006 at 11:24:55AM -0800, Redefined Horizons wrote:
> I'm nearing the completion of an open source XML parser in Java. (It's
> an event-based, pull parser.)
Why? Do we need more parsers? :-)
[...]
> I'm having some trouble figuring out how to handle "newline"
> characters in XML text files on different platforms. I typically
> ignore all whitespace in the parser, but I wanted to count newline
> characters to aid in error reporting.
You can't ignore whitespace: you have to return it to the application,
except when it's explicitly ignorable because a DTD says so, or when
it's part of the markup itself, e.g. whitespace inside a tag, where it
matches the S production.
> I've taken a look at the XML specs, but didn't completely understand
> what they had to say about newline characters.
Can you ask a more specific question? Are you asking when normalization
happens? By newline do you mean the character at Unicode code point 10
(U+000A, LINE FEED)?
Remember that the spaces inside the desc element in:
<desc>his socks were <em>very</em> <pattern>argyle</pattern>.</desc>
are all important, including the one between </em> and <pattern>.
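To see this in practice, here is a small sketch using the JDK's own
pull parser (StAX, javax.xml.stream) rather than the parser under
discussion. The class name TextEvents and the method collectText are
made up for illustration. It concatenates every text event the parser
reports, so the single space between </em> and <pattern> shows up in
the result.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, for illustration only.
final class TextEvents {

    // Returns the concatenation of all character data the pull parser reports.
    static String collectText(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    // Whitespace-only text arrives as an ordinary text event;
                    // SPACE is only used for whitespace a DTD marks ignorable.
                    if (event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.SPACE
                            || event == XMLStreamConstants.CDATA) {
                        text.append(reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped so callers of this sketch need no checked exception.
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        }
        return text.toString();
    }
}
```

Calling TextEvents.collectText on the desc example above should give
"his socks were very argyle." with the word-separating space intact. A
parser that dropped whitespace-only text would run "very" and "argyle"
together.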
For error reporting, line counting depends on the platform, and
should probably correspond to using a native text editor on that
platform -- as that's what users will have to use when they
get an error.
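One possible way to count lines so that the numbers match most text
editors is to treat CR LF, a lone CR, and a lone LF each as a single
line break. These are the same three sequences that XML 1.0 end-of-line
handling normalizes to a single LF. The sketch below assumes that
convention. The names LineCounter, accept, and lineAt are hypothetical,
and the extra XML 1.1 line-end characters (U+0085, U+2028) are
deliberately not handled.

```java
// Hypothetical line/column tracker, fed one char at a time from the raw
// input, before any end-of-line normalization is applied.
final class LineCounter {
    private int line = 1;      // 1-based line of the next character
    private int column = 0;    // characters seen so far on the current line
    private boolean lastWasCr = false;

    void accept(char c) {
        if (c == '\r') {
            // A CR always ends a line; a following LF must not count again.
            line++;
            column = 0;
            lastWasCr = true;
        } else if (c == '\n') {
            if (!lastWasCr) {
                // A lone LF ends a line.
                line++;
                column = 0;
            }
            // Otherwise this LF is the second half of CR LF, already counted.
            lastWasCr = false;
        } else {
            column++;
            lastWasCr = false;
        }
    }

    int line() { return line; }

    int column() { return column; }

    // Convenience: the 1-based line number of the character at the given
    // offset, found by feeding every character before that offset.
    static int lineAt(String text, int offset) {
        LineCounter counter = new LineCounter();
        for (int i = 0; i < offset; i++) {
            counter.accept(text.charAt(i));
        }
        return counter.line();
    }
}
```

With the input "a\r\nb\rc\nd", the character 'd' at offset 7 is reported
on line 4, whichever of the three line-end styles came before it. If the
users' editors on a given platform count differently, the two line++
branches are the place to adjust.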
Liam
--
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/ * http://www.fromoldbooks.org/