How are people dealing with UTF-8 vs. Unicode vs. Latin-1?

I have been working on a lexer (using Flex) that assumes the input stream is either Latin-1 or UTF-8 and returns byte strings to the caller. Since Java chars are Unicode, I assume that the Java XML parsers are doing the opposite, right? Is there any consensus on what form PCDATA or GI names should take when they are returned to the application?

On a related note, when do character entities get replaced - in the lexer or later on? My reading of the draft is that the scanner must do the replacement if the examples of rescanning are to work.

/cco

Chris Olds
colds@n...

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@i... the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@i...)