

* Bob Foster <bob@o...> [2005-08-13 02:55]:

> Alan Gutierrez wrote:

> >     I'm implementing a B-Tree to index XML documents. I'd like
> >     to use the maximum character value as a boundary, or failing
> >     that a minimum character value.

> I believe the current Unicode character range, and the one that was 
> effective for the XML 1.0 standard, is 0x20-0x10FFFF (note: 21 bits) plus 
> the control characters '\t', '\n' and '\r', minus the surrogate range 
> 0xD800-0xDFFF and 0xFFFE and 0xFFFF. The fact that Java doesn't have much 
> support for surrogate pairs, which are the only way to express character 
> values greater than 0xFFFF in a Java String, doesn't mean they won't 
> appear in XML documents.

    It gives me something to Google: "surrogate pairs". I see
    Jaxen has some code to convert them.

    Am I right that, with Unicode in Java, you need to work with
    a String and not with individual chars? That puts a dent in my
    algorithm, which advances along the characters in the string.
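
    A minimal sketch, assuming Java 7 or later (String.codePointAt and
    Character.charCount date from J2SE 5.0; Integer.compare and
    Boolean.compare from Java 7): a loop can still advance along the
    string, just one code point at a time instead of one char at a time.
    The class and method names (CodePointWalk, countCodePoints,
    compareByCodePoint) are hypothetical, and ordering B-Tree keys by code
    point is an assumption about what an index over XML text would want.

```java
class CodePointWalk {

    // Count characters by Unicode code point instead of by Java char.
    // A supplementary character (above U+FFFF) is stored as two chars,
    // a high surrogate followed by a low surrogate; codePointAt joins them.
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            count++;
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return count;
    }

    // Hypothetical B-Tree key comparison in code point order.
    // String.compareTo compares UTF-16 chars, so it sorts U+FFFD after
    // U+10000 (0xFFFD > 0xD800); comparing whole code points does not.
    static int compareByCodePoint(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // A string that runs out first sorts first, as with compareTo.
        return Boolean.compare(i < a.length(), j < b.length());
    }

    public static void main(String[] args) {
        String s = "a\uD800\uDC00b"; // 'a', U+10000, 'b'
        System.out.println(s.length());           // 4 chars
        System.out.println(countCodePoints(s));   // 3 code points

        String replacementChar = "\uFFFD";   // U+FFFD
        String supplementary = "\uD800\uDC00"; // U+10000
        System.out.println(replacementChar.compareTo(supplementary) > 0);          // true
        System.out.println(compareByCodePoint(replacementChar, supplementary) < 0); // true
    }
}
```

    The last two lines show why the choice matters for an index: the
    built-in compareTo and a code-point comparison disagree about where
    supplementary characters fall, so boundary keys and the comparator
    need to use the same ordering.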

> So the answer is no, there's no single 16-bit maximum character value. 
> The test requires access to at least the next character and a little code.
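
    The "next character" test can be sketched as a check against the
    XML 1.0 Char production, assuming Java 5 or later and input already
    held in a Java String. The ranges come from the XML 1.0
    specification; the class and method names (XmlCharCheck, isXmlChar,
    isXmlText) are hypothetical. A high surrogate only becomes a character
    once it is joined with the low surrogate that follows it.

```java
class XmlCharCheck {

    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Tests a string one char at a time. A high surrogate cannot be judged on
    // its own: it is legal only when the next char is a low surrogate, and
    // then the pair stands for a single code point above U+FFFF.
    static boolean isXmlText(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false; // unpaired high surrogate
                }
                int cp = Character.toCodePoint(c, s.charAt(i + 1));
                if (!isXmlChar(cp)) {
                    return false;
                }
                i++; // skip the low surrogate just consumed
            } else if (!isXmlChar(c)) {
                // Also rejects a stray low surrogate, since 0xDC00-0xDFFF
                // lies outside every range above.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The largest legal XML character, U+10FFFF, is two Java chars.
        String max = new String(Character.toChars(0x10FFFF));
        System.out.println(max.length());        // 2
        System.out.println(isXmlText(max));      // true
        System.out.println(isXmlText("\uD800")); // false: lone surrogate
        System.out.println(isXmlText("\uFFFE")); // false: excluded value
    }
}
```

    By the same production, U+0000 is not a legal XML 1.0 character; the
    smallest one is U+0009 (tab).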

    Is zero the absolute minimum? If so I could build reverse indices.

    Thanks for your help, Bob, Derek, and Robert. I'm not getting
    any feedback at comp.lang.java.programmer.

--
Alan Gutierrez - alan@e...
    - http://engrm.com/blogometer/index.html
    - http://engrm.com/blogometer/rss.2.0.xml
