* Tom Moog <tmoog@s...> [2005-08-13 21:53]:


> On Aug 13 07:19, Alan Gutierrez <alan-xml-dev@e...> wrote:
> >
> > Subject: Re:  XML Max Character Value
> >
> > * Bob Foster <bob@o...> [2005-08-13 02:55]:
> > 
> > > Alan Gutierrez wrote:
> > 
> > > >     I'm implementing a B-Tree to index XML documents. I'd like
> > > >     to use a maximum character value as a boundary, or failing
> > > >     that a minimum character value.
> > 
> > > I believe the current Unicode character range, and the one that was 
> > > effective for the XML 1.0 standard, is 0x20-0x10000 (note 17 bits) plus 
> > > the control characters, '\t' and '\n' and minus the surrogate pair range 
> > > and 0xFFFF and 0xFFFE.

> The maximum for xml is 0x10ffff.

> You may want to think in terms of utf-8 encoding.

> One characteristic of utf-8 is that it preserves the order of
> strings.  In other words, if code(A) < code(B), then utf-8(A) <
> utf-8(B) when compared as a sequence of unsigned 8 bit bytes.

    That sounds good. For text data like XSLT dates, '2005-08-10',
    where locale and collation might not matter, I'll want to use the
    simplest, smallest representation possible. Maybe not the best
    example, since there is a binary representation.
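
    A minimal Python sketch of the order-preserving property quoted
    above, assuming keys are compared as raw UTF-8 bytes. The helper
    name utf8_key and the sample strings are made up for illustration;
    they are not part of any index implementation discussed here.

```python
def utf8_key(text: str) -> bytes:
    """Return the UTF-8 bytes of text, usable directly as a comparison key."""
    return text.encode("utf-8")


# Samples spanning 1-, 2-, 3- and 4-byte UTF-8 sequences, up to U+10FFFF.
samples = ["z", "a", "\U0010ffff", "\u20ac", "ab", "\u00e9", "\U0001d11e", "a\u00e9"]

# Python compares str values code point by code point and bytes values
# as unsigned bytes, so the two orderings can be checked against each other.
by_code_point = sorted(samples)
by_utf8_bytes = sorted(samples, key=utf8_key)
assert by_code_point == by_utf8_bytes

# ISO-style dates are plain ASCII, so byte order is also chronological order.
dates = ["2005-08-13", "2005-08-10", "2004-12-31"]
assert sorted(dates, key=utf8_key) == ["2004-12-31", "2005-08-10", "2005-08-13"]
```

    This only covers code point order; it says nothing about
    locale-aware collation.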

    In any case...

    I've reworked my algorithm so that it starts from a head node
    that is an implicit least value node. The conditionals only
    apply to subsequent nodes, which are built from inserted values.
    
    Thus, I've removed the need for a sentinel.  I'll only ever be
    testing against characters found within the XML document.
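
    To illustrate the idea, here is a rough sketch using a sorted
    linked list rather than a real B-Tree. The head node carries no
    key and stands for "less than everything", so comparisons only
    ever involve inserted values. The names Node and SortedKeys are
    hypothetical, and this is a simplification, not the actual
    indexing code.

```python
from typing import List, Optional


class Node:
    def __init__(self, key: Optional[bytes] = None) -> None:
        self.key = key                      # None only for the head node
        self.next: Optional["Node"] = None


class SortedKeys:
    def __init__(self) -> None:
        # The head is an implicit least value; its key is never compared.
        self.head = Node()

    def insert(self, key: bytes) -> None:
        prev = self.head
        # Comparisons start at the node after the head, so every key
        # tested here is one that was actually inserted.
        while prev.next is not None and prev.next.key < key:
            prev = prev.next
        node = Node(key)
        node.next = prev.next
        prev.next = node

    def keys(self) -> List[bytes]:
        out = []
        node = self.head.next
        while node is not None:
            out.append(node.key)
            node = node.next
        return out


index = SortedKeys()
for text in ["2005-08-13", "\u20ac", "2005-08-10", "a"]:
    index.insert(text.encode("utf-8"))
assert index.keys() == sorted(index.keys())
```

    No maximum or minimum character constant appears anywhere in the
    comparisons.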

    Thank you to everyone who responded. I'm sure I'm going to want to
    ask more questions later about collation.

--
Alan Gutierrez - alan@e...
    - http://engrm.com/blogometer/index.html
    - http://engrm.com/blogometer/rss.2.0.xml
