I'd looked at BreakIterator way back when it was still at Taligent. I can't
recall why I chose not to go with it at the time (efficiency concerns?), but
it looks worth revisiting. Thanks for the suggestion.
Howard

-----Original Message-----
From: J.Pietschmann [mailto:j3322ptm@y...]
Sent: Sunday, December 07, 2003 2:00 AM
To: Howard Katz; xml-dev@l...
Subject: Re:  ANN: XQEngine 0.61


Howard Katz wrote:
> All my word breaking is delegated to a class called (surprise) WordBreaker,
> which implements a very simple algorithm that uses Java's
> Character.isLetterOrDigit() function to determine where words begin and end.
> This works well for Western languages. If you want to optimize for a
> non-Western language, you can override WordBreaker and implement word
> breaking in whatever way makes sense for your particular language or
> languages of interest. That's the theory at any rate ...
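
[For readers of the archive: the approach Howard describes can be sketched
roughly as below. This is an illustration only, not XQEngine's actual code;
the class name SimpleWordBreaker and the method words() are hypothetical,
and the sketch assumes Java 7 or later.]

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a WordBreaker-style class; the real one may differ.
public class SimpleWordBreaker {

    // Splits text into maximal runs of letter-or-digit characters.
    public static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        int start = -1; // index where the current word began, or -1 if outside a word
        for (int i = 0; i < text.length(); i++) {
            if (Character.isLetterOrDigit(text.charAt(i))) {
                if (start < 0) {
                    start = i; // a word begins here
                }
            } else if (start >= 0) {
                out.add(text.substring(start, i)); // a word ends just before i
                start = -1;
            }
        }
        if (start >= 0) {
            out.add(text.substring(start)); // flush a word that runs to the end
        }
        return out;
    }
}
```

[A subclass tuned for a particular language would replace the
isLetterOrDigit test with its own boundary rule. Note that this simple
test splits "don't" into "don" and "t", since the apostrophe is neither a
letter nor a digit.]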

Have a look at java.text.BreakIterator, which helps to implement
line and word breaking along the Unicode standards (most notably
UTR14).
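
[For readers of the archive: a minimal sketch of word extraction with
java.text.BreakIterator might look like the following. The class name
BreakIteratorWords and the choice to keep only segments that start with a
letter or digit are assumptions made for this illustration, not part of
XQEngine. It assumes Java 7 or later.]

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, named here for illustration only.
public class BreakIteratorWords {

    // Returns the word-like segments of text, using locale-aware boundaries.
    public static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // The iterator also reports punctuation and whitespace as segments;
            // keep only those that begin with a letter or digit (an assumption
            // of this sketch about what counts as a word).
            if (!segment.isEmpty() && Character.isLetterOrDigit(segment.codePointAt(0))) {
                out.add(segment);
            }
        }
        return out;
    }
}
```

[Called as BreakIteratorWords.words("Hello, world!", Locale.ENGLISH), this
yields the two segments "Hello" and "world". Passing a different Locale
lets the JDK apply whatever boundary rules it has for that language.]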

J.Pietschmann


-----------------------------------------------------------------
The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
initiative of OASIS <http://www.oasis-open.org>

The list archives are at http://lists.xml.org/archives/xml-dev/

To subscribe or unsubscribe from this list use the subscription
manager: <http://lists.xml.org/ob/adm.pl>

