Howard Katz wrote:
> All my word breaking is delegated to a class called (surprise) WordBreaker,
> which implements a very simple algorithm that uses Java's
> Character.isLetterOrDigit() function to determine where words begin and end.
> This works well for Western languages. If you want to optimize for a
> non-Western language, you can override WordBreaker and implement word
> breaking in whatever way makes sense for your particular language or
> languages of interest. That's the theory at any rate ...

Have a look at java.text.BreakIterator, which helps to implement
line and word breaking along the Unicode standards (most notably
UTR14).

J.Pietschmann
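
For illustration, here is a minimal sketch of what word breaking with java.text.BreakIterator might look like. The class name BreakIteratorWords and the helpers words() and containsLetterOrDigit() are made up for this example; they are not part of the WordBreaker class discussed above, whose actual interface is not shown in this thread. The sketch assumes that segments without any letter or digit (whitespace, punctuation) should be dropped.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical example class, not the WordBreaker from the quoted message.
class BreakIteratorWords {

    // Splits text at the word boundaries reported by BreakIterator and
    // keeps only the segments that contain at least one letter or digit.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            if (containsLetterOrDigit(piece)) {
                out.add(piece);
            }
        }
        return out;
    }

    // BreakIterator also returns whitespace and punctuation as segments,
    // so this filter decides which segments count as words (an assumption).
    static boolean containsLetterOrDigit(String s) {
        return s.codePoints().anyMatch(Character::isLetterOrDigit);
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world!", Locale.ENGLISH));
    }
}
```

The locale argument selects the boundary rules, so a subclass that overrides the word-breaking step could pass the locale of the language it is tuned for. How well the built-in rules handle a given non-Western language depends on the Java runtime in use.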