Subject: RE: Re: text() word lists
From: "Michael Kay" <mhk@xxxxxxxxx>
Date: Mon, 9 Feb 2004 11:27:37 -0000
>
> Not that I understand it,
> but ( and ) seem to be included Michael?
> <word>) - 71</word>
> <word>(this - 11</word>
>
>
> Is the fix to modify the
> for $w in tokenize(string(.), '[\s.?!,]+')[.] return
> line?
>
> for $w in tokenize(string(.), '[\s.?!, )(]+')[.] return
> seems to work.
I only spent five minutes on this: producing a decent natural-language
tokenizer takes a little longer than that! Obviously it's easy to
write a more intelligent regex; I was only trying to illustrate the
principles.
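
As a rough sketch of how the separator set might be extended (this is
not the stylesheet from earlier in the thread, which is not reproduced
here), the XSLT 2.0 stylesheet below counts words across all text nodes
and treats brackets, quotes, colons and semicolons as separators too.
The $separators variable, the <words> wrapper element and the choice of
extra punctuation are assumptions made for illustration only; the
"word - count" output shape follows the samples quoted above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed separator set: whitespace plus . ? ! , ; : ( ) and the
       double quote. Inside a character class these are all literals. -->
  <xsl:variable name="separators" select="'[\s.?!,;:()&quot;]+'"/>

  <xsl:template match="/">
    <words>
      <!-- Tokenize every text node; the [.] predicate drops the empty
           strings that tokenize() returns when a text node starts or
           ends with a separator. -->
      <xsl:for-each-group
          select="for $t in //text() return tokenize($t, $separators)[.]"
          group-by=".">
        <!-- Most frequent words first -->
        <xsl:sort select="count(current-group())" order="descending"/>
        <word>
          <xsl:value-of select="current-grouping-key()"/>
          <xsl:text> - </xsl:text>
          <xsl:value-of select="count(current-group())"/>
        </word>
      </xsl:for-each-group>
    </words>
  </xsl:template>
</xsl:stylesheet>
```

This still leaves apostrophes, hyphens and case differences alone, so
"The" and "the" are counted separately; handling those is where a real
natural-language tokenizer starts to take longer than five minutes.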
Michael Kay
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list