RE: [xsl] Re: text() word lists

Subject: RE: Re: text() word lists
From: James Cummings <James.Cummings@xxxxxxxxxxxxxx>
Date: Mon, 9 Feb 2004 10:21:00 +0000 (GMT)

On Mon, 9 Feb 2004 David.Pawson@xxxxxxxxxxx wrote:

> I said:
>       Is it possible to remove all numbers too?
>     Or is that a part of the lexicographer's toolset?

It can be (I'm reliably informed by a linguist sitting
a few desks away): someone might be analysing the
text of, say, a motoring magazine, with phrases such as
"The A1-M1 link road" (for UK readers) or "a V6 engine... or
I could have had a V8", where comparisons don't make
sense without the numbers.

So what is the best way to parameterise these to allow
turning on/off the removal of numbers?  And while
we're at it, turning on/off the removal of hyphens or
other possibly-word-forming characters?

> <xsl:template match="/">
> <frequencies>
> <xsl:for-each-group group-by="." select="
>    for $w in tokenize(string(.), '[\s.?!,)(]+')[.] return lower-case($w)">
>   <xsl:sort select="count(current-group())" order="descending"/>
>   <xsl:analyze-string select="current-grouping-key()" regex="[0-9]+">
>     <xsl:non-matching-substring>
>       <word><xsl:value-of select="current-grouping-key(), '  -  ',
> count(current-group())"/></word>
>     </xsl:non-matching-substring>
>     <xsl:matching-substring/>
>   </xsl:analyze-string>
> </xsl:for-each-group>
> </frequencies>
>
> </xsl:template>
>
> Seems to work nicely.
>   Thanks Michael, very useful.
>
> regards DaveP
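
One possible shape for that parameterisation, as a rough sketch rather
than a tested answer: two global stylesheet parameters switch the
stripping on or off, and a small function applies them to each token
before grouping. The parameter names (strip-numbers, strip-hyphens),
the f: namespace URI and the function f:normalise are all made up for
illustration. It also assumes that "removing numbers" means deleting
digits inside a token (so "v6" becomes "v"), which may not be what a
lexicographer wants.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:wordlist"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical switches, overridable when the transform is invoked -->
  <xsl:param name="strip-numbers" as="xs:boolean" select="true()"/>
  <xsl:param name="strip-hyphens" as="xs:boolean" select="false()"/>

  <!-- Lower-case a token and remove digits and/or hyphens if requested -->
  <xsl:function name="f:normalise" as="xs:string">
    <xsl:param name="w" as="xs:string"/>
    <xsl:variable name="a" as="xs:string"
        select="if ($strip-numbers) then translate($w, '0123456789', '') else $w"/>
    <xsl:variable name="b" as="xs:string"
        select="if ($strip-hyphens) then translate($a, '-', '') else $a"/>
    <xsl:sequence select="lower-case($b)"/>
  </xsl:function>

  <xsl:template match="/">
    <frequencies>
      <!-- Tokens left empty after stripping are dropped before grouping -->
      <xsl:for-each-group group-by="." select="
          for $w in tokenize(string(.), '[\s.?!,)(]+')
          return f:normalise($w)[. ne '']">
        <xsl:sort select="count(current-group())" order="descending"/>
        <word><xsl:value-of select="current-grouping-key(), '  -  ',
            count(current-group())"/></word>
      </xsl:for-each-group>
    </frequencies>
  </xsl:template>

</xsl:stylesheet>
```

With both parameters set to false, tokens such as "a1-m1" and "v6"
pass through untouched apart from lower-casing. Because the stripping
happens before grouping, "v6" and "v8" are counted together as "v"
when strip-numbers is true, which is exactly the kind of merging the
motoring-magazine case would want to switch off.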

---
Dr James Cummings, Oxford Text Archive, University of Oxford
James.Cummings at ota.ahds.ac.uk http://users.ox.ac.uk/~jamesc/

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
