Re: [xsl] [XSLT2.0] xsl:analyze-string@regex syntax too limi

Subject: Re: [XSLT2.0] xsl:analyze-string@regex syntax too limited
From: Gunther Schadow <gunther@xxxxxxxxxxxxxxxxxxxxxx>
Date: Thu, 16 Dec 2004 18:14:56 -0500

Thanks, good find. The only problem now is that this issue needs to be
addressed in java.util.regex.

Colin Paul Adams wrote:

>>>>>>"Gunther" == Gunther Schadow <gunther@xxxxxxxxxxxxxxxxxxxxxx> writes:
> 
> 
>     Gunther> The boundary matcher matches a zero-width substring
>     Gunther> between a character matching the character class
>     Gunther> [A-Za-z_0-9] and a character matching the character class
>     Gunther> [^A-Za-z_0-9] or vice versa.  </quote>
> 
>     Gunther> This is pretty clear. It may not make the
>     Gunther> internationalization people very happy because I can't do
>     Gunther> word-boundary matches on Hindi text. That's a true
>     Gunther> concern.
> 
> So address it. Unicode report TR18 says (for Level 1 support):
> 
> RL1.4  Simple Word Boundaries
>    To meet this requirement, an implementation shall extend the word
>    boundary mechanism so that:
>
>    1. The class of <word_character> includes all the Alphabetic values
>       from the Unicode character database, from UnicodeData.txt [UData].
>       See also Annex C: Compatibility Properties.
>    2. Non-spacing marks are never divided from their base characters,
>       and otherwise ignored in locating boundaries.
> 
> Level 2 provides more general support for word boundaries between
> arbitrary Unicode characters which may override this behavior.
> 
> Level 1 support should certainly be met.
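
[Editorial sketch, not part of the original message.] The difference between the ASCII-only boundary definition quoted above and the two RL1.4 rules can be illustrated without relying on how any particular regex engine implements \b. The class and method names below are hypothetical, positions are code-point indexes, and the "Unicode" classifier is only a rough approximation of RL1.4 (Alphabetic, plus digits and underscore, with non-spacing marks taking the class of the preceding character).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class WordBoundarySketch {

    // ASCII-only word class from the quoted definition: [A-Za-z_0-9].
    static boolean isAsciiWordChar(int cp) {
        return (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z')
                || (cp >= '0' && cp <= '9') || cp == '_';
    }

    // Approximation of RL1.4 rule 1: Alphabetic characters count as word
    // characters; digits and underscore are kept for parity with the ASCII class.
    static boolean isUnicodeWordChar(int cp) {
        return Character.isAlphabetic(cp) || Character.isDigit(cp) || cp == '_';
    }

    // Returns the code-point positions where a word boundary falls.
    // If attachMarks is true, a non-spacing mark takes the class of the
    // character before it (RL1.4 rule 2), so it is never split from its base.
    static List<Integer> boundaries(String text, IntPredicate isWordChar,
                                    boolean attachMarks) {
        int[] cps = text.codePoints().toArray();
        boolean[] word = new boolean[cps.length];
        for (int i = 0; i < cps.length; i++) {
            boolean mark = Character.getType(cps[i]) == Character.NON_SPACING_MARK;
            // A leading mark has no base character, so it is classified on its own.
            word[i] = (attachMarks && mark && i > 0)
                    ? word[i - 1]
                    : isWordChar.test(cps[i]);
        }
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i <= cps.length; i++) {
            // Start and end of input are treated as non-word positions.
            boolean before = i > 0 && word[i - 1];
            boolean after = i < cps.length && word[i];
            if (before != after) {
                out.add(i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hindi "namaste" (six code points, two of them non-spacing marks),
        // followed by a space and the ASCII word "ok".
        String text = "\u0928\u092E\u0938\u094D\u0924\u0947 ok";
        System.out.println(boundaries(text, WordBoundarySketch::isAsciiWordChar, false));
        System.out.println(boundaries(text, WordBoundarySketch::isUnicodeWordChar, true));
    }
}
```

Under the ASCII-only class the Hindi word produces no boundaries at all and only "ok" is delimited (positions 7 and 9); under the RL1.4-style rules the Hindi word is delimited as well (positions 0, 6, 7 and 9).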

-- 
Gunther Schadow, M.D., Ph.D.                  gschadow@xxxxxxxxxxxxxxx
Associate Professor           Indiana University School of Informatics
Regenstrief Institute, Inc.      Indiana University School of Medicine
tel:1(317)630-7960                       http://aurora.regenstrief.org
