RE: [xsl] XSLT script to report Unicode characters and code

Subject: RE: XSLT script to report Unicode characters and code blocks in file?
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 29 May 2008 21:32:56 +0100

I wrote a transformation that uses unparsed-text() and regex processing to
create an XML version of the Unicode database; once you've got that, you can
easily look up what code block a particular character falls into because
it's part of the data for each character. (Well, most of the characters:
some of the non-BMP entries share a single entry for a large group of
characters, which needs a bit of care.)
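
A minimal sketch of the same idea, not the transformation described above: it
assumes the block ranges are read from the Unicode Character Database file
Blocks.txt (lines of the form "0000..007F; Basic Latin") rather than from
per-character data, which sidesteps the shared non-BMP entries. The
$blocks-uri parameter, the f: namespace, the f:hex and f:block functions, and
the output element names are all hypothetical, chosen for illustration. Only
text nodes of the input are examined, not attribute values.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:unicode-blocks"
    exclude-result-prefixes="xs f">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: where Blocks.txt lives; a local file: URI works as well. -->
  <xsl:param name="blocks-uri" as="xs:string"
      select="'http://www.unicode.org/Public/UNIDATA/Blocks.txt'"/>

  <!-- Turn each "START..END; Name" line into a <block> element.
       Comment lines (starting with #) and blank lines do not match. -->
  <xsl:variable name="blocks" as="element(block)*">
    <xsl:analyze-string select="unparsed-text($blocks-uri, 'utf-8')"
        regex="^([0-9A-F]+)\.\.([0-9A-F]+);\s*(.+?)\s*$" flags="m">
      <xsl:matching-substring>
        <block start="{f:hex(regex-group(1))}"
               end="{f:hex(regex-group(2))}"
               name="{regex-group(3)}"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:variable>

  <!-- Convert an upper-case hexadecimal string to an integer. -->
  <xsl:function name="f:hex" as="xs:integer">
    <xsl:param name="hex" as="xs:string"/>
    <xsl:sequence select="
      if ($hex = '') then 0
      else 16 * f:hex(substring($hex, 1, string-length($hex) - 1))
           + string-length(substring-before('0123456789ABCDEF',
                           substring($hex, string-length($hex))))"/>
  </xsl:function>

  <!-- Name of the block containing a codepoint, or 'No_Block' if none. -->
  <xsl:function name="f:block" as="xs:string">
    <xsl:param name="cp" as="xs:integer"/>
    <xsl:sequence select="(
      $blocks[xs:integer(@start) le $cp and xs:integer(@end) ge $cp]/string(@name),
      'No_Block')[1]"/>
  </xsl:function>

  <!-- List the distinct characters in the input's text, grouped by block. -->
  <xsl:template match="/">
    <characters>
      <xsl:for-each-group
          select="distinct-values(string-to-codepoints(string(.)))"
          group-by="f:block(.)">
        <!-- Blocks are disjoint ranges, so any member orders the groups. -->
        <xsl:sort select="."/>
        <block name="{current-grouping-key()}">
          <xsl:for-each select="current-group()">
            <xsl:sort select="."/>
            <char cp="{.}">
              <xsl:value-of select="codepoints-to-string(.)"/>
            </char>
          </xsl:for-each>
        </block>
      </xsl:for-each-group>
    </characters>
  </xsl:template>

</xsl:stylesheet>
```

Run against any XML file with an XSLT 2.0 processor. Whether unparsed-text()
can fetch a remote URI depends on the processor and its configuration, so
pointing $blocks-uri at a local copy of Blocks.txt is the safer choice.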

Michael Kay
http://www.saxonica.com/

> -----Original Message-----
> From: David Sewell [mailto:dsewell@xxxxxxxxxxxx] 
> Sent: 29 May 2008 20:45
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject:  XSLT script to report Unicode characters and 
> code blocks in file?
> 
> I'm working on a simple XSLT 2.0 script to list all distinct 
> Unicode characters used in a file. That part of the script 
> takes very few lines, thanks to distinct-values(), 
> codepoints-to-string(), and string-to-codepoints().
> 
> However, I'd also like to group the output by code block:
> 
> http://www.fileformat.info/info/unicode/block/index.htm
> 
> Best way I can see to do it is to write a local function that 
> tests the codepoint value and uses lots and lots of 
> <xsl:when> case tests to determine which block the character 
> falls into. Not hard but a bit tedious. Has anyone invented 
> this wheel already?
> 
> DS
> 
> --
> David Sewell, Editorial and Technical Manager ROTUNDA, The 
> University of Virginia Press PO Box 801079, Charlottesville, 
> VA 22904-4318 USA
> Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
> Email: dsewell@xxxxxxxxxxxx   Tel: +1 434 924 9973
> Web: http://rotunda.upress.virginia.edu/
