Subject: RE: XSLT script to report Unicode characters and code blocks in file?
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 29 May 2008 21:32:56 +0100
I wrote a transformation that uses unparsed-text() and regex processing to
create an XML version of the Unicode database; once you've got that, you can
easily look up what code block a particular character falls into because
it's part of the data for each character. (Well, most of the characters:
some large groups of non-BMP characters share a single entry, which needs
a bit of care.)
Michael Kay
http://www.saxonica.com/
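
For illustration only, here is a minimal XSLT 2.0 sketch of a related but
simpler idea. It is not the transformation described above: instead of
building a per-character XML version of the Unicode database, it reads only
the block ranges from the Unicode Character Database file Blocks.txt, whose
data lines look like "0000..007F; Basic Latin". Assumptions: a local copy of
Blocks.txt sits next to the stylesheet, and the f: namespace, the function
names f:hex and f:block-name, and the report/block/char output elements are
all hypothetical names chosen for this sketch.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:unicode-blocks"
    exclude-result-prefixes="xs f">

  <!-- Assumption: relative URI, resolved against the stylesheet's base URI. -->
  <xsl:param name="blocks-uri" select="'Blocks.txt'"/>

  <!-- Turn each "START..END; Name" line into a <block> element.
       Comment lines and blank lines do not match and are dropped. -->
  <xsl:variable name="blocks" as="element(block)*">
    <xsl:analyze-string select="unparsed-text($blocks-uri, 'utf-8')"
        regex="^([0-9A-F]+)\.\.([0-9A-F]+);\s*(.+?)\s*$" flags="m">
      <xsl:matching-substring>
        <block start="{f:hex(regex-group(1))}"
               end="{f:hex(regex-group(2))}"
               name="{regex-group(3)}"/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:variable>

  <!-- Convert an upper-case hexadecimal string to an integer. -->
  <xsl:function name="f:hex" as="xs:integer">
    <xsl:param name="hex" as="xs:string"/>
    <xsl:sequence select="
      if ($hex = '') then 0
      else 16 * f:hex(substring($hex, 1, string-length($hex) - 1))
           + string-length(substring-before('0123456789ABCDEF',
               substring($hex, string-length($hex))))"/>
  </xsl:function>

  <!-- Name of the block containing a codepoint, or 'No_Block' if none. -->
  <xsl:function name="f:block-name" as="xs:string">
    <xsl:param name="cp" as="xs:integer"/>
    <xsl:sequence select="
      ($blocks[xs:integer(@start) le $cp and $cp le xs:integer(@end)]
         /@name/string(), 'No_Block')[1]"/>
  </xsl:function>

  <!-- List the distinct characters in the document's text content
       (attribute values are not included), grouped by block. -->
  <xsl:template match="/">
    <report>
      <xsl:for-each-group
          select="distinct-values(string-to-codepoints(string(.)))"
          group-by="f:block-name(.)">
        <block name="{current-grouping-key()}">
          <xsl:for-each select="current-group()">
            <xsl:sort select="."/>
            <char cp="{.}">
              <xsl:value-of select="codepoints-to-string(.)"/>
            </char>
          </xsl:for-each>
        </block>
      </xsl:for-each-group>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

Because blocks are contiguous ranges, a range lookup like this avoids the
special handling needed for characters that share a single database entry,
at the cost of not having any other per-character properties available.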
> -----Original Message-----
> From: David Sewell [mailto:dsewell@xxxxxxxxxxxx]
> Sent: 29 May 2008 20:45
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: XSLT script to report Unicode characters and
> code blocks in file?
>
> I'm working on a simple XSLT 2.0 script to list all distinct
> Unicode characters used in a file. That part of the script
> takes very few lines, thanks to distinct-values(),
> codepoints-to-string(), and string-to-codepoints().
>
> However, I'd also like to group the output by code block:
>
> http://www.fileformat.info/info/unicode/block/index.htm
>
> Best way I can see to do it is to write a local function that
> tests the codepoint value and uses lots and lots of
> <xsl:when> case tests to determine which block the character
> falls into. Not hard but a bit tedious. Has anyone invented
> this wheel already?
>
> DS
>
> --
> David Sewell, Editorial and Technical Manager ROTUNDA, The
> University of Virginia Press PO Box 801079, Charlottesville,
> VA 22904-4318 USA
> Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
> Email: dsewell@xxxxxxxxxxxx Tel: +1 434 924 9973
> Web: http://rotunda.upress.virginia.edu/