My first reply got blocked for being too long.
Unicode 16 appears to have 850 characters with category P...
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output method="text"/>

  <!-- Unicode character data from the w3c/xml-entities repository -->
  <xsl:variable name="u"
      select="doc('https://raw.githubusercontent.com/w3c/xml-entities/refs/heads/gh-pages/unicode.xml')"/>

  <xsl:template name="P">
    <!-- Header: the Unicode version recorded on the root element -->
    <xsl:value-of select="'Unicode', $u/*/@unicode, ' '"/>
    <!-- One entry per character whose general category starts with P -->
    <xsl:for-each
        select="$u//character[unicodedata/@category[starts-with(., 'P')]]">
      <!-- '&#10;' is a newline, so each character starts its own line -->
      <xsl:value-of select="
          '&#10;',
          replace(@id, 'U0?', 'U+'),
          codepoints-to-string(@dec),
          description
          "/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
then
saxon10 -it:P rc5.xsl
produces
Unicode 16
U+0021 ! EXCLAMATION MARK
U+0022 " QUOTATION MARK
U+0023 # NUMBER SIGN
U+0025 % PERCENT SIGN
U+0026 & AMPERSAND
U+0027 ' APOSTROPHE
...
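
As a rough cross-check that needs no network access, the same kind of listing can be sketched in Python with the standard unicodedata module. This is only a sketch under stated assumptions: the function name punctuation_chars is made up for illustration, and the count depends on the Unicode version bundled with the Python build (reported by unicodedata.unidata_version), so it may differ from the figure above.

```python
import sys
import unicodedata


def punctuation_chars():
    """Return every character whose general category starts with 'P'.

    Hypothetical helper for illustration; it scans all code points and
    relies on the Unicode data shipped with the running Python.
    """
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)).startswith("P")
    ]


if __name__ == "__main__":
    chars = punctuation_chars()
    print("Unicode", unicodedata.unidata_version, "-", len(chars), "characters")
    for ch in chars[:6]:
        # Mirror the "U+XXXX character NAME" layout of the listing above.
        print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch, '<unnamed>')}")
```

The prefix test on the category string plays the same role as starts-with(., 'P') in the stylesheet, so it covers Pc, Pd, Ps, Pe, Pi, Pf and Po together.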
On Fri, 28 Feb 2025 at 12:53, Roger L Costello costello@xxxxxxxxx <
xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> ChatGPT:
>
> > Please list all the characters denoted by the regular expression \p{P}
>
> The regular expression \p{P} is a Unicode property that matches any
> punctuation character. This includes a wide range of symbols used in
> writing to separate sentences and words, indicate pauses, or convey other
> structural or expressive functions. Here is a list of some common
> punctuation characters that \p{P} would match:
>
> Period: .
> Comma: ,
> Question mark: ?
> Exclamation mark: !
> Colon: :
> Semicolon: ;
> Quotation marks: " and '
> Apostrophe: '
> Hyphen: -
> Dash: — (em dash) and – (en dash)
> Parentheses: ( and )
> Brackets: [ and ]
> Braces: { and }
> Slash: / and \
> Vertical bar: |
> Ellipsis: …
> Angle brackets: < and >
>
> This list is not exhaustive, as Unicode includes many other punctuation
> marks used in various languages and scripts. The \p{P} property covers all
> such characters, including those specific to certain languages or
> typographic conventions.
>
> > How many characters are in \p{P}
>
> The exact number of characters matched by \p{P} can vary depending on the
> version of the Unicode standard being used, as new punctuation characters
> may be added in future updates. As of Unicode 15.0, which was released in
> September 2022, there are over 100 punctuation characters. This includes
> common punctuation marks like periods, commas, and quotation marks, as well
> as less common ones used in various languages and scripts.
>
> For the most precise and up-to-date count, you would need to refer to the
> Unicode Character Database for the specific version you are interested in.
> The database provides detailed information about each character, including
> its properties and classifications.