Thanks, Terry.
I get: Stylesheet compilation failed with 1 error(s):
Error 1 at line 27:48 : xsl:result-document is disabled when extension
functions are disabled
https://xsltfiddle.liberty-development.net/ej9EGcD/10
Cheers, Manuel
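
A possible way around this error, sketched on the assumption that the fiddle only needs the primary result: drop the xsl:result-document wrapper from the match="/" template in the stylesheet quoted below, so the output goes to the principal result instead of output.xml. This has not been run against the fiddle itself.

```xml
<!-- Replacement for the match="/" template: no secondary result document,
     so nothing depends on the feature the processor has disabled -->
<xsl:template match="/" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:apply-templates/>
</xsl:template>
```

The rest of the quoted stylesheet would stay unchanged; only the destination of the result differs.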
Terry Badger terry_badger@xxxxxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx>
wrote on Saturday, 11/05/2019 at 17:12:
> Try this. It is easier for me to understand.
> <?xml version="1.0"?>
> <!-- terry badger 2019-05-11 use regex to separate types of text then
> repackage in new collection order -->
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> version="2.0">
> <xsl:output encoding="utf-8" indent="yes"/>
> <xsl:strip-space elements="*"/>
> <!-- ========================================================================== -->
>
> <!--variable with content regrouped into multiple parts for each seg
> -->
> <xsl:variable name="packaged">
> <xsl:element name="wrapper">
> <xsl:for-each select="//seg">
> <xsl:copy>
> <xsl:attribute name="xml:lang"
> select="parent::*/@xml:lang"/>
> <xsl:analyze-string select="." regex="&lt;br&gt;">
> <xsl:non-matching-substring>
> <xsl:element name="part">
> <xsl:copy-of select="."/>
> </xsl:element>
> </xsl:non-matching-substring>
> </xsl:analyze-string>
> </xsl:copy>
> </xsl:for-each>
> </xsl:element>
> </xsl:variable>
> <!-- ========================================================================== -->
>
> <!-- start at root and output a result document to make it easier to
> see -->
> <xsl:template match="/">
> <xsl:result-document href="output.xml">
> <xsl:apply-templates/>
> </xsl:result-document>
> </xsl:template>
> <!-- ========================================================================== -->
>
> <xsl:template match="tmx | body | header">
> <xsl:copy>
> <xsl:copy-of select="@*"/>
> <xsl:apply-templates/>
> </xsl:copy>
> </xsl:template>
> <!-- ========================================================================== -->
>
> <xsl:template match="tu">
> <xsl:for-each select="$packaged/wrapper/seg[1]/part">
> <xsl:variable name="part-order" select="position()"/>
> <xsl:element name="tu">
> <xsl:attribute name="tuid" select="position()"/>
> <xsl:for-each select="$packaged/wrapper/seg">
> <xsl:element name="tuv">
> <xsl:attribute name="xml:lang"
> select="@xml:lang"/>
> <xsl:element name="seg">
> <xsl:value-of
> select="normalize-space(part[position() = $part-order])"/>
> </xsl:element>
> </xsl:element>
> </xsl:for-each>
> </xsl:element>
> </xsl:for-each>
> </xsl:template>
> </xsl:stylesheet>
>
> Terry
>
> On Thursday, May 9, 2019 04:16:36 PM EDT, Martin Honnen
> martin.honnen@xxxxxx <xsl-list-service@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On 09.05.2019 at 21:55, Martin Honnen martin.honnen@xxxxxx wrote:
> > On 09.05.2019 at 21:42, Manuel Souto Pico terminolator@xxxxxxxxx wrote:
> >>
> >>
> >> @Martin, your example works really well. I had to edit the expression,
> >> as in my real files sometimes they have used lists instead of
> >> linebreaks:
> >>
> >> <xsl:param name="lb"
> >> as="xs:string">&lt;/?(li|ul|br)\s*/?&gt;</xsl:param>
> >>
> >> However, I can see that I would also need to split at the end of
> >> sentences when there's no escaped tag but just final punctuation. To
> >> avoid the transformation eating the punctuation, I have tried with a
> >> lookbehind assertion but it seems it's not supported:
> >>
> >> <xsl:param name="lb"
> >> as="xs:string">(?&lt;=[.!?])\s|&lt;/?(li|ul|br)\s*/?&gt;</xsl:param>
> >>
> >> Any ideas?
> >>
> >
> > In general, if there is markup, it might be better to try to parse it.
> > In your initial sample you seemed to have simple HTML empty-element
> > syntax with <br> elements; now, with the adapted regular expression, it
> > seems you expect opening and closing tags.
> >
> > If you know the escaped markup is an XML fragment then I would try to
> > parse it with the "parse-xml-fragment" function, if it is HTML, then I
> > would look into using David Carlisle's HTML parser implementation done
> > in pure XSLT 2 or use an extension function like the commercial editions
> > of Saxon offer.
> >
> > After parsing, you can then apply normal templates or grouping
> > constructs.
> >
> An adaptation of the previous suggestion, but now with escaped XML syntax
> in the sample input, to then use parse-xml-fragment, is at
>
> https://xsltfiddle.liberty-development.net/ej9EGcD/5
>
> and does
>
> <xsl:template match="tu">
> <xsl:variable name="split">
> <xsl:apply-templates mode="split"/>
> </xsl:variable>
> <xsl:for-each-group select="$split/tuv/seg" group-by="position()
> mod count($split/tuv[1]/seg)">
> <tu tuid="{position()}">
> <xsl:apply-templates select="current-group()/snapshot()/.."/>
> </tu>
> </xsl:for-each-group>
> </xsl:template>
>
> <xsl:mode name="split" on-no-match="shallow-copy"/>
>
> <xsl:template match="seg" expand-text="yes" mode="split">
> <xsl:for-each-group select="parse-xml-fragment(.)/node()"
> group-ending-with="br">
> <seg>{.}</seg>
> </xsl:for-each-group>
> </xsl:template>
>
> For HTML parsing you would need to use an extension or David Carlisle's
> HTML parser available on GitHub, but the approach is then the same. Of
> course, handling different elements like various list constructs needs
> more code, but once you have a tree you can process it the "normal" XSLT
> way: you can write more templates and/or more modes for various processing
> steps to address more complex input structures.
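
On the lookbehind question quoted above: the XPath regular expression dialect has no lookaround assertions, so (?<=...) cannot work. One alternative sketch, if a purely string-based split is still wanted, is to capture the sentence-final punctuation, put it back with replace(), and tokenize on a marker. The marker variable $mark and the choice of the private-use character U+E000 are assumptions made for illustration (the character must not occur in the data), and splitting on every ". " will also break after abbreviations.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs" version="2.0">

  <!-- Hypothetical marker: a private-use character assumed absent from the input -->
  <xsl:variable name="mark" as="xs:string" select="'&#xE000;'"/>

  <!-- Group 1 captures the punctuation so it can be re-emitted;
       the escaped-tag alternative leaves group 1 empty, so the tag is dropped -->
  <xsl:param name="lb"
      as="xs:string">([.!?])\s+|&lt;/?(li|ul|br)\s*/?&gt;</xsl:param>

  <xsl:template match="seg">
    <!-- Put the punctuation back, append the marker, split on the marker,
         and discard empty or whitespace-only pieces -->
    <xsl:for-each
        select="tokenize(replace(., $lb, concat('$1', $mark)), $mark)[normalize-space()]">
      <seg><xsl:value-of select="normalize-space(.)"/></seg>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

This only shows the splitting step for a single seg; regrouping the pieces into tu/tuv elements would still follow one of the approaches quoted above.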