
  • From: Rick Jelliffe <rjelliffe@a...>
  • To: Uche Ogbuji <uche@o...>
  • Date: Mon, 3 Nov 2014 15:13:53 +1100


Do you want to repair the file? Perhaps this could work:

Make an XSLT 2.0 null (identity) transform.
Add a template for the description element.
In that template, do a text substitution on the text content to replace "&amp;" with some unlikely single character, e.g. &#x4000;, then convert the result to a sequence of codepoints with string-to-codepoints() and put that into a variable.
Iterate over the codepoints in the variable, outputting each one as a character; when you find 0x4000, output an ampersand instead, in an xsl:text with disable-output-escaping="yes".
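
For anyone who would rather stay in Python than XSLT, a similar repair can be sketched with the standard library alone. This is not the stylesheet described above: instead of writing raw entity references with output escaping disabled, it decodes them to the actual characters, so the result does not depend on a DTD declaring nbsp, Ouml and friends. The function name and the tag parameter are made up for illustration, and the sketch assumes the description element holds only text (no child elements).

```python
import html
import xml.etree.ElementTree as ET


def decode_descriptions(xml_text: str, tag: str = "description") -> str:
    """Decode double-escaped entity references inside <description> text.

    Hypothetical helper: after parsing, the text of a description looks
    like 'PJ&nbsp;72', and html.unescape() turns that into real characters.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter(tag):
        # Only the element's own text is touched; child elements and
        # tail text are assumed not to occur here.
        if elem.text:
            elem.text = html.unescape(elem.text)
    # Serialising re-escapes any literal '&' or '<', so the output
    # stays well-formed.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<item><description>PJ&amp;nbsp;72 fra "
        "&amp;Ouml;rsj&amp;ouml; Belysning</description></item>"
    )
    print(decode_descriptions(sample))
```

One caveat: html.unescape() follows HTML rules, so it also accepts a few legacy names without a trailing semicolon. If the data may contain text like that, a stricter pattern is safer.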

On 30/10/2014 11:28 PM, "Uche Ogbuji" <uche@o...> wrote:
On Thu, Oct 30, 2014 at 5:16 AM, Gareth Oakes <goakes@g...> wrote:
>I'm sure someone must have written a nice little python script or
>something similar to do this sometime, anyway I have some XML with
>stuff like
>
><description>PJ&amp;nbsp;72 fra &amp;Ouml;rsj&amp;ouml; Belysning er
>en funktionel lampe&amp;nbsp;som kan justeres efter eget behov.
>Fremstillet af lakeret metal og&amp;nbsp;f&aring;s i mange
>farver.&amp;nbsp;I serien f&amp;aring;s skrivebordslamper, gulvlamper,
>loftslamper.&amp;nbsp;&amp;nbsp;</description>
>
>anyway, rather than sitting down and writing a solution for this
>problem I am supposing someone has written it in the past, and I can
>just use that.

I'm guessing you want the &amp;s to become ampersands? I'm pretty sure the
regular expression /&amp;/&/g would work in most environments.

Could be dangerous because a plain old &amp; would reduce to a WF error after that transform, and those are pretty common. Unless, that is, you know that &amp; has been "psychoescaped" to &amp;amp; . Can't tell from the sample given.

In other words, the problem is too underspecified for an off-the-shelf solution; it depends on reliably knowing the original pattern, so it might indeed be best to write a bit of code.
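
To make the concern concrete, here is a minimal Python sketch of a more cautious substitution than a blanket replace. It assumes, and this is only an assumption about the data, that every "&amp;" followed by a known HTML entity name or a numeric reference and a semicolon is a double escape. A plain "&amp;" (as in "Tom &amp; Jerry") is left alone, and named entities are rewritten as numeric character references so the result stays well-formed without a DTD. The function name is invented for the example.

```python
import re
from html.entities import name2codepoint

# '&amp;' followed by something that looks like an entity body and ';'
_DOUBLE_ESCAPED = re.compile(
    r"&amp;(#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);"
)


def undo_double_escaping(raw: str) -> str:
    """Rewrite '&amp;name;' as a numeric reference; keep plain '&amp;'."""

    def fix(match: re.Match) -> str:
        body = match.group(1)
        if body.startswith("#"):
            # Numeric reference: valid in XML as it stands.
            return "&" + body + ";"
        if body in name2codepoint:
            # Known HTML 4 entity name: emit a numeric reference,
            # which needs no DTD declaration.
            return "&#%d;" % name2codepoint[body]
        # Unknown name: leave the text exactly as it was.
        return match.group(0)

    return _DOUBLE_ESCAPED.sub(fix, raw)


if __name__ == "__main__":
    sample = (
        "<description>PJ&amp;nbsp;72 fra &amp;Ouml;rsj&amp;ouml; "
        "Belysning, Tom &amp; Jerry</description>"
    )
    print(undo_double_escaping(sample))
```

This still cannot tell a double escape from text that was genuinely meant to read "&nbsp;" literally, which is exactly why the original pattern has to be known.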


--
Uche Ogbuji                                       http://uche.ogbuji.net
Founding Partner, Zepheira                  http://zepheira.com
Author, _Ndewo, Colorado_                 http://uche.ogbuji.net/ndewo/
Founding editor, Kin Poetry Journal      http://wearekin.org
http://copia.ogbuji.net    http://www.linkedin.com/in/ucheogbuji    http://twitter.com/uogbuji

