Subject: remove tags + CDATA tag out of big xml file
From: bw <bwakkie@xxxxxxxxx>
Date: Fri, 29 Jan 2010 12:02:10 +0100
Hello,
I have a big XML feed from my content management system that
includes WYSIWYG HTML tags inside CDATA sections.
I am looking for a way to remove the CDATA wrappers and the HTML tags, keeping only the text.
CURRENT:
<add>
<doc>
<some_title>My title</some_title>
<content><![CDATA[
<p>The <strong>keyword</strong> is nice to have but is not needed to
include in a solr feed</p><p><table cellspacing="2" cellpadding="2"
border="1" width="100%"><tbody><tr><td>Étape 1 :</td></tr>
]]></content>
</doc>
<doc>
....
</doc>
</add>
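Once the feed goes through an XML parser, the CDATA wrapper is not something that has to be stripped by hand: a CDATA section is just character data to the parser. A minimal sketch, assuming Python 3 and the standard-library `xml.etree.ElementTree` module (the helper name `raw_content` is made up for illustration), shows that the `<content>` text comes back without the `<![CDATA[` and `]]>` markers; only the HTML tags inside remain to be dealt with.

```python
import xml.etree.ElementTree as ET

# The CURRENT sample from above (the elided second <doc> is left out).
SAMPLE = """<add>
<doc>
<some_title>My title</some_title>
<content><![CDATA[
<p>The <strong>keyword</strong> is nice to have but is not needed to
include in a solr feed</p><p><table cellspacing="2" cellpadding="2"
border="1" width="100%"><tbody><tr><td>Étape 1 :</td></tr>
]]></content>
</doc>
</add>"""


def raw_content(xml_text):
    """Return the text of each <doc>'s <content> element as the parser sees it."""
    root = ET.fromstring(xml_text)
    return [doc.findtext("content", default="") for doc in root.iter("doc")]


for text in raw_content(SAMPLE):
    # No "<![CDATA[" or "]]>" here: the parser has already unwrapped the
    # section, leaving the HTML markup as ordinary character data.
    print(text.strip())
```

Running it prints the HTML fragment starting with `<p>The <strong>keyword</strong>`, so the remaining work is removing the HTML tags themselves.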
WANTED:
<add>
<doc>
<some_title>My title</some_title>
<content>The keyword is nice to have but is not needed to
include in a solr feed</content>
</doc>
<doc>
....
</doc>
</add>
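For a big file, a streaming pass keeps memory use flat instead of loading the whole feed. The following is a sketch, not a definitive tool: it assumes Python 3 with only the standard library, a root element without attributes (like `<add>` here) holding `<doc>` children, and at most one `<content>` per `<doc>`. The names `BLOCK_TAGS`, `html_to_text`, `clean_feed` and `clean_string` are hypothetical. The HTML inside the CDATA is handed to `html.parser.HTMLParser` rather than an XML parser because the fragment is not well-formed (the second `<p>`, the `<table>` and the `<tbody>` are never closed), and the HTML parser tolerates that. Block-level tags are turned into spaces so that words on either side of them do not run together.

```python
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical choice: tags treated as word boundaries when stripping HTML.
BLOCK_TAGS = {
    "p", "div", "br", "li", "ul", "ol", "table", "tbody", "tr", "td", "th",
    "h1", "h2", "h3", "h4", "h5", "h6",
}


class _TextCollector(HTMLParser):
    """Keeps character data and drops every tag; tolerates unclosed tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &amp; etc. to text
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:  # HTMLParser lower-cases tag names
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Strip HTML tags from a fragment and collapse runs of whitespace."""
    collector = _TextCollector()
    collector.feed(fragment)
    collector.close()
    return " ".join("".join(collector.parts).split())


def clean_feed(src, dst):
    """Stream <doc> elements from src (path or binary file) and write them to
    dst (text file) with each <content> reduced to plain text."""
    root = None
    for event, elem in ET.iterparse(src, events=("start", "end")):
        if root is None:
            root = elem  # the first event is the start of the root (<add>)
            dst.write("<%s>\n" % root.tag)
            continue
        if event != "end" or elem.tag != "doc":
            continue
        content = elem.find("content")
        if content is not None:
            # The parser already removed the CDATA wrapper; strip the HTML.
            content.text = html_to_text(content.text or "")
        elem.tail = None  # the tail may be incomplete at this point; drop it
        dst.write(ET.tostring(elem, encoding="unicode") + "\n")
        root.clear()  # free finished <doc> elements so memory stays flat
    if root is not None:
        dst.write("</%s>\n" % root.tag)


def clean_string(xml_text):
    """Convenience wrapper for running clean_feed on an in-memory string."""
    out = io.StringIO()
    clean_feed(io.BytesIO(xml_text.encode("utf-8")), out)
    return out.getvalue()
```

For a file on disk, something like `with open("feed.xml", "rb") as src, open("feed_clean.xml", "w", encoding="utf-8") as dst: clean_feed(src, dst)` should work (the file names are placeholders). Note that this keeps all text in the fragment, including the table cell "Étape 1 :", which the WANTED sample leaves out; keeping only paragraph text would need extra filtering in `_TextCollector`. The output is not re-indented and has no XML declaration, which is fine for UTF-8.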
Cheers
--
[Bb](astia{2}n)?\s?[Ww](ak{2}ie)?$