Subject: Re: remove tags + CDATA tag out of big xml file
From: Michael Ludwig <milu71@xxxxxx>
Date: Fri, 29 Jan 2010 23:18:03 +0100
bw wrote on 29.01.2010 at 12:02:10 (+0100):
> Hello,
>
> I have a big xml feed out of my content management system that
> includes wysiwyg html tags inside CDATA tags.
>
> I am looking for a way to remove the CDATA and only get the text.
> <content><![CDATA[
> <p>The <strong>keyword</strong> is nice to have but is not needed to
> include in a solr feed</p> ...
Looks like this feed is for Solr (an indexer), which won't do anything
useful with the markup anyway. Someone has defined <title> and <content>
as fields for the indexer but has forgotten to strip the markup from the
source. That source markup in CDATA has no purpose in a feed for Solr
and should not have been included in the first place.
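
If the feed cannot be fixed at the source, the markup can be stripped
in a separate pass before indexing. The following is only a minimal
sketch in Python, not something taken from the original post. It
assumes the feed is well-formed XML, that the fields to clean are
named as in the example (<content>), and that the wrapper element
<doc> and the helper names strip_markup and clean_feed are
hypothetical. An XML parser hands CDATA over as ordinary text, so only
the embedded HTML tags need to be removed.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the character data of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Tags are ignored; only the text between them is kept.
        self.parts.append(data)


def strip_markup(fragment):
    """Return the text of an HTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join("".join(parser.parts).split())


def clean_feed(xml_text, fields=("content",)):
    """Replace the text of the named fields with markup-free text.

    The field names are an assumption; adjust them to the real feed.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in fields and elem.text:
            # The parser has already unwrapped the CDATA section.
            elem.text = strip_markup(elem.text)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample modelled on the quoted snippet.
sample = """<doc><title>Example</title><content><![CDATA[
<p>The <strong>keyword</strong> is nice to have</p>]]></content></doc>"""

cleaned = clean_feed(sample)
print(cleaned)
```

This loads the whole document into memory. For a really big feed, an
incremental parser such as xml.etree.ElementTree.iterparse would
probably be the better fit, with the same strip_markup helper applied
to each field as it is read.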
--
Michael Ludwig