Steven,
This is about as classic a case of overlap as one is likely to see.

At 02:31 AM 6/8/2006, you wrote:
I've got some XML that looks like this:

Unless you can find a way to narrow down the range of your possible inputs (say, to avoid the kind of overlapping just shown), and even then, you are really going to find this tough going. The problem works directly at XML/XSLT's Achilles' heel, namely the notion that everything we need to work with fits nicely into the document tree. I'm not saying it's impossible to deal with ... rather, that this is an area of active research.

If I didn't have to do this at scale, I might be inclined to start with tag-writing techniques -- which ordinarily I would stay very far away from, as they violate the spirit of XSLT, and usually make for nothing but trouble -- and brace myself for a fair amount of cleanup by hand or otherwise.

If I did have to do this at scale (and maybe even if not), I would try very hard to specify more constraints on the input; then I'd use either tag-writing (quick, dirty and dangerous) or pipelining/grouping methods to handle the range of pseudo-tag pairs I was prepared to accept. I might use Schematron or a similar analytic validation strategy to help enforce those constraints.

For example, in this case it might be possible to flatten the hierarchy first, perhaps calculating offsets to determine where ranges were co-terminous, then use grouping methods to restore the hierarchy, only with the extra information embedded.

In your examples, it might be possible to do something considerably less than this -- though I do wonder why one of your implicit ranges gets marked on the <li> element as <li platform="api">...</li>, while another comes out on a <ph> element as <ph platform="ot">...</ph> -- but you haven't suggested to us what you want to happen with a case such as

<ul><?Fm Condstart API_Only?>
<li>defined in your enterprise WSDL file</li>
<li><?Fm Condstart OT_Only?>
available in the EntityNames[] array
<?Fm Condend API_Only?> in the Session3 object
<?Fm Condend OT_Only?></li>
<li>in your organization configuration</li>
<li>valid with your security access ....

Notice in this case the ranges "actually" overlap, as there's text content that belongs to both the "API_Only" and "OT_Only" ranges ... will this never happen? (If not, maybe your problem can be simplified.)

There's a fair amount of literature on the general topic of overlapping structures in markup, and several different approaches to dealing with it, but none so mature that anything like an off-the-shelf solution is readily available. Given the right search terms, Google might point you to http://mulberrytech.com/Extreme/Proceedings/html/2004/Piez01/EML2004Piez01.html or any of a number of other papers that have been written on this topic.

Good luck,
Wendell
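
The "flatten first, calculate offsets" idea mentioned above can be roughly sketched outside XSLT. The following is only an illustration in Python, not a method proposed in this thread: it assumes the markers are always `Fm` processing instructions of the form `Condstart NAME` / `Condend NAME`, that a given name is never open twice at once, and that the input is well-formed (the sample is the fragment above, trimmed and closed so that it parses). The names `RangeCollector`, `collect_ranges` and `crossing_pairs` are made up for the example. It records each range as a pair of character offsets into the flattened text, then reports pairs of ranges that cross rather than nest.

```python
import xml.sax


class RangeCollector(xml.sax.ContentHandler):
    """Record each Condstart/Condend pair as offsets into the flattened text."""

    def __init__(self):
        super().__init__()
        self.offset = 0        # characters of text content seen so far
        self.open_ranges = {}  # range name -> start offset
        self.ranges = []       # (name, start, end), in order of closing

    def characters(self, content):
        self.offset += len(content)

    def processingInstruction(self, target, data):
        if target != "Fm":     # assumption: only <?Fm ...?> PIs mark ranges
            return
        kind, _, name = data.strip().partition(" ")
        name = name.strip()
        if kind == "Condstart":
            self.open_ranges[name] = self.offset
        elif kind == "Condend" and name in self.open_ranges:
            self.ranges.append((name, self.open_ranges.pop(name), self.offset))


def collect_ranges(xml_text):
    """Parse the document and return the list of (name, start, end) ranges."""
    handler = RangeCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.ranges


def crossing_pairs(ranges):
    """Return name pairs whose ranges overlap without one containing the other."""
    pairs = []
    for i, (a, a_start, a_end) in enumerate(ranges):
        for b, b_start, b_end in ranges[i + 1:]:
            if a_start < b_start < a_end < b_end or b_start < a_start < b_end < a_end:
                pairs.append((a, b))
    return pairs


# The fragment from the message, shortened and closed so it is well-formed.
SAMPLE = """<ul><?Fm Condstart API_Only?>
<li>defined in your enterprise WSDL file</li>
<li><?Fm Condstart OT_Only?>
available in the EntityNames[] array
<?Fm Condend API_Only?> in the Session3 object
<?Fm Condend OT_Only?></li>
<li>in your organization configuration</li>
</ul>"""

if __name__ == "__main__":
    print(crossing_pairs(collect_ranges(SAMPLE)))
```

On the sample, the "API_Only" and "OT_Only" ranges are reported as a crossing pair, which is the situation the question above asks about. A check of this kind could serve as a pre-flight test on the input before committing to a grouping approach; it does nothing to rebuild the hierarchy itself.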



