Subject: Re: Generic stylesheet to flatten XML hierarchy
From: Sara Mitchell <samitchell6@xxxxxxxxx>
Date: Mon, 7 Dec 2009 10:49:01 -0800 (PST)
I know that this may not work in every case. Basically the rules are:
* every attribute on an element becomes a column in a row
* every element that has data content becomes a column in a row
* repeating elements define a row
  -- with the further restriction that if there are hierarchical levels of
  repeating elements (nested), the final lowest level of repeating elements
  defines a row and ancestor levels get repeated
* hierarchical relationships get flattened
* siblings at any level that don't repeat get repeated in each row
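
As a rough illustration of the rules above (not the XSLT solution itself),
here is a minimal sketch in Python using the standard xml.etree.ElementTree
module. The function name flatten, the element-attribute column naming
(modeled on 'rss-attr1'), and the sample document are assumptions made for
illustration only. The sketch treats "repeating" as two or more same-named
children of one parent, ignores adjacency, reads only the text before an
element's first child, and lets a later column with the same name overwrite
an earlier one.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample input; the real feed structure is not known here.
SAMPLE = (
    '<rss version="2.0"><channel lang="en"><title>News</title>'
    '<item id="1"><headline>A</headline></item>'
    '<item id="2"><headline>B</headline></item>'
    '<link>home</link></channel></rss>'
)


def flatten(elem, inherited=None):
    """Return a list of rows (dicts mapping column name to value)."""
    cols = dict(inherited or {})
    # Rule: every attribute becomes a column (named element-attribute here).
    for name, value in elem.attrib.items():
        cols[f"{elem.tag}-{name}"] = value
    # Rule: every element that has data content becomes a column.
    text = (elem.text or "").strip()
    if text:
        cols[elem.tag] = text

    # Assumption: a child "repeats" if a sibling shares its name.
    counts = Counter(child.tag for child in elem)
    singles = [c for c in elem if counts[c.tag] == 1]
    repeats = [c for c in elem if counts[c.tag] > 1]

    # Non-repeating children are folded in first, so their columns are
    # repeated in every row, even if they follow the repeating siblings.
    rows = [cols]
    for child in singles:
        rows = [r2 for r in rows for r2 in flatten(child, r)]
    if not repeats:
        return rows
    # Each repeating child yields at least one row; nested repeats recurse,
    # so the lowest repeating level ends up defining the rows.
    return [r2 for r in rows for child in repeats for r2 in flatten(child, r)]


if __name__ == "__main__":
    for row in flatten(ET.fromstring(SAMPLE)):
        print(row)
```

On the sample input this gives one row per item, each carrying the rss and
channel attributes plus the non-repeating title and link. Two different
repeating names under one parent would produce rows with different columns,
which is one of the cases where the rules may not give a sensible table.
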
I'm going to try one last possible solution using keys and XPath, I
think, and if that does not work I may move on to Michael Kay's suggestion of
a meta-stylesheet.
Thanks to everyone for the ideas.
--- On Fri, 12/4/09, C. M. Sperberg-McQueen <cmsmcq@xxxxxxxxxxxxxxxxx> wrote:
> From: C. M. Sperberg-McQueen <cmsmcq@xxxxxxxxxxxxxxxxx>
> Subject: Re: Generic stylesheet to flatten XML hierarchy
> To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Cc: "C. M. Sperberg-McQueen" <cmsmcq@xxxxxxxxxxxxxxxxx>
> Date: Friday, December 4, 2009, 6:35 PM
> On 4 Dec 2009, at 12:37, Sara Mitchell wrote:
>
> > ...
> >
> > With input like this:
> > <rss ...some attributes>
> > ...
> > </rss>
> >
> > I would like XML output like this:
> >
> > <root>
> > <row>
> > <rss-attr1>value</rss-attr1>
> > ...
> > </row>
> > <row>...again rss attributes, channel attributes, non-repeating
> > children of channel followed by fields for second item </row>
> > ...more rows ...
> > </root>
>
> I'm having trouble seeing exactly what should be going on here,
> because I can't see anything in your sample input (elided here
> without loss of generality) that gives rise to the name
> 'rss-attr1'. It's hard to correlate input with output if
> all the values are spelled 'value' and some details in one
> half of the input / output pair correspond to ellipses in the
> other.
>
>
>
> >
> > This example is for a single level of repeating descendants, but my
> > solution has to be able to handle any level of repeating descendants.
> > Moreover, the stylesheet has no knowledge of the structure of the
> > input document.
>
> My very strong gut reaction here is to suspect that such an
> absolutely generic transformation is unlikely to produce helpful
> (or: meaningful) output in some unknown but possibly large
> percentage of cases.
>
> Perhaps the transformation you have in mind is intended to work
> generically on all XML documents that follow certain conventions in
> structuring the information they represent? Can you say what those
> conventions are?
>
> Perhaps you have a very clear understanding of the transform you
> want, but so far this discussion has not elicited a clear
> description from you. The following questions are intended to
> try to elicit some more clarity.
>
> In a generic XML document, there are elements with parents,
> left and right siblings, children, descendants, and attributes.
>
> In a generic table, there are rows and columns. Each row but
> the first or last has a predecessor and a successor, and ditto
> each column but the first or last.
>
> What is the relationship between the elements, attributes,
> containment and sibling relations in the input, and the
> rows and columns and their sequence relations in the output?
>
> Given your output table, should I expect to have all the
> information present in the XML? Can I recreate the XML from
> your table?
>
> Do all your rows have the same number of columns? (I suppose
> they must, or it's not much of a table, but perhaps I'd
> better check?)
>
> When does an XML document give rise to a single row in the output
> table? When does it give rise to exactly three rows? When
> does the resulting table have exactly one column?
>
> What information do the labels of columns convey?
>
> What tables would you want to produce for the documents
>
> (1) <e/>
> (2) <e><e n="23"/><e n="45">Pax</e></e>
> (3) <table>
>       <row a="1" b="2" c="34">998</row>
>       <row a="2" b="22" c="34">999</row>
>       <row a="3" b="2" c="3">1000</row>
>       <row a="4" b="24" c="">1001</row>
>       <row a="5" x="Viva Villa!" c="34">998</row>
>     </table>
> (4) <p>This isn't mixed content, because the schema says I'm a string.</p>
>
> ?
>
>
> >
> > I have a solution that works ok by traversing the input document in
> > doc order -- but it does not handle the siblings of repeating nodes
> > that are not themselves repeating.
> >
> > I have thought of doing this the opposite way, get a key of all
> > repeating nodes and process only those at the lowest depth to
> > generate rows. I haven't actually written the logic.
>
> I gather that the tables you want to generate have something
> to do with multiple occurrences of elements with the same name.
> Does adjacency matter, or would
>
> <a><b/><b/><b/><c/><c/><c/></a>
>
> be treated differently from
>
> <a><b/><c/><b/><c/><b/><c/></a>
>
> ? (Assume if you like, for purposes of discussion, that the b and c
> and a elements all have interesting attributes.)
>
> >
> > Any better ideas would be welcome.
>
> Your example reminds me of the contortions I've seen people
> go to trying to represent structured information in RFC 822
> attribute-value pairs. So the best idea I have at the moment
> is: Save yourself! Don't do it!
>
> But probably you know exactly what you're doing, there is a perfectly
> reasonable algorithm for what you want, and I just haven't understood.
>
> hth
>
> --****************************************************************
> * C. M. Sperberg-McQueen, Black Mesa Technologies LLC
> * http://www.blackmesatech.com
> * http://cmsmcq.com/mib
> * http://balisage.net
> ****************************************************************
>
>
>
>
>
> --~------------------------------------------------------------------
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
> To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
> or e-mail: <mailto:xsl-list-unsubscribe@xxxxxxxxxxxxxxxxxxxxxx>
> --~--