Re: [xsl] use xsl to generate statistics of collection of XM

Subject: Re: use xsl to generate statistics of collection of XML documents.
From: Wendell Piez <wapiez@xxxxxxxxxxxxxxxx>
Date: Mon, 19 Oct 2009 13:20:18 -0400

John,

At 11:54 AM 10/19/2009, you wrote:

I am wondering if someone has written some XSL to generate statistics
of a collection of XML documents. Thus it would provide per node
statistics (usage), and node relationships statistics (order/nesting).

Sure, I've done this and so, I imagine, have others on this list.
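
As a rough illustration of the kind of report meant here, the following is a minimal sketch in Python (not XSLT, and not the stylesheet referred to above). It counts element usage and parent/child nesting over a few in-memory documents. The function name `collect_stats` and the sample documents are hypothetical. In XSLT 2.0 the same counts could be produced with `xsl:for-each-group` over the input collection.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def collect_stats(docs):
    """Count element usage and parent/child nesting across XML strings."""
    usage = Counter()    # element name -> number of occurrences
    nesting = Counter()  # (parent name, child name) -> number of occurrences
    for text in docs:
        root = ET.fromstring(text)
        usage[root.tag] += 1  # the root has no parent, so count it here
        for parent in root.iter():
            for child in parent:
                usage[child.tag] += 1
                nesting[(parent.tag, child.tag)] += 1
    return usage, nesting


# Hypothetical sample collection, for illustration only.
docs = ["<a><b/><b><c/></b></a>", "<a><c/></a>"]
usage, nesting = collect_stats(docs)

for name, count in sorted(usage.items()):
    print(name, count)
for (parent, child), count in sorted(nesting.items()):
    print(parent + "/" + child, count)
```

Sibling order is not recorded in this sketch; it would need a third counter keyed on pairs of adjacent siblings.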

My goal would be to generate new sample XML documents from
the statistics. This would be similar to generating XML documents
from probabilistic production rules--but the generated documents
should pass either a DTD or Schema validator. I do realize that there
are semantics that need to be accounted for. That would be a future
goal. I've tried generating sample documents from a schema using
XMLSpy--does it have some way of recording probability into the schema?

This is a tougher nut to crack, but I don't see why it couldn't be done. A stylesheet could process a report generated in XML to generate arbitrary samples. (The archives of this list would provide help with the randomizing aspect.) You would perhaps want to make sure in doing so that you limited, for example, how deeply a result document would nest, assuming you had content models allowing for recursion.
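
The nesting limit can be sketched as follows, again in Python rather than XSLT and purely as an assumption about how such a generator might look. The `model` dictionary (each parent mapped to child names with an inclusion probability), the names `generate` and `depth_of`, and the probabilities are all hypothetical. Nothing here checks the result against a DTD or schema.

```python
import random
import xml.etree.ElementTree as ET


def generate(name, model, rng, max_depth, depth=1):
    """Build a random element tree, never nesting deeper than max_depth."""
    elem = ET.Element(name)
    if depth >= max_depth:
        return elem  # stop here, even if the content model allows recursion
    for child, prob in model.get(name, []):
        if rng.random() < prob:
            elem.append(generate(child, model, rng, max_depth, depth + 1))
    return elem


def depth_of(elem):
    """Return the nesting depth of a tree (a lone element has depth 1)."""
    return 1 + max((depth_of(c) for c in elem), default=0)


# Hypothetical recursive content model: a section may contain a section.
model = {"section": [("title", 1.0), ("para", 0.8), ("section", 0.9)]}
doc = generate("section", model, random.Random(42), max_depth=4)
print(ET.tostring(doc, encoding="unicode"))
print("depth:", depth_of(doc))
```

Passing a seeded `random.Random` keeps a run repeatable, which helps when a generated sample has to be reproduced later.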

I don't know of any commercial tools that do this, quite.

If you work on this, consider also the stylesheet that would make a "pathological" variant document -- one that had at least one example of every attested construct from a set of documents. Such a tool would be very useful.
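
One possible reading of such a document, sketched in Python under stated assumptions: given the set of attested parent/child pairs, emit each pair exactly once, so that a recursive content model cannot expand forever. The name `build_covering_doc` and the sample pairs are hypothetical, and the sketch covers element nesting only, not attributes or sibling order.

```python
import xml.etree.ElementTree as ET


def build_covering_doc(root_name, pairs):
    """Build one tree containing every attested (parent, child) pair once."""
    children = {}
    for parent, child in sorted(pairs):
        children.setdefault(parent, []).append(child)
    seen = set()  # pairs already emitted somewhere in the tree

    def build(name):
        elem = ET.Element(name)
        # Claim this element's unseen pairs before recursing, so that
        # descendants of the same name do not emit them a second time.
        fresh = [c for c in children.get(name, []) if (name, c) not in seen]
        seen.update((name, c) for c in fresh)
        for c in fresh:
            elem.append(build(c))
        return elem

    return build(root_name)


# Hypothetical attested pairs, including a recursive section/section.
pairs = {
    ("doc", "section"),
    ("section", "title"),
    ("section", "para"),
    ("section", "section"),
    ("para", "emph"),
}
tree = build_covering_doc("doc", pairs)
covered = {(p.tag, c.tag) for p in tree.iter() for c in p}
print(ET.tostring(tree, encoding="unicode"))
print(covered == pairs)
```

Pairs whose parent cannot be reached from the chosen root are never emitted, so the comparison at the end doubles as a check on the input set.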

Cheers,
Wendell

======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

Current Thread
use xsl to generate statistics of collection of XML documents.
  John Carlson - 19 Oct 2009 15:55:28 -0000
  Wendell Piez - 19 Oct 2009 17:21:22 -0000 <=
