John,
At 11:54 AM 10/19/2009, you wrote:

> I am wondering if someone has written some XSL to generate statistics
> of a collection of XML documents. Thus it would provide per node
> statistics (usage), and node relationships statistics (order/nesting).

Sure, I've done this and so, I imagine, have others on this list.

> My goal would be to generate new sample XML documents from the
> statistics. This would be similar to generating XML documents from
> probabilistic production rules--but the generated documents should
> pass either a DTD or Schema validator. I do realize that there are
> semantics that need to be accounted for. That would be a future goal.
> I've tried generating sample documents from a schema using
> XMLSpy--does it have some way of recording probability into the schema?

This is a tougher nut to crack, but I don't see why it couldn't be done. A stylesheet could process a report generated in XML to generate arbitrary samples. (The archives of this list would provide help with the randomizing aspect.) You would perhaps want to make sure in doing so that you limited, for example, how deeply a result document would nest, assuming you had content models allowing for recursion.

I don't know of any commercial tools that do this, quite.

If you work on this, consider also the stylesheet that would make a "pathological" variant document -- one that had at least one example of every attested construct from a set of documents. Such a tool would be very useful.

Cheers,
Wendell

======================================================================
Wendell Piez                            mailto:wapiez@xxxxxxxxxxxxxxxx
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
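
For illustration only, here is a minimal sketch of the two steps discussed above: gathering usage and nesting statistics from a set of documents, then sampling a new document from them with a cap on nesting depth. It is written in Python rather than XSLT purely for brevity, and it is not the stylesheet mentioned in the reply. The function names `collect_stats` and `generate`, the `max_depth` parameter, and the sample documents are all assumptions made for this example. Sampling whole attested child sequences keeps element order close to what was observed, but nothing here guarantees the result validates against a DTD or schema, especially where the depth cap cuts off required children.

```python
import random
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def collect_stats(docs):
    """Return (usage, child_seqs) for an iterable of XML strings."""
    usage = Counter()                  # element name -> number of occurrences
    child_seqs = defaultdict(Counter)  # parent name -> Counter of child-name tuples
    for text in docs:
        for elem in ET.fromstring(text).iter():
            usage[elem.tag] += 1
            # Record the ordered sequence of child names under this element.
            child_seqs[elem.tag][tuple(c.tag for c in elem)] += 1
    return usage, child_seqs


def generate(tag, child_seqs, rng, max_depth=5, depth=0):
    """Build a random element by sampling attested child sequences."""
    elem = ET.Element(tag)
    seqs = child_seqs.get(tag)
    if not seqs or depth >= max_depth:
        # Depth cap reached (or nothing attested): emit a leaf.
        # Assumption: truncating here is acceptable for a sample document.
        return elem
    # Pick one attested child sequence, weighted by how often it occurred.
    chosen = rng.choices(list(seqs), weights=list(seqs.values()))[0]
    for child_tag in chosen:
        elem.append(generate(child_tag, child_seqs, rng, max_depth, depth + 1))
    return elem


if __name__ == "__main__":
    # Hypothetical input collection; "a" may contain "b" and vice versa.
    docs = ["<a><b/><b><a/></b></a>", "<a><b/></a>"]
    usage, child_seqs = collect_stats(docs)
    sample = generate("a", child_seqs, random.Random(0), max_depth=3)
    print(ET.tostring(sample, encoding="unicode"))
```

The "pathological" variant could reuse the same `child_seqs` table by emitting every attested child sequence at least once instead of sampling at random.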



