On Sun, 2015-03-29 at 11:17 -0700, Dimitre Novatchev wrote:
> Has anyone thought about unifying all available XML documents into a
> single repository?

I fear even just for static documents you'd need more disk space than is
easily available :-) I know of people with petabytes of XML. If you
include XML documents generated by car engine computers and Web services
there's more XML in the world than HTML.

> This would provide many benefits to XML practitioners, and in
> general could be used as an "XML data-warehouse" and allow BI for
> querying and acquiring interesting and unknown facts about XML.

Someone, I think in Amsterdam, made a collection of a few Web documents,
I think just 10 gigabytes or something. I looked at it in some detail and
even did a Balisage paper about it, because there was talk going round
that it showed a high proportion of XML on the Web wasn't well-formed. It
turned out that if you handled the document encoding properly the
documents were almost all just fine.

At any rate, that collection was on the Web last time I checked.

Liam

> Examples of such queries:
>
> 1. What is the maximum depth of any known XML document?
>
> 2. What is the maximum number of different element/attribute names
>    of any known XML document?
>
> 3. What is the maximum length of element/attribute names in any
>    known XML document?
>
> 4. What are all namespaces used and what are they in sorted order
>    by frequency of being referenced?
>
> 5. What is the longest chain (length of chain) of XInclude
>    references?
>
> 6. What are all XPath expressions used in all available XSLT
>    modules (XSLT is a kind of XML) -- and a variety of questions
>    about the complexity and syntax structure of these expressions.
>
> 7. Similar to the above, but for XSD
>
> etc, ..., etc.
>
> Among other benefits, such a repository would provide for real-world
> XML test data, when writing tests for a new XML processing
> application.
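
For a single document, queries 1 to 4 in the quoted list can be answered with an ordinary parser. What follows is only a minimal sketch in Python using the standard library's xml.etree.ElementTree, not anything proposed in the thread. The function names document_stats and split_name are hypothetical, and "frequency of being referenced" is assumed here to mean the number of element and attribute names that sit in each namespace. The input is passed as bytes so that the parser itself reads the encoding declaration, which is the part that matters for the well-formedness point above.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def split_name(tag):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag


def document_stats(xml_bytes):
    """Collect per-document answers to queries 1-4 (hypothetical helper)."""
    # Bytes in, so the parser honours the document's own encoding declaration.
    root = ET.fromstring(xml_bytes)

    names = set()           # distinct (namespace, local name) pairs
    namespaces = Counter()  # namespace URI -> number of names using it
    max_depth = 0

    # Iterative walk, so very deep documents do not hit the recursion limit.
    stack = [(root, 1)]
    while stack:
        elem, depth = stack.pop()
        max_depth = max(max_depth, depth)
        # The element name plus all of its attribute names.
        for tag in [elem.tag, *elem.attrib]:
            uri, local = split_name(tag)
            names.add((uri, local))
            if uri:
                namespaces[uri] += 1
        stack.extend((child, depth + 1) for child in elem)

    return {
        "max_depth": max_depth,
        "distinct_names": len(names),
        "longest_name": max(len(local) for _, local in names),
        "namespaces": namespaces.most_common(),  # sorted by frequency
    }


if __name__ == "__main__":
    sample = b'<a xmlns:x="urn:example:x"><b><x:c id="1"/></b><b/></a>'
    print(document_stats(sample))
```

Across a repository, the per-document results would then be combined, for example by taking the maximum of max_depth and summing the namespace counters. Queries 5 to 7 need more than this: XInclude chains require resolving references between documents, and the XSLT and XSD questions require parsing the XPath expressions held in attribute values.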