
  • From: "Liam R. E. Quin" <liam@w...>
  • To: Dimitre Novatchev <dnovatchev@g...>
  • Date: Sun, 29 Mar 2015 15:13:19 -0400

On Sun, 2015-03-29 at 11:17 -0700, Dimitre Novatchev wrote:
>  Has anyone thought about unifying all available XML documents into a
> single repository?

I fear even just for static documents you'd need more disk space than 
is easily available :-) I know of people with petabytes of XML.

If you include XML documents generated by car engine computers and Web 
services, there's more XML in the world than HTML.

> This would provide many benefits to XML practitioners, and in 
> general could be used as an "XML data-warehouse" and allow BI for 
> querying and acquiring interesting and unknown facts about XML.

Someone, I think in Amsterdam, made a collection of a few Web documents, 
I think just 10 gigabytes or so. I looked at it in some detail and even 
did a Balisage paper about it, because there was talk going round that 
it showed a high proportion of XML on the Web wasn't well-formed. It 
turned out that if you handled the document encoding properly, the 
documents were almost all just fine. At any rate, that collection was 
on the Web last time I checked.
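
To illustrate the encoding point above, here is a minimal sketch in
Python. It is not the method used in the paper; the function names and
the sample document are made up for this example. It assumes that
"handling the encoding properly" means giving the parser the raw bytes
so it can read the encoding from the XML declaration, instead of
decoding everything as UTF-8 first.

```python
import xml.etree.ElementTree as ET


def is_well_formed(raw: bytes) -> bool:
    """Parse raw bytes so the parser can honour the declared encoding."""
    try:
        ET.fromstring(raw)
        return True
    except ET.ParseError:
        return False


def naive_is_well_formed(raw: bytes) -> bool:
    """Assume UTF-8 up front (hypothetical 'wrong' checker for comparison)."""
    try:
        ET.fromstring(raw.decode("utf-8"))
        return True
    except (ValueError, ET.ParseError):
        # UnicodeDecodeError is a ValueError: the bytes were never UTF-8.
        return False


# A well-formed document that declares, and really uses, ISO-8859-1.
doc = '<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\u00e9</p>'.encode(
    "iso-8859-1"
)

print(is_well_formed(doc), naive_is_well_formed(doc))
```

The naive checker reports the document as broken even though the fault
is in the checker, which is the kind of false alarm described above.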

Liam

> 
> Examples of such queries:
> 
>  1. What is the maximum depth of any known XML document?
> 
>  2. What is the maximum number of different element/attribute names
> of any known XML document?
> 
>  3. What is the maximum length of element/attribute names in any
> known XML document?
> 
>  4. What are all namespaces used and what are they in sorted order
> by frequency of being referenced?
> 
>  5. What is the longest chain (length of chain) of XInclude
> references?
> 
>  6. What are all XPath expressions used in all available XSLT
> modules (XSLT is a kind of XML) -- and a variety of questions about
> the complexity and syntax structure of these expressions.
> 
>  7. Similar to the above, but for XSD
> 
> etc, ..., etc.
> 
> Among other benefits, such a repository would provide for real-world 
> XML test data, when writing tests for a new XML processing 
> application.
> 
> 
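
For a single document, queries 1 to 4 above could be sketched roughly as
follows in Python. This is only an illustration under assumptions: the
names profile and split_qname and the sample document are hypothetical,
"frequency of being referenced" is taken to mean the number of element
and attribute names in each namespace, and a real repository would need
to aggregate such per-document results across the whole collection.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def split_qname(qname):
    """Split ElementTree's '{uri}local' notation into (uri, local)."""
    if qname.startswith("{"):
        uri, _, local = qname[1:].partition("}")
        return uri, local
    return "", qname


def profile(raw):
    """Stream through one document and collect simple structural statistics."""
    depth = max_depth = longest = 0
    names = set()
    namespaces = Counter()
    for event, elem in ET.iterparse(io.BytesIO(raw), events=("start", "end")):
        if event == "end":
            depth -= 1
            elem.clear()  # drop finished subtrees to keep memory use low
            continue
        depth += 1
        max_depth = max(max_depth, depth)
        # Element name plus its attribute names (xmlns declarations excluded).
        for qname in (elem.tag, *elem.attrib):
            uri, local = split_qname(qname)
            names.add(qname)
            longest = max(longest, len(local))
            if uri:
                namespaces[uri] += 1
    return {
        "max_depth": max_depth,
        "distinct_names": len(names),
        "longest_local_name": longest,
        "namespaces_by_frequency": namespaces.most_common(),
    }


sample = (
    b'<r xmlns:a="urn:a" xmlns:b="urn:b">'
    b'<a:x id="1"><a:y/></a:x><b:longer/></r>'
)
print(profile(sample))
```

Queries 5 to 7 (XInclude chains, XPath expressions in XSLT, XSD) need
cross-document resolution or an XPath parser and are not covered here.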

