

Bob Foster wrote:

> Recent criticisms of some Eclipse-based XML editors (including mine) (in 
> part) because they use a lot of memory relative to file size underline 
> the fairly obvious fact that XML files are often much larger than 
> programming language files. When the techniques used successfully for 
> programming languages are applied to XML, they can break down.
> 
> The first person I ever saw address this issue directly was Bryan Ford, 
> in his packrat parsing paper 
> (http://www.brynosaurus.com/pub/lang/packrat-icfp02.pdf). Packrat 
> parsing requires an O(n), where n is the document size, data structure 
> with a rather large constant factor. Ford observes "For example, for 
> parsing XML streams, which have a fairly simple structure but often 
> encode large amounts of relatively flat, machine-generated data, the 
> power and flexibility of packrat parsing is not needed and its storage 
> cost would not be justified."
> 

This has reminded me of the Judy array, which can function as a list or 
a dictionary, and is supposed to have very good performance and low 
memory load whether sparsely or densely populated.  This is achieved by 
having a large number of specialized substructure types, put into play 
depending on the local data characteristics.  Quoting from an 
introduction to Judy arrays -

"Judy adapts efficiently to a wide range of populations and data set 
densities. Since the Judy data structure is a tree of trees, each 
sub-tree is a static expanse that is optimized to match the "character" 
or density of the keys it contains. To support this flexibility, in 
32[64]-bit Judy there are approximately 25[85] major data structures and 
a similar number of minor structures."
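To make the adaptive idea concrete, here is a minimal sketch (nothing like Judy's actual 25+ structures, and all names are my own invention): a single map node that keeps a sparse population in a compact sorted pair list, then switches itself to a dense slot array once the population crosses a threshold, so the representation matches the local density of the keys.

```python
import bisect

class AdaptiveNode:
    """Toy adaptive map over small integer keys (one 0..255 'expanse').

    Sparse populations: sorted (key, value) pairs, compact in memory.
    Dense populations: a fixed 256-slot array, O(1) lookup.
    The THRESHOLD for switching is an arbitrary illustrative choice.
    """
    THRESHOLD = 8

    def __init__(self):
        self._sparse = []   # sorted list of (key, value) pairs
        self._dense = None  # 256-slot array once population is dense

    def insert(self, key, value):
        if self._dense is not None:
            self._dense[key] = value
            return
        keys = [k for k, _ in self._sparse]
        i = bisect.bisect_left(keys, key)
        if i < len(self._sparse) and self._sparse[i][0] == key:
            self._sparse[i] = (key, value)   # overwrite existing key
        else:
            self._sparse.insert(i, (key, value))
        if len(self._sparse) > self.THRESHOLD:
            self._to_dense()                 # adapt to the new density

    def _to_dense(self):
        # Switch representations: trade memory for constant-time access.
        self._dense = [None] * 256
        for k, v in self._sparse:
            self._dense[k] = v
        self._sparse = None

    def lookup(self, key):
        if self._dense is not None:
            return self._dense[key]
        for k, v in self._sparse:
            if k == key:
                return v
        return None
```

A real Judy tree does this per sub-tree, with many more representations to choose from; the point is only that the container's shape follows the data, rather than one structure being imposed everywhere.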

Perhaps a really top XML editor needs a similarly large number of 
specialized tree structures (glad it's not me writing them!).

Cheers,

Tom P
