
Elliotte Rusty Harold wrote:

> I expect any plausible binary compression scheme to be lossless with 
> respect to the infoset, not the PSVI mind you but the I. I don't 
> expect to lose any significant data just because:
>
> 1. The data is invalid
> 2. I happen to use a different schema for decoding than you used for 
> encoding
>
> If the binary compression fails these tests, I cry shenanigans on you. 
> :-) 

For an example of encoding XML documents without loss of data, you can 
see my old XMLS project at 
http://www.sosnoski.com/opensrc/xmls/index.html. This is designed for 
serialization/deserialization speed rather than maximum compression. 
Even so, it reduced sizes by about 40% overall for the set of documents 
I used in testing. It also ran several times faster than text XML for 
converting to and from dom4j and JDOM document models. I didn't actually 
compare parsing speed directly (this was originally intended as an 
alternative to Java serialization for moving document models over the 
wire, not as a general-purpose XML transport), but I'd suspect it's at 
least twice as fast as any parser. In answer to your earlier email about 
actual results, the page at 
http://www.sosnoski.com/opensrc/xmls/results.html gives full benchmark 
information.
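To make the general idea concrete (this is only an illustrative sketch, not the actual XMLS wire format; the opcodes, sentinel value, and event representation here are all my own invention for the example): the main size and speed win in a binary encoding comes from writing each element name once and referring to it by an integer token afterwards, while keeping the round trip lossless.

```java
import java.io.*;
import java.util.*;

// Illustrative sketch of a dictionary-based binary XML event encoding.
// NOT the XMLS format: opcodes and layout are hypothetical.
public class BinaryXmlSketch {

    static final byte START = 1, TEXT = 2, END = 3; // hypothetical opcodes

    // Encode a stream of (type, value) events into a compact binary form.
    static byte[] encode(List<String[]> events) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            Map<String, Integer> names = new HashMap<>();
            for (String[] ev : events) {
                switch (ev[0]) {
                    case "start":
                        out.writeByte(START);
                        Integer tok = names.get(ev[1]);
                        if (tok == null) {
                            names.put(ev[1], names.size());
                            out.writeInt(-1);      // sentinel: name definition follows
                            out.writeUTF(ev[1]);
                        } else {
                            out.writeInt(tok);     // repeat: token reference only
                        }
                        break;
                    case "text":
                        out.writeByte(TEXT);
                        out.writeUTF(ev[1]);
                        break;
                    default:
                        out.writeByte(END);
                }
            }
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decode back to the identical event stream: the round trip is lossless.
    static List<String[]> decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            List<String> names = new ArrayList<>();
            List<String[]> events = new ArrayList<>();
            while (in.available() > 0) {
                byte op = in.readByte();
                if (op == START) {
                    int tok = in.readInt();
                    String name;
                    if (tok == -1) {
                        name = in.readUTF(); // first occurrence defines the token
                        names.add(name);
                    } else {
                        name = names.get(tok);
                    }
                    events.add(new String[] { "start", name });
                } else if (op == TEXT) {
                    events.add(new String[] { "text", in.readUTF() });
                } else {
                    events.add(new String[] { "end", null });
                }
            }
            return events;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String[]> doc = Arrays.asList(
                new String[] { "start", "order" },
                new String[] { "start", "item" },
                new String[] { "text", "widget" },
                new String[] { "end", null },
                new String[] { "start", "item" },
                new String[] { "text", "gadget" },
                new String[] { "end", null },
                new String[] { "end", null });
        byte[] bin = encode(doc);
        List<String[]> back = decode(bin);
        for (int i = 0; i < doc.size(); i++) {
            if (!Objects.equals(doc.get(i)[0], back.get(i)[0])
                    || !Objects.equals(doc.get(i)[1], back.get(i)[1])) {
                throw new AssertionError("round trip mismatch at event " + i);
            }
        }
        System.out.println("round trip ok: " + bin.length + " bytes");
    }
}
```

Note how the second `<item>` costs only an opcode plus a 4-byte token instead of the full name, and no escaping or character-set decoding is needed on the way back in. That combination is roughly why this kind of encoding can be both smaller and much faster than text while staying lossless.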

I've thought about extending this to full Infoset compatibility, and 
while I'm at it there are still a few optimizations I can make for 
faster handling of character data content. I don't know when/if I'll 
ever get back to it as things stand right now, but if anyone is 
interested, let me know.

  - Dennis
