
Elliotte Rusty Harold wrote:

> I expect any plausible binary compression scheme to be lossless with 
> respect to the infoset, not the PSVI mind you but the I. I don't 
> expect to lose any significant data just because:
>
> 1. The data is invalid
> 2. I happen to use a different schema for decoding than you used for 
> encoding
>
> If the binary compression fails these tests, I cry shenanigans on you. 
> :-) 

For an example of encoding XML documents without loss of data, you can 
see my old XMLS project at 
http://www.sosnoski.com/opensrc/xmls/index.html. This is designed for 
serialization/deserialization speed rather than maximum compression. 
Even so, it reduced sizes by about 40% overall for the set of documents 
I used in testing. It also ran several times faster than text XML for 
converting to and from dom4j and JDOM document models. I didn't actually 
compare parsing speed directly (this was originally intended as an 
alternative to Java serialization for moving document models over the 
wire, not as a general-purpose XML transport), but I'd suspect it's at 
least twice as fast as any parser. In answer to your earlier email about 
actual results, the page at 
http://www.sosnoski.com/opensrc/xmls/results.html gives full benchmark 
information.
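To make the general idea concrete (this is only an illustrative sketch, not the actual XMLS wire format; the opcodes, sentinel value, and event representation here are all my own invention for the example): the main size and speed win in a binary encoding comes from writing each element name once and referring to it by an integer token afterwards, while keeping the round trip lossless.

```java
import java.io.*;
import java.util.*;

// Illustrative sketch of a dictionary-based binary XML event encoding.
// NOT the XMLS format: opcodes and layout are hypothetical.
public class BinaryXmlSketch {

    static final byte START = 1, TEXT = 2, END = 3; // hypothetical opcodes

    // Encode a stream of (type, value) events into a compact binary form.
    static byte[] encode(List<String[]> events) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            Map<String, Integer> names = new HashMap<>();
            for (String[] ev : events) {
                switch (ev[0]) {
                    case "start":
                        out.writeByte(START);
                        Integer tok = names.get(ev[1]);
                        if (tok == null) {
                            names.put(ev[1], names.size());
                            out.writeInt(-1);      // sentinel: name definition follows
                            out.writeUTF(ev[1]);
                        } else {
                            out.writeInt(tok);     // repeat: token reference only
                        }
                        break;
                    case "text":
                        out.writeByte(TEXT);
                        out.writeUTF(ev[1]);
                        break;
                    default:
                        out.writeByte(END);
                }
            }
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decode back to the identical event stream: the round trip is lossless.
    static List<String[]> decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            List<String> names = new ArrayList<>();
            List<String[]> events = new ArrayList<>();
            while (in.available() > 0) {
                byte op = in.readByte();
                if (op == START) {
                    int tok = in.readInt();
                    String name;
                    if (tok == -1) {
                        name = in.readUTF(); // first occurrence defines the token
                        names.add(name);
                    } else {
                        name = names.get(tok);
                    }
                    events.add(new String[] { "start", name });
                } else if (op == TEXT) {
                    events.add(new String[] { "text", in.readUTF() });
                } else {
                    events.add(new String[] { "end", null });
                }
            }
            return events;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String[]> doc = Arrays.asList(
                new String[] { "start", "order" },
                new String[] { "start", "item" },
                new String[] { "text", "widget" },
                new String[] { "end", null },
                new String[] { "start", "item" },
                new String[] { "text", "gadget" },
                new String[] { "end", null },
                new String[] { "end", null });
        byte[] bin = encode(doc);
        List<String[]> back = decode(bin);
        for (int i = 0; i < doc.size(); i++) {
            if (!Objects.equals(doc.get(i)[0], back.get(i)[0])
                    || !Objects.equals(doc.get(i)[1], back.get(i)[1])) {
                throw new AssertionError("round trip mismatch at event " + i);
            }
        }
        System.out.println("round trip ok: " + bin.length + " bytes");
    }
}
```

Note how the second `<item>` costs only an opcode plus a 4-byte token instead of the full name, and no escaping or character-set decoding is needed on the way back in. That combination is roughly why this kind of encoding can be both smaller and much faster than text while staying lossless.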

I've thought about extending this to full Infoset compatibility, and 
while I'm at it there are still a few optimizations I can make for 
faster handling of character data content. I don't know when/if I'll 
ever get back to it as things stand right now, but if anyone is 
interested, let me know.

  - Dennis
