On Sat, 01 Jan 2005 13:29:18 -0700, Uche Ogbuji <Uche.Ogbuji@f...> wrote:

[about SAX/Java character events providing an offset into an array rather than a text/string object]

> I know the original SAX idea was optimization, but I do think this is
> exactly one of those areas where perhaps (IMO) premature optimization
> ends up limiting design evolution, and I also think that it interferes
> with the "Simple" part.

That was a tough choice at the time. I think it was James Clark who suggested it -- he is justly famous for fast code, but as anyone who ever tried to work with SP (his C++ SGML parsing library) can attest, he's not famous for readable code.

Here are the pros and cons with the benefit of six or so years of hindsight:

Pro: Buffer copying is a killer for high-performance apps. SAX does not allow a parser to avoid *all* buffer copying -- it's still necessary to copy attribute values, for example, unless the parser happens to know that they're tokenized -- but otherwise, a SAX parser can provide direct offsets into its own buffer for character events and use interned strings for Namespace URIs and element and attribute names, avoiding most thrashing around in the heap. It's worth noting that even today, when Java heap operations are much faster than they used to be, SAX-based parsers are still remarkably fast. In any case, without this speed advantage back in the late 1990s, when people were still scared of Java (much less XML) because it was so slow, SAX might not have gained widespread acceptance in the commercial world. Who wants an API that makes your parser run even slower?

Con: In most XML applications that actually do anything significant with the parse events, parsing overhead is a tiny fraction of total processing time, say 1% of the total. In other words, making the XML parser twice as fast might reduce total processing time by only 1/200.
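Spelled out, with T as total processing time and parsing taking the illustrative 1% share above, halving the parse time leaves the other 99% untouched:

```latex
T' = \frac{0.01\,T}{2} + 0.99\,T = 0.995\,T
\quad\Rightarrow\quad
T - T' = 0.005\,T = \frac{T}{200}
```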
In any case, there's usually at least one round of buffer copying anyway, when the byte buffer (say, from an HTTP packet) gets converted to Unicode.

I'm not sure what I would do, even if I were starting fresh. A good API should stay out of people's way, and SAX was always meant to be low-level. I had assumed that most developers would use fancy toolkits on top, like the original SAXON, which provided friendlier events, element stacks, etc.; instead, almost everyone went straight to the basic API. XML developers always seem to like to stay close to the metal.

All the best,


David

-- 
http://www.megginson.com/
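To make the offset-based `characters(char[] ch, int start, int length)` callback concrete, here is a minimal sketch of the kind of thin layer on top of SAX described above: a handler that keeps an element stack and reassembles text the parser may deliver in several `characters()` chunks. It assumes a JDK with the built-in JAXP parser. The class `TextCollector`, its `collect` helper, and the `path=text` output format are hypothetical, for illustration only; this is not SAXON's API.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical "friendlier" layer: element stack plus whole-text events.
public class TextCollector extends DefaultHandler {
    // Open element names, innermost on top.
    private final Deque<String> stack = new ArrayDeque<>();
    // Text accumulated across one or more characters() calls.
    private final StringBuilder text = new StringBuilder();
    // One "outer/inner=text" entry per non-blank text run.
    final List<String> results = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        stack.push(qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // ch belongs to the parser and may be reused after this call returns;
        // only ch[start .. start + length) is meaningful, so copy that slice now.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        stack.pop();
    }

    // Emit any pending text, labelled with the current element path.
    private void flushText() {
        String s = text.toString().trim();
        if (!s.isEmpty()) {
            List<String> path = new ArrayList<>(stack); // innermost first
            Collections.reverse(path);                   // outermost first
            results.add(String.join("/", path) + "=" + s);
        }
        text.setLength(0);
    }

    static List<String> collect(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<a><b>hello</b><c>world &amp; more</c></a>"));
    }
}
```

Running `main` prints `[a/b=hello, a/c=world & more]`. The `StringBuilder.append(char[], int, int)` call is the one copy that the array-and-offset design lets the application decide whether to make; a parser is free to split `world & more` into several `characters()` calls, which is why the text is accumulated until the next start or end tag.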