On Mar 23, 2006, at 7:41 PM, Rick Jelliffe wrote:

> Michael Kay wrote:
>
>>> My expectation is that XML parsing can be significantly sped up
>>> with ...
>>>
>>
>> I think that UTF-8 decoding is often the bottleneck and the obvious
>> way to speed that up is to write the whole thing in assembler. I
>> suspect the only way of getting a significant improvement (i.e. more
>> than a doubling) in parser speed is to get closer to the hardware.
>> I'm surprised no-one has done it. Perhaps no-one knows how to write
>> assembler any more (or perhaps, like me, they just don't enjoy it).
>>
> Yes. The technique using C++ intrinsics (which is assembler in
> disguise) I gave in my blog (URL in previous post) gives a *four to
> five* times speed increase compared to fairly tight C++ code, for the
> libxml UTF-8 to UTF-16 transcoder, for ASCII-valued data.

In bnux binary XML, UTF-8 transcoding to Java strings typically accounts for about 20-50% of parsing time at an overall throughput of 50-400 MB/s [1]. This is even though the conversion routines are highly optimized, taking full advantage of pure or partially ASCII-valued data, similar in spirit to the technique your blog mentions (except that it's in Java).

I do have some hope that future VMs with better dynamic optimization logic for memory prefetching, bulk operations, etc. could make more of a difference here, though. Care to explain why a dynamic optimizer couldn't get close to what those handcoded assembler routines do, in particular considering modern memory latencies?

On the standard textual XML front: as has been noted, Xerces and Woodstox can be made to run quite fast, but in practice few people know how to configure them accordingly, reliably, and without conformance compromises. Overall, configuring textual XML toolkits to reach high levels of performance often requires substantial time and expertise.
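To illustrate the kind of ASCII fast path discussed above, here is a minimal Java sketch (not the actual bnux or libxml code): bytes below 0x80 map one-to-one onto UTF-16 code units, so a tight scalar loop handles the pure/partial-ASCII prefix and only the non-ASCII remainder is handed to a general-purpose decoder.

```java
import java.nio.charset.StandardCharsets;

public final class Utf8Decoder {
    // Decode UTF-8 bytes to chars, taking a fast path while input is pure ASCII.
    // Illustrative sketch only; a production decoder would also validate input.
    public static char[] decode(byte[] in) {
        char[] out = new char[in.length]; // UTF-16 length is never > byte length
        int i = 0;
        // Fast path: ASCII bytes (0x00-0x7F) are non-negative as signed Java
        // bytes and map 1:1 to UTF-16 code units.
        while (i < in.length && in[i] >= 0) {
            out[i] = (char) in[i];
            i++;
        }
        if (i == in.length) return out; // pure ASCII: done

        // Slow path: hand the non-ASCII tail to the JDK's general decoder.
        String tail = new String(in, i, in.length - i, StandardCharsets.UTF_8);
        char[] result = new char[i + tail.length()];
        System.arraycopy(out, 0, result, 0, i);
        tail.getChars(0, tail.length(), result, i);
        return result;
    }

    public static void main(String[] args) {
        byte[] bytes = "hello, wörld".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(decode(bytes))); // prints "hello, wörld"
    }
}
```

The handcoded SIMD/intrinsics variants go further by checking 8 or 16 bytes per iteration instead of one, but the structure — a cheap ASCII check guarding a trivial widening copy — is the same.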
For example, to minimize startup and initialization time for each test, the same parser/serializer object instance should be reused (pooled), at least in the presence of many small messages. In our experience, most text XML models perform poorly out-of-the-box. Thus, we expect most real-world applications to perform significantly worse than shown in our experiments, perhaps dramatically so. Observed real-world performance is not only a function of capability, but also of accessibility. Most users would be better off if XML parsers performed well out-of-the-box, or were self-tuning. Most users can't afford to study the complex reliability vs. performance interactions of myriad more or less static tuning knobs.

[1] http://www.gridforum.org/GGF15/presentations/wsPerform_hoschek.pdf

Wolfgang.
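The parser-reuse point above can be sketched as follows; this is an illustrative example using the standard JAXP API (SAXParser.reset() has been available since JAXP 1.3), not the benchmark harness from the talk. Factory lookup and parser construction are the expensive steps, so they happen once and each small message reuses the same instance.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public final class PooledParsing {
    // Created once and reused; a SAXParser instance is not thread-safe,
    // so a multi-threaded server would pool one instance per thread.
    private static final SAXParser PARSER = newParser();

    private static SAXParser newParser() {
        try {
            return SAXParserFactory.newInstance().newSAXParser();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Parse one small message with the pooled parser, counting elements.
    public static int countElements(byte[] message) throws Exception {
        final int[] count = {0};
        PARSER.reset(); // restore initial state before each reuse
        PARSER.parse(new ByteArrayInputStream(message), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        byte[] msg = "<a><b/><b/></a>".getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < 3; i++) {
            System.out.println(countElements(msg)); // prints 3 each time
        }
    }
}
```

The naive alternative — calling SAXParserFactory.newInstance().newSAXParser() per message — pays classpath service lookup and object construction costs on every message, which dominates for small documents.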