On 9.10.2014 16:16, Eliot Kimber ekimber@xxxxxxxxxxxx wrote:
> Can streaming help, either with overall processing efficiency or
> with memory usage?
Yes. The usual motivation for streaming is to reduce memory
consumption; in your case it is very unlikely that you will gain any
performance benefit.
> Where would I go today or in the near future to gain the
> understanding of streaming required to answer these questions
> (other than the XSLT 3 spec itself, obviously)?
Several talks and papers on streaming have been presented in recent
years at both the XML Prague and Balisage conferences. For example:
https://www.youtube.com/watch?v=OeSQ4ompB1g&index=6&list=PLQpqh98e9RgXPGvJaNsE3b1Sqncz6MGvr
https://www.youtube.com/watch?v=kzGZvh-FbNw&list=PLQpqh98e9RgXPGvJaNsE3b1Sqncz6MGvr&index=7
If there is enough interest, I can try to organize a streaming workshop
or something similar as part of XML Prague 2015 (http://xmlprague.cz).
> Because my data collection process is copying data to a new result,
> I'm pretty sure it's inherently streamable: I'm just processing
> documents in an order determined by a normal depth-first tree walk
> of the map structure (a hierarchy of hyperlinks to topics) and
> grabbing relevant data (e.g., division titles, figure titles, index
> entries, etc.). If this was all I was doing, then for sure
> streaming would help memory usage.
>
> But because I must then process each topic again to generate the
> final result, and that process is not directly streamable, would
> streaming the first phase help overall?
You can split your transformation into two steps: the first will be
streamable and the second will not. Compared to the current situation
you will save around 50% of the memory.
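A minimal sketch of what the streamable first step might look like,
assuming XSLT 3.0 in its final form (where the instruction is called
xsl:source-document; earlier drafts named it xsl:stream) and a
processor that supports streaming. The element names (topicref, title)
and the output vocabulary (collected, topic, entry) are hypothetical
placeholders, not taken from your stylesheets:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Default (non-streaming) mode: the map is small, so it is read
       as an ordinary in-memory tree. -->
  <xsl:template match="/">
    <collected>
      <!-- The descendant axis visits topicrefs in document order,
           i.e. the depth-first walk of the map. -->
      <xsl:for-each select="//topicref[@href]">
        <topic href="{@href}">
          <!-- Each topic is read in streaming mode, so the whole topic
               tree is never built. Assumes @href has no fragment
               identifier. -->
          <xsl:source-document streamable="yes"
              href="{resolve-uri(@href, base-uri(.))}">
            <xsl:apply-templates mode="collect"/>
          </xsl:source-document>
        </topic>
      </xsl:for-each>
    </collected>
  </xsl:template>

  <!-- Streamable mode: nodes without a matching template are skipped,
       but their children are still visited. -->
  <xsl:mode name="collect" streamable="yes" on-no-match="shallow-skip"/>

  <!-- Hypothetical example: collect every title as one entry. -->
  <xsl:template match="title" mode="collect">
    <entry kind="title"><xsl:value-of select="."/></entry>
  </xsl:template>

</xsl:stylesheet>
```

The result of this step is the small document of collected data; the
second, non-streaming step then reads it alongside the topics.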
> Taken a step further: are there implementation techniques I could
> apply in order to make the second phase streamable (e.g.,
> collecting the information needed to render cross references
> without having to fetch the target elements) and could I expect
> that to then provide enough performance improvement to justify the
> implementation cost?
Yes, you can do this. You can process the "compiled grand-source
document" in streaming mode and do the lookups in a smaller document
holding the cross-referencing data, which is processed in
non-streaming mode.
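As a rough sketch of that arrangement, assuming XSLT 3.0 and a
streaming-capable processor: the large document is the streamed input
and the cross-reference data is loaded as an ordinary tree. The names
used here (xref, @target, target, @id, @label, xref-data.xml) are
hypothetical and only illustrate the pattern:

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The large compiled document is streamed; anything without a
       matching template is copied through. -->
  <xsl:mode streamable="yes" on-no-match="shallow-copy"/>

  <!-- Small auxiliary document (hypothetical file name), loaded as an
       ordinary in-memory tree. -->
  <xsl:variable name="xref-data" select="doc('xref-data.xml')"/>

  <!-- Index over the auxiliary document only. -->
  <xsl:key name="target-by-id" match="target" use="@id"/>

  <xsl:template match="xref">
    <!-- Only attributes of the streamed element are read, which needs
         no look-ahead; the key lookup runs against the in-memory
         tree instead of fetching the target element. -->
    <a href="#{@target}">
      <xsl:value-of
          select="key('target-by-id', @target, $xref-data)/@label"/>
    </a>
  </xsl:template>

</xsl:stylesheet>
```

Everything needed to render the reference has to be present in the
auxiliary document, because the streamed input cannot be navigated
forwards or backwards to the target.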
> The current code is both mature and relatively naive in its
> implementation. Reworking it to be streamable could entail a
> significant refactoring (maybe, that's part of what I'm trying to
> determine).
>
> The actual data processing cost is more or less fixed, so unless
> streaming makes the XSLT operations faster, I wouldn't expect
> streaming by itself to reduce processing time.
It's very unlikely that a streaming rewrite will make your code faster.
Of course, lookups in a small auxiliary cross-reference file will be
faster than in a large document, but if you already use keys today, the
difference shouldn't be very big.
> However, the primary concern in this use case is memory usage:
> currently, memory required is proportional to the number of topics
> in the publication, whereas it could be limited to simply the
> largest topic plus the size of the collected data itself (which is
> obviously much smaller than the size of the topics as it includes
> the minimum data needed to enable numbering and such).
I don't know how large your documentation set is, but I would be
surprised if it couldn't fit into memory (who would read it then? :-).
Streaming is generally useful when it's impossible to load the documents
into memory, which on current machines means processing XML files that
are gigabytes in size.
Jirka
--
------------------------------------------------------------------
Jirka Kosek e-mail: jirka@xxxxxxxx http://xmlguru.cz
------------------------------------------------------------------
Professional XML consulting and training services
DocBook customization, custom XSLT/XSL-FO document processing
------------------------------------------------------------------
OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 rep.
------------------------------------------------------------------
Bringing you XML Prague conference http://xmlprague.cz
------------------------------------------------------------------