- From: Michael Sokolov <sokolov@i...>
- To: Len Bullard <cbullard@h...>
- Date: Thu, 05 Jan 2012 21:21:47 -0500
Good question!
It depends on the content.
Most human-readable texts are already broken down into conceptual
units: articles, sections, chapters, entries, etc. We try to pick
one that will at least fill the screen with text, and then impose
maximum size constraints based on the delivery channel's capacity.
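Just to make that concrete, here's a rough sketch (Python; the
section structure and MAX_CHUNK_CHARS are invented for illustration,
not anything we actually ship):

    MAX_CHUNK_CHARS = 20000  # assumed delivery-channel limit

    def chunk(sections):
        """sections: list of (id, text) pairs in document order."""
        chunks = []
        for sec_id, text in sections:
            if len(text) <= MAX_CHUNK_CHARS:
                # natural unit fits the channel: keep it whole
                chunks.append((sec_id, text))
            else:
                # oversized section: fall back to fixed-size splits
                for i in range(0, len(text), MAX_CHUNK_CHARS):
                    chunks.append((f"{sec_id}.{i // MAX_CHUNK_CHARS}",
                                   text[i:i + MAX_CHUNK_CHARS]))
        return chunks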
It's not just viewing though that informs the choice; search often
figures into it as well. Ideally search results are 1-1 with
viewable chunks; this leads to a natural, easily-grasped interface,
and makes search implementation straightforward.
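Something like this toy index shows what I mean by 1-1 (all names
made up; a real system would use a proper search engine):

    from collections import defaultdict

    index = defaultdict(set)          # term -> set of chunk ids

    def index_chunk(chunk_id, text):
        for term in text.lower().split():
            index[term].add(chunk_id)

    def search(term):
        # every id returned here can be handed straight to the viewer;
        # no extra step to resolve a hit into something displayable
        return index.get(term.lower(), set())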
Sometimes texts (like novels) don't have natural breaks; in these
cases search is less important, reading more so, and we just
paginate according to the user's viewport size.
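That case is about as simple as it sounds (viewport_chars is
whatever the client reports; purely illustrative):

    def paginate(text, viewport_chars):
        return [text[i:i + viewport_chars]
                for i in range(0, len(text), viewport_chars)]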
Other texts impose their own specific chunking requirements
(enormous court documents; dictionaries where you can search
entries, senses (within an entry) or quotations (within a sense))
that fight against the simple rules. In these cases we try to recast
the problem in more familiar terms, sometimes chunking at multiple
levels at once for search, but displaying using anchors or
pagination within a larger chunk.
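For the dictionary case, a sketch of what "chunking at multiple
levels at once" looks like (the element names sense/quotation and the
id attributes are assumptions, just for illustration):

    import xml.etree.ElementTree as ET

    def search_docs(entry_xml):
        entry = ET.fromstring(entry_xml)
        entry_id = entry.get("id")
        # index the entry, its senses and its quotations as separate
        # search documents, but every hit points back to the enclosing
        # entry plus an anchor, so display stays within the larger chunk
        docs = [{"level": "entry", "display": entry_id, "anchor": None,
                 "text": "".join(entry.itertext())}]
        for sense in entry.findall(".//sense"):
            docs.append({"level": "sense", "display": entry_id,
                         "anchor": sense.get("id"),
                         "text": "".join(sense.itertext())})
            for quote in sense.findall(".//quotation"):
                docs.append({"level": "quotation", "display": entry_id,
                             "anchor": quote.get("id"),
                             "text": "".join(quote.itertext())})
        return docs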
Machine to machine I think is informed by a different set of
considerations: transaction boundaries, channel capacity again,
ability to rollback and retry, etc. Basically a compromise between
performance (large messages will tend to be more performant, up to
memory limits), and robustness (small messages make a smaller crater
when they fail).
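The trade-off is easy to show in miniature (send() here is a
stand-in for whatever the channel actually is):

    def send_batched(records, batch_size, send, max_retries=3):
        # bigger batches amortize per-message overhead; smaller batches
        # make a smaller crater when one fails and has to be retried
        for i in range(0, len(records), batch_size):
            batch = records[i:i + batch_size]
            for attempt in range(max_retries):
                try:
                    send(batch)          # one transaction per batch
                    break
                except Exception:
                    if attempt == max_retries - 1:
                        raise            # give up; only this batch is lost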
As far as human-machine, it does also depend to a certain extent on
the software. Word can handle much larger documents than in-browser
editors, and features like autosave can mitigate the cost of a failed
save on a large document, but generally speaking I'd say chunk size
here is
similar to the human-human piece. I do sometimes end up poking
around in 50MB xml documents in emacs, sometimes even changing
something, and it works fine, but I don't think that's a typical use
case? I find that 100MB is pretty much the limit for that sort of
thing.
-Mike
On 1/5/2012 7:14 PM, Len Bullard wrote:
When building XML systems, how do you choose the best granularity for
storing and retrieving fragments?

Machine to machine
Human to machine
Human to human

Part of the art is interpreting what branch and leaf combinations
best give a role/user the most copacetic view. How do you choose?
Does the user choose?

The proportion of XML consumed and emitted by machines or humans is
not interesting, IME. The cost and type of the value-add of the
humans consuming and emitting XML is. In documents, this is obvious.
Granularity.

len