-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Piotr Bański writes:
> My question is: how 'exotic' (or plainly thoughtless) are we in relying
> on the development of, and support for, XPointer technology for the
> description of linguistic resources?
Relying on the stability of the semantics of XPointer makes perfectly
good sense -- if it enables you to describe your data in useful ways,
via a dependence on XPointer semantics, go for it.
As for expecting implementations of . . . well, what, exactly? A
version of XInclude which supports full XPointer [1]? Not likely, as
I'm not aware of any impetus towards extending the XInclude
specification in that way.
With my NLP hat on I'm a big fan of standoff markup, and we use it
extensively in our work at Edinburgh [2]. But our experience has been
that it's virtually always worth building up from a version of the
data which is tokenised at the lowest level we're ever likely to care
about, rather than trying to point into runs of text. . .
If that's ruled out because of the sheer scale of your data, or
copyright restrictions, or whatever, then you can achieve the same
effect by targeting your standoff at a virtual document, which is the
_output_ of a tokeniser on your raw data.
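
To make the virtual-document idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the regex tokeniser, the `Token` and `Span` classes, the `t1`, `t2`, ... id scheme and the `resolve` helper are hypothetical, not anything used at Edinburgh. The point is only that the standoff refers to token ids, and the token list is regenerated from the raw data on demand, so it never has to be stored or distributed.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    id: str     # stable identifier that standoff annotations point at
    start: int  # character offset into the raw text (inclusive)
    end: int    # character offset into the raw text (exclusive)
    text: str


def tokenise(raw: str) -> list[Token]:
    """Toy deterministic tokeniser: runs of word characters, or single
    punctuation marks. Its output is the 'virtual document'."""
    return [
        Token(f"t{i}", m.start(), m.end(), m.group())
        for i, m in enumerate(re.finditer(r"\w+|[^\w\s]", raw), start=1)
    ]


@dataclass(frozen=True)
class Span:
    label: str
    first: str  # id of the first token covered
    last: str   # id of the last token covered (inclusive)


def resolve(span: Span, tokens: list[Token], raw: str) -> str:
    """Map a standoff span back to the stretch of raw text it covers."""
    by_id = {t.id: t for t in tokens}
    return raw[by_id[span.first].start:by_id[span.last].end]


raw = "Standoff markup points at tokens, not text."
tokens = tokenise(raw)                # rebuilt on demand, never shipped
noun_phrase = Span("NP", "t1", "t2")  # the annotation names tokens only
print(resolve(noun_phrase, tokens, raw))
```

This only works if the tokeniser is deterministic and kept fixed: anyone holding the same raw data and the same tokeniser version gets the same ids, so the standoff can be shared without sharing the text itself.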
ht
[1] http://www.w3.org/TR/WD-xptr
[2] http://www.ltg.ed.ac.uk/~ht/sgmleu97.html
- --
Henry S. Thompson, School of Informatics, University of Edinburgh
Half-time member of W3C Team
10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 651-1426, e-mail: ht@i...
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)
iD8DBQFKcCztkjnJixAXWBoRAtPeAJ0W9aInf8b+N/IccOZP80CkRAe6uACeOyWd
OA3cWGxrd/Spys0R3IXPwH8=
=QaFl
-----END PGP SIGNATURE-----