Subject: Re: Merging lines of 3 words or less
From: James Cummings <cummings.james@xxxxxxxxx>
Date: Thu, 8 Sep 2005 11:46:45 +0100
On 9/8/05, David Carlisle <davidc@xxxxxxxxx> wrote:
>
> > Is there any objective distinction between the pseudo-lines and real
> > lines? Maybe a test on interpunction or Capitalised words could
> > improve the xslt's guesses?
>
> or of course since you know in advance what those 20 are, you could test
> for their id in the stylesheet and not merge those cases (saves fixing
> by hand each time you run the stylesheet)
The estimated number of corrections is based on manually looking
through a smaller sample, so it is not accurate, and I don't know which
20 they actually are. However, after I have manually corrected those
that I can see, the person who is interested in the content has promised
to read through it and alert me to any lines which shouldn't have been merged. :-)
-James
--
James Cummings, Cummings dot James at GMail dot com