
  • From: "Stephen Green" <stephengreenubl@g...>
  • To: "Marcus Carr" <mcarr@a...>
  • Date: Tue, 18 Dec 2007 10:02:30 +0000

Sounds good. Thanks Marcus.

On 17/12/2007, Marcus Carr <mcarr@a...> wrote:
>
> Stephen Green wrote:
>
> > What methods are there, these days, for extracting structured data from
> > unstructured documents (such as PDF)?
>
> Maybe I'm missing something, but I didn't see anyone suggest saving the
> PDF as XML straight from Acrobat. If you have a full licence, it does a
> pretty respectable job, getting you paragraph and character tagging,
> tables and images. You can also batch process, converting entire
> directories or what have you. The results are at least as good as saving
> the PDF to something like Word first and you could be forgiven for
> expecting that they might even be better.
>
> Once you're that far, you can get on your XSLT boots...
>
>
> Marcus
>

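For anyone picking this up from the archive, here is a minimal sketch of the post-processing step Marcus mentions, in Python rather than XSLT. It assumes the exported XML uses element names that mirror the tagged-PDF structure types (TaggedPDF-doc, P, Table, TR, TH, TD); that is an assumption to check against a real export, not something confirmed in this thread. The sample string and the function name extract_tables are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an XML file saved from Acrobat.
# The element names are assumed, not taken from a real export.
SAMPLE = """<TaggedPDF-doc>
  <P>Invoice 1001</P>
  <Table>
    <TR><TH>Item</TH><TH>Qty</TH></TR>
    <TR><TD>Widget</TD><TD>2</TD></TR>
    <TR><TD>Gadget</TD><TD>5</TD></TR>
  </Table>
</TaggedPDF-doc>"""


def extract_tables(xml_text):
    """Return every Table as a list of rows, each row a list of cell strings."""
    root = ET.fromstring(xml_text)
    tables = []
    for table in root.iter("Table"):
        rows = []
        for tr in table.iter("TR"):
            # Header (TH) and data (TD) cells are treated alike.
            # itertext() flattens any character-level tagging inside a cell.
            cells = [
                "".join(cell.itertext()).strip()
                for cell in tr
                if cell.tag in ("TH", "TD")
            ]
            rows.append(cells)
        tables.append(rows)
    return tables


if __name__ == "__main__":
    for row in extract_tables(SAMPLE)[0]:
        print(row)
```

Nested tables are not handled separately here: rows of an inner table would also be collected under the outer one, so a real export may need a stricter walk.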

-- 
Stephen Green

Partner
SystML, http://www.systml.co.uk
Tel: +44 (0) 117 9541606

http://www.biblegateway.com/passage/?search=matthew+22:37 .. and voice

