On 29/10/18 21:04, ian.proudfoot@xxxxxxxxxxx wrote:
> Agreed Wendell and Graydon. I am already doing multiple passes to
> get the content in a suitable state to do the nesting part. I find
> that most word processed text is in a poor state for easy conversion
> to good XML that is valid to a specific schema.
Microsoft's excellent marketing has successfully persuaded this planet
that "looking pretty" is the same thing as "being right".
> When based simply on paragraph and character style names the end
> result is often unusable.
IFF the styles are applied rigorously and in conformance with a known
stylesheet, it is actually possible to get fairly good transformations
to (eg) JATS, DocBook, TEI, etc.
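As a rough illustration of what "rigorously applied" buys you, here is a minimal sketch in Python. The style names, the target element names, and the `convert` function are all hypothetical, not taken from any real stylesheet or schema: a fixed lookup from paragraph style to element works only if every paragraph carries a style the table knows about, so in strict mode anything else is rejected rather than guessed at.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from paragraph style names to DocBook-like
# element names; a real project would derive this from its own stylesheet.
STYLE_TO_ELEMENT = {
    "Heading 1": "title",
    "Body Text": "para",
    "Quote": "blockquote",
}

def convert(paragraphs, strict=True):
    """Build a <section> from (style_name, text) pairs.

    In strict mode an unknown style raises ValueError; otherwise the
    paragraph falls back to a plain <para>.
    """
    root = ET.Element("section")
    for style, text in paragraphs:
        tag = STYLE_TO_ELEMENT.get(style)
        if tag is None:
            if strict:
                raise ValueError(f"unknown style: {style!r}")
            tag = "para"
        child = ET.SubElement(root, tag)
        child.text = text
    return root

doc = convert([("Heading 1", "Intro"), ("Body Text", "Hello.")])
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

An ad hoc style such as "Normal + Bold 14pt" fails the strict lookup, which is the point: the conversion is only as good as the discipline behind the style names.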
> So I use temporary attributes that encode the important stylistic
> overrides - capturing what the author was trying to achieve. I have
> been very pleased with the results.
I'm very intrigued by this: where do you get the author's intentions
from? Traces they leave in the markup (eg italics or bold)?
///Peter