Subject: RE: How to parse text into words, phrases, clauses, sentences, and paragraphs
From: mark bordelon <markcbordelon@xxxxxxxxx>
Date: Thu, 7 Jun 2007 07:04:41 -0700 (PDT)
--- Michael Kay <mike@xxxxxxxxxxxx> wrote:
> You don't really make it clear where you are having
> difficulty. There seem
> to be four separate problems here:
Mike, thanks for helping me break this down. This
is definitely something I can and want to do myself.
I just need the initial hints.
> (a) translating your concepts, such as "words" and
> "sentences" into precise
> specifications
> (b) translating these specifications into regular
> expressions
Got these.
E.g. the specification for "word" could be [^ '-]*
>
> (c) using these regular expressions within a
> stylesheet, for example as an
> argument to the tokenize() function or the
> xsl:analyze-string instruction.
>
This is my first problem: how to apply a template
match using the tokenize() function, and in which
order to apply it (from paragraph -> word or word ->
paragraph).
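
For reference, one possible shape for this step, going top-down
(paragraph -> sentence -> word), is sketched below. It is only a
sketch under assumptions: the $text parameter, the named template
"main", and the element names are illustrative, and the words still
carry their trailing punctuation. Note that tokenize() takes a
pattern for the separators rather than for the tokens, and that
neither tokenize() nor xsl:analyze-string accepts a pattern that can
match a zero-length string.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- Hypothetical input: the raw text supplied as a stylesheet parameter. -->
  <xsl:param name="text"
      select="'What men or gods are these? What maidens loth?'"/>

  <xsl:template name="main">
    <!-- Outermost level first: one <para> per non-blank line of input. -->
    <xsl:for-each select="tokenize($text, '\n')[normalize-space()]">
      <para>
        <!-- Next level: a run of non-terminators plus an optional
             terminator. analyze-string keeps the punctuation, which
             tokenize() would discard. -->
        <xsl:analyze-string select="." regex="[^.?!]+[.?!]?">
          <xsl:matching-substring>
            <!-- Skip matches that contain only whitespace. -->
            <xsl:if test="normalize-space(.)">
              <sent>
                <!-- Innermost level: words, split on single spaces
                     after whitespace has been normalized. -->
                <xsl:for-each select="tokenize(normalize-space(.), ' ')">
                  <word><xsl:value-of select="."/></word>
                </xsl:for-each>
              </sent>
            </xsl:if>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </para>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

This needs an XSLT 2.0 processor started at the named template
"main"; how that is requested depends on the processor. Working
top-down means each level only ever sees the text of its own
container, which keeps the individual patterns simple.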
> (d) doing the output numbering.
I haven't a clue how this would be done, either way.
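
For reference, a two-pass approach is one way the numbering might be
done: first build the unnumbered markup in a temporary tree, then
copy it and let xsl:number count. This is a sketch under
assumptions: the $text parameter, the template name "main", the mode
name "number", and the "seq" attribute are all illustrative, and
tokenize() is used for brevity even though it drops the delimiters.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- Hypothetical input text. -->
  <xsl:param name="text"
      select="'Of deities or mortals, or of both? What men, or gods?'"/>

  <xsl:template name="main">
    <!-- Pass 1: unnumbered markup held in a temporary tree. -->
    <xsl:variable name="raw">
      <para>
        <xsl:for-each select="tokenize($text, '[.?!]')[normalize-space()]">
          <sent>
            <xsl:for-each select="tokenize(., ',')[normalize-space()]">
              <phrase><xsl:value-of select="normalize-space(.)"/></phrase>
            </xsl:for-each>
          </sent>
        </xsl:for-each>
      </para>
    </xsl:variable>
    <!-- Pass 2: copy the temporary tree, adding the numbers. -->
    <xsl:apply-templates select="$raw/*" mode="number"/>
  </xsl:template>

  <xsl:template match="*" mode="number">
    <xsl:copy>
      <!-- id: restarts at 1 inside each parent element. -->
      <xsl:attribute name="id">
        <xsl:number level="single"/>
      </xsl:attribute>
      <!-- seq: runs continuously over all elements of the same name. -->
      <xsl:attribute name="seq">
        <xsl:number level="any"/>
      </xsl:attribute>
      <xsl:apply-templates mode="number"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Both numbering styles come from the same instruction: level="single"
counts among siblings of the same name, while level="any" counts
every earlier element of that name in the tree. Keeping only one of
the two attributes gives one scheme or the other.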
>
> The fourth problem seems quite unrelated to the
> others. Of the other three,
> I'm reluctant to launch into answering without
> knowing which of the three
> steps you need help with. (Generally I think most
> people answering on this
> list adopt the approach of trying to help you solve
> your problem, rather
> than doing the work for you.)
After any initial hints, I could do the rest of the
work myself.
>
> Incidentally, regular expressions are an XSLT 2.0
> feature so I assume you're
> looking for XSLT 2.0 solutions.
>
That is an issue. Is there any way to do this without
regular expressions?
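
For reference, XSLT 1.0 has no regular expressions, but a recursive
named template built on contains(), substring-before() and
substring-after() can split on a single delimiter character. The
sketch below handles only the comma (phrase) level and assumes a
hypothetical input document of the form <text>...</text>; the
template and parameter names are illustrative.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- Hypothetical input element holding the raw text. -->
  <xsl:template match="text">
    <sent>
      <xsl:call-template name="split-phrases">
        <xsl:with-param name="rest" select="string(.)"/>
      </xsl:call-template>
    </sent>
  </xsl:template>

  <!-- Emits one <phrase> per comma-delimited piece of $rest,
       numbering the pieces from $n upwards. -->
  <xsl:template name="split-phrases">
    <xsl:param name="rest"/>
    <xsl:param name="n" select="1"/>
    <xsl:choose>
      <!-- A comma remains: output the text before it, then recurse
           on the text after it with the counter incremented. -->
      <xsl:when test="contains($rest, ',')">
        <phrase id="{$n}">
          <xsl:value-of
              select="normalize-space(substring-before($rest, ','))"/>
          <xsl:text>,</xsl:text>
        </phrase>
        <xsl:call-template name="split-phrases">
          <xsl:with-param name="rest"
              select="substring-after($rest, ',')"/>
          <xsl:with-param name="n" select="$n + 1"/>
        </xsl:call-template>
      </xsl:when>
      <!-- No comma left: output the remainder unless it is blank. -->
      <xsl:when test="normalize-space($rest)">
        <phrase id="{$n}">
          <xsl:value-of select="normalize-space($rest)"/>
        </phrase>
      </xsl:when>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Levels with several possible delimiters (period, question mark,
exclamation mark) would need extra work in 1.0, for example mapping
them all to one character with translate() before splitting, at the
cost of losing which delimiter was originally there.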
> Michael Kay
> http://www.saxonica.com/
>
> > -----Original Message-----
> > From: mark bordelon [mailto:markcbordelon@xxxxxxxxx]
> > Sent: 06 June 2007 22:52
> > To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> > Subject: How to parse text into words, phrases,
> > clauses, sentences, and paragraphs
> >
> > Hey XML gurus,
> >
> > Still somewhat new to XML/XSL and need some help getting
> > started on how to use regular expressions and tokens in
> > English text to transform it into an XML document marked up for:
> >
> > 1. words (delimited by WS, excluding any external
> >    punctuation, but allowing internal punctuation)
> > 2. phrases (delimited by the comma)
> > 3. clauses (delimited by colon or semicolon)
> > 4. sentences (delimited by the period, question mark,
> >    or exclamation mark)
> > 5. paragraphs (delimited by a line break)
> >
> > Also ideal would be to assign sequenced id's to every tag,
> > either in a running consecutive style from beginning to end,
> > or repeating from 1 for every level of nesting.
> >
> > In more concrete terms,
> >
> > To transform this text:
> >
> > THOU still unravish'd bride of quietness,
> > Thou foster-child of Silence and slow Time,
> > Sylvan historian, who canst thus express
> > A flowery tale more sweetly than our rhyme:
> > What leaf-fringed legend haunts about thy shape
> > Of deities or mortals, or of both,
> > In Tempe or the dales of Arcady?
> > What men or gods are these? What maidens loth?
> > What mad pursuit? What struggle to escape?
> > What pipes and timbrels? What wild ecstasy?
> >
> > into this XML: (using indexing that renumbers for each
> > sub-group)
> >
> > <para id="1">
> > <sent id="1">
> > <clause id="1">
> > <phrase id="1">THOU still unravish'd bride of quietness,</phrase>
> > <phrase id="2">Thou foster-child of Silence and slow Time,</phrase>
> > <phrase id="3">Sylvan historian,</phrase>
> > <phrase id="4"> who canst thus express A flowery tale more sweetly than our rhyme</phrase>:
> > </clause>
> > <clause id="2">
> > <phrase id="1">What leaf-fringed legend haunts about thy shape Of deities or mortals,</phrase>
> > <phrase id="2"> or of both,</phrase>
> > <phrase id="3"> In Tempe or the dales of Arcady?</phrase>
> > </clause>
> > </sent>
> > <sent id="2">What men or gods are these?</sent>
> > <sent id="3">What maidens loth?</sent>
> > <sent id="4">What mad pursuit?</sent>
> > <sent id="5">What struggle to escape?</sent>
> > <sent id="6">What pipes and timbrels?</sent>
> > <sent id="7">What wild ecstasy?</sent>
> > </para>
> >
> >
> > or into this XML: (using indexing that is continuous per tag)
> >
> > <para id="1">
> > <sent id="1">
> > <clause id="1">
> > <phrase id="1">THOU still unravish'd bride of quietness,</phrase>
> > <phrase id="2">Thou foster-child of Silence and slow Time,</phrase>
> > <phrase id="3">Sylvan historian,</phrase>
> > <phrase id="4"> who canst thus express A flowery tale more sweetly than our rhyme</phrase>:
> > </clause>
> > <clause id="2">
> > <phrase id="5">What leaf-fringed legend haunts about thy shape Of deities or mortals,</phrase>
> > <phrase id="6"> or of both,</phrase>
> > <phrase id="7"> In Tempe or the dales of Arcady?</phrase>
> > </clause>
> > </sent>
> > <sent id="2">What men or gods are these?</sent>
> > <sent id="3">What maidens loth?</sent>
> > <sent id="4">What mad pursuit?</sent>
> > <sent id="5">What struggle to escape?</sent>
> > <sent id="6">What pipes and timbrels?</sent>
> > <sent id="7">What wild ecstasy?</sent>
> > </para>
> >
> > Surely this has been done before. I have searched through
> > archives and have not found anything, probably since I am
> > searching using the wrong terminology.
> >
> > Would really appreciate the help as it would give me insight
> > into using regular expressions and sequencing in XSL.
> >
> > Thanks in advance
> >
> > Mark Bordelon