Subject: RE: Identifying place names in text...
From: "Michael Kay" <mike@xxxxxxxxxxxx>
Date: Thu, 21 Jul 2005 17:38:07 +0100
This isn't difficult; there's no need to contemplate doing it in Java. You can
tokenize the text using the tokenize() function in XSLT 2.0, or the
str:tokenize() function/template in EXSLT (www.exslt.org). Then look up each
token in your list of place names, using a key for efficiency.
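
A minimal sketch of that approach in XSLT 2.0 might look like the following. It is only an illustration under stated assumptions: the file name places.xml and its <places>/<place> structure are hypothetical (the original question does not say how the word list is stored), and the choice to compare lower-cased tokens is mine.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: the word list is stored in places.xml as
       <places><place>London</place>...</places> -->
  <xsl:variable name="places" select="document('places.xml')"/>

  <!-- Index the place names (lower-cased) so each lookup is cheap -->
  <xsl:key name="place" match="place" use="lower-case(.)"/>

  <xsl:template match="/bookshelf">
    <bookshelf>
      <!-- Keep a book if at least one token of its title is a known place.
           Titles are split on runs of non-letter, non-digit characters. -->
      <xsl:copy-of select="book[
          some $t in tokenize(lower-case(title), '[^\p{L}\p{N}]+')
          satisfies exists(key('place', $t, $places))]"/>
    </bookshelf>
  </xsl:template>

</xsl:stylesheet>
```

The three-argument form of key() searches the word-list document rather than the source document. Because titles are split into single words, a multi-word name such as "New York" would not be found this way and would need separate handling.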
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: Karl Koch [mailto:TheRanger@xxxxxxx]
> Sent: 21 July 2005 14:56
> To: Mulberry list
> Subject: Identifying place names in text...
>
> Hello group,
>
> I would like to find a way of automatically identifying references to
> places in XML text. I have a very large set of content, which sometimes
> contains references to particular places that I want to know about.
>
> This is my XML structure (simplified for illustration):
>
> <bookshelf>
> <book>
> <title>1000 years of London's history</title>
> ...
> </book>
> <book>
> <title>1984</title>
> ...
> </book>
> </bookshelf>
>
> Can I use XSLT to search for place names in the titles of all the books?
> I would like to use a word list of geographical place names (which I
> already have). This would contain country and city names. The stylesheet
> would match occurrences of these words in the <title> XML element. The
> output would be a list of all books whose titles refer to locations.
> In this example, the result would be only the first book, because it has
> "London" in the title.
>
> Perhaps this is the point where XSLT gets too complicated and I should
> consider Java as a solution. However, I am continuously impressed by the
> power of XSLT, so I am asking here in case there is a solution to this
> problem using XSLT.
>
> A side note: the output of this stylesheet would be a helper and an
> additional check for a mainly manual process. It could reveal books
> that I have overlooked in the manual process.
>
> Any help would be greatly appreciated.
>
> Kind Regards,
> Karl
>