I have written a natural language modelling tool which marks up (inserts XML tags into) natural language documents that are already in XML.

I have come across an issue with this tool: some users and documents expect <i><b></b></i> and <b><i></i></b> (and similar classes of constructs) to be equivalent, whereas my tool sees these as completely distinct. From looking at the standards, it appears that HTML, XHTML and XML are all silent on the semantics of situations such as this.

Are there any systems or toolkits which have already been written to help systematise documents and corpora into a single, consistent representation?

cheers
stuart

--
Stuart Yeates stuart.yeates@c...
OSS Watch http://www.oss-watch.ac.uk/
Oxford Text Archive http://ota.ahds.ac.uk/
Humbul Humanities Hub http://www.humbul.ac.uk/
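To make the equivalence in question concrete, here is a minimal sketch of one possible normalisation, not an existing toolkit and not the tool described above. It assumes that only `b` and `i` are interchangeable, and that two nestings count as equivalent only when the outer element is a pure wrapper (no text of its own, exactly one child, no text after that child). The names `INLINE`, `wrapper_chain`, `canonicalise` and `normalise` are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: these are the only tags whose nesting order is treated as irrelevant.
INLINE = {"b", "i"}

def wrapper_chain(elem):
    """Return elem followed by the nested INLINE elements it purely wraps."""
    chain = [elem]
    while True:
        cur = chain[-1]
        children = list(cur)
        # A pure wrapper has no text of its own and exactly one child,
        # and that child has no trailing text inside the wrapper.
        if (len(children) == 1 and not cur.text
                and not children[0].tail and children[0].tag in INLINE):
            chain.append(children[0])
        else:
            return chain

def canonicalise(elem):
    """Rewrite pure wrapper chains of INLINE tags into sorted tag order, in place."""
    if elem.tag in INLINE:
        chain = wrapper_chain(elem)
        # Copy tag/attribute pairs before mutating, then reassign them in sorted order.
        labels = sorted(((e.tag, dict(e.attrib)) for e in chain), key=lambda p: p[0])
        for e, (tag, attrib) in zip(chain, labels):
            e.tag = tag
            e.attrib.clear()
            e.attrib.update(attrib)
        elem = chain[-1]  # continue below the innermost wrapper
    for child in elem:
        canonicalise(child)

def normalise(xml_text):
    """Parse a well-formed XML string and return it with wrapper chains canonicalised."""
    root = ET.fromstring(xml_text)
    canonicalise(root)
    return ET.tostring(root, encoding="unicode")
```

With this sketch, `normalise("<p><i><b>x</b></i> y</p>")` and `normalise("<p><b><i>x</i></b> y</p>")` produce the same string, while `<i>a<b>x</b></i>` is left alone because the outer element carries text of its own. Whether that is the right notion of equivalence for a given corpus is a judgement call that the standards do not settle.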



