I wouldn't even attempt to write any code based on this as the
specification. For this to work at all well, you're going to need to
iteratively adapt the solution to handle all the names in your dataset,
or at least a sample of a couple of thousand of them. There's just too
much variation in the names you might encounter. Are "Jr" and "Sr"
really the only suffixes, and are they always spelt this way, or do you
also get "III" and "Jnr" and "Jnr."?
If I'm wrong, and the names are all regular and in the pattern you describe, then I think you can just tokenize on whitespace and do something like

    suffix := $tokens[last()][. = ('Jr', 'Sr')]
    stem := if ($suffix) then remove($tokens, count($tokens)) else $tokens
    value-of select="concat($stem[last()], ','), remove($stem, count($stem)), if ($suffix) then concat('(', $suffix, ')') else ''"

Michael Kay
Saxonica

On 05/11/2012 23:45, Mark wrote:
> This must have been done many times, so can someone show me where to find the answer?
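
For readers who want something runnable, the tokenize-and-rearrange idea from the reply can be sketched as a complete XSLT 2.0 stylesheet. This is only a minimal sketch under stated assumptions: the input element name `name`, the function name `f:invert-name`, and the namespace `urn:example:names` are hypothetical and not from the thread, and the suffix list is deliberately limited to the two values mentioned above. It does not handle "III", "Jnr", "Jnr." or single-word names sensibly, which is exactly the kind of variation that has to be checked against real data.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:names"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <!-- Assumed suffix list; extend it after inspecting a sample of real names. -->
  <xsl:variable name="suffixes" as="xs:string*" select="('Jr', 'Sr')"/>

  <!-- Hypothetical helper: "First Middle Last Jr" becomes "Last, First Middle (Jr)". -->
  <xsl:function name="f:invert-name" as="xs:string">
    <xsl:param name="name" as="xs:string"/>
    <!-- Split on whitespace after collapsing runs of spaces. -->
    <xsl:variable name="tokens" as="xs:string*"
        select="tokenize(normalize-space($name), ' ')"/>
    <!-- The last token counts as a suffix only if it is in the known list. -->
    <xsl:variable name="suffix" as="xs:string?"
        select="$tokens[last()][. = $suffixes]"/>
    <!-- Everything except the suffix. -->
    <xsl:variable name="stem" as="xs:string*"
        select="if (exists($suffix))
                then remove($tokens, count($tokens))
                else $tokens"/>
    <!-- Surname first, then the remaining names, then the bracketed suffix.
         Using () rather than '' avoids a trailing space when there is no suffix. -->
    <xsl:sequence select="string-join((
        concat($stem[last()], ','),
        remove($stem, count($stem)),
        if (exists($suffix)) then concat('(', $suffix, ')') else ()
      ), ' ')"/>
  </xsl:function>

  <!-- Assumes each name sits in an element called 'name'. -->
  <xsl:template match="name">
    <xsl:value-of select="f:invert-name(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Run with an XSLT 2.0 processor against a document containing, say, `<name>John Quincy Adams Jr</name>`, this should emit `Adams, John Quincy (Jr)`. Keeping the suffix list in one variable means new variants can be added in one place as they turn up in the data.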



