Murali Mani wrote:
> One disadvantage of term-based weighting or vector space model is the
> well-known example cited in the Google's original paper (rather sales
> pitch??) --
>
> A document with only the words "Bill Clinton [expletive deleted]"; as opposed to the
> actual white house page was considered more important for the query "Bill
> Clinton" (when Clinton was the president)
>
> I believe we can use vector-space model only when the document collection
> is "homogeneous" in some manner.. and has repetitive words etc.

Google is apparently looking at a noun clustering scheme.

http://news.zdnet.com/2100-9588_22-5605127.html?tag=nl.e539

Norvig highlighted a research paper written by a Google employee last year regarding a classification engine the company is testing. The technology can parse a proper noun or compound nouns into several categories in order to deliver clustered results, for example.

For a query on "ATM," or asynchronous transfer mode, the engine would be able to use the terms "such as" on Web pages indexed with the term to discover that it can be linked to the expression "high-speed networks." As a result, a search for high-speed networks might pull up a cluster on ATM.
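To make the "such as" idea in the article excerpt concrete, here is a rough sketch of how such phrases could link a term to a broader expression. This is not Google's engine, which the article does not describe in detail. The regular expression, the `build_clusters` function, and the sample pages are all made up for illustration, and the pattern assumes the category is at most the two words before "such as" and the member is the single word after it.

```python
import re

# Hypothetical pattern (an assumption, not taken from the article):
# group 1 = one or two words before "such as" (the category phrase),
# group 2 = the single word after it (the member term).
SUCH_AS = re.compile(
    r"\b((?:[\w-]+\s+)?[\w-]+)\s+such\s+as\s+([\w-]+)",
    re.IGNORECASE,
)


def build_clusters(pages):
    """Map each category phrase to the terms introduced by 'such as'."""
    clusters = {}
    for text in pages:
        for match in SUCH_AS.finditer(text):
            # Normalize case and whitespace so variants share one key.
            category = " ".join(match.group(1).lower().split())
            member = match.group(2)
            members = clusters.setdefault(category, [])
            if member not in members:
                members.append(member)
    return clusters


# Made-up sample pages, for illustration only.
pages = [
    "Carriers deploy high-speed networks such as ATM in their backbones.",
    "Vendors sell gear for high-speed networks such as SONET.",
    "The bank installed a new ATM in the lobby.",
]

clusters = build_clusters(pages)
# A search for the category can now pull up the terms linked to it.
print(clusters.get("high-speed networks", []))
```

The third page mentions ATM without a "such as" phrase, so it contributes nothing to the cluster. A real system would need proper noun-phrase parsing rather than a fixed two-word window.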



