One disadvantage of term-based weighting (the vector space model) is the well-known example cited in Google's original paper (or rather, sales pitch?): a document containing only the words "Bill Clinton [expletive deleted]" was ranked as more important for the query "Bill Clinton" than the actual White House page (back when Clinton was president).

I believe we can use the vector space model only when the document collection is "homogeneous" in some manner and has repetitive words, etc.

Also note that with the vector space model, you have to compute the ranking of documents in real time for each query. For other metrics, say PageRank, the rank of documents can be pre-computed, and we can use better algorithms based on this property.

best,
murali.
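A toy sketch of both points, under loud assumptions: the two "documents" and the link graph below are hypothetical stand-ins (not the real pages or Google's system), scoring is plain cosine similarity over raw term frequencies (no IDF, no stemming), and PageRank is the textbook power iteration with damping 0.85. The cosine scores must be recomputed for every query, while the PageRank table is computed once and only looked up at query time.

```python
import math
from collections import Counter

def tf_vector(text):
    # Raw term-frequency vector over lowercase whitespace tokens.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical documents standing in for the example in the post.
docs = {
    "short_page": "bill clinton expletive",
    "whitehouse": "the white house welcome page president bill clinton "
                  "news briefings tours history of the executive mansion",
}

def rank_by_cosine(query, docs):
    # Query-dependent: every score is recomputed from the query vector.
    q = tf_vector(query)
    scores = {name: cosine(q, tf_vector(text)) for name, text in docs.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

def pagerank(links, damping=0.85, iters=50):
    # Query-independent: power iteration over the link graph, run once offline.
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v, outs in links.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for w in outs:
                    new[w] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank

# Hypothetical link graph: other pages link to the White House page.
links = {
    "whitehouse": ["news_site"],
    "short_page": ["whitehouse"],
    "news_site": ["whitehouse"],
    "blog": ["whitehouse"],
}
PR = pagerank(links)  # precomputed once, reused for every query

def rank_by_pagerank(query, docs):
    # At query time: keep pages containing all query terms, sort by stored rank.
    terms = set(query.lower().split())
    hits = [name for name, text in docs.items()
            if terms <= set(text.lower().split())]
    return sorted(hits, key=PR.get, reverse=True)

print(rank_by_cosine("bill clinton", docs)[0])  # ['short_page', 'whitehouse']
print(rank_by_pagerank("bill clinton", docs))   # ['whitehouse', 'short_page']
```

The short page wins under cosine similarity (2/sqrt(6), about 0.82, versus 1/3) simply because it is short and nearly all of its terms match the query; the link-based rank, which is independent of the query, puts the White House page first.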