


One disadvantage of term-based weighting, or the vector space model, is
shown by the well-known example cited in Google's original paper (or
rather, sales pitch?):

A document containing only the words "Bill Clinton [expletive deleted]" was
considered more important for the query "Bill Clinton" than the actual
White House page (this was when Clinton was president).
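
Here is a rough sketch of why that happens, assuming plain term-frequency
vectors and cosine similarity (one common form of the vector space model,
not necessarily what any real engine used). The two document texts and the
names cosine, short_doc and long_doc are made up for illustration; "xxx"
stands in for the deleted word.

```python
import math
from collections import Counter


def cosine(query, doc):
    """Cosine similarity between raw term-frequency vectors of two strings."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    # Only terms present in the query can contribute to the dot product.
    dot = sum(q[t] * d[t] for t in q)
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)


# Hypothetical documents, invented for this example.
query = "bill clinton"
short_doc = "bill clinton xxx"
long_doc = ("welcome to the white house home of president bill clinton "
            "and vice president al gore")

short_score = cosine(query, short_doc)
long_score = cosine(query, long_doc)
print(short_score, long_score)
```

Both documents contain each query term once, but the long page's vector has
a much larger norm, so its cosine score comes out lower than that of the
three-word document.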

I believe the vector space model works well only when the document
collection is "homogeneous" in some manner, with words that recur across
documents.

Also note: with the vector space model, the rank of each document has to
be computed in real time, once the query is known.

For other metrics, such as PageRank, the rank of documents can be
pre-computed, so we can use better algorithms that exploit this property.
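
To make the pre-computation point concrete, here is a minimal sketch of
PageRank by power iteration, assuming a tiny hypothetical link graph. The
function name pagerank, the damping value 0.85, the iteration count and
the pages a..d are all assumptions for illustration. The scores depend
only on the links, so they can be computed offline and just looked up
when a query arrives.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank via power iteration over a dict of outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the same baseline share from the random jump.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[p] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}

# Offline step: no query is involved here.
ranks = pagerank(links)

# Query-time step: order whatever pages matched the query by stored rank.
matched = ["a", "b", "c"]
ordered = sorted(matched, key=ranks.get, reverse=True)
print(ordered)
```

The expensive loop runs once, ahead of time. At query time only a lookup
and a sort over the matching pages remain.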

best, murali.
