TY - GEN
T1 - A cluster-based resampling method for pseudo-relevance feedback
AU - Lee, Kyung Soon
AU - Croft, W. Bruce
AU - Allan, James
PY - 2008
Y1 - 2008
N2 - Typical pseudo-relevance feedback methods assume the topretrieved documents are relevant and use these pseudo-relevant documents to expand terms. The initial retrieval set can, however, contain a great deal of noise. In this paper, we present a clusterbased resampling method to select better pseudo-relevant documents based on the relevance model. The main idea is to use document clusters to find dominant documents for the initial retrieval set, and to repeatedly feed the documents to emphasize the core topics of a query. Experimental results on large-scale web TREC collections show significant improvements over the relevance model. For justification of the resampling approach, we examine relevance density of feedback documents. A higher relevance density will result in greater retrieval accuracy, ultimately approaching true relevance feedback. The resampling approach shows higher relevance density than the baseline relevance model on all collections, resulting in better retrieval accuracy in pseudo-relevance feedback. This result indicates that the proposed method is effective for pseudo-relevance feedback.
AB - Typical pseudo-relevance feedback methods assume the topretrieved documents are relevant and use these pseudo-relevant documents to expand terms. The initial retrieval set can, however, contain a great deal of noise. In this paper, we present a clusterbased resampling method to select better pseudo-relevant documents based on the relevance model. The main idea is to use document clusters to find dominant documents for the initial retrieval set, and to repeatedly feed the documents to emphasize the core topics of a query. Experimental results on large-scale web TREC collections show significant improvements over the relevance model. For justification of the resampling approach, we examine relevance density of feedback documents. A higher relevance density will result in greater retrieval accuracy, ultimately approaching true relevance feedback. The resampling approach shows higher relevance density than the baseline relevance model on all collections, resulting in better retrieval accuracy in pseudo-relevance feedback. This result indicates that the proposed method is effective for pseudo-relevance feedback.
KW - a cluster-based resampling
KW - Dominant documents
KW - Information retrieval
KW - Pseudo-relevance feedback
KW - Query expansion
UR - https://www.scopus.com/pages/publications/57349180460
U2 - 10.1145/1390334.1390376
DO - 10.1145/1390334.1390376
M3 - Conference paper
AN - SCOPUS:57349180460
SN - 9781605581644
T3 - ACM SIGIR 2008 - 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Proceedings
SP - 235
EP - 242
BT - ACM SIGIR 2008 - 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Proceedings
T2 - 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM SIGIR 2008
Y2 - 20 July 2008 through 24 July 2008
ER -