
A cluster-based resampling method for pseudo-relevance feedback

    • University of Massachusetts

    Research output: Contribution to conference > Conference paper > peer-review

    Abstract

    Typical pseudo-relevance feedback methods assume the top-retrieved documents are relevant and use these pseudo-relevant documents to expand terms. The initial retrieval set can, however, contain a great deal of noise. In this paper, we present a cluster-based resampling method to select better pseudo-relevant documents based on the relevance model. The main idea is to use document clusters to find dominant documents for the initial retrieval set, and to repeatedly feed the documents to emphasize the core topics of a query. Experimental results on large-scale web TREC collections show significant improvements over the relevance model. For justification of the resampling approach, we examine relevance density of feedback documents. A higher relevance density will result in greater retrieval accuracy, ultimately approaching true relevance feedback. The resampling approach shows higher relevance density than the baseline relevance model on all collections, resulting in better retrieval accuracy in pseudo-relevance feedback. This result indicates that the proposed method is effective for pseudo-relevance feedback.
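The resampling idea and the relevance-density check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how clusters are built or ranked, so the sketch assumes the clusters are already ranked and may overlap, and it assumes relevance density means the share of feedback documents that are judged relevant, counting repeats. The function names, document IDs, and toy data are all hypothetical.

```python
from collections import Counter


def resample_from_clusters(ranked_clusters, top_clusters):
    """Pool documents from the highest-ranked clusters, keeping repeats.

    Assumption: clusters may overlap, so a document that appears in
    several top clusters (a "dominant" document) gets a higher count.
    """
    counts = Counter()
    for cluster in ranked_clusters[:top_clusters]:
        counts.update(cluster)
    return counts


def relevance_density(feedback_docs, relevant_docs):
    """Share of feedback documents judged relevant (assumed definition)."""
    feedback_docs = list(feedback_docs)
    if not feedback_docs:
        return 0.0
    hits = sum(1 for doc in feedback_docs if doc in relevant_docs)
    return hits / len(feedback_docs)


# Hypothetical toy data: four overlapping clusters, already ranked.
ranked_clusters = [
    ["d1", "d2", "d3"],
    ["d1", "d3", "d5"],
    ["d2", "d1", "d7"],
    ["d8", "d9", "d4"],
]
relevant = {"d1", "d3", "d5"}

weights = resample_from_clusters(ranked_clusters, top_clusters=3)
resampled = list(weights.elements())  # repeats preserved

print(weights.most_common(3))
print(relevance_density(resampled, relevant))
```

In this toy setup, documents that recur across top clusters are fed to the feedback step more than once, which is one way to read "repeatedly feed the documents". The numbers are illustrative only and say nothing about the paper's reported results.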

    Original language: English
    Title of host publication: ACM SIGIR 2008 - 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Proceedings
    Pages: 235-242
    Number of pages: 8
    State: Published - 2008
    Event: 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM SIGIR 2008 - Singapore, Singapore
    Duration: 2008.07.20 - 2008.07.24

    Publication series

    Name: ACM SIGIR 2008 - 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Proceedings

    Conference

    Conference: 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM SIGIR 2008
    Country/Territory: Singapore
    City: Singapore
    Period: 08.07.20 - 08.07.24

    Keywords

    • Cluster-based resampling
    • Dominant documents
    • Information retrieval
    • Pseudo-relevance feedback
    • Query expansion

    Quacquarelli Symonds (QS) Subject Topics

    • Computer Science & Information Systems
