Incorporating topical support documents into a small training set in text categorization

    Research output: Contribution to conference › Conference paper › peer-review

    Abstract

    This paper explores incorporating topical support documents into a training set to compensate for a shortage of positive training data in text categorization. To support topical representation, our method applies a simple transformation to documents: it creates new documents from existing positive documents by squaring a conventional term weight. The topical support documents thus created are expected not only to preserve the topic but also to improve the topical representation by emphasizing terms with higher weights. Experiments with support vector machines showed the method's effectiveness on the RCV1 collection when only a small number of positive training documents are available. Our topical support representation improved micro-averaged F1 by 52.01% and 8.83% for 33 and 56 categories of RCV1 Topic with fewer than 100 and 300 positive training documents, respectively. Robustness-based analyses of the results indicate that topical support documents contribute to a steady and stable improvement.
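
    The transformation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which term-weighting scheme is used or whether the squared vectors are re-normalized, so the L2 normalization, the function names, and the sample weights below are all assumptions made for illustration.

```python
import math


def l2_normalize(weights):
    """Scale a sparse term-weight vector (a dict) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0.0:
        return dict(weights)
    return {term: w / norm for term, w in weights.items()}


def make_support_document(weights):
    """Build a support document by squaring each term weight.

    Re-normalizing afterwards is an assumption of this sketch, not
    something stated in the abstract.
    """
    squared = {term: w * w for term, w in weights.items()}
    return l2_normalize(squared)


def augment_positives(positive_docs):
    """Return the original positive vectors plus one support vector each."""
    return list(positive_docs) + [make_support_document(d) for d in positive_docs]


# Hypothetical, hand-made weights for a single positive document.
doc = l2_normalize({"merger": 3.0, "shares": 2.0, "said": 1.0})
support = make_support_document(doc)
augmented = augment_positives([doc])
```

    After squaring and re-normalizing, the highest-weighted term takes a larger share of the vector and low-weighted terms shrink, while the ranking of terms is unchanged. This is one way to read the abstract's claim that support documents preserve the topic while emphasizing terms with higher weights.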

    Original language: English
    Title of host publication: Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM'08
    Pages: 1511-1512
    Number of pages: 2
    State: Published - 2008
    Event: 17th ACM Conference on Information and Knowledge Management, CIKM'08 - Napa Valley, CA, United States
    Duration: 2008.10.26 → 2008.10.30

    Publication series

    Name: International Conference on Information and Knowledge Management, Proceedings

    Conference

    Conference: 17th ACM Conference on Information and Knowledge Management, CIKM'08
    Country/Territory: United States
    City: Napa Valley, CA
    Period: 08.10.26 → 08.10.30

    Keywords

    • Support vector machine
    • Text categorization
    • Topical support representation
    • Transformation

    Quacquarelli Symonds (QS) Subject Topics

    • Business & Management Studies
    • Data Science
