TY - GEN
T1 - Incorporating topical support documents into a small training set in text categorization
AU - Lee, Kyung Soon
PY - 2008
Y1 - 2008
N2 - This paper explores the incorporation of topical support documents into a training set as a means of compensating for a shortage of positive training data in text categorization. To support topical representation, our method applies a simple transformation to documents, i.e., making new documents from existing positive documents by squaring a conventional term weight. The topical support documents thus created not only are expected to preserve the topic, but even improve the topical representation by emphasizing terms with higher weights. Experiments with support vector machines showed the effectiveness on RCV1 collection with a small number of positive training data. Our topical support representation achieved 52.01% and 8.83% improvements for 33 and 56 categories of RCV1 Topic in micro-averaged F1 with less than 100 and 300 positive documents in learning, respectively. Result analyses based on robustness indicate that topical support documents contribute to a steady and stable improvement.
AB - This paper explores the incorporation of topical support documents into a training set as a means of compensating for a shortage of positive training data in text categorization. To support topical representation, our method applies a simple transformation to documents, i.e., making new documents from existing positive documents by squaring a conventional term weight. The topical support documents thus created not only are expected to preserve the topic, but even improve the topical representation by emphasizing terms with higher weights. Experiments with support vector machines showed the effectiveness on RCV1 collection with a small number of positive training data. Our topical support representation achieved 52.01% and 8.83% improvements for 33 and 56 categories of RCV1 Topic in micro-averaged F1 with less than 100 and 300 positive documents in learning, respectively. Result analyses based on robustness indicate that topical support documents contribute to a steady and stable improvement.
KW - Support vector machine
KW - Text categorization
KW - Topical support representation
KW - Transformation
UR - https://www.scopus.com/pages/publications/70349240843
U2 - 10.1145/1458082.1458361
DO - 10.1145/1458082.1458361
M3 - Conference paper
AN - SCOPUS:70349240843
SN - 9781595939913
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1511
EP - 1512
BT - Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM'08
T2 - 17th ACM Conference on Information and Knowledge Management, CIKM'08
Y2 - 26 October 2008 through 30 October 2008
ER -