Feature engineering for topical clustering based on named entity

  • Hyo Jung Oh
  • Bo Hyun Yun*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

Abstract

Conventional clustering research focuses on extracting keywords for grouping by word similarity. However, computing over too many candidates leads to high complexity, low speed, and low accuracy. To overcome these weaknesses, this paper presents a topical web document clustering model that uses not only keywords but also named entities such as person names, organizations, and locations. We compare the proposed model with traditional models experimentally and analyze how the effect of named entities varies with the characteristics of the document collection. For feature engineering, we adopt word embedding techniques, a collective name for a set of language modeling methods in natural language processing. In particular, we examine the correlation among the topic words of the clustered sets according to the concept level of the named entities.
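The general idea of clustering on keywords plus named entities can be illustrated with a minimal sketch. This is not the authors' model: it uses no word embeddings, the documents and entity labels are hand-made toy data, and the `entity_weight` parameter, the `featurize` and `cluster` helpers, and the greedy threshold clustering are all assumptions chosen only to keep the example short.

```python
import math
from collections import Counter

# Hypothetical toy data: each document is (keywords, named_entities).
# In a real pipeline the entities would come from an NER tagger.
DOCS = [
    (["election", "vote", "campaign"], ["PERSON:Kim", "LOC:Seoul"]),
    (["vote", "poll", "campaign"], ["PERSON:Kim", "ORG:Assembly"]),
    (["match", "goal", "league"], ["ORG:FC_Seoul", "LOC:Seoul"]),
    (["goal", "league", "coach"], ["ORG:FC_Seoul", "PERSON:Park"]),
]


def featurize(keywords, entities, entity_weight=2.0):
    """Build a sparse vector; entity_weight is an assumed tuning knob."""
    vec = Counter()
    for word in keywords:
        vec["kw:" + word] += 1.0
    for entity in entities:
        vec["ne:" + entity] += entity_weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold=0.5):
    """Greedy single pass: join the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of document indices; index 0 is the seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


vectors = [featurize(keywords, entities) for keywords, entities in DOCS]
clusters = cluster(vectors)
print(clusters)
```

In this toy data the shared location entity alone is not enough to merge the politics and sports documents, while a shared person or organization plus overlapping keywords is. The threshold and weight would need tuning on any real collection.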

Original language: English
Pages (from-to): 763-774
Number of pages: 12
Journal: Far East Journal of Electronics and Communications
Volume: 16
Issue number: 4
State: Published - 2016.12

Keywords

  • Feature engineering
  • Named entity
  • Web document clustering
  • Word Embedding

Quacquarelli Symonds (QS) Subject Topics

  • Computer Science & Information Systems
  • Engineering - Electrical & Electronic
  • Engineering - Petroleum
