Abstract
Conventional clustering research focuses on extracting keywords and grouping them by word similarity. However, computing over too many candidates leads to high complexity, low speed, and low accuracy. To overcome these weaknesses, this paper presents a topical web document clustering model that uses not only keywords but also named entities such as person names, organizations, and locations. We compare the proposed model experimentally with traditional models and analyze how the effect of named entities varies with the characteristics of the document collection. For feature engineering, we adopt word embedding, the collective name for a set of language modeling techniques in natural language processing. In particular, we examine how the correlation among the topic words of the clustered sets depends on the concept level of the named entities.
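The abstract does not specify the embedding model, how keyword and named-entity features are combined, or which clustering algorithm is used. The sketch below is therefore only one plausible reading, built on several assumptions:

- Named entities are already extracted.
- Tiny hand-made 2-D vectors stand in for pretrained word embeddings.
- Each document is represented by concatenating the mean keyword embedding with a mean named-entity embedding scaled by a hypothetical `entity_weight`.
- Plain cosine k-means is used as a stand-in clustering algorithm.

All function names, tokens and vectors are illustrative, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

def mean_vector(tokens: Sequence[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Average the embeddings of tokens present in the lookup table (zeros if none)."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def doc_features(keywords: Sequence[str], entities: Sequence[str],
                 emb: Dict[str, Vector], dim: int, entity_weight: float = 1.0) -> Vector:
    """Concatenate the keyword view with the (weighted) named-entity view.
    entity_weight is a hypothetical knob, not a parameter from the paper."""
    kw = mean_vector(keywords, emb, dim)
    ne = [entity_weight * x for x in mean_vector(entities, emb, dim)]
    return kw + ne

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def kmeans_cosine(points: List[Vector], k: int, iters: int = 10) -> List[int]:
    """Plain k-means using cosine similarity; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each document to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(p, centroids[c])) for p in points]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy 2-D "embeddings" (assumption: stand-ins for a pretrained model).
emb: Dict[str, Vector] = {
    "election": [1.0, 0.0], "vote": [0.9, 0.1], "campaign": [0.8, 0.2],
    "Assembly": [1.0, 0.1], "Seoul": [0.7, 0.3],
    "goal": [0.0, 1.0], "match": [0.1, 0.9], "league": [0.2, 0.8],
    "Ronaldo": [0.0, 1.0], "Madrid": [0.3, 0.7],
}

# Each document: (keywords, named entities), entities assumed pre-extracted.
docs: List[Tuple[List[str], List[str]]] = [
    (["election", "vote"], ["Assembly"]),
    (["goal", "match"], ["Ronaldo"]),
    (["vote", "campaign"], ["Seoul"]),
    (["match", "league"], ["Madrid"]),
]

features = [doc_features(kw, ne, emb, dim=2) for kw, ne in docs]
labels = kmeans_cosine(features, k=2)
print(labels)  # → [0, 1, 0, 1]
```

Setting `entity_weight=0.0` zeroes out the named-entity view, so the documents are clustered on keywords alone. This gives a simple way to contrast a keyword-only baseline with the combined features in the spirit of the comparison described above.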
| Original language | English |
|---|---|
| Pages (from-to) | 763-774 |
| Number of pages | 12 |
| Journal | Far East Journal of Electronics and Communications |
| Volume | 16 |
| Issue number | 4 |
| State | Published - 2016.12 |
Keywords
- Feature engineering
- Named entity
- Web document clustering
- Word embedding
Quacquarelli Symonds (QS) Subject Topics
- Computer Science & Information Systems
- Engineering - Electrical & Electronic
- Engineering - Petroleum
Fingerprint
Dive into the research topics of 'Feature engineering for topical clustering based on named entity'. Together they form a unique fingerprint.