Determining the best K for clustering transactional datasets: A coverage density-based approach

  • Hua Yan*
  • Keke Chen
  • Ling Liu
  • Joonsoo Bae
  • *Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

Abstract

Determining the optimal number of clusters is an important but poorly understood problem in cluster analysis. In this paper, we propose a novel method to find a set of candidate optimal values of K, the number of clusters, in transactional datasets. Concretely, we propose the Transactional-cluster-modes Dissimilarity, based on the concept of coverage density, as an intuitive inter-cluster dissimilarity measure for transactional data. Based on this measure, we develop an agglomerative hierarchical clustering algorithm and use the Merging Dissimilarity Indexes, which are generated during the hierarchical cluster merging process, to find the candidate optimal values of K. Our experimental results on both synthetic and real data show that the new method often estimates the number of clusters of transactional data effectively.
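The general idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not define the Transactional-cluster-modes Dissimilarity or the Merging Dissimilarity Index, so the code substitutes a simple stand-in. Coverage density is assumed here to be the fraction of filled cells in a cluster's transaction-by-item grid, the merge cost is assumed to be one minus the coverage density of the merged cluster, and a candidate K is assumed to be a point where the merge cost jumps most sharply. All function names are hypothetical, and transactions are assumed to be non-empty.

```python
from itertools import combinations


def coverage_density(cluster):
    """Filled cells divided by (number of transactions x number of distinct items).

    Assumed definition for illustration; `cluster` is a list of non-empty item sets.
    """
    items = set().union(*cluster)
    filled = sum(len(t) for t in cluster)
    return filled / (len(cluster) * len(items))


def merge_cost(a, b):
    """Stand-in dissimilarity (an assumption, not the paper's measure):
    how sparse the cluster becomes if a and b are merged."""
    return 1.0 - coverage_density(a + b)


def merging_costs(transactions):
    """Agglomerative merging; returns {k: cost of the merge that takes k clusters to k-1}."""
    clusters = [[set(t)] for t in transactions]
    costs = {}
    while len(clusters) > 1:
        # Pick the pair of clusters that is cheapest to merge.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]),
        )
        costs[len(clusters)] = merge_cost(clusters[i], clusters[j])
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return costs


def candidate_ks(costs):
    """K values where the cost of going below K clusters rises most sharply."""
    jumps = {k: costs[k] - costs[k + 1] for k in costs if k + 1 in costs}
    best = max(jumps.values())
    return sorted(k for k, jump in jumps.items() if jump == best)


transactions = [
    {"a", "b"}, {"a", "b"}, {"a", "b", "c"},
    {"x", "y"}, {"x", "y"}, {"x", "y", "z"},
]
print(candidate_ks(merging_costs(transactions)))
```

The toy data contains two groups of transactions with disjoint item sets, so the largest jump in merge cost occurs when the last two clusters are forced together. The exhaustive pair search is quadratic per merge step and is only meant for small examples.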

Original language: English
Pages (from-to): 28-48
Number of pages: 21
Journal: Data and Knowledge Engineering
Volume: 68
Issue number: 1
State: Published - 2009.01

Keywords

  • Differential MDI curve
  • Merging Dissimilarity Index
  • Transactional-cluster-mode
  • Transactional-cluster-modes Dissimilarity

Quacquarelli Symonds (QS) Subject Topics

  • Computer Science & Information Systems
