Data partitioning in hierarchical clustering: A parameter-insensitive approach

Kyo Sung Jeong, Seok Ho Yoon, Suk Soon Song, Sang Chul Lee, Minsoo Ryu, Sang Wook Kim, Byung Soo Jeong

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper, we propose a parameter-insensitive data partitioning approach for Chameleon, a hierarchical clustering algorithm. We first show that the quality of the clusters produced by Chameleon is significantly affected by the size of the initial sub-clusters, mainly because Chameleon recursively splits a dataset into two equal-sized clusters until the cluster size approaches the size specified by the user. Through preliminary experiments, we also show that this problem appears in real situations. The proposed method splits a given dataset into every possible number of clusters by using existing algorithms that allow arbitrary-sized sub-clusters in partitioning. It then evaluates the quality of every set of initial sub-clusters with our measurement function and selects as optimal the set with the highest measured value. Finally, it repeatedly merges these optimal initial sub-clusters to produce the final clustering result. Extensive experiments show that the proposed approach is insensitive to parameters and produces final clusters of higher quality than the previous approach.
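The selection step described in the abstract (partition into every candidate number of sub-clusters, score each partition, keep the best) can be sketched roughly as follows. This is only an illustration under loud assumptions: the abstract names neither the partitioning algorithms nor the measurement function, so `split_by_gaps` and `quality` are hypothetical stand-ins on one-dimensional data with distinct values, and the final merging phase is omitted.

```python
def split_by_gaps(points, k):
    """Stand-in partitioner (assumption): cut sorted 1-D points at the
    k-1 widest gaps, so sub-clusters may have arbitrary sizes."""
    pts = sorted(points)
    # Index i denotes the gap between pts[i] and pts[i + 1].
    by_width = sorted(range(len(pts) - 1),
                      key=lambda i: pts[i + 1] - pts[i],
                      reverse=True)
    cuts = sorted(by_width[:k - 1])
    groups, start = [], 0
    for c in cuts:
        groups.append(pts[start:c + 1])
        start = c + 1
    groups.append(pts[start:])
    return groups


def quality(groups):
    """Stand-in measurement function (assumption, not the paper's):
    smallest gap between adjacent groups divided by the largest gap
    found inside any group. Higher is better."""
    separation = min(b[0] - a[-1] for a, b in zip(groups, groups[1:]))
    internal = max(
        (g[i + 1] - g[i] for g in groups for i in range(len(g) - 1)),
        default=0.0,
    )
    return separation / max(internal, 1e-12)


def best_partition(points):
    """Try every candidate number of sub-clusters and keep the best-scoring set."""
    candidates = [split_by_gaps(points, k) for k in range(2, len(points))]
    return max(candidates, key=quality)


data = [1, 2, 3, 10, 11, 12, 30, 31]
print(best_partition(data))
```

Because every candidate count is tried, the user does not have to supply a sub-cluster size, which is the sense in which such a scheme avoids the parameter the abstract identifies as problematic.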

Original language: English
Pages (from-to): 7699-7709
Number of pages: 11
Journal: Information (Japan)
Volume: 16
Issue number: 10
State: Published - Oct 2013

Keywords

  • Data partitioning
  • Hierarchical clustering
  • Parameter-insensitive
