Abstract
In this paper, we propose a parameter-insensitive data partitioning approach for Chameleon, a hierarchical clustering algorithm. We first show that the quality of the clusters produced by Chameleon is significantly affected by the size of the initial sub-clusters, mainly because Chameleon recursively splits a dataset into two equal-sized clusters until the cluster size becomes similar to the size specified by the user. Through preliminary experiments, we also show that this problem appears in real situations. The proposed method splits a given dataset into every possible number of clusters by using existing algorithms that allow arbitrary-sized sub-clusters in partitioning. It then evaluates the quality of every set of initial sub-clusters with our measurement function and selects as optimal the set with the highest measurement value. Finally, it repeatedly merges these optimal initial sub-clusters to produce the final clustering result. We perform extensive experiments, and the results show that the proposed approach is insensitive to parameters and produces final clusters of higher quality than those of the previous approach.
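The partition-and-select step described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: plain k-means stands in for the "existing algorithms" that allow arbitrary-sized sub-clusters, a centroid-based silhouette score stands in for the paper's measurement function (which the abstract does not define), the candidate counts are capped by a hypothetical `max_k` parameter, and the names `kmeans`, `quality`, and `best_initial_subclusters` are hypothetical. The sketch stops before the merging phase.

```python
import math


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; the resulting clusters may have arbitrary sizes."""
    # Deterministic farthest-point initialisation (an assumption, for reproducibility).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    for _ in range(iters):
        # Assign every point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group.
        new_centers = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return [g for g in groups if g]


def quality(clusters):
    """Stand-in measure (NOT the paper's function): mean centroid-based silhouette."""
    centers = [tuple(sum(x) / len(c) for x in zip(*c)) for c in clusters]
    total, n = 0.0, 0
    for i, cluster in enumerate(clusters):
        for p in cluster:
            n += 1
            if len(cluster) == 1:
                continue  # singleton clusters contribute a score of 0
            a = math.dist(p, centers[i])  # distance to own centroid
            b = min(math.dist(p, c) for j, c in enumerate(centers) if j != i)
            denom = max(a, b)
            total += (b - a) / denom if denom else 0.0
    return total / n


def best_initial_subclusters(points, max_k):
    """Partition into k = 2..max_k sub-clusters and keep the best-scoring set."""
    candidates = [kmeans(points, k) for k in range(2, max_k + 1)]
    return max(candidates, key=quality)


# Toy data: three well-separated groups of unequal size (4, 4, and 2 points).
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 10), (5, 11),
]
best = best_initial_subclusters(points, max_k=5)
print(sorted(len(c) for c in best))
```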
Original language | English |
---|---|
Pages (from-to) | 7699-7709 |
Number of pages | 11 |
Journal | Information (Japan) |
Volume | 16 |
Issue number | 10 |
State | Published - Oct 2013 |
Keywords
- Data partitioning
- Hierarchical clustering
- Parameter-insensitive