PERFORMANCE EVALUATION OF SELECTED DISTANCE-BASED AND DISTRIBUTION-BASED CLUSTERING ALGORITHMS


Ajiboye, A. R. and Olufadi, H. I.

Department of Computer Science
Faculty of Communication & Information Sciences
University of Ilorin, Ilorin, Nigeria.

ABSTRACT
Clustering is an automated search for hidden patterns in a dataset to unveil groups of related observations. The technique is a viable means of revealing the patterns or internal structure of data within a collection. Choosing the right algorithm to achieve clusters of good quality is usually a challenge, especially when the number of clusters cannot be pre-determined. This study evaluates selected clustering algorithms on their ability to find quality clusters in a data set. To this end, a prominent technique from each of the distance-based and distribution-based families, specifically the k-means and EM clustering algorithms respectively, is implemented. The data set on which the algorithms were implemented comprises 1,309 records of information on passengers who boarded a ship, retrieved from the RapidMiner open repository. Experiments were conducted and clusters were formed based on the chosen number of partitions, k. The quality of the clusters formed is measured using an external criterion, Normalized Mutual Information (NMI). The results show that the distance-based algorithm finds clusters of higher quality, with an NMI value of 0.912 out of a maximum achievable value of 1. The experiments further reveal the average execution time each algorithm takes to form the cluster model. The findings also offer useful insight into the choice of clustering algorithm as regards support for a particular data type and ease of execution.
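To make the external criterion concrete, the following is a minimal sketch of how an NMI score between a set of reference labels and a set of cluster assignments could be computed. It is not the authors' implementation. The abstract does not state which normalization was used, so the arithmetic-mean form, NMI = 2 I(U;V) / (H(U) + H(V)), is an assumption here, and the function names are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of one labelling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(reference, clusters):
    """Mutual information (in nats) between two labellings of the same records."""
    n = len(reference)
    joint = Counter(zip(reference, clusters))
    count_r = Counter(reference)
    count_c = Counter(clusters)
    mi = 0.0
    for (r, c), n_rc in joint.items():
        # p(r, c) * log( p(r, c) / (p(r) * p(c)) ), written with raw counts
        mi += (n_rc / n) * log(n_rc * n / (count_r[r] * count_c[c]))
    return mi


def nmi(reference, clusters):
    """NMI with arithmetic-mean normalization (an assumed variant)."""
    if len(reference) != len(clusters):
        raise ValueError("both labellings must cover the same records")
    h_r, h_c = entropy(reference), entropy(clusters)
    if h_r + h_c == 0:
        # Both labellings place every record in a single group.
        return 1.0
    return 2.0 * mutual_information(reference, clusters) / (h_r + h_c)


# Toy labels for illustration only; these are not the study's data.
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # identical grouping, renamed clusters
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # grouping unrelated to the reference
```

A score near 1 means the clusters reproduce the reference grouping regardless of how the cluster identifiers are named, while a score near 0 means the two groupings share almost no information.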


Keywords: clustering, data mining, k-means, EM clustering, unsupervised learning.


Contact Us

Managing Editor of IJSECS
Faculty of Computer Systems & Software Engineering (FSKKP)

Universiti Malaysia Pahang
Lebuhraya Tun Razak
26300 Gambang,
Kuantan, Pahang Darul Makmur.

Tel: +609 549 2133
Fax: +609 549 2144
Email: ijsecsfskkp@ump.edu.my
