Authors : N. Karthikeyani Visalakshi and K. Thangavel
Abstract: Distributed clustering is an emerging research area in the broader field of Knowledge discovery in databases. Normalization is an essential preprocessing step in data mining, to standardize values of all attributes or features from different dynamic range into a specified range. In this study, distributed K-Means clustering algorithm is extended by applying global normalization before performing the clustering on distributed datasets, without necessarily downloading all the data into a single site. The performance of proposed normalization based distributed K-Means clustering algorithm is compared against distributed K-Means clustering algorithm and normalization based centralized K-Means clustering algorithm. The quality of clustering is also compared by three normalization procedures, namely Min-max, Z-score and decimal scaling for the proposed distributed clustering algorithm. The comparative analysis shows that the distributed clustering results depend on the type of normalization procedure. The experiments are carried out for various numerical datasets of UCI machine learning data repository.
N. Karthikeyani Visalakshi and K. Thangavel, 2009. Impact of Normalization in Distributed K-Means Clustering. International Journal of Soft Computing, 4: 168-172.