Asian Journal of Information Technology
Year: 2011 | Volume: 10 | Issue: 8 | Page No.: 341-347
DOI: 10.3923/ajit.2011.341.347  
The Use of Hartigan Index for Initializing K-Means++ in Detecting Similar Texts of Clustered Documents as a Plagiarism Indicator
Diana Purwitasari, I. Wayan Surya Priantara, Putu Yuwono Kusmawan, Umi Laili Yuhana and Daniel Oranova Siahaan
 
Abstract: Plagiarism is increasingly alarming, especially in the field of education. Many written works contain parts that were plagiarized from other people's works. Similar sentences, as an indicator of plagiarism, can be detected with Winnowing, an n-gram-based hashing algorithm. Winnowing generates a document fingerprint by converting the text of a document into a collection of hash values. Documents that share fingerprint values contain similar text, which serves as an indicator of plagiarism. Because plagiarism usually occurs between documents on similar topics, documents with similar topics are clustered before plagiarism is detected. K-means++, the clustering algorithm used, requires the number of clusters as its input, and the Hartigan index is used to recommend that number. After the documents were clustered, the fingerprint of a document was compared with the cluster fingerprints instead of with every individual document. The document was then compared only with the members of the closest cluster selected in that first comparison.
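The fingerprinting step described in the abstract can be illustrated with a minimal sketch of Winnowing: hash every character k-gram, slide a window of w consecutive hashes, and keep the minimum hash of each window. This sketch assumes lowercase alphanumeric normalization, CRC32 (`zlib.crc32`) as a deterministic stand-in hash, and Jaccard similarity between fingerprint sets; the helper names and the defaults k = 5 and w = 4 are hypothetical choices, not the parameters used by the authors.

```python
import re
import zlib


def normalize(text: str) -> str:
    """Assumed preprocessing: lowercase and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every character k-gram with CRC32 (a deterministic stand-in hash)."""
    return [zlib.crc32(text[i:i + k].encode("utf-8")) for i in range(len(text) - k + 1)]


def winnow(hashes: list[int], w: int) -> set[tuple[int, int]]:
    """Select the minimum hash in each window of w consecutive hashes.

    Ties are broken by taking the rightmost minimum, and each selected
    (hash, position) pair is recorded only once.
    """
    selected: set[tuple[int, int]] = set()
    if not hashes:
        return selected
    w = min(w, len(hashes))  # short texts: use a single window over all hashes
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        min_val = min(window)
        offset = w - 1 - window[::-1].index(min_val)  # rightmost minimum
        selected.add((min_val, start + offset))
    return selected


def fingerprint(text: str, k: int = 5, w: int = 4) -> set[int]:
    """Hypothetical document fingerprint: the set of hash values kept by winnowing."""
    return {h for h, _ in winnow(kgram_hashes(normalize(text), k), w)}


def jaccard(a: set[int], b: set[int]) -> float:
    """Share of fingerprint hashes that two documents have in common."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

For example, `jaccard(fingerprint(doc_a), fingerprint(doc_b))` is higher when two texts share long passages. With these parameters, any shared normalized substring of at least w + k - 1 = 8 characters is guaranteed to contribute at least one common hash.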
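Recommending the number of clusters with the Hartigan index can be sketched as follows. This sketch assumes documents are already represented as numeric vectors, uses H(k) = (W(k) / W(k+1) - 1) * (n - k - 1), where W(k) is the within-cluster sum of squares of a k-means++ run with k clusters and n is the number of documents, and applies Hartigan's rule of thumb of adding clusters while H(k) > 10. The stopping rule, the best-of-5 restarts and all function names are assumptions for illustration; the paper's exact selection criterion may differ.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new centre is drawn with probability
    proportional to its squared distance from the nearest chosen centre."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:  # every point already coincides with a centre
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centers


def kmeans(points, k, rng, max_iter=100):
    """Lloyd iterations from k-means++ seeds; returns (labels, centers, W(k))."""
    centers = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wss


def best_kmeans(points, k, rng, n_init=5):
    """Keep the run with the lowest within-cluster sum of squares."""
    return min((kmeans(points, k, rng) for _ in range(n_init)), key=lambda r: r[2])


def hartigan_index(points, k_max, seed=0):
    """H(k) = (W(k) / W(k+1) - 1) * (n - k - 1) for k = 1..k_max."""
    rng = random.Random(seed)
    n = len(points)
    w = {k: best_kmeans(points, k, rng)[2] for k in range(1, k_max + 2)}
    return {
        k: float("inf") if w[k + 1] == 0 else (w[k] / w[k + 1] - 1) * (n - k - 1)
        for k in range(1, k_max + 1)
    }


def choose_k(points, k_max, threshold=10.0, seed=0):
    """Rule of thumb: the smallest k whose H(k) no longer exceeds the threshold."""
    h = hartigan_index(points, k_max, seed)
    for k in range(1, k_max + 1):
        if h[k] <= threshold:
            return k
    return k_max  # assumption: cap at k_max if every H(k) exceeds the threshold
```

The recommended k is then passed to `best_kmeans` to produce the final clustering. Note that k_max + 1 must not exceed the number of documents, because W(k_max + 1) is needed to compute H(k_max).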
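The two-stage comparison can be sketched as below, taking precomputed fingerprints (sets of hash values) and cluster labels as input. The abstract does not define a cluster fingerprint, so this sketch assumes it is the union of the member fingerprints and scores overlap by containment, the share of the query's hashes that also occur in the other fingerprint. Both choices, the threshold of 0.5 and the function names are hypothetical.

```python
def containment(query: set[int], other: set[int]) -> float:
    """Fraction of the query's fingerprint hashes that also occur in `other`."""
    return len(query & other) / len(query) if query else 0.0


def cluster_fingerprints(doc_fps: list[set[int]], labels: list[int]) -> dict[int, set[int]]:
    """Hypothetical cluster fingerprint: the union of its members' fingerprints."""
    clusters: dict[int, set[int]] = {}
    for fp, lab in zip(doc_fps, labels):
        clusters.setdefault(lab, set()).update(fp)
    return clusters


def detect(query_fp: set[int], doc_fps: list[set[int]], labels: list[int],
           threshold: float = 0.5) -> tuple[int, list[tuple[int, float]]]:
    """Return the closest cluster and its members whose overlap meets the threshold."""
    clusters = cluster_fingerprints(doc_fps, labels)
    # Stage 1: compare the query with cluster fingerprints only.
    best = max(clusters, key=lambda lab: containment(query_fp, clusters[lab]))
    # Stage 2: compare the query with the members of the closest cluster.
    hits = []
    for i, (fp, lab) in enumerate(zip(doc_fps, labels)):
        if lab == best:
            score = containment(query_fp, fp)
            if score >= threshold:
                hits.append((i, score))
    return best, sorted(hits, key=lambda t: -t[1])
```

With c clusters, a query is compared with c cluster fingerprints plus the members of a single cluster, instead of with every document in the collection.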
 
How to cite this article:
Diana Purwitasari, I. Wayan Surya Priantara, Putu Yuwono Kusmawan, Umi Laili Yuhana and Daniel Oranova Siahaan, 2011. The Use of Hartigan Index for Initializing K-Means++ in Detecting Similar Texts of Clustered Documents as a Plagiarism Indicator. Asian Journal of Information Technology, 10: 341-347.
DOI: 10.3923/ajit.2011.341.347
URL: http://medwelljournals.com/abstract/?doi=ajit.2011.341.347