Asian Journal of Information Technology

Year: 2013
Volume: 12
Issue: 7
Page No. 236 - 241

Synonym Based Duplicate Record Detection

Authors : K. Amshakala and R. Nedunchezhian

Abstract: As the volume of data and the number of data providers grow tremendously, there is a high demand for integrating data from heterogeneous data sources. In the real world, entities often have two or more representations, and data are not defined consistently across different data sources. When a user's query is answered, the results are built by combining data from several databases, and they can contain duplicate entries. Duplicate detection techniques identify multiple representations of the same real-world entity; without them, the quality of the extracted data remains low. This study presents an unsupervised duplicate record detection technique that requires neither expert knowledge nor hand-coded rules to detect duplicate records. WordNet, a large lexical database, is used as an ontology to match the entities.
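The general idea of synonym-based matching can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a small hypothetical synonym table (`SYNONYM_GROUPS`) standing in for WordNet, maps every token to a canonical synonym representative, and flags record pairs whose normalized token sets have a Jaccard similarity at or above an assumed `threshold` of 0.8. The names `normalize`, `jaccard`, and `find_duplicates` are illustrative only.

```python
from itertools import combinations

# Hypothetical stand-in for WordNet: each set lists words treated as synonyms.
# In practice such groups could be derived from WordNet synsets (e.g. via NLTK).
SYNONYM_GROUPS = [
    {"car", "automobile", "auto"},
    {"street", "st", "road"},
    {"company", "firm", "corporation"},
]


def build_canonical_map(groups):
    """Map every word to one representative: the alphabetically first group member."""
    canonical = {}
    for group in groups:
        rep = min(group)
        for word in group:
            canonical[word] = rep
    return canonical


CANONICAL = build_canonical_map(SYNONYM_GROUPS)


def normalize(record):
    """Lower-case, split on whitespace, and replace each token by its synonym representative."""
    return {CANONICAL.get(token, token) for token in record.lower().split()}


def jaccard(a, b):
    """Jaccard similarity of two token sets (two empty sets count as identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_duplicates(records, threshold=0.8):
    """Return index pairs whose synonym-normalized token sets are similar enough."""
    normalized = [normalize(r) for r in records]
    return [
        (i, j)
        for i, j in combinations(range(len(records)), 2)
        if jaccard(normalized[i], normalized[j]) >= threshold
    ]


records = [
    "Acme Car Company Main Street",
    "acme automobile firm main road",
    "Zenith Software Corporation Park Avenue",
]
print(find_duplicates(records))  # [(0, 1)]
```

The first two records share no surface spelling for "car/automobile", "company/firm", or "street/road", yet they collapse to the same token set once synonyms are normalized. A real WordNet lookup would also have to deal with polysemous words that belong to several synsets, which this sketch ignores.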

How to cite this article:

K. Amshakala and R. Nedunchezhian, 2013. Synonym Based Duplicate Record Detection. Asian Journal of Information Technology, 12: 236-241.
