Abstract: As the volume of data and the number of data providers grow rapidly, there is a high demand for integrating data from heterogeneous data sources. In the real world, entities often have two or more representations, and data are not defined consistently across different data sources. When a user's query is answered by combining data from several databases, the returned results include duplicate entries. Duplicate detection techniques identify multiple representations of the same real-world entity. Without duplicate record detection, the quality of the extracted data remains low. This study presents an unsupervised duplicate record detection technique that requires neither expert knowledge nor hand-coded rules. WordNet, a large lexical database, is used to match the entities.
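The idea of matching records through synonyms can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not detail. The small `SYNONYM_SETS` table is a hypothetical stand-in for WordNet synsets, and the helper names `canonical`, `normalize`, and `find_duplicates`, as well as the choice of exact matching after normalization, are assumptions made only for illustration.

```python
# Hypothetical stand-in for WordNet synsets: each set groups words
# that are treated as synonyms. A real system would query WordNet instead.
SYNONYM_SETS = [
    {"car", "automobile", "auto"},
    {"doctor", "physician"},
    {"street", "road"},
]


def canonical(token):
    """Map a token to one fixed representative of its synonym set."""
    token = token.lower()
    for group in SYNONYM_SETS:
        if token in group:
            return min(group)  # alphabetically first word, so the choice is deterministic
    return token


def normalize(record):
    """Convert a record (tuple of field strings) into a tuple of canonical token sets."""
    return tuple(
        frozenset(canonical(t) for t in field.split()) for field in record
    )


def find_duplicates(records):
    """Return index pairs (first_seen, duplicate) of records that match
    field by field once synonyms are mapped to the same representative."""
    seen = {}
    pairs = []
    for i, rec in enumerate(records):
        key = normalize(rec)
        if key in seen:
            pairs.append((seen[key], i))
        else:
            seen[key] = i
    return pairs


records = [
    ("John Smith", "doctor"),
    ("john smith", "Physician"),
    ("Jane Roe", "car dealer"),
    ("Jane Roe", "automobile dealer"),
    ("Jane Roe", "teacher"),
]
duplicate_pairs = find_duplicates(records)
```

In this sketch no labelled training data or hand-written matching rules are needed: two records are flagged as duplicates only because the synonym table maps their differing words to the same representative.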
K. Amshakala and R. Nedunchezhian, 2013. Synonym Based Duplicate Record Detection. Asian Journal of Information Technology, 12: 236-241.