Research Journal of Applied Sciences

Year: 2013
Volume: 8
Issue: 7
Page No. 358 - 368

Integrating Ontology to Enhance HCL-Based Text Document Clustering

Authors : S. Vijayalakshmi and D. Manimegalai

Abstract: Increasingly large text datasets and the high dimensionality associated with natural language are a major challenge in text mining. The researchers first compared three types of document representation, Bag of Words (BoW), Bag of Nouns (BoN) and Bag of Phrases (BoP), and found that BoN and BoP perform better than BoW. BoP yields a significantly better F-measure than BoN and BoW when the corpus is small. With a larger corpus, however, BoP increases the dimensionality, whereas BoN works efficiently and keeps the dimensionality lower in text document clustering. The researchers therefore use the BoN document representation. Nouns are checked against an ontology and the matching nouns are extracted to construct the term-document matrix, which both reduces the dimensionality and adds semantics. The comparative study shows that the BoN document representation performs better than BoP. Learning algorithms have given promising results in recent years. In this study, the researchers propose an ontology-based OHCLK-Means clustering algorithm. It significantly improves clustering quality over ontology-based K-means and ontology-based ONVK-means.
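The ontology-filtered Bag of Nouns step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy ontology, the pre-tagged documents and the helper names `ontology_nouns` and `term_document_matrix` are all hypothetical, and part-of-speech tagging is assumed to have been done beforehand.

```python
from collections import Counter

# Hypothetical toy ontology: a flat set of concept terms (assumption, not from the paper).
ONTOLOGY_TERMS = {"cluster", "document", "ontology", "algorithm"}

# Hypothetical pre-tagged documents: lists of (token, part-of-speech) pairs.
DOCS = [
    [("ontology", "NOUN"), ("improves", "VERB"), ("document", "NOUN"), ("banana", "NOUN")],
    [("cluster", "NOUN"), ("document", "NOUN"), ("quickly", "ADV"), ("document", "NOUN")],
]


def ontology_nouns(tagged_doc, ontology):
    """Keep only tokens tagged as nouns that also appear in the ontology."""
    return [
        token.lower()
        for token, pos in tagged_doc
        if pos == "NOUN" and token.lower() in ontology
    ]


def term_document_matrix(tagged_docs, ontology):
    """Build a term-by-document count matrix from ontology-matched nouns."""
    counts = [Counter(ontology_nouns(doc, ontology)) for doc in tagged_docs]
    vocab = sorted(set().union(*counts))
    # One row per term, one column per document.
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


vocab, matrix = term_document_matrix(DOCS, ONTOLOGY_TERMS)
for term, row in zip(vocab, matrix):
    print(term, row)
```

In this toy run the noun "banana" is dropped because it is absent from the assumed ontology, which shows how the filter shrinks the vocabulary before clustering.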

How to cite this article:

S. Vijayalakshmi and D. Manimegalai, 2013. Integrating Ontology to Enhance HCL-Based Text Document Clustering. Research Journal of Applied Sciences, 8: 358-368.
