Abstract: Increasingly large text datasets and the high dimensionality associated with natural language pose a major challenge for text mining. Initially, the researchers compared three types of document representation, Bag of Words (BoW), Bag of Nouns (BoN) and Bag of Phrases (BoP), and found that BoN and BoP perform better than BoW. When the corpus is small, BoP yields a significantly better F-measure than BoN and BoW. When the corpus is larger, however, BoP increases the dimensionality. For larger corpora, the BoN document representation works more efficiently in text document clustering than the other representations and also reduces the dimensionality. The researchers therefore used the BoN document representation. Nouns are checked against an ontology and extracted to construct the term-document matrix, which both reduces the dimensionality and adds semantics. The comparative study shows that the BoN document representation performs better than BoP. Learning algorithms have shown promising results in recent years. In this study, the researchers propose an ontology-based OHCLK-Means clustering algorithm. It significantly improves clustering quality over ontology-based K-means and ontology-based ONVK-means.
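The ontology-filtered Bag of Nouns step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy ontology, the function names and the sample documents are all hypothetical, and a simple dictionary lookup stands in for part-of-speech tagging and a real ontology. It only shows how restricting terms to ontology-backed nouns and merging synonyms into one concept might shrink the term-document matrix.

```python
from collections import Counter

# Hypothetical toy ontology (an assumption, not from the paper): maps noun
# surface forms to a concept label. A real system would combine a
# part-of-speech tagger with a domain ontology or a lexical database.
TOY_ONTOLOGY = {
    "car": "vehicle",
    "cars": "vehicle",
    "truck": "vehicle",
    "engine": "engine",
    "doctor": "physician",
    "physician": "physician",
    "hospital": "hospital",
}


def bag_of_nouns(text, ontology):
    """Keep only tokens found in the ontology and count their concepts."""
    tokens = (tok.strip(".,;:!?").lower() for tok in text.split())
    return Counter(ontology[tok] for tok in tokens if tok in ontology)


def term_document_matrix(docs, ontology):
    """Return the sorted concept vocabulary and a matrix with one row per
    concept and one column per document (raw counts)."""
    bags = [bag_of_nouns(doc, ontology) for doc in docs]
    vocab = sorted(set().union(*bags))
    matrix = [[bag[term] for bag in bags] for term in vocab]
    return vocab, matrix


# Hypothetical sample documents.
docs = [
    "The car has a powerful engine.",
    "The doctor drives cars and a truck to the hospital.",
    "A physician works quickly at the hospital.",
]

vocab, matrix = term_document_matrix(docs, TOY_ONTOLOGY)
for term, row in zip(vocab, matrix):
    print(term, row)
```

In this sketch, "car", "cars" and "truck" collapse into a single "vehicle" row, and non-noun tokens never enter the vocabulary, so the matrix has four rows instead of one row per distinct word. The resulting matrix could then be passed to any clustering algorithm; the OHCLK-Means procedure itself is not reproduced here because the abstract does not specify it.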
S. Vijayalakshmi and D. Manimegalai, 2013. Integrating Ontology to Enhance HCL-Based Text Document Clustering. Research Journal of Applied Sciences, 8: 358-368.