International Journal of Soft Computing

Year: 2008
Volume: 3
Issue: 1
Page No. 58 - 62

Similarity-Based Techniques for Text Document Classification

Authors: S. Senthamarai Kannan and N. Ramaraj

Abstract: In large-scale text classification, labeling a large number of training documents places a considerable burden on human experts, who must read each document and assign it to the appropriate categories. With this problem in mind, our goal was to develop a text categorization system that needs fewer labeled training examples to reach a given level of performance, using a similarity-based learning algorithm and thresholding strategies. Experimental results show that the proposed model is useful for building document categorization systems. The system was designed as a small-scale implementation, given the size of the corpus used. It can be extended to larger data sets, and its efficiency can then be evaluated against currently available methods such as SVM and naive Bayes. Overall, the approach concentrates on categorizing documents at a small scale and performs the assigned task completely.
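The abstract names a similarity-based learning algorithm combined with thresholding strategies but does not give their details. The following is a minimal illustrative sketch of that general idea, not the authors' method. It assumes bag-of-words term frequencies, cosine similarity against per-category centroids, and a single fixed score threshold. The function names, the threshold value of 0.3, and the toy training data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term-frequency vector for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labeled_docs):
    """Sum the term vectors of each category's labeled documents."""
    centroids = defaultdict(Counter)
    for text, category in labeled_docs:
        centroids[category].update(vectorize(text))
    return dict(centroids)


def classify(text, centroids, threshold=0.3):
    """Return every category whose similarity to the document meets the threshold.

    The fixed threshold is an assumption made for illustration; the abstract
    does not state which thresholding strategy the authors used.
    """
    doc = vectorize(text)
    scores = {cat: cosine(doc, centroid) for cat, centroid in centroids.items()}
    return sorted(cat for cat, score in scores.items() if score >= threshold)


# Toy training set, invented for illustration only.
training = [
    ("the team won the football match", "sports"),
    ("the player scored a goal in the match", "sports"),
    ("the bank raised interest rates", "finance"),
    ("stock market shares fell at the bank", "finance"),
]
centroids = train_centroids(training)
assigned = classify("the striker scored a late goal", centroids)
unassigned = classify("weather forecast rain", centroids)
```

Because a category is assigned only when its score clears the threshold, a document may receive several categories or none at all. In practice the threshold would be tuned on held-out labeled data rather than fixed by hand.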

How to cite this article:

S. Senthamarai Kannan and N. Ramaraj, 2008. Similarity-Based Techniques for Text Document Classification. International Journal of Soft Computing, 3: 58-62.
