Journal of Engineering and Applied Sciences

Year: 2017
Volume: 12
Issue: 13
Page No. 3534 - 3540

Enhanced Feature for Short Document Classification

Authors : Ali Abdulkadhim Hasan, Sabrina Tiun, Maryati Mohd Yusof, Umi Asma` Mokhtar and Dian Indrayani Jambari

Abstract: Nowadays, the use of short text has increased dramatically, and many applications rely on it, such as mobile messaging, breaking news, social media and queries. The key challenge of short text lies in the limited context information that can be acquired from it. This limitation increases both the sparsity and the ambiguity of the text. Traditional approaches used for conventional text, such as bag-of-words, seem insufficient because too little information can be extracted from short text. This leads to the loss of semantic knowledge and of the semantic relations between the words within the short text. Hence, this study proposes a new feature selection method based on Interesting Term Count (ITC) with external knowledge from WordNet and a new weight (di) that identifies the variation between classes on the basis of ITC. The proposed feature selection approach aims to identify frequent terms without losing their semantics: WordNet is used to provide the semantic correspondences among the words within the short text. Three classification methods were used: support vector machine, J48 and Naive Bayes. The evaluation was performed by applying the three classifiers with and without the proposed feature selection method. Experimental results show that the classifiers perform better with the proposed feature selection method. This suggests the effectiveness of using the proposed ITC with external knowledge for short text classification.
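
The abstract does not give the formula for ITC or for the weight di, so the following is only an illustrative sketch of the general idea (synonym-aware term counting per class, then ranking terms by how much their counts vary between classes). The synonym table `TOY_SYNONYMS` is a hypothetical stand-in for WordNet lookups, and the spread-based weight in `class_variation_weights` is an assumption, not the authors' definition.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for WordNet: maps a word to one canonical synonym.
TOY_SYNONYMS = {"film": "movie", "cinema": "movie",
                "match": "game", "fixture": "game"}


def normalize(text):
    """Lowercase, split on whitespace, and map tokens to canonical synonyms."""
    return [TOY_SYNONYMS.get(tok, tok) for tok in text.lower().split()]


def term_counts_by_class(docs, labels):
    """For each class, count how many documents contain each normalized term."""
    counts = defaultdict(Counter)
    for text, label in zip(docs, labels):
        counts[label].update(set(normalize(text)))
    return counts


def class_variation_weights(counts):
    """Assumed weight: spread (max - min) of a term's count across classes."""
    vocab = set().union(*counts.values())
    return {
        term: max(c[term] for c in counts.values())
        - min(c[term] for c in counts.values())
        for term in vocab
    }


def select_features(docs, labels, top_k):
    """Return the top_k terms by weight, ties broken alphabetically."""
    weights = class_variation_weights(term_counts_by_class(docs, labels))
    ranked = sorted(weights, key=lambda term: (-weights[term], term))
    return ranked[:top_k]


docs = ["great film tonight", "new cinema release",
        "big match tonight", "cup fixture draw"]
labels = ["entertainment", "entertainment", "sport", "sport"]
print(select_features(docs, labels, top_k=2))
```

In this toy data, "film" and "cinema" are merged into one term before counting, so the merged term appears in both documents of one class and in none of the other, while a word such as "tonight" that occurs equally in both classes receives a weight of zero. The selected terms could then be used as features for any of the three classifiers named in the abstract.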

How to cite this article:

Ali Abdulkadhim Hasan, Sabrina Tiun, Maryati Mohd Yusof, Umi Asma` Mokhtar and Dian Indrayani Jambari, 2017. Enhanced Feature for Short Document Classification. Journal of Engineering and Applied Sciences, 12: 3534-3540.
