Abstract: Text mining techniques typically compute the frequency of a term (a word or phrase) to estimate its importance in a document. However, two terms can have the same frequency in their documents while one contributes more to the meaning of its sentences than the other. This study proposes a novel concept-based mining model. The model captures the semantic structure of each term within its sentence and document, rather than only the term's frequency within the document. It computes three measures that analyze concepts at the sentence, document and corpus levels. Each sentence is labelled by a semantic role labeller, which identifies the terms that contribute to the sentence semantics along with their semantic roles in the sentence. Each term that has a semantic role in the sentence is called a concept. Concepts can be either words or phrases and depend entirely on the semantic structure of the sentence. When a new document is introduced to the system, the model scans it, extracts its concepts and detects the concepts that match those in every previously processed document in the data set. A new concept-based similarity measure that uses the concept analysis at the sentence, document and corpus levels is also proposed.
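The concept-matching step described in the abstract can be illustrated with a minimal sketch. It assumes each document has already been reduced by a semantic role labeller to a list of sentences, each a list of concept strings; the labeller itself is not run here. The function names (`concept_counts`, `matching_concepts`, `corpus_document_frequency`) and the sample data are hypothetical, and the counts shown are plain frequencies, not the three measures or the similarity measure defined in the paper.

```python
from collections import Counter

def concept_counts(doc):
    """Count how often each concept occurs in one document.

    `doc` is a list of sentences; each sentence is a list of concept
    strings (assumed output of a semantic role labeller)."""
    return Counter(concept for sentence in doc for concept in sentence)

def matching_concepts(new_doc, processed_docs):
    """For each previously processed document, list the concepts it
    shares with the new document (sorted for stable output)."""
    new_concepts = set(concept_counts(new_doc))
    return [sorted(new_concepts & set(concept_counts(doc)))
            for doc in processed_docs]

def corpus_document_frequency(docs):
    """Count in how many documents each concept appears."""
    df = Counter()
    for doc in docs:
        df.update(set(concept_counts(doc)))
    return df

# Hypothetical, hand-labelled data for illustration only.
processed = [
    [["text mining", "term frequency"], ["semantic role"]],
    [["clustering", "term frequency"]],
]
new_doc = [["term frequency", "semantic role"], ["concept"]]

matches = matching_concepts(new_doc, processed)
df = corpus_document_frequency(processed + [new_doc])
```

Here `matches` holds one list of shared concepts per processed document, and `df` gives a corpus-level count that a similarity measure could use to weight rare concepts more heavily; how the paper combines the sentence, document and corpus levels is not specified in the abstract.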
A. Ronald Tony and D. Saravanan, 2015. Text Taxonomy Using Datamining Clustering System. Asian Journal of Information Technology, 14: 97-104.