Abstract: One of the biggest issues affecting the performance of Information Retrieval (IR) systems is the difficulty users face in defining their information needs precisely, because the need itself often reflects a gap in their knowledge. This issue is more problematic for classical and literary documents such as the Holy Quran, whose text is ambiguous and complex. As a result, users need to reformulate and refine their queries to match their information needs. One approach to overcoming this issue is pseudo-relevance feedback, which assumes that a small number of top-ranked documents in the initial retrieval results are relevant and selects related terms from these documents to improve the query representation through query expansion. The classic Rocchio algorithm has been widely used to support query reformulation in pseudo-relevance feedback. In this research, a modified Rocchio algorithm is proposed that takes into account term selection and query importance. It combines Term Frequency-Inverse Document Frequency (TF-IDF) weights with the Rocchio algorithm weights to generate a new query, and it uses term frequency to choose suitable expansion words. The proposed algorithm was evaluated against the probabilistic IR model implemented in the Lucene toolkit and against the WordNet query expansion approach. The experiments considered only two iterations of pseudo-relevance feedback. The evaluation used a Quranic dataset previously used by other researchers, with twelve queries. The results of the experiments showed that the proposed method exhibits a significant improvement in recall and precision.
The average precision with pseudo-relevance feedback was 8.3% for the first iteration and 11.3% for the second iteration, whereas the average precision was 3.3% for Lucene and 2.7% for WordNet query expansion. These results indicate that the proposed method improves retrieval performance.
Yasir Hadi Farhan and Shahrul Azman Mohd. Noah, 2017. Psuedo Relevance Feeback for Literary Documents. Asian Journal of Information Technology, 16: 599-604.
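The pseudo-relevance feedback loop described in the abstract can be illustrated with a minimal sketch of the classic Rocchio update over TF-IDF vectors. This is not the authors' modified algorithm, whose exact weighting is not given here. The toy corpus, the function names, the parameters `alpha`, `beta`, `k` and `n_terms`, the smoothed IDF formula, and the choice to pick expansion terms by their centroid weight are all assumptions made for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer; a real system would also stem and remove stopwords.
    return text.lower().split()

def build_idf(docs):
    # Smoothed inverse document frequency (an assumed variant, not the paper's).
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

def tfidf(text, idf):
    # Sparse TF-IDF vector as a dict; terms unseen in the corpus are dropped.
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(query_vec, doc_vecs):
    # Document indices sorted by decreasing cosine score, ties broken by index.
    scored = [(cosine(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]

def rocchio_prf(query_vec, doc_vecs, ranking, k=2, n_terms=2, alpha=1.0, beta=0.75):
    # Treat the top-k documents as relevant (the pseudo-relevance assumption).
    top = [doc_vecs[i] for i in ranking[:k]]
    centroid = Counter()
    for d in top:
        for t, w in d.items():
            centroid[t] += w / len(top)
    # Assumed selection rule: the n_terms new terms with the highest centroid weight.
    candidates = sorted((t for t in centroid if t not in query_vec),
                        key=lambda t: (-centroid[t], t))[:n_terms]
    # Classic Rocchio without a non-relevant component: q' = alpha*q + beta*centroid.
    new_query = {t: alpha * w for t, w in query_vec.items()}
    for t in list(query_vec) + candidates:
        new_query[t] = new_query.get(t, 0.0) + beta * centroid.get(t, 0.0)
    return new_query

# Hypothetical toy corpus, used only to exercise the functions above.
docs = [
    "prayer patience mercy",
    "prayer charity mercy",
    "charity wealth mercy",
    "night journey mosque",
]
idf = build_idf(docs)
doc_vecs = [tfidf(d, idf) for d in docs]
query_vec = tfidf("prayer", idf)

ranking = rank(query_vec, doc_vecs)                # initial retrieval
expanded = rocchio_prf(query_vec, doc_vecs, ranking)
reranked = rank(expanded, doc_vecs)                # retrieval after one feedback iteration
print(sorted(expanded), reranked)
```

In this toy setting the third document shares no term with the original query, so it scores zero at first; after expansion it can match through a term borrowed from the top-ranked documents. A second feedback iteration would repeat `rocchio_prf` with `expanded` and `reranked` as inputs.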