Asian Journal of Information Technology

Year: 2011
Volume: 10
Issue: 7
Page No. 259 - 270

Evaluation of the Implementation of Indonesian Electronic Journals Citation System Using Regex Technique and PDF Extraction Tool

Authors: Riri Fitri Sari and Agung Kurniawan

Abstract: Research papers produced by researchers worldwide build on previous academic publications written by other researchers. Many research papers published in electronic and new media also refer to previous publications. Advances in technology have made the internet the most widely used medium. Research papers are published in many formats, such as journal articles. Relations among journals can be traced through their citations, and the number of citations to a journal study can be counted to show the contribution of that particular study. In order to identify the relations among journal articles published on the internet, a system was designed that automatically produces relationship information between articles from different journals hosted on different websites. In this research we therefore created a mashup that extracts the web pages and then picks the required files automatically. The system builds a database to store the extracted files and then finds the relations between them. The results of the process are shown in a web portal whose interface supports searching by keywords entered by users. We created an automatic extraction system for Indonesian electronic journals using data from fourteen university e-journal sites. We built the system using the PHP language and a MySQL database, after carefully studying the algorithm in Openkapow Robomaker. The system can successfully extract information from journal providers' web pages, including a special type of PDF page, and save it in the database. The resulting system shows the connections and relations among the journals. The tests evaluate processing time and memory usage for a random number of files. The evaluation results show that the execution time depends on the number of journal series, volumes and articles on the related e-journal sites.
The system has been complemented with user interface functionality that reports the total number of journal articles extracted automatically from the different sites. Several approaches, such as the DOM tree, regular expression techniques and PDF extraction tools, have been used to improve the system in extracting web pages and obtaining full journal articles for processing.
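
The regular expression step described above can be illustrated with a minimal sketch. It is written in Python for brevity rather than the PHP used by the authors, and it is not their implementation: the reference format (numbered entries such as "[1] ..." under a "References" heading), the title-matching rule, and the names `extract_references`, `normalize` and `find_citations` are all assumptions made for illustration. The input is assumed to be plain text already produced by a PDF extraction tool.

```python
import re

# Assumed reference format: one numbered entry per line, e.g. "[1] A. Author, Title, 2009."
REF_ENTRY = re.compile(r"^\[(\d+)\]\s+(.+)$", re.MULTILINE)
# Assumed section heading: a line containing only the word "References".
REF_HEADING = re.compile(r"^\s*references\s*$", re.IGNORECASE | re.MULTILINE)


def extract_references(text):
    """Return the reference strings found after the 'References' heading."""
    parts = REF_HEADING.split(text, maxsplit=1)
    if len(parts) < 2:
        return []  # no reference section found
    return [m.group(2).strip() for m in REF_ENTRY.finditer(parts[1])]


def normalize(s):
    """Lowercase and collapse punctuation so titles compare loosely."""
    return re.sub(r"[^a-z0-9]+", " ", s.lower()).strip()


def find_citations(articles):
    """Map of title -> extracted text in, list of (citing, cited) title pairs out."""
    links = []
    for citing, text in articles.items():
        refs = [normalize(r) for r in extract_references(text)]
        for cited in articles:
            # Hypothetical rule: a citation exists if the cited title
            # appears inside one of the citing article's references.
            if cited != citing and any(normalize(cited) in r for r in refs):
                links.append((citing, cited))
    return links


if __name__ == "__main__":
    sample = {
        "Web Mashup Design": (
            "Intro text\nReferences\n"
            "[1] A. Author, Regex Based Extraction, 2009.\n"
        ),
        "Regex Based Extraction": (
            "Body\nReferences\n"
            "[1] B. Writer, Some Other Work, 2005.\n"
        ),
    }
    for citing, cited in find_citations(sample):
        print(f"{citing} cites {cited}")
```

Matching on normalized titles is only one possible linking rule; a production system would likely need fuzzier matching and would store the resulting pairs in a database table rather than a list.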

How to cite this article:

Riri Fitri Sari and Agung Kurniawan, 2011. Evaluation of the Implementation of Indonesian Electronic Journals Citation System Using Regex Technique and PDF Extraction Tool. Asian Journal of Information Technology, 10: 259-270.
