Abstract: This paper presents a hybrid machine learning approach to extracting information from the WWW. It applies structure analysis to improve extraction accuracy, achieving 96.5% average precision and 96.7% average recall on static web pages, and 100% precision and recall on dynamic web pages. Furthermore, the working time is short (< 800 ms) and the number of learning examples is small (< 4), owing to the low level of user participation. Our results show that this approach is fast, convenient and highly accurate, and so meets the requirements of practical applications.
Kun Yu, Zhi Cai, Xufa Wang and Qingsheng Cai, 2005. A Hybrid Machine Learning Approach for Extracting Information from WWW. Asian Journal of Information Technology, 4: 41-48.