Research Area:  Machine Learning
Advertisement identification and filtering in web pages have gained significance because of factors such as accessibility, security, privacy, and obtrusiveness. Current practice relies on maintaining filter lists, which are sets of URL-based regular expressions. Each URL encountered on a web page is matched against the filter list. While effective, this procedure scales poorly because the filter list demands regular maintenance. To counter these limitations, we devise a machine learning based advertisement detection system that uses a diverse feature set to distinguish advertisement blocks from non-advertisement blocks. The method can serve as a base for accessibility-related features such as smooth browsing and text summarization for persons with visual impairments, cognitive impairments, and photosensitive epilepsy. A classifier trained on the proposed feature set achieves 98.6% accuracy in identifying advertisements.
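As an illustration of the filter-list baseline the abstract describes (not the paper's machine learning method), the following minimal Python sketch matches each URL found on a page against a list of URL-based regular expressions. The patterns and the names `FILTER_LIST`, `is_ad_url`, and `filter_urls` are hypothetical and chosen only for this example; real filter lists are far larger, which is the maintenance burden the abstract points to.

```python
import re

# Hypothetical filter list: URL-based regular expressions.
# These patterns are illustrative only; they are not taken from the paper
# or from any real filter list.
FILTER_LIST = [
    re.compile(r"^https?://ads\."),   # hosts whose name starts with "ads."
    re.compile(r"/banners?/"),        # paths containing /banner/ or /banners/
    re.compile(r"[?&]ad_id="),        # query strings carrying an ad_id parameter
]


def is_ad_url(url: str) -> bool:
    """Return True if the URL matches any pattern in the filter list."""
    return any(pattern.search(url) for pattern in FILTER_LIST)


def filter_urls(urls):
    """Split the URLs found on a page into (blocked, allowed) lists."""
    blocked = [u for u in urls if is_ad_url(u)]
    allowed = [u for u in urls if not is_ad_url(u)]
    return blocked, allowed
```

Every new advertisement URL scheme requires a new pattern in `FILTER_LIST`, so the list has to be revised continually to stay useful.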
Author(s) Name:  Ab Shaqoor Nengroo and K.S. Kuppusamy
Journal name:  Future Generation Computer Systems
Publisher name:  ELSEVIER
Volume Information:  Volume 89, December 2018, Pages 68-77
Paper Link:  https://www.sciencedirect.com/science/article/abs/pii/S0167739X17328777