Deep Learning Based Cursive Text Detection and Recognition in Natural Scene Images

Research Area:  Machine Learning


This thesis addresses the challenging problem of detecting and recognizing cursive text, specifically Urdu text, in natural scene images. Developing robust text spotting systems for such languages is more complex than for non-cursive languages such as English, mainly because of the complexities of the language itself and the additional challenges that cursive text poses in natural images. Different machine learning and deep learning-based methods have been proposed to detect and recognize Urdu text while considering variations in the text (different font sizes, colors, writing styles, aspect ratios) and environmental factors (blur, occlusion, uneven lighting, etc.). Three different deep learning approaches have been proposed throughout this research work. The first approach combines multi-scale and multi-level features of a Convolutional Neural Network (CNN) to recognize isolated Urdu character images. A nearest-neighbor interpolation method is proposed to normalize the spatial dimensions of the feature maps. An up-sampling and element-wise addition operation is used to aggregate the multi-scale features and pass them to a multi-level feature fusion network. Finally, the aggregated and multi-level features are combined to create a more powerful feature set. The proposed method improves Urdu character recognition accuracy when compared to sequential CNN and machine learning methods.
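
The aggregation step of the first approach (bringing feature maps to a common spatial size by nearest interpolation, then adding them element-wise) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `upsample_nearest` and `aggregate_multiscale`, the single-channel maps, and the toy values are all assumptions made for illustration, and a real CNN would apply the same idea per channel to learned feature maps.

```python
# Illustrative sketch only: names, sizes, and values are hypothetical,
# not taken from the thesis.

def upsample_nearest(fmap, out_h, out_w):
    """Resize a 2-D feature map (list of rows) to out_h x out_w.

    Each output cell copies the value of the input cell it maps onto
    (nearest-neighbor lookup with floor indexing).
    """
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[(i * in_h) // out_h][(j * in_w) // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def aggregate_multiscale(fmaps):
    """Up-sample every map to the largest spatial size, then add element-wise."""
    out_h = max(len(m) for m in fmaps)
    out_w = max(len(m[0]) for m in fmaps)
    total = [[0.0] * out_w for _ in range(out_h)]
    for m in fmaps:
        up = upsample_nearest(m, out_h, out_w)
        for i in range(out_h):
            for j in range(out_w):
                total[i][j] += up[i][j]
    return total


# A fine 4x4 map (as from an early layer) and a coarse 2x2 map (as from a deeper layer).
fine = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
coarse = [
    [10.0, 20.0],
    [30.0, 40.0],
]

fused = aggregate_multiscale([fine, coarse])
```

In this sketch each coarse value is repeated over a 2x2 block before the addition, so the fused map keeps the resolution of the fine map while carrying information from the coarser scale.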

Name of the Researcher:  Asghar Ali

Name of the Supervisor(s):  Professor Mark R. Pickering

Year of Completion:  2020

University:  The University of New South Wales

