2015 13th International Conference on Frontiers of Information Technology (FIT) (2015)
Islamabad, Pakistan
Dec. 14, 2015 to Dec. 16, 2015
ISBN: 978-1-4673-9665-3
pp: 334-340
ABSTRACT
Urdu is gaining popularity as a language because many people around the world, e.g., in India, Pakistan, and Bangladesh, speak and understand it. Like other languages, e.g., Latin, Chinese, Japanese, Persian, and Arabic, Urdu is under consideration by the research community for developing Optical Character Recognition (OCR) systems. Like Arabic, Urdu script comes in a number of fonts, e.g., Nasakh, Nastalique, and Noori. The presented work uses an analytical approach to recognize machine-written Urdu Nastalique script. The methodology includes three major modules: (1) Preprocessing, which applies binarization and filtering to the input image; (2) Main Process, which includes the sub-phases Line Segmentation, Baseline Detection, Thinning, Segmentation, Smoothing, and Dot Recognition on the preprocessed image; and (3) Recognition, which normalizes the processed image to a standard size of 50x32 and builds a row vector of 1600 elements in row-major order. Finally, a Feed Forward Neural Network recognizes the processed input image as one of 271 ligature classes. The neural network has 1600 neurons in the input layer, 60 hidden neurons, and 271 output neurons. The methodology is evaluated on 10 images, 69 lines, and 1292 ligatures. The overall recognition rate is 87%.
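The Recognition module described in the abstract (normalize to 50x32, flatten to 1600 values in row-major order, classify with a 1600-60-271 feed-forward network) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the nearest-neighbour resize, the reading of 50x32 as 50 rows by 32 columns, the sigmoid and softmax activations, the random untrained weights, and all function names are assumptions made for illustration only.

```python
import numpy as np

# Sizes taken from the abstract; treating 50x32 as rows x columns is an assumption.
ROWS, COLS = 50, 32
N_INPUT, N_HIDDEN, N_OUTPUT = ROWS * COLS, 60, 271


def normalize(img, rows=ROWS, cols=COLS):
    """Resize a 2-D binary image to rows x cols (nearest neighbour, assumed)."""
    img = np.asarray(img)
    r_idx = np.arange(rows) * img.shape[0] // rows
    c_idx = np.arange(cols) * img.shape[1] // cols
    return img[np.ix_(r_idx, c_idx)]


def to_row_vector(img):
    """Flatten the image in row-major (C) order into a 1-D float vector."""
    return np.asarray(img).reshape(-1, order="C").astype(float)


def init_network(seed=0):
    """Random, untrained weights for a 1600-60-271 network (illustrative only)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.01, (N_INPUT, N_HIDDEN)),
        "b1": np.zeros(N_HIDDEN),
        "W2": rng.normal(0.0, 0.01, (N_HIDDEN, N_OUTPUT)),
        "b2": np.zeros(N_OUTPUT),
    }


def forward(params, x):
    """One forward pass: sigmoid hidden layer, softmax output (both assumed)."""
    hidden = 1.0 / (1.0 + np.exp(-(x @ params["W1"] + params["b1"])))
    logits = hidden @ params["W2"] + params["b2"]
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


# Hypothetical segmented ligature: a filled rectangle on a 73x41 canvas.
ligature = np.zeros((73, 41), dtype=np.uint8)
ligature[20:50, 10:30] = 1

x = to_row_vector(normalize(ligature))
probs = forward(init_network(), x)
predicted_class = int(np.argmax(probs))  # index into the 271 ligature classes
```

With untrained weights the predicted class is meaningless; the sketch only shows how the stated layer sizes and the row-major flattening fit together.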
INDEX TERMS
Optical character recognition software, Character recognition, Image segmentation, Feature extraction, Shape, Optical imaging, Image recognition, Recognition, Optical Character Recognition (OCR), Analytical Approach, Neural Network
CITATION
Sabahat Mir, Safdar Zaman, Muhammad Waqas Anwar, "Printed Urdu Nastalique Script Recognition Using Analytical Approach", 2015 13th International Conference on Frontiers of Information Technology (FIT), vol. 00, no. , pp. 334-340, 2015, doi:10.1109/FIT.2015.65