Line and Ligature Segmentation of Urdu Nastaleeq Text
Similar resources
Line and Ligature Segmentation in Printed Urdu Document Images
This paper presents a technique for segmentation of printed Urdu text images into lines and ligatures, a key pre-processing step in Urdu Optical Character Recognition (OCR) systems. Unlike classical projection profile based line segmentation methods, the proposed scheme successfully segments overlapping and touching lines. Once the lines are segmented, ligatures are extracted from each text lin...
Arabic & Urdu Text Segmentation Challenges & Techniques
Text segmentation is one of the critical steps in an OCR system for any language, because the accuracy of OCR depends on correctly segmented characters. Segmentation divides the text image into its constituent parts (i.e., lines, components or words, and individual characters). As Urdu and Arabic are highly cursive and context-sensitive in nature and have improper spacing between words, therefore,...
Segmentation-free optical character recognition for printed Urdu text
This paper presents a segmentation-free optical character recognition system for printed Urdu Nastaliq font using ligatures as units of recognition. The proposed technique relies on statistical features and employs Hidden Markov Models for classification. A total of 1525 unique high-frequency Urdu ligatures from the standard Urdu Printed Text Images (UPTI) database are considered in our study. ...
Font Size Independent OCR for Noori Nastaleeq
This paper presents a technique for font size independent OCR of Noori Nastaleeq. Most existing OCRs for Noori Nastaleeq support only a single font size. Urdu government documents, newspapers, magazines, and books written in the Noori Nastaleeq font style have a varying range of font sizes. The technique presented in this paper supports font size independence for Noori Nastaleeq O...
Urdu Word Segmentation
Word segmentation is the foremost obligatory task in almost all NLP applications, where the initial phase requires tokenization of the input into words. Urdu is among the Asian languages that face the word segmentation challenge. However, unlike other Asian languages, word segmentation in Urdu has not only space omission errors but also space insertion errors. This paper discusses how orthographic...
Journal
Journal title: IEEE Access
Year: 2017
ISSN: 2169-3536
DOI: 10.1109/access.2017.2703155