Character Segmentation Scheme for OCR System: For Myanmar Printed Documents

Authors

  • Htwe Pa Pa Win
  • Phyo Thu Thu Khine
  • Khin Nwe Ni Tun

Abstract

Automatic recognizers for machine-printed characters and text (Optical Character Recognition, OCR) are highly desirable for a multitude of modern IT applications, including digital library software. However, state-of-the-art OCR systems cannot handle Myanmar script, because the language poses many challenges for document understanding. The authors therefore design an Optical Character Recognition System for Myanmar Printed Document (OCRMPD), with several proposed techniques that can automatically recognize Myanmar printed text from document images. To make the system more accurate, the authors propose a method that isolates character images using not only projection methods but also structural analysis for wrongly segmented characters. To demonstrate the effectiveness of the segmentation technique, the authors apply a new hybrid feature extraction method and choose an SVM classifier to recognize the character images. The proposed algorithms have been tested on a variety of Myanmar printed documents, and the experimental results indicate that the methods can increase segmentation accuracy as well as recognition rates.

DOI: 10.4018/978-1-4666-3906-5.ch018
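
The projection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the binary-image representation (lists of 0/1 values), and the rule "a gap is any row or column with zero ink" are all assumptions made for illustration. The sketch covers only plain projection-profile splitting. The structural analysis the authors use to repair wrongly segmented characters is not described in the abstract and is not attempted here, so touching or vertically stacked Myanmar glyphs would likely remain merged in this version.

```python
from typing import List, Tuple

# Assumed representation: a binary image where 1 = ink and 0 = background.
Image = List[List[int]]


def projection_profile(image: Image, axis: str) -> List[int]:
    """Count ink pixels per row (axis='rows') or per column (axis='cols')."""
    if axis == "rows":
        return [sum(row) for row in image]
    if axis == "cols":
        return [sum(col) for col in zip(*image)]
    raise ValueError("axis must be 'rows' or 'cols'")


def runs_of_ink(profile: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges where the profile is non-zero."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, count in enumerate(profile):
        if count > 0 and start is None:
            start = i                      # a run of ink begins
        elif count == 0 and start is not None:
            runs.append((start, i))        # a blank row/column ends the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))  # run reaches the image edge
    return runs


def segment_characters(image: Image) -> List[Tuple[int, int, int, int]]:
    """Split a page into text lines, then each line into candidate character boxes.

    Returns (top, bottom, left, right) boxes with half-open bounds.
    """
    boxes = []
    # Horizontal projection separates text lines.
    for top, bottom in runs_of_ink(projection_profile(image, "rows")):
        line = image[top:bottom]
        # Vertical projection within a line separates candidate characters.
        for left, right in runs_of_ink(projection_profile(line, "cols")):
            boxes.append((top, bottom, left, right))
    return boxes
```

Calling `segment_characters(page)` on a binarized page returns one bounding box per connected run of inked columns within each text line; any correction of over- or under-segmented boxes would have to be a separate, later step.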

Similar resources

OCR for printed Kannada text to Machine editable format using Database approach

This paper describes an Optical Character Recognition (OCR) system for printed text documents in Kannada, a South Indian language. The proposed OCR system recognizes printed Kannada text and can handle all types of Kannada characters. The system first extracts an image of the Kannada script, then performs line segmentation on the image, then segments the words into sub-character level piece...

Bilingual OCR System for Myanmar and English Scripts with Simultaneous Recognition

Htwe Pa Pa Win, Phyo Thu Thu Khine, Khin Nwe Ni Tun. Abstract: The increasing development of digital libraries worldwide raises many new challenges for document image analysis research and development. Storing a wide variety of document images in a digital library, for example cultural, technical, or historical documents written in many languages, also creates many advancements for pre...

A Structural Analysis Based Feature Extraction Method for OCR System For Myanmar Printed Document Images

This paper proposes a new feature extraction method for off-line recognition of Myanmar printed documents. One of the most important factors in achieving high recognition performance in an Optical Character Recognition (OCR) system is the selection of the feature extraction methods. Different types of existing OCR systems use various feature extraction methods because of the diversity of the script...

Study and Analysis for Development of an Efficient OCR for Printed and Handwritten ODIA Documents: A Survey

OCR (optical character recognition) is the process of translating handwritten or printed text into a format that the machine understands, for the purposes of editing, searching, and indexing. Preprocessing, segmentation, feature extraction, classification, and post-processing are the main phases of any OCR system, and these specific fields are in use today. For all these tasks the seg...

Journal title:
  • IJCVIP

Volume 1

Publication date: 2011