Low Quality Mobile Image Data Processing Under Uneven Shading - Separating and Cleaning Text Lines and Graphic Regions in Mobile Color Document Image

Authors

  • Xiaohua Zhang
  • Ning Xie
  • Masayuki Nakajima
  • Masaki Hayashi
  • Steven Bachelder
Abstract

This paper proposes a simple approach for extracting text from graphic regions in low-quality color document images captured by smartphones or other camera-equipped mobile devices. The algorithm first computes an edge map with the Canny edge detector. Textual and non-textual regions are then analyzed heuristically based on their connected components (CCs). A 2D histogram is computed to estimate the most frequent width and height of the connected components. After the CCs are grouped according to association rules, those whose width or height is measured as extremely large or small are labeled as non-textual regions, and the remaining CCs are extracted as text regions. Experimental results indicate that the proposed approach performs with plausible consistency.
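
The size-based filtering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the Canny edge map is already available as a nested list of 0/1 values, it skips the association-rule grouping (the abstract gives no details), and the function names, the bin size, and the `low`/`high` ratios that define "extremely large or small" are hypothetical choices made for illustration.

```python
from collections import Counter, deque


def connected_components(edge_map):
    """Return bounding boxes (top, left, height, width) of 8-connected regions."""
    rows, cols = len(edge_map), len(edge_map[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not edge_map[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited edge pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            r0 = r1 = r
            c0 = c1 = c
            while queue:
                y, x = queue.popleft()
                r0, r1 = min(r0, y), max(r1, y)
                c0, c1 = min(c0, x), max(c1, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and edge_map[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((r0, c0, r1 - r0 + 1, c1 - c0 + 1))
    return boxes


def modal_size(boxes, bin_size=2):
    """Build a 2D (width, height) histogram; return the center of its fullest bin."""
    hist = Counter((w // bin_size, h // bin_size) for _, _, h, w in boxes)
    (w_bin, h_bin), _ = hist.most_common(1)[0]
    return (w_bin + 0.5) * bin_size, (h_bin + 0.5) * bin_size


def split_text_graphics(boxes, low=0.25, high=4.0, bin_size=2):
    """Label boxes far from the modal size as graphics, the rest as text.

    The low/high ratios are assumed thresholds, not values from the paper.
    """
    modal_w, modal_h = modal_size(boxes, bin_size)
    text, graphics = [], []
    for box in boxes:
        _, _, h, w = box
        typical = (low * modal_w <= w <= high * modal_w
                   and low * modal_h <= h <= high * modal_h)
        (text if typical else graphics).append(box)
    return text, graphics
```

Calling `split_text_graphics(connected_components(edge_map))` returns two lists of bounding boxes. On a real photograph the thresholds would need tuning, since character size varies with camera distance and resolution.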

Similar resources

Directional Stroke Width Transform to Separate Text and Graphics in City Maps

City maps are among the more complex documents found in the real world. In these maps, text labels overlap with graphics and appear in a variety of fonts, styles, and orientations. Usually, text and graphic colours are not predefined because map publishers vary. In most city maps, text and graphic lines form a single connected component. Moreover, the common regions of text and graphic lin...

Document Image Dewarping Based on Text Line Detection and Surface Modeling (RESEARCH NOTE)

Document images produced by a scanner or digital camera usually suffer from geometric and photometric distortions, both of which degrade the performance of OCR systems. In this paper, we present a novel method to compensate for undesirable geometric distortions, aiming to improve OCR results. Our methodology is based on finding text lines by dynamic local connectivity map and then applying a l...

Document Analysis And Classification Based On Passing Window

In this paper, we present a document analysis and classification system that segments and classifies the contents of Arabic document images. The system includes preprocessing, document segmentation, feature extraction, and document classification. In preprocessing, a document image is enhanced by removing noise, binarizing, and detecting and correcting image skew. In document segmentation, an algorith...

Ancient Document Images Enhancement Using Phase Based Binarization

In this paper, we present a phase-based binarization model for degraded document images, as well as a post-processing method that can improve any binarization method and a ground truth generation tool. Many binarization techniques have been implemented in the literature for different types of binarization problems. These include an adaptive image contrast based document image binarization technique t...

Extraction of Structural Parameters of Warp-and-Weft Woven Fabric Using the Wavelet-Fuzzy Method and Genetic Algorithm

The flexibility of woven fabric structure causes many errors in yarn location detection with conventional image processing methods. This work therefore focuses on proposing a method that adapts to the properties of the fabric image in order to extract its parameters. In this regard, meta-heuristic algorithms appear suitable for matching the structural parameter extraction algorithm to the image conditions...

Publication date: 2016