document image

Unconstrained Tight Structure Extraction Using Voronoi Tesselation on Document Images

2006

P. Nagabhushan Sahana D. Gowda R. K. Bharathi

Document structure is the intermediary result obtained through page segmentation, which is used in the analysis of the document image. The structure serves the purpose of extracting the shape of the document from paragraph up to character level in a hierarchical exploratory methodology for understanding the layout structure of the document image. The extracted layout forms a dominant feature wh...


A comprehensive survey of mostly textual document segmentation algorithms since 2008

Journal: Pattern Recognition 2017

Sébastien Eskenazi Petra Gomez-Krämer Jean-Marc Ogier

In document image analysis, segmentation is the task that identifies the regions of a document. The increasing number of applications of document analysis requires a good knowledge of the available technologies. This survey highlights the variety of the approaches that have been proposed for document image segmentation since 2008. It provides a clear typology of documents and of document image ...


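Classification et extraction des documents complexes à partir des images issues d'un périphérique mobile : application aux documents d'identité [Classification and extraction of complex documents from images captured by a mobile device: application to identity documents]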

2016

Ahmad Montaser Awal Abdullah Almaksour

We propose in this paper a document image classification method. In contrast to most existing systems, the proposed approach allows locating the document and recognizing its type simultaneously. First, a knowledge base of document models is created from reference images. Training images are not indispensable, and a single reference image is enough to create a document model. Then, key-p...


An Automatic Closed-loop Methodology for Generating Character Groundtruth for Scanned Documents

1998

Tapas Kanungo Robert M. Haralick

Character groundtruth for real, scanned document images is crucial for evaluating the performance of OCR systems, training OCR algorithms, and validating document degradation models. Unfortunately, manual collection of accurate groundtruth for characters in a real (scanned) document image is not practical because (i) accuracy in delineating groundtruth character bounding boxes is not high enoug...


Resolution-sensitive document image analysis for document repurposing

2004

Kathrin Berkner Edward L. Schwartz

The variety of displays used to browse and view images has created a need to adapt an image representation to the size constraints of a given display. In this paper a resolution-sensitive analysis of document images is performed that segments images into text and image regions using document layout analysis and JPEG 2000 header-based processing. Repurposing of the document for viewing on a given...


Connected Component Based Word Spotting on Persian Handwritten image documents

Journal: International Journal of Nonlinear Analysis and Applications 2019

Word spotting makes unindexed image documents searchable by locating one or more words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...


Document Image Binarization Technique for Degraded Document Images

2015

Supriya Lokhande N. A. Dawande Bolan Su Shijian Lu Chew Lim Tan

Document image binarization is a vital pre-processing technique for document image analysis that segments text from badly degraded document images. In this paper, we propose a robust document image binarization technique that is based on the concept of adaptive image contrast. The adaptive image contrast, which is formed by combining the local image contrast and the local image gradient, makes it tol...


Content-based document image retrieval in complex document collections

2007

Shlomo Argamon Ophir Frieder David A. Grossman David D. Lewis

We address the problem of content-based image retrieval in the context of complex document images. Complex documents are documents that typically start out on paper and are then electronically scanned. These documents have rich internal structure and might only be available in image form. Additionally, they may have been produced by a combination of printing technologies (or by handwriting); and...


A qualitative study analyzing how health-sector experts make use of medical images

Journal: مدیریت سلامت (Health Management) 2015

منصوریان, یزدان, کیانی, معصومه

Introduction: In the health sector, images function as a form of document that can convey a considerable amount of information. Employing this type of information can increase the effectiveness of medical experts' performance. This study aimed to survey how health experts use medical images in their practice. Methods: This applied qualitative study was carried out in 1392 (2013). The study p...


An Automatic Closed-Loop Methodology for Generating Character Groundtruth for Scanned Documents

Journal: IEEE Trans. Pattern Anal. Mach. Intell. 1999

Tapas Kanungo Robert M. Haralick

Character groundtruth for real, scanned document images is crucial for evaluating the performance of OCR systems, training OCR algorithms, and validating document degradation models. Unfortunately, manual collection of accurate groundtruth for characters in a real (scanned) document image is not practical because (i) accuracy in delineating groundtruth character bounding boxes is not high enoug...
