Connected Component Based Word Spotting on Persian Handwritten Image Documents

Authors: not recorded
Abstract:

Word spotting makes unindexed image documents searchable by locating a word or words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten documents using attribute-based classification and label embedding. For this purpose, a hierarchical framework is proposed in which, at first, the candidates are selected based on the connected components (CCs) sequence. Then, the query word is segmented into its constituent CCs, and regions of the document with a similar CC count are selected based on their distance to the CC count of the query word. As a result, the candidate regions are extracted. In the final phase, the query word is located only in the candidate regions of the document. A well-known Persian handwritten text dataset, namely FTH, is chosen as a benchmark for the presented method. The results show that the proposed method outperforms the state-of-the-art methods, reaching 81.02 percent for unseen word class retrieval.
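The candidate-selection idea in the abstract (keeping only document regions whose connected-component count is close to that of the query word) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `count_components` and `select_candidates`, the `max_diff` threshold, the use of 8-connectivity, and the toy binary grids are all assumptions made for illustration, and the attribute-based classification stage is not covered.

```python
from collections import deque


def count_components(grid):
    """Count 8-connected components of foreground (1) pixels in a binary grid."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # New, unvisited foreground pixel: flood-fill its component.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def select_candidates(query, regions, max_diff=0):
    """Return indices of regions whose CC count is within max_diff of the query's.

    max_diff is a hypothetical tolerance, not a parameter from the paper.
    """
    target = count_components(query)
    return [i for i, region in enumerate(regions)
            if abs(count_components(region) - target) <= max_diff]


# Toy binary images (1 = ink, 0 = background); purely illustrative data.
query = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
]
regions = [
    [[1, 0, 1, 0, 1], [1, 0, 1, 0, 1]],  # three separate strokes
    [[1, 1, 1, 1, 1], [0, 0, 0, 0, 0]],  # one stroke
    [[1, 0, 0, 1, 0], [0, 1, 0, 0, 0]],  # two strokes (one diagonal)
]

exact = select_candidates(query, regions)
loose = select_candidates(query, regions, max_diff=1)
```

With the toy data, only the first region has the same component count as the query; loosening `max_diff` to 1 also admits the third region. A filter of this kind only narrows the search, so the final matching step would still run on each surviving region.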

Similar resources

Offline Word Spotting in Handwritten Documents

The digitization of written human knowledge into string data has reached up to but not beyond the recognition of typeset text. This means that vast libraries of handwritten, cursive documents must be indexed and transcribed by a human—a prohibitively laborious task. This paper explores an existing technique developed in [1] and [12] for the offline indexation of historical handwritten documents...

Word Spotting in Handwritten Arabic Documents Using Bag-Of-Descriptors

This paper presents a query-by-example word spotting method for handwritten Arabic documents, based on the Scale Invariant Feature Transform (SIFT), without using any word or line segmentation, because segmentation errors affect the subsequent word representation. First the interest points are automatically extracted from the images using the SIFT detector, then, we use the SIFT descriptor to represent eac...

Segmentation-free Word Spotting for Handwritten Arabic Documents

In this paper we present an unsupervised segmentation-free method for spotting and searching queries, especially in document images of handwritten Arabic. For this, Histograms of Oriented Gradients (HOGs) are used as the feature vectors to represent the query and the document images. Then, we compress the descriptors with the product quantization method. Finally, a better representati...

On the Influence of Word Representations for Handwritten Word Spotting in Historical Documents

Word spotting is the process of retrieving all instances of a queried keyword from a digital library of document images. In this paper we evaluate the performance of different word descriptors to assess the advantages and disadvantages of statistical and structural models in a framework of query-by-example word spotting in historical documents. We compare four word representation models, namely...

Query Word Image based Retrieval Scheme for Handwritten Tamil Documents

This paper brings out an autoassociative neural network (AANN) based information retrieval mechanism to locate handwritten documents from a literary collection in Tamil language corresponding to query word images. The strategy extends to create models for the chosen search word images, evolve a methodology to identify the search word and subsequently retrieve the relevant documents. AANN emphas...

Word Spotting: Indexing Handwritten Archives

There are many historical manuscripts written in a single hand which it would be useful to index. Examples include the early Presidential papers at the Library of Congress and the collected works of W. B. DuBois at the library of the University of Massachusetts. The standard technique for indexing documents is to scan them in, convert them to machine readable form (ASCII) using Optical Characte...

Journal title

Volume 10, Issue 2

Pages 11-21

Publication date: 2019-12-01
