A Generic Recognition System for Making Archives Documents accessible to Public

Authors

  • Bertrand Coüasnon
  • Ivan Leplumey
Abstract

This paper presents the annotations needed for handwritten archives document retrieval by content. We propose two complementary ways of producing those annotations: automatically, by using optical document recognition, and collectively, by using the Internet and manual input from users. A platform for managing those annotations is presented, as well as examples of automatic annotations on civil status registers, military forms (tested on 60,000 pages) and naturalization decrees, using a generic document recognition method. Examples of collective annotations built on automatic annotations are also...

Similar resources

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...

Accès par le contenu aux documents manuscrits d'archives numérisés

This paper presents handwritten archives document retrieval by content. This retrieval is built on information (annotations) associated with document images. We propose two complementary ways of producing those annotations: automatically, by using optical document recognition, and collectively, by using the Internet and manual input from users. A platform for managing those annotations is presented as ...

Enriching Textual Documents with Timecodes from Video Fragments

The OLIVE project aims to develop a multilingual indexing tool for broadcast material based on speech recognition, which automatically produces indexes from the sound track of a program (television or radio). Such a tool allows multimedia archives to be searched by keywords and corresponding fragments to be retrieved. This paper gives a report on the alignment module, which is one of th...

Making Indian Language Legacy Documents Accessible Via Web

Reliable optical character recognition is not available for scripts of Indian languages. Thus, the only way to make legacy documents in Indian languages available on the web is by scanning them. This work is an attempt to cater to the need for a better representation and efficient storage technique for Indian language documents and their near-perfect regeneration at the browser. We work wit...

The Making of a New Medical Specialty: A Policy Analysis of the Development of Emergency Medicine in India

Background: Medical specialization is an understudied, yet growing aspect of health systems in low- and middle-income countries (LMICs). In India, medical specialization is incrementally, yet significantly, modifying service delivery, workforce distribution, and financing. However, scarce evidence exists in India and other LMICs regar...

Publication date: 2003