Search results for: historical documents

Number of results: 175141

2012
Thomas L. Packer, David W. Embley

Lists are often the most data-rich parts of a document collection, but are usually not set apart explicitly from the rest of the text, especially in a corpus of historical OCRed documents. There are many kinds of lists, differing from each other in both layout and content. Writing individualized code to process all possible types of lists is an expensive challenge. In the present research, we f...

2014
Jan-Hendrik Worch, Björn Gottfried, Joachim Hertzberg, Michael Beetz

A solution to a feature selection problem in the field of document image processing is presented. Choosing shape features to describe the glyphs of historical documents is a non-trivial task, since the variations of glyphs across different documents are innumerable. Hence, the manual selection of shape features would be a cumbersome task. To select a subset of features from a given set, a genet...

2007
Joost van Beusekom, Faisal Shafait, Thomas M. Breuel

In the research area of historical documents, it is of great interest to reconstruct the process by which a historical typeset document emerged. To do so, the chronological order of the different versions of a typeset document has to be reconstructed. This is done by manually finding differences between two versions and then deciding on the order of these two versions. In this paper we ...

Journal: Archives of Disease in Childhood 2005
D Martino, A Tanner, G Defazio, A J Church, K P Bhatia, G Giovannoni, R C Dale

Sydenham's chorea (SC) became a well defined nosological entity only during the second half of the nineteenth century. Such progress was promoted by the availability of large clinical series provided by newly founded paediatric hospitals. This paper analyses the demographic and clinical features of patients with chorea admitted to the first British paediatric hospital (the Hospital for Sick Chi...

2015
Anshul Gupta, Ricardo Gutierrez-Osuna, Matthew Christy, Boris Capitanu, Loretta Auvil, Liz Grumbach, Richard Furuta, Laura Mandell

Mass digitization of historical documents is a challenging problem for optical character recognition (OCR) tools. Issues include noisy backgrounds and faded text due to aging, border/marginal noise, bleed-through, skewing, warping, as well as irregular fonts and page layouts. As a result, OCR tools often produce a large number of spurious bounding boxes (BBs) in addition to those that correspon...

2016
Haithem Afli, Andy Way

Machine Translation (MT) plays a critical role in expanding capacity in the translation industry. However, many valuable documents, including digital documents, are encoded in formats that are not accessible to machine processing (e.g., historical or legal documents). Such documents must be passed through a process of Optical Character Recognition (OCR) to render the text suitable for MT. No matter how...

2016
Brian Davis, Robert Clawson, William Barrett

In the absence of accurate handwriting recognition for historical documents, computer-assisted transcription (CAT) methods move into the spotlight. We explore some of the weaknesses of current CAT systems and propose a CAT system that relies on subword spotting and overcomes most of these weaknesses. The system is ideal for crowdsourcing transcription to mobile users.

2011
Vicente Alabau, Verónica Romero, Antonio L. Lagarda, Carlos D. Martínez-Hinarejos

Handwritten Text Recognition is a problem that has gained attention in recent years due to the interest in the transcription of historical documents. Handwritten Text Recognition employs models similar to those employed in Automatic Speech Recognition (Hidden Markov Models and n-grams). Dictation of the contents of the document is an alternative to text recognition. In this work, we ...

2011
Elisa H. Barney Smith, Jérôme Darbon, Laurence Likforman-Sulem

This paper proposes a novel method for document enhancement. The method is based on the combination of two state-of-the-art filters through the construction of a mask. The mask is applied to a TV (Total Variation) regularized image where background noise has been reduced. The masked image is then filtered by NLmeans (Non Local Means) which reduces the noise in the text areas located by the mask...

Journal: Pattern Recognition 2007
Maya R. Gupta, Nathaniel P. Jacobson, Eric K. Garcia

We consider the problem of document binarization as a pre-processing step for optical character recognition (OCR) for the purpose of keyword search of historical printed documents. A number of promising techniques from the literature for binarization, pre-filtering, and post-binarization denoising were implemented along with newly developed methods for binarization: an error diffusion binarizat...
