An end-to-end pipeline for historical censuses processing
Abstract
Censuses are structured documents of great value for social and demographic history, which became widespread from the nineteenth century on. However, the plurality of formats and the natural variability of historical data make their extraction arduous and often lead to ungeneric recognition algorithms. We propose an end-to-end processing pipeline, based on optimization, in an attempt to reduce the number of free parameters. The layout analysis is based on semantic segmentation using neural networks for a generic recognition of the explicit column structure. The implicit row structure is deduced directly from the position of the text segments. The handwritten text detection is complemented by an intelligent framing method which significantly improves the quality of the HTR. In the end, we combine several post-correction approaches, using neural networks and language models, to further improve the performance. Ultimately, our flexible methods make it possible to accurately detect more than 98% of the columns and 88% of the rows, despite the lack of graphical separators and the diversity of formats. Thanks to various reframing strategies, the HTR results reach the excellent performance of 3.44% character error rate on these noisy data. In total, 18,831 pages were extracted from 72 censuses over a century. This large dataset, as well as the training data, is made open-access and released along with this article.
Similar resources
DeepISP: Learning End-to-End Image Processing Pipeline
We present DeepISP, a full end-to-end deep neural model of the camera image signal processing (ISP) pipeline. Our model learns a mapping from the raw low-light mosaiced image to the final visually compelling image and encompasses low-level tasks such as demosaicing and denoising as well as higher-level tasks such as color correction and image adjustment. The training and evaluation of the pipel...
JEJUNAL EVERSION MUCOSECTOMY AND INVAGINATION: AN INNOVATIVE TECHNIQUE FOR THE END TO END PANCREATICOJEJUNOSTOMY
ABSTRACT Background: The pancreatojejunostomy has notoriously been known to carry a high rate of operative complications, morbidity and mortality, mainly due to anastomotic leak and ensuing septic complications. Objective: In order to decrease anastomotic leak and its attendant morbidity and mortality in operations requiring a pancreato-jejunal anastomosis, and also in order to simplify the op...
Comprehensive end-to-end test for intensity-modulated radiation therapy for nasopharyngeal carcinoma using an anthropomorphic phantom and EBT3 film
Background: In head and neck radiotherapy, immobilization devices can affect dose delivery. In this study, a comprehensive end-to-end test was developed to evaluate the accuracy of radiotherapy treatment. Materials and Methods: An Alderson Radiation Therapy (ART) anthropomorphic phantom with EBT3 film was used to mimic the actual patient treatment process. Ten patients treated for nasopharyngea...
End-to-end esophagojejunostomy versus standard end-to-side esophagojejunostomy: which one is preferable?
Abstract Background: End-to-side esophagojejunostomy has almost always been associated with some degree of dysphagia. To overcome this complication we decided to perform an end-to-end anastomosis and compare it with end-to-side Roux-en-Y esophagojejunostomy. Methods: In this prospective study, between 1998 and 2005, 71 patients with a diagnosis of gastric adenocarcinoma underwent total gastrec...
ESPnet: End-to-End Speech Processing Toolkit
This paper introduces a new open source platform for end-to-end speech processing named ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR), and adopts widely-used dynamic neural network toolkits, Chainer and PyTorch, as a main deep learning engine. ESPnet also follows the Kaldi ASR toolkit style for data processing, feature extraction/format, and recipes to provide a ...
Journal
Journal title: International Journal on Document Analysis and Recognition
Year: 2023
ISSN: 1433-2833, 1433-2825
DOI: https://doi.org/10.1007/s10032-023-00428-9