Extracting hierarchical data points and tables from scanned contracts
Authors
Abstract
We present a technique for developing systems that automatically extract information from scanned semi-structured contracts. Such contracts are based on a template but differ in layout and contain client-specific changes. While the technique is applicable to all contracts of this kind, we focus specifically on so-called ISDA credit support annexes. The data model for such documents consists of 150 individual entities, some of which are tables that can span multiple pages. The information extraction is built on the Apache UIMA framework and consists of a collection of small, simple Analysis Components that extract increasingly complex information based on earlier extractions. This technique is applied to extract both individual data points and tables. Experiments show an overall precision of 97% and a recall of 93% for individual (simple) data points, and a precision of 89% and a recall of 81% for table cells, measured against manually entered ground truth. Owing to its modular nature, our system can be easily extended and adapted to other collections of contracts, as long as a data model can be formulated.
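The layered idea in the abstract (small components that build on earlier extractions) can be illustrated with a minimal sketch. This is plain Python and not the Apache UIMA API; the `Document` and `Annotation` classes, the three component functions, the regular expressions, and the `max_gap` parameter are all hypothetical names and values chosen for illustration, not the authors' implementation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str    # annotation type, e.g. "Amount", "Label", "Threshold"
    begin: int   # character offset where the span starts
    end: int     # character offset where the span ends
    value: str   # normalized value carried by the annotation


@dataclass
class Document:
    # Shared container that every component reads from and writes to
    # (a simplified stand-in for a shared analysis structure).
    text: str
    annotations: list = field(default_factory=list)

    def select(self, kind):
        return [a for a in self.annotations if a.kind == kind]


def annotate_amounts(doc):
    # Low-level component: mark currency amounts such as "USD 250,000".
    for m in re.finditer(r"\b(USD|EUR|GBP)\s?([\d,]+)", doc.text):
        value = m.group(1) + " " + m.group(2).replace(",", "")
        doc.annotations.append(Annotation("Amount", m.start(), m.end(), value))


def annotate_labels(doc):
    # Low-level component: mark the keyword that names a data point.
    for m in re.finditer(r"\bThreshold\b", doc.text):
        doc.annotations.append(Annotation("Label", m.start(), m.end(), "Threshold"))


def annotate_thresholds(doc, max_gap=40):
    # Higher-level component: uses only earlier annotations, pairing each
    # label with the first amount that starts within max_gap characters.
    amounts = sorted(doc.select("Amount"), key=lambda a: a.begin)
    for label in doc.select("Label"):
        for amount in amounts:
            if 0 <= amount.begin - label.end <= max_gap:
                doc.annotations.append(
                    Annotation("Threshold", label.begin, amount.end, amount.value)
                )
                break


# Order matters: later components depend on the output of earlier ones.
PIPELINE = [annotate_amounts, annotate_labels, annotate_thresholds]


def run_pipeline(text):
    doc = Document(text)
    for component in PIPELINE:
        component(doc)
    return doc
```

Calling `run_pipeline("Threshold: USD 250,000 for Party A.")` and then `doc.select("Threshold")` returns the combined data point. Under this assumed design, adapting to another contract type would mean adding or swapping components rather than rewriting the pipeline.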
Similar resources
Search Space Reduction for Farsi Printed Subwords Recognition by Position of the Points and Signs
In the field of word recognition, three approaches are used: word isolation, overall shape, and a combination of the two. Most optical recognition methods recognize a word by breaking it into its letters and then recognizing them. This approach faces problems because of the difficulty of isolating letters and its recognition accuracy in texts with low image quality. Therefo...
Automatic 3D Feature Extraction from Structuralized LIDAR Data
Abstract: LIDAR, or laser scanning, is capable of collecting accurate 3D coordinates of scanned points densely and sub-randomly distributed on scanned object surfaces. The huge amount of 3D points implies abundant recessive spatial information which can be turned into dominant information through various data processing methods. To explore valuable spatial information from LIDAR data automatica...
Augmented Reality for 3D Avatar using Laser Scanned Body Data
An implementation of a motion editor using laser-scanned 3D body data and the animated result in augmented reality are reported in this paper. Joint points of the skeleton in the body were picked up as pivot points for 3D rotation. The body data were fitted to a skeleton model and organized as a hierarchical structure. In order to implement the 3D animation of the laser scanned body data, the verte...
Efficient surface reconstruction method for distributed CAD
This paper describes a new fast Reverse Engineering (RE) method for creating a 3D computerized model from an unorganized cloud of points. The proposed method is derived directly from the problems and difficulties currently associated with remote design over the Internet, such as accuracy, transmission time and representation at different levels of abstraction. With the proposed method, 3D model...
Automatic Road Detection and Extraction From MultiSpectral Images Using a New Hierarchical Object-based Method
Road detection and extraction is one of the most important issues in photogrammetry, remote sensing and machine vision. A great deal of research based on multispectral images has been done in this area, most of it with relatively good results. In this paper, a novel automated and hierarchical object-based method for detecting and extracting roads is proposed. This research is based on the M...
Journal title:
Volume, issue:
Pages: -
Publication date: 2013