Semantic Data Integration across Different Scales: Automatic Learning of Generalization Rules
Abstract
In this paper we present an approach for integrating data sets of different origin and different resolution levels. The underlying idea is to reveal semantic correspondences between object classes of different geo-ontologies solely by analyzing the spatial and geometric characteristics of the instances in the data sets. From this analysis we derive transformation rules with data mining methods, and these rules subsequently provide the semantic connection between the data sets. For our case study we use data sets with a similar thematic focus but different semantic and geometric resolution: on the one hand, building objects from cadastral data at a scale of about 1:1,000 (the detailed data set), and on the other hand, settlement areas from topographic data at a scale of about 1:25,000 (the less detailed data set). Deriving links from instances of the detailed data set to the more general one by geometric overlay requires that both data sets cover the same geographical extent. We then generalize the detailed data set, using the object boundaries of the less detailed data set as a constraint, and generate an 'intermediate data set' with a similar spatial resolution. To derive the transformation rules, we enrich the given information with additional attributes that represent spatial relations and implicit or intrinsic instance properties (e.g. object size). These rules can then be used to classify settlement areas in unknown regions of the target data set.
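The overlay, enrichment, and rule-learning steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Box` type, the centroid-containment overlay, the class names, and the single-attribute threshold rule (a decision stump on object size) are all assumptions chosen to keep the example self-contained, and the abstract does not name the data mining method actually used.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle standing in for a real polygon (assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    @property
    def centroid(self) -> tuple:
        return ((self.xmin + self.xmax) / 2, (self.ymin + self.ymax) / 2)

    def contains_point(self, x: float, y: float) -> bool:
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


def link_by_overlay(buildings, settlements):
    """Link each building to the class of the settlement area that
    contains its centroid; buildings outside every area get no link."""
    links = []
    for building in buildings:
        cx, cy = building.centroid
        for label, region in settlements:
            if region.contains_point(cx, cy):
                links.append((building, label))
                break
    return links


def learn_size_rule(links):
    """Learn one threshold on building area (a decision stump).
    Returns (threshold, label_if_area_at_most, label_if_area_above)."""
    samples = [(building.area, label) for building, label in links]
    areas = sorted({area for area, _ in samples})
    best = None
    for low, high in zip(areas, areas[1:]):
        threshold = (low + high) / 2
        below = Counter(lab for area, lab in samples if area <= threshold)
        above = Counter(lab for area, lab in samples if area > threshold)
        low_label, low_hits = below.most_common(1)[0]
        high_label, high_hits = above.most_common(1)[0]
        score = low_hits + high_hits  # training instances classified correctly
        if best is None or score > best[0]:
            best = (score, threshold, low_label, high_label)
    if best is None:
        raise ValueError("need at least two distinct building areas")
    return best[1:]


def classify(building, rule):
    """Apply a learned rule to a building from an unlabeled region."""
    threshold, low_label, high_label = rule
    return low_label if building.area <= threshold else high_label


# Hypothetical toy data: two settlement areas and a few building footprints.
settlements = [
    ("residential", Box(0, 0, 100, 100)),
    ("industrial", Box(100, 0, 200, 100)),
]
buildings = [
    Box(10, 10, 20, 20), Box(30, 30, 42, 40), Box(60, 60, 70, 75),
    Box(110, 10, 150, 40), Box(160, 50, 190, 80),
    Box(300, 300, 310, 310),  # lies outside both areas, so it is not linked
]

links = link_by_overlay(buildings, settlements)
rule = learn_size_rule(links)
print(rule, classify(Box(0, 0, 12, 9), rule))
```

A real workflow would use true polygon geometry and a richer attribute set (spatial relations as well as size); the sketch only shows how overlay-derived links turn into labeled training instances from which a transferable rule can be learned.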
Similar resources
Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures
Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...
Semantic Abstraction for generalization of tweet classification: An evaluation of incident-related tweets
Social media is a rich source of up-to-date information about events such as incidents. The sheer amount of available information makes machine learning approaches a necessity to process this information further. This learning problem is often concerned with regionally restricted datasets such as data from only one city. Because social media data such as tweets varies considerably across differ...
A Semi-automatic Approach to Update Mapping for Ontology Evolution
The web is one of the important sources of information for different data integration systems. Semantic web represents the knowledge in web through ontologies. When knowledge is to be shared across different sources, integration of ontologies is achieved by means of ontology mapping and merging. When the ontology evolves from one state to another the mapping becomes stale and the user may not r...
Image Classification via Sparse Representation and Subspace Alignment
Image representation is a crucial problem in image processing, where there exist many low-level representations of images, e.g., SIFT, HOG and so on. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principal component analysis are employed to d...
Publication date: 2008