CroMatcher - results for OAEI 2013

Authors

  • Marko Gulic
  • Boris Vrdoljak
Abstract

CroMatcher is an ontology matching system based on terminological and structural matchers. The most important part of the system is the automated weighted aggregation of correspondences produced by different basic ontology matchers. This is the first year CroMatcher has taken part in the OAEI campaign. The results obtained this year will help in finding and resolving shortcomings in the system before the next campaign.

1 Presentation of the system

CroMatcher is an automatic ontology matching system that determines correspondences between entities of two different ontologies. It contains several terminological and structural basic matchers. The system is based on a weighted aggregation that automatically determines the importance of each basic matcher from the correspondences that matcher produces. As this is the first time CroMatcher has taken part in the OAEI campaign, it is fully prepared only for the benchmark test set.

1.1 State, purpose, general statement

CroMatcher executes several basic matchers and then aggregates the results they produce. The system does not use any external resources. After the terminological basic matchers have run, the automatic weighted aggregation is executed: the results of each terminological basic matcher are included in the common results according to that matcher's importance, which is determined automatically within the weighted aggregation. Next, several iterative structural matchers are executed (e.g. if the child entities are similar, the parent entities are similar too). The structural matchers take the common results of the terminological matchers as their input. After the structural basic matchers have run, the automatic weighted aggregation is executed again. At the end of the matching process, the weighted aggregation is applied once more to combine the terminological and structural common results.
Finally, the final alignment method (which chooses the relevant correspondences between entities of the two ontologies) is executed. This method iteratively takes the best correspondence between two entities into the final alignment. Each entity can be related to at most one entity of the other ontology.

1.2 Specific techniques used

In this section, the main components of CroMatcher are described in detail. The workflow and the main components of the system are shown in Fig. 1. CroMatcher consists of the following components:

1. Data extraction from ontologies: the information about every entity is extracted from the given ontologies. After all data about an entity has been extracted, the textual data is normalized by tokenizing it into a set of tokens and removing stop words.

Fig. 1. The workflow and the main components of CroMatcher. Diagram stages: data extraction from ontologies; terminological matchers (parallel composition); autoweighted aggregation (aggregated correspondences of terminological matchers); structural matchers (parallel composition); autoweighted aggregation (aggregated correspondences of structural matchers); autoweighted aggregation (final aggregation); final alignment.

2. Terminological matchers:
   • A matcher that compares the IDs and annotation texts of two entities (classes or properties) with the bi(tri)gram matcher, which counts how many bigrams or trigrams (substrings of length 2 or 3) two names share; e.g. FTP and FTPServer share 2 bigrams, FT and TP [1].
   • A matcher that compares only the labels of two entities (classes or properties), or the entity IDs if an entity has no label, with the bi(tri)gram matcher.
   • A matcher that compares the textual profiles of two entities with TF/IDF [2] and cosine similarity [3]. The profile of a class entity contains the annotations of the class itself (and of all its subclasses) and the annotations of every property whose domain is that class. The profile of a property entity contains the annotations of the property itself and of all its subproperties.
   • A matcher that compares the individuals of two entities with TF/IDF and cosine similarity. The individuals of a class entity are the individual values of the class itself and of all its subclasses. The individuals of a property entity are the individual values of its range classes.
   • A matcher that compares the extra individuals of two entities with TF/IDF and cosine similarity. The extra individuals of a class entity are the individual values of the first superclass of that class. The extra individuals of a property entity are the individual values of its domain and range classes.
   • A matcher that compares general data about the entities. The general data of a class entity consist of the number of object (data) properties, the number of restrictions, and the number of subclass (superclass) entities. The general data of a property entity consist of the number of subproperty (superproperty) entities and the number of domain class entities. The more similar the general data, the greater the correspondence between the entities.

3. Structural matchers:
   • A matcher that compares the similarity between the super entities (classes or properties) of the currently compared entities. If the super entities are similar, the compared entities are similar too. The matcher is executed iteratively and stops when the correspondence value of the compared entities stops changing. In each step, the new correspondence value of the compared entities is the sum of 50% of the previous similarity value and 50% of the similarity value between the super entities.
   • A matcher that compares the similarity between the sub entities (classes or properties) of the currently compared entities. If the sub entities are similar, the compared entities are similar too. The matcher is executed iteratively and stops when the correspondence value of the compared entities stops changing.
     In each step, the new correspondence value of the compared entities is the sum of 50% of the previous similarity value and 50% of the similarity value between the sub entities.
   • A matcher that compares the similarity between the properties (and their range classes) that have the currently compared classes as their domain. Part of this matcher compares the domain classes of the properties.
   • A matcher that compares the similarity between the range classes of the currently compared properties.

4. Autoweighted aggregation for parallel composition of basic matchers: after the terminological and structural matchers have been executed, their results have to be aggregated. CroMatcher uses a parallel composition of matchers to integrate multiple matchers. The main problem in parallel composition is how to aggregate the results obtained by every basic matcher. Weighted aggregation is one of the methods for aggregating matchers [4]. It computes a weighted sum of the similarity values of the basic matchers and needs relative weights that should correspond to the expected importance of each basic matcher. The problem is how to determine that importance. Our Autoweight method, proposed in [5], automatically determines the importance of the basic matchers in order to improve the overall performance of the matching system. In this method, the importance of a basic matcher is determined from the importance of the individual best correspondences in its results (the greatest correspondences between two entities in both directions of mapping, since those correspondences are the most relevant). A correspondence found in the results of a basic matcher is more important when the same correspondence is found by fewer other basic matchers.
A matcher that finds the same correspondences as all the other matchers provides no significant new information for the matching process.

5. Process of final alignment: at the end, the relevant correspondences are selected iteratively for inclusion in the final alignment. The final alignment includes only the greatest correspondences between entity1_i (first ontology) and entity2_j (second ontology). A correspondence between entity1_i and entity2_j is a greatest correspondence only if it has the greatest value among all correspondences that include entity1_i (or entity2_j). The threshold for these greatest correspondences is set to 0.15. We consider this threshold sufficient because the final alignment includes only correspondences that are the greatest for both compared entities.

1.3 Link to the system and parameters file

The system can be downloaded from http://www.seals-project.eu (tool identifier: e0fe95d5-943e-4652-bc53-5b36b712c9cb, version: 1.0).
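The bi(tri)gram comparison used by the first two terminological matchers can be sketched as follows. The text only says that shared bigrams or trigrams are counted; how that count becomes a similarity value is not stated, so this sketch assumes a Dice coefficient over lower-cased n-gram sets. The function names are hypothetical.

```python
def ngrams(text: str, n: int) -> set[str]:
    """Set of character n-grams (substrings of length n) of a lower-cased string."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def shared_ngrams(a: str, b: str, n: int = 2) -> set[str]:
    """n-grams that occur in both strings."""
    return ngrams(a, n) & ngrams(b, n)


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over n-gram sets (an assumed normalization, not taken from the paper)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0  # a string shorter than n has no n-grams
    return 2 * len(ga & gb) / (len(ga) + len(gb))


print(sorted(shared_ngrams("FTP", "FTPServer")))       # ['ft', 'tp']
print(round(ngram_similarity("FTP", "FTPServer"), 3))  # 0.444
```

Passing n=3 gives the trigram variant; how CroMatcher combines the bigram and trigram scores is not specified here.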
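The profile, individual and extra-individual matchers all compare bags of tokens with TF/IDF and cosine similarity. A minimal sketch, assuming raw term frequency, a smoothed IDF of log((1 + N) / (1 + df)) + 1 computed over all profiles, and profiles that are already tokenized and stop-word filtered; the exact TF/IDF weighting used in CroMatcher is not given in the text, and the example tokens are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each token by raw term frequency times an (assumed) smoothed IDF."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))  # document frequency per token
    idf = {tok: math.log((1 + n) / (1 + d)) + 1 for tok, d in df.items()}
    return [{tok: tf * idf[tok] for tok, tf in Counter(doc).items()} for doc in docs]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical entity profiles (normalized token lists).
profiles = [["conference", "paper", "author"], ["paper", "author", "review"], ["hotel", "room"]]
vecs = tfidf_vectors(profiles)
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
```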
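The iterative update of the super-entity structural matcher (new value = 50% of the previous value + 50% of the similarity between the super entities) can be sketched as below. The text does not say how the similarity between two sets of super entities is computed, so the sketch assumes the average, over the super entities of the first entity, of their best match among the super entities of the second; pairs without super entities keep their value, and "stops changing" is approximated by a small tolerance. The starting matrix stands in for the aggregated terminological results and is hypothetical, as are the function names. The sub-entity matcher would follow the same pattern with sub entities.

```python
def super_similarity(e1, e2, sim, supers1, supers2):
    """Assumed set similarity: average best match of e1's super entities among e2's."""
    p1, p2 = supers1.get(e1, []), supers2.get(e2, [])
    if not p1 or not p2:
        return None  # nothing to propagate from
    return sum(max(sim[(a, b)] for b in p2) for a in p1) / len(p1)


def propagate_supers(sim, supers1, supers2, tol=1e-6, max_iter=100):
    """Iterate new = 0.5 * previous + 0.5 * super-entity similarity until values settle.

    `sim` must contain a value for every (entity1, entity2) pair.
    """
    sim = dict(sim)
    for _ in range(max_iter):
        new = {}
        for (e1, e2), prev in sim.items():
            s = super_similarity(e1, e2, sim, supers1, supers2)
            new[(e1, e2)] = prev if s is None else 0.5 * prev + 0.5 * s
        delta = max(abs(new[k] - sim[k]) for k in sim)
        sim = new
        if delta < tol:
            break
    return sim


# Hypothetical toy ontologies: Author is a subclass of Person, Writer of Human.
supers1 = {"Author": ["Person"]}
supers2 = {"Writer": ["Human"]}
start = {("Author", "Writer"): 0.2, ("Author", "Human"): 0.1,
         ("Person", "Writer"): 0.1, ("Person", "Human"): 0.9}
result = propagate_supers(start, supers1, supers2)
print(round(result[("Author", "Writer")], 3))  # 0.9
```

The similar parents (Person, Human) pull the Author/Writer value up from 0.2 towards 0.9, halving the gap at each step.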
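The Autoweight idea can be illustrated with the simplified sketch below; it is not the exact formula from [5]. It assumes that a matcher's best correspondences are the pairs that are maximal in both their row and their column of that matcher's similarity matrix, that such a correspondence found by k of the other n − 1 matchers has importance 1 − k / (n − 1), that a matcher's weight is the normalized sum of these importances (with equal weights as a fallback when all importances are zero), and that the aggregated value is the weighted sum sim(e1, e2) = Σ_k w_k · sim_k(e1, e2). The function names and the toy matrices are hypothetical.

```python
def best_pairs(sim):
    """Pairs (i, j) whose value is maximal in both row i and column j (and > 0)."""
    rows, cols = {}, {}
    for (i, j), v in sim.items():
        rows[i] = max(rows.get(i, 0.0), v)
        cols[j] = max(cols.get(j, 0.0), v)
    return {(i, j) for (i, j), v in sim.items() if v > 0 and v == rows[i] == cols[j]}


def autoweights(matchers):
    """Assumed weighting: best correspondences found by fewer other matchers count more."""
    n = len(matchers)
    best = [best_pairs(m) for m in matchers]
    scores = []
    for k, own in enumerate(best):
        score = 0.0
        for pair in own:
            others = sum(pair in best[o] for o in range(n) if o != k)
            score += (1 - others / (n - 1)) if n > 1 else 1.0
        scores.append(score)
    total = sum(scores)
    return [s / total for s in scores] if total else [1 / n] * n


def aggregate(matchers, weights):
    """Weighted sum of the matchers' similarity values for every entity pair."""
    pairs = set().union(*matchers)
    return {p: sum(w * m.get(p, 0.0) for w, m in zip(weights, matchers)) for p in pairs}


m1 = {("a", "x"): 0.9, ("a", "y"): 0.1, ("b", "x"): 0.2, ("b", "y"): 0.8}
m2 = {("a", "x"): 0.7, ("a", "y"): 0.2, ("b", "x"): 0.1, ("b", "y"): 0.6}
m3 = {("a", "x"): 0.6, ("a", "y"): 0.3, ("b", "x"): 0.5, ("b", "y"): 0.2}
w = autoweights([m1, m2, m3])
print(w)  # [0.5, 0.5, 0.0]
```

Here m3's only best correspondence, (a, x), is also found by both other matchers, so under these assumptions it contributes no new information and receives weight 0.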
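The iterative final alignment can be sketched as a greedy loop: take the highest remaining correspondence (which is then the greatest for both of its entities), add it to the alignment, and discard every other correspondence involving either entity, stopping once the best remaining value is below the 0.15 threshold. Treating the threshold as inclusive and breaking ties by iteration order are assumptions of this sketch; the function name is hypothetical.

```python
def final_alignment(sim, threshold=0.15):
    """Greedy one-to-one selection of mutually greatest correspondences above a threshold."""
    remaining = dict(sim)
    alignment = []
    while remaining:
        (e1, e2), value = max(remaining.items(), key=lambda kv: kv[1])
        if value < threshold:
            break  # every remaining correspondence is below the threshold
        alignment.append((e1, e2, value))
        # Each entity may take part in at most one correspondence.
        remaining = {(a, b): v for (a, b), v in remaining.items() if a != e1 and b != e2}
    return alignment


sim = {("a", "x"): 0.8, ("a", "y"): 0.7, ("b", "x"): 0.75, ("b", "y"): 0.1, ("c", "z"): 0.12}
print(final_alignment(sim))  # [('a', 'x', 0.8)]
```

Entity b is not aligned to x because x is already taken by a, and its remaining candidate (b, y) falls below the threshold.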

Similar resources

CroMatcher results for OAEI 2016

Ontology matching plays an important role in the integration of heterogeneous data sources that are described by ontologies. In order to find correspondences between entities of different ontologies, a matching system has to be built. CroMatcher is an ontology matching system that consists of several string and structural basic matchers. As individual basic matcher computes similarity between e...

CroMatcher results for OAEI 2015

CroMatcher is an ontology matching system based on parallel composition of basic ontology matchers. There are two fundamental parts of the system: first, automated weighted aggregation of correspondences produced by different basic matchers in the parallel composition; second, an iterative final alignment method. This is the second time CroMatcher has been involved in the OAEI campaign. Basic i...

Results of the Ontology Alignment Evaluation Initiative 2013

Ontology matching consists of finding correspondences between semantically related entities of two ontologies. OAEI campaigns aim at comparing ontology matching systems on precisely defined test cases. These test cases can use ontologies of different nature (from simple thesauri to expressive OWL ontologies) and use different modalities, e.g., blind evaluation, open evaluation and consensus. OA...

Automating OAEI Campaigns

This paper reports the first effort into integrating OAEI and SEALS evaluation campaigns. OAEI is an annual evaluation campaign for ontology matching systems. The 2010 campaign includes a new modality in coordination with the SEALS project. This project aims at providing standardized resources (software components and data sets) for automatically executing evaluations of typical semantic web to...

Is my ontology matching system similar to yours?

The quality of the mappings computed by an ontology matching system in the Ontology Alignment Evaluation Initiative (OAEI) [2, 1] is typically measured in terms of precision and recall with respect to a reference set of mappings. Additionally, the OAEI also evaluates the coherence of the computed mappings [1]. However, the differences and similarities among the mappings computed by different sy...

Publication date: 2013