Algorithms and Software for Collaborative Discovery from Autonomous, Semantically Heterogeneous, Distributed Information Sources

Authors

Doina Caragea

Jun Zhang

Jie Bao

Jyotishman Pathak

Vasant Honavar

Abstract

The development of high-throughput data acquisition technologies, together with advances in computing and communications, has resulted in explosive growth in the number, size, and diversity of potentially useful information sources. This has created unprecedented opportunities for data-driven knowledge acquisition and decision making in a number of emerging, increasingly data-rich application domains such as bioinformatics, environmental informatics, enterprise informatics, and social informatics (among others). However, the massive size, semantic heterogeneity, autonomy, and distributed nature of the data repositories present significant hurdles in acquiring useful knowledge from the available data. This paper introduces some of the algorithmic and statistical problems that arise in such a setting and describes algorithms for learning classifiers from distributed data that offer rigorous performance guarantees (relative to their centralized or batch counterparts). It also describes how this approach can be extended to work with autonomous, and hence inevitably semantically heterogeneous, data sources by making explicit the ontologies (attributes and relationships between attributes) associated with the data sources and reconciling the semantic differences among the data sources from a user's point of view. This allows user- or context-dependent exploration of semantically heterogeneous data sources. The resulting algorithms have been implemented in INDUS, an open source software package for collaborative discovery from autonomous, semantically heterogeneous, distributed data sources.
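The guarantee "relative to centralized counterparts" mentioned in the abstract can be illustrated with a small sketch. It is not INDUS code. It assumes a naive Bayes classifier, rows split across two sites (horizontal fragmentation), and hypothetical attribute names, data, and function names (`local_counts`, `merge_counts`, `predict`). Each site returns only count statistics, and the merged counts match those computed from the pooled data, so the resulting classifier is identical.

```python
import math
from collections import Counter

def local_counts(rows):
    """Run at one data source: return count statistics, never raw rows."""
    class_counts, joint_counts = Counter(), Counter()
    for features, label in rows:
        class_counts[label] += 1
        for attr, value in features.items():
            joint_counts[(attr, value, label)] += 1
    return class_counts, joint_counts

def merge_counts(stats):
    """Run at the learner: add up the statistics returned by every source."""
    class_total, joint_total = Counter(), Counter()
    for class_counts, joint_counts in stats:
        class_total.update(class_counts)   # Counter.update adds counts
        joint_total.update(joint_counts)
    return class_total, joint_total

def predict(class_counts, joint_counts, features, alpha=1.0, n_values=2):
    """Naive Bayes prediction with Laplace smoothing.

    n_values is an assumption: every attribute takes n_values distinct values.
    """
    n = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        count = class_counts[label]
        score = math.log(count / n)
        for attr, value in features.items():
            numerator = joint_counts[(attr, value, label)] + alpha
            score += math.log(numerator / (count + alpha * n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical data held by two autonomous sites.
site_a = [({"program": "graduate", "funded": "yes"}, "stays"),
          ({"program": "graduate", "funded": "no"}, "leaves")]
site_b = [({"program": "undergraduate", "funded": "no"}, "leaves"),
          ({"program": "graduate", "funded": "yes"}, "stays")]

distributed = merge_counts([local_counts(site_a), local_counts(site_b)])
centralized = local_counts(site_a + site_b)
query = {"program": "graduate", "funded": "yes"}

print(distributed == centralized)
print(predict(*distributed, query) == predict(*centralized, query))
```

Because only counts cross site boundaries, the sources keep their raw data. Handling semantically heterogeneous sources would additionally require mapping each source's attribute values into the user's ontology before counting, which this sketch omits.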

Similar resources

Information Integration and Knowledge Acquisition from Semantically Heterogeneous Biological Data Sources

We present INDUS (Intelligent Data Understanding System), a federated, query-centric system for knowledge acquisition from autonomous, distributed, semantically heterogeneous data sources that can be viewed (conceptually) as tables. INDUS employs ontologies and inter-ontology mappings to enable a user or an application to view a collection of such data sources (regardless of location, internal...

Learning Classifiers from Distributed, Ontology-Extended Data Sources

There is an urgent need for sound approaches to integrative and collaborative analysis of large, autonomous (and hence, inevitably semantically heterogeneous) data sources in several increasingly data-rich application domains. In this paper, we precisely formulate and solve the problem of learning classifiers from such data sources, in a setting where each data source has a hierarchical ontolog...

Learning Relational Bayesian Classifiers on the Semantic Web

With the advent of the Semantic Web, there is an increased availability of meta data (ontologies) that make explicit the semantic commitments associated with data and an urgent need for machine learning algorithms for building predictive models from such data. Usually, there is no unique global interpretation of data from semantically disparate, autonomous sources. Furthermore, it is neither fe...

Knowledge Acquisition from Distributed, Autonomous, Semantically Heterogeneous Data and Knowledge Sources (KADASH)

ion. For example, the program of study of a student in a data source can be specified as Graduate Program (higher level of abstraction), while the program of study of a different student in the same data source (or even a different data source) can be specified as Doctoral Program (lower level of abstraction). The workshop brings together researchers in relevant...

Towards Semantics-Enabled Distributed Infrastructure for Knowledge Acquisition

We summarize progress on algorithms and software for knowledge acquisition from large, distributed, autonomous, and semantically disparate information sources. Some key results include: scalable algorithms for constructing predictive models from data based on a novel decomposition of learning algorithms that interleaves queries for sufficient statistics from data with computations using the statist...

Publication date: 2005
