Grid-based Approaches for Distributed Data Mining Applications

Authors

  • Lamine M. Aouad
  • Nhien-An Le-Khac
  • M. Tahar Kechadi
Abstract

Data mining is an important source of large-scale applications and datasets, both of which are becoming increasingly common. In this paper, we present grid-based approaches for two basic data mining applications, together with a performance evaluation on an experimental grid environment that provides monitoring capabilities and configuration tools. We propose a new distributed clustering approach and a distributed frequent itemset generation approach, both well adapted to grid environments. The performance evaluation uses the Condor system and its workflow manager, DAGMan. We also compare the measured performance with a simple analytical model to evaluate the overheads related to the workflow engine and the underlying grid system. This comparison shows, in particular, that realistic performance expectations are currently difficult to achieve on the grid.

Similar resources

Grid-based Distributed Data Mining Systems, Algorithms and Services

Distribution of data and computation allows for solving larger problems and executing applications that are distributed in nature. The Grid is a distributed computing infrastructure that enables coordinated resource sharing within dynamic organizations consisting of individuals, institutions, and resources. The Grid extends the distributed and parallel computing paradigms allowing resource negoti...

Distributed Mining of Molecular Fragments

In real world applications sequential algorithms of data mining and data exploration are often unsuitable for datasets with enormous size, high-dimensionality and complex data structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this context there is the necessity to develop high performance distributed data mining algorithms. However,...

A Grid-Based Distributed SVM Data Mining Algorithm

Distribution of data and manipulation allows for solving larger problems and executing applications that are distributed in nature. In this paper we present a grid-based distributed Support Vector Machine (SVM) algorithm. The Grid is a distributed computing infrastructure that enables coordinated resource sharing within dynamic organizations consisting of individuals, institutions, and resource...

OpenMolGRIND: Molecular Science and Engineering in a Grid Context

Modern approaches to chemistry and pharmacology deal with large-scale molecular design problems. The molecular design is essentially based on data warehousing and data mining. Data warehousing techniques are needed to collect relevant data from distributed and heterogeneous databases. Data mining techniques are used to build predictive quantitative structure-property and activity relationship m...

A Data Mining Ontology for Grid Programming

The Grid is an integrated infrastructure for coordinated resource sharing and problem solving in distributed environments. The effective and efficient use of stored data and its transformation into information and knowledge will be a main driver in Grid evolution. The use of ontologies to describe Grid resources will simplify and structure the systematic building of Grid applications through th...

Journal title:
  • CoRR

Volume abs/1703.09807

Publication date 2007