Effective and Efficient Pruning of Meta-Classifiers in a Distributed Data Mining System
Abstract
Distributed data mining systems aim to discover and combine useful information that is distributed across multiple databases. One of the main challenges is the design of effective and efficient methods to combine multiple models computed over multiple distributed sources that scale well over many large distributed databases. We describe in detail several methods that evaluate, prune and combine large collections of imported models computed at remote sites into efficient and scalable meta-classifiers. We demonstrate and evaluate the pruning methods by detailing many experiments performed on actual credit card data sets supplied by collaborating financial institutions, where the target learning task is fraud detection. We show that pruned meta-classifiers can sustain or even improve predictive performance at a substantially higher throughput, compared to the unpruned meta-classifiers.
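The evaluate, prune, and combine steps described above can be illustrated with a minimal sketch. The abstract does not state which pruning criteria or combining rule the authors use, so this example makes loud assumptions: classifiers are ranked by accuracy on a held-out validation set, and the survivors are combined by simple majority vote rather than by a learned meta-classifier. All names here (`prune`, `majority_vote`, the toy threshold classifiers, and the toy data) are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# A base classifier maps a feature vector to a label (1 = fraud, 0 = legitimate).
Classifier = Callable[[Sequence[float]], int]


def accuracy(clf: Classifier, X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Fraction of validation examples that clf labels correctly."""
    correct = sum(1 for xi, yi in zip(X, y) if clf(xi) == yi)
    return correct / len(y)


def prune(classifiers: Dict[str, Classifier],
          X_val: Sequence[Sequence[float]],
          y_val: Sequence[int],
          keep: int) -> List[str]:
    """Return the names of the `keep` classifiers with the highest validation accuracy.

    Assumption: accuracy is the ranking criterion; the source does not specify one.
    """
    ranked = sorted(classifiers,
                    key=lambda name: accuracy(classifiers[name], X_val, y_val),
                    reverse=True)
    return ranked[:keep]


def majority_vote(classifiers: Dict[str, Classifier],
                  names: Sequence[str],
                  x: Sequence[float]) -> int:
    """Combine the retained classifiers by majority vote (a stand-in for meta-learning)."""
    votes = Counter(classifiers[name](x) for name in names)
    return votes.most_common(1)[0][0]


# Toy validation set: a single hypothetical feature (transaction amount).
X_val = [[10.0], [20.0], [500.0], [700.0], [30.0], [900.0]]
y_val = [0, 0, 1, 1, 0, 1]

# Hypothetical "imported" base classifiers.
base: Dict[str, Classifier] = {
    "threshold_100": lambda x: int(x[0] > 100),
    "threshold_600": lambda x: int(x[0] > 600),
    "threshold_25": lambda x: int(x[0] > 25),
    "always_legit": lambda x: 0,
}

kept = prune(base, X_val, y_val, keep=3)
prediction = majority_vote(base, kept, [500.0])
```

Because only the retained classifiers are evaluated at prediction time, the pruned ensemble does less work per example, which is the throughput argument the abstract makes. Whether accuracy is preserved depends on the pruning criterion and data, which this sketch does not attempt to reproduce.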
Similar resources
Pruning Meta-Classifiers in a Distributed Data Mining System CUCS-032-97
JAM is a powerful and portable agent-based distributed data mining system that employs metalearning techniques to integrate a number of independent classifiers (models) derived in parallel from independent and (possibly) inherently distributed databases. Although meta-learning promotes scalability and accuracy in a simple and straightforward manner, brute force meta-learning techniques can resu...
Pruning Meta-Classifiers in a Distributed Data Mining System
JAM is a powerful and portable agent-based distributed data mining system that employs meta-learning techniques to integrate a number of independent classifiers (models) derived in parallel from independent and (possibly) inherently distributed databases. Although meta-learning promotes scalability and accuracy in a simple and straightforward manner, brute force meta-learning techniques can resul...
Pruning Classifiers in a Distributed Meta-Learning System
JAM is a powerful and portable agent-based distributed data mining system that employs meta-learning techniques to integrate a number of independent classifiers (concepts) derived in parallel from independent and (possibly) inherently distributed databases. Although meta-learning promotes scalability and accuracy in a simple and straightforward manner, brute force meta-learning techniques can re...
Meta-learning in distributed data mining systems: Issues and approaches
Data mining systems aim to discover patterns and extract useful information from facts recorded in databases. A widely adopted approach to this objective is to apply various machine learning algorithms to compute descriptive models of the available data. Here, we explore one of the main challenges in this research area, the development of techniques that scale up to large and possibly physicall...
Cost Complexity Pruning of Ensemble Classifiers
In this paper we study methods that combine multiple classification models learned over separate data sets in a distributed database setting. Numerous studies posit that such approaches provide the means to efficiently scale learning to large datasets, while also boosting the accuracy of individual classifiers. These gains, however, come at the expense of an increased demand for run-time system...
Publication date: 1999