Parallel and Distributed Data Mining: An Introduction
Abstract
The explosive growth in data collection in business and scientific fields has made it necessary to analyze the data and mine useful knowledge from it. Data mining refers to the entire process of extracting useful and novel patterns/models from large datasets. Because of the huge size of the data and the amount of computation involved in data mining, high-performance computing is an essential component of any successful large-scale data mining application. This chapter presents a survey of large-scale parallel and distributed data mining algorithms and systems, serving as an introduction to the rest of this volume. It also discusses the issues and challenges that must be overcome in designing and implementing successful tools for large-scale data mining.
Similar resources
Web Based Parallel/Distributed Medical Data Mining Using Software Agents
Hillol Kargupta, Brian Stafford, and Ilker Hamzaoglu. Computational Science Methods Group, X Division, Los Alamos National Laboratory, P.O. Box 1663, MS F645, Los Alamos, NM 87545. This paper describes an experimental parallel/distributed data mining system, PADMA (PArallel Data Mining Agents), that uses software agents for local data accessing and analysis and a web based inter...
Weighted Itemset Mining from Bigdata using Hadoop
Data items have been extracted using an empirical data mining technique called frequent itemset mining. In the majority of application contexts, items are enriched with weights. Pushing item weights into the itemset extraction process, i.e., mining weighted itemsets rather than traditional itemsets, is an appealing research direction. Although many efficient weighted itemset mining algorithms a...
Parallel Computing for Mining Association Rules in Distributed P2P Networks
Distributed computing and Peer-to-Peer (P2P) systems have emerged as an active research field that combines techniques from networks, distributed computing, distributed databases, and various distributed applications. Distributed computing and P2P systems realize information systems that scale to voluminous information on very large numbers of participating nodes. Data mining on large...
Towards Parallel and Distributed Computing in Large-Scale Data Mining: A Survey
The implementation of data mining ideas in high-performance parallel and distributed computing environments is becoming crucial for ensuring system scalability and interactivity as data continues to grow inexorably in size and complexity. This paper is a survey on the parallelization of well-known data mining techniques covering classification, link analysis, clustering and sequential learning,...
Grid-based Distributed Data Mining Systems, Algorithms and Services
Distribution of data and computation allows for solving larger problems and executing applications that are distributed in nature. The Grid is a distributed computing infrastructure that enables coordinated resource sharing within dynamic organizations consisting of individuals, institutions, and resources. The Grid extends the distributed and parallel computing paradigms allowing resource negoti...
Publication date: 1999