Compiler and Runtime Support for Shared Memory Parallelization of Data Mining Algorithms

نویسندگان

  • Xiaogang Li
  • Ruoming Jin
  • Gagan Agrawal
چکیده

Data mining techniques focus on finding novel and useful patterns or models from large datasets. Because of the volume of the data to be analyzed, the amount of computation involved, and the need for rapid or even interactive analysis, data mining applications require the use of parallel machines. We have been developing compiler and runtime support for developing scalable implementations of data mining algorithms. Our work encompasses shared memory parallelization, distributed memory parallelization, and optimizations for processing disk-resident datasets. In this paper, we focus on compiler and runtime support for shared memory parallelization of data mining algorithms. We have developed a set of parallelization techniques that apply across algorithms for a variety of mining tasks. We describe the interface of the middleware where these techniques are implemented. Then, we present compiler techniques for translating data parallel code to the middleware specification. Finally, we present a brief evaluation of our compiler using apriori association mining and k-means clustering.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Design and Evaluation of a High-Level Interface for Data Mining

This paper presents a case study in developing an application class specific high-level interface for shared memory parallel programming. The application class we focus on is data mining. With the availability of large datasets in areas like bioinformatics, medical informatics, scientific data analysis, financial analysis, telecommunications, retailing, and marketing, data mining tasks have bec...

متن کامل

Compiler and Middleware Support for Scalable Data Mining

High performance data mining is emerging as an important class of parallel applications. The expertise and eeort currently required in implementing, maintaining, and performance tuning a parallel data mining application is currently an impediment in the wide use of parallel computers for data mining. We have developed a data parallel dialect of Java that can be used for expressing common data m...

متن کامل

Compiling Sequential Programs for Speculative Parallelism

We present a runtime system and a parallelizing compiler for exploiting speculative parallelism in sequential programs. In speculative executions, the computation consists of tasks which may start before their data or control dependencies are resolved; dependency violation is detected and corrected at runtime. Our runtime system provides a shared memory abstraction and ensures that shared acces...

متن کامل

Shared memory multiprocessor support for functional array processing in SAC

Classical application domains of parallel computing are dominated by processing large arrays of numerical data. Whereas most functional languages focus on lists and trees rather than on arrays, SaC is tailor-made in design and in implementation for efficient high-level array processing. Advanced compiler optimizations yield performance levels that are often competitive with low-level imperative...

متن کامل

Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences

ÐThis paper presents a time stamp algorithm for runtime parallelization of general DOACROSS loops that have indirect access patterns. The algorithm follows the INSPECTOR/EXECUTOR scheme and exploits parallelism at a fine-grained memory reference level. It features a parallel inspector and improves upon previous algorithms of the same generality by exploiting parallelism among consecutive reads ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002