Dynamic Thread Pinning for Phase-Based OpenMP Programs

نویسندگان

  • Abdelhafid Mazouz
  • Sid Ahmed Ali Touati
  • Denis Barthou
چکیده

Thread affinity has appeared as an important technique to improve the overall program performance and for better performance stability. However, if we consider a program with multiple phases, it is unlikely that a single thread affinity produces the best program performance for all these phases. If we consider the case of OpenMP, applications may have multiple parallel regions, each with a distinct inter-thread data sharing pattern. In this paper, we propose an approach that allows to change thread affinity dynamically (thread migrations) between parallel regions at runtime to account for these distinct inter-thread data sharing patterns. We demonstrate that as far as cache sharing is concerned for SPEC OMP01, not all the tested OpenMP applications exhibit a distinct phase behavior. However, we show that while fixing thread affinity for the whole execution may improve performance by up to 30%, allowing dynamic thread pinning may improve performance by up to 40%. Furthermore, we provide an analysis about the required conditions to improve the effectiveness of the approach.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Empirical Comparison of Filtering Techniques for On-the-fly Data Race Detection in OpenMP Programs

It is a well-known that data races in implicit threading applications, such as OpenMP programs, are the most notorious class of concurrency bugs, because they may lead to unpredictable results of the program. The main drawback of on-the-fly data race detection techniques is the heavy additional overheads for analyzing every memory operations and thread operations, such as load, store, fork, and...

متن کامل

COBRA: A Framework for Continuous Profiling and Binary Re-Adaptation

Dynamic optimizers have shown to improve performance and power efficiency of single-threaded applications. Multithreaded applications running on CMP, SMP and cc-NUMA systems also exhibit opportunities for dynamic binary optimization. Existing dynamic optimizers lack efficient monitoring schemes for multiple threads to support appropriate thread specific or system-wide optimization for a collect...

متن کامل

Extending Global Optimizations in the OpenUH Compiler for OpenMP

This paper presents our design and implementation of a framework for analyzing and optimizing OpenMP programs within the OpenUH compiler, which is based on Open64. The paper describes the existing analyses and optimizations in OpenUH, and explains why the compiler may not apply classical optimizations to OpenMP programs directly. It then presents an enhanced compiler framework including Paralle...

متن کامل

Characterizing Task-Based OpenMP Programs

Programmers struggle to understand performance of task-based OpenMP programs since profiling tools only report thread-based performance. Performance tuning also requires task-based performance in order to balance per-task memory hierarchy utilization against exposed task parallelism. We provide a cost-effective method to extract detailed task-based performance information from OpenMP programs. ...

متن کامل

Processor-Oblivious Record and Replay

Record-and-replay systems are useful tools for debugging non-deterministic parallel programs by first recording an execution and then replaying that execution to produce the same access pattern. Existing record-and-replay systems generally target thread-based execution models, and record the behaviors and interleavings of individual threads. Dynamic multithreaded languages and libraries, such a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013