Dynamic Thread Pinning for Phase-Based OpenMP Programs
Abstract
Thread affinity has emerged as an important technique for improving overall program performance and performance stability. However, for a program with multiple phases, a single thread affinity is unlikely to produce the best performance in every phase. OpenMP applications, for instance, may have multiple parallel regions, each with a distinct inter-thread data sharing pattern. In this paper, we propose an approach that changes thread affinity dynamically (through thread migrations) between parallel regions at runtime to account for these distinct inter-thread data sharing patterns. We demonstrate that, as far as cache sharing is concerned, not all of the tested SPEC OMP01 applications exhibit distinct phase behavior. However, we show that while fixing thread affinity for the whole execution may improve performance by up to 30%, allowing dynamic thread pinning may improve performance by up to 40%. Furthermore, we analyze the conditions required to improve the effectiveness of the approach.
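The idea of re-pinning threads at the start of each parallel region can be illustrated with a minimal sketch. It is not the authors' implementation: the policy names `PIN_COMPACT` and `PIN_SCATTER`, the helpers `target_cpu`, `pin_self`, and `run_region`, and the assumption that the two halves of the CPU id range correspond to two cache domains are all hypothetical. It relies on the Linux-specific `sched_setaffinity` call and falls back to a single thread when compiled without OpenMP.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical placement policies; the names are illustrative only. */
typedef enum { PIN_COMPACT, PIN_SCATTER } pin_policy;

/* Map a thread id to a CPU id (ncpus must be >= 1).
 * COMPACT places neighbouring threads on neighbouring CPUs.
 * SCATTER alternates threads between the lower and upper half of the
 * CPU id range, assumed here to stand for two cache domains. */
int target_cpu(pin_policy p, int tid, int ncpus)
{
    if (ncpus < 2 || p == PIN_COMPACT)
        return tid % ncpus;
    int half = ncpus / 2;
    return (tid % 2) * half + (tid / 2) % half;
}

/* Pin the calling thread to a single CPU.
 * Returns 0 on success and -1 on failure (for example when the CPU is
 * outside the set the process is allowed to use). */
int pin_self(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread" on Linux. */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Enter one parallel region with the affinity chosen for that region.
 * Each thread migrates itself before doing the region's work. */
void run_region(pin_policy p, int ncpus)
{
#ifdef _OPENMP
    #pragma omp parallel
    {
        pin_self(target_cpu(p, omp_get_thread_num(), ncpus));
        /* ... body of the parallel region goes here ... */
    }
#else
    pin_self(target_cpu(p, 0, ncpus));
    /* ... serial fallback body goes here ... */
#endif
}
```

A caller could invoke `run_region(PIN_COMPACT, n)` for a phase whose neighbouring threads share data and `run_region(PIN_SCATTER, n)` for a phase that benefits from spreading threads out. Whether a migration pays off depends on its cost relative to the length of the region.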
Related resources
Empirical Comparison of Filtering Techniques for On-the-fly Data Race Detection in OpenMP Programs
It is well known that data races in implicit threading applications, such as OpenMP programs, are the most notorious class of concurrency bugs, because they may lead to unpredictable program results. The main drawback of on-the-fly data race detection techniques is the heavy additional overhead of analyzing every memory operation and thread operation, such as load, store, fork, and...
COBRA: A Framework for Continuous Profiling and Binary Re-Adaptation
Dynamic optimizers have been shown to improve the performance and power efficiency of single-threaded applications. Multithreaded applications running on CMP, SMP and cc-NUMA systems also exhibit opportunities for dynamic binary optimization. Existing dynamic optimizers lack efficient monitoring schemes for multiple threads to support appropriate thread specific or system-wide optimization for a collect...
Extending Global Optimizations in the OpenUH Compiler for OpenMP
This paper presents our design and implementation of a framework for analyzing and optimizing OpenMP programs within the OpenUH compiler, which is based on Open64. The paper describes the existing analyses and optimizations in OpenUH, and explains why the compiler may not apply classical optimizations to OpenMP programs directly. It then presents an enhanced compiler framework including Paralle...
Characterizing Task-Based OpenMP Programs
Programmers struggle to understand the performance of task-based OpenMP programs since profiling tools only report thread-based performance. Performance tuning also requires task-based performance information in order to balance per-task memory hierarchy utilization against exposed task parallelism. We provide a cost-effective method to extract detailed task-based performance information from OpenMP programs. ...
Processor-Oblivious Record and Replay
Record-and-replay systems are useful tools for debugging non-deterministic parallel programs by first recording an execution and then replaying that execution to produce the same access pattern. Existing record-and-replay systems generally target thread-based execution models, and record the behaviors and interleavings of individual threads. Dynamic multithreaded languages and libraries, such a...
Publication date: 2013