Thread-level synthetic benchmarks for multicore systems
نویسندگان
چکیده
One of the commonly used techniques to speedup early architectural exploration and performance evaluation of new hardware architectures is to use synthetic benchmarks. This paper presents a novel automated thread-level synthetic benchmark generation framework with characterization and generation components. The resulting thread-level synthetic benchmarks are fast, portable, human-readable, and they accurately mimic the micro-architecture dependent and independent characteristics of each thread in original application. We demonstrate that we can generate multi-threaded synthetic benchmarks for real-life PARSEC and Rodinia benchmarks, while being faster (on average 147 ) and smaller (on average 11 ) than originals. The obtained results show that synthetic benchmarks not only accurately preserve thread-level micro-architecture dependent and independent characteristics but also parallel programming patterns, which are high-quality solutions to frequently occurring problems in parallel programming. 2015 Elsevier B.V. All rights reserved.
منابع مشابه
Exploiting fine-grain thread parallelism on multicore architectures
In this work we present a runtime threading system which provides an efficient substrate for fine-grain parallelism, suitable for deployment in multicore platforms. Its architecture encompasses a number of optimizations that make it particularly effective in managing a large number of threads and with low overheads. The runtime system has been integrated into an OpenMP implementation to allow f...
متن کاملApplication Characteristics of Many-tasking Execution Models
Performance gain for computer systems through Moore’s Law is jeopardized by the limitations of clock rate growth due to power considerations and the limitations in instruction-level parallelism improvement from processor core computer architecture experienced over the last decade. High performance computer architectures are addressing this challenge through multicore processors that combine man...
متن کاملUnderstanding Concurrency for Graph Workloads in Large Scale Multicores
Algorithms operating on a graph setting are known to be highly irregular and unstructured. This leads to workload imbalance and data locality challenge when these algorithms are parallelized and executed on the evolving multicore processors. Previous parallel benchmark suites for shared memory multicores have focused on various workload domains, such as scientific, graphics, and vision. However...
متن کاملTowards Autotuning of OpenMP Applications on Multicore Architectures
In this paper we describe an autotuning tool for optimization of OpenMP applications on highly multicore and multithreaded architectures. Our work was motivated by in-depth performance analysis of scientific applications and synthetic benchmarks on IBM Power 775 architecture. The tool provides an automatic code instrumentation of OpenMP parallel regions. Based on measurement of chosen hardware ...
متن کاملOpenMP task scheduling strategies for multicore NUMA systems
The recent addition of task parallelism to the OpenMP shared memory API allows programmers to express concurrency at a high level of abstraction and places the burden of scheduling parallel execution on the OpenMP run time system. Efficient scheduling of tasks on modern multi-socket multicore shared memory systems requires careful consideration of an increasingly complex memory hierarchy, inclu...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Microprocessors and Microsystems - Embedded Hardware Design
دوره 39 شماره
صفحات -
تاریخ انتشار 2015