Modeling GPU Dynamic Parallelism for self similar density workloads

Authors

Abstract

Dynamic Parallelism (DP) is a GPU programming abstraction that can make parallel computation more efficient for problems that exhibit heterogeneous workloads. With DP, GPU threads can launch kernels with more threads, recursively, producing a subdivision effect where resources are focused on the regions that exhibit more parallel work. Doing an optimal subdivision process is not trivial, as the combination of different parameters plays a relevant role in the final performance of DP. Also, the current programming approach to DP relies on kernel recursion, which has a performance overhead. This work presents a new subdivision cost model for self similar density (SSD) workloads, useful for finding efficient subdivision schemes. A new subdivision implementation free of recursion overhead is also presented, named Adaptive Serial Kernels (ASK). Using the Mandelbrot set as a case study, the cost model shows that optimal performance is achieved with {g∼32, r∼2, B∼32} for the initial subdivision, recurrent subdivision, and stopping size, respectively. Experimental results agree with the theoretical parameters, confirming the usability of the model. In terms of performance, the ASK approach runs up to ∼60% faster than DP on the Mandelbrot set and up to 12× faster than a basic exhaustive implementation, whereas DP is up to 7.5× faster. In terms of energy efficiency, ASK is up to ∼2× and ∼20× more efficient than the DP and exhaustive approaches, respectively. These results make ASK a useful tool for analyzing the potential improvement of subdivision-based approaches and for developing GPU-based libraries or fine-tuning specific codes in research teams.
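The {g, r, B} scheme from the abstract can be sketched numerically: an N×N domain is first split g ways per axis, each active region is then subdivided by a factor r per level, and recursion stops once regions reach the B×B stopping size. The function below is an illustrative sketch under those assumptions, not code from the paper; the name `subdivision_levels` and the example domain size are hypothetical.

```python
def subdivision_levels(N, g=32, r=2, B=32):
    """Count the recurrent subdivision levels implied by the {g, r, B}
    scheme: an N x N domain is initially split g ways per axis, each
    level then subdivides regions r ways per axis, stopping when a
    region reaches the B x B stopping size."""
    region = N / g                 # region side length after the initial split
    levels = 0
    while region > B:              # subdivide until regions reach B x B
        region /= r                # each level splits a region r ways per axis
        levels += 1
    return levels

# Example: on a hypothetical 4096 x 4096 Mandelbrot grid with the
# abstract's parameters, the initial regions are 128 x 128, so two
# r = 2 levels reach the B = 32 stopping size.
print(subdivision_levels(4096))    # -> 2
```

With DP, each of these levels would be a recursive device-side kernel launch; the ASK idea described in the abstract instead performs the levels as a series of host-launched kernels, avoiding the recursion overhead.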



Similar resources

Exploring Parallelism in Transactional Workloads

Multicores are now standard in most machines, which means that many programmers are faced with the challenge of how to take advantage of all the potential parallelism. Transactional Memory (TM) promises to simplify this task. Yet, at the same time, TM inhibits the programmer from fully exploring the latent parallelism in his application. In particular, it does not allow a transaction to contain...


Dynamic Task Parallelism with a GPU Work-Stealing Runtime System

NVIDIA’s Compute Unified Device Architecture (CUDA) and its attached C/C++ based API went a long way towards making GPUs more accessible to mainstream programming. So far, the use of GPUs for high performance computing has been primarily restricted to data parallel applications, and with good reason. The high number of computational cores and high memory bandwidth supported by the device makes ...


Modeling Self-Similar Traffic for Network Simulation

In order to closely simulate the real network scenario thereby verify the effectiveness of protocol designs, it is necessary to model the traffic flows carried over realistic networks. Extensive studies [1] showed that the actual traffic in access and local area networks (e.g., those generated by ftp and video streams) exhibits the property of self-similarity and long-range dependency (LRD) [2]...


Dynamic Task Parallelism and Nonblocking Communication for Scalable Ecosystem Modeling

Climate change can have devastating effects on a wide variety of terrestrial ecosystems. The Dynamic Land Ecosystem Model (DLEM) enables scientists to computationally analyze, understand, and quantify the dynamics and evolution of ecosystems at large spatio-temporal scales. In order to overcome fundamental limitations on the execution performance of DLEM, we have designed pDLEM, a parallel vers...


Pthreads for Dynamic Parallelism

Expressing a large number of lightweight, parallel threads in a shared address space significantly eases the task of writing a parallel program. Threads can be dynamically created to execute individual parallel tasks; the implementation schedules these threads onto the processors and effectively balances the load. However, unless the threads scheduler is designed carefully, such a parallel prog...



Journal

Journal title: Future Generation Computer Systems

Year: 2023

ISSN: 0167-739X, 1872-7115

DOI: https://doi.org/10.1016/j.future.2023.03.046