GPU parallel computation

Accelerating the image registration of MRI volumes by modern GPGPU parallel computation

2009

S-Y. Ju Y-W. Tang T-Y. Huang

Introduction: Image registration [1] has been an important topic in MRI applications, such as longitudinal follow-up studies, brain normalization for group statistics, and motion correction for fMRI studies. There are many different image registration algorithms. In general, the calculations require many iterations of coordinate transformations to find the displacements and rotations ...


Fast Speaker Diarization Using a Specialization Framework for Gaussian Mixture Model Training

2011

Ekaterina Gonina Kurt Keutzer Armando Fox

Most current speaker diarization systems use agglomerative clustering of Gaussian Mixture Models (GMMs) to determine “who spoke when” in an audio recording. While state-of-the-art in accuracy, this method is computationally costly, mostly due to the GMM training, and thus limits the performance of current approaches to roughly real time. Increased sizes of current datasets require processing...


Accelerating Computation of DCM for ERP in MATLAB by External Function Calls to the GPU

2013

Wei-Jen Wang I-Fan Hsieh Chun-Chuan Chen

This study aims to improve the performance of Dynamic Causal Modelling for Event Related Potentials (DCM for ERP) in MATLAB by using external function calls to a graphics processing unit (GPU). DCM for ERP is an advanced method for studying neuronal effective connectivity. DCM utilizes an iterative procedure, the expectation maximization (EM) algorithm, to find the optimal parameters given a se...


A Study on Adaptive Algorithms for Numerical Quadrature on Heterogeneous GPU and Multicore Based Systems

2013

Giuliano Laccetti Marco Lapegna Valeria Mele Diego Romano

In this work, a parallel adaptive algorithm for the computation of a multidimensional integral on heterogeneous GPU and multicore-based systems is described. Two different strategies have been combined in the algorithm: a first procedure is responsible for the load balancing among the threads on the multicore CPU and a second one is responsible for an efficient execution on the GPU of ...


Performance Evaluation of CPU-GPU communication Depending on the Characteristic of Co-Located Workloads

2013

Dongyou Seo Hyeonsang Eom Heon Y. Yeom

Today, many studies address complex computation and big data processing using the high-performance computing capability of GPUs. The Tesla K20X, recently announced by NVIDIA, provides 3.95 TFLOPS of single-precision floating-point performance [1]. The performance of the K20X is 10 times higher than that of Intel’s high-end CPUs. Owing to this high computing performance, the K20X was adopted in Titan, the first...


Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation

2014

Huayou Su Mei Wen Nan Wu Ju Ren Chunyuan Zhang

By reorganizing the execution order and optimizing the data structures, we propose an efficient parallel framework for an H.264/AVC encoder based on a massively parallel architecture. We implemented the proposed framework in CUDA on NVIDIA GPUs. Not only are the compute-intensive components of the H.264 encoder parallelized, but the control-intensive components are also realized effectively, su...


Runtime Analysis of GPU-Based Stereo Matching

2015

Christian Zentner Yan Liu

This paper explores the possibility of leveraging the highly parallel nature of GPUs to implement more efficient stereo matching algorithms. Different algorithms have been implemented and compared on the CPU and the GPU in order to show the speedup gained by moving the computation to the graphics card. The results were evaluated for accuracy using the test available on the Middlebury website...


High-Performance High-Order Simulation of Wave and Plasma Phenomena

2010

Andreas Klöckner Jan Sickmann

Abstract of “High-Performance High-Order Simulation of Wave and Plasma Phenomena” by Andreas Klöckner, Ph.D., Brown University, May 2010. This thesis presents results aiming to enhance and broaden the applicability of the discontinuous Galerkin (“DG”) method in a variety of ways. DG was chosen as a foundation for this work because it yields high-order finite element discretizations with very favorable nu...


Streaming Collision Detection Using Programmable GPU

2003

Zhaowei Fan Huagen Wan Shuming Gao

Real-time collision detection is required by most computer graphics applications. However, current collision detection methods still have difficulty achieving real-time performance. Recent advances in programmable graphics hardware (GPUs) make it usable for general-purpose computation. In this paper, we explore solving the collision detection problem with programmable GPUs. An a...


A Map Reduce Framework for Programming Graphics Processors

2008

Bryan Catanzaro Narayanan Sundaram Kurt Keutzer

Recent developments in programmable, highly parallel Graphics Processing Units (GPUs) have enabled high-performance general-purpose computation. We describe a framework designed for high-performance GPU programming, built on NVIDIA’s Compute Unified Device Architecture (CUDA) platform. The framework is built around the Map Reduce abstraction, which allows application developers to focus on thei...
