Search results for: many core architectures

Number of results: 1178744

Journal: SIAM J. Scientific Computing, 2016
Karl Rupp, Philippe Tillet, Florian Rudolf, Josef Weinbub, Andreas Morhammer, Tibor Grasser, Ansgar Jüngel, Siegfried Selberherr

CUDA, OpenCL, and OpenMP are popular programming models for the multi-core architectures of CPUs and many-core architectures of GPUs or Xeon Phis. At the same time, computational scientists face the question of which programming model to use to obtain their scientific results. We present the linear algebra library ViennaCL, which is built on top of all three programming models, thus enabling co...

Journal: Concurrency and Computation: Practice and Experience, 2016
Shuo Li, James Lin

In this paper, we start by looking at the algorithms and the numerical methods of pricing one exotic option, the strong path dependent Asian option using the Black–Scholes pricing model. We cover both geometric average and arithmetic average schemes that lead us to two different numerical solutions. Next, we discuss how to implement these algorithms on the leading many-core architectures with c...
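As a rough illustration of the two averaging schemes mentioned in this abstract, the sketch below prices an Asian call under Black–Scholes dynamics by plain Monte Carlo, once with the arithmetic average and once with the geometric average of the simulated path. This is not the authors' implementation: the function name `asian_call_mc`, its parameters, and the choice of Monte Carlo for both schemes are assumptions made here for illustration only.

```python
import math
import random


def asian_call_mc(s0, strike, rate, sigma, maturity, steps, paths, seed=0):
    """Monte Carlo estimates of arithmetic- and geometric-average Asian call prices.

    Hypothetical helper for illustration; assumes geometric Brownian motion
    sampled at `steps` equally spaced monitoring dates.
    """
    rng = random.Random(seed)
    dt = maturity / steps
    drift = (rate - 0.5 * sigma ** 2) * dt   # log-price drift per step
    vol = sigma * math.sqrt(dt)              # log-price volatility per step

    sum_arith = 0.0
    sum_geo = 0.0
    for _ in range(paths):
        log_s = math.log(s0)
        total = 0.0       # running sum of prices (arithmetic average)
        total_log = 0.0   # running sum of log-prices (geometric average)
        for _ in range(steps):
            log_s += drift + vol * rng.gauss(0.0, 1.0)
            total += math.exp(log_s)
            total_log += log_s
        arith_avg = total / steps
        geo_avg = math.exp(total_log / steps)
        sum_arith += max(arith_avg - strike, 0.0)
        sum_geo += max(geo_avg - strike, 0.0)

    discount = math.exp(-rate * maturity)
    return discount * sum_arith / paths, discount * sum_geo / paths


# Example parameters are arbitrary assumptions, not values from the paper.
arith_price, geo_price = asian_call_mc(100.0, 100.0, 0.05, 0.2, 1.0, 50, 2000)
```

Because the geometric mean of a path never exceeds its arithmetic mean, the geometric-average estimate computed on the same paths is never larger than the arithmetic-average one. The independent paths in the outer loop are the part that would typically be distributed across cores.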

Journal: IET Computers & Digital Techniques, 2016
Aviral Shrivastava, Nikil D. Dutt, Jian Cai, Majid Namaki-Shoushtari, Bryan Donyanavard, Hossein Tajik

Software Programmable Memories, or SPMs, are raw on-chip memories that are not implicitly managed by the processor hardware, but explicitly by software. For example, while caches fetch data from memories automatically and maintain coherence with other caches, SPMs explicitly manage data movement between memories and other SPMs through software instructions. SPMs make the design of on-chip memor...

2012
Elkin Garcia, Daniel Orozco, Robert Pavel, Guang R. Gao

The recent evolution of many-core architectures has resulted in chips where the number of processor elements (PEs) is in the hundreds and continues to increase every day. In addition, many-core processors are more and more frequently characterized by the diversity of their resources and the way the sharing of those resources is arbitrated. On such a machine, task scheduling is of paramount impo...

2014
Yao Wu, Guang R. Gao

The upcoming exa-scale era requires a parallel program execution model capable of achieving scalability, productivity, energy efficiency, and resiliency. The codelet model is a fine-grained dataflow-inspired execution model which is the focus of several tera-scale and exa-scale studies such as DARPA’s UHPC, DOE’s X-Stack, and the European TERAFLUX projects. Current codelet implementations aim t...

2011
G. R. Markall, A. Slemmer, D. A. Ham, P. H. J. Kelly, C. D. Cantwell, S. J. Sherwin

We demonstrate that radically differing implementations of finite element methods are needed on multicore (CPU) and many-core (GPU) architectures, if their respective performance potential is to be realised. Our experimental investigations using a finite element advection-diffusion solver show that increased performance on each architecture can only be achieved by committing to specific and div...

2015
Valeria Cardellini, Alessandro Fanfarillo, Salvatore Filippone, Damian W. I. Rouson

Accelerators such as NVIDIA GPUs and Intel MICs are currently provided as co-processor devices, usable only through a CPU host. For Intel MICs it is planned that this constraint will be lifted in the near future: CPU and accelerator(s) will then form a single, many-core, processor capable of peak performance of several Teraflops with high energy efficiency. In order to exploit the available com...

2014
Vincent Nélis, Patrick Meumeu Yomsi, Luís Miguel Pinho, José Carlos Fonseca, Marko Bertogna, Eduardo Quiñones, Roberto Vargas, Andrea Marongiu

The recent technological advancements and market trends are causing an interesting phenomenon towards the convergence of High-Performance Computing (HPC) and Embedded Computing (EC) domains. Many recent HPC applications require huge amounts of information to be processed within a bounded amount of time while EC systems are increasingly concerned with providing higher performance in real-time. T...

2007
John Sartori, Rakesh Kumar

While power has long been a well-studied problem, most dynamic power reduction techniques, e.g., V/f scaling, clock gating, etc., exploit slack in the execution behavior of programs to reduce average power. Peak power is often left untouched. However, peak power plays a large role in determining the characteristics and hence the cost of the power supply, thermal budgeting for the chip, as well ...
