Excel-NUMA: Toward Programmability, Simplicity, and High Performance

Authors

  • Zheng Zhang
  • Marcelo H. Cintra
  • Josep Torrellas
Abstract

While hardware-coherent scalable shared-memory multiprocessors are relatively easy to program, they still require substantial programming effort to deliver high performance. Specifically, to minimize remote accesses, data must be carefully laid out in memory for locality and application working sets carefully tuned for caches. It has been claimed that this programming effort is less necessary in hardware COMA machines like Flat-COMA thanks to automatic line-based data migration. Unfortunately, Flat-COMA is complex to design. Consequently, we would like a machine as programmable as Flat-COMA, as simple as plain CC-NUMA, and that outperforms both. This paper presents our proposal: Excel-NUMA (EX-NUMA). The idea is to exploit the fact that, after a memory line is written and cached, the storage that kept the line in memory is unutilized. We use that storage to temporarily hold remote data displaced from the local caches. This enables automatic data migration, like in Flat-COMA, enhancing programmability. The hardware required to manage the system is a simple, local module added to a CC-NUMA; the global cache coherence protocol is not changed. Simulations of Splash2 applications show that EX-NUMA outperforms CC-NUMA and Flat-COMA in every single application and eliminates most of the conflict misses.

Index Terms: Shared-memory multiprocessors, NUMA organizations, cache-coherence protocols, caches, performance evaluation.
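The abstract gives no implementation details, so the following is only a toy sketch of the stated idea: a home memory slot becomes unused once its line has been written and cached, and that slot can temporarily hold a remote line displaced from the local cache. The class `ToyNode`, its method names, and the first-free-slot placement policy are all assumptions made for illustration; none of them come from the paper.

```python
class ToyNode:
    """Toy model of one node. Every name and policy here is hypothetical."""

    def __init__(self, num_home_lines):
        # True means the memory slot is stale: the line was written and
        # now lives dirty in a cache, so the slot's storage is unused.
        self.slot_is_stale = [False] * num_home_lines
        # Maps a remote line address to the stale slot that holds it.
        self.remote_in_slot = {}

    def home_line_written_and_cached(self, slot):
        # The cached copy is now the only valid one; the slot is free.
        self.slot_is_stale[slot] = True

    def home_line_written_back(self, slot):
        # Memory needs the slot again, so any remote line parked in it
        # is dropped (assumed policy).
        self.slot_is_stale[slot] = False
        for addr, used_slot in list(self.remote_in_slot.items()):
            if used_slot == slot:
                del self.remote_in_slot[addr]

    def displace_remote_line(self, addr):
        """Try to park a remote line evicted from the local cache."""
        if addr in self.remote_in_slot:
            return True
        used = set(self.remote_in_slot.values())
        for slot, stale in enumerate(self.slot_is_stale):
            if stale and slot not in used:
                self.remote_in_slot[addr] = slot
                return True
        return False  # no unused storage; the line is simply lost locally

    def access_remote_line(self, addr):
        """Return 'local' if a parked copy exists, otherwise 'remote'."""
        return "local" if addr in self.remote_in_slot else "remote"
```

In this sketch, a later miss on a parked remote line is served from local memory instead of crossing the network, which is the effect the abstract attributes to automatic data migration. The real design also has to keep parked lines coherent, which this model ignores.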

Similar resources

A Transparent Runtime Data Distribution Engine for OpenMP

This paper makes two important contributions. First, the paper investigates the performance implications of data placement in OpenMP programs running on modern NUMA multiprocessors. Data locality and minimization of the rate of remote memory accesses are critical for sustaining high performance on these systems. We show that due to the low remote-to-local memory access latency ratio of contempo...

C3D: Mitigating the NUMA bottleneck via coherent DRAM caches

Massive datasets prevalent in scale-out, enterprise, and high-performance computing are driving a trend toward ever-larger memory capacities per node. To satisfy the memory demands and maximize performance per unit cost, today’s commodity HPC and server nodes tend to feature multi-socket shared memory NUMA organizations. An important problem in these designs is the high latency of accessing mem...

Performance Modelling for Parallel PDE Solvers on NUMA-Systems

A detailed model of the memory performance of a PDE solver running on a NUMA-system is set up. Due to the complexity of modern computers, such a detailed model inevitably is very complicated. Therefore, approximations are introduced that simplify the model and allow NUMA-systems and PDE solvers to be described conveniently. Using the simplified model, it is shown that PDE solvers using ordered ...

Towards Programmability of a NUMA-Aware Storage Engine

The SQL database language was originally intended for application programmers. However, after more than 20 years of language extensions, SQL can only be generated by software components and is no longer suitable for an increasing user base like knowledge workers or data scientists, who want to work with data in an interactive fashion. The original idea of declarative query languages, telling th...

Improving Parallel System Performance with a NUMA-aware Load Balancer

Multi-core nodes with Non-Uniform Memory Access (NUMA) are now a common architecture for high performance computing. On such NUMA nodes, the shared memory is physically distributed into memory banks connected by a network. Owing to this, memory access costs may vary depending on the distance between the processing unit and the memory bank. Therefore, a key element in improving the performance o...

Journal title:
  • IEEE Trans. Computers

Volume: 48

Publication date: 1999