Tolerating First Level Memory Access Latency in High-Performance Systems

Authors

  • William Y. Chen
  • Scott A. Mahlke
  • Wen-mei W. Hwu
Abstract

In order to improve performance, future parallel systems will continue to increase the processing power of each node in a system. As node processors, though, can execute more instructions concurrently, they become more sensitive to the first level memory access latency. This paper presents a set of hardware and software techniques, collectively referred to as register preloading, to effectively tolerate long first level memory access latency. The techniques include speculative execution, loop unrolling, dynamic memory disambiguation, and strip-mining. Results show that register preloading provides excellent tolerance to first level memory access latency up to 16 cycles for an issue-4 node processor.
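
As a rough illustration of why loop unrolling helps tolerate load latency, the C sketch below unrolls a reduction by four and issues the loads for the next group of elements before the current group is consumed, so that independent work can overlap the outstanding loads. This is only a source-level analogy written for this page: the function names `sum_simple` and `sum_preloaded` and the unroll factor of 4 are assumptions, and the sketch does not reproduce the paper's hardware support, speculative execution, or dynamic memory disambiguation.

```c
#include <stddef.h>

/* Baseline: each iteration loads a[i] and uses it immediately,
 * so every add waits on the load that feeds it. */
double sum_simple(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* Hypothetical sketch: unrolled by 4, with the loads for the next
 * group hoisted above the adds that consume the current group. */
double sum_preloaded(const double *a, size_t n) {
    double s = 0.0;
    size_t i = 0;
    if (n >= 4) {
        /* Prologue: load the first group ahead of any use. */
        double x0 = a[0], x1 = a[1], x2 = a[2], x3 = a[3];
        for (i = 4; i + 4 <= n; i += 4) {
            /* Issue the next group's loads first ... */
            double y0 = a[i], y1 = a[i + 1], y2 = a[i + 2], y3 = a[i + 3];
            /* ... then consume the values loaded one iteration earlier. */
            s += x0; s += x1; s += x2; s += x3;
            x0 = y0; x1 = y1; x2 = y2; x3 = y3;
        }
        /* Epilogue: consume the last preloaded group. */
        s += x0; s += x1; s += x2; s += x3;
    }
    /* Remainder loop for the leftover 0 to 3 elements. */
    for (; i < n; i++) {
        s += a[i];
    }
    return s;
}
```

Both functions add the elements in the same order, so they return identical results; the difference is only in how far each load is separated from its first use.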


Similar resources


An improved SRAM cell design for tolerating radiation-induced single-event effects

This paper presents an improved design of a radiation-hardened static random access memory (SRAM) cell. The memory cell is designed to be tolerant to transient single-event upsets by taking advantage of the fact that for the same area, the surface mobility of NMOS transistors is greater than that of PMOS transistors. The results show that the proposed design is able to withstand single-event ups...


Dynamic programming in faulty memory hierarchies (cache-obliviously)

Random access memories suffer from transient errors that lead the logical state of some bits to be read differently from how they were last written. Due to technological constraints, caches in the memory hierarchy of modern computer platforms appear to be particularly prone to bit flips. Since algorithms implicitly assume data to be stored in reliable memories, they might easily exhibit unpredi...


A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure

The IP lookup process is a key bottleneck in routing due to the increase in routing table size, increasing traffic, and migration to IPv6 addresses. The IP address lookup involves computation of the Longest Prefix Matching (LPM), for which existing solutions, such as BSD Radix Tries, scale poorly when traffic in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...



Publication date: 1992