Exploiting Application-Level Information to Reduce Memory Bandwidth Consumption
نویسندگان
چکیده
As processors continue to deliver higher levels of performance and as memory latency tolerance techniques become widespread to address the increasing cost of accessing memory, memory bandwidth will emerge as a major performance bottleneck. Rather than rely solely on wider and faster memories to address memory bandwidth shortages, an alternative is to use existing memory bandwidth more efficiently. A promising approach is hardware-based selective subblocking [14, 2]. In this technique, hardware predictors track the portions of cache blocks that are referenced by the processor. On a cache miss, the predictors are consulted and only previously referenced portions are fetched into the cache, thus conserving memory bandwidth. This paper proposes a software-centric approch to selective sub-blocking. We make the key observation that wasteful data fetching inside long cache blocks arises due to certain sparse memory references, and that such memory references can be identified in the application source code. Rather than use hardware predictors to discover sparse memory reference patterns from the dynamic memory reference stream, our approach relies on the programmer or compiler to identify the sparse memory references statically, and to use special annotated memory instructions to specify the amount of spatial reuse associated with such memory references. At runtime, the size annotations select the amount of data to fetch on each cache miss, thus fetching only data that will likely be accessed by the processor. Our results show annotated memory instructions remove between 54% and 71% of cache traffic for 7 applications, reducing more traffic than hardware selective sub-blocking using a 32 Kbyte predictor on all applications, and reducing as much traffic as hardware selective sub-blocking using an 8 Mbyte predictor on 5 out of 7 applications. Overall, annotated memory instructions achieve a 17% performance gain when used alone, and a 22.3% performance gain when combined with software prefetching, compared to a 7.2% performance degradation when prefetching without annotated memory instructions.
منابع مشابه
Exploiting Application-Level Information to Reduce Memory Bandwidth Consumption
As processors continue to deliver ever higher levels of performance and as memory latency tolerance techniques become widespread to address the increasing cost of accessing memory, memory bandwidth will emerge as a major limitation to continued increases in application performance. In this paper, we propose a hybrid hardware/software technique for addressing the memory bandwidth bottleneck by m...
متن کاملFPGA Implementation of Dynamic Energy Efficient Memory Controller for a H.264/AVC Application
Improvement in high speed DSP applications can be done by integrating computational power with effective memory management. Bandwidth and latency of operation in memory system is rigidly dependent on data accesses. DSP applications such as multimedia require exhaustive streaming at high speed buses. The energy consumption is the key element which will be the focus of research in VLSI and Embedd...
متن کاملMemory Performance at Reduced CPU Clock Speeds: An Analysis of Current x86_64 Processors
Reducing CPU frequency and voltage is a well-known approach to reduce the energy consumption of memory-bound applications. This is based on the conception that main memory performance sees little or no degradation at reduced processor clock speeds, while power consumption decreases significantly. We study this effect in detail on the latest generation of x86 64 compute nodes. Our results show t...
متن کاملExploiting Data Transfer Locality in Memory Mapping
System-level exploration of memory architectures is one of the key issues in successful implementation of datatransfer dominated applications. Usually, one of the main design bottlenecks is the memory access bandwidth. Transformations, rearranging the layout of the data records stored in memory, are very effective to improve the locality of the data transfers but usually lead to a large memory ...
متن کاملF-STONE: A Fast Real-Time DDOS Attack Detection Method Using an Improved Historical Memory Management
Distributed Denial of Service (DDoS) is a common attack in recent years that can deplete the bandwidth of victim nodes by flooding packets. Based on the type and quantity of traffic used for the attack and the exploited vulnerability of the target, DDoS attacks are grouped into three categories as Volumetric attacks, Protocol attacks and Application attacks. The volumetric attack, which the pro...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2002