instruction cache

Performance Limits of Trace Caches

Journal: J. Instruction-Level Parallelism, 1999

Matt Postiff, Gary S. Tyson, Trevor N. Mudge

A growing number of studies have explored the use of trace caches as a mechanism to increase instruction fetch bandwidth. The trace cache is a memory structure that stores statically non-contiguous but dynamically adjacent instructions in contiguous memory locations. When coupled with an aggressive trace or multiple branch predictor, it can fetch multiple basic blocks per cycle using a single-p...
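The idea of storing dynamically adjacent instructions together can be illustrated with a minimal Python sketch. It is not the design evaluated in the paper: the class name `TraceCache`, the `max_branches` limit, and the instruction strings are all hypothetical, and a trace is simply looked up by its start address plus the predicted branch outcomes.

```python
from dataclasses import dataclass, field


@dataclass
class TraceCache:
    """Toy trace cache: maps (start PC, branch outcomes) to a stored trace."""
    max_branches: int = 2                      # hypothetical limit on branches per trace
    lines: dict = field(default_factory=dict)  # key -> list of instructions

    def fill(self, start_pc, outcomes, instructions):
        """Store the instructions executed along one dynamic path."""
        key = (start_pc, tuple(outcomes[: self.max_branches]))
        self.lines[key] = list(instructions)

    def fetch(self, start_pc, predicted_outcomes):
        """Return the whole trace on a hit, or None so the caller can
        fall back to the ordinary instruction cache."""
        key = (start_pc, tuple(predicted_outcomes[: self.max_branches]))
        return self.lines.get(key)


tc = TraceCache()
# Three basic blocks at non-contiguous static addresses, executed back to back:
# the branch at 0x104 is taken (to 0x240), the branch at 0x244 is not taken.
tc.fill(0x100, [True, False],
        ["0x100: cmp", "0x104: beq", "0x240: add", "0x244: bne", "0x248: ld"])

hit = tc.fetch(0x100, [True, False])    # prediction matches the stored path
miss = tc.fetch(0x100, [False, False])  # different predicted path, no trace stored
```

In this sketch a single lookup returns instructions from several basic blocks at once, which is the bandwidth benefit the abstract describes; a different predicted path misses and would be served by the conventional fetch path.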

Hardware Support for Hiding Cache Latency

1993

Trevor N. Mudge

As the decrease in processor cycle time continues to outpace the decrease in memory cycle time, even moderately sized on-chip caches may require several cycles of access time in the near future. This means that time is lost, even on a cache hit, if independent instructions cannot be scheduled after a read from memory. A novel hardware device is proposed that keeps track of the history of load i...

Inter-warp Instruction Temporal Locality in Deep-Multithreaded GPUs

2013

Ahmad Lashgar, Amirali Baniasadi, Ahmad Khonsari

GPUs employ thousands of threads per core to achieve high throughput. These threads exhibit localities in control-flow, instruction and data addresses and values. In this study we investigate inter-warp instruction temporal locality and show that during short intervals a significant share of fetched instructions are fetched unnecessarily. This observation provides several opportunities to enhan...

Omitting Cache Look-Up for High-Performance, Low-Power Microprocessors

2001

Koji INOUE, Vasily G. Moshnyaga, Kazuaki MURAKAMI

In this paper, we propose a novel architecture for low-power direct-mapped instruction caches, called “history-based tag-comparison (HBTC) cache”. The cache attempts to reuse tag-comparison results to avoid unnecessary tag checks. Execution footprints are recorded into an extended BTB (Branch Target Buffer). In our evaluation, it is observed that the energy for tag comparison can be reduced ...

Analysis of Temporal-Based Program Behavior for Improved Instruction Cache Performance

Journal: IEEE Trans. Computers, 1999

John Kalamatianos, Alireza Khalafi, David R. Kaeli, Waleed Meleis

In this paper, we examine temporal-based program interaction in order to improve layout by reducing the probability that program units will conflict in an instruction cache. In that context, we present two profile-guided procedure reordering algorithms. Both techniques use cache line coloring to arrive at a final program layout and target the elimination of first generation cache conflicts (i....

A Day in the Life of a Data Cache Miss

2002

Tejas Karkhanis

The activity within a processor following a cache miss is studied via a series of simulation experiments. This is a preliminary step toward developing ways of mitigating data cache miss penalties, especially for long misses. With a modest-sized reorder buffer (ROB) of 64 entries, structural blockages due to a full ROB are the major cause of the cache miss penalty. For the SpecINT2000 benchmarks...

Impact of instruction cache replacement policy on the tightness of WCET estimation

2008

Aurore Junier, Damien Hardy, Isabelle Puaut

Cache memories were introduced to reduce the access time to information, compensating for the growing gap between fast microprocessors and relatively slower main memories. Caches must therefore be considered when validating the temporal behavior of real-time systems, in particular when estimating tasks’ worst-case execution times (WCETs). In this paper, we use new theoretical resul...

A Buffered Dual-Access-Mode Scheme Designed for Low-Power Highly-Associative Caches

Journal: IJERTCS, 2013

Yul Chu, Marven Calagos

This paper proposes a buffered dual-access-mode cache to reduce power consumption for highly-associative caches in modern embedded systems. The proposed scheme consists of an MRU (most recently used) buffer table and a single cache structure that implements two access modes: phased mode and way-prediction mode. The proposed scheme shows better access time and lower power consumption than two pop...
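A rough Python sketch of how way prediction and phased access can be combined around an MRU table is shown below. The excerpt does not say how the proposed scheme chooses between its two modes, so the policy here (probe the MRU way first, fall back to a phased lookup on a misprediction) is an assumption, as are the class name `DualModeCache`, the `ways_read` energy proxy, and the simple replacement rule.

```python
class DualModeCache:
    """Toy set-associative cache: way prediction first, then a phased lookup."""

    def __init__(self, num_sets=4, num_ways=4):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        self.mru = [0] * num_sets  # MRU table: most recently used way per set
        self.ways_read = 0         # data ways read, a rough proxy for energy

    def access(self, addr):
        index, tag = addr % self.num_sets, addr // self.num_sets

        # Way-prediction mode: read only the way named by the MRU table.
        predicted = self.mru[index]
        self.ways_read += 1
        if self.tags[index][predicted] == tag:
            return "predicted hit"

        # Phased mode: compare all tags first, then read only the matching way.
        for way, stored in enumerate(self.tags[index]):
            if stored == tag:
                self.ways_read += 1
                self.mru[index] = way
                return "phased hit"

        # Miss: fill the way after the MRU one (hypothetical replacement rule).
        victim = (predicted + 1) % self.num_ways
        self.tags[index][victim] = tag
        self.mru[index] = victim
        return "miss"


cache = DualModeCache()
results = [cache.access(a) for a in (0, 4, 4, 0)]
# results == ["miss", "miss", "predicted hit", "phased hit"]; cache.ways_read == 5
```

A correct prediction costs one way read and a misprediction costs one extra, whereas a conventional parallel lookup would read every way of the set on each access.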

Improving the Energy and Execution Efficiency of a Small Instruction Cache by Using an Instruction Register File

2007

Stephen Hines, Gary Tyson, David Whalley

Small filter caches (L0 caches) can be used to obtain significantly reduced energy consumption for embedded systems, but this benefit comes at the cost of increased execution time due to frequent L0 cache misses. The Instruction Register File (IRF) is an architectural extension for providing improved access to frequently occurring instructions. An optimizing compiler can exploit an IRF by packi...

An Accurate and Energy-Efficient Way Determination Technique for Instruction Caches by Using Early Tag Matching

2007

Eui-Young Chung, Cheol Hong Kim, Sung Woo Chung

Energy consumption has become an important design consideration in modern processors. Therefore, microarchitects should consider energy consumption, together with performance, when designing the cache architecture, since it is a major power consumer in a processor. This paper proposes an accurate and energy-efficient way determination (instead of prediction) technique for reducing energy consum...
