instruction fetch

Way Memoization to Reduce Fetch Energy in Instruction Caches

2001

Albert Ma Michael Zhang

Instruction caches consume a large fraction of the total power in modern low-power microprocessors. In particular, set-associative caches, which are preferred because of lower miss rates, require greater access energy on hits than direct-mapped caches; this is because of the need to locate instructions in one of several ways. Way prediction has been proposed to reduce power dissipation in conve...


Utilizing Block Size Variability to Enhance Instruction Fetch Rate

2007

AZAM BEG

In the past, instruction fetch speeds have been improved by using cache schemes that capture the actual program flow. In this paper, we elaborate on the architecture and operation of an instruction cache named Variable-Sized Block Cache (VSBC) that also makes use of the dynamic behavior of a program. Current trace-based cache schemes usually have some instructions stored repeatedly; this redund...


Optimising long-latency-load-aware fetch policies for SMT processors

Journal: IJHPCN, 2004

Francisco J. Cazorla Alex Ramírez Mateo Valero Enrique Fernández

Simultaneous Multithreading (SMT) processors fetch instructions from several threads and, in this way, the available Instruction Level Parallelism (ILP) of each thread is exposed to the processor. In an SMT processor the fetch engine has the additional level of freedom, compared to a super-scalar processor, to select independent instructions. The fetch engine determines how shared resources are...


eXtended Block Cache

2000

Stéphan Jourdan Lihu Rappoport Yoav Almog Mattan Erez Adi Yoaz Ronny Ronen

This paper describes a new instruction-supply mechanism, called the eXtended Block Cache (XBC). The goal of the XBC is to improve on the Trace Cache (TC) hit rate, while providing the same bandwidth. The improved hit rate is achieved by making the XBC a nearly redundancy-free structure. The basic unit recorded in the XBC is the extended block (XB), which is a multiple-entry single-exit instructi...


Simultaneous Multithreading

2012

Michael Tullsen

In presenting this dissertation in partial fulfillment of the requirements for the Doctoral degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of this dissertation is allowable only for scholarly purposes, consistent with "fair use" as prescribed ...


Effective Instruction Prefetching In Chip Multiprocessors

2015

threaded application performance, often achieved through instruction level parallelism per chip is increasing, the software and hardware techniques to exploit the potential of studies mostly involve distributed shared memory multiprocessors and fetching will not be fully effective at masking the remote fetch latency. the effective address of the load instructions along that path based upon a hi...


Energy Aware Register File Implementation through Instruction Predecode

2003

José Luis Ayala Marisa López-Vallejo Alexander V. Veidenbaum Carlos A. Lopez

The register file is a power-hungry device in modern architectures. Current research on compiler technology and computer architectures encourages the implementation of larger devices to feed multiple data paths and to store global variables. However, low power techniques are not able to appreciably reduce power consumption in this device without a time penalty. This paper introduces an efficien...


Decoupled Value Prediction on Trace Processors

2000

Sang Jeong Lee Yuan Wang Pen-Chung Yew

Value prediction is a technique that breaks true data dependences by predicting the outcome of an instruction and speculatively executing its data-dependent instructions based on the predicted outcome. In this paper, we address several implementation issues for value prediction which are important on wide-issue superscalar architectures, and present a value prediction scheme based on the trace ...


The Effect of Speculative Execution on Cache Performance

1994

Jim Pierce Trevor N. Mudge

Superscalar microprocessors obtain high performance by exploiting parallelism at the instruction level. To effectively use the instruction-level parallelism found in general purpose, non-numeric code, future processors will need to speculatively execute far beyond instruction fetch limiting conditional branches. One result of this deep speculation is an increase in the number of instruction and...


Hoisting Branch Conditions - Improving Super-Scalar Processor Performance

1995

William F. Appelbe Srinivas Doddapaneni Reid Harmon Phil May D. Scott Wills Maurizio Vitale

The performance and hardware complexity of super-scalar architectures is hindered by conditional branch instructions. When conditional branches are encountered in a program, the instruction fetch unit must rapidly predict the branch predicate and begin speculatively fetching instructions with no loss of instruction throughput. Speculative execution has a high hardware cost, is limited by dynami...
