instruction cache

Traveling Speculations: An Integrated Prediction Strategy for Wide-Issue Microprocessors

2002

Ravi Bhargava Juan Rubio Lizy K. John

Performing multiple, accurate, low-latency predictions is crucial to improving instruction throughput in future wide-issue microprocessors. However, demands of wide-issue processing coupled with implementation challenges posed by high clock frequencies present obstacles to these prediction goals. This paper proposes the Traveling Speculation framework to accommodate predictions in a wide-issue ...


Empirical Study of Power Consumption of x86-64 Instruction Decoder

2016

Mikael Hirki Zhonghong Ou Kashif N. Khan Jukka K. Nurminen Tapio Niemi

It has been a common myth that x86-64 processors suffer in terms of energy efficiency because of their complex instruction set. In this paper, we aim to investigate whether this myth holds true, and determine the power consumption of the instruction decoders of an x86-64 processor. To that end, we design a set of microbenchmarks that specifically trigger the instruction decoders by exceeding th...


A Trace Cache Microarchitecture and Evaluation

Journal: IEEE Trans. Computers, 1999

Eric Rotenberg Steve Bennett James E. Smith

As the instruction issue width of superscalar processors increases, instruction fetch bandwidth requirements will also increase. It will eventually become necessary to fetch multiple basic blocks per clock cycle. Conventional instruction caches hinder this effort because long instruction sequences are not always in contiguous cache locations. Trace caches overcome this limitation by caching tra...


Cache Performance in Java Virtual Machines: A Study of Constituent Phases

2002

Anand S. Rajan Shiwen Hu Juan Rubio

This paper studies the level 1 cache performance of Java programs by analyzing memory reference traces of the SPECjvm98 applications executed by the Latte Java Virtual Machine. We study in detail Java programs’ cache performance of different access types in three JVM phases, under two execution modes, using three cache configurations and two application data sets. We observe that the poor data ...


Abstracted Instruction Cache of TITAC2 — As a Benchmark Circuit for Timed Asynchronous Circuit Verification

1999

Tomohiro Yoneda

As a benchmark circuit for timed asynchronous circuit verification, we have developed an abstracted version of TITAC 2 instruction cache sub-system and its formal specification. This document shows all the figures of the gate level sub-circuits which compose the abstracted instruction cache. A time Petri net model for the formal specification is also shown with the detailed explanation. The tex...


Instruction prefetching using branch prediction information

1997

Instruction prefetching can effectively reduce instruction cache misses, thus improving performance. In this paper, we propose a prefetching scheme that employs a branch predictor to run ahead of the execution unit and prefetch potentially useful instructions. Branch prediction based (BP-based) prefetching has a separate small fetching unit, allowing it to compute and predict targets a...


VIPER: A VLIW Integer Microprocessor

1993

Andrew Naylor Arthur Abnous

This paper describes the design and implementation of a very long instruction word (VLIW) microprocessor. The VIPER (VLIW integer processor) contains four pipelined functional units, and can achieve 0.25 cycle-per-instruction performance. The processor is capable of performing multiway branch operations, two load/store operations or up to four ALU operations in each clock cycle, with full regis...


Shared vs. Snoop: Evaluation of Cache Structure for Single-Chip Multiprocessors

1997

Toru Kisuki Masaki Wakabayashi Junji Yamamoto Keisuke Inoue Hideharu Amano

The shared cache structures and snoop cache structures for single-chip multiprocessors are evaluated and compared using an instruction-level simulator. Simulation results show that a 1-port large shared cache achieves the best performance if there is no delay penalty for arbitration and accessing the bus. However, if a 1-clock delay is assumed for accessing the shared cache, a snoop cache...


Spacewalker: Automated Design Space Exploration for Embedded Computer Systems

1998

Greg Snider

Keywords: design space exploration, VLIW, systolic array, cache. This paper addresses the problem of automated design of a computer system for an embedded application. The computer system to be designed consists of a VLIW processor and/or a customized systolic array, along with a cache subsystem comprising a data cache, instruction cache and second-level unified cache. Several algorithms for "walking" the...


A Comparison of Cache Aware and Cache Oblivious Static Search Trees Using Program Instrumentation

2000

Richard E. Ladner Ray Fortna Bao-Hoang Nguyen

An experimental comparison of cache aware and cache oblivious static search tree algorithms is presented. Both cache aware and cache oblivious algorithms outperform classic binary search on large data sets because of their better utilization of cache memory. Cache aware algorithms with implicit pointers perform best overall, but cache oblivious algorithms do almost as well and do not have to be...
