instruction fetch

Multiple-Block Ahead

1996

Pascal Sainrat Pierre Michaud

A basic rule in computer architecture is that a processor cannot execute an application faster than it fetches its instructions. This paper presents a novel cost-effective mechanism called the two-block ahead branch predictor. Information from the current instruction block is not used for predicting the address of the next instruction block, but rather for predicting the block following the next...


A Code Compression System Based on Pipelined Interpreters

Journal: Softw., Pract. Exper. 1999

Jan Hoogerbrugge Lex Augusteijn Jeroen Trum Rik van de Wiel

This paper describes a system for compressed code generation. The code of applications is partitioned into time-critical and non-time-critical code. Critical code is compiled to native code, and non-critical code is compiled to a very dense virtual instruction set which is executed on a highly optimized interpreter. The system employs dictionary-based compression by means of superinstructions whi...


Dataflow Predication

2006

Aaron Smith Ramadass Nagarajan Karthikeyan Sankaralingam Robert McDonald Doug Burger Stephen W. Keckler Kathryn S. McKinley

Predication facilitates high-bandwidth fetch and large static scheduling regions, but has typically been too complex to implement comprehensively in out-of-order microarchitectures. This paper describes dataflow predication, which provides per-instruction predication in a dataflow ISA, low predication computation overheads similar to VLIW ISAs, and low-complexity out-of-order issue. A two-bit fi...


Performance Evaluations of a Multithreaded

2000

U. Brinkschulte C. Krakowski

We propose handling of external real-time events through multithreading and describe the microarchitecture of our multithreaded Java microcontroller, called the Komodo microcontroller. Real-time Java threads are used as interrupt service threads (ISTs) instead of interrupt service routines (ISRs). Our proposed Komodo microcontroller supports multiple ISTs with zero-cycle context switching overhead...


Optimum Instruction-level Parallelism (ILP) for Superscalar and VLIW Processors

1999

Patrick Hung Michael J. Flynn

Modern superscalar and VLIW processors fetch, decode, issue, execute, and retire multiple instructions per cycle. By taking advantage of instruction-level parallelism (ILP), processor performance can be improved substantially. However, increasing the level of ILP may eventually result in diminishing and negative returns due to control and data dependencies among subsequent instructions as well ...


Dynamic Loop Caching Meets Preloaded Loop Caching - A Hybrid Approach

2002

Ann Gordon-Ross Frank Vahid

Dynamically-loaded tagless loop caching reduces instruction fetch power for embedded software with small loops, but only supports simple loops without taken branches. Preloaded tagless loop caching supports complex loops with branches and thus can reduce power further, but has a limit on the total number of instructions cached. We show that each does well on particular benchmarks, but neither i...


Design of Processors with Reconfigurable Microarchitecture

2014

Andrey Mokhov Maxim Rykunov Danil Sokolov Alex Yakovlev

Energy becomes a dominating factor for a wide spectrum of computations: from intensive data processing in “big data” companies resulting in large electricity bills, to infrastructure monitoring with wireless sensors relying on energy harvesting. In this context it is essential for a computation system to be adaptable to the power supply and the service demand, which often vary dramatically duri...


Design and VLSI implementation of an access processor for a decoupled architecture

Journal: Microprocessors and Microsystems - Embedded Hardware Design 1992

Paul T. Hulina Lizy Kurian John Eugene John Lee D. Coraor

Decoupled computer architectures provide high scalar performance by exploiting the fine-grained parallelism existing between the access and execute functions in a computer program. These architectures employ an access processor to perform data fetch ahead of demand by the execute process. Some of the decoupled architectures employ identical access and execute processors, but special processors t...


Turboscalar: A High Frequency High IPC Microarchitecture

2000

Bryan Black John Paul Shen

There is significant performance motivation to build larger and wider superscalar machines; however, the implementation complexity can be overwhelming. When superscalar machines grow, they necessarily become deeper in order to maintain frequency. As the pipeline depth increases, the performance gained by a wide instruction fetch and dispatch is lost to branch misprediction penalty cycles. This wor...


A Microthreaded Chip Multiprocessor with a Vector Instruction Set

2001

This paper describes a microthreaded multiprocessor and presents simulations from a single-processor implementation. The microthreaded approach obtains threads from a single context and exploits both vector and instruction-level parallelism (ILP). Threaded code can be generated from sequential code, where loops may be transformed into families of possibly dependent, concurrent threads. Instru...
