Search results for: instruction fetch

Number of results: 42508

1996
Pascal Sainrat Pierre Michaud

A basic rule in computer architecture is that a processor cannot execute an application faster than it fetches its instructions. This paper presents a novel cost-effective mechanism called the two-block ahead branch predictor. Information from the current instruction block is not used for predicting the address of the next instruction block, but rather for predicting the block following the next...

Journal: :Softw., Pract. Exper. 1999
Jan Hoogerbrugge Lex Augusteijn Jeroen Trum Rik van de Wiel

This paper describes a system for compressed code generation. The code of applications is partitioned into time-critical and non-time-critical code. Critical code is compiled to native code, and non-critical code is compiled to a very dense virtual instruction set which is executed on a highly optimized interpreter. The system employs dictionary-based compression by means of superinstructions whi...

2006
Aaron Smith Ramadass Nagarajan Karthikeyan Sankaralingam Robert McDonald Doug Burger Stephen W. Keckler Kathryn S. McKinley

Predication facilitates high-bandwidth fetch and large static scheduling regions, but has typically been too complex to implement comprehensively in out-of-order microarchitectures. This paper describes dataflow predication, which provides per-instruction predication in a dataflow ISA, low predication computation overheads similar to VLIW ISAs, and low complexity out-of-order issue. A two-bit fi...

2000
U. Brinkschulte C. Krakowski

We propose handling of external real-time events through multithreading and describe the microarchitecture of our multithreaded Java microcontroller, called Komodo microcontroller. Real-time Java threads are used as interrupt service threads (ISTs) instead of interrupt service routines (ISRs). Our proposed Komodo microcontroller supports multiple ISTs with zero-cycle context switching overhead...

1999
Patrick Hung Michael J. Flynn

Modern superscalar and VLIW processors fetch, decode, issue, execute, and retire multiple instructions per cycle. By taking advantage of instruction-level parallelism (ILP), processor performance can be improved substantially. However, increasing the level of ILP may eventually result in diminishing and negative returns due to control and data dependencies among subsequent instructions as well ...

2002
Ann Gordon-Ross Frank Vahid

Dynamically-loaded tagless loop caching reduces instruction fetch power for embedded software with small loops, but only supports simple loops without taken branches. Preloaded tagless loop caching supports complex loops with branches and thus can reduce power further, but has a limit on the total number of instructions cached. We show that each does well on particular benchmarks, but neither i...

2014
Andrey Mokhov Maxim Rykunov Danil Sokolov Alex Yakovlev

Energy becomes a dominating factor for a wide spectrum of computations: from intensive data processing in “big data” companies resulting in large electricity bills, to infrastructure monitoring with wireless sensors relying on energy harvesting. In this context it is essential for a computation system to be adaptable to the power supply and the service demand, which often vary dramatically duri...

Journal: :Microprocessors and Microsystems - Embedded Hardware Design 1992
Paul T. Hulina Lizy Kurian John Eugene John Lee D. Coraor

Decoupled computer architectures provide high scalar performance by exploiting the fine-grained parallelism existing between the access and execute functions in a computer program. These architectures employ an access processor to perform data fetch ahead of demand by the execute process. Some of the decoupled architectures employ identical access and execute processors, but special processors t...

2000
Bryan Black John Paul Shen

There is significant performance motivation to build larger and wider superscalar machines, however the implementation complexity can be overwhelming. When superscalar machines grow they necessarily become deeper in order to maintain frequency. As the pipeline depth increases the performance gained by a wide instruction fetch and dispatch is lost to branch misprediction penalty cycles. This wor...

2001

This paper describes a microthreaded multiprocessor and presents simulations from a single processor implementation. The microthreaded approach obtains threads from a single context and exploits both vector and instruction level parallelism (ILP). Threaded code can be generated from sequential code, where loops may be transformed into families of, possibly dependent, concurrent threads. Instru...
