Search results for: instruction fetch

Number of results: 42508

1995
Lizy Kurian John Vinod Reddy Paul T. Hulina Lee D. Coraor

Software-oriented techniques to hide memory latency in superscalar and superpipelined machines include loop unrolling, software pipelining, and software cache prefetching. Issuing the data fetch request prior to actual need for data allows overlap of accessing with useful computations. Loop unrolling and software pipelining do not necessitate microarchitecture or instruction set architecture ch...

1997
Gary S. Tyson Todd M. Austin

As processors continue to exploit more instruction level parallelism, a greater demand is placed on reducing the effects of memory access latency. In this paper, we introduce a novel modification of the processor pipeline called memory renaming. Memory renaming applies register access techniques to load instructions, reducing the effect of delays caused by the need to calculate effective addresses...

2010
Shubhajit Roy Chowdhury

The paper focuses on the use of field programmable gate arrays (FPGA) for signal processing applications. By allowing designers to create circuit architectures developed for the specific applications, high levels of performance can be achieved using FPGA for many digital signal processing (DSP) applications providing considerable improvements over conventional microprocessor and dedicated DSP p...

2013
Shahnawaz Talpur Yizhuo Wang Shahnawaz Farhan Khahro XiaoJun Wang Xu Chen Feng Shi

To achieve the highest performance amid the rapid advancement of multi-core technology, the large gap between faster processor speeds and memory must be minimized. The issue becomes more critical when a branch occurs with the penalty of a cache miss. Many researchers have proposed different branch prediction and instruction prefetching methods and algorithms, but CPU pipeline performance couldn’t be th...

2007
Peter E. Strazdins Bill Clarke Andrew Over

This paper presents a novel technique for cycle-accurate simulation of the Central Processing Unit (CPU) of a modern superscalar processor, the UltraSPARC III Cu processor. The technique is based on adding a module to an existing fetch-decode-execute style of CPU simulator, rather than the traditional method of fully modelling the CPU microarchitecture. It is also suitable for accurate SMP model...

1995
Maged M. Michael Michael L. Scott

Our research addresses the general topic of atomic update of shared data structures on large-scale shared-memory multiprocessors. In this paper we consider alternative implementations of the general-purpose single-address atomic primitives fetch_and_Φ, compare_and_swap, load_linked, and store_conditional. These primitives have proven popular on small-scale bus-based machines, but have yet to bec...

Journal: Universität Trier, Mathematik/Informatik, Forschungsbericht 1998
Christoph W. Kessler Helmut Seidl

ForkLight is an imperative, task-parallel programming language for massively parallel shared memory machines. It is based on ANSI C, follows the SPMD model of parallel program execution, provides a sequentially consistent shared memory, and supports dynamically nested parallelism. While no assumptions are made on uniformity of memory access time or instruction-level synchronicity of the underl...

2007
Tanausú Ramírez Alex Pajuelo Oliverio J. Santana Mateo Valero

In this paper, we propose Runahead threads on Simultaneous Multithreading processors as a valuable solution for both exploiting the memory-level parallelism and reducing the resource contention. This approach transforms a memory-bounded eager resource thread into a speculative light thread, alleviating critical resource conflicts among multiple threads. Furthermore, it improves the thread-level p...

2000
Gabriel Loh Dana Henry

The increasing complexity of modern superscalar processors makes the evaluation of new designs more difficult. Current simulators such as Stanford’s SimOS [16] and the University of Wisconsin’s Simplescalar Toolset [2] perform detailed cycle-level simulation of the processor to obtain performance measurements at the cost of very slow simulation times. This report presents and analyzes an algori...

2004
Pradheep Elango Saisuresh Krishnakumaran Ramanathan Palaniappan

Large instruction window processors can achieve high performance by supplying more instructions during long latency load misses, thus effectively hiding these latencies. Continual Flow Pipeline (CFP) architectures provide high-performance by effectively increasing the number of actively executing instructions without increasing the size of the cycle-critical structures. A CFP consists of a Slic...
