Performance Limits of Trace Caches

Authors

Matt Postiff

Gary S. Tyson

Trevor N. Mudge

Abstract

A growing number of studies have explored the use of trace caches as a mechanism to increase instruction fetch bandwidth. The trace cache is a memory structure that stores statically non-contiguous but dynamically adjacent instructions in contiguous memory locations. When coupled with an aggressive trace or multiple branch predictor, it can fetch multiple basic blocks per cycle using a single-ported cache structure. This paper compares trace cache performance to the theoretical limit of a three-block fetch mechanism. The three-block fetch mechanism is modeled by an idealized 3-ported instruction cache with a zero-latency alignment network. Several new metrics are defined to formalize analysis of the trace cache. These include fragmentation, duplication, indexability, and efficiency metrics. We show that performance is limited more by branch mispredictions than by the ability to fetch multiple blocks per cycle. As branch prediction improves, high duplication and the resulting low efficiency are shown to be among the reasons that the trace cache does not reach its upper bound. Based on the shortcomings of the trace cache shown in this paper, we identify some potential future research areas.
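The storage idea described in the abstract can be illustrated with a toy model. The sketch below is not the paper's simulator: the greedy packing policy, the 16-instruction line size, and the two metric definitions are illustrative assumptions, since the abstract names the fragmentation and duplication metrics without defining them. Only the three-block limit is taken from the abstract.

```python
# Toy trace-cache model. All names, the packing policy, and the metric
# definitions are hypothetical; the abstract does not define them.

def build_traces(block_stream, block_sizes, max_blocks=3, line_size=16):
    """Greedily pack a dynamic stream of basic-block ids into trace lines.

    Assumes every block fits in one line (block size <= line_size).
    Returns a dict mapping each distinct trace (tuple of block ids)
    to the number of instruction slots it occupies.
    """
    traces = {}
    current, used = [], 0
    for block in block_stream:
        size = block_sizes[block]
        # Close the current trace when it is full by block count or by slots.
        if current and (len(current) == max_blocks or used + size > line_size):
            traces[tuple(current)] = used
            current, used = [], 0
        current.append(block)
        used += size
    if current:
        traces[tuple(current)] = used
    return traces


def duplication(traces):
    """Assumed definition: stored block copies per distinct block."""
    copies = sum(len(trace) for trace in traces)
    unique = len({block for trace in traces for block in trace})
    return copies / unique


def fragmentation(traces, line_size=16):
    """Assumed definition: fraction of instruction slots left unused."""
    used = sum(traces.values())
    return 1 - used / (len(traces) * line_size)


# Hypothetical program: four basic blocks with made-up sizes.
block_sizes = {"A": 4, "B": 5, "C": 3, "D": 6}
stream = ["A", "B", "C", "D", "B", "C", "A", "B", "D"]

traces = build_traces(stream, block_sizes)
print(sorted(traces))          # the distinct traces that were stored
print(duplication(traces))     # block B is stored in every trace
print(fragmentation(traces))   # unused slots across the three lines
```

In this toy run, block B ends up in all three stored traces, which is the kind of redundancy the abstract points to when it lists high duplication among the reasons the trace cache falls short of its upper bound.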


Related resources

Memshare: a Dynamic Multi-tenant Key-value Cache

Web application performance heavily relies on the hit rate of DRAM key-value caches. Current DRAM caches statically partition memory across applications that share the cache. This results in underutilization and limits cache hit rates. We present Memshare, a DRAM key-value cache that dynamically manages memory across applications. Memshare provides a resource sharing model that guarantees rese...


Trace Caches in the Context of other Cache Enhancements

Cache memories are now standard components of modern computer systems. They have proven extremely useful in bridging the gap between CPU and DRAM speeds, which continues to grow. Consequently, there has been a great deal of research into making caches more aggressive. A specific type of cache is the "trace cache" which stores dynamic sequences of instructions as opposed to sequential contiguous ...


Using Dynamic Branch Behavior for Power-Efficient Instruction Fetch

Power consumption has become an increasing concern in high performance microprocessor design in terms of packaging and cooling cost. The fetch unit including instruction cache contributes a large portion of the total power consumption in the microprocessor. The instruction cache itself suffers some hidden power consumption due to dynamic control flows. Although capturing the dynamic control flo...


Improving Performance of Large Physically Indexed Caches by Decoupling Memory Addresses from Cache Addresses

Modern CPUs often use large physically-indexed caches that are direct-mapped or have low associativities. Such caches do not interact well with virtual memory systems. An improperly placed physical page will end up in a wrong place in the cache, causing excessive conflicts with other cached pages. Page coloring has been proposed to reduce the conflict misses by carefully placing pages in the ph...


A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches

This paper compares the trace-sampling techniques of set sampling and time sampling. Using the multi-billion-reference traces of Borg et al., we apply both techniques to multi-megabyte caches, where sampling is most valuable. We evaluate whether either technique meets a 10% sampling goal: a method meets this goal if, at least 90% of the time, it estimates the trace’s true misses per instruction...



Journal:

J. Instruction-Level Parallelism

Volume 1

Publication date: 1999
