Performance Limits of Trace Caches
Abstract
A growing number of studies have explored the use of trace caches as a mechanism to increase instruction fetch bandwidth. The trace cache is a memory structure that stores statically non-contiguous but dynamically adjacent instructions in contiguous memory locations. When coupled with an aggressive trace or multiple-branch predictor, it can fetch multiple basic blocks per cycle using a single-ported cache structure. This paper compares trace cache performance to the theoretical limit of a three-block fetch mechanism, modeled by an idealized 3-ported instruction cache with a zero-latency alignment network. Several new metrics are defined to formalize the analysis of the trace cache: fragmentation, duplication, indexability, and efficiency. We show that performance is limited more by branch mispredictions than by the ability to fetch multiple blocks per cycle. As branch prediction improves, high duplication and the resulting low efficiency are shown to be among the reasons that the trace cache does not reach its upper bound. Based on the shortcomings of the trace cache identified in this paper, we outline several potential areas for future research.
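To make the fill mechanism concrete, the following is a minimal Python sketch, under stated assumptions, of a fill unit that packs dynamically adjacent basic blocks into trace lines keyed by starting address and internal branch outcomes, plus one possible way to quantify duplication. The limits `MAX_BLOCKS` (set to 3 to match the three-block fetch width above), `MAX_INSNS`, and `INSN_BYTES`, the functions `build_trace_cache` and `duplication`, and the duplication formula itself (stored instruction slots divided by distinct static addresses) are hypothetical illustrations, not the paper's definitions.

```python
from collections import Counter

# Hypothetical trace-line limits chosen for illustration; real designs vary.
MAX_BLOCKS = 3    # basic blocks per trace line (three-block fetch)
MAX_INSNS = 16    # instruction slots per trace line
INSN_BYTES = 4    # fixed-width instructions assumed


def build_trace_cache(block_stream):
    """Pack dynamically adjacent basic blocks into trace lines.

    block_stream: iterable of (start_addr, n_insns, taken) for each executed
    basic block, where `taken` is the outcome of the block's final branch.
    Each block is assumed to fit in one line (n_insns <= MAX_INSNS).
    Returns a dict mapping (start_addr, internal_branch_outcomes) to the
    tuple of instruction addresses stored contiguously in that line.
    """
    cache = {}
    addrs, outcomes = [], []

    def flush():
        if addrs:
            # Only branches internal to the line select its path; the final
            # block's branch decides the next fetch, not this line.
            key = (addrs[0], tuple(outcomes[:-1]))
            cache[key] = tuple(addrs)  # same path again overwrites the line
        addrs.clear()
        outcomes.clear()

    for start, n, taken in block_stream:
        # Begin a new line if this block would overflow the instruction slots.
        if len(addrs) + n > MAX_INSNS:
            flush()
        addrs.extend(start + i * INSN_BYTES for i in range(n))
        outcomes.append(taken)
        if len(outcomes) == MAX_BLOCKS:
            flush()
    flush()
    return cache


def duplication(cache):
    """Hypothetical measure: average number of stored copies of each distinct
    static instruction across all trace lines (1.0 means no duplication)."""
    counts = Counter(a for line in cache.values() for a in line)
    if not counts:
        return 0.0
    return sum(counts.values()) / len(counts)


if __name__ == "__main__":
    # A two-block loop (A at 0x100, B at 0x200) executed three times.
    stream = [(0x100, 4, True), (0x200, 4, True)] * 3
    tc = build_trace_cache(stream)
    print(len(tc), duplication(tc))  # prints: 2 3.0
```

In this example the loop body is captured in one line starting at A and another starting at B, so every static instruction is stored three times across the two lines. Redundant copies like these consume trace cache capacity without adding new instructions, which is the kind of effect the abstract ties to high duplication and low efficiency.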
Journal title:
- J. Instruction-Level Parallelism
Volume 1
Publication date: 1999