Mesoscale Performance Simulations of Multicore Processor Systems with Asynchronous Memory Access
Authors
Abstract
Increasing on-chip transistor densities allow for a myriad of design choices for modern multicore processors. However, conducting a meaningful design space exploration of large systems with detailed cycle-accurate simulations using large, complex workloads can be very time-consuming and can adversely impact product schedules. There are three main reasons for this: 1) the high (human) cost of developing cycle-accurate simulators, 2) long simulation times for any sufficiently detailed simulator of a large, complex system, and 3) long running times for modern workloads. While statistical sampling techniques address workload run lengths, there exists no proven technique to replace the use of detailed cycle-accurate simulators for design space exploration. In this paper, we introduce mesoscale simulation, a methodology for design space exploration that mitigates the cost of cycle-accurate simulators. Mesoscale simulation is a hybrid approach that combines elements of high-level modeling and low-level cycle-accurate simulators to enable the construction of fast, high-fidelity performance models. Such models can be used to quickly explore vast areas of the design space and prune it down to manageable levels for cycle-accurate, simulator-based studies. We describe a proof-of-concept mesoscale implementation of the memory subsystem of the Cell/B.E. processor and discuss results from running various workloads.
$ This work was done while the author was employed by IBM STG Boeblingen.
# This work was done while the author was employed by IBM Research Austin.
Similar resources
Proposed Feature Selection for Dynamic Thermal Management in Multicore Systems
Increasing the number of cores to meet the demand for more computing power has raised the processor temperature of multicore systems. One of the main approaches to reducing temperature is dynamic thermal management. These methods are divided into two classes, reactive and proactive. Proactive methods manage the processor temperature by forecasting the temperature be...
A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure
The IP lookup process is a key bottleneck in routing due to the increase in routing table size, increasing traffic, and migration to IPv6 addresses. IP address lookup involves computation of the Longest Prefix Match (LPM), for which existing solutions, such as BSD Radix Tries, scale poorly when traffic in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...
MANAGER: A Multicore Shared Cache Energy Saving Technique for QoS Systems
Last level caches (LLCs) contribute significantly to processor power consumption. Saving LLC energy in multicore QoS systems is especially challenging, since aggressive energy saving techniques may lead to failure in providing QoS. We present MANAGER, a multicore shared cache energy saving technique for quality-of-service systems. Using dynamic profiling, MANAGER periodically predicts cache acc...
Design of a novel congestion-aware communication mechanism for wireless NoC architecture in multicore systems
Hybrid Wireless Network-on-Chip (WNoC) architecture has emerged as a scalable communication structure to mitigate the deficits of traditional NoC architecture for future multicore systems. The hybrid WNoC architecture provides energy-efficient, high-data-rate, and flexible communications for NoC architectures. In these architectures, each wireless router is shared by a set of processing core...
Publication date: 2009