The Cost of Uncore in Throughput-Oriented Many-Core Processors
Abstract
Traditional techniques for achieving performance, such as extracting more instruction-level parallelism or increasing clock frequencies, are losing their effectiveness due to the power wall. Multi-core processors have been put forth as a more power-performance-efficient means of continuing performance scaling while coping with the realities of a power-limited design. Extrapolating the increase in the number of cores leads us to “many-core” systems, potentially containing hundreds of cores. The multi-/many-core approach is no panacea, however. As the number of cores increases, the overall system will need to provide more cache resources to feed all of these cores, and an increasingly complex interconnection network to tie all of these cores together. These additional “uncore” components are not free, and, unless carefully controlled, they may limit the effectiveness of many-core systems. We introduce a simple extension to Hill and Marty’s recent Amdahl’s Law-based multi-core cost/performance model to account for the uncore components. From this model, we conclude that to sustain the scalability of future many-core systems, the uncore components must be designed to scale sub-linearly with respect to the overall core count.
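To make the scaling argument concrete, the following is a minimal sketch and not the paper's actual model. It starts from Hill and Marty's symmetric multi-core speedup, in which a core built from r base-core equivalents (BCEs) delivers sqrt(r) sequential performance, and then charges a hypothetical uncore cost of k * c**alpha BCEs against the chip budget for c cores. The names `max_cores`, `speedup_with_uncore`, `k`, and `alpha` are illustrative assumptions; the abstract does not specify how the uncore cost is parameterized.

```python
import math


def perf(r):
    # Hill and Marty's assumption: a core of r BCEs has sqrt(r) sequential performance.
    return math.sqrt(r)


def symmetric_speedup(f, cores, r):
    # Amdahl-style speedup for `cores` identical cores of r BCEs each,
    # where f is the parallelizable fraction of the workload.
    p = perf(r)
    return 1.0 / ((1.0 - f) / p + f / (p * cores))


def max_cores(budget, r, k, alpha):
    # Largest integer core count c such that cores plus uncore fit in the budget:
    #   c * r + k * c**alpha <= budget
    # The uncore cost k * c**alpha is an assumed form, not taken from the paper.
    c = 0
    while (c + 1) * r + k * (c + 1) ** alpha <= budget:
        c += 1
    return c


def speedup_with_uncore(f, budget, r, k, alpha):
    # Speedup once the uncore has taken its share of the area budget.
    c = max_cores(budget, r, k, alpha)
    if c == 0:
        return 0.0
    return symmetric_speedup(f, c, r)


if __name__ == "__main__":
    f, budget, r, k = 0.999, 256, 1, 1.0
    for label, alpha in [("sub-linear", 0.5), ("linear", 1.0), ("super-linear", 1.5)]:
        c = max_cores(budget, r, k, alpha)
        s = speedup_with_uncore(f, budget, r, k, alpha)
        print(f"{label:>12} uncore (alpha={alpha}): {c} cores, speedup {s:.1f}")
```

Under these assumed parameters, a sub-linear uncore leaves most of the budget for cores, a linear uncore halves the core count, and a super-linear uncore leaves only a small fraction, which is the qualitative trend the abstract's conclusion points to.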
Similar resources
System-wide Performance Counter Measurements: Offcore, Uncore, and Northbridge Performance Events in Modern Processors
Modern processors often have many processing cores in one package (or socket). Traditional hardware performance counters measure only values on a single core. A chip package has many resources that are package-wide and thus need a separate performance reporting mechanism. The values for these shared and off-core resources are reported as “offcore”, “uncore” or “northbridge” events.
An Analysis of Core- and Chip-Level Architectural Features in Four Generations of Intel Server Processors
This paper presents a survey of architectural features among four generations of Intel server processors (Sandy Bridge, Ivy Bridge, Haswell, and Broadwell) with a focus on performance with floating-point workloads. Starting on the core level and going down the memory hierarchy, we cover instruction throughput for floating-point instructions, L1 cache, address generation capabilities, core clock ...
Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications
Background and Objectives: Digital signal processors are widely used in energy-constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. As a first step in this work, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...
Parallel Packet Processing on Multi-core and Many-core Processors
The Service-oriented Router (SoR), a highly functional router based on a novel router architecture, enables unprecedented web services traditional routers were unable to provide. The SoR performs Deep Packet Inspection (DPI) to analyze Layer 7 information, which is becoming increasingly difficult due to the substantial increase in Internet traffic. Meanwhile, multi-core processors and general-p...
A Clustering Approach to Scientific Workflow Scheduling on the Cloud with Deadline and Cost Constraints
One of the main features of High Throughput Computing systems is the availability of high power processing resources. Cloud Computing systems can offer these features through concepts like Pay-Per-Use and Quality of Service (QoS) over the Internet. Many applications in Cloud computing are represented by workflows. Quality of Service is one of the most important challenges in the context of sche...
Publication date: 2008