Forwardflow: Scalable, RAM-Based Dataflow Execution
Authors
Abstract
Power (and thermal) limits have forced an industry-wide shift from increasingly complex uniprocessors to multicore chips with 4, 8, and even 16 simpler processor cores. Yet Amdahl’s Law suggests that these cores should not be too simple, lest they exacerbate even a parallel application’s sequential bottlenecks. Furthermore, running all cores at full speed will soon exceed the chip’s power envelope. Ideally, future CMPs should use cores that trade off power and performance, allowing the system to scale up a core’s instruction-level parallelism (ILP) and memory-level parallelism (MLP) to improve sequential performance. This work presents the Forwardflow microarchitecture, which executes instructions out-of-order using RAM-based structures in lieu of non-scalable CAM- or matrix-based mechanisms. Forwardflow dynamically builds an explicit internal dataflow representation from a conventional ISA, using forward dependence pointers to guide instruction wakeup, selection, and issue. Because all of Forwardflow’s major data structures are RAM-based, the instruction window scales large enough to tolerate long memory access times.
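The idea of forward dependence pointers can be illustrated with a small software model. The sketch below is not the authors' design; it is a minimal illustration under stated assumptions. The names `Window`, `Entry`, `last_ref`, and `succ` are hypothetical, only two-source `add` and `mul` operations are modeled, and branches, memory accesses, and timing are ignored. Each operand slot stores at most one pointer to the next slot that needs the same value, so a result is delivered by walking a chain of indexed lookups rather than by broadcasting a tag to every entry for an associative match.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A pointer is a plain index into the window plus a slot name (hypothetical layout).
Pointer = Tuple[int, str]
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def _empty_slots() -> Dict[str, None]:
    return {"src0": None, "src1": None, "dest": None}


@dataclass
class Entry:
    op: str
    dest: str
    srcs: Tuple[str, str]
    value: Dict[str, Optional[int]] = field(default_factory=_empty_slots)
    succ: Dict[str, Optional[Pointer]] = field(default_factory=_empty_slots)

    def ready(self) -> bool:
        return self.value["src0"] is not None and self.value["src1"] is not None


class Window:
    def __init__(self, regfile: Dict[str, int]):
        self.regfile = dict(regfile)          # initial architectural state
        self.entries: List[Entry] = []        # the indexed (RAM-like) window
        self.last_ref: Dict[str, Pointer] = {}  # register -> tail of its pointer chain
        self.ready_q: deque = deque()
        self.issue_order: List[int] = []

    def decode(self, op: str, dest: str, src0: str, src1: str) -> None:
        idx = len(self.entries)
        e = Entry(op, dest, (src0, src1))
        self.entries.append(e)
        for slot, reg in (("src0", src0), ("src1", src1)):
            tail = self.last_ref.get(reg)
            if tail is None:
                # No in-flight reference: read the register file directly.
                e.value[slot] = self.regfile[reg]
            else:
                t_idx, t_slot = tail
                known = self.entries[t_idx].value[t_slot]
                if known is not None:
                    e.value[slot] = known     # value already available: copy it
                else:
                    # Value still pending: link the chain tail forward to this slot.
                    self.entries[t_idx].succ[t_slot] = (idx, slot)
            self.last_ref[reg] = (idx, slot)
        self.last_ref[dest] = (idx, "dest")   # a new value starts a new chain
        if e.ready():
            self.ready_q.append(idx)

    def run(self) -> Dict[str, int]:
        while self.ready_q:
            idx = self.ready_q.popleft()
            e = self.entries[idx]
            self.issue_order.append(idx)
            result = OPS[e.op](e.value["src0"], e.value["src1"])
            # Wakeup: walk the forward pointers, one successor at a time.
            ptr: Optional[Pointer] = (idx, "dest")
            while ptr is not None:
                t_idx, t_slot = ptr
                target = self.entries[t_idx]
                target.value[t_slot] = result
                if t_slot != "dest" and target.ready():
                    self.ready_q.append(t_idx)
                ptr = target.succ[t_slot]
        for e in self.entries:                # commit results in program order
            self.regfile[e.dest] = e.value["dest"]
        return self.regfile


w = Window({"r1": 2, "r2": 3})
w.decode("mul", "r3", "r1", "r2")
w.decode("add", "r4", "r3", "r1")
w.decode("add", "r5", "r3", "r4")
w.decode("add", "r3", "r5", "r5")
final = w.run()
print(final["r3"], final["r4"], final["r5"])
```

In this model the cost of waking up consumers grows with the length of one pointer chain, not with the size of the window, which is the property that lets an indexed structure stand in for an associative one. The trade-off, visible in the inner `while` loop, is that successors of the same value are reached serially.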
Related papers
HeDGE: Hybrid Dataflow Graph Execution in the Issue Logic
Exposing more instruction-level parallelism in out-of-order superscalar processors requires increasing the number of dynamic in-flight instructions. However, large instruction windows increase power consumption and latency in the issue logic. We propose a design called Hybrid Dataflow Graph Execution (HeDGE) for conventional Instruction Set Architectures (ISAs). HeDGE explicitly maintains depen...
Accelerating NWChem Coupled Cluster Through Dataflow-Based Execution
Numerical techniques used for describing many-body systems, such as the Coupled Cluster methods (CC) of the quantum chemistry package NWCHEM, are of extreme interest to the computational chemistry community in fields such as catalytic reactions, solar energy, and bio-mass conversion. In spite of their importance, many of these computationally intensive algorithms have traditionally been thought...
Tideflow: a Dataflow-inspired Execution Model for High Performance Computing Programs
Traditional programming, execution and optimization techniques have been shown to be inadequate to exploit the features of computer processors with many cores. In particular, previous research shows that traditional paradigms are insufficient to harness the opportunities of manycore processors: (1) traditional execution models do not provide constructs rich enough to express parallel programs, ...
Optimizing Interrupt-Driven Embedded Software
Software for embedded microcontroller units (MCUs) represents both an interesting opportunity and a difficult challenge for compiler optimization. Since these systems tend to be small—often limited to a few KB of on-chip RAM—highly aggressive techniques are feasible and worthwhile. On the other hand, the effectiveness of traditional dataflow analyses is limited by their inability to cope with i...
Speculative Thread Execution in a Multithreaded Dataflow Architecture
Instruction Level Parallelism (ILP) in modern Superscalar and VLIW processors is achieved using out-of-order execution, branch predictions, value predictions, and speculative executions of instructions. These techniques are not scalable. This has led to multithreading and multi-core systems. However, such processors require compilers to automatically extract thread level or task level paralleli...
Publication date: 2008