Historically, performance analysis has focused on monolithic applications executing on large, stand-alone parallel systems. In such a domain, measurement, post-mortem analysis, and code optimization suffice to eliminate performance bottlenecks and tune applications. Most existing performance analysis systems (e.g., SvPablo [1], Medea [2], and Paragraph [5]) rely solely on post-mortem analysis. ...