Recurrence Plots in Analysis of Computer Systems; CU-CS-1067-10
Abstract
To analyze computer systems or compare them to one another, measurements are taken as a program runs. The resulting record is called a performance trace, and it can involve large amounts of time series data. Currently, statistical tools are used for analysis, but this approach loses information by ignoring the ordered nature of the data. We propose improving computer performance metrics by viewing the computer as a dynamical system and studying its time-varying behavior. We propose accomplishing this by applying recurrence plots (RPs) to the data [6]. This approach provides a graphical characterization of the system that can (in some cases) filter through noise, identify non-stationary patterns, and make periodic behavior immediately apparent [7]. I tested the hypotheses that RPs would allow us to easily compare systems and to identify specific points in the data set that warrant further investigation. We found that RPs are not an appropriate tool for this purpose. We believe this is partially because the scale of the patterns in time series data from hardware traces is much smaller than the entire time series. This difference of 1-2 orders of magnitude makes the patterns difficult to see. In addition, RPs appear too sensitive to noise to be useful for computer applications.

Background

Sensitivity to small perturbations is a defining characteristic of chaotic systems and is well known in computer systems (e.g., irreproducibility, bugs, etc.). It is exhibited within the architectural implementation (hardware) as well as the code (software), and it makes analysis difficult and error-prone. In previous work, Bradley, Diwan, and Mytkowicz showed that a nonlinear dynamics model of computer systems captures the effects of both factors [8]. Current work is focused on understanding these effects. This involves running and analyzing performance traces on hardware as well as simulators, which generates large amounts of time series data with significant noise.
The conjecture we will explore here is that RPs will help with the data analysis by providing an additional tool for interpreting and comparing these complex systems [6].

Methods

A recurrence plot (RP) is a tool that allows recurrence patterns in time series data to be visualized in two dimensions [9]. This is done by plotting a point at $(i, j)$ wherever the trajectory at time $i$ is close to the trajectory at time $j$. The result can be a colored graph in which color corresponds to a range of differences (unthresholded), or a black and white graph (thresholded) in which only the points that fall within a specified threshold are plotted. Mathematically, a thresholded RP corresponds to

$$R_{i,j} = \begin{cases} 1 & \text{if } |x_i - x_j| < \tau \\ 0 & \text{otherwise,} \end{cases}$$

where $R$ is the matrix corresponding to the recurrence plot, $x_t$ corresponds to displacement at time $t$, and $\tau$ is some threshold value. An RP always contains the line $y = x$ (reflecting the fact that the system's state is equal to itself at every point in time) and is symmetric about it. Lines parallel to this diagonal signify recurrences. This is useful in identifying periodicities, limit cycles, or chaotic behavior. Recurrence plots allow one to look beyond noise in the system as well as to identify non-stationary patterns and other interesting points [1, 3].

The data studied here consist of example data generated for the purpose of understanding the tools, as well as instructions per cycle (IPC) data taken from a simulator. Both consist of some measurement as a function of time. We also analyzed cache misses (a memory usage metric), but these results are not included in this report. Data were gathered by other members of the team (specifically Todd and Stephen) from simulators as well as real hardware systems. Measurements were taken every hundred thousand cycles and later normalized to a "per cycle" count to produce IPC. When calibrating the RP software, I downsampled the data in order to achieve a manageable size.
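The thresholded recurrence matrix defined in the Methods section can be sketched in a few lines. This is a minimal illustration in plain Python, not the RP software used in this work; the function names `recurrence_matrix` and `render`, the sample series, and the threshold value are assumptions chosen for the example.

```python
import math


def recurrence_matrix(series, tau):
    """Return R with R[i][j] = 1 if |x_i - x_j| < tau, else 0."""
    n = len(series)
    return [
        [1 if abs(series[i] - series[j]) < tau else 0 for j in range(n)]
        for i in range(n)
    ]


def render(matrix):
    """Draw the matrix as text, with row 0 at the bottom so that the
    main diagonal runs from lower left to upper right, like y = x."""
    return "\n".join(
        "".join("#" if value else "." for value in row)
        for row in reversed(matrix)
    )


# Hypothetical example: a sine wave sampled 8 times per period.
# Recurrences one full period apart show up as lines parallel to the
# main diagonal, offset by 8 samples.
samples = [math.sin(2 * math.pi * t / 8) for t in range(24)]
print(render(recurrence_matrix(samples, tau=0.1)))
```

Building the full matrix costs memory and time quadratic in the length of the series, which is one reason a long trace may need to be downsampled before plotting.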
The goal of this work was to evaluate RPs as a tool for identifying patterns and points of interest in hardware traces. The data from such traces are long and difficult to interpret. We explored whether RPs aided in the analysis of these data.

PART 2: Understand and Calibrate Tools

In order to evaluate the effectiveness of recurrence plots in identifying points of interest in large time series data, I first looked at data with known characteristics. This served both to develop my understanding of the tools and to determine the strengths and weaknesses of RP analysis. I chose to look at the effects of two characteristics that are likely in our experimental data: noise and drift.

Noise Experiment: Noise is present in virtually every experimental data set. In our hardware traces (from real systems), many steps have been taken to limit the amount of noise, but it is impossible to eliminate it completely. For example, we can ensure that there are no superfluous programs running when we are taking our measurements, but we cannot turn off the operating system. Therefore, our data contain known noise from the operating system interrupting and using CPU cycles to perform tasks unrelated to our test programs. It is highly likely that there is also noise from a variety of unknown factors. In an effort to quantify the effects of noise on the analysis, I generated example data with varying amounts of noise. The data were generated from the following function:

$$x(t) = \sin(t) + \tfrac{1}{2}\sin(2t) + c \cdot r$$

Noise was introduced in the last term, where $r$ is a random number drawn from a Gaussian distribution and $c$ is a scaling constant. Various levels of noise were introduced and are quantified below as a percentage of the range of the original time series. The following analysis shows data with no noise (red) compared against data with 2% noise (green) and completely random data (100% noise, blue).
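The noisy test signal can be generated along the following lines. This is a sketch under stated assumptions rather than the script used for the experiment: the sampling step `dt`, the seed, the use of a unit-variance Gaussian for $r$, and the choice to set $c$ to the noise fraction times the range of the clean series are all assumptions made for illustration.

```python
import math
import random


def clean_signal(n_points, dt):
    """Sample x(t) = sin(t) + 0.5 * sin(2t) at t = 0, dt, 2*dt, ..."""
    return [
        math.sin(i * dt) + 0.5 * math.sin(2 * i * dt)
        for i in range(n_points)
    ]


def add_noise(signal, fraction, rng):
    """Add c * r to every sample, where r is standard Gaussian noise.

    Assumption: c is `fraction` times the range of the clean series,
    so fraction=0.02 corresponds to "2% noise".
    """
    c = fraction * (max(signal) - min(signal))
    return [x + c * rng.gauss(0.0, 1.0) for x in signal]


rng = random.Random(0)            # fixed seed so runs are repeatable
clean = clean_signal(3000, 0.05)  # 3,000 points; dt is an assumed step
noisy = add_noise(clean, 0.02, rng)
```

Scaling the noise by the range of the clean series keeps the stated percentage meaningful if the amplitude of the underlying signal is changed.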
Other levels of noise (1%, 5%, 10%) were also analyzed but are not shown due to space considerations.

Figure 1: Time series of periodic data with noise

Figure 1 shows a snapshot of the time series for visualization purposes (the entire series contains 3,000 points) [10]. As you can see, the time series with 2% noise closely matches the series without noise, while the random series does not (which is expected). These data were analyzed using TISEAN, a package for analysis of time series data [4]; select results are shown in Figure 2.

Figure 2: TISEAN analysis. (a) Mutual Information, (b) False Nearest, (c) Delay

Figure 2a shows the mutual information function for each level of noise. Mutual information is a way of measuring the degree of dependency between two variables. In this case, it measures the dependency of two points in the time series, given that they are a distance x apart (where x is measured along the x-axis). We can see that the periodic signal retains a higher degree of mutual information than the signal with 2% noise. The random signal retains no mutual information after the first step (which, again, is expected).

Figure 2b shows TISEAN's false nearest neighbors (FNN) function, which is used to estimate the dimension of the compressed data. We expect our traces to be many-dimensional, but we are only measuring one dimension. FNN gives us a way of determining which points appear close in our trace data but are not actually close in the system. We can see that the FNN measure for the periodic signal is significantly lower than for the signals with noise. Ideally, we would like the false nearest measure to be under 0.1.

Figure 2c is a visual representation of the reconstructed dynamics according to the delay coordinate embedding theorem [5]. It accurately represents the 2% noise signal as a "fuzzy" approximation of the periodic signal (the clean red line, partially visible underneath). The random signal was left off for clarity (it has no discernible pattern and covers the entire plot).
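The delay coordinate reconstruction behind a plot like Figure 2c can be sketched as follows. This is a plain Python illustration and not TISEAN's implementation; the function name `delay_embed`, the forward-in-time ordering of the coordinates, and the example parameters are assumptions.

```python
import math


def delay_embed(series, dim, delay):
    """Build delay vectors (x_i, x_{i+delay}, ..., x_{i+(dim-1)*delay}).

    Each vector is one point of the reconstructed trajectory. This uses
    a forward-in-time convention; some tools order the coordinates
    backward in time instead.
    """
    n_vectors = len(series) - (dim - 1) * delay
    if n_vectors <= 0:
        raise ValueError("series is too short for this dim and delay")
    return [
        tuple(series[i + k * delay] for k in range(dim))
        for i in range(n_vectors)
    ]


# Hypothetical usage: a two-dimensional reconstruction of the periodic
# test signal. Plotting the pairs against each other traces a closed
# curve for clean data and a "fuzzy" version of it for noisy data.
signal = [math.sin(0.05 * i) + 0.5 * math.sin(0.1 * i) for i in range(3000)]
points = delay_embed(signal, dim=2, delay=20)
```

In practice the delay is often chosen near the first minimum of the mutual information function and the dimension from the false nearest neighbors result, which is how the three panels of Figure 2 relate to one another.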
Drift Experiment: We would also like to be able to identify drift, a form of non-stationarity, in our data. Drift describes a phenomenon in which a recurring pattern is obscured by a constant force in some direction. For example, a swimmer's periodic motion can be obscured by the constant current in a river. I analyzed the ability of our analysis to detect drift by looking at data generated from the following function:

$$x(t) = \sin(t) + \tfrac{1}{2}\sin(2t) + c \cdot t$$

where $c$ is some small constant. Figure 3 shows the beginning of a periodic time series (red) graphed together with the same series with drift, $c = 0.1$ (green). You can see that their patterns are similar but the latter is drifting upwards.

Figure 3: Time series of periodic data with drift

Non-stationarity is difficult to detect with common analysis tools such as a statistical mean, which ignores time-varying behavior, but it should be easy to identify with recurrence plots. For example, a pattern that recurs but is affected by upward (positive) drift shows diagonal lines (recurrences) close to the center diagonal but fewer further out, as seen in Figure 4.

Figure 4: RP of periodic data with drift

PART 3: Simulator data (IPC)

The next step in evaluating recurrence plots as a tool for pattern detection in hardware traces is to analyze data from simulators. Simulator traces are more complex and difficult to analyze than the toy systems above but are much simpler than real systems (i.e., hardware traces from real computers). Simulators are a common way to analyze computer systems because they allow researchers to look at various parameters such as cache size and configuration, processor speed, and much more without buying a specific machine. This method is also used in industry, allowing the manufacturer to do some analysis of a processor even before it is built. Simulators can take a large number of measurements as the virtual machine runs a program, and the entire environment can be controlled and measured.
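Returning briefly to the drift experiment, the loss of recurrences away from the main diagonal can be checked numerically. The sketch below is illustrative only: the sampling step `DT` (100 samples per period), the threshold of 1.0, and the helper `recurrence_rate_at_lag` are assumptions, not values or code taken from the experiment.

```python
import math

DT = 2 * math.pi / 100  # assumed step: exactly 100 samples per period


def drifting_signal(n_points, dt, c):
    """Sample x(t) = sin(t) + 0.5 * sin(2t) + c * t at t = 0, dt, ..."""
    return [
        math.sin(i * dt) + 0.5 * math.sin(2 * i * dt) + c * (i * dt)
        for i in range(n_points)
    ]


def recurrence_rate_at_lag(series, lag, tau):
    """Fraction of pairs (i, i + lag) with |x_i - x_{i+lag}| < tau.

    This is the density of the RP diagonal that lies `lag` samples
    away from the main diagonal.
    """
    pairs = len(series) - lag
    hits = sum(
        1 for i in range(pairs) if abs(series[i] - series[i + lag]) < tau
    )
    return hits / pairs


steady = drifting_signal(1000, DT, c=0.0)
drift = drifting_signal(1000, DT, c=0.1)

# One period apart, the drifting series has moved by about 0.63, which
# is still inside the threshold; two periods apart it has moved by
# about 1.26 and the recurrence is lost.
for lag in (100, 200):
    print(lag,
          recurrence_rate_at_lag(steady, lag, tau=1.0),
          recurrence_rate_at_lag(drift, lag, tau=1.0))
```

With these assumed values the steady series recurs at every multiple of the period, while the drifting series recurs only near the main diagonal, which is the pattern described for Figure 4.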
The simplification of simulators is both a benefit and a detriment. It is very useful to isolate variables in an experiment and, therefore, simulators can be very helpful in determining the effect of each parameter. Often, however, this can be very misleading because real systems do not operate in isolated, controlled environments. In fact, it is often the interaction of different parts (cache size, processor speed, operating system, specific program implementation, etc.) that dominates the dynamics. This can result in simulator traces that are vastly different from their corresponding hardware traces. While we recognize the limited connection between simulator results and hardware results, we chose to test RPs as a tool on various simulator runs. For the duration of the experiment, the simulator was configured with an approximately 2.4 GHz processor and a 4 MB L2 cache. I began by running the following two programs on the simulator:

• repeated row major matrix initialization
• repeated column major matrix initialization

The programs are very simple and are given below:
Publication date: 2015