Modeling High Dimensional Time Series
Abstract
This paper investigates the effectiveness of the recently proposed Gaussian Process Dynamical Model (GPDM) on high-dimensional chaotic time series. The GPDM takes a Bayesian approach to modeling high-dimensional time series data, using the Gaussian Process Latent Variable Model (GPLVM) for nonlinear dimensionality reduction combined with a nonlinear dynamical model in latent space. The GPDM is evaluated on chaotic time series data sampled from the Lorenz attractor and on two nonlinear high-dimensional projections of the same trajectory.

Introduction

Some high-dimensional datasets are characterized by a much smaller intrinsic dimension, and several techniques have been proposed for modeling and understanding this kind of data [1, 2, 11]. These techniques typically produce a mapping from high-dimensional to low-dimensional space, are characterized by a number of free parameters, and are not necessarily efficient for modeling time series data. Gaussian Process models, on the other hand, form a mapping from low-dimensional to high-dimensional space, which is intuitively more natural (consistent with how real-world datasets are generated). Furthermore, Gaussian Processes are a Bayesian method, averaging over model parameters rather than discovering them through trial and error. Recently, Gaussian Processes were shown to have a natural extension, the Gaussian Process Dynamical Model (GPDM), for modeling high-dimensional time series data [3]. In [3], Wang demonstrated the effectiveness of GPDMs for human motion, both for cyclic data (walking) and for short non-cyclic trajectories (golf club swings). In this paper, we investigate the effectiveness of the GPDM on chaotic data; in particular, we analyze its performance on a collection of artificial chaotic datasets.

Related Work

Until recently, the major dimensionality reduction techniques were linear. For example, Principal Components Analysis (PCA) performs an eigendecomposition of the data covariance and projects the data onto the eigenvectors with the largest eigenvalues, in an attempt to preserve the majority of the variability of the dataset. Multi-Dimensional Scaling (MDS) performs linear dimensionality reduction by attempting to preserve the pairwise distances (normally Euclidean) between objects in the low-dimensional space. Both PCA and MDS have been proven to converge to the true underlying manifold in the limit of an infinite amount of data, as long as that manifold is linear. The main drawback of both PCA and MDS is their inability to model nonlinear manifolds; fortunately, both models have nonlinear extensions, as well as extensions for modeling time series data. Kernel PCA (KPCA) [2] uses the kernel trick to introduce nonlinearities into the PCA dimensionality reduction. Isomap is a nonlinear dimensionality reduction technique introduced by Tenenbaum, Silva and Langford [11]: a neighborhood graph is constructed on the high-dimensional data, and shortest paths on the graph are used as the distance metric for multidimensional scaling (MDS). The technique is global, since the graph is fully connected. Isomap has been shown to converge to the true low-dimensional nonlinear manifold given dense enough data, just as PCA and MDS are guaranteed to converge to the true low-dimensional linear manifold. Jenkins and Mataric extended Isomap to handle temporal data in ST-Isomap [6]. Locally Linear Embedding (LLE) is another nonlinear dimensionality reduction technique, introduced by Saul and Roweis [1]. As opposed to Isomap, this technique is local: each data point is only concerned with its nearest neighbors in feature space. A temporal extension of the LLE framework is described in [13].
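For readers who want a quick point of comparison, the baselines above are available in standard libraries. The sketch below applies PCA, Isomap, and LLE to the same dataset using scikit-learn; the library, the synthetic S-curve data, and the parameter choices (2 latent dimensions, 5 neighbors) are illustrative assumptions and are not taken from the experiments in this paper.

    # Minimal comparison of the dimensionality reduction baselines discussed above.
    import numpy as np
    from sklearn.datasets import make_s_curve
    from sklearn.decomposition import PCA
    from sklearn.manifold import Isomap, LocallyLinearEmbedding

    X, _ = make_s_curve(n_samples=1000, random_state=0)  # 3-D points on a nonlinear manifold

    # Linear projection onto the top-2 principal components.
    X_pca = PCA(n_components=2).fit_transform(X)

    # Global nonlinear embedding: graph shortest paths fed into MDS.
    X_iso = Isomap(n_neighbors=5, n_components=2).fit_transform(X)

    # Local nonlinear embedding: each point is reconstructed from its nearest neighbors.
    X_lle = LocallyLinearEmbedding(n_neighbors=5, n_components=2).fit_transform(X)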
Theoretical Model

The Gaussian Process is a generalization of the Gaussian distribution from finite-dimensional vectors to functions [8]. Informally, parametric models can be reformulated in the nonparametric Gaussian Process framework in three steps:
1. The model is made nonlinear.
2. The model is made probabilistic.
3. The model is treated as having an infinite number of hidden parameters.
For example, a Bayesian treatment of Neural Networks leads naturally to a technique now known as Gaussian Processes for regression. This model neatly addresses network architecture, stopping criteria, regularization, model complexity, error estimation (both for model parameters and predictions), and automatic penalization of over-complex or over-flexible models [9]. Furthermore, empirical evaluations of Gaussian Processes for regression show that they are competitive with Neural Networks [9].

In 2003, Lawrence presented a probabilistic interpretation of Principal Component Analysis [10]. He showed that this model entails a linear function prior, and that by generalizing to nonlinear Gaussian Process priors one arrives at an elegant nonlinear dimensionality reduction technique, the Gaussian Process Latent Variable Model (GPLVM). The GPLVM optimizes model hyperparameters simultaneously with the latent-space locations that correspond to the observed feature vectors.

Gaussian Process Dynamical Models

In a time series modeling problem, we are given time-labeled measurements Y. The GPDM is obtained by simultaneously maximizing the probability of the underlying low-dimensional points X and the associated hyperparameters. The resulting model consists of a mapping from low-dimensional space to high-dimensional space (X -> Y) and a model of the dynamics in the low-dimensional space (x_t -> x_{t+1}), as depicted in Diagram 1. The combination of these models can be used to perform time series forecasting in the original high-dimensional space.

Diagram 1. Diagram of the GPDM. State vectors propagate through latent space (x) over time, and are projected into the observed feature space (y).

The GPDM is constructed from first-order Markov dynamics in the latent space and a projection from the latent space to the feature space, each with a Gaussian noise contribution (the equations are reproduced from [14]):

    x_t = f(x_{t-1}; A) + n_{x,t},    y_t = g(x_t; B) + n_{y,t}.

Here x is a latent point, y is a feature point, A and B are the parameters of the two mappings, and the n terms are Gaussian noise. Each mapping is constructed from a family of basis functions, with hyperparameters a and b. In general it is a difficult problem to induce the number of weights, the weights themselves, and the basis functions. However, it has been shown [14] that in the Bayesian framework these parameters can be marginalized out, leaving a closed-form probability for the observations in which Y is the matrix of feature points, W is a set of scale hyperparameters, and K_Y is a kernel matrix, here built from the RBF kernel. A different argument is required to obtain the dynamical probability model, since neighboring points in time are co-dependent [14]. (Lawrence points out that this entails an awkward prior, and recommends instead fixing the hyperparameters of the dynamical mapping; here we follow Wang's formulation.)
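For concreteness, the marginalized densities and kernels are summarized below in the notation of [3, 14]. This is a compact restatement of that formulation; minor details of the parameterization (noise terms, hyperparameter priors) may differ from the exact variant used in our experiments.

    p(Y \mid X, \bar{\beta}, W) =
        \frac{|W|^{N}}{\sqrt{(2\pi)^{ND}\,|K_Y|^{D}}}
        \exp\left( -\tfrac{1}{2}\,\mathrm{tr}\left( K_Y^{-1} Y W^{2} Y^{\top} \right) \right)

    p(X \mid \bar{\alpha}) =
        \frac{p(x_1)}{\sqrt{(2\pi)^{(N-1)d}\,|K_X|^{d}}}
        \exp\left( -\tfrac{1}{2}\,\mathrm{tr}\left( K_X^{-1} X_{2:N} X_{2:N}^{\top} \right) \right)

    k_Y(x, x') = \exp\left( -\tfrac{\beta_1}{2}\,\lVert x - x' \rVert^{2} \right) + \beta_2^{-1}\,\delta_{x,x'}

    k_X(x, x') = \alpha_1 \exp\left( -\tfrac{\alpha_2}{2}\,\lVert x - x' \rVert^{2} \right)
               + \alpha_3\, x^{\top} x' + \alpha_4^{-1}\,\delta_{x,x'}

The latent points X and the hyperparameters are then obtained by minimizing the negative log of the product p(Y | X, beta, W) p(X | alpha) p(alpha) p(beta), which is the negative log posterior referred to below.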
There is flexibility in the choice of kernel for the dynamics term: in analogy with past work on dynamical systems, Wang recommends a combination of RBF and linear terms, and shows them to work together effectively. Given an observed feature dataset Y, the GPDM hyperparameters (alpha and beta) and the corresponding latent points X are obtained by minimizing the negative log posterior. We use Lawrence's implementation (available online) [7], which minimizes this objective function using the scaled conjugate gradient technique.

Experiments

Wang showed the GPDM to be a successful modeling technique for cyclic data and for short non-cyclic trajectories [3]. In this paper, we investigate the effectiveness of the GPDM on chaotic datasets with a series of experiments. Gaussian Process algorithms generally scale as O(n^3) in the number of data points, since the kernel matrix must be inverted at each optimization step, so dataset sizes have been kept small in these experiments.

Experiment 1

First, we investigate the chaotic trajectory shown in Figure 1, which depicts the x-z projection of the Lorenz attractor with [a, b, r] = [16, 4, 45]. The data was generated with fourth-order Runge-Kutta integration with a fixed time step of 0.001. The initial point of the trajectory is selected after the initial transient has died out. The final dataset was subsampled (keeping every 5th data point) to extend the covered time span while reducing the total number of points.

The results of 5-nearest-neighbor LLE dimensionality reduction on this data are shown in Figure 2 (without the temporal extension of [13]). The results for the GPLVM are shown in Figure 3, and the results for the GPDM are shown in Figure 4. The results from LLE (Figure 2) are easy to interpret. The left lobe in Figure 2 corresponds to the right lobe in Figure 1. The major features are captured, but the trajectories are somewhat garbled near the intersection of the lobes. The right lobe in Figure 2 has also collapsed two orbits into one. The GPLVM results (Figure 3) provide several advantages over LLE. In particular, the result of dimensionality reduction is not merely the sequence of latent points, but a complete probability distribution over the latent space. The probability distribution of the right lobe in Figure 3 looks qualitatively correct, although there are several breaks in the latent trajectories. The lobe on the left looks problematic: rather than a hollow ring of probability it is a filled circle. This means that the trajectory would be as likely to be found in the middle of the lobe as on its edge, which is incorrect for this dataset. The GPDM (Figure 4) overcomes the problems of both LLE and the GPLVM; the breaks have been closed and the probability distribution closely matches the trajectory. Furthermore, the latent points appear to correspond nearly isometrically to the points in the original x-z projection of the data.

Figure 1. The dataset for the first set of experiments: the x-z projection of the Lorenz attractor with [a, b, r] = [16, 4, 45].

Figure 2. The result of dimensionality reduction on the Lorenz data from Figure 1 using Locally Linear Embedding (LLE) with 5 nearest neighbors.

Figure 3. The result of dimensionality reduction on the Lorenz data from Figure 1 using the GPLVM.

Figure 4. The result of dimensionality reduction on the Lorenz data from Figure 1 using the GPDM.
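As a concrete illustration of the data generation described above, the sketch below integrates the Lorenz system with fourth-order Runge-Kutta, discards an initial transient, and keeps every 5th point. The initial condition, the transient length, and the 500-sample trajectory length are assumptions chosen for illustration (500 points matches the trajectory length referenced in Experiment 2).

    import numpy as np

    # Lorenz system with the parameters used in Experiment 1: [a, b, r] = [16, 4, 45].
    def lorenz(state, a=16.0, b=4.0, r=45.0):
        x, y, z = state
        return np.array([a * (y - x), r * x - y - x * z, x * y - b * z])

    def rk4_step(state, dt):
        # One step of fourth-order Runge-Kutta with a fixed time step.
        k1 = lorenz(state)
        k2 = lorenz(state + 0.5 * dt * k1)
        k3 = lorenz(state + 0.5 * dt * k2)
        k4 = lorenz(state + dt * k3)
        return state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

    dt = 0.001
    state = np.array([1.0, 1.0, 1.0])     # assumed initial condition

    # Discard an initial transient so the trajectory starts on the attractor
    # (the transient length here is an assumption).
    for _ in range(20000):
        state = rk4_step(state, dt)

    # Integrate the trajectory and keep every 5th point, giving 500 samples.
    trajectory = []
    for i in range(5 * 500):
        state = rk4_step(state, dt)
        if i % 5 == 0:
            trajectory.append(state.copy())
    trajectory = np.array(trajectory)      # shape (500, 3): columns x, y, z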
Experiment 2

The previous experiment showed the GPDM to be an effective technique, but it only reduced the dimension from the original 3-dimensional space to a 2-dimensional space. In the next experiment, we artificially convert the original 3-dimensional time series data into 100-dimensional data (10x10 grayscale images) before applying the GPDM. In particular, the y-value of the original 3D data point determines the intensity of a grayscale ellipse drawn against a black background, while the width and height of the ellipse are determined by the x-value and z-value of the data point, respectively. This projection is made without noise. The 500-point trajectory is depicted in this representation in Figure 5 (every other point is shown). The result of the GPDM on the 100-dimensional dataset is shown in Figure 6. The scaled conjugate gradient optimization terminates after a surprisingly low number of iterations; the reason for this is not investigated here. The 2-orbit lobe (the right lobe) looks qualitatively correct. The 3-orbit lobe, however, appears to have some problems. First, the size of the lobe is reduced; however, the GPDM is not a global method, so we would not expect this level of geometry to be preserved. Second, the probability distribution does not appear to be centered on the trajectory. While it is apparent in the original Lorenz data that many points are nearly overlapping, it is possible that our mapping has created new neighbor pairs in feature space that were not neighbors in the original 3D space (that is, some information may be lost in the projection).

Figure 5. The artificial projection of the original 3D Lorenz data into 100-dimensional (10x10) grayscale images.

Figure 6. The result of the GPDM on the 100-dimensional dataset.

Experiment 3

The data has been projected noiselessly into a 9-dimensional space by the following highly nonlinear many-to-one function:

    Y = {x*y, sqrt(z), y / (abs(z) + 1), z*z, cos(x), abs(y), z*x, z*y, x*z}

where x, y and z are the original Lorenz values, and Y is the resulting 9-dimensional feature vector. Note that this function is many-to-one; the absolute value and cosine functions destroy information in the projection. We would not expect this information to be recovered by any dimensionality reduction technique. The result of the GPDM on this 9-dimensional feature space is shown in Figure 7. There are several gaps, and somewhat broken trajectories, but in general the low-dimensional projection obtained from the GPDM is surprisingly good for this nonlinear and many-to-one dataset.

Figure 7. The results of the GPDM on the nonlinear projection of the Lorenz data described in Experiment 3.
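To make the two feature-space constructions concrete, the sketch below shows one way the projections of Experiments 2 and 3 might be implemented. The exact ellipse rasterization (pixel grid, centering, and intensity scaling) is not specified above, so the normalization constants are illustrative assumptions; the 9-dimensional map follows the function given in Experiment 3.

    import numpy as np

    def lorenz_to_image(point, size=10):
        # Experiment 2: render one Lorenz point (x, y, z) as a size x size grayscale image.
        # The ellipse intensity comes from y, its width from x and its height from z.
        # The scale factors and centering below are assumptions for illustration.
        x, y, z = point
        a = 1.0 + 3.5 * abs(x) / 40.0                     # horizontal semi-axis, in pixels
        b = 1.0 + 3.5 * abs(z) / 80.0                     # vertical semi-axis, in pixels
        intensity = np.clip((y + 40.0) / 80.0, 0.0, 1.0)  # assumed intensity normalization
        cols, rows = np.meshgrid(np.arange(size), np.arange(size))
        c = (size - 1) / 2.0
        mask = ((cols - c) / a) ** 2 + ((rows - c) / b) ** 2 <= 1.0
        image = np.zeros((size, size))
        image[mask] = intensity
        return image.ravel()                              # a 100-dimensional feature vector

    def lorenz_to_9d(point):
        # Experiment 3: the nonlinear many-to-one projection given in the text.
        # On this attractor z remains positive, so the square root is real.
        x, y, z = point
        return np.array([x * y, np.sqrt(z), y / (abs(z) + 1), z * z,
                         np.cos(x), abs(y), z * x, z * y, x * z])

    # Example usage on a (500, 3) Lorenz trajectory from Experiment 1:
    # Y_images = np.array([lorenz_to_image(p) for p in trajectory])  # shape (500, 100)
    # Y_9d = np.array([lorenz_to_9d(p) for p in trajectory])         # shape (500, 9)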
Conclusions and Future Work

The GPDM has been shown to be an effective technique for modeling and visualizing high-dimensional chaotic time series data. Here, we have demonstrated a number of qualitative results of the GPDM's effectiveness on a short trajectory in the Lorenz attractor and some of its nonlinear projections. This work has focused on a particular subset of experiments to investigate the possibility of using the GPDM for modeling high-dimensional time series. However, there are many remaining questions, issues and caveats:

- The results presented here have been primarily qualitative. Future work should quantify the behaviors observed here, possibly through time series prediction, perhaps using the mean-prediction technique described in [3].
- It is difficult to make absolute claims about such short chaotic trajectories. Future work could use an approximation to the Gaussian Process algorithm, avoiding the O(n^3) behavior and allowing for the analysis of more substantial trajectory sizes.
- Neil Lawrence [7] shows that constraining points that are nearby in feature space to remain nearby in latent space can produce a smooth model and alleviate several problems with the dimensionality reduction, even in the absence of a latent dynamical model. Future work could evaluate the combination of this so-called back-constrained model with a latent dynamical model.
- Another future direction could look for a way to exploit the chaotic geometry to model the underlying dynamics (perhaps following techniques such as those in Farmer's method [12]).
- It would be interesting to plot the effectiveness of this technique as a function of an increasing Lyapunov exponent; for example, running Experiment 1 repeatedly for increasing r in the Lorenz attractor to see how the performance degrades as the Lyapunov exponent increases.
- In our experiments, we have chosen to map the data into 2D for ease of visualization; however, chaotic attractors of continuous systems can only occur in dimensions d >= 3. Future work should determine whether reduction to 2D is inherently flawed or whether a higher-dimensional projection can be more effective.
- In future work, this model should be evaluated on real-world high-dimensional chaotic time series datasets, and compared to standard methods for time series analysis and prediction.
- In these experiments noise has been ignored. We would like to see how gracefully the model degrades with the incremental introduction of noise.

Acknowledgments

Thanks to Neil Lawrence for publishing his GPLVM and GPDM code, supporting utility software, and documentation, which were used in our experiments. Thanks to Jack Wang for his thesis.

References

[1] Roweis, S., Saul, L. Nonlinear dimensionality reduction by locally linear embedding. Science, v.290, no.5500, Dec. 22, 2000, pp. 2323-2326.
[2] Mika, S., Scholkopf, B., Smola, A., Muller, K., Scholz, M., Ratsch, G. Kernel PCA and De-Noising in Feature Spaces. NIPS, 1998.
[3] Wang, J., Fleet, D., Hertzmann, A. Gaussian Process Dynamical Models. Proc. NIPS, 2005.
[4] Zoeter, O., Heskes, T. Hierarchical visualization of time-series data using switching linear dynamical systems. Proceedings UAI, pages 1201-1214, 2003.
[5] Dorffner, G. Neural networks for time series processing. Neural Network World, 6(4):447-468, 1996.
[6] Jenkins, O. C., Mataric, M. J. A Spatio-temporal Extension to Isomap Nonlinear Dimension Reduction. Proceedings, International Conference on Machine Learning (ICML-2004), Banff, Canada, July 4-8, 2004, pp. 441-448.
[7] Lawrence, N. Recent unpublished work. http://www.dcs.shef.ac.uk/~neil/gplvmcpp/
[8] Seeger, M. Gaussian processes for machine learning. International Journal of Neural Systems, 14(2):69-106, 2004.
[9] MacKay, D. J. C. A Practical Bayesian Framework for Backpropagation Networks. Neural Computation, 1992.
[10] Lawrence, N. Gaussian Process Latent Variable Models for Visualization of High Dimensional Data. Neural Information Processing Systems, 2003.
[11] Tenenbaum, J. B., de Silva, V., Langford, J. C. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, Vol. 290, December 2000.
[12] Farmer, J. D., Sidorowich, J. Exploiting chaos to predict the future and reduce noise. In Y. C. Lee, editor, Evolution, Learning, and Cognition, page 277. World Scientific, 1988.
[13] Tangkuampien, T., Chin, T. J. Locally Linear Embedding for Markerless Human Motion Capture using Multiple Cameras. In Proceedings of Digital Image Computing: Techniques and Applications, 2005.
[14] Wang, J. M. Gaussian Process Dynamical Models for Human Motion. Master's thesis, University of Toronto, September 2005. http://www.dgp.toronto.edu/~jmwang/gpdmthesis.pdf