Large Scale Clustering of Dependent Curves
Authors
Abstract
In this paper, we introduce a model-based method for clustering multiple curves or functionals under spatial dependence specified up to a set of unknown parameters. The functionals are decomposed using a semi-parametric model in which the fixed effects account for the large-scale clustering association and the random effects for the small-scale spatial-dependence variability. The clustering model treats the cluster membership as a realization of a Markov random field. Our estimation framework emphasizes settings with a large number of functionals/spatial units observed at sparsely sampled time points. To overcome the computational cost of large dependence-matrix operations, the estimation algorithm includes a two-stage approximation: a low-rank kernel-based decomposition of the dependence matrix and an incomplete Cholesky factorization of the kernel matrix. We assess the performance of our clustering approach in a simulation study. The simulation results show that our method estimates clusters more accurately than existing model-based clustering methods under a series of settings: a small number of time points, a low signal-to-noise ratio, and different spatial dependence structures. Many case studies fall within our clustering framework, but we focus on obtaining fine-grid spatial clusters of demographic trends, including ethnicity and income, for five southern US states over the past 11 years.
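As a rough illustration of the second approximation stage named in the abstract, the sketch below applies a pivoted incomplete Cholesky factorization to a kernel matrix, giving a thin factor G with K ≈ G Gᵀ. It is not the authors' implementation: the squared-exponential kernel, the `lengthscale` value, the function names `rbf_kernel` and `incomplete_cholesky`, and the random coordinates are all assumptions made for the example, since the abstract does not specify them.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0):
    """Squared-exponential kernel matrix (an assumed kernel choice)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * lengthscale**2))

def incomplete_cholesky(K, max_rank, tol=1e-8):
    """Pivoted incomplete Cholesky: returns G (n x r, r <= max_rank)
    with K approximately equal to G @ G.T, plus the pivot indices."""
    n = K.shape[0]
    d = np.diag(K).astype(float).copy()   # diagonal of the current residual
    G = np.zeros((n, max_rank))
    pivots = []
    for j in range(max_rank):
        i = int(np.argmax(d))             # greedy pivot: largest residual
        if d[i] <= tol:                   # residual negligible: stop early
            G = G[:, :j]
            break
        pivots.append(i)
        # New column: residual of column i, scaled by sqrt of the pivot.
        G[:, j] = (K[:, i] - G[:, :j] @ G[i, :j]) / np.sqrt(d[i])
        d = np.maximum(d - G[:, j]**2, 0.0)  # update, guarding round-off
    return G, pivots

# Hypothetical spatial units: 200 random locations in the unit square.
rng = np.random.default_rng(0)
coords = rng.uniform(size=(200, 2))
K = rbf_kernel(coords, lengthscale=1.0)
G, pivots = incomplete_cholesky(K, max_rank=50)
rel_err = np.linalg.norm(K - G @ G.T) / np.linalg.norm(K)
print(G.shape, rel_err)
```

With a factor of rank r much smaller than n, products and solves involving K can be carried out through G at a cost that grows linearly in n, which is the kind of saving the two-stage approximation is aiming for.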
Similar resources
A partition-based algorithm for clustering large-scale software systems
Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...
Online Aggregation of Coherent Generators Based on Electrical Parameters of Synchronous Generators
This paper proposes a novel approach for coherent generators online clustering in a large power system following a wide area disturbance. An interconnected power system may become unstable due to severe contingency when it is operated close to the stability boundaries. Hence, the bulk power system controlled islanding is the last resort to prevent catastrophic cascading outages and wide area bl...
Centralized Clustering Method To Increase Accuracy In Ontology Matching Systems
Ontology is the main infrastructure of the Semantic Web, providing facilities for integrating, searching, and sharing information on the web. The development of ontologies as the basis of the Semantic Web, together with their heterogeneity, has given rise to ontology matching. With the emergence of large-scale ontologies in real domains, ontology matching systems face problems such as memory con...
Large-Scale Disease Clustering and Its Application in Epidemiological and Health Studies
Spatial autocorrelation statistics provide summary information about the spatial arrangement of data in a map. In fact, these statistics compare neighboring area values in order to assess the level of large scale clustering. Whenever a large number of neighboring areas have either relatively large or relatively small values, large scale clustering may be detected. Detecting such clustering is a...
Predictive Modeling of Large-scale Curves and Its Application on GDP Prediction of Multi-regions
The traditional approach to predicting large-scale sequential curves is to build a separate model for every curve, which inevitably makes the modeling workload heavy and complicated. The existing approach therefore lacks manipulability in application. A new method is proposed in this paper to solve this problem. By reducing model types of curves, clustering curves and modeling by clusters,...
Publication date: 2008