Calibrated Nonparametric Scan Statistics for Anomalous Pattern Detection in Graphs
Abstract
We propose a new approach, the calibrated nonparametric scan statistic (CNSS), for more accurate detection of anomalous patterns in large-scale, real-world graphs. Scan statistics identify connected subgraphs that are interesting or unexpected through maximization of a likelihood ratio statistic; in particular, nonparametric scan statistics (NPSSs) identify subgraphs with a higher than expected proportion of individually significant nodes. However, we show that recently proposed NPSS methods are miscalibrated, failing to account for the maximization of the statistic over the multiplicity of subgraphs. This results in both reduced detection power for subtle signals, and low precision of the detected subgraph even for stronger signals. Thus we develop a new statistical approach to recalibrate NPSSs, correctly adjusting for multiple hypothesis testing and taking the underlying graph structure into account. While the recalibration, based on randomization testing, is computationally expensive, we propose both an efficient (approximate) algorithm and new, closed-form lower bounds (on the expected maximum proportion of significant nodes for subgraphs of a given size, under the null hypothesis of no anomalous patterns). These advances, along with the integration of recent core-tree decomposition methods, enable CNSS to scale to large real-world graphs, with substantial improvement in the accuracy of detected subgraphs. Extensive experiments on both semi-synthetic and real-world datasets are demonstrated to validate the effectiveness of our proposed methods, in comparison with state-of-the-art counterparts.
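The miscalibration described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes a Berk-Jones style score as the nonparametric scan statistic, ignores the connectivity constraint entirely (so the best size-k subset is just the k smallest p-values), and estimates the null baseline by plain randomization over independent Uniform(0, 1) p-values. All function names are hypothetical.

```python
import math
import random

def kl_bernoulli(a, b):
    """KL divergence between Bernoulli(a) and Bernoulli(b), for 0 < b < 1."""
    total = 0.0
    if a > 0:
        total += a * math.log(a / b)
    if a < 1:
        total += (1 - a) * math.log((1 - a) / (1 - b))
    return total

def berk_jones(n_sig, n, expected):
    """Berk-Jones style score: n * KL(observed proportion, expected proportion).

    Returns 0 unless the observed proportion exceeds the expected one.
    """
    observed = n_sig / n
    if observed <= expected:
        return 0.0
    return n * kl_bernoulli(observed, expected)

def max_proportion(pvalues, k, alpha):
    """Largest proportion of significant nodes over all size-k subsets.

    Toy assumption: connectivity is ignored, so the best subset simply
    takes as many significant nodes as are available, up to k.
    """
    n_sig = sum(1 for p in pvalues if p <= alpha)
    return min(n_sig, k) / k

def null_expected_max(n_nodes, k, alpha, n_rep=200, seed=0):
    """Randomization estimate of E[max proportion] under the null
    (independent Uniform(0, 1) p-values on every node)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        pvals = [rng.random() for _ in range(n_nodes)]
        total += max_proportion(pvals, k, alpha)
    return total / n_rep

if __name__ == "__main__":
    n_nodes, k, alpha = 1000, 20, 0.05
    baseline = null_expected_max(n_nodes, k, alpha)
    # Cap the baseline just below 1 so the KL term stays well defined.
    baseline = min(baseline, 1 - 1e-9)
    print(f"nominal alpha = {alpha}, null baseline = {baseline:.3f}")
    print("score against nominal alpha:", berk_jones(k, k, alpha))
    print("score against null baseline:", berk_jones(k, k, baseline))
```

In this unconstrained toy setting, a size-20 subset drawn from 1000 null nodes can almost always be filled with nodes significant at level 0.05, so comparing the observed proportion against the nominal alpha makes pure noise look highly anomalous, while comparing against the randomization baseline does not. On a real graph the connectivity constraint shrinks the set of candidate subgraphs, which is why the baseline has to take the graph structure into account.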
Similar resources
Scan Statistics for the Online Detection of Locally Anomalous Subgraphs
Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Statistics, The University of New Mexico, Albuquerque, New Mexico
Fast generalized subset scan for anomalous pattern detection
We propose Fast Generalized Subset Scan (FGSS), a new method for detecting anomalous patterns in general categorical data sets. We frame the pattern detection problem as a search over subsets of data records and attributes, maximizing a nonparametric scan statistic over all such subsets. We prove that the nonparametric scan statistics possess a novel property that allows for efficient optimizat...
Bayesian Network Scan Statistics for Multivariate Pattern Detection
We review three recently proposed scan statistic methods for multivariate pattern detection. Each method models the relationship between multiple observed and hidden variables using a Bayesian network structure, drawing inferences about the underlying pattern type and the affected subset of the data. We first discuss the multivariate Bayesian scan statistic (MBSS) proposed by Neill and Cooper (...
Scan Statistics on Enron Graphs
We introduce a theory of scan statistics on graphs and apply the ideas to the problem of anomaly detection in a time series of Enron email graphs. Corresponding author: Carey E. Priebe
Scan Statistics for Interstate Alliance Graphs
In this paper we discuss work on graphs defined in terms of alliances between countries. We will use scan statistics to investigate years in which there are an unusual number of agreements, not just between one country and its allies, but amongst the allies themselves. This is related to work on email “chatter” discussed in Priebe et al. [2006]. In this section we will lay out the basic graph t...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i4.20339