FURL: Fixed-memory and Uncertainty Reducing Local Triangle Counting for Graph Streams
Abstract
How can we accurately estimate local triangle counts for all nodes in simple and multigraph streams? Local triangle counting in a graph stream is one of the most fundamental tasks in graph mining, with important applications including anomaly detection and social network analysis. Although several local triangle counting methods for graph streams exist, their estimates have large variance, which results in low accuracy, and they do not consider multigraph streams, which contain duplicate edges. In this paper, we propose FURL, an accurate local triangle counting method for simple and multigraph streams. FURL improves accuracy by reducing variance through biased estimation, and it handles duplicate edges in multigraph streams. FURL also handles a stream of any size using a fixed amount of memory. Experimental results show that FURL outperforms the state-of-the-art method in accuracy and performs well on multigraph streams. In addition, we report interesting patterns discovered in real graphs by FURL, including unusual local structures in a user communication network and a core-periphery structure in the Web.
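To make the task concrete, the following minimal sketch computes exact local triangle counts over an edge stream. It is not FURL: it stores every edge, so its memory grows with the stream, and it does no sampling or biased estimation. The function name `local_triangle_counts` and the choice to ignore repeated edges are assumptions made here for illustration only.

```python
from collections import defaultdict

def local_triangle_counts(edge_stream):
    """Exact per-node triangle counts for an undirected edge stream.

    Illustrative baseline only: keeps all edges in memory.
    """
    neighbors = defaultdict(set)   # adjacency sets of the edges seen so far
    triangles = defaultdict(int)   # node -> number of triangles containing it
    for u, v in edge_stream:
        if u == v or v in neighbors[u]:
            continue  # assumption: skip self-loops and duplicate edges
        # Every common neighbor w of u and v closes a new triangle (u, v, w).
        for w in neighbors[u] & neighbors[v]:
            triangles[u] += 1
            triangles[v] += 1
            triangles[w] += 1
        neighbors[u].add(v)
        neighbors[v].add(u)
    return dict(triangles)

# Small stream with one duplicate edge, (1, 2), at the end.
stream = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4), (1, 2)]
counts = local_triangle_counts(stream)
```

On this stream the triangles are {1, 2, 3} and {2, 3, 4}, so nodes 2 and 3 each take part in two triangles and nodes 1 and 4 in one. A fixed-memory method would replace the full adjacency sets with a bounded sample of edges and rescale the counts accordingly.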
Similar resources
Efficient Algorithms for Approximate Triangle Counting
Counting the number of triangles in a graph has many important applications in network analysis. Several frequently computed metrics like the clustering coefficient and the transitivity ratio need to count the number of triangles in the network. Furthermore, triangles are one of the most important graph classes considered in network mining. In this paper, we present a new randomized algorithm f...
On Sampling from Massive Graph Streams
We propose Graph Priority Sampling (GPS), a new paradigm for order-based reservoir sampling from massive graph streams. GPS provides a general way to weight edge sampling according to auxiliary and/or size variables so as to accomplish various estimation goals of graph properties. In the context of subgraph counting, we show how edge sampling weights can be chosen so as to minimize the estimati...
Microbio-ecology and hydro-geochemistry of saline sulfur springs of Ghale-Madreseh, Khuzestan, Iran
Ghale-Madreseh is the first point where saline and sulfurous streams flow into the Tembi River, one of the well-known saline rivers in Khuzestan province, Iran. This river is one of the main contributors to the salinity of the Karun River, the largest river in Iran by discharge. There are three saline and sulfurous springs (Shour-1, Shour-2m and Namak Springs) as well as a drinkab...
DiSLR: Distributed Sampling with Limited Redundancy For Triangle Counting in Graph Streams
Given a web-scale graph that grows over time, how should its edges be stored and processed on multiple machines for rapid and accurate estimation of the count of triangles? The count of triangles (i.e., cliques of size three) has proven useful in many applications, including anomaly detection, community detection, and link recommendation. For triangle counting in large and dynamic graphs, recent...
Journal title:
- CoRR
Volume abs/1611.06615
Publication date 2016