Scalable Nearest Neighbors with Guarantees in Large and Composite Networks
Abstract
We address the problem of k-nearest-neighbor (kNN) search in networks under a random-walk proximity measure called Effective Importance. Our approach retrieves the exact top neighbors at query time without relying on offline indexing or summaries of the entire network. This makes it suitable for very large dynamic networks, as well as for composite network overlays mixed at query time. We provide scalability and flexibility without compromising result quality, thanks to theoretical bound guarantees that we develop and incorporate into our search procedure. We incrementally construct a subgraph of the underlying network that is sufficient to obtain the exact top k neighbors. We guide the construction of this subgraph to achieve fast refinement of the lower and upper proximity bounds, which in turn enables effective pruning of infeasible candidates. We apply our kNN search algorithm to social, information, and biological networks and demonstrate the effectiveness and scalability of our approach. For networks on the order of a million nodes, our method retrieves the exact top 20 neighbors using less than 0.2% of the network edges, in a fraction of a second on a conventional desktop machine and without prior indexing. When employed for nearest-neighbor search in composite network overlays, it scales linearly with the number of networks mixed in the overlay.
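The pruning step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it does not compute Effective Importance or build the subgraph, and the function name `prune_candidates` and the dictionary layout are hypothetical. It only assumes that each candidate node already carries a lower and an upper bound on its proximity to the query, and shows the generic rule that a candidate whose upper bound falls below the k-th largest lower bound cannot be among the top k.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: node id -> (lower bound, upper bound) on its proximity.
Bounds = Dict[str, Tuple[float, float]]


def prune_candidates(bounds: Bounds, k: int) -> Tuple[List[str], bool]:
    """Return the surviving candidates and whether the top-k set is settled.

    Assumes `bounds` covers every node that could still be a neighbor
    (how unexplored nodes are bounded is outside this sketch).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(bounds) <= k:
        # Nothing to prune: every candidate is in the answer.
        return sorted(bounds), True

    # k-th largest lower bound: at least k nodes are provably this close.
    lowers = sorted((lo for lo, _ in bounds.values()), reverse=True)
    threshold = lowers[k - 1]

    # A node whose upper bound is below the threshold cannot reach the top k.
    survivors = sorted(n for n, (_, hi) in bounds.items() if hi >= threshold)

    # The search can stop once exactly k candidates remain.
    return survivors, len(survivors) == k
```

With bounds `{"a": (0.5, 0.6), "b": (0.3, 0.45), "c": (0.1, 0.2), "d": (0.05, 0.35)}` and `k = 2`, node `c` is pruned but `d` survives, so the bounds need further refinement. Once the upper bound of `d` tightens to 0.25, only `a` and `b` remain and the top 2 are settled.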
Similar resources
Fast Nearest Neighbors in Large and Composite Networks
We address the problem of k Nearest Neighbor (kNN) search in networks using a random walk based proximity measure. Our approach retrieves the exact top neighbors at query time without relying on off-line indexing or summaries of the entire network. This makes it suitable for very large networks, as well as for composite network overlays mixed at query time. We provide scalability and flexibilit...
Comparison and evaluation of the performance of data-driven models for estimating suspended sediment downstream of Doroodzan Dam
Dams control most of the sediment entering the reservoir by creating static environments. However, sediment leaving the dam depends on various factors such as dam management method, inlet sediment, water height in the reservoir, the shape of the reservoir, and discharge flow. In this research, the amount of suspended sediment of Doroodzan Dam based on a statistical period of 25 years has been i...
K-Nearest Neighbor Search in Peer-to-Peer Systems
Data classification in large scale systems, such as peer-to-peer networks, can be very communication-expensive and impractical due to the huge amount of available data and lack of central control. Frequent data updates pose even more difficulties when applying existing classification techniques in peer-to-peer networks. We propose a distributed, scalable and robust classification algorithm base...
A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors
Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...
Fast and Scalable Outlier Detection with Approximate Nearest Neighbor Ensembles
Popular outlier detection methods require the pairwise comparison of objects to compute the nearest neighbors. This inherently quadratic problem is not scalable to large data sets, making multidimensional outlier detection for big data still an open challenge. Existing approximate neighbor search methods are designed to preserve distances as well as possible. In this article, we present a highl...
Publication date: 2010