Nearest-neighbor method using multiple neighborhood similarities for social media data mining
Authors
Abstract
Nearest-Neighbor (NN) approaches have recently been applied to large-scale, real-world image data mining. However, three disadvantages prevent them from wider application compared to other machine learning methods: (i) their performance is inferior on small datasets; (ii) their performance degrades on high-dimensional data; (iii) they depend heavily on the chosen feature and distance measure. In this paper, we try to overcome these three intrinsic weaknesses by taking the abundant and diversified content of social media images into account. First, we propose a novel neighborhood similarity measure that encodes both local density information and semantic information, and thus has better generalization power than the original image-to-image similarity. Second, to enhance scalability, we adopt kernelized Locality Sensitive Hashing (KLSH) to conduct approximate nearest neighbor search using a set of kernels calculated on several complementary image features. Finally, to enhance robustness on diversified genres of images, we propose to fuse the discrimination power of different features by combining multiple neighborhood similarities, calculated on different features/kernels, with all of the nearest labeled and unlabeled images retrieved via the hashing systems. Experimental results on visual categorization on the Caltech-256 and two social media databases show the advantage of our method over traditional NN methods using the
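The idea of fusing similarities computed on several features can be illustrated with a small sketch. This is not the paper's method: it uses exact brute-force search instead of KLSH, plain RBF kernels instead of the proposed neighborhood similarity, and uniform averaging as the fusion rule. The function names (`rbf_similarity`, `fused_similarity`, `classify`), the `gammas` parameter, and the toy data and labels are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def rbf_similarity(x, y, gamma=1.0):
    """RBF kernel value between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def fused_similarity(query, sample, gammas):
    """Average the per-feature kernel similarities.

    Uniform weights are an assumption made for this sketch.
    """
    sims = [rbf_similarity(q, s, g) for q, s, g in zip(query, sample, gammas)]
    return sum(sims) / len(sims)


def classify(query, train, gammas, k=3):
    """Similarity-weighted vote among the k most similar labeled samples.

    query: list of feature vectors, one per feature type.
    train: list of (features, label) pairs with the same layout as query.
    """
    scored = sorted(
        ((fused_similarity(query, feats, gammas), label) for feats, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes, key=votes.get)


# Toy data: each sample has two feature types (a 2-D and a 1-D descriptor).
train = [
    ([[0.0, 0.1], [1.0]], "cat"),
    ([[0.1, 0.0], [0.9]], "cat"),
    ([[1.0, 1.0], [0.0]], "dog"),
    ([[0.9, 1.1], [0.1]], "dog"),
]
gammas = [1.0, 1.0]  # one kernel bandwidth per feature type (hypothetical values)

print(classify([[0.05, 0.05], [0.95]], train, gammas, k=3))
```

Averaging keeps the sketch short; a weighted combination, or a similarity that also uses unlabeled neighbors as the abstract describes, would replace `fused_similarity` without changing the voting step.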
Similar resources
Bayesian adaptive nearest neighbor
The k nearest neighbor classification (k-NN) is a very simple and popular method for classification. However, it suffers from a major drawback: it assumes constant local class posterior probability. It is also highly dependent on and sensitive to the choice of the number of neighbors k. In addition, it severely lacks the desired probabilistic formulation. In this article, we propose a Bayesian ...
Detection of some Tree Species from Terrestrial Laser Scanner Point Cloud Data Using Support-vector Machine and Nearest Neighborhood Algorithms
acquisition field reference data using conventional methods due to limited and time-consuming data from a single tree in recent years, to generate reference data for forest studies using terrestrial laser scanner data, aerial laser scanner data, radar, and optics has become commonplace, and complete, accurate 3D data from a single tree or reference trees can be recorded. The detection and identi...
Non-zero probability of nearest neighbor searching
Nearest Neighbor (NN) searching is a challenging problem in data management and has been widely studied in data mining, pattern recognition and computational geometry. The goal of NN searching is to efficiently report the nearest data to a given object as a query. In most of the studies, both the data and the query are assumed to be precise; however, due to the real applications of NN searching, suc...
Modelling Climatic Parameters Affecting the Annual Yield of Rheum Ribes Rangeland Species using Data Mining Algorithms
Identification of climatic characteristics affecting the annual yield of Rheum Ribes can be useful in the management and development of this species in the rangelands. In this research, the annual yield of this species in Khorasan-Razavi province was evaluated based on 74 climatic parameters during a ten-year period, and the influential climatic parameters were extracted using data mining methods. First, the role ...
Neighborhood-Restricted Mining and Weighted Application of Association Rules for Recommenders
Association rule mining algorithms such as Apriori were originally developed to automatically detect patterns in sales transactions and were later on also successfully applied to build collaborative filtering recommender systems (RS). Such rule mining-based RS not only share the advantages of other model-based systems such as scalability or robustness against different attack models, but also h...
Journal title:
- Neurocomputing
Volume 95, Issue
Pages -
Publication date 2012