One sketch for all: Theory and Application of Conditional Random Sampling

Authors

  • Ping Li
  • Kenneth Ward Church
  • Trevor J. Hastie
Abstract

Conditional Random Sampling (CRS) was originally proposed for efficiently computing pairwise (l2, l1) distances in static, large-scale, sparse data. This study modifies the original CRS and extends it to handle dynamic or streaming data, which reflect real-world situations much better than the assumption of static data. Compared with many other sketching algorithms for dimension reduction, such as stable random projections, CRS has a significant advantage: it is "one-sketch-for-all." In particular, we demonstrate the effectiveness of CRS in efficiently computing the Hamming norm, the Hamming distance, the lp distance, and the χ² distance. We also provide a generic estimator and an approximate variance formula for approximating any type of distance. We recommend CRS as a promising tool for building highly scalable systems in machine learning, data mining, recommender systems, and information retrieval.
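The sampling idea behind CRS can be illustrated with a minimal Python sketch of the static setting. It is written under stated assumptions and is not taken from the paper.

One random permutation of the D columns is shared by all data points. Each sparse vector keeps only its k nonzero entries with the smallest permuted positions. To compare two vectors, both sketches are truncated at D_s, the smaller of the two ranges of permuted columns they fully cover. This yields what behaves like a random sample of D_s columns for both vectors, and any sum of per-column terms is then scaled up by D/D_s.

The following are all hypothetical assumptions:
- the helper names (make_permutation, build_sketch, covered, pairwise_estimate, hamming_norm_estimate, chi2_term);
- the plain scale-up estimator;
- the definition of Hamming distance as the number of columns where the two vectors differ;
- the χ² form sum (x - y)²/(x + y) for nonnegative data.

The paper's own estimators, its variance formula, and its streaming extension are not reproduced here.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical helpers illustrating static CRS; not the paper's code.
Sketch = List[Tuple[int, float]]  # (permuted column position, value), sorted by position


def make_permutation(D: int, seed: int = 0) -> List[int]:
    """Return pi, where pi[j] is the permuted position of original column j."""
    pi = list(range(D))
    random.Random(seed).shuffle(pi)
    return pi


def build_sketch(u: Dict[int, float], pi: List[int], k: int) -> Sketch:
    """Keep the k nonzeros of sparse vector u with the smallest permuted positions (k >= 1)."""
    entries = sorted((pi[j], v) for j, v in u.items() if v != 0)
    return entries[:k]


def covered(sk: Sketch, k: int, D: int) -> int:
    """Number of leading permuted columns for which the sketch holds every nonzero."""
    if len(sk) < k:        # fewer than k nonzeros: the sketch is the whole vector
        return D
    return sk[-1][0] + 1   # positions 0 .. last kept position are fully covered


def pairwise_estimate(s1: Sketch, s2: Sketch, k: int, D: int,
                      term: Callable[[float, float], float]) -> float:
    """Estimate sum_j term(u1[j], u2[j]) over all D columns (assumes term(0, 0) == 0)."""
    Ds = min(covered(s1, k, D), covered(s2, k, D))   # effective (conditional) sample size
    a = {p: v for p, v in s1 if p < Ds}
    b = {p: v for p, v in s2 if p < Ds}
    # Columns absent from both truncated sketches are zero in both vectors and contribute 0.
    sample_sum = sum(term(a.get(p, 0.0), b.get(p, 0.0)) for p in a.keys() | b.keys())
    return sample_sum * D / Ds                        # scale the sample up to D columns


def hamming_norm_estimate(sk: Sketch, k: int, D: int) -> float:
    """Estimate the number of nonzeros of one vector from its own sketch."""
    return len(sk) * D / covered(sk, k, D)


def chi2_term(x: float, y: float) -> float:
    """Per-column chi-square term for nonnegative data (assumed form)."""
    return (x - y) ** 2 / (x + y) if x + y > 0 else 0.0


# The same sketches serve several distances; only the per-column term changes.
l1_term = lambda x, y: abs(x - y)
hamming_term = lambda x, y: float(x != y)
```

Only the per-column term changes between the lp, Hamming, and χ² estimates, which is one way to read the "one-sketch-for-all" property. When k is larger than the number of nonzeros in both vectors, D_s equals D and the estimates are exact.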


Similar resources

Conditional Random Sampling: A Sketch-based Sampling Technique for Sparse Data

Abstract: We develop Conditional Random Sampling (CRS), a technique particularly suitable for sparse data. In large-scale applications, the data are often highly sparse. CRS combines sketching and sampling in that it converts sketches of the data into conditional random samples online in the estimation stage, with the sample size determined retrospectively. This paper focuses on approximating p...


Comparison of marginal logistic models with repeated measures and conditional logistic models in examining factors affecting hypertension

Background and purpose: To analyze data in which the correlation between observations must be considered, a common method is the marginal model with repeated measures, yet there is another method called the conditional model with random clusters. Given the binary responses, the aim of the present study is to compare the efficiency of these two models in studying the risk factors a...


Application of Sequential Gaussian Conditional Simulation to Underground Mine Design Under Grade Uncertainty

In mining projects, all uncertainties associated with a project must be considered in the feasibility study. Grade uncertainty is one of the major components of technical uncertainty that affects the variability of the project. Geostatistical simulation, as a reliable approach, is the most widely used method for quantifying risk, overcoming the drawbacks of the estimation methods...


Efficient Simulation of a Random Knockout Tournament

We consider the problem of using simulation to efficiently estimate the win probabilities for participants in a general random knockout tournament. Both of our proposed estimators, one based on the notion of “observed survivals” and the other based on conditional expectation and post-stratification, are highly effective in terms of variance reduction when compared to the raw simulation estimato...


Application of the theory of reasoned action to promoting breakfast consumption

Background: Breakfast is the most important daily meal, but children and adolescents neglect it more than other meals. The aim of this study was to evaluate the effectiveness of an educational intervention based on the Theory of Reasoned Action (TRA) in increasing breakfast consumption among school children in Bandar Abbas, Iran. Methods: In this quasi-experimental study, which was conducted...



Publication date: 2008