Piecewise Linear Histograms for Selectivity Estimation

Author

  • Xiaohui Yu
Abstract

Selectivity estimation of queries is of critical importance to query optimization. To produce accurate estimates, database management systems must maintain statistics that capture the underlying data distribution. Histograms are extensively used in commercial database systems for this purpose. Most current histogram techniques assume that all values in a single bucket appear with the same frequency, which rarely holds in practice. Though wavelet-based histograms have recently appeared as a strong alternative, several problems inherent in wavelet techniques prevent them from practical use in database systems. In this paper, we propose a new type of histogram called the piecewise linear histogram. The frequencies of the attribute values in each bucket of a piecewise linear histogram are fit by a line using linear least squares regression, and the coefficients are stored as a synopsis of the underlying data distribution. Moreover, since finding the best partition of the domain into buckets is an NP-hard problem, we propose a heuristic to efficiently determine the bucket boundaries. Experimental results show that piecewise linear histograms significantly outperform wavelet-based histograms.
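The per-bucket line fit described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `fit_line`, `build_histogram`, and `estimate_range` are hypothetical, the attribute domain is assumed to be the integers 0..n-1, and the bucket boundaries are supplied by hand because the abstract does not describe the partitioning heuristic.

```python
from typing import List, Sequence, Tuple

# One bucket: inclusive value range [lo, hi] plus the fitted line
# freq(v) ~ slope * v + intercept. (Hypothetical layout, not from the paper.)
Bucket = Tuple[int, int, float, float]


def fit_line(xs: Sequence[int], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y ~ a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # single-value bucket: fall back to a flat line
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def build_histogram(freqs: Sequence[float], starts: Sequence[int]) -> List[Bucket]:
    """freqs[v] is the frequency of attribute value v.

    starts holds the first value of each bucket (sorted, starts[0] == 0).
    The boundaries are an input here; the paper's heuristic is not reproduced.
    """
    ends = list(starts[1:]) + [len(freqs)]
    hist = []
    for lo, end in zip(starts, ends):
        a, b = fit_line(range(lo, end), freqs[lo:end])
        hist.append((lo, end - 1, a, b))
    return hist


def estimate_range(hist: List[Bucket], lo: int, hi: int) -> float:
    """Estimated number of rows with lo <= value <= hi."""
    total = 0.0
    for b_lo, b_hi, a, b in hist:
        p, q = max(lo, b_lo), min(hi, b_hi)
        if p > q:  # query range does not overlap this bucket
            continue
        count = q - p + 1
        # Sum of a*v + b for v = p..q, using the arithmetic-series formula.
        total += a * (p + q) * count / 2 + b * count
    return total


hist = build_histogram([1, 2, 3, 4, 10, 10, 10, 10], starts=[0, 4])
print(estimate_range(hist, 2, 5))
```

In this toy input each bucket is exactly linear, so the estimate for the range [2, 5] equals the true count of 27. Only two coefficients are stored per bucket, the same storage cost as a bucket boundary plus an average frequency in a conventional histogram.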

Similar resources

A robust thresholding algorithm for unimodal image histograms

This article introduces a method to determine the threshold of unimodal image histograms in a robust manner. It is based on a piecewise linear regression that finds the two segments that fit the descending slope of the histogram. The algorithm gives a good estimation of the threshold, and is practically insensitive to the noise distribution, to the quantity of objects to segment, and to random ...

Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms

Let p be an unknown and arbitrary probability distribution over [0, 1). We consider the problem of density estimation, in which a learning algorithm is given i.i.d. draws from p and must (with high probability) output a hypothesis distribution that is close to p. The main contribution of this paper is a highly efficient density estimation algorithm for learning using a variable-width histogram,...

Piecewise linear density estimation for sampled data

Abstract – Nonparametric density estimation is considered for a discretely observed stationary continuous-time process. For each of three given time sampling procedures either random or deterministic, we establish that histograms and frequency polygons can reach the same optimal L2-rates as in the independent and identically distributed case. Moreover, thanks to a suitable “high frequency” samp...

Smooth Interpolating Histograms with Error Guarantees

Accurate selectivity estimations are essential for query optimization decisions where they are typically derived from various kinds of histograms which condense value distributions into compact representations. The estimation accuracy of existing approaches typically varies across the domain, with some estimations being very accurate and some quite inaccurate. This is in particular unfortunate ...

Query-Condition-Aware Histograms in Selectivity Estimation Method

The paper presents an adaptive approach to the query selectivity estimation problem for queries with a range selection condition based on continuous attributes. The selectivity factor estimates the size of the data satisfying a query condition. This estimate is calculated at the initial stage of query processing to choose the optimal query execution plan. A non-parametric estimator of probabili...

Publication date: 2001