Sharp analysis of low-rank kernel matrix approximations
Abstract
We consider supervised learning problems within the positive-definite kernel framework, such as kernel ridge regression, kernel logistic regression or the support vector machine. With kernels leading to infinite-dimensional feature spaces, a common practical limiting difficulty is the necessity of computing the kernel matrix, which most frequently leads to algorithms with running time at least quadratic in the number of observations n, i.e., O(n²). Low-rank approximations of the kernel matrix are often considered as they allow the reduction of running time complexities to O(p²n), where p is the rank of the approximation. The practicality of such methods thus depends on the required rank p. In this paper, we show that in the context of kernel ridge regression, for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the degrees of freedom associated with the problem, a quantity which is classically used in the statistical analysis of such methods, and is often seen as the implicit number of parameters of non-parametric estimators. This result enables simple algorithms that have sub-quadratic running time complexity, but provably exhibit the same predictive performance as existing algorithms, for any given problem instance, and not only for worst-case situations.
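To make the complexity claim concrete, the sketch below implements a column-sampling (Nyström-type) kernel ridge regression together with the classical degrees-of-freedom quantity tr(K(K + nλI)⁻¹). It is a minimal NumPy illustration assuming a Gaussian kernel; the function names (rbf_kernel, nystrom_krr, degrees_of_freedom) and parameters (p, lam, gamma) are illustrative choices, and this is a common column-sampling variant rather than the exact estimator analysed in the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_krr(X, y, p, lam, gamma=1.0, seed=0):
    """Kernel ridge regression restricted to p randomly sampled columns of K.

    Only the n x p block K_np and its p x p sub-block K_pp are formed,
    so the cost is O(p^2 n + p^3) rather than the O(n^2)-or-worse of exact KRR.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=p, replace=False)      # random subset of columns
    K_np = rbf_kernel(X, X[idx], gamma)             # n x p
    K_pp = K_np[idx]                                # p x p
    # Reduced problem: minimise ||K_np a - y||^2 + n*lam * a^T K_pp a over a in R^p
    A = K_np.T @ K_np + n * lam * K_pp
    alpha = np.linalg.solve(A + 1e-10 * np.eye(p), K_np.T @ y)
    return idx, alpha

def degrees_of_freedom(K, lam):
    """Classical degrees of freedom tr(K (K + n*lam*I)^{-1}) of kernel ridge regression.

    Computed here via an O(n^3) eigendecomposition purely for illustration.
    """
    n = K.shape[0]
    eigs = np.linalg.eigvalsh(K)
    return float(np.sum(eigs / (eigs + n * lam)))

# Example usage on synthetic data
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(2000)
idx, alpha = nystrom_krr(X, y, p=100, lam=1e-3)
y_hat = rbf_kernel(X, X[idx]) @ alpha
```

Since only the n×p block is ever formed, the cost stays at O(p²n + p³), which is sub-quadratic in n whenever p grows slower than √n.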
Similar resources
On the Impact of Kernel Approximation on Learning Accuracy
Kernel approximation is commonly used to scale kernel-based algorithms to applications containing as many as several million instances. This paper analyzes the effect of such approximations in the kernel matrix on the hypothesis generated by several widely used learning algorithms. We give stability bounds based on the norm of the kernel approximation for these algorithms, including SVMs, KRR, ...
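As a rough numerical companion to such stability bounds, the toy check below perturbs a kernel matrix and compares the resulting kernel ridge regression dual solutions, so the change in the hypothesis can be inspected against the norm of the kernel perturbation. This is a sketch under arbitrary synthetic-data assumptions, not the bounds from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 200, 0.1
X = rng.standard_normal((n, 5))
y = rng.standard_normal(n)

# Exact RBF kernel matrix and a small symmetric perturbation of it
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
E = 1e-3 * rng.standard_normal((n, n))
E = (E + E.T) / 2
K_approx = K + E

# Dual kernel ridge regression solutions with the exact and approximate kernels
alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
alpha_approx = np.linalg.solve(K_approx + n * lam * np.eye(n), y)

print("||K' - K||_2        :", np.linalg.norm(E, 2))
print("||alpha' - alpha||_2:", np.linalg.norm(alpha_approx - alpha))
```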
On the numerical rank of radial basis function kernels in high dimension
Low-rank approximations are popular methods to reduce the high computational cost of algorithms involving large-scale kernel matrices. The success of low-rank methods hinges on the matrix rank, and in practice, these methods are effective even for high-dimensional datasets. The practical success has elicited the theoretical analysis of the function rank in this paper, which is an upper bound of...
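A small experiment in this spirit, sketched below under the assumption of a Gaussian kernel and standard normal data, estimates the numerical rank of the kernel matrix as the input dimension grows; the threshold tol and the 1/√d scaling are illustrative choices, not the paper's setup.

```python
import numpy as np

def numerical_rank(K, tol=1e-8):
    """Number of eigenvalues above tol times the largest eigenvalue."""
    eigs = np.linalg.eigvalsh(K)
    return int(np.sum(eigs > tol * eigs.max()))

rng = np.random.default_rng(0)
n = 500
for d in (2, 5, 10, 20):
    X = rng.standard_normal((n, d)) / np.sqrt(d)   # keep typical pairwise distances comparable
    K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))   # Gaussian RBF kernel
    print(f"d = {d:2d}  numerical rank = {numerical_rank(K)}")
```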
Learning the kernel matrix via predictive low-rank approximations
Efficient and accurate low-rank approximations to multiple data sources are essential in the era of big data. The scaling of kernel-based learning algorithms to large datasets is limited by the O(n²) complexity associated with computation and storage of the kernel matrix, which is assumed to be available in most recent multiple kernel learning algorithms. We propose a method to learn simultaneou...
Asymptotic error bounds for kernel-based Nyström low-rank approximation matrices
Many kernel-based learning algorithms have a computational load that scales with the sample size n due to the column size of a full kernel Gram matrix K. This article considers the Nyström low-rank approximation. It uses a reduced kernel K̂, which is n×m, consisting of m columns (say columns i_1, i_2, ..., i_m) randomly drawn from K. This approximation takes the form K ≈ K̂U⁻¹K̂⊤, where U is the reduced ...
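The sketch below illustrates this Nyström construction with NumPy, forming the n×m reduced kernel K̂ from m randomly drawn columns and using a pseudo-inverse of the m×m block in place of U⁻¹; the full matrix K is built here only to measure the approximation error, which one would avoid in practice. The sizes n, m and the Gaussian kernel are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 100
X = rng.standard_normal((n, 3))

# Full Gram matrix, built here only to measure the error of the approximation
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

idx = rng.choice(n, size=m, replace=False)   # m column indices drawn at random
K_hat = K[:, idx]                            # n x m reduced kernel
U = K_hat[idx, :]                            # m x m block K[idx][:, idx]

K_nystrom = K_hat @ np.linalg.pinv(U) @ K_hat.T   # K ~ K_hat U^+ K_hat^T
print("relative Frobenius error:", np.linalg.norm(K - K_nystrom) / np.linalg.norm(K))
```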
Asymptotic error bounds for kernel-based Nyström low-rank approximation matrices
• Many kernel-based learning algorithms have a computational load that scales with the sample size.
• The Nyström low-rank approximation is designed for reducing the computation.
• We propose the spectrum decomposition condition with a theoretical justification.
• Asymptotic error bounds on eigenvalues and eigenvectors are derived.
• Numerical experiments are provided for covariance kernel and Wishart matrix.
Publication date: 2013