Test Set Bounds for Relational Data that vary with Strength of Dependence Test Set Bounds for Relational Data that Vary with Strength of Dependence

نویسندگان

  • Amit Dhurandhar
  • Alin Dobra
چکیده

A large portion of the data that is collected in various application domains such as online social networking, finance, biomedicine, etc. is relational in nature. A subfield of Machine Learning namely; Statistical Relational Learning (SRL) is concerned with performing statistical inference on relational data. A defining property of relational data that separates it from independently and identically distributed data (i.i.d.) is the existence of correlations between individual datapoints. A major portion of the theory developed in machine learning assumes the data is i.i.d. In this paper we develop theory for the relational setting. In particular, we derive distribution-free bounds on the generalization error of a classifier for the relational setting, where the class of data generation models we consider are inspired from the type joint distributions that are represented by relational classification models developed by the SRL community. A key aspect of the bound we derive is that the tightness of the bound is a function of the strength of dependence between related datapoints, with the bound reducing to the standard Hoeffding’s or McDiarmid’s inequality when there is no dependence. To the best of our knowledge this is the first bound for relational data whose tightness varies with the strength of dependence. Moreover, the bound provides insight in the computation of effective sample size which is an important notion introduced by Jensen and Neville (2002).

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Test Set Bounds for Relational Data that vary with Strength of Dependence

A large portion of the data that is collected in various application domains such as online social networking, finance, biomedicine, etc. is relational in nature. A subfield of Machine Learning namely; Statistical Relational Learning (SRL) is concerned with performing statistical inference on relational data. A defining property of relational data that separates it from independently and identi...

متن کامل

Auto-correlation Dependent Bounds for Relational Data

A large portion of the data that is collected in various application domains such as online social networking, finance, biomedicine, etc. is relational in nature. A subfield of Machine Learning namely; Statistical Relational Learning (SRL) is concerned with performing statistical inference on relational data. A defining property of relational data that separates it from independently and identi...

متن کامل

Efficiency Evaluation and Ranking DMUs in the Presence of Interval Data with Stochastic Bounds

On account of the existence of uncertainty, DEA occasionally faces the situation of imprecise data, especially when a set of DMUs include missing data, ordinal data, interval data, stochastic data, or fuzzy data. Therefore, how to evaluate the efficiency of a set of DMUs in interval environments is a problem worth studying. In this paper, we discussed the new method for evaluation and ranking i...

متن کامل

Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows

Todays, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that frequently appear in data set and mark them as frequent patterns to enable users to make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection, work either on uncertai...

متن کامل

Prediction of Pervious Concrete Permeability and Compressive Strength Using Artificial Neural Networks

Pervious concrete is a concrete mixture prepared from cement, aggregates, water, little or no fines, and in some cases admixtures. The hydrological property of pervious concrete is the primary reason for its reappearance in construction. Much research has been conducted on plain concrete, but little attention has been paid to porous concrete, particularly to the analytical prediction modeling o...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009