A note on measuring overlap

Abstract:

When measuring the overlap between two sets A and B (e.g. libraries or databases), one must calculate both the overlap O(A|B) of A with respect to B (i.e. the fraction of elements of B that are also in A) and the overlap O(B|A) of B with respect to A (i.e. the fraction of elements of A that are also in B). In theory this requires two samples. In this paper we explain that one sample can suffice to determine confidence intervals for both O(A|B) and O(B|A). The paper closes with an example, measuring the overlap between the secondary mathematics sources MathSciNet and Zentralblatt MATH, and with a remark on the estimation of the Jaccard index.
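The definitions in the abstract can be illustrated with a minimal Python sketch. It is not the paper's one-sample method, which the abstract does not spell out. The function names (`overlap`, `jaccard`, `jaccard_from_overlaps`, `wald_interval`) and the toy sets are assumptions made for illustration, and the interval shown is the textbook normal-approximation (Wald) interval for a single proportion, used here as a stand-in.

```python
import math
import random


def overlap(a, b):
    """O(A|B): fraction of the elements of B that are also in A."""
    if not b:
        raise ValueError("B must be non-empty")
    return len(a & b) / len(b)


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|, computed from the full sets."""
    return len(a & b) / len(a | b)


def jaccard_from_overlaps(o_ab, o_ba):
    """Jaccard index expressed through the two overlaps.

    Follows from |A ∪ B| = |A| + |B| - |A ∩ B|; both overlaps must be > 0.
    """
    return 1.0 / (1.0 / o_ab + 1.0 / o_ba - 1.0)


def wald_interval(hits, n, z=1.96):
    """Normal-approximation interval for a proportion (illustrative stand-in)."""
    p = hits / n
    half = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


# Hypothetical toy sets standing in for two databases.
A = set(range(0, 80))
B = set(range(60, 100))

o_ab = overlap(A, B)  # fraction of B that is also in A
o_ba = overlap(B, A)  # fraction of A that is also in B
print(o_ab, o_ba, jaccard(A, B), jaccard_from_overlaps(o_ab, o_ba))

# Estimate O(A|B) from a random sample of B instead of the full set.
rng = random.Random(0)
sample = rng.sample(sorted(B), 20)
hits = sum(1 for x in sample if x in A)
print(hits / len(sample), wald_interval(hits, len(sample)))
```

With these toy sets the exact overlaps are 0.5 and 0.25, and both routes to the Jaccard index give 0.2. The sampled estimate of O(A|B) only uses a sample drawn from B; estimating O(B|A) the same way would need a second sample drawn from A, which is the two-sample requirement the abstract refers to.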

Similar resources

A model of spoken word recognition is proposed. This model is particularly concerned with extraction of cues from the signal leading to a specification of a word in terms of bundles of distinctive features, which are assumed to be the building blocks of words. In the model proposed, auditory input is chunked into a set of successive time slices. It is assumed that the derivation of the underly...

Measuring development, growth and welfare is an important issue in normative and positive economics. The issue is more critical in developing economies where a good statistical indicator of income, living standard or poverty is crucial for decision-makers in corporate, government, non-government and international organizations in their for-profit or non-profit plans to promote business and trade, ...

Overlap syndrome, which is known as the coexistence of chronic obstructive pulmonary disease (COPD) and obstructive sleep apnea (OSA), was first defined by Flenley. Although it can refer to concomitant occurrence of any of the pulmonary diseases and OSA, overlap syndrome is commonly considered as the coexistence of OSA and COPD. This disease has unique adverse health consequences distinct from ...

Hyperspectral sensors provide a large number of spectral bands. This massive and complex data structure of hyperspectral images presents a challenge to traditional data processing techniques. Therefore, reducing the dimensionality of hyperspectral images without losing important information is a very important issue for the remote sensing community. We propose to use overlap-based feature weigh...

Niche overlap is increasingly used as a way of measuring the intensity of interorganizational competition. This paper examines and compares various distance and cosine measures of niche overlap. The analysis in this paper shows that Euclidean distance applied to the raw data as well as to transformed data is not an appropriate measure of niche overlap. An alternative measure from the cosine fam...
