Efficient Training for Positive Unlabeled Learning

Author

  • Emanuele Sansone
Abstract

Positive unlabeled learning (PU learning) refers to the task of learning a binary classifier from only positive and unlabeled data [1]. This problem arises in various practical applications, such as multimedia/information retrieval [2], where the goal is to find samples in an unlabeled data set that are similar to the samples provided by a user, as well as in outlier detection [3] and semi-supervised novelty detection [4]. The works in [5] and [6] have recently shown that PU learning can be formulated as a risk minimization problem. In particular, expressing the risk with a convex loss function, such as the double Hinge loss, makes it possible to achieve better classification performance than that obtained with other loss functions. Nevertheless, those works focused only on analysing the generalization performance obtained with different loss functions, without considering the efficiency of training. In that regard, we propose a novel algorithm that efficiently optimizes the risk minimization problem stated in [6]. In particular, we show that the storage complexity of our approach scales only linearly with the number of training samples. Concerning the training time, we show experimentally on different benchmark data sets that our algorithm exhibits the same quadratic behaviour as existing optimization algorithms implemented in highly efficient libraries. The rest of the paper is organized as follows. In Section 2 we review the formulation of the PU learning problem and state, for the first time, a Representer theorem for it. In Section 3 we derive the convex formulation of the problem using the double Hinge loss function. In Section 4 we propose an algorithm to solve the optimization problem, and we conclude in the last section with the experimental evaluation.
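To make the risk-minimization view concrete, the following is a minimal sketch (not the paper's algorithm) of an empirical PU risk built from the double hinge loss, in the style of the unbiased estimators of [5] and [6]. The function names, the use of raw score arrays, and the assumption that the class prior `prior` is known are all illustrative choices, not details taken from the paper.

```python
import numpy as np

def double_hinge(z):
    """Double hinge loss: max(-z, max(0, (1 - z) / 2))."""
    return np.maximum(-z, np.maximum(0.0, 0.5 * (1.0 - z)))

def pu_risk(scores_pos, scores_unl, prior):
    """Empirical PU risk estimate from classifier scores.

    Approximates pi * E_p[l(f(x))] + E_u[l(-f(x))] - pi * E_p[l(-f(x))],
    where the last term corrects for positives hidden in the unlabeled set.
    Note the estimate can be negative on finite samples.
    """
    r_pos = double_hinge(scores_pos).mean()       # positives as class +1
    r_unl_neg = double_hinge(-scores_unl).mean()  # unlabeled treated as -1
    r_pos_neg = double_hinge(-scores_pos).mean()  # correction term
    return prior * r_pos + r_unl_neg - prior * r_pos_neg
```

Because the double hinge is convex in the score, the resulting risk is convex in the parameters of a linear (or kernel-expanded) model, which is what enables the convex formulation derived in Section 3.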


Similar resources

Text Classification and Co-training from Positive and Unlabeled Examples

In the general framework of semi-supervised learning from labeled and unlabeled data, we consider the specific problem of learning from a pool of positive data, without any negative data but with the help of unlabeled data. We study a naive Bayes algorithm PNB from positive and unlabeled examples. Then, we consider the case where the number of positive examples is quite small, assuming that the...


A Novel Reliable Negative Method Based on Clustering for Learning from Positive and Unlabeled Examples

This paper investigates a new approach for training text classifiers when only a small set of positive examples is available together with a large set of unlabeled examples. The key feature of this problem is that there are no negative examples for learning. Recently, a few techniques have been reported that are based on building a classifier in two steps. In this paper, we introduce a novel method ...


Discriminative Learning of Selectional Preference from Unlabeled Text

We present a discriminative method for learning selectional preferences from unlabeled text. Positive examples are taken from observed predicate-argument pairs, while negatives are constructed from unobserved combinations. We train a Support Vector Machine classifier to distinguish the positive from the negative instances. We show how to partition the examples for efficient training with 57 tho...


Estimate Unlabeled-Data-Distribution for Semi-supervised PU Learning

Traditional supervised classifiers use only labeled data (features/label pairs) as the training set, while the unlabeled data is used as the testing set. In practice, it is often the case that the labeled data is hard to obtain and the unlabeled data contains the instances that belong to the predefined class beyond the labeled data categories. This problem has been widely studied in recent year...


Positive Unlabeled Learning for Data Stream Classification

Learning from positive and unlabeled examples (PU learning) has been investigated in recent years as an alternative learning model for dealing with situations where negative training examples are not available. It has many real world applications, but it has yet to be applied in the data stream environment where it is highly possible that only a small set of positive data and no negative data i...



Journal title:
  • CoRR

Volume abs/1608.06807  Issue

Pages  -

Publication date 2016