Margin-based Semi-supervised Learning

نویسندگان

  • Junhui Wang
  • Xiaotong Shen
  • Tommi Jaakkola
چکیده

In classification, semi-supervised learning occurs when a large amount of unlabeled data is available with only a small number of labeled data. In such a situation, how to enhance predictability of classification through unlabeled data is the focus. In this article, we introduce a novel large margin semi-supervised learning methodology, utilizing grouping information from unlabeled data, together with the concept of margins, in a form of regularization controlling the interplay between labeled and unlabeled data. Based on this methodology, we develop two specific machines involving support vector machines and ψ-learning, denoted as SSVM and SPSI, through difference convex programming. In addition, we estimate the generalization error using both labeled and unlabeled data, for tuning regularizers. Finally, our theoretical and numerical analyses indicate that the proposed methodology achieves the desired objective of delivering high performance in generalization, particularly against some strong performers.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Nonparametric Maximum Margin Similarity for Semi-Supervised Learning

1. Nonparametric Label Propagation (LP) has been proven to be effective for semi-supervised learning problems, and it predicts the labels for unlabeled data by a harmonic solution of an energy minimization problem which encourages local smoothness of the labels in accordance with the similarity graph. 2. On the other hand, the success of LP algorithms highly depends on the underlying similarity...

متن کامل

Graph Quality Judgement: A Large Margin Expedition

Graph as a common structure of machine learning, has played an important role in many learning tasks such as graph-based semi-supervised learning (GSSL). The quality of graph, however, seriously affects the performance of GSSL; moreover, an inappropriate graph may even cause deteriorated performance, that is, GSSL using unlabeled data may be outperformed by direct supervised learning with only ...

متن کامل

Semi-supervised Multi-label Classification - A Simultaneous Large-Margin, Subspace Learning Approach

Labeled data is often sparse in common learning scenarios, either because it is too time consuming or too expensive to obtain, while unlabeled data is almost always plentiful. This asymmetry is exacerbated in multi-label learning, where the labeling process is more complex than in the single label case. Although it is important to consider semisupervised methods for multi-label learning, as it ...

متن کامل

Max-Margin Deep Generative Models for (Semi-)Supervised Learning

Deep generative models (DGMs) are effective on learning multilayered representations of complex data and performing inference of input data by exploring the generative ability. However, it is relatively insufficient to empower the discriminative ability of DGMs on making accurate predictions. This paper presents max-margin deep generative models (mmDGMs) and a class-conditional variant (mmDCGMs...

متن کامل

Regularized Boost for Semi-Supervised Learning

Semi-supervised inductive learning concerns how to learn a decision rule from a data set containing both labeled and unlabeled data. Several boosting algorithms have been extended to semi-supervised learning with various strategies. To our knowledge, however, none of them takes local smoothness constraints among data into account during ensemble learning. In this paper, we introduce a local smo...

متن کامل

The Pessimistic Limits of Margin-based Losses in Semi-supervised Learning

We show that for linear classifiers defined by convex marginbased surrogate losses that are monotonically decreasing, it is impossible to construct any semi-supervised approach that is able to guarantee an improvement over the supervised classifier measured by this surrogate loss. For non-monotonically decreasing loss functions, we demonstrate safe improvements are possible.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009