Semisupervised Learning for Computational Linguistics
Authors

Abstract
Semi-supervised learning is by no means an unfamiliar concept to natural language processing researchers. Labeled data has been used to improve unsupervised parameter estimation procedures such as the EM algorithm and its variants since the beginning of the statistical revolution in NLP (e.g., Pereira and Schabes 1992). Unlabeled data has also been used to improve supervised learning procedures, the most notable examples being the successful applications of self-training and co-training to word sense disambiguation (Yarowsky 1995) and named entity classification (Collins and Singer 1999). Despite its increasing importance, semi-supervised learning is not a topic that is typically discussed in introductory machine learning texts (e.g., Mitchell 1997; Alpaydin 2004) or NLP texts (e.g., Manning and Schütze 1999; Jurafsky and Martin 2000). Consequently, to learn about semi-supervised learning research, one has to consult the machine-learning literature. This can be a daunting task for NLP researchers who have little background in machine learning.

Steven Abney’s book Semisupervised Learning for Computational Linguistics is targeted precisely at such researchers, aiming to provide them with a “broad and accessible presentation” of topics in semi-supervised learning. According to the preamble, the reader is assumed to have taken only an introductory course in NLP “that includes statistical methods — concretely the material contained in Jurafsky and Martin (2000) and Manning and Schütze (1999).” Nonetheless, I agree with the author that any NLP researcher who has a solid background in machine learning is ready to “tackle the primary literature on semisupervised learning, and will probably not find this book particularly useful” (page 11). As the author promises, the book is self-contained and quite accessible to those who have little background in machine learning. In particular, of the 12 chapters in the book, three are devoted to preparatory material: a brief introduction to machine learning; basic unconstrained and constrained optimization techniques (e.g., gradient descent and the method of Lagrange multipliers); and relevant linear-algebra concepts (e.g., eigenvalues, eigenvectors, matrix and vector norms, diagonalization). The remaining chapters focus roughly on six types of semi-supervised learning methods:
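As a concrete illustration of the self-training idea mentioned in the abstract (the bootstrapping scheme behind Yarowsky 1995), here is a minimal sketch of the generic loop. It is an illustrative reconstruction, not code from the book; the classifier choice, confidence threshold, and variable names are assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_train(X_lab, y_lab, X_unlab, threshold=0.95, max_rounds=10):
        """Generic self-training: repeatedly train on the labeled set and
        absorb the unlabeled examples the model labels most confidently."""
        X_lab, y_lab, pool = X_lab.copy(), y_lab.copy(), X_unlab.copy()
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        for _ in range(max_rounds):
            if len(pool) == 0:
                break
            probs = clf.predict_proba(pool)
            confident = probs.max(axis=1) >= threshold
            if not confident.any():
                break  # nothing is confident enough; stop growing the set
            X_lab = np.vstack([X_lab, pool[confident]])
            y_lab = np.concatenate(
                [y_lab, clf.classes_[probs[confident].argmax(axis=1)]])
            pool = pool[~confident]
            clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
        return clf

Yarowsky-style bootstrapping and co-training can be read as elaborations of this loop that constrain which pseudo-labels are trusted, via decision-list rules or a second feature view, respectively.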
Similar Resources
Book Reviews: Semisupervised Learning for Computational Linguistics by Steven Abney
Semi-supervised learning is by no means an unfamiliar concept to natural language processing researchers. Labeled data has been used to improve unsupervised parameter estimation procedures such as the EM algorithm and its variants since the beginning of the statistical revolution in NLP (e.g., Pereira and Schabes 1992). Unlabeled data has also been used to improve supervised learning procedures...
Analysis of Semi-Supervised Learning with the Yarowsky Algorithm
The Yarowsky algorithm is a rule-based semisupervised learning algorithm that has been successfully applied to some problems in computational linguistics. The algorithm was not mathematically well understood until Abney (2004), which analyzed some specific variants of the algorithm and also proposed some new algorithms for bootstrapping. In this paper, we extend Abney’s work and show that some ...
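For readers unfamiliar with the algorithm discussed in this excerpt, the following is a schematic sketch of a Yarowsky-style decision-list bootstrapping loop. It is a simplified illustration; the smoothing constant, rule-score threshold, and seed-rule format are assumptions, not the exact variants analyzed by Abney (2004).

    from collections import defaultdict
    from math import log

    def train_decision_list(examples, labels, alpha=0.1):
        """Score each (feature, label) rule by a smoothed log-likelihood ratio."""
        counts = defaultdict(lambda: defaultdict(float))
        for feats, y in zip(examples, labels):
            for f in feats:
                counts[f][y] += 1.0
        rules = []
        for f, by_label in counts.items():
            total = sum(by_label.values())
            for y, c in by_label.items():
                rules.append((log((c + alpha) / (total - c + alpha)), f, y))
        rules.sort(key=lambda r: -r[0])  # most reliable rules first
        return rules

    def apply_rules(rules, feats, threshold=1.0):
        for score, f, y in rules:
            if score < threshold:
                break
            if f in feats:
                return y
        return None  # abstain if no sufficiently reliable rule fires

    def yarowsky(examples, seed_rules, rounds=5):
        """Relabel the corpus with the current rules, retrain, and repeat.
        seed_rules is a small hand-built list of (score, feature, label) tuples."""
        rules = seed_rules
        for _ in range(rounds):
            labeled = [(x, apply_rules(rules, x)) for x in examples]
            labeled = [(x, y) for x, y in labeled if y is not None]
            if not labeled:
                break
            rules = train_decision_list(*zip(*labeled))
        return rules

Abney (2004) studies which variants of this relabel-and-retrain step can be shown to optimize a well-defined objective.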
Web Corpus Construction
The Web is the main source of data in modern computational linguistics. Other volumes in the same series, for example, Introductions to Opinion Mining (Liu 2012) and Semisupervised Machine Learning (Søgaard 2013), start their problem statements by referring to data from the Web. This volume starts its own introduction by praising Web corpora for their size, ease of construction, and availabilit...
Expanding textual entailment corpora from Wikipedia using co-training
In this paper we propose a novel method to automatically extract large textual entailment datasets homogeneous to existing ones. The key idea is the combination of two intuitions: (1) the use of Wikipedia to extract a large set of textual entailment pairs; (2) the application of semisupervised machine learning methods to make the extracted dataset homogeneous to the existing ones. We report emp...
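As a rough illustration of the co-training component described in this excerpt, a minimal two-view sketch follows. The feature views, base classifier, and growth schedule are hypothetical; the paper’s actual entailment features and settings are not reproduced here.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    def co_train(X1, X2, y, X1_pool, X2_pool, rounds=5, per_round=10):
        """Two classifiers, one per feature view, teach each other: each round,
        every view's most confident pool predictions are added to the shared
        labeled set and both classifiers are retrained."""
        pool = np.arange(len(X1_pool))
        for _ in range(rounds):
            c1 = GaussianNB().fit(X1, y)
            c2 = GaussianNB().fit(X2, y)
            if len(pool) == 0:
                break
            proposals = {}
            for clf, Xp in ((c1, X1_pool), (c2, X2_pool)):
                probs = clf.predict_proba(Xp[pool])
                top = np.argsort(probs.max(axis=1))[-per_round:]
                labels = clf.classes_[probs[top].argmax(axis=1)]
                for i, lab in zip(pool[top], labels):
                    proposals.setdefault(int(i), lab)  # first proposal wins
            new_idx = np.array(sorted(proposals))
            pseudo = np.array([proposals[i] for i in new_idx])
            X1 = np.vstack([X1, X1_pool[new_idx]])
            X2 = np.vstack([X2, X2_pool[new_idx]])
            y = np.concatenate([y, pseudo])
            pool = np.setdiff1d(pool, new_idx)
        return c1, c2

In Blum and Mitchell’s original formulation the two views are assumed to be conditionally independent given the label; this excerpt does not detail the views chosen for the entailment task.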
Robust Semi-Supervised Learning through Label Aggregation
Semi-supervised learning is proposed to exploit both labeled and unlabeled data. However, as the scale of data in real-world applications increases significantly, conventional semisupervised algorithms usually lead to massive computational cost and cannot be applied to large-scale datasets. In addition, label noise is usually present in practical applications due to human annotation, which ...
Publication date: 2009