Observational Initialization of Type-Supervised Taggers

Authors

Hui Zhang

John DeNero

Abstract

Recent work has sparked new interest in type-supervised part-of-speech tagging, a data setting in which no labeled sentences are available, but the set of allowed tags is known for each word type. This paper describes observational initialization, a novel technique for initializing EM when training a type-supervised HMM tagger. Our initializer allocates probability mass to unambiguous transitions in an unlabeled corpus, generating token-level observations from type-level supervision. Experimentally, observational initialization gives state-of-the-art type-supervised tagging accuracy, providing an error reduction of 56% over uniform initialization on the Penn English Treebank.
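The core idea in the abstract, allocating probability mass to unambiguous transitions, can be illustrated with a minimal sketch. This is one possible reading and not the authors' implementation: the function name `observational_transition_init`, the additive `smoothing` parameter, and the restriction to transition parameters are all assumptions made for illustration. A transition is treated here as unambiguous when both adjacent words have exactly one allowed tag in the tag dictionary.

```python
def observational_transition_init(sentences, tag_dict, tags, smoothing=1.0):
    """Build initial HMM transition probabilities from unambiguous bigrams.

    Hypothetical sketch: `sentences` is a list of token lists, `tag_dict`
    maps each word type to its set of allowed tags, and `tags` is the full
    tag inventory. `smoothing` is an assumed additive constant that keeps
    every transition possible before EM starts.
    """
    # Start every transition count at the smoothing value.
    counts = {s: {t: smoothing for t in tags} for s in tags}

    for sentence in sentences:
        for prev_word, next_word in zip(sentence, sentence[1:]):
            prev_tags = tag_dict.get(prev_word, set())
            next_tags = tag_dict.get(next_word, set())
            # Count only bigrams where both words allow exactly one tag,
            # so the transition is observed rather than guessed.
            if len(prev_tags) == 1 and len(next_tags) == 1:
                (s,) = prev_tags
                (t,) = next_tags
                counts[s][t] += 1.0

    # Normalize each row into a conditional distribution P(next | prev).
    probs = {}
    for s in tags:
        total = sum(counts[s].values())
        probs[s] = {t: counts[s][t] / total for t in tags}
    return probs


# Toy usage: "barks" is ambiguous, so the bigram "dog barks" is skipped.
toy_tag_dict = {"the": {"DET"}, "dog": {"NOUN"}, "barks": {"VERB", "NOUN"}}
toy_sentences = [["the", "dog", "barks"], ["the", "dog"]]
toy_tags = ["DET", "NOUN", "VERB"]
toy_probs = observational_transition_init(toy_sentences, toy_tag_dict, toy_tags)
```

In this toy run, rows for tags that never start an unambiguous bigram stay uniform, which matches uniform initialization for those rows. The resulting table would then serve as the starting point for EM rather than as a final model.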

Similar resources

Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries

Past work on learning part-of-speech taggers from tag dictionaries and raw data has reported good results, but the assumptions made about those dictionaries are often unrealistic: due to historical precedents, they assume access to information about labels in the raw and test sets. Here, we demonstrate ways to learn hidden Markov model taggers from incomplete tag dictionaries. Taking the MINGRE...

Simple Semi-Supervised Training of Part-Of-Speech Taggers

Most attempts to train part-of-speech taggers on a mixture of labeled and unlabeled data have failed. In this work, stacked learning is used to reduce tagging to a classification task. This simplifies semi-supervised training considerably. Our preferred semi-supervised method combines tri-training (Li and Zhou, 2005) and disagreement-based co-training. On the Wall Street Journal, we obtain an erro...

Unsupervised Part-of-Speech Induction

Part-of-Speech (POS) tagging is an old and fundamental task in natural language processing. While supervised POS taggers have shown promising accuracy, it is not always feasible to use supervised methods due to a lack of labeled data. In this project, we attempt to induce POS tags without supervision by iteratively looking for a recurring pattern of words through a hierarchical agglomerative clusterin...

Cse 250b Project Assignment 4

The goal of this project is to implement Semi-Supervised Recursive Autoencoders (RAE) with random word initialization and reproduce the result of Socher et al. [1] using the Movie Reviews dataset. With a correct understanding of gradient meaning in semi-supervised RAE, our implementation achieves 76.9% accuracy with only 0.1% difference to the authors’ result. The experiments also show that...

Clustering-based initialization of Learning Classifier Systems - Effects on model performance, readability and induction time

The present paper investigates whether an “informed” initialization process can help supervised LCS algorithms evolve rulesets with better characteristics, including greater predictive accuracy, shorter training times, and/or more compact knowledge representations. Inspired by previous research suggesting that the initialization phase of evolutionary algorithms may have a considerable impact ...

Publication date: 2014
