Observational Initialization of Type-Supervised Taggers
نویسندگان
چکیده
Recent work has sparked new interest in type-supervised part-of-speech tagging, a data setting in which no labeled sentences are available, but the set of allowed tags is known for each word type. This paper describes observational initialization, a novel technique for initializing EM when training a type-supervised HMM tagger. Our initializer allocates probability mass to unambiguous transitions in an unlabeled corpus, generating token-level observations from type-level supervision. Experimentally, observational initialization gives state-of-the-art type-supervised tagging accuracy, providing an error reduction of 56% over uniform initialization on the Penn English Treebank.
منابع مشابه
Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
Past work on learning part-of-speech taggers from tag dictionaries and raw data has reported good results, but the assumptions made about those dictionaries are often unrealistic: due to historical precedents, they assume access to information about labels in the raw and test sets. Here, we demonstrate ways to learn hidden Markov model taggers from incomplete tag dictionaries. Taking the MINGRE...
متن کاملSimple Semi-Supervised Training of Part-Of-Speech Taggers
Most attempts to train part-of-speech taggers on a mixture of labeled and unlabeled data have failed. In this work stacked learning is used to reduce tagging to a classification task. This simplifies semisupervised training considerably. Our prefered semi-supervised method combines tri-training (Li and Zhou, 2005) and disagreement-based co-training. On the Wall Street Journal, we obtain an erro...
متن کاملUnsupervised Part-of-Speech Induction
Part-of-Speech (POS) tagging is an old and fundamental task in natural language processing. While supervised POS taggers have shown promising accuracy, it is not always feasible to use supervised methods due to lack of labeled data. In this project, we attempt to unsurprisingly induce POS tags by iteratively looking for a recurring pattern of words through a hierarchical agglomerative clusterin...
متن کاملCse 250b Project Assignment 4
The goal of this project is to implement the Semi-Supervised Recursive Autoencoders (RAE) with random word initialization and reproduce the result of Socher et. al [1] using the Movie Reviews dataset. With a correct understanding of gradient meaning in semi-supervised RAE, our implementation achieves 76.9% accuracy with only 0.1% difference to the authors’ result. The experiments also show that...
متن کاملClustering-based initialization of Learning Classifier Systems - Effects on model performance, readability and induction time
The present paper investigates whether an ‘‘informed’’ initialization process can help supervised LCS algorithms evolve rulesets with better characteristics, including greater predictive accuracy, shorter training times, and/or more compact knowledge representations. Inspired by previous research suggesting that the initialization phase of evolutionary algorithms may have a considerable impact ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014