What Can We Get From 1000 Tokens? A Case Study of Multilingual POS Tagging For Resource-Poor Languages
نویسندگان
چکیده
In this paper we address the problem of multilingual part-of-speech tagging for resource-poor languages. We use parallel data to transfer part-of-speech information from resource-rich to resourcepoor languages. Additionally, we use a small amount of annotated data to learn to “correct” errors from projected approach such as tagset mismatch between languages, achieving state-of-the-art performance (91.3%) across 8 languages. Our approach is based on modest data requirements, and uses minimum divergence classification. For situations where no universal tagset mapping is available, we propose an alternate method, resulting in state-of-the-art 85.6% accuracy on the resource-poor language Malagasy.
منابع مشابه
سیستم برچسب گذاری اجزای واژگانی کلام در زبان فارسی
Abstract: Part-Of-Speech (POS) tagging is essential work for many models and methods in other areas in natural language processing such as machine translation, spell checker, text-to-speech, automatic speech recognition, etc. So far, high accurate POS taggers have been created in many languages. In this paper, we focus on POS tagging in the Persian language. Because of problems in Persian POS t...
متن کاملIncreasing the Quality and Quantity of Source Language Data for Unsupervised Cross-Lingual POS Tagging
Bilingual corpora offer a promising bridge between resource-rich and resource-poor languages, enabling the development of natural language processing systems for the latter. English is often selected as the resource-rich language, but another choice might give better performance. In this paper, we consider the task of unsupervised cross-lingual POS tagging, and construct a model that predicts t...
متن کاملAn improved joint model: POS tagging and dependency parsing
Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
متن کاملBuilding Powerful Dependency Parsers for Resource-Poor Languages
In this paper, we present an approach to building dependency parsers for the resource-poor languages without any annotated resources on the target side. Compared with the previous studies, our approach requires less human annotated resources. In our approach, we first train a POS tagger and a parser on the source treebank. Then, they are used to parse the source sentences in bilingual data. We ...
متن کاملTagMiner: A Semisupervised Associative POS Tagger Effective for Resource Poor Languages
We present here, TagMiner, a data mining approach for part-of-speech (POS) tagging, an important Natural language processing (NLP) classification task. It is a semi-supervised associative classification method for POS tagging. Existing methods for building POS taggers require extensive domain and linguistic knowledge and resources. Our method uses combination of a small POS tagged corpus and a ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014