parts of speech tagging

Towards Robust Cross-Domain Domain Adaptation for Part-of-Speech Tagging

2013

Tobias Schnabel, Hinrich Schütze

We investigate the robustness of domain adaptation (DA) representations and methods across target domains using part-of-speech (POS) tagging as a case study. We find that there is no single representation and method that works equally well for all target domains. In particular, there are large differences between target domains that are more similar to the source domain and those that are less s...


Part-of-speech Tagset and Corpus Development for Igbo, an African Language

2014

Ikechukwu E. Onyenwe, Chinedu Uchechukwu, Mark Hepple

This project aims to develop linguistic resources to support computational NLP research on the Igbo language. The starting point for this project is the development of a new part-of-speech tagging scheme based on the EAGLES tagset guidelines, adapted to incorporate additional language internal features. The tags are currently being used in a part-of-speech annotation task for the development of...


Improving Part-of-Speech Tagging of Historical Text by First Translating to Modern Text

2016

Erik Tjong Kim Sang

We explore the task of automatically assigning syntactic tags (known as part-of-speech tags) like Noun and Verb to words in seventeenth-century Dutch text. Tools exist for performing this task for modern texts but they perform poorly on historical texts because of language changes. We test several methods for translating the words in the historical text to modern equivalents before applying the...


Optimal Control of Time-Varying Delay Systems Using Wavelet Theory

Thesis: Ministry of Science, Research and Technology - Damghan University of Basic Sciences, 1389

سلیمان اسدی میانایی, اکبر هاشمش برزآباذی, نرگس تولایی

In this thesis, using concepts of wavelet theory, some methods for solving optimal control problems (OCPs) governed by time-delay systems are investigated. This thesis contains two parts. First, a method for solving OCPs in time-delay systems using linear Legendre multiwavelets is presented. The main advantage of the meth...


Grammar-based tools for the creation of tagging resources for an unresourced language: the case of Northern Sotho

2006

Ulrich Heid, Elsabé Taljard, Danie J. Prinsloo

We describe an architecture for the parallel construction of a tagger lexicon and an annotated reference corpus for the part-of-speech tagging of Northern Sotho, a Bantu language of South Africa, for which no tagged resources have been available so far. Our tools make use of grammatical properties (morphological and syntactic) of the language. We use symbolic pretagging, followed by stochastic t...


Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance

2011

Shay B. Cohen, Dipanjan Das, Noah A. Smith

We describe a method for prediction of linguistic structure in a language for which only unlabeled data is available, using annotated data from a set of one or more helper languages. Our approach is based on a model that locally mixes between supervised models from the helper languages. Parallel data is not used, allowing the technique to be applied even in domains where human-translated texts ...


Distributional Part-of-Speech Tagging

1995

Hinrich Schütze

This paper presents an algorithm for tagging words whose part-of-speech properties are unknown. Unlike previous work, the algorithm categorizes word tokens in context instead of word types. The algorithm is evaluated on the Brown Corpus.


Variation in noun and pronoun frequencies in a sociohistorical corpus of English

Journal: LLC 2011

Tanja Säily, Terttu Nevalainen, Harri Siirtola

Many corpus linguists make the tacit assumption that part-of-speech frequencies remain constant during the period of observation. In this article, we will consider two related issues: (1) the reliability of part-of-speech tagging in a diachronic corpus, and (2) shifts in tag ratios over time. The purpose is both to serve the users of the corpus by making them aware of potential problems, and to...


Adapting a Parser to Clinical Text by Simple Pre-processing Rules

2013

Maria Skeppstedt

Sentence types typical to Swedish clinical text were extracted by comparing sentence part-of-speech tag sequences in clinical and in standard Swedish text. Parsings by a syntactic dependency parser, trained on standard Swedish, were manually analysed for the 33 sentence types most typical to clinical text. This analysis resulted in the identification of eight error types, and for two of these e...


Opinion Sentences Extraction and Polarity Classification Using Automatically Generated Templates

2010

Wan-Chi Huang, Meng-Chun Lin, Shih-Hung Wu

The paper reports the approach of the CYUT system in the NTCIR-8 MOAT subtask. We submitted the results of opinion judgment and polarity judgment in Traditional Chinese. Our study focused on automatically generated templates as the only features of the classifier. The templates, which combine words with part-of-speech or named-entity (POS/NE) tags, are acquired from the training set. Experimental results show that...
