Part-of-Speech Tagging for Middle English through Alignment and Projection of Parallel Diachronic Texts
Abstract
We demonstrate an approach for inducing a tagger for historical languages based on existing resources for their modern varieties. Tags from a Present Day English Bible are projected to a Middle English Bible using multiple alignment approaches and are smoothed with a bigram tagger. Finally, we train a maximum entropy tagger on the output of the bigram tagger on the target text and test it on tagged Middle English text.
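The projection and smoothing steps described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names (`project_tags`, `train_bigram_tagger`, `tag`), the coarse tagset, the hand-written alignment, and the toy verse pair are all hypothetical, and the abstract does not specify the alignment methods or tagger configuration actually used. The final maximum entropy training step is not shown.

```python
from collections import Counter, defaultdict

def project_tags(source_tags, alignment, target_len):
    """Copy source tags onto aligned target tokens.

    alignment is a list of (source_index, target_index) pairs.
    Target tokens with no alignment keep the tag None.
    """
    projected = [None] * target_len
    for s, t in alignment:
        if projected[t] is None:  # keep the first projected tag
            projected[t] = source_tags[s]
    return projected

def train_bigram_tagger(sentences):
    """Count tags per (previous tag, word) and per word.

    sentences is a list of lists of (word, tag_or_None) pairs.
    Untagged tokens are skipped and break the bigram context.
    """
    bigram = defaultdict(Counter)
    unigram = defaultdict(Counter)
    for sent in sentences:
        prev = "<s>"
        for word, tag_ in sent:
            if tag_ is None:
                prev = "<unk>"
                continue
            bigram[(prev, word)][tag_] += 1
            unigram[word][tag_] += 1
            prev = tag_
    return bigram, unigram

def tag(words, model, default="NOUN"):
    """Tag greedily: bigram context first, then unigram, then a default."""
    bigram, unigram = model
    prev, out = "<s>", []
    for w in words:
        if (prev, w) in bigram:
            t = bigram[(prev, w)].most_common(1)[0][0]
        elif w in unigram:
            t = unigram[w].most_common(1)[0][0]
        else:
            t = default  # assumption: unseen words default to NOUN
        out.append(t)
        prev = t
    return out

# Toy data (illustrative only): a modern verse with assumed tags,
# a Middle English rendering, and a hand-written word alignment.
modern_tags = ["ADP", "DET", "NOUN", "PROPN", "VERB",
               "DET", "NOUN", "CONJ", "DET", "NOUN"]
middle = ["In", "the", "bigynnyng", "God", "made",
          "of", "nouyt", "heuene", "and", "erthe"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4),
             (6, 7), (7, 8), (9, 9)]

projected = project_tags(modern_tags, alignment, len(middle))
model = train_bigram_tagger([list(zip(middle, projected))])
smoothed = tag(middle, model)
```

In this sketch the two unaligned tokens ("of", "nouyt") receive no projected tag, and the tagger fills them with the default. A realistic setup would need a better strategy for such gaps, and the smoothed output would then serve as training data for a further tagger.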
Similar resources
Induction of Fine-Grained Part-of-Speech Taggers via Classifier Combination and Crosslingual Projection
This paper presents an original approach to part-of-speech tagging of fine-grained features (such as case, aspect, and adjective person/number) in languages such as English where these properties are generally not morphologically marked. The goals of such rich lexical tagging in English are to provide additional features for word alignment models in bilingual corpora (for statistical machine tr...
Improving English-Russian sentence alignment through POS tagging and Damerau-Levenshtein distance
This paper introduces an approach to improving English-Russian sentence alignment, based on POS tagging of source and target texts automatically aligned by HunAlign. The initial hypothesis is tested on a corpus of bitexts. Sequences of POS tags for each sentence (specifically, nouns, adjectives, verbs and pronouns) are processed as “words” and the Damerau-Levenshtein distance between them is computed...
Discovery of Discourse-Related Language Contrasts through Alignment Discrepancies in English-German Translation
In this paper, we analyse alignment discrepancies for discourse structures in English-German parallel data: sentence pairs in which discourse structures in target or source texts have no alignment in the corresponding parallel sentences. The discourse-related structures are defined in the form of linguistic patterns based on the information delivered by automatic part-of-speech and dependency an...
Using Comparable Collections of Historical Texts for Building a Diachronic Dictionary for Spelling Normalization
In this paper, we argue that comparable collections of historical written resources can help overcome typical challenges posed by heritage texts, enhancing spelling normalization, POS tagging and subsequent diachronic linguistic analyses. Thus, we present a comparable corpus of historical German recipes and show how such a comparable text collection together with the application of innovative ...
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of sentences, creating a dependency graph for each input sentence. Part-of-speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
Publication date: 2007