Linguistically-Motivated Grammar Extraction, Generalization and Adaptation

2005

Yu-Ming Hsieh, Duen-Chi Yang, Keh-Jiann Chen

In order to obtain a high-precision, high-coverage grammar, we proposed a model to measure grammar coverage and designed a PCFG parser to measure the efficiency of the grammar. To generalize grammars, a grammar binarization method was proposed to increase the coverage of a probabilistic context-free grammar. Meanwhile, linguistically-motivated feature constraints were added into grammar rul...

ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions

2016

Olga Uryupina, Ron Artstein, Antonella Bristot, Federica Cavicchio, Kepa Joseba Rodríguez, Massimo Poesio

This paper presents a second release of the ARRAU dataset: a multi-domain corpus with thorough, linguistically motivated annotation of anaphora and related phenomena. Building upon the first release almost a decade ago, a considerable effort has been invested in improving the data both quantitatively and qualitatively. Thus, we have doubled the corpus size, expanded the selection of covered phen...

Exploring linguistically-rich patterns for question generation

2011

Sérgio Curto, Ana Cristina Mendes, Luísa Coheur

Linguistic patterns reflect the regularities of Natural Language and their applicability is acknowledged in several Natural Language Processing tasks. Particularly, in the task of Question Generation, many systems depend on patterns to generate questions from text. The approach we follow relies on patterns that convey lexical, syntactic and semantic information, automatically learned from large...

Must sound change be linguistically motivated?

2007

Robert Blust

A number of well-documented sound changes in Austronesian languages do not appear to be either phonetically or phonologically motivated. Although it is possible that some of these changes involved intermediate steps for which we have no direct documentation, the assumption that this was always the case appears arbitrary, and is in violation of Occam’s Razor. These data thus raise the question w...

Learning linguistically valid pronunciations from acoustic data

2003

Françoise Beaufays, Ananth Sankar, Shaun Williams, Mitch Weintraub

We describe an algorithm to learn word pronunciations from acoustic data. The algorithm jointly optimizes the pronunciation of a word using (a) the acoustic match of this pronunciation to the observed data, and (b) how “linguistically reasonable” the pronunciation is. Variations of word pronunciations in the recognition dictionary (which was created by linguists) are used to train a model of w...

Linguistically Motivated Unsupervised Segmentation for Machine Translation

2010

Mark Fishel, Harri Kirik

In this paper, we use statistical machine translation and morphology information from two different morphological analyzers in an attempt to improve translation quality through linguistically motivated segmentation. The morphological analyzers we use are the unsupervised Morfessor morpheme segmentation and analyzer toolkit and the rule-based morphological analyzer T3. Our translations are done using the Mos...

Annotation of Sign and Gesture Cross-linguistically

2017

Inge Zwitserlood, Asli Özyürek, Pamela Perniss

This paper discusses the construction of a cross-linguistic, bimodal corpus containing three modes of expression: expressions from two sign languages; speech and gestural expressions in two spoken languages; and pantomimic expressions by users of two spoken languages who are asked to convey information without speaking. We discuss some problems and tentative solutions for the annotation of u...

Constructing Linguistically Motivated Structures from Statistical Grammars

2011

Ali Basirat, Heshaam Faili

This paper discusses two Hidden Markov Models (HMMs) for linking the linguistically motivated XTAG grammar and the automatically extracted LTAG used by the MICA parser. The former is a detailed LTAG enriched with feature structures, while the latter is a very large LTAG that, owing to its statistical nature, is well suited to statistical approaches. Lack of an efficient parser and sparse...

A Linguistically-Based Segmentation of Complex Sentences

2007

Vladislav Kubon, Markéta Lopatková, Martin Plátek, Patrice Pognan

The paper describes a method of dividing complex sentences into segments, i.e., easily detectable and linguistically motivated units, which may provide a basis for further processing of complex sentences. The method has been developed for Czech as a representative of languages with a relatively high degree of word-order freedom. The paper introduces important terms, describes a segmentation chart, ...

Linguistically debatable or just plain wrong?

2014

Barbara Plank, Dirk Hovy, Anders Søgaard

In linguistic annotation projects, we typically develop annotation guidelines to minimize disagreement. However, in this position paper we question whether we should actually limit the disagreements between annotators, rather than embrace them. We present an empirical analysis of part-of-speech annotated data sets that suggests that disagreements are systematic across domains and to a certain...