A filter for syntactically incomparable parallel sentences
Related resources
Interpreting Syntactically Ill-Formed Sentences
The paper discusses three different kinds of syntactic ill-formedness: ellipsis, conjunctions, and actual syntactic errors. It is shown how a new grammatical formalism, based on a two-level representation of the syntactic knowledge, is used to cope with ill-formed sentences. The basic control structure of the parser is briefly sketched; the paper shows that it can be applied without any subst...
Building a Parallel Bilingual Syntactically Annotated Corpus
This paper describes a process of building a bilingual syntactically annotated corpus, the PCEDT (Prague Czech-English Dependency Treebank). The corpus is being created at Charles University, Prague, and the release of this corpus as a Linguistic Data Consortium data collection is scheduled for the spring of 2004. The paper discusses important decisions made prior to the start of the project and ...
Grouping Synonymous Sentences from a Parallel Corpus
Recently, natural language processing research has focused on data and processing techniques for paraphrasing. Unfortunately, however, little data for paraphrasing is available. There are some research reports on collecting synonymous expressions from parallel corpora, though no suitable corpus for collecting a set of paraphrases is yet available. Therefore, we obtain a few variations of e...
Parallel-Wiki: A Collection of Parallel Sentences Extracted from Wikipedia
Parallel corpora are essential resources for certain Natural Language Processing tasks such as Statistical Machine Translation. However, the existing publicly available parallel corpora are specific to limited genres or domains, mostly juridical (e.g. JRC-Acquis) and medical (e.g. EMEA), and there is a lack of such resources for the general domain. This paper addresses this issue and presents...
Aligning Sentences in Parallel Corpora
In this paper we describe a statistical technique for aligning sentences with their translations in two parallel corpora. In addition to certain anchor points that are available in our data, the only information about the sentences that we use for calculating alignments is the number of tokens that they contain. Because we make no use of the lexical details of the sentence, the alignment compu...
Journal
Journal title: Linguistics in the Netherlands 2019
Year: 2019
ISSN: 0929-7332, 1569-9919
DOI: 10.1075/avt.00029.kro