نتایج جستجو برای: linguistic corpus

تعداد نتایج: 113027  

2016
Yuval Marton Kristina Toutanova

We present E-TIPSY, a search query corpus annotated with named Entities, Term Importance, POS tags, and SYntactic parses. This corpus contains crowdsourced (gold) annotations of the three most important terms in each query. In addition, it contains automatically produced annotations of named entities, part-of-speech tags, and syntactic parses for the same queries. This corpus comes in two forma...

Journal: :CoRR 2015
Ümit Mersinli

Associative measures are “mathematical formulas determining the strength of association between two or more words based on their occurrences and cooccurrences in a text corpus” (Pecina, 2010, p. 138). The purpose of this paper is to test the 12 associative measures that Text-NSP (Banerjee & Pedersen, 2003) contains on a 10-million-word subcorpus of Turkish National Corpus (TNC) (Aksan et.al., 2...

1992
Jing-Shin Chang

Rule-based approaches have been the dominant paradigm in developing MT systems. Such approaches, however, suffer from difficulties in knowledge acquisition to meet the wide variety and time-changing characteristics of the real text. To attack this problem, some statistical translation models and supporting tools had been developed in the last few years. However, a simple statistical model often...

2002
António Branco José Leitão João Ricardo Silva Luís Gomes

In this paper, we describe the Nexing Corpus and report on the tools implemented and the tasks undertaken for its development. The Nexing Corpus includes (i) a collection of written transcriptions of verbal data elicited during a psycholinguistic experiment on syllogistic reasoning; and (ii) performance data concerning that experiment, such as latencies, confidence levels and accuracy of answer...

2008
Jonathan Chevelu Nelly Barbot Olivier Boëffard Arnaud Delhay

This article is interested in the problem of the linguistic content of a speech corpus. Depending on the target task, the phonological and linguistic content of the corpus is controlled by collecting a set of sentences which covers a preset description of phonological attributes under the constraint of an overall duration as small as possible. This goal is classically achieved by greedy algorit...

2006
Elke Teich John A. Bateman Richard Eckart

As the interest in annotated corpora is spreading, there is increasing concern with using existing language technology for corpus processing. In this paper we explore the idea of using natural language generation systems for corpus annotation. Resources for generation systems often focus on areas of linguistic variability that are under-represented in analysis-directed approaches. Therefore, ma...

2010
Karthik Visweswariah Vijil Chenthamarakshan Nanda Kambhatla

Hindi and Urdu share a common phonology, morphology and grammar but are written in different scripts. In addition, the vocabularies have also diverged significantly especially in the written form. In this paper we show that we can get reasonable quality translations (we estimated the Translation Error rate at 18%) between the two languages even in absence of a parallel corpus. Linguistic resour...

2016
Carole Lailler Anaïs Landeau Frédéric Béchet Yannick Estève Paul Deléglise

In this article, we present the RATP-DECODA Corpus which is composed by a set of 67 hours of speech from telephone conversations of a Customer Care Service (CCS). This corpus is already available on line at http://sldr.org/sldr000847/fr in its first version. However, many enhancements have been made in order to allow the development of automatic techniques to transcript conversations and to cap...

2006
Ulrik Sandborg-Petersen

The last decade has seen a large increase in the number of available corpus query systems. Some of these are optimized for a particular kind of linguistic annotation (e.g., time-aligned, treebank, word-oriented, etc.). In this paper, we report on our own corpus query system, called Emdros. Emdros is very generic, and can be applied to almost any kind of linguistic annotation using almost any li...

نمودار تعداد نتایج جستجو در هر سال

با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید