linguistic corpus

Search results for: linguistic corpus

Number of results: 113027

E-TIPSY: Search Query Corpus Annotated with Entities, Term Importance, POS Tags, and Syntactic Parses

2016

Yuval Marton, Kristina Toutanova

We present E-TIPSY, a search query corpus annotated with named Entities, Term Importance, POS tags, and SYntactic parses. This corpus contains crowdsourced (gold) annotations of the three most important terms in each query. In addition, it contains automatically produced annotations of named entities, part-of-speech tags, and syntactic parses for the same queries. This corpus comes in two forma...


Associative Measures and Multi-word Unit Extraction in Turkish

Journal: CoRR 2015

Ümit Mersinli

Associative measures are “mathematical formulas determining the strength of association between two or more words based on their occurrences and cooccurrences in a text corpus” (Pecina, 2010, p. 138). The purpose of this paper is to test the 12 associative measures that Text-NSP (Banerjee & Pedersen, 2003) contains on a 10-million-word subcorpus of the Turkish National Corpus (TNC) (Aksan et al., 2...
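As a rough illustration of what an associative measure computes, the sketch below scores adjacent word pairs with pointwise mutual information (PMI), one widely used measure of this kind. It is not taken from the paper: the function name `pmi_scores`, the whitespace tokenization, and the choice of PMI itself are assumptions made for the example, and the snippet does not say which 12 measures were tested.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Return PMI (log base 2) for every adjacent word pair in `tokens`.

    Hypothetical helper for illustration only; probabilities are plain
    relative frequencies with no smoothing or frequency cutoff.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        # PMI compares the observed pair probability with the
        # probability expected if the two words were independent.
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    sample = "a b a b c".split()
    for pair, score in sorted(pmi_scores(sample).items()):
        print(pair, round(score, 3))
```

Unsmoothed PMI tends to overrate rare pairs, which is one reason studies of this kind usually compare several measures rather than rely on a single one.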


Why Corpus-Based Statistics-Oriented Machine Translation

1992

Jing-Shin Chang

Rule-based approaches have been the dominant paradigm in developing MT systems. Such approaches, however, suffer from difficulties in knowledge acquisition to meet the wide variety and time-changing characteristics of real text. To attack this problem, some statistical translation models and supporting tools have been developed in the last few years. However, a simple statistical model often...


Nexing Corpus: a corpus of verbal protocols on syllogistic reasoning

2002

António Branco, José Leitão, João Ricardo Silva, Luís Gomes

In this paper, we describe the Nexing Corpus and report on the tools implemented and the tasks undertaken for its development. The Nexing Corpus includes (i) a collection of written transcriptions of verbal data elicited during a psycholinguistic experiment on syllogistic reasoning; and (ii) performance data concerning that experiment, such as latencies, confidence levels and accuracy of answer...


Comparing Set-Covering Strategies for Optimal Corpus Design

2008

Jonathan Chevelu, Nelly Barbot, Olivier Boëffard, Arnaud Delhay

This article addresses the problem of the linguistic content of a speech corpus. Depending on the target task, the phonological and linguistic content of the corpus is controlled by collecting a set of sentences that covers a preset description of phonological attributes under the constraint of an overall duration as small as possible. This goal is classically achieved by greedy algorit...
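The greedy approach mentioned in this abstract can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' algorithm: the function name `greedy_cover`, the input layout (sentence id mapped to a duration and a set of attribute labels), and the "newly covered units per second" selection criterion are all hypothetical choices made for the example.

```python
def greedy_cover(sentences, required_units):
    """Greedily pick sentences until all required units are covered.

    sentences: dict mapping sentence id -> (duration, set of units);
               durations are assumed to be positive.
    required_units: iterable of units (e.g. phonological attributes)
                    that the selected corpus should contain.
    Returns (chosen ids in selection order, units left uncovered).
    """
    uncovered = set(required_units)
    remaining = dict(sentences)
    chosen = []

    while uncovered:
        best_id, best_ratio = None, 0.0
        for sid, (duration, units) in remaining.items():
            # Score: newly covered units per unit of duration.
            ratio = len(units & uncovered) / duration
            if ratio > best_ratio:
                best_id, best_ratio = sid, ratio
        if best_id is None:
            break  # no remaining sentence covers anything new
        _, units = remaining.pop(best_id)
        chosen.append(best_id)
        uncovered -= units

    return chosen, uncovered


if __name__ == "__main__":
    pool = {
        "s1": (1.5, {"a", "b"}),
        "s2": (1.0, {"c"}),
        "s3": (5.0, {"a", "b", "c"}),
    }
    print(greedy_cover(pool, {"a", "b", "c"}))
```

A greedy pass like this is fast but does not guarantee the shortest possible corpus, which is presumably why the article compares it against other set-covering strategies.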


Corpus Annotation By Generation

2006

Elke Teich, John A. Bateman, Richard Eckart

As the interest in annotated corpora is spreading, there is increasing concern with using existing language technology for corpus processing. In this paper we explore the idea of using natural language generation systems for corpus annotation. Resources for generation systems often focus on areas of linguistic variability that are under-represented in analysis-directed approaches. Therefore, ma...


Urdu and Hindi: Translation and sharing of linguistic resources

2010

Karthik Visweswariah, Vijil Chenthamarakshan, Nanda Kambhatla

Hindi and Urdu share a common phonology, morphology and grammar but are written in different scripts. In addition, the vocabularies have diverged significantly, especially in the written form. In this paper we show that we can get reasonable quality translations (we estimated the Translation Error Rate at 18%) between the two languages even in the absence of a parallel corpus. Linguistic resour...


Enhancing The RATP-DECODA Corpus With Linguistic Annotations For Performing A Large Range Of NLP Tasks

2016

Carole Lailler, Anaïs Landeau, Frédéric Béchet, Yannick Estève, Paul Deléglise

In this article, we present the RATP-DECODA Corpus, which is composed of 67 hours of speech from telephone conversations of a Customer Care Service (CCS). This corpus is already available online at http://sldr.org/sldr000847/fr in its first version. However, many enhancements have been made in order to allow the development of automatic techniques to transcribe conversations and to cap...


Observing Eurolects: Corpus analysis of linguistic variation in EU law

Journal: Comparative Legilinguistics 2019


Querying Both Parallel And Treebank Corpora: Evaluation Of A Corpus Query System

2006

Ulrik Sandborg-Petersen

The last decade has seen a large increase in the number of available corpus query systems. Some of these are optimized for a particular kind of linguistic annotation (e.g., time-aligned, treebank, word-oriented, etc.). In this paper, we report on our own corpus query system, called Emdros. Emdros is very generic, and can be applied to almost any kind of linguistic annotation using almost any li...

