linguistic corpus

Search results for: linguistic corpus

Number of results: 113027

Linguistic Resources for Effective, Affordable, Reusable Speech-to-Text

2004

Stephanie Strassel

This paper describes ongoing efforts at the Linguistic Data Consortium to create shared evaluation resources for improved speech-to-text technology. The DARPA EARS Program (Effective, Affordable, Reusable Speech-to-Text) is focused on enabling core STT technology to produce rich, highly accurate output in a range of languages and speaking styles. The aggressive EARS program goals motivate new appro...

Extracting and Visualizing Quotations from News Wires

2009

Éric Villemonte de la Clergerie, Benoît Sagot, Rosa Stern, Pascal Denis, Gaëlle Recourcé, Victor Mignot

We introduce SAPIENS, a platform for extracting quotations from news wires, associated with their author and context. The originality of SAPIENS is that it relies on a deep linguistic processing chain, which allows for extracting quotations with a wide coverage and an extended definition, including quotations which are only partially quotes-delimited verbatim transcripts. We describe the archit...

Parallel Chinese-English Entities, Relations and Events Corpora

2016

Justin Mott, Ann Bies, Zhiyi Song, Stephanie Strassel

This paper introduces the parallel Chinese-English Entities, Relations and Events (ERE) corpora developed by the Linguistic Data Consortium under the DARPA Deep Exploration and Filtering of Text (DEFT) Program. Original Chinese newswire and discussion forum documents are annotated for two versions of the ERE task. The texts are manually translated into English and then annotated for the same ERE ta...

Syllable-final /s/ lenition in the LDC's CallHome Spanish corpus

2000

Michelle A. Fox

This paper describes a data corpus which is being made available through the Linguistic Data Consortium (LDC) that codes lenition of syllable-final /s/ in Latin American Spanish in the LDC’s CallHome Spanish corpus. This lenition is a process whereby the /s/ may be aspirated (pronounced [h]) or deleted altogether. Since syllable-final /s/ is frequent in Spanish, lenition has a great effect on o...

Linguistic Corpus Search

2004

Christian Biemann, Uwe Quasthoff, Christian Wolff

Searching corpora with linguistic questions requires both additional information encoded in the corpus and efficiency as in “traditional” search engines. We describe a search engine-like approach to querying plain as well as part-of-speech-tagged monolingual corpora. This approach makes use of a ‘minimalist’ query language which nevertheless allows powerful searches by optionally ignoring posit...

Investigating Curatorial Voice with Corpus Linguistic Techniques

Journal: Museum and Society, 2020

Introducing the Reference Corpus of Contemporary Portuguese Online

2012

Michel Généreux, Iris Hendrickx, Amália Mendes

We present our work in processing the Reference Corpus of Contemporary Portuguese and its publication online. After discussing how the corpus was built and our choice of meta-data, we turn to the processes and tools involved for the cleaning, preparation and annotation to make the corpus suitable for linguistic inquiries. The Web platform is described, and we show examples of linguistic resourc...

Motivating Factors of Postposing in Persian

Journal: Linguistics and Khorasan Dialects (زبان شناسی و گویش های خراسان)

Mohammad Rasekh Mahand, Maryam Ghiasvand

Postposing is a feature of spoken language. It is a process that postposes a preverbal constituent to a position after the verb, without changing the overall meaning of the sentence. In this study we focus on this linguistic phenomenon and its relation to factors such as grammatical weight, definiteness, animacy and information structure. To investigate the occurrence of this process and its in...

'BonTen' - Corpus Concordance System for 'NINJAL Web Japanese Corpus'

2016

Masayuki Asahara, Kazuya Kawahara, Yuya Takei, Hideto Masuoka, Yasuko Ohba, Yuki Torii, Toru Morii, Yuki Tanaka, Kikuo Maekawa, Sachi Kato, Hikari Konishi

The National Institute for Japanese Language and Linguistics, Japan (NINJAL) has undertaken a corpus compilation project to construct a web corpus for linguistic research comprising 25 billion words. The project is divided into four parts: page collection, linguistic analysis, development of the corpus concordance system, and preservation. This article presents a corpus concordance system named...

Large Linguistic Corpus Reduction with SCP Algorithms

Journal: Computational Linguistics, 2015
