Search results for: linguistic corpus

Number of results: 113027

2004
Stephanie Strassel

This paper describes ongoing efforts at Linguistic Data Consortium to create shared evaluation resources for improved speech-to-text technology. The DARPA EARS Program (Effective, Affordable, Reusable Speech-to-Text) is focused on enabling core STT technology to produce rich, highly accurate output in a range of languages and speaking styles. The aggressive EARS program goals motivate new appro...

2009
Éric Villemonte de la Clergerie, Benoît Sagot, Rosa Stern, Pascal Denis, Gaëlle Recourcé, Victor Mignot

We introduce SAPIENS, a platform for extracting quotations from news wires, associated with their author and context. The originality of SAPIENS is that it relies on a deep linguistic processing chain, which allows for extracting quotations with a wide coverage and an extended definition, including quotations which are only partially quotes-delimited verbatim transcripts. We describe the archit...

2016
Justin Mott, Ann Bies, Zhiyi Song, Stephanie Strassel

This paper introduces the parallel Chinese-English Entities, Relations and Events (ERE) corpora developed by Linguistic Data Consortium under the DARPA Deep Exploration and Filtering of Text (DEFT) Program. Original Chinese newswire and discussion forum documents are annotated for two versions of the ERE task. The texts are manually translated into English and then annotated for the same ERE ta...

2000
Michelle A. Fox

This paper describes a data corpus which is being made available through the Linguistic Data Consortium (LDC) that codes lenition of syllable-final /s/ in Latin American Spanish in the LDC’s CallHome Spanish corpus. This lenition is a process whereby the /s/ may be aspirated (pronounced [h]) or deleted altogether. Since syllable-final /s/ is frequent in Spanish, lenition has a great effect on o...

2004
Christian Biemann, Uwe Quasthoff, Christian Wolff

Searching corpora with linguistic questions requires both additional information encoded in the corpus and efficiency as in “traditional” search engines. We describe a search engine-like approach to querying plain as well as part-of-speech-tagged monolingual corpora. This approach makes use of a ‘minimalist’ query language which nevertheless allows powerful searches by optionally ignoring posit...

2012
Michel Généreux, Iris Hendrickx, Amália Mendes

We present our work in processing the Reference Corpus of Contemporary Portuguese and its publication online. After discussing how the corpus was built and our choice of meta-data, we turn to the processes and tools involved for the cleaning, preparation and annotation to make the corpus suitable for linguistic inquiries. The Web platform is described, and we show examples of linguistic resourc...

Journal: زبان شناسی و گویش های خراسان (Linguistics and Khorasan Dialects)
Mohammad Rasekh-Mahand, Maryam Ghiasvand

Postposing is a feature of spoken language. It is a process that postposes a preverbal constituent to a position after the verb, without changing the overall meaning of the sentence. In this study we focus on this linguistic phenomenon and its relation to factors such as grammatical weight, definiteness, animacy and information structure. To investigate the occurrence of this process and its in...

2016
Masayuki Asahara, Kazuya Kawahara, Yuya Takei, Hideto Masuoka, Yasuko Ohba, Yuki Torii, Toru Morii, Yuki Tanaka, Kikuo Maekawa, Sachi Kato, Hikari Konishi

The National Institute for Japanese Language and Linguistics, Japan (NINJAL) has undertaken a corpus compilation project to construct a web corpus for linguistic research comprising 25 billion words. The project is divided into four parts: page collection, linguistic analysis, development of the corpus concordance system, and preservation. This article presents a corpus concordance system named...
