linguistic corpus

Search results for: linguistic corpus

Number of results: 113027

GoogleLing: The Web as a Linguistic Corpus

2002

Joseph Smarr

We describe software to transform any search engine or searchable corpus into a tool for linguistic research with a rich query syntax. We provide support for case-sensitive searches, within-sentence and within-N-words match constraints, part-of-speech restrictions on words, and “smart” verb-ending inflection wildcards. The software generalizes the query for the underlying search engine, and then...

A Chinese Corpus for Linguistic Research

1992

Chu-Ren Huang, Keh-Jiann Chen

The project reported on here is a sub-project of the ongoing research of the CKIP (Chinese Knowledge Information Processing) Group. This group was founded by Hsieh Ching-chun in 1986 and is currently directed by Keh-Jiann Chen and Chu-Ren Huang (Chang et al. 1989, Hsieh et al. 1989, Chen et al. 1991). The CKIP research is divided into three sub-projects according to their goals: 1) An On-line ...

Emdros - a text database engine for analyzed or annotated text

2004

Ulrik Sandborg-Petersen

Emdros is a text database engine for linguistic analysis or annotation of text. It is especially applicable in corpus linguistics for storing and retrieving linguistic analyses of text, at any linguistic level. Emdros implements the EMdF text database model and the MQL query language. In this paper, I present both, and give an example of how Emdros can be useful in computational linguistics.

Linguistic analyses of the LAST MINUTE corpus

2012

Dietmar F. Rösner, Manuela Kunze, Mirko Otto, Jörg Frommer

The LAST MINUTE corpus comprises multimodal records from a Wizard of Oz (WoZ) experiment with naturalistic dialogs between users and a simulated companion system. We report on the analysis of the transcripts of the user-companion dialogs and on insights gained so far from this ongoing empirical research.

A Corpus-based Approach to Linguistic Function

2013

Hengbin Yan, Jonathan J. Webster

In this paper, we present our recent experience in constructing a first-of-its-kind functional corpus based on the theoretical framework of Systemic Functional Linguistics. Annotated on selected texts from the Penn Treebank, the corpus was built by a collaborative team on a web-based annotation platform with several advanced features. After a discussion on the background and motivation of the pro...

Linguistic Problems Based on Text Corpora

2013

Boris Iomdin, Alexander Piperski, Anton Somin

The paper focuses on self-contained linguistic problems based on text corpora. We argue that corpus-based problems differ from traditional linguistic problems because they make it possible to represent language variation. Furthermore, they often require basic statistical thinking from the students. The practical value of using data obtained from text corpora for teaching linguistics through ...

Cleaning the Europarl Corpus for Linguistic Applications

2014

Johannes Graën, Dolores Batinic, Martin Volk

We discovered several recurring errors in the current version of the Europarl Corpus originating both from the web site of the European Parliament and the corpus compilation based thereon. The most frequent error was incompletely extracted metadata leaving non-textual fragments within the textual parts of the corpus files. This is, on average, the case for every second speaker change. We not onl...

Corpus-Based Linguistic Indicators for Aspectual Classification

1999

Eric V. Siegel

Fourteen indicators that measure the frequency of lexico-syntactic phenomena linguistically related to aspectual class are applied to aspectual classification. This group of indicators is shown to improve classification performance for two aspectual distinctions, stativity and completedness (i.e., telicity), over unrestricted sets of verbs from two corpora. Several of these indicators have not ...

Corpus-Based Linguistic Typology: A Comprehensive Approach

2014

Dirk Goldhahn, Uwe Quasthoff, Gerhard Heyer

This paper takes a holistic view of the field of corpus-based linguistic typology and presents an overview of current advances at Leipzig University. Our goal is to use automatically created text data for a large variety of languages for quantitative typological investigations. In our approaches we utilize text corpora created for several hundred languages for cross-language quantitative stud...

Large Linguistic Corpus Reduction with SCP Algorithms

Journal: Computational Linguistics, 2015

Nelly Barbot, Olivier Boëffard, Jonathan Chevelu, Arnaud Delhay

Linguistic corpus design is a critical concern for building rich annotated corpora useful in different domains of applications. For example, speech technologies such as ASR (Automatic Speech Recognition) or TTS (Text-to-Speech) need a huge amount of speech data to train data-driven models or to produce synthetic speech. Collecting data is always related to costs (recording speech, verifying anno...
