Search results for: linguistic corpus

Number of results: 113027

2002
Joseph Smarr

We describe software to transform any search engine or searchable corpus into a tool for linguistic research with a rich query syntax. We provide support for case-sensitive searches, within-sentence and within-N-words match constraints, part-of-speech restrictions on words, and “smart” verb-ending inflection wildcards. The software generalizes the query for the underlying search engine, and then...

1992
Chu-Ren Huang Keh-Jiann Chen

The project being reported on is a sub-project of the ongoing research of the CKIP (Chinese Knowledge Information Processing) Group. This group was founded by Hsieh Ching-chun in 1986 and is currently directed by Keh-jiann Chen and Chu-Ren Huang (Chang et al. 1989, Hsieh et al. 1989, Chen et al. 1991). The CKIP research is divided into three sub-projects according to their goals: 1) An On-line ...

2004
Ulrik Sandborg-Petersen

Emdros is a text database engine for linguistic analysis or annotation of text. It is applicable especially in corpus linguistics for storing and retrieving linguistic analyses of text, at any linguistic level. Emdros implements the EMdF text database model and the MQL query language. In this paper, I present both, and give an example of how Emdros can be useful in computational linguistics.

2012
Dietmar F. Rösner Manuela Kunze Mirko Otto Jörg Frommer

The LAST MINUTE corpus comprises multimodal records from a Wizard of Oz (WoZ) experiment with naturalistic dialogs between users and a simulated companion system. We report on the analysis of the transcripts of the user-companion dialogs and on insights gained so far from this ongoing empirical research.

2013
Hengbin Yan Jonathan J. Webster

In this paper, we present our recent experience in constructing a first-of-its-kind functional corpus based on the theoretical framework of Systemic Functional Linguistics. Annotated on selected texts from the Penn Treebank, the corpus was built by a collaborative team on a web-based annotation platform with several advanced features. After a discussion on the background and motivation of the pro...

2013
Boris Iomdin Alexander Piperski Anton Somin

The paper is focused on self-contained linguistic problems based on text corpora. We argue that corpus-based problems differ from traditional linguistic problems because they make it possible to represent language variation. Furthermore, they often require basic statistical thinking from the students. The practical value of using data obtained from text corpora for teaching linguistics through ...

2014
Johannes Graën Dolores Batinic Martin Volk

We discovered several recurring errors in the current version of the Europarl Corpus originating both from the web site of the European Parliament and the corpus compilation based thereon. The most frequent error was incompletely extracted metadata leaving non-textual fragments within the textual parts of the corpus files. This is, on average, the case for every second speaker change. We not onl...

1999
Eric V. Siegel

Fourteen indicators that measure the frequency of lexico-syntactic phenomena linguistically related to aspectual class are applied to aspectual classification. This group of indicators is shown to improve classification performance for two aspectual distinctions, stativity and completedness (i.e., telicity), over unrestricted sets of verbs from two corpora. Several of these indicators have not ...

2014
Dirk Goldhahn Uwe Quasthoff Gerhard Heyer

This paper takes a holistic view of the field of corpus-based linguistic typology and presents an overview of current advances at Leipzig University. Our goal is to use automatically created text data for a large variety of languages for quantitative typological investigations. In our approaches we utilize text corpora created for several hundred languages for cross-language quantitative stud...

Journal: Computational Linguistics 2015
Nelly Barbot Olivier Boëffard Jonathan Chevelu Arnaud Delhay

Linguistic corpus design is a critical concern for building rich annotated corpora useful in different domains of applications. For example, speech technologies such as ASR (Automatic Speech Recognition) or TTS (Text-to-Speech) need a huge amount of speech data to train data-driven models or to produce synthetic speech. Collecting data is always related to costs (recording speech, verifying anno...
