Search results for: linguistic corpus

Number of results: 113027

2016
Franco Salvetti John B. Lowe James H. Martin

We present an approach to creating corpora for use in detecting deception in text, including a discussion of the challenges peculiar to this task. Our approach is based on soliciting several types of reviews from writers and was implemented using Amazon Mechanical Turk. We describe the multi-dimensional corpus of reviews built using this approach, available free of charge from LDC as the Boulde...

1996
Keh-Jiann Chen Chu-Ren Huang Li-Ping Chang Hui-Li Hsu

The Academia Sinica Balanced Corpus (Sinica Corpus) is the first balanced Chinese corpus with part-of-speech tagging. The corpus (Sinica 2.0) is open to the research community through the WWW (http://www.sinica.edu.tw/ftms-bin/kiwi.sh). Current size of the corpus is 3.5 million words, and the immediate expansion target is five million words. Each text in the corpus is classified and marked acco...

2009
Viola Ganter Michael Strube

We investigate the automatic detection of sentences containing linguistic hedges using corpus statistics and syntactic patterns. We take Wikipedia as an already annotated corpus using its tagged weasel words which mark sentences and phrases as non-factual. We evaluate the quality of Wikipedia as training data for hedge detection, as well as shallow linguistic features.

2015
Katsunori Kotani Takehiko Yoshimi

In order to develop effective computer-assisted language teaching systems for learners of English as a foreign language, it is first necessary to identify gaps between learners and native speakers in the four basic linguistic skills (reading, writing, pronunciation, and listening). To identify these gaps, the accuracy and fluency in language use between learners and native speakers should be com...

2004
Khalid Choukri Mahtab Nikkhou Niklas Paulsson

Broadcast news is a very rich source of Language Resources that has been exploited to develop and assess a large set of Human Language Technologies. Some examples include systems to: automatically produce text transcriptions of spoken data; identify the language of a text; translate a text from one language to another; identify topics in the news and retrieve all stories discussing a target top...

Journal: IJCLCLP 2009
Cheng-Hsien Chen

Taking Mandarin Possessive Construction (MPC) as an example, the present study investigates the relation between lexicon and constructional schemas in a quantitative corpus linguistic approach. We argue that the wide use of raw frequency distribution in traditional corpus linguistic studies may undermine the validity of the results and reduce the possibility for interdisciplinary communication....

2009
Simone Pereira

This paper describes the methodology adopted in the construction of an annotated corpus for the study of zero anaphora in Portuguese, the ZAC corpus. To our knowledge, no such corpus exists at this time for the Portuguese language. The purpose of this linguistic resource is to promote the use of automatic discovery of linguistic parameters for anaphora resolution systems. Because of the complex...

2003
Shu-Ping Gong

This study proposes a corpus-based method to generate Mapping Principle of metaphors. In particular, Ahrens's (2002) Mapping Principle in the Conceptual Mapping Model (CM model) is simply based on the native speakers' intuition instead of analyzing it from huge linguistic data. In order to provide more convincing evidence to support the CM model, we adopt the corpus method to extract out the me...

2000
Catherine Macleod Nancy Ide Ralph Grishman

Linguistic research has become heavily reliant on text corpora over the past ten years. Such resources are becoming increasingly available through efforts such as the Linguistic Data Consortium (LDC) in the US and the European Language Resources Association (ELRA) in Europe. However, in the main the corpora that are gathered and distributed through these and other mechanisms consist of texts wh...

1994
Kelsey Taussig Jared Bernstein

Macrophone is a corpus of approximately 200,000 utterances, recorded over the telephone from a broad sample of about 5,000 American speakers. Sponsored by the Linguistic Data Consortium (LDC), it is the first of a series of similar data sets that will be collected for major languages of the world in a cooperative project called Polyphone. It is designed to provide telephone speech suitable for t...
