linguistic corpus

Search results for: linguistic corpus

Number of results: 113,027

Contrastive Approach towards Text Source Classification based on Top-Bag-of-Word Similarity

2008

Chu-Ren Huang, Lung-Hao Lee

This paper proposes a method to automatically classify texts from different varieties of the same language. We show that similarity measure is a robust tool for studying comparable corpora of language variations. We take LDC’s Chinese Gigaword Corpus composed of three varieties of Chinese from Mainland China, Singapore, and Taiwan, as the comparable corpora. Top-bag-of-word similarity measures ...
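The abstract is cut off before the top-bag-of-word similarity measure is defined, so the following is only a minimal sketch under an explicit assumption, not the authors' method. It assumes each text or reference corpus is reduced to the set of its k most frequent words (its "top bag"), that two bags are compared with Jaccard overlap, and that a text is assigned to the variety whose reference corpus gives the highest score. The function names `top_k_words`, `top_bag_similarity` and `classify`, and the parameter `k`, are hypothetical.

```python
from collections import Counter


def top_k_words(tokens, k):
    """Return the set of the k most frequent tokens.

    Counter.most_common orders equal counts by first occurrence,
    so ties are broken deterministically.
    """
    return {word for word, _ in Counter(tokens).most_common(k)}


def top_bag_similarity(tokens_a, tokens_b, k=100):
    """Jaccard overlap of the top-k word bags (assumed measure, not the paper's)."""
    bag_a = top_k_words(tokens_a, k)
    bag_b = top_k_words(tokens_b, k)
    union = bag_a | bag_b
    if not union:
        return 0.0  # two empty texts: define similarity as 0
    return len(bag_a & bag_b) / len(union)


def classify(text_tokens, reference_corpora, k=100):
    """Assign the text to the variety whose reference corpus is most similar.

    reference_corpora maps a variety label (e.g. "Mainland", "Singapore",
    "Taiwan") to a token list for that variety.
    """
    scores = {
        label: top_bag_similarity(text_tokens, ref_tokens, k)
        for label, ref_tokens in reference_corpora.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy English tokens stand in for segmented Chinese text.
    refs = {
        "A": "the cat sat on the mat".split(),
        "B": "dogs run fast in parks".split(),
    }
    print(classify("the cat on a mat".split(), refs, k=3))
```

Using sets of top-k words keeps the comparison cheap and insensitive to corpus size; a frequency-weighted measure such as cosine similarity over the same top-k vocabulary would be an equally plausible reading of the truncated abstract.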

Developing a hybrid NP parser

1997

Atro Voutilainen, Lluís Padró

We describe the use of energy function optimisation in very shallow syntactic parsing. The approach can use linguistic rules and corpus-based statistics, so the strengths of both linguistic and statistical approaches to NLP can be combined in a single framework. The rules are contextual constraints for resolving syntactic ambiguities expressed as alternative tags, and the statistical language m...

Discontinuous Constituents: a Problematic Case for Parallel Corpora Annotation and Querying

2011

Marilisa Amoia, Kerstin Kunz, Ekaterina Lapshinova-Koltunski

In this paper, we discuss some linguistic phenomena that pose potential problems for multilevel linguistic annotation of parallel corpora in general and specifically for data encoding with state-of-the-art multilevel corpus querying tools such as CQP. We describe the strategy we use for integrating the standard hierarchical XML representation used to annotate such phenomena in our aligned bilingual...

Cross-Linguistic Knowledge Induction from Parallel Corpora

2008

Dan Tufiş

Parallel corpora encode extremely valuable linguistic knowledge, the revealing of which is facilitated by the recent advances in multilingual corpus linguistics. The linguistic decisions made by the human translators in order to faithfully convey the meaning of the source text can be traced and can bring evidence on linguistic facts which in a monolingual context might be overlooked by a comput...

The Standard of Chinese Corpus Metadata

2005

Tingting He, Xiaoqi Xu

The normalization of corpus metadata plays a key role in building sharable corpora. However, there is no uniform specification for defining and processing metadata in Chinese corpus nowadays. This paper introduces a metadata system we’ve proposed for Chinese corpus. 46 elements are defined in all, which can be divided into 6 classes: information about copyright, information about background of ...

A Japanese Particle Corpus Built by Example-Based Annotation

2010

Hiroki Hanaoka, Hideki Mima, Jun'ichi Tsujii

This paper is a report on an on-going project of creating a new corpus focusing on Japanese particles. The corpus will provide deeper syntactic/semantic information than the existing resources. The initial target particle is “to”, which occurs 22,006 times in 38,400 sentences of the existing corpus: the Kyoto Text Corpus. In this annotation task, an “example-based” methodology is adopted for the c...

Discrimination of Linguistic and Non-Linguistic Vocalizations in Spontaneous Speech: Intra- and Inter-Corpus Perspectives

2012

Felix Weninger, Björn W. Schuller

We present a large-scale study on classification of linguistic and non-linguistic vocalizations including laughter, vocal noise, hesitation and consent on four corpora amounting to 46 h of spontaneous conversational speech. We consider training and testing on speaker-independent subsets of single corpora (intracorpus) as well as inter-corpus experiments where models built on one or more corpora...

A Writing Assistent Using Language Models Derived From the Web

2003

Sharon Tsai

In the field of linguistics there exist many powerful tools for measuring the statistical characteristics of words and sentences. These tools rely on a corpus to which the data is compared. In order to get good and meaningful results from the tools available, a suitable corpus is thus needed. As the corpus is the key that ties the tools together, it is of utmost importance. For most applicatio...

Sense Prediction Study: Two Corpus-driven Linguistic Approaches

Journal: International Journal of Computer Processing of Languages, 2011

A cross-linguistic corpus of forms meaning yes

Journal: Linguistic Discovery, 2006
