Search results for: linguistic corpus

Number of results: 113027

2008
Chu-Ren Huang Lung-Hao Lee

This paper proposes a method to automatically classify texts from different varieties of the same language. We show that a similarity measure is a robust tool for studying comparable corpora of language variations. We take LDC’s Chinese Gigaword Corpus, composed of three varieties of Chinese from Mainland China, Singapore, and Taiwan, as the comparable corpora. Top-bag-of-word similarity measures ...

1997
Atro Voutilainen Lluís Padró

We describe the use of energy function optimisation in very shallow syntactic parsing. The approach can use linguistic rules and corpus-based statistics, so the strengths of both linguistic and statistical approaches to NLP can be combined in a single framework. The rules are contextual constraints for resolving syntactic ambiguities expressed as alternative tags, and the statistical language m...

2011
Marilisa Amoia Kerstin Kunz Ekaterina Lapshinova-Koltunski

In this paper, we discuss some linguistic phenomena that pose potential problems for multilevel linguistic annotation of parallel corpora in general and specifically for data encoding with state-of-the-art multilevel corpus querying tools such as CQP. We describe the strategy we use for integrating the standard hierarchical XML representation used to annotate such phenomena in our aligned bilingual...

2008
Dan Tufiş

Parallel corpora encode extremely valuable linguistic knowledge, the revealing of which is facilitated by the recent advances in multilingual corpus linguistics. The linguistic decisions made by the human translators in order to faithfully convey the meaning of the source text can be traced and can bring evidence on linguistic facts which in a monolingual context might be overlooked by a comput...

2005
Tingting He Xiaoqi Xu

The normalization of corpus metadata plays a key role in building sharable corpora. However, there is currently no uniform specification for defining and processing metadata in Chinese corpora. This paper introduces a metadata system we’ve proposed for Chinese corpora. In all, 46 elements are defined, which can be divided into 6 classes: information about copyright, information about background of ...

2010
Hiroki Hanaoka Hideki Mima Jun'ichi Tsujii

This paper is a report on an ongoing project of creating a new corpus focusing on Japanese particles. The corpus will provide deeper syntactic/semantic information than the existing resources. The initial target particle is “to”, which occurs 22,006 times in 38,400 sentences of the existing corpus: the Kyoto Text Corpus. In this annotation task, an “example-based” methodology is adopted for the c...

2012
Felix Weninger Björn W. Schuller

We present a large-scale study on classification of linguistic and non-linguistic vocalizations including laughter, vocal noise, hesitation and consent on four corpora amounting to 46 h of spontaneous conversational speech. We consider training and testing on speaker-independent subsets of single corpora (intracorpus) as well as inter-corpus experiments where models built on one or more corpora...

2003
Sharon Tsai

In the field of linguistics there exist many powerful tools for measuring the statistical characteristics of words and sentences. These tools rely on a corpus to which the data is compared. In order to get good and meaningful results from the tools available, a suitable corpus is thus needed. As the corpus is the key that ties the tools together, it is of utmost importance. For most applicatio...

Journal: International Journal of Computer Processing of Languages 2011
