Search results for: linguistic corpus

Number of results: 113027

2007
Michel Généreux Marina Santini

In this paper we describe some explorations of the potential of genre-revealing features on automatic sentiment analysis. In particular, we use a small subset of the ‘linguistic facets’ employed in recent experiments on automatic genre identification in combination with more traditional sentiment-revealing features on two different single-genre corpora: a corpus of English blogs and a corpus of...

2012
Hadrien Gelas Laurent Besacier François Pellegrino

This article describes our efforts to provide ASR resources for Swahili, a Bantu language spoken in a wide area of East Africa. We start with an introduction on the language situation, both at linguistic and digital level. Then, we report the selected strategies to develop a text corpus, a pronunciation dictionary and a speech corpus for this under-resourced language. We explore methodologies a...

2012
Pawan Goyal Gérard P. Huet Amba P. Kulkarni Peter M. Scharf Ralph Bunker

Sanskrit, the classical language of India, presents specific challenges for computational linguistics: exact phonetic transcription in writing that obscures word boundaries, rich morphology and an enormous corpus, among others. Recent international cooperation has developed innovative solutions to these problems and significant resources for linguistic research. Solutions include efficient segm...

2011
Katsunori Kotani Takehiko Yoshimi Hiroaki Nanjo Hitoshi Isahara

A learner’s language data of speaking, writing, listening, and reading have been compiled for a learner corpus in this study. The language data consist of linguistic output and language processing. Linguistic output refers to data of pronunciation, sentences, listening comprehension rate, and reading comprehension rate. Language processing refers to processing time and learners’ self-judgment o...

2010
Fabienne Fritzinger

Compound splitting is an important problem in many NLP applications which must be solved in order to address issues of data sparsity. Previous work has shown that linguistic approaches for German compound splitting produce a correct splitting more often, but corpus-driven approaches work best for phrase-based statistical machine translation from German to English, a worrisome contradiction. We ...

2010
Sitanath Biswas

This paper describes a hybrid system that applies a maximum entropy (MaxEnt) model with a Hidden Markov model (HMM) and some linguistic rules to recognize named entities in the Oriya language. The main advantage of our system is that it uses both the HMM and the MaxEnt model successively, together with some manually developed linguistic rules. First we use MaxEnt to identify named entities in the Oriya corpus, then tagg...

2004
Anette Frank

In this paper we discuss motivations and strategies for generalising over instance-based frame assignment rules that we extract from frame-annotated corpora. Corpus-induced syntax-semantics mapping rules for frame assignment can be used for automatic semantic role labelling of unparsed text, but further, to extract linguistic knowledge for a lexical semantic resource with a general syntax-seman...

2008
Danilo Dayag

This paper aims to examine the generic structures and linguistic properties of ads in Philippine magazines. Taken from the Corpus of Asian Magazine Advertising: The Philippine Database, the corpus consists of seventy-four ads for consumer nondurables such as medicines, vitamins and food supplements, and cosmetic/beauty/personal hygiene products. The study found that the ads demonstrated prefere...

2006
Marco Baroni Motoko Ueyama

The Web is a potentially unlimited source of linguistic data; however, commercial search engines are not the best way for linguists to gather data from it. In this paper, we present a procedure to build language corpora by crawling and postprocessing Web data. We describe the construction of a very large Italian general-purpose Web corpus (almost 2 billion words) and a specialized Japanese “blo...
