corpus analysis

Analysis of Language Variation Using a Large-Scale Corpus of Spontaneous Speech

2006

Kikuo Maekawa Hanae Koiso Hideki Ogura Masaya Yamaguchi Hideaki Kikuchi Takayuki Kagomiya Wataru Tsukahara Kiyoko Yoneyama Masako Fujimoto Kenya Nishikawa Yoko Mabuchi Yohichi Maki Kenji Yamazumi Takehiko Maruyama

A large-scale corpus of spontaneous speech can be a powerful tool for the study of language variation. Moreover, given that the corpus is publicly available, corpus-based analysis could open up the possibility of follow-up analysis in this area of linguistic study. Generally speaking, follow-up study is highly desirable in the sciences, but so far it has been virtually impossible in the area of socio-...


A Mono-lingual Corpus-Based Machine Translation of the Interlingua Method

1993

Eiji KOMATSU CUI Jin

This paper describes a prototype of an example-based machine translation system. In this system, the key language resources are the EDR corpus and a concept classification dictionary. The corpus consists of a pair of sentences, their morphological representations, their syntactic representations, and their semantic representations. The semantic representations are described by an interlingua. Therefore t...


The Influence of Corpus Luteum on Follicular Fluid Composition from Different Size Follicles and Their Relationship to Serum Concentrations in Sanjabi Ewes

Thesis: Ministry of Science, Research and Technology - Razi University - Faculty of Agriculture and Natural Resources, 1392 (Solar Hijri calendar)

کاوه محمدی خانقاه, هادی حجاریان, حامد کرمی شبانکاره, علیرضا عبدالمحمدی

The aim of the present study was to determine the influence of the presence or absence of a corpus luteum (CL) on the hormonal and metabolite composition of follicular fluid (FF) harvested from different-sized follicles and its relationship with blood serum concentrations in Sanjabi ewes. Ovaries and blood samples were collected from 60 clinically healthy adult ewes (Sanjabi breed) 1–3 years of age in dio...

Multiple Correspondence Analysis, newspaper discourse and subregister

Journal: Register Studies, 2021

This article introduces a new method for grouping keywords and examines the extent to which it also allows analysts to explore the interaction of discourse and subregister. It uses a multivariate statistical technique, Multiple Correspondence Analysis, to reveal dimensions that co-occur across the texts of a corpus. These are then interpreted in terms of the discourses they contribute to within the data, thus forming the basis of a corpu...


Corpus Analysis for Lexical Database Construction: A Case of Russian and Czech Wordnets

2004

Anna Sinopalnikova Pavel Smrz

The paper deals with corpus-based methods applied to particular tasks of lexical database construction. Different corpus analysis techniques are discussed and their applicability to these tasks is assessed. The corpus management system Manatee + Bonito, developed at the Faculty of Informatics, Masaryk University in Brno, Czech Republic, is presented as a tool that makes it possible to perform all di...


Building and exploiting a French corpus for sentiment analysis (Construction et exploitation d'un corpus français pour l'analyse de sentiment) [in French]

2013

Marc Vincent Grégoire Winterstein

This work introduces a French corpus for sentiment analysis. We describe the construction and organization of the corpus. We then apply machine learning techniques to automatically predict whether a text is positive or negative (the opinion classification task). Two techniques are used: logistic regression and classification based ...


PhD Research Proposal - Visualising Software Corpus Analysis

2008

Craig Anslow

Despite the spread of software development and software usage, we have almost no dependable data on how software is actually written in practice. Understanding the shape of existing software is an important step to understanding what good software looks like. Our proposal is to undertake quantitative studies of the way software is actually written in practice and evolved over time by collecting...


Open Source Corpus Analysis Tools for Malay

2006

Timothy Baldwin Su'ad Awab

Tokenisers, lemmatisers and POS taggers are vital to the linguistic and digital furtherment of any language. In this paper, we present an open source toolkit for Malay incorporating a word and sentence tokeniser, a lemmatiser and a partial POS tagger, based on heavy reuse of pre-existing language resources. We outline the software architecture of each component, and present an evaluation of eac...


Thematic Analysis and Visualization of Textual Corpus

Journal: CoRR, 2011

Anja Habacha Chaïbi Ferihane Kboubi Mohamed Ben Ahmed

The semantic analysis of documents is currently a domain of intense research. Work in this domain can take several directions and touch several levels of granularity. In the present work we are specifically interested in the thematic analysis of textual documents. In our approach, we suggest studying the variation of theme relevance within a text to identify the major theme and all the...


Analysis of an Extended Interaction Quality Corpus

2015

Stefan Ultes María Jesús Platero Sánchez Alexander Schmitt Wolfgang Minker

The Interaction Quality paradigm has been suggested as an evaluation method for Spoken Dialogue Systems, and several experiments based on the LEGO corpus have shown its suitability. However, the corpus size was rather limited, resulting in insufficient data for some mathematical models. Hence, we present an extension to the LEGO corpus. We validate the annotation process and further show that applyi...
