corpus analysis

A Review Corpus for Argumentation Analysis

2014

Henning Wachsmuth Martin Trenkmann Benno Stein Gregor Engels Tsvetomira Palakarska

The analysis of user reviews has become critical in research and industry, as user reviews increasingly impact the reputation of products and services. Many review texts comprise an involved argumentation with facts and opinions on different product features or aspects. Therefore, classifying sentiment polarity does not suffice to capture a review’s impact. We claim that an argumentation analys...

Network Analysis with the Enron Email Corpus

Journal: CoRR, 2014

Johanna Hardin Ghassan Sarkis P. C. Urc

We use the Enron email corpus to study relationships in a network by applying six different measures of centrality. Our results came out of an in-semester undergraduate research seminar. The Enron corpus is well suited to statistical analyses at all levels of undergraduate education. Through this article’s focus on centrality, students can explore the dependence of statistical models on initial...

Lexical Semantic Techniques for Corpus Analysis

Journal: Computational Linguistics, 1993

James Pustejovsky Sabine Bergler Peter G. Anick

In this paper we outline a research program for computational linguistics, making extensive use of text corpora. We demonstrate how a semantic framework for lexical knowledge can suggest richer relationships among words in text beyond that of simple co-occurrence. The work suggests how linguistic phenomena such as metonymy and polysemy might be exploitable for semantic tagging of lexical items....

Corpus-based Stylistic Analysis of Tourism English

2010

Ning Kang Qiaofeng Yu

Tourism English belongs to English for Specific Purposes (ESP) and has its own stylistic features. This paper aims to analyze the stylistic features of tourism English. First, a large number of authentic materials are collected from official tourism websites of Britain and the U.S., and then a corpus named Tourism English Corpus (TEC) is compiled. Freiburg-LOB Corpus of British Englis...

Corpus Analysis for TREC 5 Query Expansion

1996

Susan Gauch Jianying Wang

Accessing online information remains an inexact science. While valuable information can be found, typically many irrelevant documents are also retrieved and many relevant ones are missed. Terminology mismatches between the user's query and document contents are a main cause of retrieval failures. Expanding a user's query with related words can improve search performance, but the problem of ident...

Prosodic Analysis of a Corpus of Tales

2011

David Doukhan Albert Rilliard Sophie Rosset Martine Adda-Decker Christophe d'Alessandro

This paper presents a prosodic analysis of a corpus of 12 tales, read by one male speaker. The work is part of a project which aims at providing storytelling capacities to a humanoid robot. One main point is to improve text-to-speech synthesis expressivity according to a semi-automatic analysis of a given tale. Automatic tagging and prosodic stylization were applied to the corpus. The extracted...

Supporting CSCL with automatic corpus analysis technology

2005

Pinar Donmez Carolyn Penstein Rosé Karsten Stegmann Armin Weinberger Frank Fischer

Process analyses are becoming more and more standard in research on computer-supported collaborative learning. This paper presents the rationale as well as results of an evaluation of a tool called TagHelper, designed for streamlining the process of multi-dimensional analysis of the collaborative learning process. In comparison with a hand-coded corpus coded with a 7-dimensional coding scheme, T...

A multimodal corpus for gesture expressivity analysis

2010

G. Caridakis J. Wagner A. Raouzaiou Z. Curto E. Andre K. Karpouzis

This work presents the design and implementation of corpus recording sessions along with some preliminary processing results. Captured modalities include speech and facial expressions but the focus is on hand gesture expressivity. Thus, this is the primary modality and is recorded using three methods: bare hands, Nintendo Wii remote controls and datagloves. Such a setup allows for multimodal af...

Morphological Analysis of the Slovak National Corpus

2006

Lucia Gianitsová

1. Basis of a morphological analysis of the Slovak National Corpus. The question of morphological (or morphosyntactic) analysis has been a key problem for natural language processing (NLP) for several years. Automatic morphological annotation is a useful tool, especially with regard to corpus data processing. In this respect morphological annotation has been considered also during the developme...

CorpusReader: designing and querying multi-layer corpora

Journal: TAL, 2008

Sylvain Loiseau

CorpusReader is a framework for creating and querying multi-layer corpora, which contain several levels of analysis (morphology, syntax, semantics, etc.) and which are aimed at observing correlations between these levels. Building, representing and querying multi-layer corpora is complex. CorpusReader’s specificity essentially lies in merging the outputs of existing corpus analysis tools, avoid...
