linear text

The beauty in a beast: Minimising the effects of diverse recording quality on vowel formant measurements in sociophonetic real-time studies

Journal: Speech Communication 2017

Tamara Rathcke, Jane Stuart-Smith, Bernard Torsney, Jonathan Harrington

Sociophonetic real-time studies of vowel variation and change rely on acoustic analyses of sound recordings made at different times, often using different equipment and data collection procedures. The circumstances of a recording are known to affect formant tracking and may therefore compromise the validity of conclusions about sound changes made on the basis of real-time data. In this paper, a...

Parsing Argumentation Structures in Persuasive Essays

Journal: Computational Linguistics 2017

Christian Stab, Iryna Gurevych

In this article, we present a novel approach for parsing argumentation structures. We identify argument components using sequence labeling at the token level and apply a new joint model for detecting argumentation structures. The proposed model globally optimizes argument component types and argumentative relations using integer linear programming. We show that our model considerably improves t...

Microblog Search Task at CLEF 2017: Query Generation using IR and LDA Topic Modeling Combination

2017

Malek Hajjem Cherif Chiraz Latiri

The microblog search task at CLEF 2017 consists of developing a system that retrieves the most relevant microblogs for a cultural query from a collection about festivals in all languages. Our general approach to achieving this objective is as follows: from the initial tweet queries provided for the task, we generate extended queries capable of retrieving an answer-rich set of microblogs. This is achieve...

Joint Lemmatization and Morphological Tagging with Lemming

2015

Thomas Müller, Ryan Cotterell, Alexander M. Fraser, Hinrich Schütze

We present LEMMING, a modular log-linear model that jointly models lemmatization and tagging and supports the integration of arbitrary global features. It is trainable on corpora annotated with gold standard tags and lemmata and does not rely on morphological dictionaries or analyzers. LEMMING sets the new state of the art in token-based statistical lemmatization on six languages; e.g., for Czec...

Japanese Dependency Parsing Using a Tournament Model

2008

Masakazu Iwatate, Masayuki Asahara, Yuji Matsumoto

In Japanese dependency parsing, Kudo’s relative preference-based method (Kudo and Matsumoto, 2005) outperforms both deterministic and probabilistic CKY-based parsing methods. In Kudo’s method, for each dependent word (or chunk), a log-linear model estimates the relative preference of all other candidate words (or chunks) for being its head. This cannot be considered in the deterministic parsing me...

Efficient Inference and Structured Learning for Semantic Role Labeling

Journal: TACL 2015

Oscar Täckström, Kuzman Ganchev, Dipanjan Das

We present a dynamic programming algorithm for efficient constrained inference in semantic role labeling. The algorithm tractably captures a majority of the structural constraints examined by prior work in this area, which has resorted to either approximate methods or off-the-shelf integer linear programming solvers. In addition, it allows training a globally-normalized log-linear model with res...

A Log-Linear Model for Unsupervised Text Normalization

2013

Yi Yang, Jacob Eisenstein

We present a unified unsupervised statistical model for text normalization. The relationship between standard and non-standard tokens is characterized by a log-linear model, permitting arbitrary features. The weights of these features are trained in a maximum-likelihood framework, employing a novel sequential Monte Carlo training algorithm to overcome the large label space, which would be imprac...
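
The log-linear relationship described above can be sketched in Python as a conditional distribution over candidate standard forms, p(t | s) proportional to exp(w · f(s, t)). This is a minimal illustration only: it covers scoring, not the sequential Monte Carlo training, and the features (`is_subsequence`, `prefix_match`, `length_diff`) and weights are hypothetical, not the paper's feature set.

```python
import math
from typing import Dict, List

def features(source: str, target: str) -> Dict[str, float]:
    """Hypothetical features linking a non-standard token to a standard candidate."""
    remaining = iter(target)
    return {
        # 1.0 if the source characters appear in order inside the target,
        # e.g. "tmrw" inside "tomorrow".
        "is_subsequence": 1.0 if all(ch in remaining for ch in source) else 0.0,
        # 1.0 if both tokens start with the same character.
        "prefix_match": 1.0 if source[:1] == target[:1] else 0.0,
        # Absolute length difference (a negative weight penalizes it).
        "length_diff": float(abs(len(source) - len(target))),
    }

def score(weights: Dict[str, float], source: str, target: str) -> float:
    """Linear score w . f(source, target); unknown features get weight 0."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features(source, target).items())

def conditional(weights: Dict[str, float], source: str,
                candidates: List[str]) -> Dict[str, float]:
    """Normalize exp(score) over the candidate set (softmax, shifted for stability)."""
    scores = [score(weights, source, t) for t in candidates]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return {t: e / z for t, e in zip(candidates, exps)}
```

With the illustrative weights `{"is_subsequence": 2.0, "prefix_match": 0.5, "length_diff": -0.1}`, `conditional(weights, "tmrw", ["tomorrow", "tower", "the"])` assigns the highest probability to "tomorrow" (score 2.1 versus 0.4 for each of the others). Because features are arbitrary functions of the token pair, new evidence can be added without changing the normalization.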

Text-independent speaker recognition using non-linear frame likelihood transformation

Journal: Speech Communication 1998

Konstantin Markov, Seiichi Nakagawa

When the reference speakers are represented by Gaussian mixture models (GMMs), the conventional approach is to accumulate the frame likelihoods over the whole test utterance and either compare the results, as in speaker identification, or apply a threshold, as in speaker verification. In this paper we describe a method in which frame likelihoods are transformed into new scores according to some non-linear func...
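
The contrast between accumulating raw frame likelihoods and accumulating non-linearly transformed frame scores can be sketched in Python. The abstract does not specify the transformation, so this sketch assumes, as a stand-in, per-frame posterior normalization (a softmax across reference speakers). GMM evaluation is omitted: the input is a hypothetical precomputed matrix of frame log-likelihoods, one row per frame and one column per speaker.

```python
import math
from typing import Callable, List, Sequence

FrameScores = Sequence[Sequence[float]]  # frames x speakers log-likelihoods

def conventional_scores(frame_loglik: FrameScores) -> List[float]:
    """Conventional approach: sum each speaker's log-likelihood over all frames."""
    n_speakers = len(frame_loglik[0])
    return [sum(frame[s] for frame in frame_loglik) for s in range(n_speakers)]

def frame_posteriors(frame: Sequence[float]) -> List[float]:
    """Assumed non-linear transform: softmax of one frame's log-likelihoods."""
    top = max(frame)
    exps = [math.exp(x - top) for x in frame]
    z = sum(exps)
    return [e / z for e in exps]

def transformed_scores(
    frame_loglik: FrameScores,
    transform: Callable[[Sequence[float]], List[float]] = frame_posteriors,
) -> List[float]:
    """Transform each frame's scores first, then accumulate over the utterance."""
    totals = [0.0] * len(frame_loglik[0])
    for frame in frame_loglik:
        for s, value in enumerate(transform(frame)):
            totals[s] += value
    return totals

def identify(scores: Sequence[float]) -> int:
    """Speaker identification: index of the highest accumulated score."""
    return max(range(len(scores)), key=lambda s: scores[s])
```

For example, with four frames of `[-1.0, -2.0]` followed by one outlier frame `[-50.0, -2.0]`, the summed log-likelihoods select speaker 1, while the transformed scores select speaker 0: each frame's contribution is bounded by 1, so a single outlier frame cannot dominate the decision.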

SegGen: A Genetic Algorithm for Linear Text Segmentation

2007

Sylvain Lamprier, Tassadit Amghar, Bernard Levrat, Frédéric Saubion

This paper describes SegGen, a new algorithm for linear text segmentation on general corpora. It aims to segment texts into thematically homogeneous parts. Several existing methods have been used for this purpose, based on the sequential creation of boundaries. Here, we propose to consider all boundaries simultaneously by means of a genetic algorithm. SegGen uses two criteria: maximization of the internal...

Linear Text Segmentation using a Dynamic Programming Algorithm

2003

Athanasios Kehagias, Pavlina Fragkou, Vassilios Petridis

In this paper we introduce a dynamic programming algorithm to perform linear text segmentation by global minimization of a segmentation cost function which consists of: (a) within-segment word similarity and (b) prior information about segment length. The evaluation of the segmentation accuracy of the algorithm on Choi's text collection showed that the algorithm achieves the best segmentation a...
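
The global minimization described above can be sketched in Python as a dynamic program over segment end points, where each segment's cost combines a within-segment similarity term with a length prior. This is a minimal illustration, not the authors' implementation: the binary sentence similarity (1 if two sentences share a word), the density normalization, and the quadratic length prior with hypothetical parameters `mu` and `lam` are assumptions chosen for clarity.

```python
import math
from typing import List, Sequence

def similarity_matrix(sentences: Sequence[str]) -> List[List[float]]:
    """Assumed similarity: 1.0 if two sentences share at least one word, else 0.0."""
    bags = [set(s.lower().split()) for s in sentences]
    return [[1.0 if a & b else 0.0 for b in bags] for a in bags]

def segment_cost(sim: List[List[float]], i: int, j: int,
                 mu: float, lam: float) -> float:
    """Cost of one segment covering sentences i..j-1.

    (a) negative within-segment similarity density (coherent segments are cheap);
    (b) quadratic penalty for lengths far from the expected length mu.
    """
    length = j - i
    within = sum(sim[s][t] for s in range(i, j) for t in range(i, j))
    return -within / length + lam * (length - mu) ** 2

def segment(sentences: Sequence[str], mu: float = 3.0,
            lam: float = 0.1) -> List[int]:
    """Return boundary indices [0, b1, ..., n] of a minimum-cost segmentation."""
    n = len(sentences)
    sim = similarity_matrix(sentences)
    best = [0.0] + [math.inf] * n  # best[j]: minimal cost of segmenting the first j sentences
    back = [0] * (n + 1)           # back[j]: start of the last segment in that optimum
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + segment_cost(sim, i, j, mu, lam)
            if cost < best[j]:
                best[j], back[j] = cost, i
    # Walk the back-pointers from the end to recover the boundaries.
    bounds = [n]
    while bounds[-1] > 0:
        bounds.append(back[bounds[-1]])
    return bounds[::-1]
```

For instance, `segment(["cat dog", "dog cat", "cat", "car road", "road car", "car"])` returns `[0, 3, 6]`, splitting the two word-sharing groups. Because every candidate segment is scored once per end point, the search is exact and costs O(n²) segment evaluations rather than exploring boundaries one at a time.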