Search results for: lexical segmentation

Number of results: 95920

2007
Andrew Rosenberg Mehrbod Sharifi Julia Hirschberg

Story segmentation of news broadcasts has been shown to improve the accuracy of subsequent processes such as question answering and information retrieval. In previous work, a decision tree was trained on automatically extracted lexical and acoustic features to predict story boundaries, using hypothesized sentence boundaries to define potential story boundaries. In this paper, we emp...

Journal: :TAL 2006
Sophie Piérard Yves Bestgen

This research aims at validating a methodology for the study of segmentation markers in large corpora. Two indices signalling a thematic break in a text are proposed. The first is based on the presence of a paragraph mark and employs the odds ratio to identify the best markers. The second takes into account lexical cohesion between sentences via an index resulting from latent semantic analysis....

2014
Gabriel Synnaeve Isabelle Dautriche Benjamin Börschinger Mark Johnson Emmanuel Dupoux

This paper extends existing word segmentation models to take non-linguistic context into account. It improves the token F-score of a top-performing segmentation model by 2.5% on a 27k-utterance dataset. We posit that word segmentation is easier in context because the learner is not trying to access irrelevant lexical items. We use topics from a Latent Dirichlet Allocation model as a proxy for...

2008
Maria Georgescul Alexander Clark Susan Armstrong

In this article we address the task of automatic text structuring into linear and nonoverlapping thematic episodes at a coarse level of granularity. In particular, we deal with topic segmentation on multi-party meeting recording transcripts, which pose specific challenges for topic segmentation models. We present a comparative study of two probabilistic mixture models. Based on lexical features...

2005
Gaël Dias Elsa Alves

Topic Segmentation is the task of breaking documents into topically coherent multiparagraph subparts. In particular, Topic Segmentation is extensively used in Text Summarization to provide more coherent results by taking into account raw document structure. However, most methodologies are based on lexical repetition, which shows evident reliability problems, or rely on harvesting linguistic resourc...

Journal: :Journal of Chinese Language and Computing 2007
Na Ye Jingbo Zhu Huizhen Wang Matthew Y. Ma Bin Zhang

The Dotplotting method has been widely used for text segmentation because of its merits in detecting lexical repetition in a global context. However, a theoretical analysis of its segmentation criterion function reveals several deficiencies. The original function cannot make full use of text structure features and is not well suited to the text segmentation task. We propose an improved model (MMD m...

2010
Camille Guinaudeau Guillaume Gravier Pascale Sébillot

The increasing quantity of video material requires methods to help users navigate such data, including topic segmentation techniques. The goal of this article is to improve ASR-based topic segmentation methods to deal with the peculiarities of professional-video transcripts (transcription errors and lack of repetitions) while remaining generic enough. To this end, we introduce confidence measures ...

2016
Linda Garami Anett Ragó Ferenc Honbolygó Valéria Csépe

Infants develop different kinds of long-term linguistic representation as early as their first year of life. We examined the interaction of early lexical access and prosodic processing. It is proposed that familiar word forms are stored in a protolexicon before any concepts are linked to them, enabling early (proto)lexical segmentation from fluent speech. Additionally, previous results strength...

Journal: :Psychological review 2008
Dennis Norris James M McQueen

A Bayesian model of continuous speech recognition is presented. It is based on Shortlist (D. Norris, 1994; D. Norris, J. M. McQueen, A. Cutler, & S. Butterfield, 1997) and shares many of its key assumptions: parallel competitive evaluation of multiple lexical hypotheses, phonologically abstract prelexical and lexical representations, a feedforward architecture with no online feedback, and a lex...

2000
Matthew Harold Davis John Bullinaria Morten Christiansen Gary Cottrell Tom Loucas

This thesis examines an important issue in spoken word recognition: how the perceptual system segments connected speech into lexical units or words. Research on this topic has investigated the role of different sources of information in dividing up the speech stream: acoustic cues in the speech signal, statistical regularities in the structure of the language or through the identification of in...
