نتایج جستجو برای: lexical segmentation

تعداد نتایج: 95920  

2006
Jaime Arguello Carolyn Penstein Rosé

We introduce a novel topic segmentation approach that combines evidence of topic shifts from lexical cohesion with linguistic evidence such as syntactically distinct features of segment initial and final contributions. Our evaluation shows that this hybrid approach outperforms state-of-the-art algorithms even when applied to loosely structured, spontaneous dialogue. Further analysis reveals tha...

2008
Jacob Eisenstein Regina Barzilay Randall Davis

This paper explores the relationship between discourse segmentation and coverbal gesture. Introducing the idea of gestural cohesion, we show that coherent topic segments are characterized by homogeneous gestural forms and that changes in the distribution of gestural features predict segment boundaries. Gestural features are extracted automatically from video, and are combined with lexical featu...

2002
ELIZABETH SHRIBERG ANDREAS STOLCKE

This paper summarizes statistical modeling approaches for the use of prosody (the rhythm and melody of speech) in automatic recognition and understanding of speech. We outline effective prosodic feature extraction, model architectures, and techniques to combine prosodic with lexical (word-based) information. We then survey a number of applications of the framework, and give results for automati...

2007
Chunyu Kit Hai Zhao

Supervised and unsupervised learning has seldom joined with and thus lend strength to each other in the field of Chinese word segmentation (CWS). This paper presents a novel approach to CWS that utilizes description length gain (DLG), an empirical goodness measure for unsupervised word discovery, to enhance the segmentation performance of conditional random field (CRF) learning. Specifically, w...

2013
Lan Du Wray L. Buntine Mark Johnson

We present a new hierarchical Bayesian model for unsupervised topic segmentation. This new model integrates a point-wise boundary sampling algorithm used in Bayesian segmentation into a structured topic model that can capture a simple hierarchical topic structure latent in documents. We develop an MCMC inference algorithm to split/merge segment(s). Experimental results show that our model outpe...

2006
Mengqiu Wang Yanxin Shi

Chinese word segmentation and Part-ofSpeech (POS) tagging have been commonly considered as two separated tasks. In this paper, we present a system that performs Chinese word segmentation and POS tagging simultaneously. We train a segmenter and a tagger model separately based on linear-chain Conditional Random Fields (CRF), using lexical, morphological and semantic features. We propose an approx...

2000
Branimir Boguraev Mary S. Neff

Summaries automatically derived by sentence extraction are known to exhibit some coherence degradation, readability deterioration, and topical under-representation. We propose a strategy for improving upon these problems, aiming to generate more cohesive summaries by analyzing the lexical cohesion factors in the source document texts. As an initial experiment, we have looked at one particular f...

2003
Andi Wu Zixin Jiang

This paper presents a model of language processing where word segmentation is an integral part of sentence analysis. We show that the use of a parser can enable us to achieve the best ambiguity resolution in word segmentation. The lexical component of this model resolves most of the ambiguities, but the final disambiguation takes place in the parsing process. In this model, word segmentation is...

1997
Matt H. Davis William D. Marslen-Wilson M. Gareth Gaskell

Earlier research has suggested that left embedded words (e.g. cat in catalog) present a problem for spoken word recognition since it is potentially unclear whether there is a word boundary at the offset of cat. Models of spoken word recognition have incorporated processes of competition so that the identification of embedded words can be delayed until longer interpretations have been ruled out....

2014
Matti Varjokallio Mikko Kurimo

String segmentation is an important and recurring problem in natural language processing and other domains. For morphologically rich languages, the amount of different word forms caused by morphological processes like agglutination, compounding and inflection, may be huge and causes problems for traditional word-based language modeling approach. Segmenting text into better modelable units is th...

نمودار تعداد نتایج جستجو در هر سال

با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید