lexical segmentation

The use of time during lexical processing and segmentation: A review

Journal: Psychonomic Bulletin & Review 1997

The Role of Lexical Resources in CJK Natural Language Processing

2006

Jack Halpern

The role of lexical resources is often understated in NLP research. The complexity of Chinese, Japanese and Korean (CJK) poses special challenges to developers of NLP tools, especially in the area of word segmentation (WS), information retrieval (IR), named entity extraction (NER), and machine translation (MT). These difficulties are exacerbated by the lack of comprehensive lexical resources, e...
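
Dictionary-driven word segmentation shows why lexicon coverage matters for WS. The following is a minimal sketch, not Halpern's method. It uses the classic forward maximum matching heuristic with a small hypothetical lexicon (`LEXICON` and `max_len` are illustrative assumptions). At each position it takes the longest dictionary entry and falls back to a single character.

```python
def forward_max_match(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right segmentation: at each position take the longest
    lexicon entry of up to max_len characters, else a single character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon hit;
        # a single character is always accepted as a fallback.
        for j in range(min(len(text), i + max_len), i, -1):
            piece = text[i:j]
            if piece in lexicon or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


# Hypothetical toy lexicon; a real system would load a large dictionary.
LEXICON = {"北京", "大学", "北京大学", "学生"}

print(forward_max_match("北京大学学生", LEXICON))  # ['北京大学', '学生']
```

If "北京大学" is removed from the lexicon, the same call yields `['北京', '大学', '学生']`. The output depends entirely on which entries the lexical resource contains.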

Clause-based Discourse Segmentation of Arabic Texts

2012

Iskandar Keskes, Farah Benamara, Lamia Hadrich Belguith

This paper describes a rule-based approach to segment Arabic texts into clauses. Our method relies on an extensive analysis of a large set of lexical cues as well as punctuation marks. Our analysis was carried out on two different corpus genres: news articles and elementary school textbooks. We propose a three-step segmentation algorithm: first by using only punctuation marks, then by relying ...
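
The first two steps named in the abstract (punctuation only, then lexical cues) can be sketched as follows. This is an illustration under stated assumptions, not the authors' system. Every punctuation mark is treated as a boundary, and the cue list is a hypothetical two-word stand-in for the paper's much larger inventory. The excerpt does not describe the third step, so it is not sketched.

```python
import re

# Step 1 delimiters: Latin punctuation plus the Arabic comma (U+060C),
# semicolon (U+061B) and question mark (U+061F).
PUNCT = re.compile(r"[.!?;:,\u060C\u061B\u061F]")

# Step 2 cues: hypothetical list for illustration ("but", "then").
CUES = {"لكن", "ثم"}


def split_on_punctuation(text):
    """Step 1: cut the text at every punctuation mark."""
    return [s.strip() for s in PUNCT.split(text) if s.strip()]


def split_on_cues(segment):
    """Step 2: open a new clause before each cue word (the cue is kept)."""
    clauses, current = [], []
    for word in segment.split():
        if word in CUES and current:
            clauses.append(" ".join(current))
            current = []
        current.append(word)
    if current:
        clauses.append(" ".join(current))
    return clauses


def segment(text):
    return [c for seg in split_on_punctuation(text) for c in split_on_cues(seg)]


print(segment("ذهب الولد إلى المدرسة ثم عاد إلى البيت، وكان سعيدا."))
```

The sample sentence ("The boy went to school, then returned home, and he was happy") yields three clauses. The first comes from the cue "ثم" and the second from the Arabic comma. Cues attached as clitics (for example the conjunction و) would need morphological analysis that this sketch omits.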

Segmentation strategies for spoken language recognition: evidence from semi-bilingual Japanese speakers of English

1996

Kiyoko Yoneyama

The present study investigates speech segmentation of English by semi-bilingual Japanese speakers of English. Two experiments were conducted with forty subjects. The first experiment used a syllable-monitoring task and the results did not show a moraic segmentation strategy, which is a language-specific segmentation strategy for native speakers of Japanese. This suggests that they employed a gen...

Bayesian Unsupervised Topic Segmentation

2008

Jacob Eisenstein, Regina Barzilay

This paper describes a novel Bayesian approach to unsupervised topic segmentation. Unsupervised systems for this task are driven by lexical cohesion: the tendency of well-formed segments to induce a compact and consistent lexical distribution. We show that lexical cohesion can be placed in a Bayesian context by modeling the words in each topic segment as draws from a multinomial language model a...
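
Lexical cohesion in a Bayesian setting can be illustrated by scoring each candidate segment with the marginal likelihood of its words under a multinomial whose parameters are integrated out against a symmetric Dirichlet prior. The best segmentation is then found with dynamic programming. This is a simplified sketch, not the authors' implementation. It assumes boundaries fall only between sentences, the number of segments `k` is given, the vocabulary is taken from the document, and `alpha=0.1` is a hypothetical prior value.

```python
from collections import Counter
from math import inf, lgamma


def segment_log_likelihood(tokens, alpha, vocab_size):
    """Log marginal likelihood of a token sequence under a multinomial whose
    parameters are integrated out against a symmetric Dirichlet(alpha) prior."""
    counts = Counter(tokens)
    n = len(tokens)
    score = lgamma(vocab_size * alpha) - lgamma(n + vocab_size * alpha)
    for c in counts.values():
        score += lgamma(c + alpha) - lgamma(alpha)
    return score


def segment_text(sentences, k, alpha=0.1):
    """Split tokenized sentences into k contiguous segments that maximize the
    summed segment log-likelihoods; returns the start index of each segment."""
    vocab_size = len({w for s in sentences for w in s})
    t = len(sentences)
    # best[j][i]: best score for covering sentences[:i] with j segments.
    best = [[-inf] * (t + 1) for _ in range(k + 1)]
    back = [[0] * (t + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, t + 1):
            for s in range(j - 1, i):
                if best[j - 1][s] == -inf:
                    continue
                tokens = [w for sent in sentences[s:i] for w in sent]
                cand = best[j - 1][s] + segment_log_likelihood(tokens, alpha, vocab_size)
                if cand > best[j][i]:
                    best[j][i], back[j][i] = cand, s
    # Walk the back-pointers to recover where each segment starts.
    bounds, i = [], t
    for j in range(k, 0, -1):
        bounds.append(back[j][i])
        i = back[j][i]
    return sorted(bounds)


sentences = [["cat", "dog"], ["dog", "cat"], ["stock", "market"], ["market", "stock"]]
print(segment_text(sentences, 2))  # [0, 2]
```

Segments with a concentrated vocabulary score higher, so the split falls where the word distribution changes. The sketch recomputes counts for every candidate span. That is fine for illustration but slow on long documents, where prefix count tables are the usual remedy.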

The Contribution of Lexical Resources to Natural Language Processing of CJK Languages

2006

Jack Halpern

The role of lexical resources is often understated in NLP research. The complexity of Chinese, Japanese and Korean (CJK) poses special challenges to developers of NLP tools, especially in the area of word segmentation (WS), information retrieval (IR), named entity extraction (NER), and machine translation (MT). These difficulties are exacerbated by the lack of comprehensive lexical resources, e...

Helyette: Inflectional Thesaurus for Agglutinative Languages

1993

Gábor Prószéky, László Tihanyi

The inflectional thesaurus is a tool which (1) first performs the morphological segmentation of the input wordform, then (2) finds its stem's lexical base(s), (3) stores the suffix sequence situated on the right of the actual stem-allomorph, (4) offers the synonyms for the lexical base(s), and (5) generates the new word-form consisting of the adequate allomorph of the chosen stem and the adequa...
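
The five steps listed above can be traced with a toy Hungarian example. The sketch analyzes "házban" ("in the house") as the stem "ház" plus the inessive suffix. It then picks a synonym and regenerates the form with the suffix allomorph that vowel harmony requires. This is a hypothetical illustration, not Helyette itself. The lexicon, the synonym pair, and the single suffix are invented toy data. The harmony rule is simplified, and stem allomorphy is ignored, so the stem is its own lexical base.

```python
BACK_VOWELS = set("aáoóuú")

# Hypothetical toy resources: one suffix with back/front allomorphs.
SUFFIXES = {("ban", "ben"): "INESSIVE"}
STEMS = {"ház", "épület"}
SYNONYMS = {"ház": ["épület"], "épület": ["ház"]}


def is_back(stem):
    # Simplified vowel harmony: any back vowel selects the back allomorph.
    return any(ch in BACK_VOWELS for ch in stem)


def analyze(wordform):
    """Steps 1-3: segment the wordform, find its base, store the suffix."""
    for (back, front), tag in SUFFIXES.items():
        for allomorph in (back, front):
            stem = wordform[: -len(allomorph)]
            if wordform.endswith(allomorph) and stem in STEMS:
                return stem, tag
    return wordform, None


def generate(stem, tag):
    """Step 5: attach the suffix allomorph that fits the chosen stem."""
    for (back, front), t in SUFFIXES.items():
        if t == tag:
            return stem + (back if is_back(stem) else front)
    return stem


def inflected_synonyms(wordform):
    stem, tag = analyze(wordform)                                  # steps 1-3
    candidates = SYNONYMS.get(stem, [])                            # step 4
    return [generate(s, tag) if tag else s for s in candidates]    # step 5


print(inflected_synonyms("házban"))  # ['épületben']
```

The suffix changes from "-ban" to "-ben" because "épület" contains only front vowels. This is the kind of allomorph selection the abstract's step (5) refers to.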

Automatic Paragraph Segmentation with Lexical and Prosodic Features

2016

Catherine Lai, Mireia Farrús, Johanna D. Moore

As long-form spoken documents become more ubiquitous in everyday life, so does the need for automatic discourse segmentation in spoken language processing tasks. Although previous work has focused on broad topic segmentation, detection of finer-grained discourse units, such as paragraphs, is highly desirable for presenting and analyzing spoken content. To better understand how different aspects...

Lexical and Sublexical Units in Speech Perception

Journal: Cognitive Science 2009

Ibrahima Giroux, Arnaud Rey

Saffran, Newport, and Aslin (1996a) found that human infants are sensitive to statistical regularities corresponding to lexical units when hearing an artificial spoken language. Two sorts of segmentation strategies have been proposed to account for this early word-segmentation ability: bracketing strategies, in which infants are assumed to insert boundaries into continuous speech, and clusterin...
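
One common way to formalize such statistical regularities is the forward transitional probability between adjacent syllables, TP(x → y) = count(xy) / count(x). A bracketing strategy then inserts a boundary wherever the TP dips below both neighboring TPs. The sketch below illustrates that idea only. The syllable stream is invented and is not the stimulus set of the cited studies, and the local-minimum rule is one assumed boundary criterion among several possible ones.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Forward TP(x -> y) = count(xy) / count(x as the first member of a pair)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def bracket(syllables):
    """Insert a boundary before syllable i when the TP into it is a strict
    local minimum (lower than the TPs on either side)."""
    if not syllables:
        return []
    tp = transitional_probabilities(syllables)
    tps = [tp[pair] for pair in zip(syllables, syllables[1:])]
    words, current = [], [syllables[0]]
    for i in range(1, len(syllables)):
        left = tps[i - 2] if i >= 2 else float("inf")
        right = tps[i] if i < len(tps) else float("inf")
        if tps[i - 1] < left and tps[i - 1] < right:
            words.append("".join(current))
            current = []
        current.append(syllables[i])
    words.append("".join(current))
    return words


# Invented trisyllabic "words" concatenated without pauses.
ORDER = ["tupiro", "golabu", "bidaku", "padoti", "golabu", "tupiro",
         "padoti", "bidaku", "tupiro", "bidaku", "golabu", "padoti"]
stream = [w[k:k + 2] for w in ORDER for k in (0, 2, 4)]

print(bracket(stream) == ORDER)  # True
```

Within-word TPs in this stream are 1.0, while TPs across word edges are 1/3 or 1/2. The dips therefore recover the word boundaries exactly.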

Phrase Structure in a Computational Model of Child Language Acquisition

2002

Helen Seville, Peter Hancox

The problem of the acquisition of morpho-syntactic rules, as addressed by a number of existing computational models, is introduced. A distinction is made between ‘innatist’ models which presuppose the importance of innate linguistic knowledge (specifically, syntactic categories and X-Bar Theory), and ‘empiricist’ models, which reject such assumptions. It is argued that ‘empiricist’ models bette...