Modeling online word segmentation performance in structured artificial languages

Authors

  • Stephan C. Meylan
  • Chigusa Kurumada
  • Mike Frank
  • Benjamin Börschinger
  • Mark Johnson

Abstract

Lexical dependencies abound in natural language: words tend to follow particular words or word categories. However, artificial language learning experiments exploring word segmentation have so far lacked such structure. In the present study, we explore whether simple inter-word dependencies influence the word segmentation performance of adult learners. We use a continuous testing paradigm instead of an experiment-final test battery. This reveals the trajectory of learning and allows detailed comparison with three computational models of word segmentation. Adult performance on languages with dependencies is equal to or lower than performance on languages without them. All three models perform worse on languages with dependencies, though a novel particle-filter-based lexical segmentation model produces learning curves most similar to those of human subjects.
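The abstract does not spell out how the artificial languages or the three models are built. The sketch below is therefore only a hypothetical illustration of what "inter-word dependencies" can mean at the level of a syllable stream. It does not reproduce the paper's stimuli or any of its models. It assumes an invented six-word vocabulary split into two categories, A and B. In the dependent language, categories strictly alternate. In the independent language, every word is drawn uniformly from the full vocabulary. For each stream, the sketch measures a simple transitional-probability (TP) statistic, TP(x → y) = count(xy) / count(x), separately within words and across word boundaries. One possible intuition for why a purely statistical learner might struggle with dependencies is that they raise TPs across boundaries and so make boundaries less distinct.

```python
import random
from collections import Counter

# Hypothetical vocabulary (not from the paper): 18 distinct CV syllables, so
# every within-word transition is fully predictable.
CATEGORY_A = ["tupiro", "golabu", "bidaku"]
CATEGORY_B = ["padoti", "meneso", "fagize"]


def syllabify(word):
    """Split a CVCVCV word into its two-letter CV syllables."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


def generate_words(n_words, dependent, rng):
    """Return n_words word tokens.

    dependent=False: each word is drawn uniformly from the whole vocabulary.
    dependent=True: categories alternate (A, B, A, B, ...), a hypothetical
    stand-in for a simple inter-word dependency.
    """
    vocab = CATEGORY_A + CATEGORY_B
    tokens = []
    for i in range(n_words):
        if dependent:
            pool = CATEGORY_A if i % 2 == 0 else CATEGORY_B
        else:
            pool = vocab
        tokens.append(rng.choice(pool))
    return tokens


def transitional_probabilities(syls):
    """Estimate TP(x -> y) = count(x y) / count(x) over adjacent syllable pairs."""
    pair_counts = Counter(zip(syls, syls[1:]))
    first_counts = Counter(syls[:-1])
    return {(x, y): c / first_counts[x] for (x, y), c in pair_counts.items()}


def mean_tps(tokens):
    """Return (mean within-word TP, mean across-boundary TP) for a token list."""
    syls, ends_word = [], []
    for tok in tokens:
        parts = syllabify(tok)
        # True marks the last syllable of a word: the transition that follows
        # it crosses a word boundary.
        ends_word.extend([False] * (len(parts) - 1) + [True])
        syls.extend(parts)
    tps = transitional_probabilities(syls)
    within, across = [], []
    for i in range(len(syls) - 1):
        tp = tps[(syls[i], syls[i + 1])]
        (across if ends_word[i] else within).append(tp)
    return sum(within) / len(within), sum(across) / len(across)


if __name__ == "__main__":
    rng = random.Random(0)
    for dependent in (False, True):
        within, across = mean_tps(generate_words(3000, dependent, rng))
        print(f"dependent={dependent}: within-word TP={within:.2f}, "
              f"boundary TP={across:.2f}")
```

Under these assumptions, within-word TP is exactly 1 in both languages. Boundary TP is roughly 1/6 without dependencies, because any of six words can follow. With dependencies it is roughly 1/3, because only the three words of the other category can follow. The dependent language therefore gives a TP-based learner a weaker boundary cue. This is a sketch of one statistic, not a claim about the mechanism behind the reported human or model results.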

Publication date: 2012