Towards unsupervised learning of constructions from text

نویسندگان

  • Krista Lagus
  • Oskar Kohonen
  • Sami Virpioja
چکیده

Statistical learning methods offer a route for identifying linguistic constructions. Phrasal constructions are interesting both from the viewpoint of cognitive modeling and for improving NLP applications such as machine translation. In this article, an initial model structure and search algorithm for attempting to learn constructions from plain text is described. An information-theoretic optimization criteria, namely the Minimum Description Length principle, is utilized. The method is applied to a Finnish corpus consisting of stories told by children.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning Constructions of Natural Language: Statistical Models and Evaluations

Aalto University, P.O. Box 11000, FI-00076 Aalto www.aalto.fi Author Sami Virpioja Name of the doctoral dissertation Learning Constructions of Natural Language: Statistical Models and Evaluations Publisher School of Science Unit Department of Information and Computer Science Series Aalto University publication series DOCTORAL DISSERTATIONS 158/2012 Field of research Computer and Information Sci...

متن کامل

Towards Unsupervised Automatic Speech Recognition Trained by Unaligned Speech and Text only

Automatic speech recognition (ASR) has been widely researched with supervised approaches, while many lowresourced languages lack audio-text aligned data, and supervised methods cannot be applied on them. In this work, we propose a framework to achieve unsupervised ASR on a read English speech dataset, where audio and text are unaligned. In the first stage, each word-level audio segment in the u...

متن کامل

Using Deep Learning Towards Biomedical Knowledge Discovery

A vast amount of knowledge exists within biomedical literature, publications, clinical notes and online content. Identifying hidden, interesting or previously unknown biomedical knowledge from free text resources using an automated approach remains an important challenge. Towards this problem, we investigate the use of deep learning methods that have shown significant promise in identifying hid...

متن کامل

Robust Error Detection: A Hybrid Approach Combining Unsupervised Error Detection and Linguistic Knowledge

This article presents a robust probabilistic method for the detection of context-sensitive spelling errors. The algorithm identifies lessfrequent grammatical constructions and attempts to transform them into more-frequent constructions while retaining similar syntactic structure. If the transformations result in lowfrequency constructions, the text is likely to contain an error. A first unsuper...

متن کامل

A Group Theoretic Perspective on Unsupervised Deep Learning

The modern incarnation of neural networks, now popularly known as Deep Learning (DL), accomplished record-breaking success in processing diverse kinds of signals vision, audio, and text. In parallel, strong interest has ensued towards constructing a theory of DL. This paper opens up a group theory based approach, towards a theoretical understanding of DL, in particular the unsupervised variant....

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009