Learning visually grounded words and syntax for a scene description task
Similar resources
Learning visually grounded words and syntax for a scene description task
A spoken language generation system has been developed that learns to describe objects in computer-generated visual scenes. The system is trained by a "show-and-tell" procedure in which visual scenes are paired with natural language descriptions. Learning algorithms acquire probabilistic structures which encode the visual semantics of phrase structure, word classes, and individual words. Using ...
Learning Visually Grounded Words and Syntax of Natural Spoken Language
Properties of the physical world have shaped human evolutionary design and given rise to physically grounded mental representations. These grounded representations provide the foundation for higher level cognitive processes including language. Most natural language processing machines to date lack grounding. This paper advocates the creation of physically grounded language learning machines as ...
Visually-Grounded Bayesian Word Learning
Learning the meaning of a novel noun from a few labeled objects is one of the simplest aspects of learning a language, but approximating human performance on this task is still a significant challenge for current machine learning systems. Current methods typically fail to find the appropriate level of generalization in a concept hierarchy for a given visual stimulus. Recent work in cognitive sc...
Learning Visually Grounded Sentence Representations
We introduce a variety of models, trained on a supervised image captioning corpus to predict the image features for a given caption, to perform sentence representation grounding. We train a grounded sentence encoder that achieves good performance on COCO caption and image retrieval and subsequently show that this encoder can successfully be transferred to various NLP tasks, with improved perfor...
The Origins of Syntax in Visually Grounded Robotic Agents
The paper proposes a set of principles and a general architecture that may explain how language and meaning originate and complexify in a group of physically grounded distributed agents. An experimental setup is introduced for concretising and validating specific mechanisms based on these principles. The setup consists of two robotic heads that watch a scene in which a robot moves around i...
Journal
Journal title: Computer Speech & Language
Year: 2002
ISSN: 0885-2308
DOI: 10.1016/s0885-2308(02)00024-4