SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph

Abstract

Existing multimodal conversation agents have shown impressive abilities to locate absolute positions or retrieve attributes in simple scenarios, but they fail to perform well when complex relative positions and information alignments are involved, which poses a bottleneck in response quality. In this paper, we propose a Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph (SPRING), with the abilities of reasoning over multi-hop spatial relations and connecting them with visual attributes in crowded situated scenarios. Specifically, we design two types of Multimodal Question Answering (MQA) tasks to pretrain the agent. All QA pairs utilized during pretraining are generated from novel Incremental Layout Graphs (ILG). QA pair difficulty labels automatically annotated by ILG are used to promote MQA-based Curriculum Learning. Experimental results verify SPRING's effectiveness, showing that it significantly outperforms state-of-the-art approaches on both the SIMMC 1.0 and SIMMC 2.0 datasets. We release our code and data at https://github.com/LYX0501/SPRING.
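
The idea of generating QA pairs from a layout graph and ordering them by difficulty can be illustrated with a small sketch. This is not the authors' implementation: the toy `LAYOUT` graph, the `QAPair` type, the question template, and the use of hop count as the difficulty label are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str
    hops: int  # assumed difficulty label: number of spatial relations chained


# Hypothetical toy layout graph: (item, relation) -> neighbouring item.
LAYOUT = {
    ("red jacket", "left of"): "blue shirt",
    ("blue shirt", "above"): "grey hat",
    ("grey hat", "right of"): "black coat",
}


def generate_qa(layout, start, max_hops):
    """Walk the graph from `start`, emitting one QA pair per path length."""
    pairs = []
    current = start
    relations = []
    for _ in range(max_hops):
        # Find an outgoing relation of the current item, if any.
        step = next(
            ((rel, obj) for (subj, rel), obj in layout.items() if subj == current),
            None,
        )
        if step is None:
            break
        rel, current = step
        relations.append(rel)
        question = (
            f"Starting at the {start}, follow: {' -> '.join(relations)}. "
            "Which item do you reach?"
        )
        pairs.append(QAPair(question, current, hops=len(relations)))
    return pairs


def curriculum_stages(pairs):
    """Group QA pairs into training stages, easiest (fewest hops) first."""
    stages = {}
    for pair in pairs:
        stages.setdefault(pair.hops, []).append(pair)
    return [stages[hops] for hops in sorted(stages)]


pairs = generate_qa(LAYOUT, "red jacket", max_hops=3)
for stage in curriculum_stages(pairs):
    for pair in stage:
        print(pair.hops, pair.question, "->", pair.answer)
```

In a curriculum-learning setup along these lines, a model would see the one-hop stage first and the multi-hop stages later; how SPRING actually assigns difficulty labels and schedules stages is described in the paper and the released code.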

Related resources

Incremental Dialogue Understanding and Feedback for Multiparty, Multimodal Conversation

In order to provide comprehensive listening behavior, virtual humans engaged in dialogue need to incrementally listen, interpret, understand, and react to what someone is saying, in real time, as they are saying it. In this paper, we describe an implemented system for engaging in multiparty dialogue, including incremental understanding and a range of feedback. We present an FML message extensio...

Langage de conversation multimodal pour agent conversationnel animé

This article falls within the realm of dialogue between a human and an Embodied Conversational Agent (ECA). We claim that a specific agent conversational language is needed for such interactions, based on the essential role of emotion in human communication. In order to define this language, we propose a library of Multimodal Conversation Acts, based in particular on speech acts and...

Patch Layout from Feature Graph

Structuring of surface meshes is a labor intensive task in reverse engineering. For example in CAD, scanned triangle meshes must be divided into characteristic/uniform patches to enable conversion into high-level spline surfaces. Typical industrial techniques, like rolling ball blends, are very labor intensive. We provide a novel, robust and quick algorithm for the automatic generation of a pat...

Social Interaction: Multimodal Conversation with Social Agents

including robotics, artificial life, and artificial ecosystems. We present a new approach to human-computer interaction, called social interaction. Its main characteristics are summarized by the following three points. First, interactions are realized as multimodal (verbal and nonverbal) conversation using spoken language, facial expressions, and so on. Second, the conversants are a group of huma...

Incremental Layout in DynaDAG

Effective techniques have been developed for some important families of graph layouts, such as hierarchies, planar embeddings, orthogonal grids and force-directed (spring) models [1]. These techniques have been incorporated in practical user interfaces that display static diagrams of relationships between objects [19, 18, 17]. Static diagrams are not completely satisfactory because in many situ...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i11.26562