SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph
Abstract
Existing multimodal conversation agents have shown impressive abilities to locate absolute positions or retrieve attributes in simple scenarios, but they fail to perform well when complex relative positions and information alignments are involved, which poses a bottleneck in response quality. In this paper, we propose a Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph (SPRING), with abilities of reasoning over multi-hop spatial relations and connecting them with visual attributes in crowded situated scenarios. Specifically, we design two types of Multimodal Question Answering (MQA) tasks to pretrain the agent. All QA pairs utilized during pretraining are generated from novel Incremental Layout Graphs (ILG). QA pair difficulty labels automatically annotated by ILG are used to promote MQA-based Curriculum Learning. Experimental results verify SPRING's effectiveness, showing that it significantly outperforms state-of-the-art approaches on both the SIMMC 1.0 and SIMMC 2.0 datasets. We release our code and data at https://github.com/LYX0501/SPRING.
Similar resources
Incremental Dialogue Understanding and Feedback for Multiparty, Multimodal Conversation
In order to provide comprehensive listening behavior, virtual humans engaged in dialogue need to incrementally listen, interpret, understand, and react to what someone is saying, in real time, as they are saying it. In this paper, we describe an implemented system for engaging in multiparty dialogue, including incremental understanding and a range of feedback. We present an FML message extensio...
Langage de conversation multimodal pour agent conversationnel animé
This article falls within the realm of dialogue between a human and an Embodied Conversational Agent (ECA). We claim that a specific agent conversational language is needed for such interactions, based on the essential role of emotion in human communication. In order to define this language, we propose a library of Multimodal Conversation Acts, based in particular on speech acts and...
Patch Layout from Feature Graph
Structuring of surface meshes is a labor intensive task in reverse engineering. For example in CAD, scanned triangle meshes must be divided into characteristic/uniform patches to enable conversion into high-level spline surfaces. Typical industrial techniques, like rolling ball blends, are very labor intensive. We provide a novel, robust and quick algorithm for the automatic generation of a pat...
Social Interaction: Multimodal Conversation with Social Agents
including robotics, artificial life, and artificial ecosystems. We present a new approach to human-computer interaction, called social interaction. Its main characteristics are summarized by the following three points. First, interactions are realized as multimodal (verbal and nonverbal) conversation using spoken language, facial expressions, and so on. Second, the conversants are a group of huma...
Incremental Layout in DynaDAG
Effective techniques have been developed for some important families of graph layouts, such as hierarchies, planar embeddings, orthogonal grids and force-directed (spring) models [1]. These techniques have been incorporated in practical user interfaces that display static diagrams of relationships between objects [19, 18, 17]. Static diagrams are not completely satisfactory because in many situ...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i11.26562