Search results for: multimodal input

Number of results: 250,965

2000
Jean-Claude MARTIN Sarah GRIMARD Katerina ALEXANDRI

With the success of multimedia and mobile devices, human-computer interfaces combining several communication modalities, such as speech and gesture, may lead to more "natural" human-computer interaction. Yet, developing multimodal interfaces requires an understanding (and thus the observation and analysis) of human multimodal behavior. In the field of annotation of multimodal corpus, there is no st...

2016
Emiliano Castellina Fulvio Corno

This chapter illustrates a multimodal system based on the integration of speech- and gaze-based inputs for interaction with a real desktop environment. In this system, multimodal interactions aim at overcoming the intrinsic limits of each input channel taken alone. The chapter introduces the main eye tracking and speech recognition technologies, and describes a multimodal system that integrates th...

Journal: :Journal of Object Technology 2004
Hicham Djenidi Amar Ramdane-Cherif Chakib Tadj Nicole Lévy

Multimodal human-computer interaction needs intelligent architectures in order to enhance the flexibility and naturalness of the user interface. These architectures have the ability to manage several multithreaded input signals from different input media in order to perform their fusion into intelligent commands. In this paper, a generic comprehensive agent-based architecture for multimodal eng...

2012
Jordan Hochenbaum Ajay Kapur

In this paper we describe Nuance, a software application for recording synchronous data streams from modern musical systems that involve audio and gesture signals. Nuance currently supports recording data from a number of input sources, including real-time audio and any sensor system, musical interface, or instrument that outputs serial, Open Sound Control (OSC), or MIDI. Nuance is unique in t...

2001
Wolfgang Wahlster Norbert Reithinger Anselm Blocher

SmartKom is a multimodal dialogue system that combines speech, gesture, and facial expressions for input and output. SmartKom provides an anthropomorphic and affective user interface through its personification of an interface agent. Understanding of spontaneous speech is combined with video-based recognition of natural gestures and facial expressions. One of the major scientific goals of Smart...

2009
Luís Paulo Reis Rodrigo A. M. Braga Márcio Sousa António Paulo Moreira

With the rising concern about the needs of people with physical disabilities and the aging of the population, there is growing interest in creating electronic devices that may improve the lives of physically handicapped and elderly people. One of these new solutions passes through the adaptation of electric wheelchairs in order to give them environmental perception, more intelligent capab...

2007
Sorin DUSAN Sriram RAMACHANDRAN

An emerging research direction in the field of pervasive computing is to voice-enable applications on handheld computers. Map-based applications can benefit the most from multimodal interfaces based on speech and pen input and on graphics and speech output. However, implementing automatic speech recognition and speech synthesis on handheld computers is constrained by the relatively low computation...

Journal: :CoRR 2017
Yang Xian Yingli Tian

In this paper, a self-guiding multimodal LSTM (sg-LSTM) image captioning model is proposed to handle an uncontrolled, imbalanced real-world image-sentence dataset. We collect the FlickrNYC dataset from Flickr as our testbed, with 306,165 images; the original text descriptions uploaded by the users are utilized as the ground truth for training. Descriptions in the FlickrNYC dataset vary dramatically, rang...

1995
Takuya Nishimoto Nobutoshi Shida Tetsunori Kobayashi Katsuhiko Shirai

This paper focuses on the utility of speech input. We propose some principles of human-computer interaction, consisting of the basic principles and organization principles of interfaces that are required for comfortable input systems. Then, applying these principles, we discuss the desired organization of an interface using speech, mouse, and keyboard, and design a multimodal drawing tool S-tgi...

Journal: :Int. J. Hum.-Comput. Stud. 2015
Stefan Schaffer Robert Schleicher Sebastian Möller

In this paper, we review three experiments with a mobile application that integrates graphical input via a touch screen with a speech interface, and we develop a model for input modality choice in multimodal interaction. The model aims to enable simulation of multimodal human-computer interaction for automatic usability evaluation. The experimental results indicate that modality efficiency and inpu...
