Crossmodal binding of audio-visual correspondent features
Authors
Abstract
Similar resources
Binding crossmodal object features in perirhinal cortex.
Knowledge of objects in the world is stored in our brains as rich, multimodal representations. Because the neural pathways that process this diverse sensory information are largely anatomically distinct, a fundamental challenge to cognitive neuroscience is to explain how the brain binds the different sensory features that comprise an object to form meaningful, multimodal object representations....
Introducing Crossmodal Biometrics: Person Identification from Distinct Audio & Visual Streams
Person identification using audio or visual biometrics is a well-studied problem in pattern recognition. In this scenario, both training and testing are done on the same modalities. However, there can be situations where this condition does not hold, i.e. training and testing have to be done on different modalities. This could arise, for example, in covert surveillance. Is there any person specif...
Audio-visual speaker conversion using prosody features
The article presents a joint audio-video approach to speaker identity conversion, based on statistical methods originally introduced for voice conversion. Using the experimental data from the 3D BIWI Audiovisual corpus of Affective Communication, mapping functions are built between each pair of speakers in order to convert speaker-specific features: the speech signal and 3D facial expressions. The...
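As an illustration of the kind of statistical mapping functions this abstract refers to, below is a minimal sketch of classic joint-GMM conversion: a Gaussian mixture is fit over aligned source/target feature frames, and conversion is the conditional mean E[y | x]. The function names, component count, and the use of scikit-learn and SciPy are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(src, tgt, n_components=4, seed=0):
    # src, tgt: time-aligned feature frames, shapes (n, d_x) and (n, d_y).
    # Fit one GMM on the stacked joint vectors [x; y].
    joint = np.hstack([src, tgt])
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", random_state=seed)
    gmm.fit(joint)
    return gmm

def convert(gmm, src):
    # Conditional-mean mapping E[y | x] under the joint GMM.
    d = src.shape[1]
    mu_x = gmm.means_[:, :d]              # (K, d_x)
    mu_y = gmm.means_[:, d:]              # (K, d_y)
    S = gmm.covariances_                  # (K, d_x+d_y, d_x+d_y)
    Sxx, Syx = S[:, :d, :d], S[:, d:, :d]
    # Responsibilities p(k | x) from the marginal GMM over x.
    logp = np.stack([multivariate_normal.logpdf(src, mu_x[k], Sxx[k])
                     for k in range(gmm.n_components)], axis=1)
    logp += np.log(gmm.weights_)
    logp -= logp.max(axis=1, keepdims=True)
    w = np.exp(logp)
    w /= w.sum(axis=1, keepdims=True)     # (n, K)
    # Per-component linear regression: mu_y + Syx Sxx^{-1} (x - mu_x).
    y = np.zeros((len(src), mu_y.shape[1]))
    for k in range(gmm.n_components):
        reg = Syx[k] @ np.linalg.inv(Sxx[k])
        y += w[:, [k]] * (mu_y[k] + (src - mu_x[k]) @ reg.T)
    return y
```

In this formulation the same machinery applies to any aligned speaker-specific feature streams, which is why it extends from speech parameters to 3D facial-expression features.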
Enhancing audio speech using visual speech features
This work presents a novel approach to speech enhancement by exploiting the bimodality of speech and the correlation that exists between audio and visual speech features. For speech enhancement, a visually-derived Wiener filter is developed. This obtains clean speech statistics from visual features by modelling their joint density and making a maximum a posteriori estimate of clean audio from v...
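To make the filtering step concrete, here is a minimal sketch of a frequency-domain Wiener gain in which the per-frame clean-speech power estimates are supplied by some upstream visual-feature model (a placeholder argument here, standing in for the paper's MAP estimate from the joint audio-visual density). The function name, spectral floor, and array shapes are illustrative assumptions.

```python
import numpy as np

def wiener_enhance(noisy_stft, clean_psd, noise_psd, floor=0.05):
    """Apply the Wiener gain H = P_s / (P_s + P_n) per frame and bin.

    noisy_stft: complex STFT of the noisy signal, shape (frames, bins)
    clean_psd:  clean-speech power estimates, same shape, e.g. produced
                by a visually-derived estimator (hypothetical upstream step)
    noise_psd:  noise power estimates, same shape
    """
    gain = clean_psd / (clean_psd + noise_psd + 1e-12)
    gain = np.maximum(gain, floor)  # spectral floor limits musical noise
    return gain * noisy_stft        # enhanced STFT; invert with an ISTFT
```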
Hierarchical discriminant features for audio-visual LVCSR
We propose the use of a hierarchical, two-stage discriminant transformation for obtaining audio-visual features that improve automatic speech recognition. Linear discriminant analysis (LDA) followed by a maximum likelihood linear transform (MLLT) is first applied to MFCC-based audio-only features, as well as to visual-only features, obtained by a discrete cosine transform of the video region of...
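A minimal sketch of the two-stage discriminant idea follows, using scikit-learn's LDA per modality and then again on the concatenated stream. The MLLT rotation is omitted here, and the dimensionalities, function name, and label source (e.g. HMM-state targets in an LVCSR system) are illustrative assumptions rather than the paper's setup.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

def hierarchical_discriminant(audio, visual, labels,
                              d_audio=40, d_visual=30, d_joint=60):
    # Stage 1: per-modality LDA projections (MLLT omitted in this sketch).
    # Note: n_components must be <= min(n_classes - 1, n_features).
    lda_a = LDA(n_components=d_audio).fit(audio, labels)
    lda_v = LDA(n_components=d_visual).fit(visual, labels)
    fused = np.hstack([lda_a.transform(audio), lda_v.transform(visual)])
    # Stage 2: a second discriminant transform on the fused audio-visual stream.
    lda_j = LDA(n_components=d_joint).fit(fused, labels)
    return lda_j.transform(fused)
```

The design point this illustrates is that fusion happens in discriminant space: each modality is first compacted toward class separability on its own, so the second stage learns cross-modal structure rather than raw-feature noise.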
Journal
Journal title: Journal of Vision
Year: 2005
ISSN: 1534-7362
DOI: 10.1167/5.8.874