A multilingual corpus for rich audio-visual scene description in a meeting-room environment
Abstract
In this paper, we present a multilingual database specifically designed for developing technologies for rich audio-visual scene description in meeting-room environments. Part of the database consists of the existing CHIL audio-visual recordings, whose annotations have been extended. An important objective of the newly recorded sessions was to include situations in which the semantic content cannot be extracted from a single modality. The database, which includes five hours of rather spontaneously generated scientific presentations, was manually annotated using standard or previously reported annotation schemes and will be made publicly available for research purposes.
Similar resources
The segmentation of multi-channel meeting recordings for automatic speech recognition
One major research challenge in the domain of the analysis of meeting room data is the automatic transcription of what is spoken during meetings, a task which has gained considerable attention within the ASR research community through the NIST rich transcription evaluations conducted over the last three years. One of the major difficulties in carrying out automatic speech recognition (ASR) on t...
A study of the tools and methods for creating privacy in the Zinat al-Molk House in Shiraz, in accordance with Quranic verses and Islamic hadiths
Introduction: Privacy is known as one of the most basic features of Islamic architecture. The home is the most private place for a person, so it is essential to provide confidentiality and privacy in it. Islam, Quranic verses, and the hadiths of the Prophet Mohammad and the imams have focused on creating privacy in houses. In this way, privacy has been the basic principle of traditional architectures ...
Audio-Visual Fused Online Context Analysis Toward Smart Meeting Room
Context-aware systems incorporate multimodal information to analyze contextual information in users’ environments and provide various proactive services according to the dynamic context. In this paper, a novel online context analysis framework is proposed to support context-aware computing in a smart meeting room. A novel dynamic context model is presented to model human group interactions. Robust aud...
AV16.3: An Audio-Visual Corpus for Speaker Localization and Tracking
Assessing the quality of a speaker localization or tracking algorithm on a few short examples is difficult, especially when the groundtruth is absent or not well defined. One step towards systematic performance evaluation of such algorithms is to provide time-continuous speaker location annotation over a series of real recordings, covering various test cases. Areas of interest include audio, vi...
Scene Understanding through Audio-Visual Fusion
Scene understanding involves the integration of a wide variety of information to produce a thorough description of the robot's environment. By integrating spatial, visual and audio cues, we could provide a greater amount of understanding than can be obtained using one of the modalities alone. In this paper, we describe our current work on using audition to enhance existing object detection and t...
Publication date: 2011