A multilingual corpus for rich audio-visual scene description in a meeting-room environment

Authors

Taras Butko

Climent Nadeu

Abstract

In this paper, we present a multilingual database specifically designed for developing technologies for rich audio-visual scene description in meeting-room environments. Part of the database consists of the existing CHIL audio-visual recordings, whose annotations have been extended. A key objective of the newly recorded sessions was to include situations in which the semantic content cannot be extracted from a single modality. The database, which includes five hours of largely spontaneous scientific presentations, was manually annotated using standard or previously reported annotation schemes and will be made publicly available for research purposes.

Similar sources

The segmentation of multi-channel meeting recordings for automatic speech recognition

One major research challenge in the domain of the analysis of meeting room data is the automatic transcription of what is spoken during meetings, a task which has gained considerable attention within the ASR research community through the NIST rich transcription evaluations conducted over the last three years. One of the major difficulties in carrying out automatic speech recognition (ASR) on t...

An investigation of the tools and methods for creating privacy in the Zinat-ol-Molk House of Shiraz in accordance with Islamic verses and traditions

Introduction: Privacy is known as one of the most basic features of Islamic architecture. The home is the most private place for a person, so it is essential to provide confidentiality and privacy within it. Islam, through the Quranic verses and the Hadiths of the Prophet Mohammad and the Imams, has emphasized creating privacy in houses. In this way, privacy has been the basic principle in traditional architectures ...

Audio-Visual Fused Online Context Analysis Toward Smart Meeting Room

Context-aware systems incorporate multimodal information to analyze contextual information in users’ environment and provide various proactive services according to dynamic context. In this paper, a novel online context analysis framework is proposed to support context-aware computing in smart meeting room. A novel dynamic context model is presented to model human group interactions. Robust aud...

AV16.3: An Audio-Visual Corpus for Speaker Localization and Tracking

Assessing the quality of a speaker localization or tracking algorithm on a few short examples is difficult, especially when the ground truth is absent or not well defined. One step towards systematic performance evaluation of such algorithms is to provide time-continuous speaker location annotation over a series of real recordings, covering various test cases. Areas of interest include audio, vi...

Scene Understanding through Audio-Visual Fusion

Scene understanding involves the integration of a wide variety of information to produce a thorough description of the robot's environment. By integrating spatial, visual and audio cues, we could provide a greater amount of understanding than can be obtained using one of the modalities alone. In this paper, we describe our current work on using audition to enhance existing object detection and t...

Publication date: 2011