The NESPOLE! VoIP Multilingual Corpora in Tourism and Medical Domains

Authors

  • Nadia Mana
  • Susanne Burger
  • Roldano Cattoni
  • Laurent Besacier
  • Victoria MacLaren
  • John W. McDonough
  • Florian Metze
Abstract

In this paper we present the multilingual VoIP (Voice over Internet Protocol networks) corpora collected for the second showcase of the NESPOLE! project in the tourism and medical domains. The corpora comprise over 20 hours of human-to-human monolingual dialogues in English, French, German and Italian: 66 dialogues in the tourism domain and 49 in the medical domain. We describe the data collection in detail (technical set-up, scenarios for each domain, recording procedure and data transcription), and present corpus statistics and a preliminary data analysis.

Similar resources

The Italian NESPOLE! Corpus: a Multilingual Database with Interlingua Annotation in Tourism and Medical Domains

This paper presents the Italian NESPOLE! Database. The database consists of three parts: the first two, called DB-1 and DB-2, concern the tourism domain, while the third part, DB-3, concentrates on the medical domain. The database includes audio files, transcriptions, Interlingua annotations in IF (Interchange Format) and translations into English, French and German. We describe how the database...

The NESPOLE! VoIP Dialogue Database

This paper presents the status of the NESPOLE! data collection as of the end of February 2001. A multilingual VoIP (Voice over Internet Protocol networks) database consisting of 200 dialogues in 4 languages (English, German, Italian and French) was recorded and transcribed. Dialogue speakers were connected via an H.323 video-conferencing terminal. We describe the task, the technical architecture, th...

NESPOLE!'s Multilingual and Multimodal Corpus

NESPOLE! is an EU/NSF jointly funded project exploring multilingual (speech-to-speech translation) and multimodal communication in e-services. The current system allows users speaking different languages (English, French, German and Italian) to interact in the tourism domain through the Internet using thin terminals (PCs with sound and video cards and H.323 video-conferencing software). Web pages...

The NESPOLE! Multimodal Speech-to-Speech Translation System: User Based System Improvements

This work discusses the results of two user studies aiming to evaluate the NESPOLE! speech-to-speech translation system, which provides for multilingual and multimodal communication in the tourism and in the medical domain, allowing users to interact through the Internet by sharing maps, web-pages and pen-based gestures. The purpose is to investigate the overall effectiveness of the combination...

Developing Parallel Sense-tagged Corpora with Wordnets

Semantically annotated corpora play an important role in natural language processing. This paper presents the results of a pilot study on building a sense-tagged parallel corpus, part of ongoing construction of aligned corpora for four languages (English, Chinese, Japanese, and Indonesian) in four domains (story, essay, news, and tourism) from the NTU-Multilingual Corpus. Each subcorpus is firs...

Publication date: 2003