The NESPOLE! voIP multilingual corpora in tourism and medical domains
نویسندگان
چکیده
In this paper we present the multilingual VoIP (Voice over Internet Protocol networks) corpora collected for the second showcase of the Nespole! project in the tourism and medical domains. The corpora comprise over 20 hours of human-tohuman monolingual dialogues in English, French, German and Italian: 66 dialogues in the tourism domain and 49 in the medical domain. We describe in detail the data collection (technical set-up, scenarios for each domain, recording procedure and data transcription), as well as statistically illustrated corpora and a preliminary data analysis.
منابع مشابه
The Italian NESPOLE! Corpus: a Multilingual Database with Interlingua Annotation in Tourism and Medical Domains
This paper presents the Italian NESPOLE! Database. The database consists of three parts: The first two, called DB-1 and DB-2 concern the tourism domain, while the third part, DB-3, concentrates on the medical domain. The database includes audio files, transcriptions, Interlingua annotations in IF (Interchange Format) and translations into English, French and German. We describe how the database...
متن کاملThe nespole! voIP dialogue database
This paper presents the status of the NESPOLE! data collection as of end of February, 2001. A multilingual VoIP (Voice over Internet Protocol networks) database consisting of 200 dialogues in 4 languages (English, German, Italian and French) was recorded and transcribed. Dialogue speakers were connected via a H323 video-conferencing terminal. We describe the task, the technical architecture, th...
متن کاملNESPOLE!'s Multilingual and Multimodal Corpus
NESPOLE! is a EU/NSF jointly funded project exploring multilingual (speech-to-speech translation) and multimodal communication in e-services. The current system allows users speaking different languages (English, French, German and Italian) to interact on the tourism domain through the Internet using thin terminals (PCs with sound and video cards and H323 video-conferencing software). Web pages...
متن کاملThe NESPOLE ! Multimodal Speech-to-Speech Translation System: User Based System Improvements
This work discusses the results of two user studies aiming to evaluate the NESPOLE! speech-to-speech translation system, which provides for multilingual and multimodal communication in the tourism and in the medical domain, allowing users to interact through the Internet by sharing maps, web-pages and pen-based gestures. The purpose is to investigate the overall effectiveness of the combination...
متن کاملDeveloping Parallel Sense-tagged Corpora with Wordnets
Semantically annotated corpora play an important role in natural language processing. This paper presents the results of a pilot study on building a sense-tagged parallel corpus, part of ongoing construction of aligned corpora for four languages (English, Chinese, Japanese, and Indonesian) in four domains (story, essay, news, and tourism) from the NTU-Multilingual Corpus. Each subcorpus is firs...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2003