Using a Language Technology Infrastructure for German in order to Anonymize German Sign Language Corpus Data

Authors

  • Julian Bleicken
  • Thomas Hanke
  • Uta Salden
  • Sven Wagner
Abstract

For publishing sign language corpus data on the web, anonymization is crucial even though it is impossible to hide the visual appearance of the signers: in a small community, even vague references to third persons may be enough to identify those persons. In the DGS Korpus (German Sign Language corpus) project, we want to publish data as a contribution to the cultural heritage of the sign language community while annotation of the data is still ongoing. This raises the question of how well anonymization can be achieved when no full linguistic analysis of the data is available. Our approach combines analyses of all the data we have, including named entity recognition on the translations into German, for which we use the WebLicht language technology infrastructure. We report on the reliability of these methods in this special context and also illustrate how the anonymization of the video data is technically implemented so as to disturb the viewer as little as possible.

Similar resources

Willingness to Communicate (WTC) among Beginning-level German Learners: Teaching German as a Foreign Language in a U.S. University Classroom

This action research examines the concept of Willingness to Communicate (WTC) in a second language acquisition context. The researcher investigated the contributors to WTC in a foreign language classroom setting. To this end, a method and sequence of multiple assignments was applied. Participants of this study were students who matriculated in a United States (U.S.) undergraduate program, studying G...

A German Sign Language Corpus of the Domain Weather Report

All systems for automatic sign language translation and recognition, in particular statistical systems, rely on adequately sized corpora. For this purpose, we created the Phoenix corpus that is based on German television weather reports translated into German Sign Language. It comes with a rich annotation of the video data, a bilingual text-based sentence corpus and a monolingual German corpus.

RWTH-PHOENIX-Weather: A Large Vocabulary Sign Language Recognition and Translation Corpus

This paper introduces the RWTH-PHOENIX-Weather corpus, a video-based, large vocabulary corpus of German Sign Language suitable for statistical sign language recognition and translation. In contrast to most available sign language data collections, the RWTH-PHOENIX-Weather corpus was recorded not for linguistic research but for use in statistical pattern recognition. The corpus contains...

The ATIS Sign Language Corpus

Systems that automatically process sign language rely on appropriate data. We therefore present the ATIS sign language corpus, which is based on the domain of air travel information. It is available in five languages: English, German, Irish Sign Language, German Sign Language and South African Sign Language. The corpus can be used for different tasks like automatic statistical translation and au...

Publication date: 2016