Using a Language Technology Infrastructure for German in order to Anonymize German Sign Language Corpus Data
Abstract
For publishing sign language corpus data on the web, anonymization is crucial even though it is impossible to hide the visual appearance of the signers: in a small community, even vague references to third persons may be enough to identify those persons. In the DGS Korpus (German Sign Language corpus) project, we want to publish data as a contribution to the cultural heritage of the sign language community while annotation of the data is still ongoing. This raises the question of how well anonymization can be achieved when no full linguistic analysis of the data is available. Essentially, we combine analyses of all the data we have, including named entity recognition on the translations into German. For this, we use the WebLicht language technology infrastructure. We report on the reliability of these methods in this special context and also illustrate how the anonymization of the video data is technically achieved so as to disturb the viewer as little as possible.
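The abstract does not say how named entity recognition output is turned into anonymization decisions. As a minimal sketch, assuming each German translation segment is aligned to a time span in the video and that the NER step yields entity labels such as `PER` (both assumptions, as are the hypothetical names `Segment`, `Entity` and `flag_segments`), the following Python collects the time spans whose translations mention a person, as candidates for manual review rather than automatic blurring.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A translation segment aligned to the video timeline (hypothetical structure)."""
    start_ms: int
    end_ms: int
    translation: str


@dataclass
class Entity:
    """A named entity found by NER in one segment's German translation."""
    segment_index: int  # index into the list of segments
    text: str
    label: str          # e.g. "PER"; the label set is an assumption


def flag_segments(segments, entities, labels=frozenset({"PER"})):
    """Return (start_ms, end_ms, sorted entity texts) for every segment whose
    translation contains an entity with one of the given labels.
    The result is a list of candidates for manual review, not a final decision."""
    hits = {}
    for ent in entities:
        if ent.label in labels:
            hits.setdefault(ent.segment_index, []).append(ent.text)
    return [
        (segments[i].start_ms, segments[i].end_ms, sorted(hits[i]))
        for i in sorted(hits)
    ]


# Illustrative data, invented for this sketch.
segments = [
    Segment(0, 2000, "Ich bin in Hamburg aufgewachsen."),
    Segment(2000, 4500, "Mein Lehrer hieß Herr Meier."),
    Segment(4500, 6000, "Das war eine schöne Zeit."),
]
entities = [
    Entity(0, "Hamburg", "LOC"),
    Entity(1, "Meier", "PER"),
]

print(flag_segments(segments, entities))
# [(2000, 4500, ['Meier'])]
```

Defaulting to `PER` mirrors the abstract's concern with references to third persons; passing a wider label set (for example `{"PER", "LOC"}`) would trade more manual review for fewer missed references.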
Similar resources
Willingness to Communicate (WTC) among Beginning-level German Learners: Teaching German as a Foreign Language in a U.S. University Classroom
This action research examines the concept of Willingness to Communicate (WTC) in a second language acquisition context. The researcher investigated the contributors of WTC in a foreign language classroom setting. Therefore, a multiple assignments method and sequence was applied. Participants of this study were students who matriculated in a United States (U.S.) undergraduate program, studying G...
A German Sign Language Corpus of the Domain Weather Report
All systems for automatic sign language translation and recognition, in particular statistical systems, rely on adequately sized corpora. For this purpose, we created the Phoenix corpus that is based on German television weather reports translated into German Sign Language. It comes with a rich annotation of the video data, a bilingual text-based sentence corpus and a monolingual German corpus.
RWTH-PHOENIX-Weather: A Large Vocabulary Sign Language Recognition and Translation Corpus
This paper introduces the RWTH-PHOENIX-Weather corpus, a video-based, large vocabulary corpus of German Sign Language suitable for statistical sign language recognition and translation. In contrast to most available sign language data collections, the RWTH-PHOENIX-Weather corpus was recorded not for linguistic research but for use in statistical pattern recognition. The corpus contains...
The ATIS Sign Language Corpus
Systems that automatically process sign language rely on appropriate data. We therefore present the ATIS sign language corpus that is based on the domain of air travel information. It is available for five languages: English, German, Irish Sign Language, German Sign Language and South African Sign Language. The corpus can be used for different tasks like automatic statistical translation and au...
Publication date: 2016