subtitles

Semantic Video Classification Based on Subtitles and Domain Terminologies

2007

Polyxeni Katsiouli Vassileios Tsetsos Stathes Hadjiefthymiades

In this paper we explore an unsupervised approach to classify video content by analyzing the corresponding subtitles. The proposed method is based on the WordNet lexical database and the WordNet domains and applies natural language processing techniques on video subtitles. The method is divided into several steps. The first step includes subtitle text preprocessing. During the next steps, a key...

Machine Translation of TV Subtitles for Large Scale Production

2010

Martin Volk Rico Sennrich Christian Hardmeier Frida Tidström

This paper describes our work on building and employing Statistical Machine Translation systems for TV subtitles in Scandinavia. We have built translation systems for Danish, English, Norwegian and Swedish. They are used in daily subtitle production and translate large volumes. As an example we report on our evaluation results for three TV genres. We discuss our lessons learned in the system de...

Wildlife recognition in nature documentaries with weak supervision from subtitles and external data

Journal: Pattern Recognition Letters 2016

Aparna Nurani Venkitasubramanian Tinne Tuytelaars Marie-Francine Moens

We propose a weakly supervised framework for domain adaptation in a multi-modal context for multi-label classification. This framework is applied to annotate objects such as animals in a target video with subtitles, in the absence of visual demarcators. We start from classifiers trained on external data (the source, in our setting ImageNet), and iteratively adapt them to the target dataset usin...

Improving speech recognition and keyword search for low resource languages using web data

2015

Gideon Mendels Erica Cooper Victor Soto Julia Hirschberg Mark J. F. Gales Kate Knill Anton Ragni Haipeng Wang

We describe the use of text data scraped from the web to augment language models for Automatic Speech Recognition and Keyword Search for Low Resource Languages. We scrape text from multiple genres including blogs, online news, translated TED talks, and subtitles. Using linearly interpolated language models, we find that blogs and movie subtitles are more relevant for language modeling of conver...
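
The linear interpolation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the unigram models, add-one smoothing, toy sentences, and the weights 0.7 and 0.3 are all assumptions chosen for illustration. The idea shown is only that the mixed probability is a weighted sum of per-source probabilities, with weights summing to 1.

```python
from collections import Counter


def unigram_model(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(models, weights):
    """Linearly interpolate models: P(w) = sum_i lambda_i * P_i(w)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    vocab = models[0].keys()
    return {w: sum(lam * m[w] for lam, m in zip(weights, models)) for w in vocab}


# Hypothetical toy corpora standing in for two scraped genres.
blogs = "so i was like yeah i know".split()
news = "the minister said the budget was approved".split()
vocab = sorted(set(blogs) | set(news))

p_blogs = unigram_model(blogs, vocab)
p_news = unigram_model(news, vocab)

# Hypothetical weights; in practice they would be tuned on held-out data.
p_mix = interpolate([p_blogs, p_news], [0.7, 0.3])
```

Giving the blog model the larger weight pulls the mixture toward conversational vocabulary, which is the kind of effect the abstract reports for blogs and movie subtitles.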

Overlay Text Detection in Complex Video Background

2010

Chun-Cheng Lin Bo-Min Yen

The subtitles on video are very useful for us to understand video's contents. If we can extract text information from the subtitles, it will be very helpful to establish a database of video’s content that includes annotations and indexes. To extract text from videos, most text detection and extraction methods use the text color, background contrast, and texture information. However, the limitat...

Luke, I am Your Father: Dealing with Out-of-Domain Requests by Using Movies Subtitles

2014

David Ameixa Luísa Coheur Pedro Fialho Paulo Quaresma

Even when the role of a conversational agent is well known users persist in confronting them with Out-of-Domain input. This often results in inappropriate feedback, leaving the user unsatisfied. In this paper we explore the automatic creation/enrichment of conversational agents’ knowledge bases by taking advantage of natural language interactions present in the Web, such as movies subtitles. Th...

The AMARA Corpus: Building Parallel Language Resources for the Educational Domain

2014

Ahmed Abdelali Francisco Guzmán Hassan Sajjad Stephan Vogel

This paper presents the AMARA corpus of on-line educational content: a new parallel corpus of educational video subtitles, multilingually aligned for 20 languages, i.e. 20 monolingual corpora and 190 parallel corpora. This corpus includes both resource-rich languages such as English and Arabic, and resource-poor languages such as Hindi and Thai. In this paper, we describe the gathering, validat...
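
The figure of 190 parallel corpora follows from the 20 languages if each parallel corpus is taken to be one unordered language pair (an assumption consistent with the numbers quoted in the abstract):

```latex
\binom{20}{2} = \frac{20 \cdot 19}{2} = 190
```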

Online Presentations with PowerPoint Present Live Real-Time Automated Captions and Subtitles: Perceptions of Faculty and Administrators

Journal: Online Learning 2022

Captioning of recorded videos is beneficial to many and is a matter of compliance with accessibility regulations and guidelines. Like captions, real-time captions can also be a means to implement the Universal Design for Learning checkpoint to offer text-based alternatives to auditory information. A cost-effective solution for live online presentations is to use speech recognition technologies to generate automated captions. In...

Dialogue Act Recognition for Text-based Sinhala

2015

Sudheera Palihakkara Dammina Sahabandu Ahsan Shamsudeen Chamika Bandara Surangika Ranathunga

This paper discusses the application of classical machine learning approaches to the task of Dialogue Act Recognition for text-based Sinhala. A study was carried out to identify a dialogue act tag set for Sinhala. A new corpus using Sinhala subtitles for English movies was created and was annotated with the selected dialogue acts. Evaluation of the dialogue act recognition system was performed ...

Large-scale Learning of Sign Language by Watching TV (Using Co-occurrences)

2013

Tomas Pfister James Charles Andrew Zisserman

We present a framework that automatically and quickly learns a large number of signs from sign language-interpreted TV broadcasts by exploiting supervisory information available in the subtitles. Our contributions are: (i) we show that, somewhat counter-intuitively, mouth patterns are highly informative for distinguishing words in a language for the Deaf, and their co-occurrence with signing ca...
