Search results for: captioning order
Number of results: 908879
Image captioning is a challenging computer vision task that aims to generate a natural language description of an image. Most recent research follows the encoder-decoder framework, which depends heavily on previously generated words for the current prediction. Such methods cannot effectively take advantage of future predicted information to learn complete semantics. In this paper, we propose Context-Aware Auxil...
The image captioning task has attracted great attention from many researchers, and significant progress has been made in the past few years. Existing models, which mainly apply an attention-based encoder-decoder architecture, achieve developments in captioning. These models, however, are limited in caption generation due to potential errors resulting from inaccurate detection of objects or incorrect objects. To alleviate l...
We integrated a practical digital video database system based on language and image analysis with components from digital video processing, still image search, information retrieval, and closed captioning processing. The aim is to utilize the multiple modalities of information in video and implement data fusion among them. Keyframes are extracted to represent shots based on vi...
In this paper, the application of LVCSR (Large Vocabulary Continuous Speech Recognition) technology is investigated for real-time, resource-limited broadcast closed captioning. The work focuses on transcribing live broadcast conversational speech to make such programs accessible to deaf viewers. Due to computational limitations, the real-time factor (RTF) and memory requirements are kept low during de...
While end-to-end neural machine translation (NMT) has achieved notable success in the past years in translating a handful of resource-rich language pairs, it still suffers from the data scarcity problem for low-resource language pairs and domains. To tackle this problem, we propose an interactive multimodal framework for zero-resource neural machine translation. Instead of being passively expos...
Human behavior understanding is arguably one of the most important mid-level components in artificial intelligence. To make efficient use of data, multi-task learning has been studied in diverse computer vision tasks, including human behavior understanding. However, multi-task learning relies on task-specific datasets, and constructing such datasets can be cumbersome. It requires huge a...
In this paper we present a Travel Blog Assistant System that facilitates travel blog writing by automatically selecting, for each blog paragraph written by the user, the most relevant images from an uploaded image set. To do this, the system first automatically adds metadata to the traveler’s photos based both on a Generic Visual Categorizer (visual keywords) and by exploiting cross-...