Grained Classification and Captioning Tasks
Authors
Abstract
Understanding concepts in the world remains one of the long-sought goals of machine learning. Whereas ImageNet enabled success in object recognition and various related tasks via transfer learning, the ability to understand the physical concepts prevalent in the world remains an unattained, yet desirable, goal. Video as a vision modality encodes how objects change over time with respect to pose, position, observer distance, and so on; it has therefore been studied extensively both as a data domain and as a way to probe “common sense” physical concepts of objects.
Similar Resources
Show-and-Fool: Crafting Adversarial Examples for Neural Image Captioning
Modern neural image captioning systems typically adopt the encoder-decoder framework consisting of two principal components: a convolutional neural network (CNN) for image feature extraction and a recurrent neural network (RNN) for caption generation. Inspired by the robustness analysis of CNN-based image classifiers to adversarial perturbations, we propose Show-and-Fool, a novel algorithm for ...
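For readers unfamiliar with the framework, the sketch below shows a minimal CNN encoder feeding an LSTM decoder, matching the generic two-component pipeline this abstract describes. All module names, dimensions, and the toy data are illustrative assumptions, not the paper's implementation.

```python
# Minimal CNN encoder + RNN decoder captioner; names and sizes are illustrative.
import torch
import torch.nn as nn

class CaptionEncoder(nn.Module):
    """Tiny CNN standing in for a pretrained image feature extractor."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, images):                # images: (B, 3, H, W)
        h = self.conv(images).flatten(1)      # (B, 64)
        return self.fc(h)                     # (B, feat_dim)

class CaptionDecoder(nn.Module):
    """LSTM that generates a caption conditioned on the image feature."""
    def __init__(self, vocab_size=1000, feat_dim=256, hid=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hid)
        self.init_h = nn.Linear(feat_dim, hid)
        self.lstm = nn.LSTM(hid, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab_size)

    def forward(self, feats, tokens):         # tokens: (B, T)
        h0 = self.init_h(feats).unsqueeze(0)  # (1, B, hid)
        c0 = torch.zeros_like(h0)
        out, _ = self.lstm(self.embed(tokens), (h0, c0))
        return self.out(out)                  # (B, T, vocab) logits

# Usage: one forward pass with random data.
enc, dec = CaptionEncoder(), CaptionDecoder()
imgs = torch.randn(2, 3, 64, 64)
toks = torch.randint(0, 1000, (2, 12))
print(dec(enc(imgs), toks).shape)  # torch.Size([2, 12, 1000])
```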
Video Captioning via Hierarchical Reinforcement Learning
Video captioning is the task of automatically generating a textual description of the actions in a video. Although previous work (e.g., sequence-to-sequence models) has shown promising results in producing a coarse description of a short video, it is still very challenging to caption a video containing multiple fine-grained actions with a detailed description. This paper aims to address the cha...
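One way to make the hierarchical idea concrete is a two-level decoder in which a slow "manager" RNN emits a goal vector every few steps and a fast "worker" RNN emits words conditioned on that goal. The sketch below is a loose rendering of that structure; the names, dimensions, and goal interval are assumptions, and it omits the reinforcement-learning training the paper's title refers to.

```python
# Loose sketch of a hierarchical caption decoder: a slow manager RNN sets
# sub-goals, a fast worker RNN emits words given the current goal.
import torch
import torch.nn as nn

class HierarchicalDecoder(nn.Module):
    def __init__(self, vocab=1000, dim=128, goal_every=4):
        super().__init__()
        self.goal_every = goal_every
        self.embed = nn.Embedding(vocab, dim)
        self.manager = nn.LSTMCell(dim, dim)      # updates the goal
        self.worker = nn.LSTMCell(2 * dim, dim)   # emits words given goal
        self.out = nn.Linear(dim, vocab)

    def forward(self, video_feat, tokens):        # video_feat: (B, dim)
        B, T = tokens.shape
        hm = cm = hw = cw = torch.zeros(B, self.manager.hidden_size)
        goal = torch.zeros(B, self.manager.hidden_size)
        logits = []
        for t in range(T):
            if t % self.goal_every == 0:           # manager ticks slowly
                hm, cm = self.manager(video_feat, (hm, cm))
                goal = hm
            x = torch.cat([self.embed(tokens[:, t]), goal], dim=-1)
            hw, cw = self.worker(x, (hw, cw))      # worker ticks every step
            logits.append(self.out(hw))
        return torch.stack(logits, dim=1)          # (B, T, vocab)

dec = HierarchicalDecoder()
feats = torch.randn(2, 128)                        # stand-in video feature
toks = torch.randint(0, 1000, (2, 10))
print(dec(feats, toks).shape)  # torch.Size([2, 10, 1000])
```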
Image Representations and New Domains in Neural Image Captioning
We examine the possibility that recent promising results in automatic caption generation are due primarily to language models. By varying the quality of image representations produced by a convolutional neural network, we find that a state-of-the-art neural captioning algorithm is able to produce quality captions even when provided with surprisingly poor image representations. We replicate this result i...
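The core experimental move, degrading the image representation before decoding, can be sketched as below. The specific corruption scheme (randomly masking feature dimensions and adding noise) is an assumption chosen for illustration, not necessarily what the study used.

```python
# Illustrative way to "vary image representation quality": corrupt a CNN
# feature vector to several degrees before handing it to a fixed decoder.
import torch

def degrade(feats: torch.Tensor, keep_frac: float, noise_std: float = 0.0):
    """Zero out a random (1 - keep_frac) fraction of feature dims, add noise."""
    mask = (torch.rand_like(feats) < keep_frac).float()
    return feats * mask + noise_std * torch.randn_like(feats)

feats = torch.randn(1, 256)                   # stand-in CNN feature
for keep in (1.0, 0.5, 0.1):
    f = degrade(feats, keep_frac=keep, noise_std=0.1)
    # In the actual experiment one would decode a caption from `f` and
    # score it (e.g. BLEU) against reference captions.
    print(keep, f.abs().mean().item())
```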
Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization
We propose a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent. Our approach, Gradient-weighted Class Activation Mapping (Grad-CAM), uses the gradients of any target concept (say, logits for ‘dog’ or even a caption) flowing into the final convolutional layer to produce a coarse localiz...
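The described procedure can be sketched directly: pool the gradients of a target score over the final convolutional feature maps to obtain channel weights, form the weighted sum of those maps, and apply a ReLU. The code below is a minimal Grad-CAM sketch using an untrained torchvision resnet18 and a random input purely for demonstration.

```python
# Minimal Grad-CAM: GAP the gradients of a target score w.r.t. the final
# conv feature maps into channel weights, then take a ReLU'd weighted sum.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
feats, grads = {}, {}

def fwd_hook(_, __, output):
    feats["a"] = output                        # (B, C, H, W) activations

def bwd_hook(_, __, grad_output):
    grads["a"] = grad_output[0]                # gradients w.r.t. activations

model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224, requires_grad=True)
scores = model(x)
scores[0, scores.argmax()].backward()          # score of the predicted class

weights = grads["a"].mean(dim=(2, 3), keepdim=True)   # GAP of gradients
cam = F.relu((weights * feats["a"]).sum(dim=1))       # (B, H, W) coarse map
cam = F.interpolate(cam.unsqueeze(1), size=x.shape[2:],
                    mode="bilinear", align_corners=False)
print(cam.shape)  # torch.Size([1, 1, 224, 224]), overlay on the input image
```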
Seeing with Humans: Gaze-Assisted Neural Image Captioning
Gaze reflects how humans process visual scenes and is therefore increasingly used in computer vision systems. Previous works demonstrated the potential of gaze for object-centric tasks, such as object localization and recognition, but it remains unclear if gaze can also be beneficial for scene-centric tasks, such as image captioning. We present a new perspective on gaze-assisted image captionin...
Publication date: 2018