Learning Image Embeddings using Convolutional Neural Networks for Improved Multi-Modal Semantics

نویسندگان

  • Douwe Kiela
  • Léon Bottou
چکیده

We construct multi-modal concept representations by concatenating a skip-gram linguistic representation vector with a visual concept representation vector computed using the feature extraction layers of a deep convolutional neural network (CNN) trained on a large labeled object recognition dataset. This transfer learning approach brings a clear performance gain over features based on the traditional bag-of-visual-word approach. Experimental results are reported on the WordSim353 and MEN semantic relatedness evaluation tasks. We use visual features computed using either ImageNet or ESP Game images.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Cystoscopy Image Classication Using Deep Convolutional Neural Networks

In the past three decades, the use of smart methods in medical diagnostic systems has attractedthe attention of many researchers. However, no smart activity has been provided in the eld ofmedical image processing for diagnosis of bladder cancer through cystoscopy images despite the highprevalence in the world. In this paper, two well-known convolutional neural networks (CNNs) ...

متن کامل

A multi-scale convolutional neural network for automatic cloud and cloud shadow detection from Gaofen-1 images

The reconstruction of the information contaminated by cloud and cloud shadow is an important step in pre-processing of high-resolution satellite images. The cloud and cloud shadow automatic segmentation could be the first step in the process of reconstructing the information contaminated by cloud and cloud shadow. This stage is a remarkable challenge due to the relatively inefficient performanc...

متن کامل

Learning Neural Audio Embeddings for Grounding Semantics in Auditory Perception

Multi-modal semantics, which aims to ground semantic representations in perception, has relied on feature norms or raw image data for perceptual input. In this paper we examine grounding semantic representations in raw auditory data, using standard evaluations for multi-modal semantics. After having shown the quality of such auditorily grounded representations, we show how they can be applied t...

متن کامل

Deep embodiment: grounding semantics in perceptual modalities

Multi-modal distributional semantic models address the fact that text-based semantic models, which represent word meanings as a distribution over other words, suffer from the grounding problem. This thesis advances the field of multi-modal semantics in two directions. First, it shows that transferred convolutional neural network representations outperform the traditional bag of visual words met...

متن کامل

Multi-Neighborhood Convolutional Networks

We explore the role of scale for improved feature learning in convolutional networks. We propose multi-neighborhood convolutional networks, designed to learn image features at different levels of detail. Utilizing nonlinear scale-space models, the proposed multineighborhood model can effectively capture fine-scale image characteristics (i.e., appearance) using a small-size neighborhood, while c...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014