Extracting Visual Patterns from Deep Learning Representations
Authors
Abstract
Vector-space word representations based on neural network models can include linguistic regularities, enabling semantic operations based on vector arithmetic. In this paper, we explore an analogous approach applied to images. We define a methodology to obtain large, sparse vectors from individual images and image classes, using a pre-trained model of the GoogLeNet architecture. We evaluate the vector space after processing 20,000 ImageNet images, and find it to be highly correlated with WordNet lexical distances. Further exploration of the image representations shows how semantically similar elements are clustered in that space regardless of large visual variance (e.g., 118 kinds of dogs), and how the space distinguishes abstract classes of objects without supervision (e.g., living things from non-living things). Finally, we consider vector arithmetic, and find it to be related to image concatenation (e.g., “horse cart − horse ≈ rickshaw”), image overlap (“Panda − Brown bear ≈ Skunk”) and regularities (“Panda is to Brown bear as Skunk is to Badger”). All these results indicate that visual semantics contain a large amount of general information, and that those semantics can be extracted as vector representations from neural network models, making them available for further learning and reasoning.
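The analogy test mentioned in the abstract (“Panda is to Brown bear as Skunk is to Badger”) can be sketched with standard vector arithmetic and cosine similarity. The toy binary feature vectors below are purely illustrative stand-ins for the paper's sparse GoogLeNet-derived embeddings; the labels and feature assignments are assumptions, not data from the paper.

```python
import numpy as np

# Toy stand-ins for sparse class embeddings (illustrative only):
# dimensions loosely read as [bear shape, black/white coat,
# small-mammal shape, large body].
embeddings = {
    "panda":      np.array([1.0, 1.0, 0.0, 1.0]),
    "brown_bear": np.array([1.0, 0.0, 0.0, 1.0]),
    "skunk":      np.array([0.0, 1.0, 1.0, 0.0]),
    "badger":     np.array([0.0, 0.0, 1.0, 0.0]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, embeddings):
    """Answer 'a is to b as ? is to c' by finding the label whose
    vector is closest to vec(a) - vec(b) + vec(c)."""
    query = embeddings[a] - embeddings[b] + embeddings[c]
    candidates = {k: v for k, v in embeddings.items() if k not in (a, b, c)}
    return max(candidates, key=lambda k: cosine(query, candidates[k]))

print(analogy("panda", "brown_bear", "badger", embeddings))  # -> skunk
```

With these vectors, panda − brown_bear isolates the black-and-white-coat feature, and adding badger lands exactly on skunk, mirroring the regularity the abstract describes.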
Similar references
Learning Deep Representations for Scene Labeling with Semantic Context Guided Supervision
Scene labeling is a challenging classification problem where each input image requires a pixel-level prediction map. Recently, deep-learning-based methods have shown their effectiveness on solving this problem. However, we argue that the large intra-class variation provides ambiguous training information and hinders the deep models’ ability to learn more discriminative deep feature representati...
Learning Deep Generative Models
Building intelligent systems that are capable of extracting high-level representations from high-dimensional sensory data lies at the core of solving many artificial intelligence–related tasks, including object recognition, speech perception, and language understanding. Theoretical and biological arguments strongly suggest that building such systems requires models with deep architectures that ...
Towards Deep Interpretability (mus-rover Ii): Learning Hierarchical Representations of Tonal Music
Music theory studies the regularity of patterns in music to capture concepts underlying music styles and composers’ decisions. This paper continues the study of building automatic theorists (rovers) to learn and represent music concepts that lead to human interpretable knowledge and further lead to materials for educating people. Our previous work took a first step in algorithmic concept learni...
A Deep Learning Architecture for Image Representation, Visual Interpretability and Automated Basal-Cell Carcinoma Cancer Detection
This paper presents and evaluates a deep learning architecture for automated basal cell carcinoma cancer detection that integrates (1) image representation learning, (2) image classification and (3) result interpretability. A novel characteristic of this approach is that it extends the deep learning architecture to also include an interpretable layer that highlights the visual patterns that con...
Extracting Visual Knowledge from the Web with Multimodal Learning
We consider the problem of automatically extracting visual objects from web images. Despite the extraordinary advancement in deep learning, visual object detection remains a challenging task. To overcome the deficiency of pure visual techniques, we propose to make use of meta text surrounding images on the Web for enhanced detection accuracy. In this paper we present a multimodal learning algor...
Journal: CoRR
Volume: abs/1507.08818
Issue: -
Pages: -
Publication date: 2015