Supplementary material: Spatio-temporal Person Retrieval via Natural Language Queries
Abstract
In this section, we provide further details of the dataset statistics.

The description length. We first analyze the description length (i.e., the number of words in a description). Figure 1 shows the distribution of the number of words per description; our dataset contains descriptions of various lengths. The average description length in our dataset is 13.1 words. Table 1 compares this average with those of other datasets. ReferIt [7] and Google RefExp [12] are referring-expression datasets, in which each description refers to only a single region in an image. The descriptions in VisualGenome [10], MSR-VTT [20] and MSCOCO [2] focus on regions in images, videos and whole images, respectively. Even though each description in our dataset focuses on a single person, our average description length is larger not only than those of the datasets whose descriptions focus on image regions, but also than those of the datasets whose descriptions cover whole images or videos. This suggests that the descriptions in our dataset tend to contain more detailed information than those in other datasets.

The number of annotated people in a clip. Figure 2 shows the distribution of the number of people annotated with bounding boxes and descriptions in a single clip. While many clips contain only one annotated person, some clips contain multiple annotated people.

The number of occurrences of each high-frequency word. Figure 4 shows the number of occurrences of the most frequent words (stop words are excluded). The high-frequency words span various types, such as colors, actions, clothes and places. Figure 5 compares the frequencies of the words in Figure 4 between our dataset and VisualGenome. While the frequencies of words describing colors (e.g., black, white, blue and red) and people (e.g., man, woman, girl and boy) in our dataset are close to those in VisualGenome, the ...
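The statistics above are simple corpus counts. The following is a minimal Python sketch of how they could be computed. It assumes that descriptions are plain strings and that annotations are (clip_id, person_id) pairs. The regex tokenizer, the tiny stop-word list and all function names are hypothetical, because the supplement does not specify how words were tokenized or which stop-word list was used.

```python
from collections import Counter
import re

# Hypothetical, deliberately tiny stop-word list; the source does not say
# which list was used, and a real analysis would use a much fuller one.
STOP_WORDS = {"a", "an", "the", "is", "in", "on", "of", "and", "with", "to", "his", "her"}


def tokenize(description: str) -> list[str]:
    """Lowercase and split into alphabetic word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z]+", description.lower())


def length_stats(descriptions: list[str]) -> tuple[Counter, float]:
    """Return (histogram of words per description, average words per description)."""
    lengths = [len(tokenize(d)) for d in descriptions]
    return Counter(lengths), sum(lengths) / len(lengths)


def people_per_clip(annotations: list[tuple[str, str]]) -> Counter:
    """Histogram of the number of distinct annotated people per clip.

    The same person may appear in several (clip_id, person_id) pairs,
    e.g. one per annotated frame, so person ids are collected in a set.
    """
    people: dict[str, set[str]] = {}
    for clip_id, person_id in annotations:
        people.setdefault(clip_id, set()).add(person_id)
    return Counter(len(ids) for ids in people.values())


def content_tokens(descriptions: list[str]) -> list[str]:
    """All tokens from all descriptions, with stop words removed."""
    return [w for d in descriptions for w in tokenize(d) if w not in STOP_WORDS]


def top_words(descriptions: list[str], k: int) -> list[tuple[str, int]]:
    """The k most frequent words, stop words excluded (ties keep first-seen order)."""
    return Counter(content_tokens(descriptions)).most_common(k)


def relative_frequencies(descriptions: list[str], words: list[str]) -> dict[str, float]:
    """Occurrences of each word divided by the number of non-stop-word tokens,
    so corpora of different sizes (e.g. this dataset vs. VisualGenome) can be
    compared on the same scale."""
    tokens = content_tokens(descriptions)
    counts = Counter(tokens)
    return {w: counts[w] / len(tokens) for w in words}
```

For example, with the three descriptions "A man in a black jacket walks on the street", "A woman with a red bag is sitting" and "A man in a white shirt stands", `length_stats` gives lengths of 10, 8 and 7 words (average 25/3), and `top_words(descs, 1)` returns `[("man", 2)]`. Normalizing by the token count, as `relative_frequencies` does, is one reasonable way to make a per-word comparison with a much larger corpus meaningful. The source does not state how Figure 5 was normalized.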
Publication date: 2017