dataset

A Versatile Polyphonic Music Dataset

2012

Zhiyao Duan Bryan Pardo

This is a polyphonic music dataset that can be used for a variety of research problems, such as multi-pitch estimation and tracking, audio-score alignment, and source separation. It consists of audio recordings of each individual part and of the full ensemble for ten four-part J.S. Bach chorales, as well as their MIDI scores, the ground-truth alignment between the audio and the score, the gr...

FCVID: Fudan-Columbia Video Dataset

2016

Yu-Gang Jiang Zuxuan Wu Jun Wang Xiangyang Xue Shih-Fu Chang

Recognizing visual contents in unconstrained videos has become a very important problem for many applications, such as Web video search and recommendation, smart content-aware advertising, robotics, etc. Existing datasets for video content recognition are either small or do not have reliable manual labels. In this work, we construct and release a new Internet video dataset called Fudan-Columbia...

The Berkeley 3D Object Dataset

2012

Allison Janoch Trevor Darrell Pieter Abbeel Jitendra Malik

SDSS Dataset and SkyServer Workloads

2000

Ioan Raicu

From what I see in Figure 1, of the 1.3 million files in DR4, only about 350K files contain any objects. It is also interesting that the objects are not uniformly distributed over the data files: the number of files with a low, medium, or high number of objects per file follows a roughly normal distribution. Comment [IR1]: Are the 320M objects I have from DR4 representative of all t...

30Music Listening and Playlists Dataset

2015

Roberto Turrin Massimo Quadrana Andrea Condorelli Roberto Pagano Paolo Cremonesi

We introduce the 30Music dataset, a collection of listening and playlist data retrieved from Internet radio stations through the Last.fm API. In this paper we describe its creation process, its content, and its possible uses. Attractive features of the 30Music dataset that differentiate it from existing public datasets include, among others, (i) the user listening sessions complete with contextu...

Research on Microarray Dataset Mining

2010

Miao Wang Xuequn Shang Zhanhuai Li

With the rapid progress of biotechnology in the post-genomic era, more and more biological information needs to be analyzed. Microarray data can reveal the structure of transcriptional gene regulation processes. In this paper, we give an overview of recent research on microarray data mining. We also introduce several works that have already been developed, are currently under development, or are planned as future work, which ar...

Columbia MVSO Image Sentiment Dataset

Journal: CoRR 2016

Vaidehi Dalmia Hongyi Liu Shih-Fu Chang

The Multilingual Visual Sentiment Ontology (MVSO) consists of 15,600 concepts in 12 different languages that are strongly related to emotions and sentiments expressed in images. These concepts are defined in the form of Adjective-Noun Pairs (ANPs), which are crawled and discovered from the online image-sharing platform Flickr. In this work, we used Amazon Mechanical Turk as a crowdsourcing platform to collect ...

The KIT Motion-Language Dataset

Journal: Big Data 2016

Matthias Plappert Christian Mandery Tamim Asfour

Linking human motion and natural language is of great interest for the generation of semantic representations of human activities as well as for the generation of robot activities based on natural language input. However, although there have been years of research in this area, no standardized and openly available data set exists to support the development and evaluation of such systems. We, th...

Accuracy Estimation With Clustered Dataset

2006

Ricco Rakotomalala Jean-Hugues Chauchat François Pellegrino

If the dataset available to machine learning results from cluster sampling (e.g. patients from a sample of hospital wards), the usual cross-validation error rate estimate can lead to biased and misleading results. An adapted cross-validation is described for this case. Using a simulation, the sampling distribution of the generalization error rate estimate, under cluster or simple random samplin...
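
The excerpt does not spell out the adapted procedure. One common way to avoid the bias it describes is to assign whole clusters (e.g. hospital wards) to cross-validation folds, rather than individual patients, so that no cluster contributes samples to both the training and the test side of a split. The Python sketch below illustrates that idea under this assumption only; the function name `cluster_kfold_indices` and the round-robin fold assignment are hypothetical choices, not necessarily the authors' method.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def cluster_kfold_indices(
    clusters: Sequence[Hashable], k: int, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs in which each cluster
    appears in exactly one test fold and never on both sides of a split.

    Assumption: clusters are shuffled and dealt to folds round-robin;
    the paper's own adapted cross-validation may differ.
    """
    # Sort first so the shuffle is reproducible for a given seed.
    unique = sorted(set(clusters), key=repr)
    if k < 2 or k > len(unique):
        raise ValueError("k must be between 2 and the number of clusters")
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Assign whole clusters (not individual samples) to folds.
    fold_of = {c: i % k for i, c in enumerate(unique)}
    splits = []
    for fold in range(k):
        test = [i for i, c in enumerate(clusters) if fold_of[c] == fold]
        train = [i for i, c in enumerate(clusters) if fold_of[c] != fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    # Hypothetical example: each sample is tagged with its hospital ward.
    wards = ["A", "A", "B", "B", "B", "C", "D", "D"]
    for train, test in cluster_kfold_indices(wards, k=2):
        print("train:", train, "test:", test)
```

Because the test fold always contains unseen clusters, the resulting error estimate reflects generalization to new clusters, which is the situation the abstract contrasts with simple random sampling.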

The Music Listening Histories Dataset

2017

Gabriel Vigliensoni Ichiro Fujinaga

We introduce the Music Listening Histories Dataset (MLHD), a large-scale collection of music listening events assembled from more than 27 billion time-stamped logs extracted from Last.fm. The logs are organized in the form of listening histories per user, and have been conveniently preprocessed and cleaned. Attractive features of the MLHD are the self-declared metadata provided by users at the ...