dataset generation

Creating the Dataset for the Western Wind and Solar Integration Study (U.S.A.)

2008

Cameron W. Potter, Debra Lew, Jim McCaa, Sam Cheng, Scott Eichelberger, Eric Grimit

The Western Wind and Solar Integration Study (WWSIS) is one of the world’s largest regional integration studies to date. This paper discusses the creation of the wind dataset that will be the basis for assessing the operating impacts and mitigation options due to the variability and uncertainty of wind power on the utility grids. The dataset is based on output from a mesoscale numerical weather...

DMDD: A Large-Scale Dataset for Dataset Mentions Detection

Journal: Transactions of the Association for Computational Linguistics, 2023

The recognition of dataset names is a critical task for automatic information extraction in scientific literature, enabling researchers to understand and identify research opportunities. However, existing corpora for dataset mention detection are limited in size and naming diversity. In this paper, we introduce the Dataset Mentions Detection Dataset (DMDD), the largest publicly available corpus for this task. DMDD consists m...

Ranking of Classifiers based on Dataset Characteristics using Active Meta Learning

2013

Nikita Bhatt, Amit Thakkar, Amit Ganatra, Nirav Bhatt

Classification is a machine learning technique used to categorize input patterns into different classes. Selecting the best classifier for a given dataset is one of the critical issues in classification. Using a cross-validation approach, it is possible to apply candidate algorithms to a given dataset and select the best classifier by considering various evaluation measure...

Benchmarking short text semantic similarity

Journal: IJIIDS, 2010

James O'Shea, Zuhair Bandar, Keeley A. Crockett, David McLean

Short Text Semantic Similarity measurement is a new and rapidly growing field of research. “Short texts” are typically sentence length but are not required to be grammatically correct. There is great potential for applying these measures in fields such as Information Retrieval, Dialogue Management and Question Answering. A dataset of 65 sentence pairs, with similarity ratings, produced in 2006 ...

Sequence to Sequence Model for Video Captioning

2017

Yu Guo, Bowen Yao, Yue Liu

Automatically generating video captions with natural language remains a challenge for both the fields of natural language processing and computer vision. Recurrent Neural Networks (RNNs), which model sequence dynamics, have proved to be effective in visual interpretation. Based on a recent sequence to sequence model for video captioning, which is designed to learn the temporal structure of the se...

A CNN Based Scene Chinese Text Recognition Algorithm With Synthetic Data Engine

Journal: CoRR, 2016

Xiaohang Ren, Kai Chen, Jun Sun

Scene text recognition plays an important role in many computer vision applications. The small size of publicly available scene text datasets is the main challenge when training a text recognition CNN model. In this paper, we propose a CNN based Chinese text recognition algorithm. To enlarge the dataset for training the CNN model, we design a synthetic data engine for Chinese scene char...

LUGS: A Scalable Non-parametric Data Synthesizer for Privacy-Preserving Health Data Publication

2013

This paper introduces a non-parametric data synthesizing algorithm to generate privacy-safe “realistic but not real” synthetic health data. The proposed algorithm synthesizes artificial records while preserving the statistical characteristics of the original data to the extent possible. The risk from a “database linking attack” is quantified by an l-diversified data generation process. Moreover it...

Domain Adaptation for Neural Networks by Parameter Augmentation

2016

Yusuke Watanabe, Kazuma Hashimoto, Yoshimasa Tsuruoka

We propose a simple domain adaptation method for neural networks in a supervised setting. Supervised domain adaptation is a way of improving the generalization performance on the target domain by using the source domain dataset, assuming that both of the datasets are labeled. Recently, recurrent neural networks have been shown to be successful on a variety of NLP tasks such as caption generatio...

GENERATION OF A BENCHMARK DATASET USING HISTORICAL PHOTOGRAPHS FOR AN AUTOMATED EVALUATION OF DIFFERENT FEATURE MATCHING METHODS

Journal: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2019

From Ontology for Genetic Interval (OGI) to Sequence Assembly - Ontology applying to next generation sequencing

2009

Yu Lin, Hiroshi Tarui, Peter Simons

We develop an OWL ontology, OGI (Ontology for Genetic Interval), for the formalization of genomic elements by defining them as a Genetic Interval. Based on OGI’s definition of Genetic Interval Relations, which are derived from the Allen interval calculus, we attempt to represent the relationships among contigs and sequence data from next generation sequencing. A real dataset generated from the b...
