imputation

minimac2: faster genotype imputation

2015

Christian Fuchsberger Gonçalo R. Abecasis David A. Hinds

Summary: Genotype imputation is a key step in the analysis of genome-wide association studies. Upcoming very large reference panels, such as those from The 1000 Genomes Project and the Haplotype Consortium, will improve imputation quality of rare and less common variants, but will also increase the computational burden. Here, we demonstrate how the application of software engineering techniques...

Partial imputation to improve predictive modelling in insurance risk classification using a hybrid positive selection algorithm and correlation-based feature selection

2012

Mlungisi Duma Bhekisipho Twala Fulufhelo V. Nelwamondo Tshilidzi Marwala

We propose a hybrid missing data imputation technique using positive selection and correlation-based feature selection for insurance data. The hybrid is used to help supervised learning methods improve their classification accuracy and resilience in the presence of increasing missing data. The positive selection algorithm searches for potential candidates for imputation and the correlation-base...

Imputation of coding variants in African Americans: better performance using data from the exome sequencing project

2013

Qing Duan Eric Yi Liu Paul L. Auer Guosheng Zhang Ethan M. Lange Chris Bizon Shuo Jiao Steven Buyske Nora Franceschini Chris S. Carlson Li Hsu Alex P. Reiner Ulrike Peters Jeffrey Haessler Keith Curtis Christina L. Wassel Jennifer G. Robinson Lisa W. Martin Christopher A. Haiman Loic Le Marchand Tara C. Matise Lucia A. Hindorff Dana C. Crawford Themistocles L. Assimes Hyun Min Kang Gerardo Heiss Rebecca D. Jackson Charles Kooperberg James G. Wilson Gonçalo R. Abecasis Kari E. North Deborah A. Nickerson Leslie A. Lange Yun Li

Summary: Although the 1000 Genomes haplotypes are the most commonly used reference panel for imputation, medical sequencing projects are generating large alternate sets of sequenced samples. Imputation in African Americans using 3384 haplotypes from the Exome Sequencing Project, compared with 2184 haplotypes from 1000 Genomes Project, increased effective sample size by 8.3–11.4% for coding vari...

Genotype imputation via matrix completion.

Journal: Genome Research 2013

Eric C Chi Hua Zhou Gary K Chen Diego Ortega Del Vecchyo Kenneth Lange

Most current genotype imputation methods are model-based and computationally intensive, taking days to impute one chromosome pair on 1000 people. We describe an efficient genotype imputation method based on matrix completion. Our matrix completion method is implemented in MATLAB and tested on real data from HapMap 3, simulated pedigree data, and simulated low-coverage sequencing data derived fr...

Impact of Missing Value Imputation on Classification for DNA Microarray Gene Expression Data—A Model-Based Study

2009

Youting Sun Ulisses Braga-Neto Edward R. Dougherty

Many missing-value (MV) imputation methods have been developed for microarray data, but only a few studies have investigated the relationship between MV imputation and classification accuracy. Furthermore, these studies are problematic in fundamental steps such as MV generation and classifier error estimation. In this work, we carry out a model-based study that addresses some of the issues in p...

Statistical Inference for Hardy-Weinberg Proportions in the Presence of Missing Genotype Information

2013

Jan Graffelman Milagros Sánchez Samantha Cook Victor Moreno

In genetic association studies, tests for Hardy-Weinberg proportions are often employed as a quality control checking procedure. Missing genotypes are typically discarded prior to testing. In this paper we show that inference for Hardy-Weinberg proportions can be biased when missing values are discarded. We propose to use multiple imputation of missing values in order to improve inference for H...

Effect of Missing Value Methods on Bayesian Network Classification of Hepatitis Data

2013

Nazziwa Aisha Mohd Bakri Adam Shamarina Shohaimi

Missing value imputation methods are widely used in solving missing value problems during statistical analysis. For classification tasks, these imputation methods can affect the accuracy of the Bayesian network classifiers. This paper studies the effect of missing value treatment on the prediction accuracy of four Bayesian network classifiers used to predict death in acute chronic Hepatitis pat...

Multiple imputation methods for bivariate outcomes in cluster randomised trials

2016

K DiazOrdaz M G Kenward M Gomes R Grieve

Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that i...

Evolving imputation strategies for missing data in classification problems with TPOT

Journal: CoRR 2017

Unai Garciarena Roberto Santana Alexander Mendiburu

Missing data has a ubiquitous presence in real-life applications of machine learning techniques. Imputation methods are algorithms conceived for restoring missing values in the data, based on other entries in the database. The choice of the imputation method has an influence on the performance of the machine learning technique, e.g., it influences the accuracy of the classification algorithm ap...

Comparison of imputation and modelling methods in the analysis of a physical activity trial with missing outcomes.

Journal: International Journal of Epidemiology 2005

Angela M Wood Ian R White Melvyn Hillsdon James Carpenter

BACKGROUND: Longitudinal studies almost always have some individuals with missing outcomes. Inappropriate handling of the missing data in the analysis can result in misleading conclusions. Here we review a wide range of methods to handle missing outcomes in single and repeated measures data and discuss which methods are most appropriate. METHODS: Using data from a randomized controlled trial to...
