Predicting phenotypes from high-dimensional genomes using gradient boosting decision trees

Authors

Abstract

Genomic selection (GS) is an emerging technique for predicting unknown phenotypes using genome-wide marker coverage, allowing the use of efficient computational models to select individuals with high phenotypic values as candidate breeding populations. However, GS remains challenging and inefficient in crop breeding due to the limited size of training populations, the nature of genotype-environment interactions, and complex interaction patterns between molecular markers. In this study, we use ensemble learning algorithms to construct a gradient boosted decision tree (GBDT) to achieve phenotype prediction from genotypic [...]. We trained the GBDT on a wheat dataset and compared its predictive performance with that of six other widely used models. The mean normalized discounted cumulative gain (MNDCG) method was used to evaluate the ability of each model to select individuals with high phenotypic values. The results of this study show that: (1) Bayesian models converge to reach a steady state only when a sufficient number of iterations are set. As the number of iterations increases, accuracy increases, but efficiency decreases significantly. When 200,000 iterations are performed, the five [...] models [...] similar [...] and converge to a smooth state, [...] 7.60% better than [...] overall, [...] 70 times that of [...] model. (2) Overall, [...] RRBLUP [...] best, [...] some traits, [...] still had higher [...]. (3) [...] influenced by [...] subset [...] markers [...] models, so reasonable [...] genetic data [...] appropriate [...] could improve [...]

Similar resources

Predicting Short-Term Subway Ridership and Prioritizing Its Influential Factors Using Gradient Boosting Decision Trees

Understanding the relationship between short-term subway ridership and its influential factors is crucial to improving the accuracy of short-term subway ridership prediction. Although there has been a growing body of studies on short-term ridership prediction approaches, limited effort is made to investigate the short-term subway ridership prediction considering bus transfer activities and temp...

Gradient Boosted Decision Trees for High Dimensional Sparse Output

In this paper, we study the gradient boosted decision trees (GBDT) when the output space is high dimensional and sparse. For example, in multilabel classification, the output space is an L-dimensional 0/1 vector, where L is the number of labels, which can grow to millions and beyond in many modern applications. We show that vanilla GBDT can easily run out of memory or encounter near-forever running ti...

Boosting Lazy Decision Trees

This paper explores the problem of how to construct lazy decision tree ensembles. We present and empirically evaluate a relevance-based boosting-style algorithm that builds a lazy decision tree ensemble customized for each test instance. From the experimental results, we conclude that our boosting-style algorithm significantly improves the performance of the base learner. An empirical comparison...

Boosting Decision Trees

A new boosting algorithm of Freund and Schapire is used to improve the performance of decision trees which are constructed using the information ratio criterion of Quinlan’s C4.5 algorithm. This boosting algorithm iteratively constructs a series of decision trees, each decision tree being trained and pruned on examples that have been filtered by previously trained trees. Examples that have been...

Multi-scale encoding of amino acid sequences for predicting protein interactions using gradient boosting decision tree

Nowadays a number of computational approaches have been developed to effectively and accurately predict protein interactions. However, most of these methods typically perform worse when other biological data sources (e.g., protein structure information, protein domains, or gene neighborhoods information) are not available. In the present work, we propose a method for predicting protein interact...

Journal

Journal title: IEEE Access

Year: 2022

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2022.3171341