Genetics Algorithm Feature Selection for Improving Aqueous Solubility Prediction
Abstract
Aqueous solubility is an important property of a compound when conducting chemical reactions. In this research, we develop several machine learning models for predicting the aqueous solubility of molecules. The open public dataset AqSolDB, which contains 9982 molecular solubility entries, was used for model development. Several regression models were trained on the dataset, and their performance was evaluated using the mean absolute error (MAE). We used tree-based models. The results showed that the best prediction was obtained with the Categorical Boosting (CatBoost) Regressor, which achieved an MAE of 0.854. The importance of the features affecting solubility can also be calculated from the tree-based models. This analysis shows that the variable MolLogP is strongly correlated with solubility. To further improve our model, we selected the input features of the machine-learning-based models using a genetic algorithm. The lowest error obtained was 0.771, an improvement over the earlier result without feature selection.
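The abstract does not specify how the genetic algorithm encodes or evolves feature subsets, so the following is only a minimal sketch under stated assumptions. Each candidate is a binary mask over the descriptor columns (1 = keep), and the fitness is the MAE, (1/n) Σ|yᵢ − ŷᵢ|, which the GA minimizes. The operators (tournament selection of size 3, single-point crossover, bit-flip mutation, keeping the two best masks) and all names (`ga_feature_selection`, `toy_fitness`, the synthetic data) are hypothetical choices for illustration. The toy fitness predicts the target as the plain sum of the kept features on synthetic data. It is not CatBoost and not AqSolDB. In a real setup the fitness would instead retrain the regressor on the kept columns and return its validation MAE.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Mask = Tuple[int, ...]  # 1 = feature kept, 0 = feature dropped


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE = (1/n) * sum_i |y_i - yhat_i|, the metric used to rank the models."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def ga_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],  # error to minimize, e.g. validation MAE
    pop_size: int = 20,
    generations: int = 40,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Generic GA over binary feature masks (operators are assumptions)."""
    if n_features < 2 or pop_size < 3:
        raise ValueError("need n_features >= 2 and pop_size >= 3")
    rng = random.Random(seed)
    cache: Dict[Mask, float] = {}

    def score(mask: Mask) -> float:
        # Cache results: in practice each call would retrain a regressor.
        if mask not in cache:
            cache[mask] = fitness(mask) if any(mask) else float("inf")
        return cache[mask]

    def tournament() -> Mask:
        # Pick the lowest-error mask out of 3 random individuals.
        return min(rng.sample(population, 3), key=score)

    population: List[Mask] = [
        tuple(rng.randint(0, 1) for _ in range(n_features)) for _ in range(pop_size)
    ]
    for _ in range(generations):
        next_gen = sorted(population, key=score)[:2]  # elitism: keep the 2 best
        while len(next_gen) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = tuple(1 - g if rng.random() < mutation_rate else g for g in child)
            next_gen.append(child)
        population = next_gen
    best = min(population, key=score)
    return best, score(best)


# Hypothetical toy data: 5 descriptors, target depends only on columns 0 and 2.
data_rng = random.Random(42)
X = [[data_rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(50)]
y = [row[0] + row[2] for row in X]


def toy_fitness(mask: Mask) -> float:
    # Stand-in "model" (NOT CatBoost): predict y as the sum of the kept features.
    preds = [sum(v for v, keep in zip(row, mask) if keep) for row in X]
    return mean_absolute_error(y, preds)


best_mask, best_mae = ga_feature_selection(5, toy_fitness, seed=1)
print(best_mask, best_mae)
```

The cache matters because, with a real regressor, each fitness call is a full training run. Keeping the two best masks each generation means the best error found never gets worse.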
Journal
Journal title: Journal of physics
Year: 2022
ISSN: ['0022-3700', '1747-3721', '0368-3508', '1747-3713']
DOI: https://doi.org/10.1088/1742-6596/2377/1/012016