Comparison of dimensionality reduction and clustering methods for SARS-CoV-2 genome

Abstract

This paper analyzes SARS-CoV-2 genome variation by comparing the clustering results of several algorithms and the distribution of sequences in each cluster. The algorithms used are K-means, Gaussian mixture models, agglomerative hierarchical clustering, mean-shift, and DBSCAN. However, clustering algorithms have a weakness in grouping data with very high dimensions, such as genome data, so a dimensionality reduction process is needed. In this research, dimensionality reduction is performed with principal component analysis (PCA) and an autoencoder method, with three models that produce 2, 10, and 50 features. The main contributions are the dimensionality reduction and clustering scheme and the performance experiments on the hyperparameters of each method. Based on the experiments conducted, PCA and DBSCAN achieve the highest silhouette score, 0.8770, with [...] clusters when two features are used; [...] need more iterations to converge. In testing on Indonesian sequences, more than half of them enter one cluster and the rest are distributed among the other clusters.
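The silhouette score reported in the abstract can be illustrated with a minimal sketch of its standard definition. This is not the authors' code: the function name `silhouette_score` and the toy 2-D points (standing in for a two-feature reduction) are assumptions made for illustration, and Euclidean distance is assumed.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).

    Hypothetical helper for illustration; it follows the standard
    definition s = (b - a) / max(a, b) for each point.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (dist(p, p) == 0, so dividing by len(own) - 1 excludes p itself)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(p, q) for q in members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Toy data: two tight, well-separated groups in two dimensions.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
print(f"good labelling: {good:.4f}, bad labelling: {bad:.4f}")
```

A score near 1 means points sit much closer to their own cluster than to the nearest other cluster, while a negative score suggests points were assigned to the wrong cluster, which is why the score is a natural way to compare combinations of reduction and clustering methods.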

Similar resources

Serology Testing for SARS-CoV-2: Benefits and Challenges

Since the World Health Organization (WHO) declared COVID-19 a pandemic in March 2020, there has been an emerging need to discuss different aspects of this pandemic. In any pandemic, valid and rapid laboratory diagnostic tests are critically important for early diagnosis, which increases the rate of successful treatment and, more importantly, prevents the spread of the disease.

Genome Organization of the SARS-CoV

Annotation of the genome sequence of the SARS-CoV (severe acute respiratory syndrome-associated coronavirus) is indispensable for understanding its evolution and pathogenesis. We have performed a full annotation of the SARS-CoV genome sequences using annotation programs that are publicly available or were developed by ourselves. In total, 21 open reading frames (ORFs) of genes or putative uncharacterized prote...

Using Dimensionality Reduction Methods in Text Clustering

High dimensionality of the feature space is one of the major concerns owing to computational complexity and accuracy consideration in the text clustering. Therefore, various dimension reduction methods have been introduced in the literature to select an informative subset (or sub list) of features. As each dimension reduction method uses a different strategy (aspect) to select a subset of featu...

SARS-CoV Genome Polymorphism: A Bioinformatics Study

A dataset of 103 SARS-CoV isolates (101 human patients and 2 palm civets) was investigated on different aspects of genome polymorphism and isolate classification. The number and the distribution of single nucleotide variations (SNVs) and insertions and deletions, with respect to a "profile", were determined and discussed ("profile" being a sequence containing the most represented letter per pos...

The Comparison of Susceptibility to SARS-CoV-2 Infection between Pediatric and Adults

SARS-CoV-2 causes coronavirus disease 2019 (COVID-19) and is responsible for the recent pandemic in the world. It has been recently recognized as a challenge for public health and a significant cause of severe illness in all age groups. Young children and older people are susceptible to SARS-CoV-2 infection. However, children usually present mild symptoms compared to adult patients. The relatio...

Journal

Journal title: Bulletin of Electrical Engineering and Informatics

Year: 2021

ISSN: 2302-9285

DOI: https://doi.org/10.11591/eei.v10i4.2803