RepBin: Constraint-Based Graph Representation Learning for Metagenomic Binning

Authors

Abstract

Mixed communities of organisms are found in many environments, from the human gut to marine ecosystems, and can have a profound impact on human health and the environment. Metagenomics studies the genomic material of such communities through high-throughput sequencing that yields DNA subsequences for subsequent analysis. A fundamental problem in the standard workflow, called binning, is to discover clusters of genomic subsequences associated with the constituent organisms. Inherent noise in the subsequences, various biological constraints that need to be imposed on them, and the skewed cluster size distribution exacerbate the difficulty of this unsupervised learning problem. In this paper, we present a new formulation using a graph where the nodes are subsequences and edges represent homophily information. In addition, we model biological constraints providing heterophilous signal about nodes that cannot be clustered together. We solve the binning problem by developing new algorithms for (i) graph representation learning that preserves both homophily relations and heterophily constraints and (ii) a constraint-based graph clustering method that addresses the problems of skewed cluster size distribution. Extensive experiments, on real and synthetic datasets, demonstrate that our approach, called RepBin, outperforms a wide variety of competing methods. Our constraint-based graph representation learning and clustering methods, that may be useful in other domains as well, advance the state-of-the-art in both metagenomics binning and graph representation learning.
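The formulation in the abstract can be illustrated with a small sketch. This is not the RepBin algorithm; it only models the inputs the abstract describes (nodes as subsequences, edges carrying homophily information, and constraints on pairs that cannot be clustered together) and scores a candidate binning against them. The names `BinningGraph` and `evaluate_binning`, and the subsequence ids, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set, Tuple


@dataclass
class BinningGraph:
    """Toy container (hypothetical): nodes are subsequence ids, edges carry
    homophily, constraints list pairs that must not share a cluster."""
    nodes: Set[str] = field(default_factory=set)
    edges: Set[FrozenSet[str]] = field(default_factory=set)
    constraints: Set[FrozenSet[str]] = field(default_factory=set)

    def add_edge(self, u: str, v: str) -> None:
        # Homophily edge: u and v are likely to belong to the same organism.
        self.nodes.update((u, v))
        self.edges.add(frozenset((u, v)))

    def add_constraint(self, u: str, v: str) -> None:
        # Heterophilous signal: u and v cannot be clustered together.
        self.nodes.update((u, v))
        self.constraints.add(frozenset((u, v)))


def evaluate_binning(graph: BinningGraph,
                     assignment: Dict[str, int]) -> Tuple[float, int]:
    """Return (fraction of homophily edges kept inside one cluster,
    number of constraints violated) for a node -> cluster-id assignment."""
    def same_cluster(pair: FrozenSet[str]) -> bool:
        u, v = tuple(pair)
        return assignment[u] == assignment[v]

    kept = sum(1 for e in graph.edges if same_cluster(e))
    violated = sum(1 for c in graph.constraints if same_cluster(c))
    # With no edges, homophily is treated as vacuously preserved.
    fraction = kept / len(graph.edges) if graph.edges else 1.0
    return fraction, violated


# Usage: four subsequences, three homophily edges, one constraint.
g = BinningGraph()
g.add_edge("s1", "s2")
g.add_edge("s3", "s4")
g.add_edge("s2", "s3")
g.add_constraint("s1", "s4")

two_bins = {"s1": 0, "s2": 0, "s3": 1, "s4": 1}
one_bin = {"s1": 0, "s2": 0, "s3": 0, "s4": 0}
print(evaluate_binning(g, two_bins))
print(evaluate_binning(g, one_bin))
```

Putting everything in one bin keeps every homophily edge but violates the constraint, which is the tension a constraint-aware method has to resolve.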


Similar resources

Metagenomic reads binning with spaced seeds

Article history: Received 23 February 2017 Received in revised form 16 May 2017 Accepted 21 May 2017 Available online xxxx


Graph-based Isometry Invariant Representation Learning

Learning transformation invariant representations of visual data is an important problem in computer vision. Deep convolutional networks have demonstrated remarkable results for image and video classification tasks. However, they have achieved only limited success in the classification of images that undergo geometric transformations. In this work we present a novel Transformation Invariant Gra...


Metagenomic binning through low density hashing

Bacterial microbiomes of incredible complexity are found throughout the world, from exotic marine locations to the soil in our yards to within our very guts. With recent advances in Next-Generation Sequencing (NGS) technologies, we have vastly greater quantities of microbial genome data, but the nature of environmental samples is such that DNA from different species are mixed together. Here, we...


A Framework for Generalizing Graph-based Representation Learning Methods

Random walks are at the heart of many existing deep learning algorithms for graph data. However, such algorithms have many limitations that arise from the use of random walks, e.g., the features resulting from these methods are unable to transfer to new nodes and graphs as they are tied to node identity. In this work, we introduce the notion of attributed random walks which serves as a basis fo...


Graph Representation Learning and Graph Classification

Many real-world problems are represented using graphs. For example, given a graph of a chemical compound, we want to determine whether it causes a gene mutation or not. As another example, given a graph of a social network, we want to predict a potential friendship that does not yet exist but is likely to appear soon. Many of these questions can be answered by using machine learning methods i...



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i4.20388