Deep Motif: Visualizing Genomic Sequence Classifications
Authors
Abstract
This paper applies a deep convolutional/highway MLP framework to classify genomic sequences on the transcription factor binding site task. To make the model understandable, we propose an optimization-driven strategy to extract "motifs", or symbolic patterns that visualize the positive class learned by the network. We show that our system, Deep Motif (DeMo), extracts motifs that are similar to, and in some cases outperform, the current well-known motifs. In addition, we find that a deeper model consisting of multiple convolutional and highway layers can outperform the single convolutional and fully connected layer used in the previous state of the art.
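The optimization-driven idea in the abstract can be illustrated with a minimal sketch: hold a trained classifier fixed and run gradient ascent on the input itself until the positive-class score is maximized, then read the result as a motif. The code below is not the authors' implementation. It assumes a toy linear scorer in place of the convolutional/highway network, and the names `TOY_WEIGHTS`, `extract_motif`, and `consensus`, as well as the L2 penalty and step sizes, are hypothetical choices made for illustration.

```python
# Toy illustration of optimization-driven motif extraction.
# Assumption: a linear scorer stands in for the trained deep network.

ALPHABET = "ACGT"

def score(x, w):
    """Positive-class score of the toy model: sum of w[i][c] * x[i][c]."""
    return sum(wi[c] * xi[c] for wi, xi in zip(w, x) for c in range(4))

def extract_motif(w, steps=200, lr=0.1, l2=0.5):
    """Gradient ascent on the input x to maximize score(x, w) - l2 * ||x||^2."""
    x = [[0.0] * 4 for _ in range(len(w))]
    for _ in range(steps):
        for i in range(len(w)):
            for c in range(4):
                # Analytic gradient of the toy objective with respect to x[i][c].
                # A real network would obtain this by backpropagating to the input.
                grad = w[i][c] - 2.0 * l2 * x[i][c]
                x[i][c] += lr * grad
    return x

def consensus(x):
    """Read the optimized matrix as a sequence: the top-scoring base per position."""
    return "".join(ALPHABET[max(range(4), key=lambda c: row[c])] for row in x)

# Hypothetical per-position weights for A, C, G, T (not taken from the paper).
TOY_WEIGHTS = [
    [0.1, 0.2, 0.1, 1.5],
    [1.2, 0.0, 0.3, 0.1],
    [0.2, 0.1, 0.0, 1.1],
    [1.4, 0.2, 0.1, 0.3],
]

motif = extract_motif(TOY_WEIGHTS)
print(consensus(motif))
```

The penalty term keeps the optimized input bounded; without some regularizer, ascent on an unbounded score would grow the input indefinitely. The optimized matrix can be read either as a consensus string, as here, or as a position-by-base matrix for drawing a sequence logo.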
Similar resources
Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks
Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) ...
Genetic Architect: Discovering Genomic Structure with Learned Neural Architectures
Each human genome is a 3 billion base pair set of encoding instructions. Decoding the genome using deep learning fundamentally differs from most tasks, as we do not know the full structure of the data and therefore cannot design architectures to suit it. As such, architectures that fit the structure of genomics should be learned not prescribed. Here, we develop a novel search algorithm, applica...
Functional motifs in Escherichia coli NC101
Escherichia coli (E. coli) bacteria can damage the DNA of gut lining cells and may encourage the development of colon cancer, according to recent reports. Genetic switches are specific sequence motifs, and many of them are drug targets. It is therefore of interest to identify motifs and their locations in sequences. In the present study, the Gibbs sampler algorithm was used to predict and find functional m...
Prototype Matching Networks for Large-Scale Multi-label Genomic Sequence Classification
One of the fundamental tasks in understanding genomics is the problem of predicting Transcription Factor Binding Sites (TFBSs). With hundreds of Transcription Factors (TFs) as labels, genomic-sequence based TFBS prediction is a challenging multi-label classification task. There are two major biological mechanisms for TF binding: (1) sequence-specific binding patterns on genomes known ...
Cloning and molecular characterization of TaERF6, a gene encoding a bread wheat ethylene response factor
Ethylene response factor proteins are important for regulating gene expression under different stresses. Different ERF isoforms have previously been isolated from bread wheat (Triticum aestivum L.) and related genera and named TaERF1 to TaERF5. We isolated, cloned and molecularly characterized a novel one based on TdERF1, an isoform in durum wheat (Tri...
Journal: CoRR
Volume: abs/1605.01133
Publication date: 2016