PairsDB atlas of protein sequence space

Authors

  • Andreas Heger
  • Eija Korpelainen
  • Taavi Hupponen
  • Kimmo Mattila
  • Vesa Ollikainen
  • Liisa Holm
Abstract

Sequence similarity searching against databases is a cornerstone of molecular biology. PairsDB is a database intended to make exploring protein sequences and their similarity relationships quick and easy. Behind PairsDB is a comprehensive collection of protein sequences and the BLAST and PSI-BLAST alignments between them. Instead of running BLAST or PSI-BLAST individually on each request, results are retrieved instantaneously from a database of pre-computed alignments. Filtering options allow you to find a set of sequences satisfying a set of criteria, for example, all human proteins with a solved structure and without transmembrane segments. PairsDB is continually updated and covers all sequences in UniProt. The data are stored in a MySQL relational database. Data files will be made available for download at ftp://nic.funet.fi/pub/sci/molbio. PairsDB can also be accessed interactively at http://pairsdb.csc.fi. PairsDB data are a valuable platform for building various downstream automated analysis pipelines. For example, the graph of all-against-all similarity relationships is the starting point for clustering protein families, delineating domains, improving alignment accuracy by consistency measures, and defining orthologous genes. Moreover, query-anchored stacked sequence alignments, profiles and consensus sequences are useful in studies of sequence conservation patterns for clues about possible functional sites.
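The lookup-and-filter idea described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the table and column names (`protein`, `alignment`, `has_structure`, `has_tm_segment`) are hypothetical and are not the PairsDB schema, the toy rows are invented, and the standard-library `sqlite3` module stands in for MySQL so the example is self-contained. The clustering step is plain single-linkage (connected components of the similarity graph), chosen as the simplest possible instance of "clustering protein families" and not as the method PairsDB pipelines use.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for a store of pre-computed alignments.

    The schema and the rows are hypothetical, for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE protein (
            id TEXT PRIMARY KEY,
            organism TEXT,
            has_structure INTEGER,
            has_tm_segment INTEGER
        );
        CREATE TABLE alignment (
            query_id TEXT,
            hit_id TEXT,
            evalue REAL
        );
        CREATE INDEX idx_alignment_query ON alignment (query_id);
    """)
    conn.executemany("INSERT INTO protein VALUES (?, ?, ?, ?)", [
        ("P1", "human", 1, 0),
        ("P2", "human", 1, 1),
        ("P3", "mouse", 0, 0),
        ("P4", "human", 1, 0),
        ("P5", "yeast", 0, 0),
    ])
    conn.executemany("INSERT INTO alignment VALUES (?, ?, ?)", [
        ("P1", "P2", 1e-30),
        ("P1", "P3", 1e-12),
        ("P1", "P4", 1e-50),
        ("P3", "P4", 1e-8),
    ])
    return conn


def filtered_hits(conn, query_id, max_evalue=1e-5):
    """Return stored hits for query_id that are human proteins with a
    solved structure and no transmembrane segment, best E-value first."""
    rows = conn.execute("""
        SELECT a.hit_id
        FROM alignment AS a
        JOIN protein AS p ON p.id = a.hit_id
        WHERE a.query_id = ? AND a.evalue <= ?
          AND p.organism = 'human'
          AND p.has_structure = 1
          AND p.has_tm_segment = 0
        ORDER BY a.evalue
    """, (query_id, max_evalue)).fetchall()
    return [row[0] for row in rows]


def similarity_clusters(conn, max_evalue=1e-5):
    """Group proteins into connected components of the similarity graph
    (single linkage), using a small union-find structure."""
    parent = {row[0]: row[0] for row in conn.execute("SELECT id FROM protein")}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = conn.execute(
        "SELECT query_id, hit_id FROM alignment WHERE evalue <= ?",
        (max_evalue,),
    ).fetchall()
    for query_id, hit_id in edges:
        parent[find(query_id)] = find(hit_id)

    clusters = {}
    for protein_id in parent:
        clusters.setdefault(find(protein_id), []).append(protein_id)
    return sorted(sorted(members) for members in clusters.values())
```

With the toy rows above, `filtered_hits(build_demo_db(), "P1")` keeps only `P4`, because `P2` has a transmembrane segment and `P3` is not human. The point of the design is that the expensive alignment step has already been done, so a request reduces to an indexed lookup plus a join on annotation columns.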


Similar resources

ADDA: a domain database with global coverage of the protein universe

We used the Automatic Domain Decomposition Algorithm (ADDA) to generate a database of protein domain families with complete coverage of all protein sequences. Sequences are split into domains and domains are grouped into protein domain families in a completely automated process. The current database contains domains for more than 1.5 million sequences in more than 40,000 domain families. In par...


Expression and Secretion of Human Granulocyte Macrophage-Colony Stimulating Factor Using Escherichia coli Enterotoxin I Signal Sequence

To secrete human granulocyte macrophage-colony stimulating factor (hGM-CSF) in Escherichia coli, hGM-CSF cDNA was fused in-frame next to the signal sequence of ST toxin (ST-I) of enterotoxigenic E. coli, containing 53 or 19 amino acids of signal peptide. The fused STsig::hGM-CSF coding fragments were inserted into a T7-based expression plasmid. The recombinant plasmids were ...


Design and construction of a clone expressing the anticoagulant drug desirudin (hirudin) extracellularly in Escherichia coli

Background and purpose: Hirudin is a 65-66 amino acid polypeptide secreted as an anticoagulant compound from the salivary glands of the medicinal leech. This drug is a very potent inhibitor of thrombin and is highly effective for the prevention of arterial and venous thrombosis. Therefore, it can compete with heparin. The aim of this study was to add a pelB signal peptide to pET-22b plasmid and to investi...


On the fine spectra of the generalized difference operator Delta_{uv} over the sequence space c0

The main purpose of this paper is to determine the fine spectrum of the generalized difference operator Delta_{uv} over the sequence space c0. These results are more general than the fine spectrum of the generalized difference operator Delta_{uv} of Srivastava and Kumar.


On the fine spectra of the Zweier matrix as an operator over the weighted sequence space $l_{p}(w)$

In the present paper, the fine spectrum of the Zweier matrix as an operator over the weighted sequence space $l_{p}(w)$ has been examined.



Journal title:

Volume 36, Issue

Pages -

Publication date: 2008