Prominent use of distal 5' transcription start sites and discovery of a large number of additional exons in ENCODE regions.

Authors

  • France Denoeud
  • Philipp Kapranov
  • Catherine Ucla
  • Adam Frankish
  • Robert Castelo
  • Jorg Drenkow
  • Julien Lagarde
  • Tyler Alioto
  • Caroline Manzano
  • Jacqueline Chrast
  • Sujit Dike
  • Carine Wyss
  • Charlotte N Henrichsen
  • Nancy Holroyd
  • Mark C Dickson
  • Ruth Taylor
  • Zahra Hance
  • Sylvain Foissac
  • Richard M Myers
  • Jane Rogers
  • Tim Hubbard
  • Jennifer Harrow
  • Roderic Guigó
  • Thomas R Gingeras
  • Stylianos E Antonarakis
  • Alexandre Reymond
Abstract

This report presents systematic empirical annotation of transcript products from 399 annotated protein-coding loci across the 1% of the human genome targeted by the Encyclopedia of DNA Elements (ENCODE) pilot project using a combination of 5' rapid amplification of cDNA ends (RACE) and high-density resolution tiling arrays. We identified previously unannotated and often tissue- or cell-line-specific transcribed fragments (RACEfrags), both 5' distal to the annotated 5' terminus and internal to the annotated gene bounds for the vast majority (81.5%) of the tested genes. Half of the distal RACEfrags span large segments of genomic sequences away from the main portion of the coding transcript and often overlap with the upstream-annotated gene(s). Notably, at least 20% of the resultant novel transcripts have changes in their open reading frames (ORFs), most of them fusing ORFs of adjacent transcripts. A significant fraction of distal RACEfrags show expression levels comparable to those of known exons of the same locus, suggesting that they are not part of very minority splice forms. These results have significant implications concerning (1) our current understanding of the architecture of protein-coding genes; (2) our views on locations of regulatory regions in the genome; and (3) the interpretation of sequence polymorphisms mapping to regions hitherto considered to be "noncoding," ultimately relating to the identification of disease-related sequence alterations.

Similar resources

Mycobacterium avium subsp. paratuberculosis induces differential cytosine methylation at miR-21 transcription start site region

Mycobacterium avium subspecies paratuberculosis (MAP), as an obligate intracellular bacterium, causes paratuberculosis (Johne’s disease) in ruminants. Plus, MAP has consistently been isolated from Crohn’s disease (CD) lesions in humans; a notion implying possible direct causative ...

Expression of the plasma prekallikrein gene: utilization of multiple transcription start sites and alternative promoter regions.

The plasma prekallikrein gene is expressed in many different human tissues at distinctly different levels and therefore tissue-specific control of the gene transcription is likely. In this study we demonstrate that transcription of the plasma prekallikrein gene can be initiated at multiple sites, for which at least four different promoters are utilized. A comparison of the genomic and mRNA sequ...

The Structure of the Rat Aggrecan Gene and Preliminary Characterization of Its Promoter

Aggrecan is a major structural component of cartilage extracellular matrix and a specific gene product of differentiated chondrocytes. cDNA clones have been used to isolate rat aggrecan genomic clones from phage and cosmid libraries, producing over 80 kilobases (kb) of overlapping DNA containing the complete rat aggrecan gene, including 12 kb of 5' and 8 kb of 3' flanking DNA. DNA sequencing show...

Characterization of the gene and promoter for RTI40, a differentiation marker of type I alveolar epithelial cells.

In an effort to understand the processes that establish and maintain the differentiated state of the alveolar epithelium, we have analyzed the gene for rat type I cell 40 kD protein (RTI40), an apical integral plasma membrane protein expressed in type I but not type II alveolar epithelial cells. The RTI40 gene spans 35 kilobase pairs; it contains 6 exons and at least 6 rat Identifier repetitive...

Thematic Minireview Series on Results from the ENCODE Project: Integrative Global Analyses of Regulatory Regions in the Human Genome

The Encyclopedia of DNA Elements (ENCODE) Project (http://www.genome.gov/10005107) is an international collaboration of research groups funded by the National Human Genome Research Institute, with the goal of delineating all functional elements encoded in the human genome (1). This project began in 2003 with a targeted analysis of a selected 1% of the human genome. The results from the pilot proj...

Journal:
  • Genome Research

Volume 17, Issue 6

Pages: -

Publication date: 2007