Large-scale RACE approach for proactive experimental definition of C. elegans ORFeome.
Abstract
Although a highly accurate sequence of the Caenorhabditis elegans genome has been available for 10 years, the exact transcript structures of many of its protein-coding genes remain unsettled. Approximately two-thirds of the ORFeome has been verified reactively by amplifying and cloning computationally predicted transcript models; still a full third of the ORFeome remains experimentally unverified. To fully identify the protein-coding potential of the worm genome including transcripts that may not satisfy existing heuristics for gene prediction, we developed a computational and experimental platform adapting rapid amplification of cDNA ends (RACE) for large-scale structural transcript annotation. We interrogated 2000 unverified protein-coding genes using this platform. We obtained RACE data for approximately two-thirds of the examined transcripts and reconstructed ORF and transcript models for close to 1000 of these. We defined untranslated regions, identified new exons, and redefined previously annotated exons. Our results show that as much as 20% of the C. elegans genome may be incorrectly annotated. Many annotation errors could be corrected proactively with our large-scale RACE platform.
Similar resources
WormBase as an integrated platform for the C. elegans ORFeome.
The ORFeome project has validated and corrected a large number of predicted gene models in the nematode C. elegans, and has provided an enormous resource for proteome-scale studies. To make the resource useful to the research and teaching community, it needs to be integrated with other large-scale data sets, including the C. elegans genome, cell lineage, neurological wiring diagram, transcripto...
C. elegans ORFeome version 3.1: increasing the coverage of ORFeome resources with improved gene predictions.
The first version of the Caenorhabditis elegans ORFeome cloning project, based on release WS9 of Wormbase (August 1999), provided experimental verifications for approximately 55% of predicted protein-encoding open reading frames (ORFs). The remaining 45% of predicted ORFs could not be cloned, possibly as a result of mispredicted gene boundaries. Since the release of WS9, gene predictions have i...
Closing in on the C. elegans ORFeome by cloning TWINSCAN predictions.
The genome of Caenorhabditis elegans was the first animal genome to be sequenced. Although considerable effort has been devoted to annotating it, the standard WormBase annotation contains thousands of predicted genes for which there is no cDNA or EST evidence. We hypothesized that a more complete experimental annotation could be obtained by creating a more accurate gene-prediction program and t...
Toward improving Caenorhabditis elegans phenome mapping with an ORFeome-based RNAi library.
The recently completed Caenorhabditis elegans genome sequence allows application of high-throughput (HT) approaches for phenotypic analyses using RNA interference (RNAi). As large phenotypic data sets become available, "phenoclustering" strategies can be used to begin understanding the complex molecular networks involved in development and other biological processes. The current HT-RNAi resourc...
WorfDB: the Caenorhabditis elegans ORFeome Database
WorfDB (Worm ORFeome DataBase; http://worfdb.dfci.harvard.edu) was created to integrate and disseminate the data from the cloning of complete set of approximately 19 000 predicted protein-encoding Open Reading Frames (ORFs) of Caenorhabditis elegans (also referred to as the 'worm ORFeome'). WorfDB serves as a central data repository enabling the scientific community to search for availability a...
Journal:
- Genome research
Volume 19, Issue 12
Pages: -
Publication date: 2009