Resequencing Data of 20 Arabidopsis Ecotypes

Authors

  • Georg Zeller
  • Richard Clark
  • Gunnar Rätsch
  • Daniel Huson
  • Detlef Weigel
Abstract

This diploma thesis describes work on a chip resequencing project covering 20 ecotypes, that is, accessions from natural populations, of the plant model species Arabidopsis thaliana. Chip resequencing primarily aims at identifying single nucleotide polymorphisms (SNPs), the most abundant class of naturally occurring sequence variation. For resequencing, DNA microarrays are employed on which a genome-wide tiling of 25-mer probes is spotted. These probes are designed to be complementary to an a priori reference genome sequence. For each interrogated site, probes with each of the four possible nucleotides at the middle position are represented, so that at a SNP position a nucleotide substitution in the interrogated genome will generally produce a hybridization signal that is strongest for the corresponding non-reference probe. The huge data set resulting from the resequencing of 20 genomes of ∼125 Mb each has been stored in a MySQL database, and a viewer has been implemented in Java for graphical display of resequencing data retrieved directly from the database. Part of this thesis is a basic characterization of the resequencing data. Intensity and specificity of hybridization vary widely, with intensities differing more than 10-fold in extreme cases. Examinations revealed that this variability is caused in part by experimental factors and in part by sequence properties of the probe. High AT content and self-complementarity, which favors hairpin formation, negatively affect hybridization, whereas probes with high-complexity sequences, as measured by sequence entropy, hybridize better on average. To estimate the potential of a given probe for cross-hybridization to multiple DNA sequence tracts in the genome, a systematic search for repeated 25-mers in the reference genome has been conducted.
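A search for repeated 25-mers of the kind mentioned above can be illustrated with a minimal Python sketch. The abstract does not say how the search was implemented, so everything here is an assumption made for illustration: the function names, the choice to count matches on both strands through a canonical k-mer, and the in-memory counting, which would not scale unchanged to a genome of ∼125 Mb.

```python
from collections import Counter

# Translation table for complementing A/C/G/T; other characters pass through.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Map a k-mer and its reverse complement to one shared key."""
    return min(kmer, reverse_complement(kmer))


def repeated_kmer_starts(genome, k=25):
    """Return start positions whose k-mer occurs more than once.

    Occurrences on either strand are counted together (an assumption).
    """
    n_windows = len(genome) - k + 1
    counts = Counter(canonical(genome[i:i + k]) for i in range(n_windows))
    return [i for i in range(n_windows)
            if counts[canonical(genome[i:i + k])] > 1]
```

A probe whose start position appears in the returned list has at least one other exact match in the sequence and is therefore a candidate for cross-hybridization. The default `k=25` matches the probe length given in the abstract.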
The result of this search for repeated 25-mers suggests that more than 90% of false SNP calls in the reference ecotype, Col-0, are caused by cross-hybridization that the search identifies. The error rates for SNP calls in other ecotypes can be improved with a filter based on 25-mer matches. Finally, an algorithm has been developed for the prediction of large deletions from resequencing data. It is a comparative loss-of-signal approach that identifies regions where the target ecotype exhibits a strongly reduced hybridization signal relative to the reference. More than 700 deletions larger than 200 bp have been predicted for the ecotype Ler-1, some of which are accurate estimates of deletions known from dideoxy sequencing. The main obstacles for deletion calling are regions that are repetitive or produce an ambiguous hybridization signal from the reference, which leads to uncertainty about the start and end points of putative deletions. As the set of known large deletions in Ler-1 is incomplete, it is difficult to assess the specificity of our deletion calling heuristic. Indirect evaluations suggest that among the predictions the number of true deletions is higher than the number of false positives. A better assessment will be possible once some regions containing putative deletions have been sequenced.
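The comparative loss-of-signal idea can be sketched as follows. This is not the thesis's heuristic, whose details the abstract does not give. It is a simplified illustration that assumes one intensity value per genome position for target and reference, a hypothetical log2-ratio cutoff (`max_log_ratio`), and a minimum run length (`min_length`, set to 200 to mirror the size threshold mentioned above).

```python
import math


def is_reduced(target_signal, reference_signal, max_log_ratio):
    """True if the target signal is strongly reduced relative to the reference."""
    if reference_signal <= 0:
        return False  # no usable reference signal, so treat as ambiguous
    if target_signal <= 0:
        return True   # target signal lost entirely
    return math.log2(target_signal / reference_signal) <= max_log_ratio


def call_deletions(target, reference, max_log_ratio=-1.0, min_length=200):
    """Return half-open (start, end) runs of reduced signal.

    Only runs of at least min_length consecutive positions are reported.
    """
    calls = []
    run_start = None
    n = min(len(target), len(reference))
    for i in range(n):
        if is_reduced(target[i], reference[i], max_log_ratio):
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_length:
                calls.append((run_start, i))
            run_start = None
    # Close a run that extends to the end of the data.
    if run_start is not None and n - run_start >= min_length:
        calls.append((run_start, n))
    return calls
```

Positions with no usable reference signal end a run in this sketch, which shows in miniature why ambiguous reference hybridization makes the start and end points of a putative deletion uncertain.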

Similar resources

Detecting polymorphic regions in Arabidopsis thaliana with resequencing microarrays.

Whole-genome oligonucleotide resequencing arrays have allowed the comprehensive discovery of single nucleotide polymorphisms (SNPs) in eukaryotic genomes of moderate to large size. With this technology, the detection rate for isolated SNPs is typically high. However, it is greatly reduced when other polymorphisms are located near a SNP as multiple mismatches inhibit hybridization to arrayed oli...

Use of natural variation reveals core genes in the transcriptome of iron-deficient Arabidopsis thaliana roots

Iron (Fe) is an essential mineral micronutrient for plants and animals. Plants respond to Fe deficiency by increasing root uptake capacity. Identification of gene networks for Fe uptake and homeostasis could result in improved crop growth and nutritional value. Previous studies have used microarrays to identify a large number of genes regulated by Fe deficiency in roots of three Arabidopsis eco...

PRIMe Update: Innovative Content for Plant Metabolomics and Integration of Gene Expression and Metabolite Accumulation

PRIMe (http://prime.psc.riken.jp/), the Platform for RIKEN Metabolomics, is a website that was designed and implemented to support research and analyses ranging from metabolomics to transcriptomics. To achieve functional genomics and annotation of unknown metabolites, we established the following PRIMe contents: MS2T, a library comprising >1 million entries of untargeted tandem mass spectrometr...

Array-based Genome Comparison of Arabidopsis Ecotypes using Hidden Markov Models

Abstract: Arabidopsis thaliana is an important model organism in plant biology with a broad geographic distribution including ecotypes from Africa, America, Asia, and Europe. The natural variation of different ecotypes is expected to be reflected to a substantial degree in their genome sequences. Array comparative genomic hybridization (Array-CGH) can be used to quantify the natural variation o...

A Genome Scan for Genes Underlying Microgeographic-Scale Local Adaptation in a Wild Arabidopsis Species

Adaptive divergence at the microgeographic scale has been generally disregarded because high gene flow is expected to disrupt local adaptation. Yet a growing number of studies reporting adaptive divergence at a small spatial scale highlight the importance of this process in evolutionary biology. To investigate the genetic basis of microgeographic local adaptation, we conducted a genome-wide scan...

Publication date: 2005