Introduction

Illumina sequencing of 27 cultivated and wild alfalfa transcriptomes: gene and single nucleotide polymorphism (SNP) discovery

Xuehui Li1, Ananta Acharya1, Andrew D. Farmer2, John A. Crow2, Arvind K. Bharti2, Yanling Wei1, Yuanhong Han1, Jiqing Gou1, Gregory D. May2, Maria J. Monteros1, E. Charles Brummer1

Introduction

Alfalfa, a perennial, outcrossing species, is a widely planted forage legume. Currently, improvement of cultivated alfalfa relies mainly on recurrent phenotypic selection. Marker-assisted breeding holds promise for accelerating alfalfa improvement but is constrained by the lack of a large number of markers. Because of its low cost and its time and labor efficiency, next-generation sequencing enables high-throughput discovery of SNPs, even in species with large, complex genomes. Our objective in this experiment was to increase the number of SNPs available for alfalfa research and molecular breeding.

Materials and Methods

We sequenced 27 alfalfa genotypes (23 cultivated tetraploids and four wild diploids). Total RNA was isolated from young and old stems and pooled for each genotype for Illumina sequencing. Each transcriptome was sequenced on a single lane of the Illumina Genome Analyzer IIx, producing about 17-32 million 72-bp reads. Quality-filtered reads were assembled de novo into contigs. To assess the representation and quality of the alfalfa assembly, BLASTx was performed against the annotated non-redundant GenBank protein database. SNPs were detected by realigning reads to the assembled contigs under the following conditions: (1) average quality of the bases calling the SNP >20; (2) number of uniquely aligned reads calling the SNP >=20; and (3) p value of the contingency test <0.01.

References

Li et al., 2011. Prevalence of segregation distortion in diploid alfalfa and its implications for genetics and breeding applications. Theor. Appl. Genet. 123:667-679.
Robins et al., 2007. Genetic mapping of biomass production in tetraploid alfalfa. Crop Sci. 47:1-10.
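The three SNP-calling conditions listed in Materials and Methods can be read as a simple pass/fail filter. The sketch below is only an illustration of that reading, not the pipeline used for the poster: the record type `CandidateSNP`, its field names, and the function `passes_filters` are hypothetical, and the contingency-test p value is taken as a precomputed input because the poster does not describe how it was calculated.

```python
from dataclasses import dataclass


@dataclass
class CandidateSNP:
    """Hypothetical record for one putative SNP (field names are assumptions)."""
    contig: str
    position: int
    mean_base_quality: float  # average quality of the bases calling the SNP
    unique_read_count: int    # uniquely aligned reads calling the SNP
    p_value: float            # p value of the contingency test, computed elsewhere


def passes_filters(snp, min_quality=20.0, min_reads=20, max_p=0.01):
    """Apply the three thresholds stated in Materials and Methods.

    Quality must be strictly greater than 20, the read count must be
    at least 20, and the p value must be strictly below 0.01.
    """
    return (
        snp.mean_base_quality > min_quality
        and snp.unique_read_count >= min_reads
        and snp.p_value < max_p
    )


# Made-up candidates, for illustration only.
candidates = [
    CandidateSNP("contig_1", 120, 30.0, 25, 0.001),  # meets all three conditions
    CandidateSNP("contig_1", 300, 20.0, 25, 0.001),  # quality is not above 20
    CandidateSNP("contig_2", 45, 35.0, 19, 0.001),   # too few supporting reads
]
kept = [snp for snp in candidates if passes_filters(snp)]
```

Keeping the thresholds as keyword arguments makes it easy to see how the number of retained SNPs would change under stricter or looser settings.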
Acknowledgment

This project is funded by the USDA National Institute of Food and Agriculture.

Results and Conclusion

• Sequencing of the 27 genotypes produced a total of 740 million reads (Table 1), assembly of which generated 25,183 contigs with a total length of 26.8 Mbp and an average length of 1,065 bp, giving an average read depth of 56-fold for each genotype.
• Overall, 21,954 (87.2%) of the 25,183 contigs matched 14,878 unique protein accessions.
• Realignment of reads to the contigs enabled the detection of 873,384 putative SNPs and 25,183 InDels. In total, 7,812 (31%) of the 25,183 contigs aligned to M. truncatula pseudomolecules version 3.5.1, carrying 298,771 SNPs and 9,205 InDels, which were widely distributed along the eight chromosomes (Figure 1).
• High Resolution Melting (HRM) analysis of 192 putative SNPs validated about 85% of them and confirmed the allele dosage inferred from sequencing (Figure 2a and 2b).
• Principal Component Analysis (PCA) with the 173,947 SNPs indicated that subspecies falcata is clearly separated from diploid caerulea and tetraploid sativa (cultivated tetraploid alfalfa) (Figure 3).
• Selected SNPs have been mapped to tetraploid and diploid alfalfa linkage maps previously constructed with RFLP and SSR markers (Li et al., 2011; Robins et al., 2007) (Figure 4).
• An alfalfa Illumina Infinium array with ~10,000 SNPs is being developed, which will enable high-throughput genotyping and facilitate genome-wide association studies and genomic selection in alfalfa.
• Our results demonstrate that next-generation transcriptome sequencing is an efficient way to discover high-quality SNPs for alfalfa. These ESTs and SNP markers could contribute effectively to future alfalfa research and breeding applications.

Table 1. The 27 genotypes used in this study and sequence statistics.

Figure 1. SNP distribution along the eight chromosomes of M. truncatula. The x-axis is the genome location for each chromosome.
The number of SNPs per 1,000 bp was calculated for each 0.5-million-base-pair interval and plotted on the y-axis.

Figure 2. Examples of high resolution melting analysis of SNPs: (a) validation of three SNP phenotypes; (b) validation of potential allele dosage in heterozygotes.

Figure 3. PCA of the 27 genotypes. Blue solid circles represent tetraploid sativa; red solid circles represent tetraploid falcata; blue triangles represent diploid caerulea; red triangles represent diploid falcata.

Figure 4. Physical map of M. truncatula (Build 3.0) and genetic linkage maps for one diploid (CC78) and one tetraploid (ABE408×Wis6) mapping population, based on RFLP, SSR, and SNP markers. The physical locations indicated on the maps are all on a scale of 5 × 10^5 base pairs. Markers in red on the linkage maps are SNPs.
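The density calculation described in the Figure 1 caption (SNPs per 1,000 bp in each 0.5-million-base-pair interval) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to draw the figure: the function name `snp_density_per_kb` is hypothetical, positions are assumed to be 1-based coordinates on a single chromosome, and a shorter final interval is normalized by its actual length, a choice the poster does not specify.

```python
from collections import Counter

WINDOW_BP = 500_000  # 0.5 million base pairs, as stated in the Figure 1 caption


def snp_density_per_kb(positions, chrom_length, window_bp=WINDOW_BP):
    """Return SNPs per 1,000 bp for each consecutive window on one chromosome.

    positions: 1-based SNP coordinates (assumption).
    chrom_length: chromosome length in bp.
    """
    # Count SNPs per window index; subtracting 1 converts to 0-based coordinates.
    counts = Counter((pos - 1) // window_bp for pos in positions)
    n_windows = -(-chrom_length // window_bp)  # ceiling division
    densities = []
    for w in range(n_windows):
        start = w * window_bp
        end = min(start + window_bp, chrom_length)
        # Normalize by the actual window length (assumption for the last window).
        densities.append(counts.get(w, 0) / (end - start) * 1000)
    return densities


# Made-up positions on a hypothetical 1,200,000-bp chromosome.
example = snp_density_per_kb([1, 250_000, 500_000, 500_001], 1_200_000)
```

Each returned value corresponds to one point on the y-axis of a plot like Figure 1, with the window start positions forming the x-axis.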
