
Pangenomics

Exploring and utilizing the genetic variation within the gene pool of modern crop species is a critical step in maintaining and improving crop productivity. Genetic variation, ranging from SNPs to large structural variants, can result in differences in gene content between individuals of the same species. The pan-genome concept was proposed to better capture this variation, because a single reference genome is insufficient to capture the complete genetic diversity of a species.


Presentation Transcript


  1. UNIVERSITY OF AGRICULTURAL SCIENCES, BANGALORE. DEPARTMENT OF GENETICS AND PLANT BREEDING. I Doctoral Seminar. Presented by: MARUTHI PRASAD B P, ID No. PAMB 1066

  2. POPULATION INCREASE, CLIMATE CHANGE, PEST AND DISEASE OUTBREAK: • Capturing maximum genetic variation • Time and accuracy • Understanding the genome of the crop

  3. Reference genome: a tool that serves as a base for crop improvement. Numerous sequencing efforts have been undertaken in plants and, as a result, reference genome sequences have become available for several crops, which serve as a base for crop improvement efforts. Tao et al. (2019)

  4. Is a single reference genome adequate? Comparative genome analysis is oriented around a single reference genome. What if our reference genome is too incomplete to capture all the information?

  5. Limitations of using a single reference genome: • Dynamic nature of the genome • Incomplete representation of genetic diversity • Biases in the reference genome

  6. BEYOND A SINGLE REFERENCE GENOME: The move from a single reference genome to multiple reference genomes will better illuminate the mining of genetic diversity for crop improvement by providing a more precise and comprehensive guiding principle. Bayer et al. (2020)

  7. APPROACHES, APPLICATIONS AND RECENT DEVELOPMENTS IN PANGENOMICS: Looking beyond the single reference genome

  8. Outline of Presentation

  9. What is a pangenome? • Genomic data derived from multiple accessions and cultivars • Full extent of sequence variation within a species • Pan-genomic approach to identify new genes and alleles directly related to phenotype. “A pangenome refers to the full complement of genes of a biological clade, such as a species, which can be partitioned into a set of core genes that are shared by all individuals and a set of dispensable genes that are partially shared or individual specific.”
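
The core/dispensable partition in the definition above can be illustrated with a minimal sketch. It assumes gene presence has already been called per accession; the accession names and gene IDs are hypothetical toy data, not taken from the presentation.

```python
from collections import Counter


def classify_genes(gene_sets):
    """Partition a pangenome into core, dispensable and private genes.

    gene_sets maps an accession name to the set of gene IDs found in it.
    Core genes occur in every accession; dispensable genes occur in only
    some; private genes are the dispensable genes seen in exactly one.
    """
    n = len(gene_sets)
    counts = Counter(g for genes in gene_sets.values() for g in genes)
    core = {g for g, c in counts.items() if c == n}
    dispensable = set(counts) - core
    private = {g for g in dispensable if counts[g] == 1}
    return core, dispensable, private


# Hypothetical toy accessions (illustration only).
accessions = {
    "acc1": {"g1", "g2", "g3"},
    "acc2": {"g1", "g2", "g4"},
    "acc3": {"g1", "g3", "g5"},
}
core, dispensable, private = classify_genes(accessions)
print("core:", sorted(core))
print("dispensable:", sorted(dispensable))
print("private:", sorted(private))
```

Real analyses usually relax the "present in all accessions" rule to tolerate annotation gaps; that threshold is a design choice the slides do not specify.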

  10. CLASSIFICATION OF GENES IN PANGENOME

  12. Difference between Core Genes and Dispensable Genes

  13. When and who? • Pangenomes were first introduced by Tettelin et al. in 2005 to describe gene diversity in Streptococcus agalactiae. • Pangenomics in plants was first proposed by Morgante et al. (2007). Photos: Herve Tettelin, Duccio Medini, Michele Morgante.

  14. Timeline of Developments in Pangenomic Research (Golicz et al., 2019). Timeline figure (year labels shown: 2006 to 2019) listing these milestones: review of analytical tools and models developed over 10 years of pangenome research (Vernikos et al., 2015); E. coli pangenome built using 1085 genomes; rice accessory genome characterized; human pangenome; bacterial upper kingdom pangenome; the pangenome introduced by Tettelin; human pangenome; pig pangenome; pangenome of the phytoplankton Emiliania huxleyi; plant pangenome concept proposed by Morgante et al.; pangenome of bread wheat and stiff brome; Brassica oleracea pangenome; poplar genome; soybean and wild relatives pangenome; maize transcriptome; rice pangenome built using 3010 accessions; Saccharomyces cerevisiae pangenome built using 10 isolates; Streptococcus pneumoniae pangenome; Escherichia coli pangenome.

  15. Timeline of Developments in Pangenomic Research (continued): • The first graph-based plant pan-genome was constructed in soybean • “Map-to-pan” strategy

  16. MAJOR DRIVING FORCES FOR SVs UNDERLYING THE VARIABLE SEQUENCES OF PLANT PAN-GENOMES

  17. 1. Transposable elements: insertion of TEs in regulatory regions (tb1, vgt1, ZmCCT10 and ZmCCT9)

  18. 2. Non-Allelic Homologous Recombination (NAHR)

  19. 3. Genetic Introgression/Horizontal Gene Transfer (HGT)

  20. 4. Biased gene loss (fractionation) in polyploid plants

  21. How is a pangenome generated? Li et al. (2022)

  22. Pan-genome workflow: 1. Selection of germplasm 2. Identification of genetic variation 3. Genotyping 4. Linking genotypic and phenotypic variation

  23. Selecting germplasm for a sequence assembly

  24. Sequencing: • NGS technology • TGS technology

  25. Assembly errors: missing gene

  26. PAN-GENOME ASSEMBLY METHODS: • Iterative assembly • Graph assembly • De novo assembly

  27. DE NOVO ASSEMBLY: • Error prone • Costly • High-quality data with high sequencing coverage is required

  28. De novo genome assembly: 1. Short/long reads 2. Contig assembly 3. Scaffold/chromosome assembly 4. Multiple alignment of genomic regions 5. Pan-genome construction

  29. ITERATIVE MAPPING: • Less expensive • Requires much less data • Permits the assessment of large numbers of individuals with relatively low sequencing coverage • Changes in the gene order

  30. ITERATIVE MAPPING: 1. Reference genome 2. Mapping of the reads to the reference sequence 3. Assembly of the unmapped reads 4. Building the pangenome
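
The iterative mapping steps above can be sketched as a toy loop. This is an illustration under strong simplifying assumptions, not a real pipeline: "mapping" is exact substring matching, and the "assembly" of unmapped reads is replaced by simply keeping each distinct unmapped read as a novel contig. The sequences and sample names are hypothetical.

```python
def iterative_pangenome(reference, samples):
    """Toy iterative mapping-and-assembly loop.

    reference: the starting reference sequence (a string).
    samples:   dict mapping a sample name to a list of reads (strings).
    Returns the list of pangenome contigs: the reference plus novel contigs.
    """
    pan = [reference]
    for name, reads in samples.items():
        # Step 2: "map" reads; a read maps if it occurs in any current contig.
        unmapped = [r for r in reads if not any(r in contig for contig in pan)]
        # Step 3: stand-in for assembly; keep each distinct unmapped read.
        novel = sorted(set(unmapped))
        # Step 4: extend the pangenome before processing the next sample.
        pan.extend(novel)
    return pan


# Hypothetical toy data (illustration only).
reference = "ACGTACGTTAGC"
samples = {
    "s1": ["ACGTAC", "GGGCCC"],
    "s2": ["GGGCCC", "TTTAAA", "TTAGC"],
}
pan = iterative_pangenome(reference, samples)
print(pan)
```

Because the pangenome grows after each sample, a novel sequence found in one sample is not added again when it reappears in a later one.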

  31. GRAPH-BASED ASSEMBLY: • Graph structure to represent the diversity of genomic sequences • Presents variation across multiple genomes as different paths along a graph of sequence or variant nodes
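
The idea of genomes as paths through shared sequence nodes can be shown with a minimal sketch. The node IDs, sequences and genome names below are hypothetical, and the plain dictionaries are an illustrative data structure, not the format of any particular graph tool.

```python
# Sequence nodes of a toy variation graph (hypothetical data).
nodes = {
    1: "ACG",
    2: "T",       # reference allele of a SNP
    3: "C",       # alternative allele of the same SNP
    4: "GGA",
    5: "CATCAT",  # insertion present in one accession only
    6: "TTG",
}

# Each genome is a path: an ordered walk over node IDs.
paths = {
    "ref": [1, 2, 4, 6],
    "acc1": [1, 3, 4, 6],
    "acc2": [1, 2, 4, 5, 6],
}


def spell(path):
    """Reconstruct a genome sequence by concatenating the nodes on its path."""
    return "".join(nodes[n] for n in path)


def edges(all_paths):
    """Collect the directed edges implied by consecutive nodes on any path."""
    return {(a, b) for p in all_paths.values() for a, b in zip(p, p[1:])}


def shared_nodes(all_paths):
    """Nodes traversed by every genome (the conserved backbone)."""
    return set.intersection(*(set(p) for p in all_paths.values()))


for name, path in paths.items():
    print(name, spell(path))
print("edges:", sorted(edges(paths)))
print("shared:", sorted(shared_nodes(paths)))
```

Nodes 2 and 3 form a "bubble" between nodes 1 and 4, which is how a SNP appears in such a graph; node 5 is an insertion that only one path visits.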

  32. Steps involved in graph-based pangenome assembly

  33. Software used in graph-based pangenome construction

  34. TYPES OF PANGENOME

  35. Ratio of core vs dispensable genes: Higher ploidy and a higher outcrossing rate provide an extra level of diversity and therefore a larger pangenome with a higher percentage of dispensable genes.

  36. Case study 1: Varshney, R. K. et al. (2021). Objective: To construct a chickpea pan-genome that provides insights into species divergence and the migration of the cultigen (C. arietinum), and to identify rare allele burden and fitness loss in chickpea.

  37. Results: • A chickpea pan-genome (592.58 Mb) was developed using an iterative mapping and assembly approach. • A total of 29,870 genes were identified, of which 1,582 were, to the authors' knowledge, novel compared to previously reported genes. • Gene ontology (GO) annotations identified genes involved in response to oxidative stress, response to stimulus, heat shock protein, cellular response to acidic pH and response to cold, suggesting a possible role in adaptation. • The modeling curve analysis showed that the chickpea pan-genome is closed.
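
A closed pan-genome is one where adding further genomes contributes few or no new genes. The sketch below shows one generic way to compute such an accumulation curve from gene presence data; the slides do not say which model the study used, so the function, its parameters and the toy accessions are all assumptions for illustration.

```python
import random


def accumulation_curve(gene_sets, n_orders=100, seed=0):
    """Mean pan- and core-genome sizes as genomes are added one at a time.

    gene_sets maps an accession name to its set of gene IDs. The addition
    order is shuffled n_orders times and the sizes are averaged per step.
    """
    rng = random.Random(seed)
    names = list(gene_sets)
    n = len(names)
    pan_sums = [0] * n
    core_sums = [0] * n
    for _ in range(n_orders):
        order = names[:]
        rng.shuffle(order)
        pan = set()
        core = None
        for i, name in enumerate(order):
            genes = gene_sets[name]
            pan |= genes                                   # union grows
            core = set(genes) if core is None else core & genes  # intersection shrinks
            pan_sums[i] += len(pan)
            core_sums[i] += len(core)
    pan_curve = [s / n_orders for s in pan_sums]
    core_curve = [s / n_orders for s in core_sums]
    return pan_curve, core_curve


# Hypothetical toy accessions (illustration only).
accessions = {
    "acc1": {"g1", "g2", "g3"},
    "acc2": {"g1", "g2", "g4"},
    "acc3": {"g1", "g3", "g5"},
}
pan_curve, core_curve = accumulation_curve(accessions)
# New genes contributed, on average, by each additional genome.
new_genes = [b - a for a, b in zip(pan_curve, pan_curve[1:])]
print(pan_curve, core_curve, new_genes)
```

When the `new_genes` increments approach zero as more genomes are added, the curve has plateaued, which is the behaviour described as "closed".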

  38. • Cultivated (2,258) and C. reticulatum (22) accessions were analysed to discover structural variations relative to the CDC Frontier genome. • More structural variations were found in the C. reticulatum accessions because of their high divergence from cultivated chickpea. • The authors further identified 793 gene-gain copy number variants (CNVs) and 209 gene-loss CNVs in cultivated accessions, and 643 gene-gain and 247 gene-loss CNVs in C. reticulatum accessions.

  39. The history of the effective population size of chickpea was reconstructed from 150 randomly chosen cultivated genotypes using the Markovian coalescent as implemented in SMC++ (Terhorst et al., 2017): 1. Chickpea experienced a strong bottleneck beginning around 10,000 years ago. 2. The population size reached its minimum around 1,000 years ago. 3. This was followed by a very strong expansion of the population within the last 400 years, suggesting a strong recent expansion of chickpea agriculture.

  40. • The neighbour-joining tree indicates a clear out-grouping of wild species accessions from cultivated accessions. • The cultivated accessions formed three distinct clusters. • One landrace from East Africa (ICC 16369) grouped with the wild species accessions, indicating that it is mislabeled as cultivated chickpea.

  41. Conclusions from this study: • They constructed a chickpea pan-genome and identified novel genes not reported earlier. • The divergence tree allowed them to estimate the divergence of Cicer over the last 21 million years. • They identified selective sweeps of genes under domestication and a bottleneck leading to reduced genetic diversity.

  42. Case study 2 (2021). Objective: • To develop a high-quality rice pan-genome of genetically diverse rice accessions through de novo genome assemblies • To demonstrate the impact of structural variation on environmental adaptation and agronomic traits

  43. Materials and methods: • PacBio SMRT sequencing • De novo assembly • Assemblies were evaluated for completeness using BUSCO (Benchmarking Universal Single-Copy Orthologs)

  44. Results: • They built a pan-genome of cultivated rice comprising 66,636 genes. • Distribution analysis showed that 20,374 genes were categorized as “core genes” and 46,262 genes as “dispensable genes”, which included 14,609 accession-private genes. • They identified an average of 24,469 SVs per accession relative to Nipponbare.
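
The gene counts reported above can be checked and converted to proportions with a few lines of arithmetic. The counts come from the slide; the variable names are illustrative.

```python
# Gene counts as reported on the slide.
core = 20_374
dispensable = 46_262
private = 14_609      # accession-private genes, a subset of dispensable
total = 66_636

# Core and dispensable genes should add up to the whole pan-genome.
assert core + dispensable == total

core_pct = 100 * core / total
dispensable_pct = 100 * dispensable / total
private_pct = 100 * private / total

print(f"core:        {core_pct:.2f}% of pan-genome genes")
print(f"dispensable: {dispensable_pct:.2f}% of pan-genome genes")
print(f"private:     {private_pct:.2f}% of pan-genome genes")
```

In this data set the dispensable fraction is more than twice the core fraction, which is the kind of ratio the earlier "core vs dispensable" slide refers to.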

  45. Contribution of SVs to rice environmental adaptation: OsWAK112d is a known negative regulator of blast resistance. Two independent deletions in OsWAK112d contributed to environmental adaptation by enhancing blast resistance in rice. Fig. 1. Schematic illustrating the deletions of OsWAK112d in the LJ and N22 accessions. Fig. 2. Distribution of the OsWAK112d deletions in subpopulations of O. sativa and the wild rice population.

  46. Association of gene CNVs with variation in agronomic traits: In addition to SVs, gCNVs were inferred for 25,549 (38.34%) of the protein-coding genes in the rice pan-genome. CNV of OsVIL1 is likely associated with flowering time and grain number.

  47. Conclusions from this study: • De novo assembly of 31 high-quality genomes for genetically diverse accessions • Pan-genome-scale resources and a graph-based genome reveal hidden SVs and gCNVs • The derived state of O. sativa SVs was inferred using the O. glaberrima assembly • SVs and gCNVs have shaped gene expression profiles and agronomic trait variations

  48. APPLICATIONS OF PAN-GENOMES IN PLANT GENETIC STUDIES AND BREEDING

  49. 1. Pangenomics in Utilizing Crop Wild Relatives (CWRs)
