A. Dereeper, G. Sarah, F. Sabot

Explore SNP polymorphism data. A. Dereeper, G. Sarah, F. Sabot. Bioinformatics trainings, Vietnam Hanoi, November, 2015.

Presentation Transcript


  1. Explore SNP polymorphism data A. Dereeper, G. Sarah, F. Sabot Bioinformatics trainings, Vietnam Hanoi, November, 2015

  2. Tablet
     • Graphical tool to visualize assemblies
     • Accepts many formats: ACE, SAM, BAM
     A. Dereeper, G. Sarah, F. Sabot, Y. Hueber. Bioinformatics training, 9 to 13 February 2015.

  3. GATK (Genome Analysis ToolKit)
     • Software package to analyse NGS data.
     • Implemented to analyse human resequencing data for medical purposes (1000 Genomes, The Cancer Genome Atlas)
     • Includes: depth analyses, quality score recalibration, SNP/InDel detection
     • Complementary with other packages: SAMtools, Picard Tools, VCFtools, BEDTools
     PREPROCESS:
     * Index human genome (Picard), we used HG18 from UCSC.
     * Convert Illumina reads to Fastq format
     * Convert Illumina 1.6 read quality scores to standard Sanger scores
     FOR EACH SAMPLE:
     1. Align samples to genome (BWA), generates SAI files.
     2. Convert SAI to SAM (BWA)
     3. Convert SAM to BAM binary format (SAMtools)
     4. Sort BAM (SAMtools)
     5. Index BAM (SAMtools)
     6. Identify target regions for realignment (Genome Analysis Toolkit)
     7. Realign BAM to get better InDel calling (Genome Analysis Toolkit)
     8. Reindex the realigned BAM (SAMtools)
     9. Call InDels (Genome Analysis Toolkit)
     10. Call SNPs (Genome Analysis Toolkit)
     11. View aligned reads in BAM/BAI (Integrative Genomics Viewer)
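The quality-score conversion listed in the preprocessing steps can be illustrated with a minimal sketch. It assumes the Illumina 1.6 reads use the Phred+64 encoding and that "Sanger scores" means Phred+33; the function name `illumina_to_sanger` is hypothetical and not part of any tool named on the slide.

```python
ILLUMINA_OFFSET = 64  # Phred+64 encoding (assumed for Illumina 1.3 to 1.7 pipelines)
SANGER_OFFSET = 33    # Phred+33 encoding (standard Sanger FASTQ)


def illumina_to_sanger(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33."""
    converted = []
    for char in quality:
        phred = ord(char) - ILLUMINA_OFFSET
        if phred < 0:
            raise ValueError(f"{char!r} is not a valid Phred+64 character")
        converted.append(chr(phred + SANGER_OFFSET))
    return "".join(converted)


if __name__ == "__main__":
    # 'h' encodes Phred 40, 'B' encodes Phred 2 and '@' encodes Phred 0 in Phred+64.
    print(illumina_to_sanger("hhhhB@"))  # prints IIII#!
```

Only the quality line of each FASTQ record changes; the read names and sequences stay as they are.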

  4. Workflow (diagram)
     • For each sample (RC1, RC2, RC3, RC4, …): Fastq -> Cutadapt -> Mapping BWA -> Add or Replace Groups -> BAM with read group
     • All samples: mergeSam -> Global BAM with read group -> VCF file

  5. For GBS data
     • TASSEL pipeline
     • Version 5
     TASSEL-GBS, PLoS ONE, 2014

  6. Overview (diagram)
     • Data types: GBS, RAD-Seq, RNA-Seq, WGRS
     • Reads pre-processing and mapping + Galaxy workflow
     • SNP calling and genotype assignment
     • Tassel
     • Genotyping data storage and mining
     • Genotyping data analyses and visualization (GWAS, diversity…)

  7. VCF format (Variant Call Format)
     Advantages:
     • Variation description for each position + genotype assignments
     • Indexed flat files. A binary version also exists: the BCF format
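To make "variation description for each position + genotype assignments" concrete, here is a minimal sketch that reads a tiny VCF held in a string. The sample names, positions and values are invented for illustration, and the parser only handles the fixed columns and the per-sample FORMAT fields; it is not a substitute for a real VCF library.

```python
from typing import Dict, List

# Invented two-sample VCF; columns are tab-separated as in the VCF layout.
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\tsample2
chr01\t1203\t.\tA\tG\t50\tPASS\tDP=30\tGT:DP\t0/1:14\t1/1:16
chr01\t2210\t.\tC\tT\t42\tPASS\tDP=21\tGT:DP\t0|1:11\t./.:0
"""


def parse_vcf(text: str) -> List[Dict]:
    """Return one dict per variant line, with genotypes keyed by sample."""
    samples: List[str] = []
    records: List[Dict] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        format_keys = fields[8].split(":")
        genotypes = {
            sample: dict(zip(format_keys, value.split(":")))
            for sample, value in zip(samples, fields[9:])
        }
        records.append({
            "chrom": fields[0],
            "pos": int(fields[1]),
            "ref": fields[3],
            "alt": fields[4].split(","),
            "genotypes": genotypes,
        })
    return records


if __name__ == "__main__":
    for record in parse_vcf(VCF_TEXT):
        print(record["chrom"], record["pos"], record["genotypes"])
```

In the GT field, `/` separates unphased alleles and `|` separates phased ones, and `./.` marks a missing genotype.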

  8. Other GATK functionalities
     • DepthOfCoverage module: reports the sequencing depth for each gene, each position and each individual
     • ReadBackedPhasing module: establishes, where possible, associations between alleles (phase and haplotypes) at heterozygous positions
     (Figure annotation: "and not AGG GGA")

  9. Pileup format
     • Another format used for variant calling (generated by samtools)
     • Describes the alignment position by position (column by column), not read by read as in the SAM format
     • Used by VarScan-like software (varscan pileup2snp)
     • Frequently used for rare, low-frequency variants (e.g. viral populations)
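The per-position layout can be sketched with a small parser for one pileup line. It assumes the six-column layout (chromosome, position, reference base, depth, read bases, base qualities) and handles only the most common read-base symbols; the example line and the function names are invented for illustration.

```python
from collections import Counter


def count_bases(ref: str, bases: str) -> Counter:
    """Count observed alleles in the read-bases column of a pileup line."""
    counts: Counter = Counter()
    i = 0
    while i < len(bases):
        symbol = bases[i]
        if symbol == "^":
            i += 2  # read start marker, followed by one mapping-quality character
        elif symbol == "$":
            i += 1  # read end marker
        elif symbol in "+-":
            # Indel: sign, length in digits, then that many inserted/deleted bases.
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
        elif symbol in ".,":
            counts[ref.upper()] += 1  # match to the reference (either strand)
            i += 1
        else:
            counts[symbol.upper()] += 1  # mismatch, or '*' for a deleted base
            i += 1
    return counts


def parse_pileup_line(line: str) -> dict:
    chrom, pos, ref, depth, bases = line.rstrip("\n").split("\t")[:5]
    return {"chrom": chrom, "pos": int(pos), "ref": ref,
            "depth": int(depth), "counts": count_bases(ref, bases)}


if __name__ == "__main__":
    record = parse_pileup_line("chr01\t272\tT\t6\t..,^F.Cc$\tIIIIII")
    alt_fraction = record["counts"]["C"] / record["depth"]
    print(record["counts"], round(alt_fraction, 3))
```

Because every read covering the position is visible, the fraction of reads supporting an alternative allele can be computed directly, which is why this layout suits low-frequency variants.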

  10. Gigwa project, for managing massive variant datasets (GBS, RADSeq, WGRS)
      « With NGS arise serious computational challenges in terms of storage, search, sharing, analysis, and data visualization, that redefine some practices in data management. »
      - Based on NoSQL technology
      - Handles VCF files (Variant Call Format) and annotations
      - Supports multiple variant types: SNPs, InDels, SSRs, SV
      - Powerful genotyping queries
      - Easily scalable with MongoDB sharding
      - Transparent access
      - Takes phasing information into account when importing/exporting in VCF format

  11. http://gigwa.southgreen.fr/gigwa/

  12. SNiPlay: Web application for polymorphism analyses http://sniplay.southgreen.fr

  13. IFB project “Galaxy4Sniplay” (WP4 IFB, Plant node)

  14. Available through the Galaxy Tool Shed
      • Installable on any Galaxy instance

  15. Upload a VCF file in SNiPlay (http://sniplay.southgreen.fr)
      • Upload a VCF file (+ reference if not available in the genome collection)
      • Select the rice genome
      • The reference corresponds to mRNA

  16. Filters using VCFtools or Gigwa
      • MAF (minor allele frequency)
      • Missing data
      • Annotation
      • Position…
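As an illustration of the first two filters, the following sketch computes the minor allele frequency and the missing-data rate of one variant from its GT strings. It assumes diploid genotypes written as in VCF; the thresholds and function names are hypothetical and do not reproduce the VCFtools or Gigwa implementations.

```python
from collections import Counter
from typing import List, Tuple


def allele_stats(genotypes: List[str]) -> Tuple[float, float]:
    """Return (minor allele frequency, missing-data rate) for one variant."""
    alleles: List[str] = []
    missing = 0
    for gt in genotypes:
        calls = gt.replace("|", "/").split("/")
        if "." in calls:
            missing += 1  # './.' or a half-missing call counts as missing
            continue
        alleles.extend(calls)
    missing_rate = missing / len(genotypes)
    counts = Counter(alleles).most_common()
    if len(counts) < 2:
        return 0.0, missing_rate  # monomorphic (or no called genotype)
    # Frequency of the second most common allele among called alleles.
    return counts[1][1] / len(alleles), missing_rate


def variant_passes(genotypes: List[str],
                   min_maf: float = 0.05,
                   max_missing: float = 0.5) -> bool:
    """Hypothetical filter: keep variants with enough MAF and few missing calls."""
    maf, missing_rate = allele_stats(genotypes)
    return maf >= min_maf and missing_rate <= max_missing


if __name__ == "__main__":
    calls = ["0/0", "0/1", "1/1", "./.", "0/0"]
    print(allele_stats(calls), variant_passes(calls))
```

Here the MAF is computed over called alleles only, so a variant with many missing genotypes can still show a high MAF; that is one reason the two filters are usually combined.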

  17. SNP annotation using SnpEff

  19. sNMF
      • Tests different values of K (estimates, through likelihood tests, the probability that samples are structured in K populations)
      • For the best value of K, the application shows Q estimates for each individual (admixture percentage, i.e. the probability that the individual belongs to each population)

  20. MDS (Multi-Dimensional Scaling) plot
      • SNP-based distance tree with FastME

  21. Diversity analysis
      • Pi (nucleotide diversity): average number of nucleotide differences per site between any two DNA sequences chosen randomly from the sample population
      • Used to measure the degree of polymorphism within a population
      • Comparison between individuals
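The definition of Pi translates directly into a short computation. This is a minimal sketch assuming aligned sequences of equal length without gaps or missing data; the sequences are invented and the function name is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(sequences: Sequence[str]) -> float:
    """Average number of differences per site over all pairs of sequences."""
    if len(sequences) < 2:
        raise ValueError("at least two sequences are required")
    length = len(sequences[0])
    if length == 0 or any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    pairs = list(combinations(sequences, 2))
    total_differences = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return total_differences / (len(pairs) * length)


if __name__ == "__main__":
    sample = ["ACGTACGT", "ACGTACGA", "ACCTACGA"]
    # Pairwise differences are 1, 2 and 1 over 8 sites: 4 / (3 * 8).
    print(nucleotide_diversity(sample))
```

Computing the same value in sliding windows along a chromosome gives the diversity profiles typically compared between groups of individuals.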

  22. => Can allow the detection of introgression
      • Introgression = movement of an exogenous region (gene flow) from one species into the gene pool of another, through repeated backcrossing of an interspecific hybrid with one of its parent species
      • Widely used in agronomy, but can also occur naturally

  23. Haplotypes
      • Haplotype reconstruction using Gevalt
      • Network with Haplophyle
      • Available only for regions presenting few variants (short regions, genes)
      • Exploit phased VCF (in progress…)
      Figure labels: low-frequency haplotype; high-frequency haplotypes; distance between 2 haplotypes (number of mutations); group distribution within this haplotype
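The two quantities shown in the network figure, haplotype frequency and distance between two haplotypes counted in mutations, can be sketched as follows. The haplotypes are invented three-SNP strings and the function names are hypothetical; this does not reproduce what Gevalt or Haplophyle compute internally.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def haplotype_frequencies(haplotypes: List[str]) -> Dict[str, float]:
    """Relative frequency of each distinct haplotype in the sample."""
    counts = Counter(haplotypes)
    return {hap: count / len(haplotypes) for hap, count in counts.items()}


def mutation_distance(first: str, second: str) -> int:
    """Number of positions at which two equal-length haplotypes differ."""
    if len(first) != len(second):
        raise ValueError("haplotypes must have the same length")
    return sum(a != b for a, b in zip(first, second))


def pairwise_distances(haplotypes: List[str]) -> Dict[Tuple[str, str], int]:
    """Distances between every pair of distinct haplotypes."""
    distinct = sorted(set(haplotypes))
    return {(a, b): mutation_distance(a, b) for a, b in combinations(distinct, 2)}


if __name__ == "__main__":
    observed = ["AGG", "AGG", "AGG", "GGA", "AGA"]
    print(haplotype_frequencies(observed))
    print(pairwise_distances(observed))
```

In a network drawing, the frequency would typically set the size of each node and the distance the length or label of each edge.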

  24. Illumina chip design
      • Submission file for Illumina
      • Genotyping file
      • Analysis with BeadStudio software
      • Cartesian coordinates

  25. GWAS (Genome-Wide Association Studies)
      • Estimates the association between a marker and a phenotypic trait
      • Manhattan plots: display GWAS statistical tests (-log10 p-value) along chromosomes
      • TASSEL, MLMM software
      • False positives caused by the structure of the studied panel => correction using population structure and kinship
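The values plotted in a Manhattan plot can be derived from association results as in this minimal sketch. The p-values are invented, the function names are hypothetical, and the Bonferroni threshold is an assumption added for illustration; the slides do not state which significance threshold is used.

```python
import math
from typing import List, Tuple

Result = Tuple[str, int, float]  # (chromosome, position, p-value)


def to_manhattan_points(results: List[Result]) -> List[Tuple[str, int, float]]:
    """Sort markers along the genome and convert p-values to -log10 scale."""
    points = []
    for chrom, pos, pvalue in sorted(results):
        if not 0 < pvalue <= 1:
            raise ValueError(f"invalid p-value: {pvalue}")
        points.append((chrom, pos, -math.log10(pvalue)))
    return points


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """-log10 of the Bonferroni-corrected significance level (assumed choice)."""
    return -math.log10(alpha / n_tests)


if __name__ == "__main__":
    results = [("chr02", 5400, 1e-8), ("chr01", 1200, 0.03), ("chr01", 88000, 0.5)]
    points = to_manhattan_points(results)
    threshold = bonferroni_threshold(len(results))
    significant = [(c, p) for c, p, score in points if score > threshold]
    print(points)
    print(threshold, significant)
```

Markers whose score rises above the threshold line form the peaks of the plot; with an uncorrected panel structure, many of those peaks can be false positives.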

  26. Practical session (TD): Study of root characters using GWAS in Oryza sativa japonica. Influence of a correction using structure and kinship

  27. Population structure analysis
      • Tests different values of K (estimates of the probability that samples are structured in K populations)
      • For the best value of K, the application shows Q estimates for each individual (admixture percentage)

  28. Relatedness between individuals (kinship matrix)
      • TASSEL and PLINK software
      • Estimation of relatedness between individuals using a distance matrix
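One simple way to fill such a matrix is the proportion of alleles identical by state (IBS) between two individuals. The sketch below assumes biallelic SNPs coded as the number of alternative alleles (0, 1 or 2, with `None` for missing data); the individuals and function names are invented, and the estimators actually used by TASSEL or PLINK may differ.

```python
from typing import Dict, List, Optional

Genotypes = List[Optional[int]]  # 0, 1 or 2 alternative alleles; None if missing


def ibs_similarity(first: Genotypes, second: Genotypes) -> float:
    """Mean proportion of alleles shared identical by state over called markers."""
    shared = [
        1 - abs(a - b) / 2
        for a, b in zip(first, second)
        if a is not None and b is not None
    ]
    if not shared:
        raise ValueError("no marker is called in both individuals")
    return sum(shared) / len(shared)


def similarity_matrix(panel: Dict[str, Genotypes]) -> Dict[str, Dict[str, float]]:
    """Symmetric matrix of IBS similarities between all individuals."""
    return {
        name_a: {name_b: ibs_similarity(geno_a, geno_b)
                 for name_b, geno_b in panel.items()}
        for name_a, geno_a in panel.items()
    }


if __name__ == "__main__":
    panel = {
        "ind1": [0, 1, 2, 0],
        "ind2": [0, 1, 2, 1],
        "ind3": [2, 1, 0, None],
    }
    for name, row in similarity_matrix(panel).items():
        print(name, {other: round(value, 3) for other, value in row.items()})
```

The corresponding distance is one minus the similarity, which links this matrix to the distance matrix mentioned on the slide.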
