
A. Dereeper, G. Sarah, F. Sabot, Y. Hueber

Exploiting SNP polymorphism data. A. Dereeper, G. Sarah, F. Sabot, Y. Hueber. Formation Bio-informatique (bioinformatics training), 9 to 13 February 2015.

Presentation Transcript


  1. Exploiting SNP polymorphism data
     A. Dereeper, G. Sarah, F. Sabot, Y. Hueber
     Formation Bio-informatique (bioinformatics training), 9 to 13 February 2015

  2. Tablet
     • Graphical tool for visualizing assemblies
     • Accepts many formats: ACE, SAM, BAM

  3. GATK (Genome Analysis ToolKit)
     • Software package for analysing NGS data
     • Developed to analyse human resequencing data for medical purposes (1000 Genomes, The Cancer Genome Atlas)
     • Includes: depth analyses, quality score recalibration, SNP/InDel detection
     • Complementary to other packages: SAMtools, Picard Tools, VCFtools, BEDTools
     PREPROCESS:
       * Index the human genome (Picard); we used HG18 from UCSC.
       * Convert Illumina reads to FASTQ format.
       * Convert Illumina 1.6 read quality scores to standard Sanger scores.
     FOR EACH SAMPLE:
       1. Align samples to the genome (BWA), which generates SAI files.
       2. Convert SAI to SAM (BWA).
       3. Convert SAM to the binary BAM format (SAMtools).
       4. Sort the BAM (SAMtools).
       5. Index the BAM (SAMtools).
       6. Identify target regions for realignment (Genome Analysis Toolkit).
       7. Realign the BAM to improve indel calling (Genome Analysis Toolkit).
       8. Reindex the realigned BAM (SAMtools).
       9. Call indels (Genome Analysis Toolkit).
       10. Call SNPs (Genome Analysis Toolkit).
       11. View the aligned reads in BAM/BAI (Integrative Genomics Viewer).
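
The PREPROCESS step that converts Illumina 1.6 quality scores to Sanger scores comes down to changing the ASCII offset: Illumina 1.3 to 1.7 encode a Phred score Q as the character chr(Q + 64), whereas the Sanger convention (also used by Illumina 1.8 and later) uses chr(Q + 33). A minimal Python sketch of that conversion, assuming the input really is Phred+64; the function names are illustrative:

```python
def illumina64_to_sanger(qual: str) -> str:
    """Re-encode a Phred+64 quality string (Illumina 1.3-1.7) as Phred+33 (Sanger)."""
    converted = []
    for ch in qual:
        q = ord(ch) - 64  # decode the Phred score from the Illumina offset
        if q < 0:
            # Old Solexa scores can be negative; they are not Phred+64 and need another formula
            raise ValueError(f"{ch!r} is not a valid Phred+64 quality character")
        converted.append(chr(q + 33))  # re-encode with the Sanger offset
    return "".join(converted)


def convert_fastq_record(record: list) -> list:
    """Convert one 4-line FASTQ record; only the quality line changes."""
    header, seq, plus, qual = record
    return [header, seq, plus, illumina64_to_sanger(qual)]


if __name__ == "__main__":
    print(convert_fastq_record(["@read1", "ACGT", "+", "hhhB"]))
    # prints ['@read1', 'ACGT', '+', 'III#']
```

Data produced by Illumina 1.8 and later is already Phred+33, so this step only applies to older runs.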

  4. Workflow (one branch per sample):
     Fastq (RC1), Fastq (RC2), Fastq (RC3), Fastq (RC4), …
     → Cutadapt
     → Mapping (BWA)
     → Add or Replace Read Groups
     → BAM with read group
     → mergeSam → global BAM with read groups
     → VCF file
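
The workflow above can be scripted sample by sample. The sketch below only builds the command lines (it does not run them), assuming `bwa mem` for the mapping step, input files named `<sample>.fastq`, Picard invoked as `picard.jar`, and a hypothetical adapter sequence. The slide does not name the variant caller that produces the final VCF, so that step is left out.

```python
from typing import List

# Hypothetical adapter; replace with the adapter actually used in the library prep.
ADAPTER = "AGATCGGAAGAGC"


def sample_commands(sample: str, ref: str, adapter: str = ADAPTER) -> List[str]:
    """Commands for one branch: trimming, mapping, sorting, read-group tagging."""
    trimmed = f"{sample}.trimmed.fastq"
    return [
        # Remove adapter sequences from the 3' end of the reads
        f"cutadapt -a {adapter} -o {trimmed} {sample}.fastq",
        # Map the trimmed reads (bwa mem is an assumption; the slide only says BWA)
        f"bwa mem {ref} {trimmed} > {sample}.sam",
        # Convert to a coordinate-sorted BAM
        f"samtools sort -o {sample}.sorted.bam {sample}.sam",
        # Tag every read with a read group so samples stay distinguishable after merging
        (f"java -jar picard.jar AddOrReplaceReadGroups I={sample}.sorted.bam "
         f"O={sample}.rg.bam RGID={sample} RGLB={sample} RGPL=illumina "
         f"RGPU={sample} RGSM={sample}"),
    ]


def merge_command(samples: List[str], out: str = "global.bam") -> str:
    """Merge the per-sample BAMs into one BAM that keeps the read groups."""
    inputs = " ".join(f"I={s}.rg.bam" for s in samples)
    return f"java -jar picard.jar MergeSamFiles {inputs} O={out}"


if __name__ == "__main__":
    samples = ["RC1", "RC2", "RC3", "RC4"]
    for s in samples:
        for cmd in sample_commands(s, "reference.fasta"):
            print(cmd)
    print(merge_command(samples))
```

For `bwa mem`, the reference must first be indexed with `bwa index`; the printed commands can then be pasted into a shell or passed to `subprocess.run`.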

  5. VCF format (Variant Call Format)
     Purpose: describes the variation at each position, plus the genotype assignments.
     (Columns are tab-separated in an actual VCF file.)

     ##fileformat=VCFv4.0
     ##fileDate=20090805
     ##source=myImputationProgramV3.1
     ##reference=1000GenomesPilot-NCBI36
     ##phasing=partial
     ##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
     ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
     ##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
     ##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
     ##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
     ##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
     ##FILTER=<ID=q10,Description="Quality below 10">
     ##FILTER=<ID=s50,Description="Less than 50% of samples have data">
     ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
     ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
     ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
     ##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
     #CHROM  POS    ID         REF  ALT  QUAL  FILTER  INFO                      FORMAT       NA00001         NA00002
     20      14370  rs6054257  G    A    29    PASS    NS=3;DP=14;AF=0.5;DB;H2   GT:GQ:DP:HQ  0|0:48:1:51,51  1|0:48:8:51,51
     20      17330  .          T    A    3     q10     NS=3;DP=11;AF=0.017       GT:GQ:DP:HQ  0|0:49:3:58,50  0|1:3:5:65,3
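
Each VCF data line is tab-separated: eight fixed columns, then FORMAT and one column per sample. INFO entries are separated by ';' (flags such as DB or H2 have no value), and each sample column lists the values in the order declared by FORMAT. A minimal Python parser for one data line, assuming the sample names have already been read from the #CHROM header; it does not check values against the ##INFO/##FORMAT declarations:

```python
from typing import Dict, List, Union


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Parse the INFO column; flags without '=' (e.g. DB, H2) become True."""
    result: Dict[str, Union[str, bool]] = {}
    if info == ".":
        return result
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        result[key] = value if sep else True
    return result


def parse_vcf_line(line: str, samples: List[str]) -> dict:
    """Split one VCF data line into fixed fields, INFO and per-sample genotype fields."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info, fmt = fields[:9]
    keys = fmt.split(":")  # e.g. ['GT', 'GQ', 'DP', 'HQ']
    genotypes = {
        name: dict(zip(keys, value.split(":")))
        for name, value in zip(samples, fields[9:])
    }
    return {
        "chrom": chrom, "pos": int(pos), "id": vid, "ref": ref,
        "alt": alt.split(","), "qual": qual, "filter": flt,
        "info": parse_info(info), "genotypes": genotypes,
    }


if __name__ == "__main__":
    line = "\t".join(["20", "14370", "rs6054257", "G", "A", "29", "PASS",
                      "NS=3;DP=14;AF=0.5;DB;H2", "GT:GQ:DP:HQ",
                      "0|0:48:1:51,51", "1|0:48:8:51,51"])
    rec = parse_vcf_line(line, ["NA00001", "NA00002"])
    print(rec["genotypes"]["NA00002"]["GT"])  # prints 1|0
```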

  6. Other GATK features
     • DepthOfCoverage module: computes the sequencing depth for each gene, each position and each individual
     • ReadBackedPhasing module: where possible, determines the associations between alleles (phase and haplotypes) at heterozygous sites
     [Figure: phasing example ("and not AGG GGA")]
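
The idea behind read-backed phasing can be shown on two heterozygous sites: reads that cover both sites reveal which alleles travel together on the same chromosome. A simplified Python illustration that takes a plain vote between the two possible phases; it is not GATK's ReadBackedPhasing implementation, and the read data in the usage example are invented:

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def phase_two_sites(reads: Iterable[Tuple[str, str]],
                    site1: Tuple[str, str],
                    site2: Tuple[str, str]) -> Tuple[Optional[Tuple[str, str]], int, int]:
    """Phase two heterozygous sites from reads covering both.

    site1, site2: the two alleles at each heterozygous position.
    reads: (base at site1, base at site2) pairs observed on the same read.
    Returns (haplotypes or None if undecided, supporting reads, conflicting reads).
    """
    a1, b1 = site1
    a2, b2 = site2
    counts = Counter(reads)
    cis = counts[(a1, a2)] + counts[(b1, b2)]    # reads supporting a1-a2 / b1-b2
    trans = counts[(a1, b2)] + counts[(b1, a2)]  # reads supporting a1-b2 / b1-a2
    if cis == trans:
        return None, cis, trans  # no evidence either way: leave unphased
    if cis > trans:
        return (a1 + a2, b1 + b2), cis, trans
    return (a1 + b2, b1 + a2), trans, cis


if __name__ == "__main__":
    reads = [("A", "G")] * 5 + [("G", "A")] * 4 + [("A", "A")]
    print(phase_two_sites(reads, ("A", "G"), ("G", "A")))
    # prints (('AG', 'GA'), 9, 1)
```

Reads whose bases match neither allele (sequencing errors) simply do not contribute to either count.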

  7. Pileup format
     • Another format used for variant calling (generated by samtools)
     • Describes the alignment position by position (one line per reference position), not read by read as in the SAM format
     • Used by software such as VarScan (varscan pileup2snp)
     • Frequently used for rare, low-frequency variants (e.g. viral populations)
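
A pileup line holds the chromosome, position, reference base, depth, the read bases and their qualities. In the read-bases column, '.' and ',' are matches to the reference on the forward and reverse strand, letters are mismatches, '^' plus one mapping-quality character marks a read start, '$' a read end, '+n'/'-n' followed by n bases an insertion or deletion, and '*' a deleted base. A minimal Python sketch that turns one line into allele frequencies, the kind of information pileup-based callers such as VarScan work from; it is not VarScan's code and it ignores base qualities:

```python
from collections import Counter
from typing import Dict


def count_bases(ref: str, bases: str) -> Counter:
    """Count the alleles observed in the read-bases column of one pileup line."""
    counts: Counter = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":        # read start: skip '^' and the mapping-quality character
            i += 2
            continue
        if c == "$":        # read end marker
            i += 1
            continue
        if c in "+-":       # indel: sign, length digits, then that many bases
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
            continue
        if c in ".,":       # match to the reference (forward / reverse strand)
            counts[ref.upper()] += 1
        elif c.upper() in "ACGTN":
            counts[c.upper()] += 1
        elif c == "*":      # placeholder for a deleted base
            counts["*"] += 1
        i += 1              # any other character is ignored
    return counts


def allele_frequencies(line: str) -> Dict[str, float]:
    """Allele frequencies at one pileup position (chrom, pos, ref, depth, bases, quals)."""
    chrom, pos, ref, depth, bases = line.split("\t")[:5]
    counts = count_bases(ref, bases)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()} if total else {}
```

A low-frequency variant then shows up as a small but non-zero frequency for a non-reference base, which a caller can compare against a minimum frequency and depth.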

  8. Gigwa project, for managing large-scale variant data (GBS, RADSeq, WGRS)
     « With NGS arise serious computational challenges in terms of storage, search, sharing, analysis, and data visualization, that redefine some practices in data management. »
     - Based on NoSQL technology
     - Handles VCF files (Variant Call Format) and annotations
     - Supports multiple variant types: SNPs, InDels, SSRs, SV
     - Powerful genotyping queries
     - Easily scalable with MongoDB sharding
     - Transparent access
     - Takes phasing information into account when importing/exporting in VCF format
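
The last point relies on the VCF GT field: alleles separated by '|' are phased, alleles separated by '/' are not, and '.' marks a missing allele. A minimal Python sketch of reading a GT value without losing that information; it illustrates the VCF convention and is not Gigwa's code:

```python
from typing import List, Optional, Tuple


def parse_gt(gt: str) -> Tuple[List[Optional[int]], bool]:
    """Split a VCF GT value into allele indices and a 'phased' flag.

    '|' separates phased alleles, '/' unphased ones; '.' is a missing allele (None).
    """
    phased = "|" in gt
    parts = gt.replace("|", "/").split("/")
    alleles = [None if p == "." else int(p) for p in parts]
    return alleles, phased


def format_gt(alleles: List[Optional[int]], phased: bool) -> str:
    """Write the genotype back with the same separator, so phasing survives a round trip."""
    sep = "|" if phased else "/"
    return sep.join("." if a is None else str(a) for a in alleles)
```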

  9. http://gigwa.southgreen.fr/gigwa/

  10. SNiPlay: Web application for polymorphism analyses http://sniplay.cirad.fr

  11. Uploading a VCF file in SNiPlay
      • Upload a VCF file (plus the reference sequence if it is not available in the genome collection)
      • Select the rice genome
      • The reference corresponds to mRNA

  12. SNP annotation using SnpEff
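
SnpEff writes its annotations into the VCF INFO column. Assuming the `ANN` field written by recent SnpEff versions (older versions wrote a differently structured `EFF` field), each comma-separated entry is a '|'-separated list whose first fields are allele, effect, impact, gene name, gene ID, feature type, feature ID, transcript biotype, rank, HGVS.c and HGVS.p. A minimal Python sketch that splits the field and reports the most severe impact category; the annotation string in the usage example is invented:

```python
from typing import Dict, List

ANN_FIELDS = ["Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
              "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank",
              "HGVS.c", "HGVS.p"]
IMPACT_ORDER = ["HIGH", "MODERATE", "LOW", "MODIFIER"]  # most to least severe


def parse_ann(value: str) -> List[Dict[str, str]]:
    """Split an ANN value (the text after 'ANN=') into one dict per annotation."""
    # zip() keeps only the first len(ANN_FIELDS) columns; later ones are ignored
    return [dict(zip(ANN_FIELDS, entry.split("|"))) for entry in value.split(",")]


def most_severe_impact(value: str) -> str:
    """Highest impact category among all annotations of one variant."""
    impacts = {a.get("Annotation_Impact", "") for a in parse_ann(value)}
    for level in IMPACT_ORDER:
        if level in impacts:
            return level
    return "NONE"


if __name__ == "__main__":
    ann = ("A|missense_variant|MODERATE|GENE1|GENE1|transcript|TR1|protein_coding|2/5|c.100G>A|p.Ala34Thr,"
           "A|upstream_gene_variant|MODIFIER|GENE2|GENE2|transcript|TR2|protein_coding||c.-500G>A|")
    print(most_severe_impact(ann))  # prints MODERATE
```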

  13. Illumina chip design
      • Submission file for Illumina
      • Genotyping file
      • Analysis with BeadStudio software
      • Cartesian coordinates
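
A submission file for an Illumina genotyping chip describes each SNP by its flanking sequence with the two alleles in brackets, in the form ...[A/G].... A minimal Python sketch that builds that string from a reference sequence, assuming this bracket notation; the 60 bp default flank length and the function name are assumptions, so check the length and file layout required by the specific Illumina assay:

```python
def design_sequence(genome: str, pos: int, ref: str, alt: str, flank: int = 60) -> str:
    """Return 5' flank + [REF/ALT] + 3' flank for a SNP at 1-based position pos.

    flank=60 is an assumed default; use the length required by the assay.
    """
    i = pos - 1  # convert to a 0-based index
    if genome[i].upper() != ref.upper():
        raise ValueError(f"reference base at {pos} is {genome[i]!r}, not {ref!r}")
    left = genome[max(0, i - flank):i]    # shorter near the start of the sequence
    right = genome[i + 1:i + 1 + flank]   # slicing is safe past the end
    return f"{left}[{ref}/{alt}]{right}"


if __name__ == "__main__":
    print(design_sequence("ACGTACGTAC", 5, "A", "G", flank=3))  # prints CGT[A/G]CGT
```

Checking the reference base first catches coordinate mistakes (for example 0-based versus 1-based positions) before the file is sent.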

  14. Diversity analysis
      • EggLib library
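
Two standard diversity statistics computed by such libraries are nucleotide diversity (pi, the mean number of pairwise differences between sequences) and Watterson's theta (the number of segregating sites S divided by a1 = 1 + 1/2 + ... + 1/(n-1)). A minimal pure-Python sketch of both for aligned sequences of equal length without missing data; this is not EggLib's API:

```python
from itertools import combinations
from typing import List


def segregating_sites(seqs: List[str]) -> int:
    """Number of alignment columns with more than one base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs: List[str]) -> float:
    """Mean pairwise differences (pi) per sequence; divide by length for a per-site value."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs)


def watterson_theta(seqs: List[str]) -> float:
    """Watterson's estimator S / a1, with a1 = sum of 1/i for i = 1 .. n-1."""
    n = len(seqs)
    a1 = sum(1 / i for i in range(1, n))
    return segregating_sites(seqs) / a1


if __name__ == "__main__":
    seqs = ["AAAA", "AAAT", "AATT"]
    print(segregating_sites(seqs), nucleotide_diversity(seqs), watterson_theta(seqs))
```

In this small example both estimators equal 4/3: the three pairs differ at 1, 2 and 1 sites, and S = 2 with a1 = 1.5.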

  15. Haplotype network
      [Figure legend: frequent haplotypes; less frequent haplotype; distance between 2 haplotypes (number of mutations); group distribution within this haplotype]
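
A haplotype network links haplotypes by their distance in number of mutations, with node size reflecting frequency. A simplified Python sketch that counts haplotypes and connects them with a minimum spanning tree over Hamming distances (Prim's algorithm). Dedicated methods such as median-joining networks can also add inferred intermediate haplotypes, which this sketch does not do; sequences are assumed aligned and of equal length:

```python
from collections import Counter
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned haplotypes differ (#mutations)."""
    return sum(x != y for x, y in zip(a, b))


def haplotype_network(seqs: List[str]) -> Tuple[Dict[str, int], List[Tuple[str, str, int]]]:
    """Return haplotype counts and the edges (hap1, hap2, #mutations) of a minimum spanning tree."""
    freqs = Counter(seqs)
    haps = sorted(freqs)
    in_tree = {haps[0]}
    edges = []
    while len(in_tree) < len(haps):
        # Cheapest link between a haplotype already in the tree and one outside it
        dist, src, dst = min((hamming(x, y), x, y)
                             for x in in_tree for y in haps if y not in in_tree)
        in_tree.add(dst)
        edges.append((src, dst, dist))
    return dict(freqs), edges


if __name__ == "__main__":
    freqs, edges = haplotype_network(["AAAA", "AAAA", "AAAT", "AATT", "TATT"])
    print(freqs)
    print(edges)
```

The counts give the node sizes; colouring each node by the groups of the individuals carrying it reproduces the group distribution shown on the slide.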

  16. Allele sharing between groups
      External file (optional):
        Individual, group
        Ind1, Table
        Ind2, Table
        Ind3, Table
        Ind4, East
        Ind5, East
        Ind6, East
        Ind7, East
        Ind8, West
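
Given the optional group file and the genotypes at one locus, one way to summarise allele sharing is to list the alleles present in every group and the alleles private to a single group. A hypothetical Python sketch; the function names and the per-locus genotype layout ({individual: [allele, allele]}) are assumptions, not the application's implementation:

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def read_groups(text: str) -> Dict[str, str]:
    """Parse 'individual, group' lines into {individual: group}; a header line is skipped."""
    groups: Dict[str, str] = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # blank or malformed line
        ind, grp = row[0].strip(), row[1].strip()
        if ind.lower() in ("individu", "individual"):
            continue  # header line
        groups[ind] = grp
    return groups


def alleles_by_group(genotypes: Dict[str, List[str]], groups: Dict[str, str]) -> Dict[str, Set[str]]:
    """Alleles observed in each group at one locus ('.' = missing)."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for ind, alleles in genotypes.items():
        if ind in groups:
            result[groups[ind]].update(a for a in alleles if a != ".")
    return dict(result)


def shared_and_private(by_group: Dict[str, Set[str]]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Alleles present in every group, and for each group the alleles found only there."""
    shared = set.intersection(*by_group.values()) if by_group else set()
    private = {
        g: alleles - set().union(*(o for h, o in by_group.items() if h != g))
        for g, alleles in by_group.items()
    }
    return shared, private
```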

  17. GWAS (Genome-Wide Association Studies)
      • Estimates the association between a marker and a phenotypic trait
      • Manhattan plots: display the GWAS test statistics (-log10 p-value) along the chromosomes
      • Software: TASSEL, MLMM
      • Structure within the studied panel causes false positives => correction using population structure and kinship

  18. Population structure analysis
      • Test different values of K (estimated probability that the samples are structured into K populations)
      • For the best value of K, the application shows the Q estimates for each individual (admixture percentage)
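
Once the Q matrix is available for the chosen K, individuals are often summarised by their main cluster. A minimal Python sketch that assigns each individual to the cluster with the largest Q value and labels it admixed when no cluster reaches a threshold; the 0.8 threshold and the cluster labels K1, K2, ... are assumptions:

```python
from typing import Dict, List


def assign_groups(q: Dict[str, List[float]], threshold: float = 0.8) -> Dict[str, str]:
    """Map each individual to 'K<n>' (its main cluster) or 'admixed'.

    q: for each individual, its membership coefficients for clusters 1..K (summing to 1).
    """
    out = {}
    for ind, coeffs in q.items():
        best = max(range(len(coeffs)), key=lambda k: coeffs[k])
        out[ind] = f"K{best + 1}" if coeffs[best] >= threshold else "admixed"
    return out


if __name__ == "__main__":
    print(assign_groups({"Ind1": [0.95, 0.05], "Ind2": [0.55, 0.45], "Ind3": [0.1, 0.9]}))
```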

  19. Relatedness between individuals (kinship matrix)
      • Software: TASSEL and PLINK
      • Estimation of relatedness between individuals using a distance matrix
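
Relatedness can be estimated from a pairwise distance matrix. A minimal Python sketch using a simple identity-by-state (IBS) distance on genotypes coded 0/1/2 (number of copies of the alternative allele, None for missing); TASSEL and PLINK provide their own kinship and distance estimators, which this does not reproduce:

```python
from typing import Dict, List, Optional, Sequence, Tuple


def ibs_distance(g1: Sequence[Optional[int]], g2: Sequence[Optional[int]]) -> float:
    """Mean allele-sharing distance over loci genotyped in both individuals.

    0 = identical genotypes everywhere, 1 = opposite homozygotes everywhere.
    """
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not pairs:
        return float("nan")  # no locus in common
    return sum(abs(a - b) for a, b in pairs) / (2 * len(pairs))


def distance_matrix(genos: Dict[str, List[Optional[int]]]) -> Tuple[List[str], List[List[float]]]:
    """Symmetric matrix of IBS distances, rows and columns in the order of the returned names."""
    names = list(genos)
    return names, [[ibs_distance(genos[a], genos[b]) for b in names] for a in names]
```

A similarity (1 - distance) can then be used as a rough relatedness measure.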

  20. Practical session (TD): study of root characters using GWAS in Oryza sativa japonica. Influence of a correction using structure and kinship
