
Polymorphism discovery informatics

Gabor T. Marth, Department of Biology, Boston College, Chestnut Hill, MA 02467


Presentation Transcript


  1. Polymorphism discovery informatics • Gabor T. Marth, Department of Biology, Boston College, Chestnut Hill, MA 02467

  2. Types of sequence variations • Substitution-type single-nucleotide polymorphisms (SNPs) are the most abundant form of sequence variation • Various insertion-deletion type polymorphisms (INDELs) are also very common

  3. Are all substitutions SNPs? • systematic pattern of bi-allelism within the population examined

  4. What is SNP discovery? • comparative analysis of multiple sequences from the same region of the genome (redundant sequence coverage) • includes organizing the sequences relative to each other and determining whether sequence differences are sequencing artifacts or true polymorphisms

  5. Steps of SNP discovery • Sequence clustering • Paralog identification (cluster refinement) • Multiple alignment • SNP detection

  6. SNP discovery in diverse sequences • many different types of sequences are available for polymorphism discovery: genome, EST, WGS, BAC, BAC-end, restriction fragments • early methods of SNP discovery focused on specific sequence types • different sequence types differ radically in accuracy: genome sequence 99.9–99.99%, single-pass sequence 98–99%

  7. General SNP mining – PolyBayes • sequence clustering simplifies to a database search against the genome reference • multiple alignment by anchoring fragments to the genome reference • paralog filtering by counting mismatches weighted by quality values • SNP detection by differentiating true polymorphism from sequencing error using quality values

  8. SNP validation • Pooled sequencing • Direct re-sequencing • Validation experiments show that the SNP probability (SNP score) is accurate • The SNP score allows one to choose cutoff values that balance the false positive rate against the recovery of rare SNPs • [Figure labels: African, Asian, Caucasian, Hispanic, CHM 1; discard / keep]

  9. Genome-scale SNP mining projects • Overlaps of large-insert clone sequences • Random shotgun reads from whole-genome libraries aligned to the genome reference sequence

  10. SNP genotyping • SNP discovery: which nucleotides in the genome are polymorphic? • SNP genotyping: which alleles does an individual carry at a nucleotide locus that is known to be polymorphic? • person 1: aacgtttatgtgattaccagtaaattacggca / aacgtttatgtgattcccagtaaattacggca • person 2: aacgtttatgtgattaccagtaaattacggca / aacgtttatgtgattcccagtaaagtacggca • polymorphic sites: aacgtttatgtgatt[a/c]ccagtaaa[g/t]tacggca

  11. Genotyping by sequence • heterozygous peak • homozygous peak

  12. Genome variation landscape • nucleotide diversity on human chromosomes • marker density: “dense” vs. “sparse” • allele frequency: “common” vs. “rare”
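Slide 7 says that SNP detection separates true polymorphism from sequencing error using quality values. The following is a minimal sketch of that idea, not the PolyBayes algorithm itself. It assumes the quality values are Phred-scaled (error probability 10^(-Q/10), so Q30 corresponds to the 99.9% accuracy quoted on slide 6), compares only two hypotheses for one alignment column (monomorphic versus bi-allelic with equal allele frequencies), and uses an illustrative prior of 0.001. The function names and the prior are assumptions made for this example.

```python
from collections import Counter


def phred_to_error(q):
    """Convert a Phred-scaled quality value to an error probability."""
    return 10 ** (-q / 10)


def base_likelihood(observed, true_base, err):
    """P(observed base | true base), spreading errors evenly over 3 other bases."""
    return 1 - err if observed == true_base else err / 3


def snp_posterior(column, prior=0.001):
    """Posterior probability that one alignment column is bi-allelic.

    column: list of (base, phred_quality) pairs, one per aligned read.
    prior:  assumed prior probability that a site is polymorphic (illustrative).
    """
    counts = Counter(base for base, _ in column)
    ranked = counts.most_common(2)
    if len(ranked) < 2:
        # Only one allele observed: not a candidate in this simplified model.
        return 0.0
    major, minor = ranked[0][0], ranked[1][0]

    like_mono = 1.0  # every read truly carries the major allele
    like_poly = 1.0  # each read comes from either allele with probability 0.5
    for base, q in column:
        err = phred_to_error(q)
        p_major = base_likelihood(base, major, err)
        p_minor = base_likelihood(base, minor, err)
        like_mono *= p_major
        like_poly *= 0.5 * p_major + 0.5 * p_minor

    numerator = prior * like_poly
    return numerator / (numerator + (1 - prior) * like_mono)


if __name__ == "__main__":
    confident = [("a", 40), ("a", 40), ("c", 40), ("c", 40)]
    doubtful = [("a", 40), ("a", 40), ("a", 40), ("c", 5)]
    print(f"high-quality mismatches: {snp_posterior(confident):.4f}")
    print(f"one low-quality mismatch: {snp_posterior(doubtful):.4f}")
```

A mismatch supported by high-quality bases yields a posterior close to 1, while a single low-quality mismatch stays near the prior. A cutoff on this score plays the role described on slide 8: raising it lowers the false positive rate at the cost of missing rare SNPs.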
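The distinction slide 10 draws between SNP discovery and SNP genotyping can be illustrated with the four example sequences from that slide. This sketch assumes the sequences are already aligned, of equal length, and free of sequencing error, and that the first pair belongs to person 1 and the second pair to person 2. The helper names `discover_sites` and `genotype` are hypothetical.

```python
# Two sequences per person, copied from slide 10 (pairing is an assumption).
PERSON_1 = (
    "aacgtttatgtgattaccagtaaattacggca",
    "aacgtttatgtgattcccagtaaattacggca",
)
PERSON_2 = (
    "aacgtttatgtgattaccagtaaattacggca",
    "aacgtttatgtgattcccagtaaagtacggca",
)


def discover_sites(sequences):
    """Discovery: find 0-based positions where the aligned sequences differ."""
    sites = {}
    for pos, column in enumerate(zip(*sequences)):
        alleles = sorted(set(column))
        if len(alleles) > 1:
            sites[pos] = alleles
    return sites


def genotype(person, sites):
    """Genotyping: report the alleles one individual carries at known sites."""
    return {pos: "/".join(sorted(seq[pos] for seq in person)) for pos in sites}


sites = discover_sites(PERSON_1 + PERSON_2)

if __name__ == "__main__":
    print(sites)                      # {15: ['a', 'c'], 24: ['g', 't']}
    print(genotype(PERSON_1, sites))  # {15: 'a/c', 24: 't/t'}
    print(genotype(PERSON_2, sites))  # {15: 'a/c', 24: 'g/t'}
```

Discovery pools all sequences to find the two polymorphic positions. Genotyping then looks only at those positions in each individual: both people are heterozygous at the first site, while only person 2 is heterozygous at the second.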
