
CSE280a: Projects



Presentation Transcript


  1. CSE280a: Projects • Vineet Bafna

  2. Project Logistics • Research project (70%) • Work individually, or in groups of 2 • Two presentations: • Introductory presentation: first week of February (20 minutes) (20% of grade) • Describe the goals of the project • Describe your (computational) formulation • Summarize/critique reading assignment • Present an algorithm • Constructive criticism of other projects • One-on-one meeting with instructor (end of February) (10% of grade) • Discuss preliminary results • Final presentation (last 2-3 classes) (30% of grade) • Submit a final report • Final presentation

  3. Project 1: Disease gene mapping • Recall linkage disequilibrium (LD) • In the absence of recombination: • Columns (sites) are correlated • The joint probability Pr[A=a, B=b] differs from Pr[A=a] Pr[B=b] • With extensive recombination: • Pr[A=a, B=b] = Pr[A=a] Pr[B=b]

  4. Measures of LD • Consider two bi-allelic sites with alleles marked 0 and 1 • Define • P00 = Pr[Allele 0 in locus 1, and 0 in locus 2] • P0* = Pr[Allele 0 in locus 1] • P*0 = Pr[Allele 0 in locus 2] • Linkage equilibrium if P00 = P0* P*0 • D = abs(P00 - P0* P*0) = abs(P01 - P0* P*1) = …
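The definition of D on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming each site is given as a list of 0/1 alleles over the same sampled chromosomes; the function name `ld_d` and the toy data are made up for the example and are not from the slides.

```python
def ld_d(site1, site2):
    """Return D = |P00 - P0* P*0| for two bi-allelic sites coded as 0/1."""
    if len(site1) != len(site2) or not site1:
        raise ValueError("sites must be non-empty and of equal length")
    n = len(site1)
    # P00: fraction of chromosomes carrying allele 0 at both sites
    p00 = sum(1 for a, b in zip(site1, site2) if a == 0 and b == 0) / n
    p0_star = site1.count(0) / n   # P0*: allele 0 at locus 1
    p_star0 = site2.count(0) / n   # P*0: allele 0 at locus 2
    return abs(p00 - p0_star * p_star0)


# Toy data (assumed): identical columns versus statistically independent columns
linked = ld_d([0, 0, 1, 1], [0, 0, 1, 1])
independent = ld_d([0, 0, 1, 1], [0, 1, 0, 1])
```

With these toy columns, the identical pair gives the maximum D for allele frequency 0.5, and the independent pair gives D = 0, matching the linkage-equilibrium condition P00 = P0* P*0.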

  5. LD can be used to map disease genes • LD decays with distance from the disease allele. • By plotting LD, one can shortlist the region containing the disease gene. • [Figure: LD curve over six individuals labeled D N N D D N, with marker alleles 0 1 1 0 0 1.]

  6. Multiple loci • In complex diseases, multiple loci interact to confer disease susceptibility. • [Figure: LD curve over six individuals labeled D N N D D N, with alleles 0 0 1 0 0 1 at one locus and 0 1 1 0 0 0 at another.]

  7. Testing for multiple loci • Assume a SNP matrix with n individuals and m loci. Testing all sets of 5 SNPs implies a huge number of computations (on the order of m^5 sets). • Can you come up with computational strategies that speed this up?
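To make "a huge number of computations" concrete, the number of SNP sets can be counted directly. The sketch below assumes a hypothetical m = 1000 loci (the slides do not give a value) and uses the standard-library `math.comb`.

```python
from math import comb


def num_snp_sets(m, k=5):
    """Number of distinct k-subsets of m SNPs that an exhaustive test would visit."""
    return comb(m, k)


# m = 1000 is an assumed, illustrative number of loci
pairs = num_snp_sets(1000, 2)
quintuples = num_snp_sets(1000, 5)
```

Even at this modest m, the count of 5-SNP sets exceeds 8 * 10^12, which is why a filtering strategy that avoids enumerating all sets is needed.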

  8. Speeding up multiple locus computations • A filtering strategy? • Input: a SNP matrix with one or more pairs of SNPs that interact to associate with the disease • Output: a set of SNP pairs that includes the interacting pair(s) • The method should be fast, and should NOT consider all pairs.

  9. Speeding up the computations • Correlated SNPs should have low Hamming distance. • Random SNPs should have high Hamming distance. • Strategy: select k individuals at random. • Hash each SNP (column) restricted to the k selected individuals. • Correlated SNPs should fall in the same bin with high probability. • [Figure: example 0/1 SNP matrix hashed with k=2.]
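The hashing strategy above can be sketched as follows. This is one possible reading of the slide, not a reference implementation: it assumes the matrix is stored as rows of individuals and columns of SNPs, repeats the random sampling for several rounds (the `rounds` parameter is an addition), and reports every pair of SNPs that ever shares a bin. The name `candidate_pairs` is hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def candidate_pairs(snp_matrix, k, rounds=10, seed=0):
    """Bin SNP columns by their alleles on k random individuals.

    snp_matrix[i][j] is the 0/1 allele of individual i at SNP j.
    Returns the set of SNP index pairs (a, b), a < b, that shared a bin
    in at least one round.
    """
    rng = random.Random(seed)
    n = len(snp_matrix)
    m = len(snp_matrix[0])
    candidates = set()
    for _ in range(rounds):
        sample = rng.sample(range(n), k)       # k individuals chosen at random
        bins = defaultdict(list)
        for j in range(m):
            # Hash key: the SNP column restricted to the sampled individuals
            key = tuple(snp_matrix[i][j] for i in sample)
            bins[key].append(j)
        for members in bins.values():
            candidates.update(combinations(members, 2))
    return candidates


# Toy matrix (assumed): SNPs 0 and 1 are identical, SNP 2 is their complement
toy = [
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
found = candidate_pairs(toy, k=2)
```

Two caveats of this sketch: pairs inside a bin are still enumerated, so k must be large enough to keep bins small, and perfectly anti-correlated SNPs (such as SNP 2 here) never collide unless one column is also hashed in complemented form.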

  10. Project 2: mtDNA phylogeny • In the absence of recombination, the history of mitochondrial DNA can be expressed by a tree. • The goal of this project is to build a robust phylogeny using a heuristic modification of the perfect phylogeny.

  11. The Genographic Project • The Genographic Project aims to trace the geographic origins of the human race using mitochondrial DNA. https://www3.nationalgeographic.com/genographic/atlas.html

  12. Without recurrent mutations • Unique tree can explain the evolutionary history
        1 2 3 4 5
      A 1 1 0 0 0
      B 0 0 1 0 0
      C 1 1 0 1 0
      D 0 0 1 0 1
      E 1 0 0 0 0
      [Figure: tree rooted at r, with edges labeled by mutations 1 to 5 and leaves A, B, C, D, E.]

  13. With recurrent mutations • Adding another individual F destroys perfect phylogeny • Why? • It is not so easy to place F • Can you suggest a strategy?
        1 2 3 4 5
      A 1 1 0 0 0
      B 0 0 1 0 0
      C 1 1 0 1 0
      D 0 0 1 0 1
      E 1 0 0 0 0
      F 0 1 0 0 0
      [Figure: the tree from the previous slide with F added and mutation labels repeated on more than one edge.]
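One standard way to see why F breaks the perfect phylogeny is the four-gamete test: a pair of columns that shows all four combinations 00, 01, 10, 11 cannot be explained without a recurrent (or back) mutation. The sketch below applies it to the matrices on slides 12 and 13. It assumes the root r is the all-zero sequence, and the function name `conflicting_columns` is illustrative.

```python
from itertools import combinations


def conflicting_columns(rows):
    """Return 1-based column pairs that contain all four gametes.

    rows maps a taxon name to its 0/1 character vector. The all-zero root
    is assumed, so gamete (0, 0) is always counted as present.
    """
    vectors = list(rows.values())
    m = len(vectors[0])
    bad = []
    for i, j in combinations(range(m), 2):
        gametes = {(v[i], v[j]) for v in vectors}
        gametes.add((0, 0))                    # contributed by the assumed root
        if len(gametes) == 4:
            bad.append((i + 1, j + 1))
    return bad


taxa = {
    "A": [1, 1, 0, 0, 0],
    "B": [0, 0, 1, 0, 0],
    "C": [1, 1, 0, 1, 0],
    "D": [0, 0, 1, 0, 1],
    "E": [1, 0, 0, 0, 0],
}
without_f = conflicting_columns(taxa)
with_f = conflicting_columns({**taxa, "F": [0, 1, 0, 0, 0]})
```

For the five original taxa no pair of columns conflicts. After adding F, columns 1 and 2 show 11 (A, C), 10 (E), 01 (F), and 00 (B, D), so at least one of those two mutations must have occurred more than once.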

  14. Tests of Selection • In class, we have discussed alleles that can be selectively neutral, or under active selection • Active selection may be positive or negative • How do we identify regions under positive, or negative selection? • Balancing selection: sometimes it is helpful for a population to

  15. Adaptive Selection • Selection leads to loss of heterozygosity (will be explained in detail in the next lecture). • Can you come up with a test for selection?
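As a possible starting point for the question above (not a complete test, and not one prescribed by the slides), one can scan a region for windows of unusually low heterozygosity. The sketch assumes haplotypes are 0/1 rows over m sites and uses the expected heterozygosity 2p(1-p) at each site; the function names and the window size are illustrative choices.

```python
def site_heterozygosity(column):
    """Expected heterozygosity 2p(1-p) at a bi-allelic site, p = frequency of allele 1."""
    p = sum(column) / len(column)
    return 2 * p * (1 - p)


def window_heterozygosity(haplotypes, window):
    """Mean per-site heterozygosity in each sliding window of `window` sites."""
    m = len(haplotypes[0])
    per_site = [site_heterozygosity([h[j] for h in haplotypes]) for j in range(m)]
    return [sum(per_site[s:s + window]) / window for s in range(m - window + 1)]


# Toy haplotypes (assumed): variation at sites 0 and 1, none at sites 2 and 3
haps = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]
scan = window_heterozygosity(haps, window=2)
```

A real test would also need a null model, for example comparing each window against the genome-wide distribution, to decide when a drop in heterozygosity is significant.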

  16. Balancing selection • Sometimes both alleles are useful in a population, and it helps to have both around • A simple example is when diversity is important (the two variants help maintain diversity) • Bipolar disorder genes could be under balancing selection • High creativity might confer some selective/reproductive advantage. • Depression confers a disadvantage. • If so, the tests for this disorder might be tricky. • How can we identify regions under balancing selection?

  17. Testing for Balancing Selection • Adaptive selection leads to loss of heterozygosity (will be explained in detail in the next lecture). • Balancing selection leads to two dominant haplotypes • Can you come up with a test for balancing selection?

  18. Project: Primer design for cancer genomics

  19. The Science behind Gleevec • Fusions observed in leukemia, lymphoma, and sarcomas • “Philadelphia Translocation” • Drugs target this fusion protein

  20. Fluorescence in situ hybridization • Cancer genomes show extensive structural variation

  21. Assaying for tumor variants • Most tumors start off with a single cell, which then proliferates. • Drugs like Gleevec are used well after cancer has taken hold. • Can we detect the cancer early by detecting the genomic abnormality? • If only a very few cells in the person are cancerous, can we still detect it? • Can we track a patient through treatment?

  22. Cancer genomics • In cancers, large genetic changes can occur, including deletions, inversions, and rearrangements of genomes • In the early stages, only a few cells will show this deletion

  23. Polymerase Chain Reaction • PCR is a technique for amplifying and detecting a specific portion of the genome • Amplification takes place if the primers are an ‘appropriate’ distance apart (<2 kb)

  24. Assaying for Rare Variants • PCR can be used to assay for a given genomic abnormality, even in a heterogeneous population of tumor and normal cells • [Figure: tumor cell, genomic DNA extraction, PCR, detection; label: distance too large for amplification.]

  25. Variant Variants • What if the variant is the minority in the cell population? • What if deletion boundaries are uncertain? • [Figure: deletions in Patient A, Patient B, and Patient C.]

  26. Observed variation in deletion size • [Figure: sizes of homozygous deletions in cell lines from different human cancers (scale in megabases).]

  27. Primer Approximation Multiplex PCR (PAMP) • Multiple primers are optimally spaced, flanking a breakpoint of interest • Upstream of breakpoint, forward primers • Downstream of breakpoint, reverse primers • The primers are run in a multiplex PCR reaction • Any pair can form a viable product • [Figure: deletions in Patient B and Patient C.]
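The PAMP idea can be illustrated with a small simulation: after a deletion, a forward primer upstream and a reverse primer downstream are brought closer together, and a product forms if their new distance is under the limit. This sketch makes several simplifying assumptions that are not in the slides: primers are treated as single coordinates, a primer inside the deleted interval is lost, and the 2 kb limit from slide 23 is used as a hard cutoff. The function name `amplifying_pairs` is hypothetical.

```python
MAX_PRODUCT = 2000  # bp; slide 23 gives "<2 kb" as the amplifiable distance


def amplifying_pairs(forward, reverse, del_start, del_end, max_product=MAX_PRODUCT):
    """Return (forward, reverse) primer pairs that amplify across a deletion.

    forward, reverse: primer coordinates on the undeleted reference.
    The half-open interval [del_start, del_end) is removed in the tumor genome.
    """
    deleted = del_end - del_start
    pairs = []
    for f in forward:
        if f >= del_start:          # not upstream of the deletion (or deleted)
            continue
        for r in reverse:
            if r < del_end:         # not downstream of the deletion (or deleted)
                continue
            if (r - f) - deleted < max_product:
                pairs.append((f, r))
    return pairs


# Toy layout (assumed coordinates)
fwd = [0, 1000, 2000, 3000]
rev = [50000, 51000, 52000]
tumor = amplifying_pairs(fwd, rev, del_start=3500, del_end=49800)
normal = amplifying_pairs(fwd, rev, del_start=10000, del_end=10000)
```

In the toy layout, the primers nearest the breakpoints amplify in the tumor genome, while the empty deletion (the normal genome) yields no product because every forward/reverse pair is far more than 2 kb apart.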

  28. Experimental Design (500 kb region) • 10 sets of 25 primers: upstream and downstream • 250 upstream • 250 downstream • Primer pairs closest to breakpoint amplified • Assay by oligo array • Goal: Computational selection of an ‘optimal’ primer set

  29. Goal • Input: a collection of primers • Identify a subset of primers that do not cross-hybridize, are unique, yet cover the region completely • Use combinatorial optimization, simulated annealing, integer linear programming, …
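Before reaching for simulated annealing or integer linear programming, a greedy baseline can help frame the problem. The sketch below is only that baseline, under assumed inputs: `positions` maps hypothetical primer names to coordinates, and `conflicts` is a precomputed set of primer pairs that cross-hybridize. It sweeps left to right, keeps a primer when it conflicts with nothing already chosen, and reports the largest gap left between chosen primers as a rough measure of coverage.

```python
def greedy_primer_subset(positions, conflicts):
    """Greedy left-to-right selection of mutually compatible primers.

    positions: dict mapping primer name -> genomic coordinate.
    conflicts: set of frozenset({name_a, name_b}) pairs that cross-hybridize.
    Returns (chosen names in coordinate order, largest gap between neighbors).
    """
    chosen = []
    for name in sorted(positions, key=positions.get):
        if all(frozenset((name, other)) not in conflicts for other in chosen):
            chosen.append(name)
    coords = [positions[name] for name in chosen]
    max_gap = max((b - a for a, b in zip(coords, coords[1:])), default=0)
    return chosen, max_gap


# Toy input (assumed): p1 and p2 cross-hybridize
toy_positions = {"p1": 0, "p2": 900, "p3": 1800, "p4": 2700}
toy_conflicts = {frozenset(("p1", "p2"))}
subset, gap = greedy_primer_subset(toy_positions, toy_conflicts)
```

The greedy choice is order-dependent and can leave gaps that a global method would avoid, which is exactly where the optimization techniques named on the slide come in: they can trade one conflicting primer for another to shrink the largest gap.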

  30. Spectral Networks Algorithms for De Novo Interpretation of Tandem Mass Spectra • Nuno Bandeira, Ph.D. • Department of Computer Science and Engineering, University of California, San Diego • ProtIG seminar series, September 21, 2007

  31. Proteins and their modifications • Proteins are fundamental players in the regulation of biological processes. • [Figure: diagram relating DNA and proteins, with arrows labeled “encodes for” and “regulate”.] • Knowing proteins involves knowing many things. This dissertation focuses on: • Identification • Sequencing • Post-translational modifications

  32. Protein sequences and modifications • From a computational perspective, a protein can be represented as a string over a weighted alphabet. • Protein sequence: …AFSRLEMILGF… • Subsequences are called peptides (obtained via enzymatic digestion): AFSRL, SRLEMILGF, EMILG • Modifications change amino acid masses: Mass(M)=131, Mass(modification)=16, Mass(M*)=147 • Mass(SRLEMILGF)=1047 • Mass(SRLEM*ILGF)=1063, where * marks the modified residue
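The "string over a weighted alphabet" view translates directly into code. The sketch below sums standard monoisotopic residue masses (rounded to five decimals) and adds a mass shift per modified position; the +16 shift at methionine is taken from the slide. The names `RESIDUE_MASS` and `peptide_mass` are illustrative, and, as on the slide, no water or terminal-group mass is added.

```python
# Monoisotopic amino acid residue masses in daltons
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def peptide_mass(sequence, modifications=None):
    """Sum of residue masses plus optional per-position mass shifts.

    modifications maps a 0-based position in `sequence` to a mass offset.
    """
    total = sum(RESIDUE_MASS[aa] for aa in sequence)
    if modifications:
        total += sum(modifications.values())
    return total


plain = peptide_mass("SRLEMILGF")
modified = peptide_mass("SRLEMILGF", {4: 16.0})   # +16 on the M at position 4
```

Rounded to the nearest integer, these reproduce the slide's values of 1047 for the unmodified peptide and 1063 for the modified one.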

  33. Nobel Prize in Chemistry, 2002

  34. What is mass spectrometry? http://nobelprize.org/chemistry/laureates/2002/chemadv02.pdf

  35. Tandem Mass Spectrometry (MS/MS) • Protein sequence: …THISISAVERYLARGESAMPLEPRTEINSEQENCE… • Peptide: LARGE • Modified peptide: LARG*E • Modification: any event that changes the mass at a specific site. • [Figure: MS/MS spectrum with b-ion and y-ion peaks; label PM.]

  36. Example of a real MS/MS spectrum • [Figure: spectrum with labels b10, y12, and “Symmetric”.]

  37. Tandem Mass Spectrometry (MS/MS) • Database of known peptides: MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, SEQUENCE, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN.. • [Figure labels: Proteins; Enzymatic digestion; Peptides; Tandem Mass Spectrometry; Large set of MS/MS spectra; Database search; De novo sequencing; Peptide SEQUENCE.]

  38. Mixture spectra • Sometimes the instrument generates a single spectrum from two or more peptides. • Peptide A: NLAFFQLR • Peptide B: ALDDILNLK • [Figure: mixture spectrum.]

  39. How to identify mixture spectra?

  40. Proposed approach • When identifying a mixture spectrum of peptides A and B, assume you have non-mixture spectra for the same peptides. • Compare the non-mixture spectra of known peptides to putative mixture spectra to determine peptide identifications.

  41. Project description • Implement an algorithm to identify mixture spectra from pairs of peptides by combining previously identified spectra from isolated peptides. • Test the above implementation by simulating mixture spectra using an existing database of spectra from isolated peptides. • Propose a scoring procedure to separate correct from false identifications. • Contact: Nuno Bandeira, bandeira@ucsd.edu
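A minimal sketch of the simulate-and-identify loop described above might look like the following. It is a toy under stated assumptions, not the proposed method: spectra are dictionaries from integer-binned m/z to intensity, a mixture is simulated by adding intensities, and candidate pairs are ranked by cosine similarity (one possible scoring choice among many). The peak values in `library` are arbitrary illustrative numbers, not real spectra of these peptides.

```python
from itertools import combinations
from math import sqrt


def combine(spec_a, spec_b):
    """Simulate a mixture spectrum by summing intensities bin by bin."""
    merged = dict(spec_a)
    for mz, intensity in spec_b.items():
        merged[mz] = merged.get(mz, 0.0) + intensity
    return merged


def cosine(spec_x, spec_y):
    """Cosine similarity between two binned spectra."""
    dot = sum(v * spec_y.get(mz, 0.0) for mz, v in spec_x.items())
    norm_x = sqrt(sum(v * v for v in spec_x.values()))
    norm_y = sqrt(sum(v * v for v in spec_y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def best_pair(mixture, library):
    """Return (score, peptide_a, peptide_b) for the best-matching library pair."""
    scored = (
        (cosine(mixture, combine(library[a], library[b])), a, b)
        for a, b in combinations(sorted(library), 2)
    )
    return max(scored)


# Toy library: peak positions and intensities are made up for illustration
library = {
    "NLAFFQLR": {175: 1.0, 288: 0.6, 435: 0.8},
    "ALDDILNLK": {147: 1.0, 260: 0.5, 374: 0.7},
    "SEQUENCE": {148: 0.9, 277: 0.4, 437: 0.6},
}
mixture = combine(library["NLAFFQLR"], library["ALDDILNLK"])
score, pep_a, pep_b = best_pair(mixture, library)
```

Enumerating all library pairs is quadratic, so a real implementation would need a filter (for example, shortlisting single peptides that explain many peaks) and a calibrated score threshold to separate correct from false identifications, as the last bullet of the project description asks.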
