
From Bauhaus to Bio-House

From Bauhaus to Bio-House. NATURE | VOL 422 | 24 APRIL 2003 |www.nature.com/nature. Dude, Where is My Genome?. Past Present & Future Of Genomics Technologies. Bud Mishra. Professor of Computer Science, Mathematics and Cell Biology ¦


Presentation Transcript


  1. From Bauhaus to Bio-House NATURE | VOL 422 | 24 APRIL 2003 |www.nature.com/nature

  2. Dude, Where is My Genome? Past Present & Future Of Genomics Technologies

  3. Bud Mishra Professor of Computer Science, Mathematics and Cell Biology | Courant Institute, NYU School of Medicine, Tata Institute of Fundamental Research, and Mt. Sinai School of Medicine

  4. Tools of the trade Where we collect three important tools from biotechnology: scissors, glues and copiers…

  5. Scissors • Type II Restriction Enzyme • Biochemicals capable of cutting the double-stranded DNA by breaking two -O-P-O bridges on each backbone • Restriction Site: • Corresponds to specific short sequences: EcoRI GAATTC • Naturally occurring protein in bacteria…Defends the bacterium from invading viral DNA…Bacterium produces another enzyme that methylates the restriction sites of its own DNA Tools of the Trade
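
A minimal sketch of the "scissors" step, assuming a linear molecule given as its top strand and using the well-known EcoRI cut position (between G and A in GAATTC). The function names `find_sites` and `digest` are illustrative and do not come from the presentation.

```python
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts the top strand between G and A: G^AATTC


def find_sites(seq, site=ECORI_SITE):
    """Return the 0-based start position of every occurrence of `site` in `seq`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, site=ECORI_SITE, cut_offset=ECORI_CUT_OFFSET):
    """Cut a linear sequence at every site and return the fragments in order."""
    cuts = [p + cut_offset for p in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


example = "TTGAATTCCCGGGAATTCAA"
print(find_sites(example))
print(digest(example))
```

Only the top strand is modeled here; the staggered cut on the complementary strand (the "sticky end") and methylation-based protection are left out.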

  6. Glue • DNA Ligase • Cellular Enzyme: Joins two strands of DNA molecules by repairing phosphodiester bonds • T4 DNA Ligase (E. coli infected with bacteriophage T4) • Hybridization • Hydrogen bonding between two complementary single stranded DNA fragments, or an RNA fragment and a complementary single stranded DNA fragment… results in a double stranded DNA or a DNA-RNA fragment
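
Hybridization between two DNA strands can be illustrated with a reverse-complement check. This is a simplified sketch that assumes perfect Watson-Crick pairing over the full length and both strands written 5'→3'; the helper names are illustrative only.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the reverse complement of a DNA strand written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def can_hybridize(strand_a, strand_b):
    """True if the two 5'->3' strands are perfect reverse complements."""
    return strand_b == reverse_complement(strand_a)


print(reverse_complement("ATGC"))
print(can_hybridize("ATGC", "GCAT"))
```

Real hybridization tolerates mismatches and depends on temperature and salt; none of that is modeled here.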

  7. Copier • DNA Amplification: • Main Ingredients: Insert (the DNA segment to be amplified), Vector (a cloning vector that combines with an insert to create a replicon), Host Organism (usually bacteria).

  8. Copier • PCR (Polymerase Chain Reaction): • Main Ingredients: Primers, Catalysts, Templates, and the dNTPs.

  9. Sir Ernest Rutherford “For Mike’s sake, Soddy, don’t call it transmutation. They’ll have our heads off as alchemists.” Rutherford, winner of 1908 Nobel prize for chemistry for cataloging alpha and beta particles… “All science is either physics or stamp collecting.”

  10. The Middle Way • Two Extremes: • Indexing: For each character ‘b’ in the genome, make a list of each position where it occurs. • Shotgunning: For each long sentence in the genome, select it with low probability (o(lgn/n)), and then read it reasonably accurately. • The Middle way: • Indexed-Shotgun: For each short word in the genome, select it with high probability (o(1)), and then measure its position and read it reasonably accurately. • Where is the middle???
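
The "indexing" extreme can be sketched directly: for each character, list every position where it occurs. The second function extends the same idea to short words of length `k`, which is an assumption made here to illustrate the "short word plus position" idea and is not the indexed-shotgun protocol itself.

```python
from collections import defaultdict


def index_characters(genome):
    """Map each character to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos, base in enumerate(genome):
        index[base].append(pos)
    return dict(index)


def index_words(genome, k):
    """Map each length-k word to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return dict(index)


print(index_characters("GATTACA"))
print(index_words("GATTACA", 3))
```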

  11. Outline: • Physical Mapping & Sequencing: • Map: • assign physical locations to important markers (e.g., restriction sites or hybridization probes). • Sequence: • align short sequence reads to the markers (map-based sequence assembly) or • align long sequence reads to each other (shotgun assembly) Array Mapping Optical Mapping Sequencing

  12. Array Mapping

  13. Measuring distances: • A one-dimensional “Buffon’s needle problem.” • Take two points on a line, and drop unit-length needles of some color. • The probability that the two points will have different colors increases monotonically with the distance between the two points as the distance increases from 0 to 1, and attains a fixed value for all distances longer than 1. • One can generalize by considering • More than two points…P points. • Dropping a small set of bichromatic needles… [Figure: needles numbered 1 to 6; Distance ≈ 3/6 = 0.5]
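
The saturation at distance 1 can be checked with a small calculation. The sketch below assumes a single unit-length needle whose left end is uniform over an interval of length `span`, with both points far from the edges; "different colors" is modeled as the needle covering exactly one of the two points. The function names and the grid check are illustrative assumptions.

```python
def covers(left_end, point, needle_len=1.0):
    """True if a needle occupying [left_end, left_end + needle_len] covers point."""
    return left_end <= point <= left_end + needle_len


def prob_split_exact(d, span, needle_len=1.0):
    """Probability that one needle covers exactly one of two points d apart."""
    # The two sets of covering left ends are intervals of length needle_len,
    # shifted by d; their symmetric difference has length 2 * min(d, needle_len).
    return 2.0 * min(d, needle_len) / span


def prob_split_grid(d, span, needle_len=1.0, steps=20_000):
    """Numerical check: sweep the left end over a fine grid on [0, span]."""
    x1 = span / 2.0 - d / 2.0
    x2 = x1 + d
    hits = 0
    for k in range(steps):
        u = (k + 0.5) * span / steps
        if covers(u, x1, needle_len) != covers(u, x2, needle_len):
            hits += 1
    return hits / steps


for d in (0.25, 0.5, 1.0, 3.0):
    print(d, prob_split_exact(d, 10.0), prob_split_grid(d, 10.0))
```

Under these assumptions the probability grows linearly in the distance up to the needle length and is flat beyond it, which is why hybridization data of this kind carry distance information only for nearby probes.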

  14. The Experiments: • Probes are “points” • BACs are “needles” • Hybridization on an array simulates “dropping the bichromatic needles” [Figure labels: High Coverage BAC Library; M; cX coverage subsample (four times)]

  15. A Mathematical Problem • A set of P points: {x1, x2, …, xP} ⊆ [0,G] with pdf f(x) = 1/G i.i.d. for all x ∈ [0,G] • Distance di,j = d(|xi – xj|), “measured” between two arbitrary points xi and xj = x. • Given O(P²) distances, infer positions.

  16. Distance vs Observed

  17. Matrix-to-Line • Given a P × P positive symmetric real-valued matrix D of “measured distances”. • The entry di,j ∼ f(d | x). • Choose an embedding of the points: • {x’1, x’2, …, x’P} ⊂ [0,G], • which maximizes a likelihood function • ∏ over 1 ≤ i, j ≤ P of f(|x’i – x’j| | di,j)

  18. Bayes’ Formula

  19. Minimizing a Quadratic Cost Function
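
One way to see how a likelihood of the Matrix-to-Line form becomes a quadratic cost, assuming (this is not stated in the transcript) that each density f(· | di,j) is Gaussian with mean di,j and variance σ²i,j, and counting each unordered pair once:

```latex
\begin{aligned}
-\log \prod_{1 \le i < j \le P} f\bigl(|x'_i - x'_j| \,\big|\, d_{i,j}\bigr)
  &= \sum_{1 \le i < j \le P}
     \frac{\bigl(|x'_i - x'_j| - d_{i,j}\bigr)^2}{2\sigma_{i,j}^2}
     + \text{const}
\end{aligned}
```

Under that assumption, maximizing the likelihood is the same as minimizing a weighted sum of squared errors with weights w_{i,j} = 1/(2σ²_{i,j}); the constant collects the Gaussian normalization terms, which do not depend on the embedding.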

  20. A Physical Model • Mass-less balls connected with springs of different stiffness… [Figure: points P1, P2, P3, P4 joined pairwise by springs labeled d1,2, d1,3, d1,4, d2,3, d2,4, d3,4]
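
The spring picture can be sketched as gradient descent on a weighted sum-of-squares cost for points on a line. This is an illustrative relaxation, not the presentation's algorithm: the names `spring_cost` and `relax`, the fixed step size, and the treatment of stiffness as a per-pair weight `w` are all assumptions.

```python
def spring_cost(x, springs):
    """Weighted sum of squared differences between current and target distances."""
    return sum(w * (abs(x[i] - x[j]) - d) ** 2 for (i, j), (d, w) in springs.items())


def relax(x, springs, step=0.05, iterations=200):
    """Move points along the line by plain gradient descent on spring_cost."""
    x = list(x)
    for _ in range(iterations):
        grad = [0.0] * len(x)
        for (i, j), (d, w) in springs.items():
            diff = x[i] - x[j]
            sign = (diff > 0) - (diff < 0)
            g = 2.0 * w * (abs(diff) - d) * sign  # derivative with respect to x[i]
            grad[i] += g
            grad[j] -= g
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x


# Four points with exact pairwise distances and unit stiffness (made-up data).
true_positions = [0.0, 1.0, 3.0, 6.0]
springs = {
    (i, j): (abs(true_positions[i] - true_positions[j]), 1.0)
    for i in range(4) for j in range(i + 1, 4)
}
placed = relax([0.0, 0.5, 1.0, 1.5], springs)
print(placed, spring_cost(placed, springs))
```

The cost is invariant under translation and reflection, so the result is determined only up to those symmetries; with noisy distances or a poorly ordered start the descent can also settle in a local minimum.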

  21. Algorithm Join • Consider measured distances of length L’ ≤ qL; examine these distances in increasing order. • q ∈ (0,1) to be determined by the Chernoff bounds • Initially, every probe is a singleton contig. • Two operations, Join and Adjust, either combine smaller contigs or improve an existing contig.
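
A heavily simplified sketch of the contig-building idea: start with singleton contigs, scan the short measured distances in increasing order, and merge the contigs of the two probes involved. It only tracks which probes end up together; it does not place probes within a contig and omits the Adjust operation entirely. The function name, the `max_len` parameter (standing in for the length threshold), and the union-find bookkeeping are assumptions made for illustration.

```python
def join_contigs(num_probes, distances, max_len):
    """Group probes into contigs using only distances no longer than max_len."""
    parent = list(range(num_probes))  # every probe starts as a singleton contig

    def find(i):
        # Follow parent links to the contig representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    short = sorted((d, i, j) for (i, j), d in distances.items() if d <= max_len)
    for _, i, j in short:  # examine short distances in increasing order
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri  # join two smaller contigs

    groups = {}
    for probe in range(num_probes):
        groups.setdefault(find(probe), []).append(probe)
    return sorted(groups.values())


measured = {(0, 1): 0.2, (1, 2): 0.3, (0, 2): 0.5, (3, 4): 0.1, (2, 3): 5.0}
print(join_contigs(5, measured, max_len=1.0))
```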

  22. Algorithm Adjust • Join and Adjust locally minimize the “log-likelihood cost function” • Local minimum of a weighted sum-of-squares error function

  23. Algorithmic Complexity

  24. Yeast Mapping

  25. Data from One Experiment

  26. Map…

  27. Probe 111 Probe 79 Probe 101 Probe 85 Probe 95 Local Distances

  28. Optical Mapping

  29. Optical Approaches are Inherently Noisy! • Since many biological macromolecules are smaller than the Rayleigh limit, the optical approaches involve attaching single fluorescent probes to specific macromolecules. • Controlling Noise: • Magnitude of Stokes shift • Steric hindrance • Absorption cross-section • Point spread function (PSF) • Image Processing

  30. Optical Mapping 1. Capture and immobilize whole genomes as massive collections of single DNA molecules • Cells gently lysed to extract genomic DNA • DNA captured in parallel arrays of long single DNA molecules using microfluidic device • Genomic DNA, captured as single DNA molecules produced by random breakage of intact chromosomes

  31. Optical Mapping 2. Interrogate with restriction endonucleases 3. Maintain order of restriction fragments in each molecule • Digestion reveals 6-nucleotide cleavage sites as “gaps”

  32. Optical Mapping 4. Determine size of fragments

  33. Optical Mapping 5. GENTIG Robust Bayesian Map Assembler to make whole-genome restriction map

  34. Computational Analysis Single DNA molecule on Optical Chip after digestion, staining • Image analysis software measures size and order of restriction fragments • Overlapping single molecule maps are aligned to produce a map assembly covering an entire chromosome
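
The alignment of overlapping single-molecule maps can be illustrated with a naive end-to-start overlap test on ordered fragment sizes. This is not the Bayesian assembler named in the presentation; the function name, the relative sizing tolerance `rel_tol`, and the minimum overlap `min_frags` are assumptions, and missing cuts, false cuts, truncated end fragments, and orientation are ignored.

```python
def best_overlap(map_a, map_b, rel_tol=0.1, min_frags=2):
    """Largest k such that the last k fragments of map_a match the first k of map_b."""
    for k in range(min(len(map_a), len(map_b)), min_frags - 1, -1):
        tail, head = map_a[-k:], map_b[:k]
        # Two fragment sizes "match" if they differ by at most rel_tol of the larger.
        if all(abs(a - b) <= rel_tol * max(a, b) for a, b in zip(tail, head)):
            return k
    return 0


# Made-up fragment sizes (e.g. in kb) for two overlapping molecules.
molecule_a = [12.0, 5.1, 8.0, 3.2]
molecule_b = [7.8, 3.3, 9.5, 4.0]
print(best_overlap(molecule_a, molecule_b))
```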

  35. Map Assembly Overlapping single molecule maps are aligned to produce a map assembly covering an entire chromosome

  36. Complexity Issues Various combinations of error sources lead to NP-hard Problems

  37. SMRM (Single Molecule Restriction Map) [Figure labels: Dj with s1j, s2j, s3j, …, sM,j; DRj with sR1j, sR2j, sR3j, …, sRM,j]

  38. SMRM(Single Molecule Restriction Map)

  39. Problem 2 (Sizing Error)

  40. Problem 2 is NP Complete

  41. Example

  42. Probabilistic Analysis Where we design the experiments to generate easy instances of a difficult problem…

  43. Combinatorial Structure

  44. Flips & Flops

  45. Intuition [Figure labels: + − − −]

  46. Other Error Sources

  47. Discretization

  48. Sizing Error
