1 / 33

Graph Theory And Bioinformatics Jason Wengert

Graph Theory And Bioinformatics Jason Wengert. Outline. Introduction to Graphs Eulerian Paths & Hamiltonian Cycles Interval Graph & Shape of Genes Sequencing DNA Shortest Superstring Problem Sequencing by Hybridization (SBH) SBH as Hamiltonian Cycle Problem SBH as Eulerian Path Problem.

kaleb
Download Presentation

Graph Theory And Bioinformatics Jason Wengert

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Graph Theory And Bioinformatics Jason Wengert

  2. Outline Introduction to Graphs Eulerian Paths & Hamiltonian Cycles Interval Graph & Shape of Genes Sequencing DNA Shortest Superstring Problem Sequencing by Hybridization (SBH) SBH as Hamiltonian Cycle Problem SBH as Eulerian Path Problem

  3. Graph Theory A graph is a set of vertices connected by edges Edges connect two vertices and can be weighted, directed or undirected

  4. The Bridge Obsession Problem Find a tour crossing every bridge just once Leonhard Euler, 1735 Bridges of Königsberg

  5. Eulerian Cycle Problem Find a cycle that visits everyedge exactly once Linear time Solutions exist if each vertex has even degree For a path, the start and end vertices can have odd degree More complicated Königsberg

  6. Hamiltonian Cycle Problem Find a cycle that visits every vertex exactly once NP – complete Equivalent to Travelling Salesperson Problem Game invented by Sir William Hamilton in 1857

  7. Interval Graph Vertices are segments on a line Edges exist between two vertices if any part of their segments overlap

  8. Seymour Benzer and T4 phages Benzer studied mutated bacteriophages Non-mutated T4 phages kill bacteria The mutants had continuous segments missing from their DNA These mutants lost the ability to kill the bacteria Two mutant strains with non-overlapping segments missing can kill bacteria together

  9. Complementation between pairs of mutant T4 bacteriophages

  10. Interval Graph: Linear Genes

  11. Interval Graph: Branched Genes

  12. Sequencing DNA Sanger Method: Make copies of fragments of different length by nucleotide starvation during copying Separate the fragments via electrophoresis Each starvation experiment produces a ladder of fragments of various lengths Generate several-hundred-nucleotide fragment sequences (reads) from these ladders

  13. Sanger Method: Generating Read Start at primer (restriction site) Grow DNA chain Include ddNTPs Stops reaction at all possible points Separate products by length, using gel electrophoresis

  14. Combining Reads Need a method to take the reads and combine them into a genome You don't know where the reads came from in the DNA

  15. Shortest Superstring Problem Problem: Given a set of strings, find a shortest string that contains all of them Input: Strings s1, s2,…., sn Output: A string s that contains all strings s1, s2,…., sn as substrings, such that the length of s is minimized Complexity: NP – complete Note: this formulation does not take into account sequencing errors

  16. Shortest Superstring Problem: Example

  17. Reducing SSP to TSP Define overlap ( si, sj ) as the length of the longest prefix of sj that matches a suffix of si. aaaggcatcaaatctaaaggcatcaaa aaaggcatcaaatctaaaggcatcaaa aaaggcatcaaatctaaaggcatcaaa Construct a graph with n vertices representing the n strings s1, s2,…., sn. Insert edges of length overlap ( si, sj ) between vertices siand sj. Find the shortest path which visits every vertex exactly once. This is the Traveling Salesman Problem (TSP), which is also NP – complete.

  18. Sequencing By Hybridization The presence of a target DNA sequence can be determined by creating a probe, a single-strand fragment with the complementary sequence If the target sequence exists in the sample, it will hybridize with the probe

  19. How SBH Works Attach all possible DNA probes of length l to a flat surface, each probe at a distinct and known location. This set of probes is called the DNA array. Apply a solution containing fluorescently labeled DNA fragment to the array. The DNA fragment hybridizes with those probes that are complementary to substrings of length l of the fragment.

  20. How SBH Works (cont’d) Using a spectroscopic detector, determine which probes hybridize to the DNA fragment to obtain the l–mer composition of the target DNA fragment. Apply the combinatorial algorithm (below) to reconstruct the sequence of the target DNA fragment from the l – mer composition.

  21. Hybridization on DNA Array

  22. Spectrum Spectrum(s,l) is the multiset of all l-mers that appear in s Example: Spectrum(TATGGTGC, 3) = {TAT, ATG, TGG, GGT, GTG, TGC}

  23. The SBH Problem Goal: Reconstruct a string from its l-mer composition Input: A set S, representing all l-mers from an (unknown) string s Output: String s such that Spectrum ( s,l ) = S

  24. SBH As Hamiltonian Path Vertices of the graph are every l-mer in the spectrum A directed edge leads from vertex u to vertex v if the last l-1 characters of u equal the first l-1 characters of v. S = { ATG TGG TGC GTG GGC GCA GCG CGT }

  25. SBH: Hamiltonian Path Approach S = { ATG TGG TGC GTG GGC GCA GCG CGT } Path 1: ATGCGTGGCA Path 2: ATGGCGTGCA

  26. SBH As Eulerian Path Vertices are (l-1)-mers Edges represent the l-mers from the spectrum CG GT TG CA AT GC GG S = { ATG, TGC, GTG, GGC, GCA, GCG, CGT }

  27. SBH: Eulerian Path Approach S = { AT, TG, GC, GG, GT, CA, CG } corresponds to two different paths: CG CG GT GT TG AT TG GC AT GC CA CA GG GG ATGGCGTGCA ATGCGTGGCA

  28. Euler Theorem A graph is balanced if for every vertex the number of incoming edges equals to the number of outgoing edges: in(v)=out(v) Theorem: A connected graph is Eulerian if and only if each of its vertices is balanced.

  29. Constructing an Eulerian Cycle Start with an arbitrary vertex and form an arbitrary cycle until you reach a dead end Because the graph is Eulerian, this dead end will be the starting point If the resulting cycle is not Eulerian, select a vertex from the cycle with untraversed edges, and repeat step 1 starting there. Combine the two cycles and repeat step 2.

  30. Constructing an Eulerian Cycle

  31. Some Difficulties with SBH Fidelity of Hybridization: difficult to detect differences between probes hybridized with perfect matches and 1 or 2 mismatches Array Size: Effect of low fidelity can be decreased with longer l-mers, but array size increases exponentially in l. Array size is limited with current technology. Practicality: SBH is still impractical. As DNA microarray technology improves, SBH may become practical in the future Practicality again: Although SBH is still impractical, it spearheaded expression analysis and SNP analysis techniques

  32. Summary Introduction to Graphs Eulerian Paths & Hamiltonian Cycles Interval Graph & Shape of Genes Sequencing DNA Shortest Superstring Problem Sequencing by Hybridization (SBH) SBH as Hamiltonian Cycle Problem SBH as Eulerian Path Problem

  33. References Jones, N. and Pevzner, P. An introduction to Bioinformatics Algorithms. Some slides courtesy of the book's website at bix.ucsd.edu/bioalgorithms/slides.php

More Related