CSE 746 – Introduction to Bioinformatics Research Project
Two methods of DNA Sequencing – Comparing and Intertwining Suffix Trees and De Bruijn Graphs for Sequence Assembly
Dicle Öztürk 540110004
Suffix trees – Definition. Definition (Gusfield). Suffix trees – uses and complexity.
Assuming a bounded alphabet, the naive construction algorithm runs in O(m^2) time, where m is the length of the string S.
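N1 : the initial tree, a root with a single edge labeled S[1...m]$ leading to leaf 1
Ni : the intermediate tree encoding suffixes 1 through i (inductive assumption)
Ni+1 : the tree constructed from Ni by inserting the suffix S[i+1...m]$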
find the longest path from the root whose label matches a prefix of S[i+1...m]$
(the matching path is unique because no two edges out of a node can have labels that begin with the same char)
if no further match is possible
    if in the middle of an edge (u,v)
        split the edge into two
        insert a new node w just after the last char on the edge that matched a char in S[i+1...m]$ (before the mismatched char)
        label the edges (u,w) and (w,v) accordingly
    (otherwise, w is the node at which the match ended)
    create a new edge (w,i+1), thus creating a new leaf (i+1)
    label (w,i+1) with the unmatched part of the suffix S[i+1...m]$
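The insertion steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `Node`, `insert_suffix`, `build_naive_suffix_tree`, and `suffixes_in_tree` are hypothetical, and edge labels are stored as explicit substrings for readability (an index-pair representation would be needed to keep the space linear). It assumes the terminator `$` does not occur in the input string.

```python
class Node:
    """A suffix tree node; `leaf` holds the 1-based suffix number for leaves."""
    def __init__(self):
        self.children = {}  # first char of edge label -> (label, child Node)
        self.leaf = None


def insert_suffix(root, text, start):
    """Insert the suffix text[start:] into the tree rooted at `root`."""
    node, pos = root, start
    while True:
        ch = text[pos]
        if ch not in node.children:
            # Match ended at a node: hang a new leaf edge off it.
            leaf = Node()
            leaf.leaf = start + 1
            node.children[ch] = (text[pos:], leaf)
            return
        label, child = node.children[ch]
        k = 0
        while k < len(label) and text[pos + k] == label[k]:
            k += 1
        if k == len(label):
            # Whole edge matched: continue from the child node.
            node, pos = child, pos + k
            continue
        # Mismatch in the middle of edge (u, v): split it at a new node w.
        w = Node()
        w.children[label[k]] = (label[k:], child)   # edge (w, v)
        node.children[ch] = (label[:k], w)          # edge (u, w)
        leaf = Node()
        leaf.leaf = start + 1
        w.children[text[pos + k]] = (text[pos + k:], leaf)  # edge (w, i+1)
        return


def build_naive_suffix_tree(s, terminator="$"):
    """Build the tree by inserting suffixes 1..m of s + terminator in order."""
    assert terminator not in s, "terminator must not occur in the string"
    text = s + terminator
    root = Node()
    for start in range(len(s)):
        insert_suffix(root, text, start)
    return root


def suffixes_in_tree(node, prefix=""):
    """Return {leaf number: path label from the root} for every leaf."""
    if node.leaf is not None:
        return {node.leaf: prefix}
    found = {}
    for label, child in node.children.values():
        found.update(suffixes_in_tree(child, prefix + label))
    return found


tree = build_naive_suffix_tree("xabxac")
print(suffixes_in_tree(tree))
```

Each insertion walks at most the length of the suffix being inserted, so the m insertions together take O(m^2) character comparisons, in line with the bound stated earlier.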
The notes of Lewis (usask.ca) mention some general applications of suffix trees in computational biology.
Conceptually, the De Bruijn graph of a sequence can be considered as a simplification of that sequence's affix tree.
Furthermore, it says:
If we rank the nodes by distance from the root, the k-mer nodes of the De Bruijn graph correspond to the nodes of rank k in the affix tree.
It is easy to demonstrate that two k-mers are connected in the De Bruijn graph iff the corresponding nodes in the affix tree are connected by a path composed of an edge and a suffix link, going through a node of rank k+1.
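To make the connectivity condition concrete, here is a small sketch of a De Bruijn graph built directly from a sequence, without an affix tree. It assumes the common convention that nodes are k-mers and that an edge u -> v exists whenever some (k+1)-mer of the sequence has u as its prefix and v as its suffix; that (k+1)-mer plays the role of the rank k+1 node mentioned above. The function name `de_bruijn_graph` is hypothetical, and reverse complements are ignored.

```python
def de_bruijn_graph(sequence, k):
    """Map each k-mer of `sequence` to the set of k-mers that follow it."""
    graph = {}
    # Register every k-mer as a node, even if it has no outgoing edge.
    for i in range(len(sequence) - k + 1):
        graph.setdefault(sequence[i:i + k], set())
    # Each (k+1)-mer links its k-length prefix to its k-length suffix.
    for i in range(len(sequence) - k):
        kplus1 = sequence[i:i + k + 1]
        graph[kplus1[:-1]].add(kplus1[1:])
    return graph


graph = de_bruijn_graph("ACGTACGA", 3)
for kmer in sorted(graph):
    print(kmer, "->", sorted(graph[kmer]))
```

In this example the 3-mer ACG occurs twice, followed once by T and once by A, so its node has two outgoing edges (to CGT and to CGA); repeated k-mers collapsing into a single node is what distinguishes the graph from the original linear sequence.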
 – Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Dan Gusfield, Cambridge University Press, Jan 15, 1997.
 – Ukkonen E., On-line Construction of Suffix-Trees, Algorithmica vol 14(3), 1995.
 – Algorithms on Strings, Maxime Crochemore, Christophe Hancart, and Thierry Lecroq, Cambridge University Press, June 2007.
 – Genome assembly and comparison using de Bruijn graphs, D.R. Zerbino, PhD Thesis, European Bioinformatics Institute, Darwin College, September, 2009.
 – Giegerich R., and Kurtz S., From Ukkonen to McCreight and Weiner: A unifying view of linear-time suffix tree construction, Algorithmica 19:331–353, 1997.
 – Maaß, M. G., Linear bidirectional on-line construction of affix trees, Algorithmica vol. 37(1), 2003.
 – Bieganski, P., Riedl, J., Carlis, J.V., Retzel, E.F., Generalized Suffix Trees for Biological Sequence Data: Implementations and Applications, HICSS (5), 1994.