Genome Assembly

Genome Assembly. Charles Yan 2008. Fragment Assembly. Given a large number of fragments, such as ACC AC AT AC AT GG …, the goal is to reconstruct the original sequence that contains each and every fragment. Overlaps.


Presentation Transcript


  1. Genome Assembly Charles Yan 2008

  2. Fragment Assembly • Given a large number of fragments, such as ACC AC AT AC AT GG …, the goal is to reconstruct the original sequence that contains each and every fragment.

  3. Overlaps • The overlap between strings T and S, written overlap(T, S), is the longest suffix of S that is also a prefix of T. • Example: S = ATCGATCCG, T = CGATCCGATTAT, overlap(T, S) = CGATCCG
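The overlap definition on this slide can be sketched in a few lines of Python. This is a minimal, unoptimized illustration; the function name `overlap` and the choice to allow a whole string to count as its own suffix or prefix are assumptions, not something the slides specify.

```python
def overlap(t: str, s: str) -> str:
    """Return the longest suffix of s that is also a prefix of t."""
    # Try the longest candidate first and shrink until one matches.
    for k in range(min(len(s), len(t)), 0, -1):
        if s.endswith(t[:k]):
            return t[:k]
    return ""  # the two strings do not overlap


# The example from the slide.
s = "ATCGATCCG"
t = "CGATCCGATTAT"
print(overlap(t, s))  # CGATCCG
```

Note that the argument order matters: `overlap(s, t)` asks for a suffix of T that is a prefix of S, which for this pair is only `AT`.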

  4. A Simplified Problem • Shortest common superstring problem: given a set of strings, find a minimum-length string S such that every input string appears as a substring of S.
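To make the shortest common superstring problem concrete, here is a brute-force sketch that tries every ordering of the fragments and merges neighbours at their longest overlap. It is only usable for a handful of fragments, since the number of orderings grows factorially. The helper names (`overlap_len`, `shortest_superstring_bruteforce`) are hypothetical, and dropping fragments that are contained in other fragments is an assumption made to keep the search simple.

```python
from itertools import permutations


def overlap_len(t: str, s: str) -> int:
    """Length of the longest suffix of s that is also a prefix of t."""
    for k in range(min(len(s), len(t)), 0, -1):
        if s.endswith(t[:k]):
            return k
    return 0


def shortest_superstring_bruteforce(fragments):
    """Exhaustive search over fragment orderings (tiny inputs only)."""
    # Drop duplicates and fragments contained in another fragment:
    # any superstring of the remaining fragments already contains them.
    unique = list(dict.fromkeys(fragments))
    frags = [f for f in unique if not any(f != g and f in g for g in unique)]
    if not frags:
        return ""
    best = None
    for order in permutations(frags):
        merged = order[0]
        for nxt in order[1:]:
            # Append only the part of nxt not already covered by the overlap.
            merged += nxt[overlap_len(nxt, merged):]
        if best is None or len(merged) < len(best):
            best = merged
    return best


print(shortest_superstring_bruteforce(["ATCGATCCG", "CGATCCGATTAT"]))
# ATCGATCCGATTAT
```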

  5. Directed Graph Model • Nodes: each input fragment is a node, labeled with that fragment. • Edges: Edge(v, w) is labeled with overlap(W, V), where V and W are the labels of nodes v and w, respectively. The edge weight is |overlap(W, V)|. • Finding a superstring amounts to finding a directed path that visits every node exactly once (the Hamiltonian path problem). • Shortest superstring: a Hamiltonian path with the maximum sum of edge weights.
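The graph model above can be sketched as follows: build the weighted overlap graph, search exhaustively for a Hamiltonian path with maximum total weight, and spell out the superstring along that path. The function names and the third fragment `ATTATGG` are made up for illustration, and the exhaustive search is a stand-in that only works for very small inputs. The sketch also assumes no fragment is contained in another.

```python
from itertools import permutations


def overlap_len(t: str, s: str) -> int:
    """Length of the longest suffix of s that is also a prefix of t."""
    for k in range(min(len(s), len(t)), 0, -1):
        if s.endswith(t[:k]):
            return k
    return 0


def build_overlap_graph(fragments):
    """Edge (v, w) has weight |overlap(W, V)|: suffix of V matching prefix of W."""
    n = len(fragments)
    return {
        (v, w): overlap_len(fragments[w], fragments[v])
        for v in range(n)
        for w in range(n)
        if v != w
    }


def best_hamiltonian_path(fragments):
    """Try every node order and keep one with the largest summed edge weight."""
    graph = build_overlap_graph(fragments)
    best_order, best_weight = None, -1
    for order in permutations(range(len(fragments))):
        weight = sum(graph[v, w] for v, w in zip(order, order[1:]))
        if weight > best_weight:
            best_order, best_weight = order, weight
    return best_order, best_weight


def spell_path(fragments, order):
    """Concatenate fragments along the path, skipping each overlapped prefix."""
    result = fragments[order[0]]
    for v, w in zip(order, order[1:]):
        result += fragments[w][overlap_len(fragments[w], fragments[v]):]
    return result


frags = ["ATCGATCCG", "CGATCCGATTAT", "ATTATGG"]  # third fragment is hypothetical
order, weight = best_hamiltonian_path(frags)
print(order, weight)             # (0, 1, 2) 12
print(spell_path(frags, order))  # ATCGATCCGATTATGG
```

In this example the superstring length (16) equals the total fragment length (28) minus the path weight (12), which is why maximizing the summed edge weight minimizes the superstring length.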

  6. Directed Graph Model • The problem is NP-complete (NPC). • No known efficient algorithm gives exact results in all cases. • Heuristics are used instead.
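The slides do not say which heuristic is meant. One common choice, shown here purely as an illustrative assumption, is greedy merging: repeatedly merge the pair of fragments with the largest overlap until one string remains. The result is always a superstring of the input, but it is not guaranteed to be the shortest one. The function names are hypothetical.

```python
def overlap_len(t: str, s: str) -> int:
    """Length of the longest suffix of s that is also a prefix of t."""
    for k in range(min(len(s), len(t)), 0, -1):
        if s.endswith(t[:k]):
            return k
    return 0


def greedy_superstring(fragments):
    """Greedy heuristic: always merge the pair with the largest overlap."""
    # Drop duplicates and fragments already contained in another fragment.
    unique = list(dict.fromkeys(fragments))
    frags = [f for f in unique if not any(f != g and f in g for g in unique)]
    while len(frags) > 1:
        best = (-1, 0, 1)  # (overlap length, left index, right index)
        for i, left in enumerate(frags):
            for j, right in enumerate(frags):
                if i != j:
                    k = overlap_len(right, left)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = frags[i] + frags[j][k:]
        # Replace the two merged fragments with their combination.
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)]
        frags.append(merged)
    return frags[0] if frags else ""


# The third fragment is hypothetical, added only to force a second merge.
print(greedy_superstring(["ATCGATCCG", "CGATCCGATTAT", "ATTATGG"]))
# ATCGATCCGATTATGG
```

Each round scans all ordered pairs, so this sketch favours readability over speed.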

  7. Genome Assembly Difficulties • Repeats • Bidirectional nature of DNA • Errors
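As a small illustration of the "bidirectional" difficulty: DNA is double-stranded, so a fragment may have been read from either strand, and an assembler typically has to consider each fragment together with its reverse complement. The sketch below assumes sequences use only the uppercase letters A, C, G and T; the function name is hypothetical.

```python
# Map each base to its Watson-Crick partner (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read in its own direction."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATCGATCCG"))  # CGGATCGAT
```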
