Algorithmic Techniques and Parallel Methods for DNA Multiple Sequence Alignment

Algorithmic Techniques and Parallel Methods for DNA Multiple Sequence Alignment. Quan Zou (Ph.D., Professor), School of Computer Science and Technology, Tianjin University, December 2015. Multiple Sequence Alignment (MSA): What & Where. Different from mapping, assembly, and BLAST.

Presentation Transcript


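  1. Algorithmic Techniques and Parallel Methods for DNA Multiple Sequence Alignment. Quan Zou (Ph.D., Professor), School of Computer Science and Technology, Tianjin University, December 2015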

  2. Multiple Sequence Alignment (MSA): What & Where • Different from mapping, assembly, and BLAST

  3. Multiple Sequence Alignment (MSA): What & Where • Different from mapping, assembly, and BLAST • BLAST: Basic Local Alignment Search Tool [Figure labels: query, database, output]

  4. Multiple Sequence Alignment (MSA): What & Where [Figure labels: input, output]

  5. Multiple Sequence Alignment (MSA): What & Where [Diagram labels: multiple sequence alignment; multiple DNA sequence alignment; multiple similar DNA sequence alignment (our focus); …; applications: phylogenetic tree, virus sequences, population SNV calling]

  6. Techniques for similar DNA MSA: 1. k-band dynamic programming [Figure: dynamic programming matrix with the k-band marked; surviving cell values: -4, -5, 0, -1, -1]
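The k-band idea on slide 6 can be sketched as follows. For similar sequences the best alignment path stays close to the main diagonal, so only cells with |i - j| <= k need to be filled. This is a minimal sketch rather than the presenter's code: the function name `kband_score` and the scoring values (match 1, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def kband_score(a, b, k, match=1, mismatch=-1, gap=-2):
    """Global alignment score of a and b, filling only cells with |i - j| <= k.

    The scoring values are illustrative assumptions, not taken from the slides.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        raise ValueError("k must be at least the length difference")
    neg_inf = float("-inf")
    # prev maps column j -> best score for aligning a[:i-1] with b[:j]
    prev = {j: j * gap for j in range(0, min(m, k) + 1)}
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - k), min(m, i + k) + 1):
            best = neg_inf
            if j == 0:
                best = i * gap                      # a[:i] against the empty prefix
            else:
                if (j - 1) in prev:                 # diagonal move: match or mismatch
                    sub = match if a[i - 1] == b[j - 1] else mismatch
                    best = max(best, prev[j - 1] + sub)
                if (j - 1) in cur:                  # horizontal move: gap in a
                    best = max(best, cur[j - 1] + gap)
            if j in prev:                           # vertical move: gap in b
                best = max(best, prev[j] + gap)
            cur[j] = best
        prev = cur
    return prev[m]


print(kband_score("ACGT", "AGT", k=1))
```

The work per row is proportional to k instead of the full sequence length. If k is too small the returned score can be lower than the true optimum, which is why the choice of k matters.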

  7. Techniques for similar DNA MSA: 2. Center star strategy [Figure: sequences S1 to S5 arranged for tree alignment versus the center star strategy]

  8. Center Star for Multiple Sequence Alignment [Flowchart labels: input sequences; step 1; …; search; final result; step 2; update; sum up]
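The center star strategy on slides 7 and 8 can be sketched as below: pick the sequence with the highest total pairwise score as the center, align every sequence to it, then sum up the gaps so that all rows share one set of columns. This is a textbook-style sketch under assumed scoring values, not the HAlign implementation; all function names here are hypothetical.

```python
def align_pair(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:  # trace back from the bottom-right cell
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def gap_profile(center_aln):
    """Number of gaps inserted before each center residue (and after the last)."""
    counts = [0]
    for ch in center_aln:
        if ch == "-":
            counts[-1] += 1
        else:
            counts.append(0)
    return counts


def expand(center_aln, other_aln, width):
    """Pad one pairwise alignment so it fits the merged gap profile `width`."""
    out, p, run = [], 0, 0
    for c_ch, o_ch in zip(center_aln, other_aln):
        if c_ch == "-":                      # column inserted into the center
            out.append(o_ch); run += 1
        else:
            out.append("-" * (width[p] - run))
            out.append(o_ch)
            p += 1; run = 0
    out.append("-" * (width[p] - run))
    return "".join(out)


def center_star(seqs):
    n = len(seqs)
    totals = [sum(align_pair(seqs[i], seqs[j])[0] for j in range(n) if j != i)
              for i in range(n)]
    center = seqs[max(range(n), key=lambda i: totals[i])]
    pairs = [align_pair(center, s)[1:] for s in seqs]
    width = [0] * (len(center) + 1)
    for center_aln, _ in pairs:              # "sum up": keep the widest gap per position
        for p, g in enumerate(gap_profile(center_aln)):
            width[p] = max(width[p], g)
    return [expand(ca, oa, width) for ca, oa in pairs]


for row in center_star(["ACGT", "ACT", "ACGGT"]):
    print(row)
```

Choosing the center this way costs a quadratic number of pairwise alignments, which is the step the later slides replace with trie and suffix tree searches.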

  9. How to set k for k-band?

  10. Detecting the matching region with a trie: S = AGACGTAGCCTAGCAGCCCGTACT; segments of S: S1 = AGACGT, S2 = AGCCTA, S3 = GCAGCC, S4 = CGTACT; T = AGACCTAGCTAGCAGCCCGTACACT
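The trie lookup on slide 10 can be sketched with the slide's own strings: S is cut into fixed-length segments, the segments go into a trie, and T is scanned for exact occurrences. The dictionary-based trie, the `"$"` terminal marker, and the function names are assumptions made for this sketch; the slide shows only the data.

```python
def build_trie(segments):
    """Insert each segment into a nested-dict trie; "$" stores the segment index."""
    root = {}
    for idx, seg in enumerate(segments):
        node = root
        for ch in seg:
            node = node.setdefault(ch, {})
        node["$"] = idx  # assumes "$" never occurs in the sequences
    return root


def find_segments(trie, text):
    """Return (segment_index, start_in_text) for every exact segment hit, 0-based."""
    hits = []
    for start in range(len(text)):
        node = trie
        for pos in range(start, len(text)):
            node = node.get(text[pos])
            if node is None:
                break
            if "$" in node:
                hits.append((node["$"], start))
    return hits


S = "AGACGTAGCCTAGCAGCCCGTACT"
T = "AGACCTAGCTAGCAGCCCGTACACT"
segments = [S[i:i + 6] for i in range(0, len(S), 6)]  # S1..S4 from the slide
print(find_segments(build_trie(segments), T))  # [(2, 11)]: S3 found at T offset 11
```

Only S3 occurs unchanged in T. The other three segments each overlap an edit, which is the kind of region that still needs dynamic programming.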

  11. Center Star for Multiple Sequence Alignment [Flowchart labels: input sequences; trie trees; step 1; …; search; final result; step 2; update; sum up]

  12. From Trie to Suffix Tree • Suffix tree (suffixes of S): S1 = AGACGTAGCCTAGCAGCCCGTACT, S2 = GACGTAGCCTAGCAGCCCGTACT, S3 = ACGTAGCCTAGCAGCCCGTACT, S4 = CGTAGCCTAGCAGCCCGTACT, S5 = GTAGCCTAGCAGCCCGTACT, S6 = TAGCCTAGCAGCCCGTACT, S7 = AGCCTAGCAGCCCGTACT • Trie (segments of S): S1 = AGACGT, S2 = AGCCTA, S3 = GCAGCC, S4 = CGTACT …

  13. Greedy search with suffix tree: S = GTCCGAAGCTCCGG, T = GTCCTGAAGCTCCGT; matches: (1,1,4), (5,6,9) [Figure: position ruler numbered from 1]
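The greedy search on slide 13 can be reproduced with a small sketch. It reads the slide's triples as (start in S, start in T, length) with 1-based positions, which is an interpretation of the figure. A naive suffix trie stands in for a real suffix tree here (quadratic space, fine for a toy example), and the `min_len` threshold is an assumption added to skip one-letter hits.

```python
def build_suffix_trie(s):
    """Naive suffix trie; each node remembers the first start position reaching it."""
    root = {"children": {}, "start": 0}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node["children"].setdefault(ch, {"children": {}, "start": i})
    return root


def longest_match(trie, t, j):
    """Longest prefix of t[j:] that occurs in the indexed string: (start, length)."""
    node, length = trie, 0
    while j + length < len(t) and t[j + length] in node["children"]:
        node = node["children"][t[j + length]]
        length += 1
    return node["start"], length


def greedy_matches(s, t, min_len=2):
    """Scan t left to right, always taking the longest match available in s."""
    trie = build_suffix_trie(s)
    out, j = [], 0
    while j < len(t):
        s_start, length = longest_match(trie, t, j)
        if length >= min_len:
            out.append((s_start + 1, j + 1, length))  # report 1-based positions
            j += length
        else:
            j += 1
    return out


print(greedy_matches("GTCCGAAGCTCCGG", "GTCCTGAAGCTCCGT"))  # [(1, 1, 4), (5, 6, 9)]
```

The two matches cover everything except the inserted T at position 5 of T and the final mismatching base. This sketch does not check that successive matches are in the same order in both sequences, which a full aligner would need to enforce.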

  14. Extreme MSA for Very Similar DNA Sequences [Flowchart labels: input sequences; step 1; …; search; final result; step 2; update; sum up]

  15. Experiments • 100 human mitochondrial genome sequences • Length 16k (1555 KB) • Our output: 1558 KB • ClustalΩ: 1627 KB

  16. Discussion: How to measure similarity? • Globally alignable • pairwise • multiple • Proof • optimization [Figure labels: extreme center star; globally alignable]

  17. MapReduce for the Center Star Framework
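Slide 17 gives only a title, so the decomposition below is an assumption about why center star suits MapReduce, not a description of the presenter's system. Each pairwise alignment against the center is independent (a map step), and merging the gap counts is an element-wise maximum, which is associative (a reduce step). The sketch uses Python's built-in `map` and `functools.reduce` in place of a cluster framework; the function names are hypothetical.

```python
from functools import reduce


def gap_profile(center_row):
    """Map step: count gaps inserted before each center residue in one alignment."""
    counts = [0]
    for ch in center_row:
        if ch == "-":
            counts[-1] += 1
        else:
            counts.append(0)
    return counts


def merge_profiles(p, q):
    """Reduce step: element-wise maximum, so the merge order does not matter."""
    return [max(x, y) for x, y in zip(p, q)]


def merged_gap_profile(center_rows):
    return reduce(merge_profiles, map(gap_profile, center_rows))


# The center "ACGT" as it appears in three separate pairwise alignments
print(merged_gap_profile(["AC-GT", "A--CGT", "ACGT"]))  # [0, 2, 1, 0, 0]
```

Because the reduce operation is associative and commutative, partial profiles computed on different workers can be combined in any order.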

  18. Software: http://datamining.xmu.edu.cn/software/halign/ and http://lab.malab.cn/soft/halign/ • Reference: Quan Zou, Qinghua Hu, Maozu Guo, Guohua Wang. HAlign: Fast Multiple Similar DNA/RNA Sequence Alignment Based on the Centre Star Strategy. Bioinformatics, 2015, 31(15): 2475-2481.
