
Reconstructing Phylogenetic Trees for Ultra-Large Unaligned DNA Sequences with Hadoop

This presentation describes the use of Hadoop to reconstruct phylogenetic trees from ultra-large sets of unaligned DNA sequences. It outlines the main challenge: with too many sequences, multiple sequence alignment (MSA) becomes difficult. The experiments use human mtGenome and 16S rRNA data and are measured by running time and, for MSA, by average SP score; running time is also compared between aligned and unaligned data. The software behind the work is HAlign, a fast multiple sequence alignment tool. The presentation closes with a discussion of limitations, including RNA secondary structure and several complex issues in evolution that the study ignores.



Presentation Transcript


  1. Reconstructing phylogenetic trees for ultra-large unaligned DNA sequences with Hadoop • Quan Zou (Ph.D. & Prof.) • Tianjin Univ, School of Computer • zouquan@nclab.net • http://cs.tju.edu.cn/faculty/zouquan/

  2. Background: why

  3. Phylogenetic Tree • Genome-Genome • Gene-Gene • Population Model Computation

  4. Background: challenge • Too many sequences • Difficult to perform MSA

  5. Flow

  6. Flow---Clustering

  7. Flow---Clustering Sampling

  8. Flow---MSA

  9. A Trie Tree for a Sequence

  10. More tricks in MSA (figure labels: input sequences, trie trees, step 1 search, step 2 update, sum up, final result)
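The slides only name the trie structure and a "search" step, so the following is a minimal sketch of the general idea rather than HAlign's actual implementation. It assumes that one sequence is cut into fixed-length, non-overlapping segments, that those segments are stored in a trie, and that a second sequence is scanned for exact occurrences. The names `TrieNode`, `build_segment_trie`, `find_segment_matches` and the `segment_length` parameter are hypothetical.

```python
class TrieNode:
    """One node of a character trie over the DNA alphabet."""

    def __init__(self):
        self.children = {}   # base -> TrieNode
        self.positions = []  # start offsets of segments ending at this node


def build_segment_trie(sequence, segment_length):
    """Index non-overlapping segments of `sequence` in a trie.

    Assumption: fixed-length, non-overlapping segments; the slides do not
    state how segments are chosen.
    """
    root = TrieNode()
    last_start = len(sequence) - segment_length
    for start in range(0, last_start + 1, segment_length):
        node = root
        for base in sequence[start:start + segment_length]:
            node = node.children.setdefault(base, TrieNode())
        node.positions.append(start)
    return root


def find_segment_matches(root, query, segment_length):
    """Return (indexed_start, query_start) pairs for every exact segment hit."""
    matches = []
    for query_start in range(len(query) - segment_length + 1):
        node = root
        for base in query[query_start:query_start + segment_length]:
            node = node.children.get(base)
            if node is None:
                break  # this window is not an indexed segment
        else:
            matches.extend((pos, query_start) for pos in node.positions)
    return matches


if __name__ == "__main__":
    trie = build_segment_trie("ACGTACGGTTAC", segment_length=4)
    print(find_segment_matches(trie, "GGACGTTTTAC", segment_length=4))
```

Matched segments of this kind can serve as anchors so that only the regions between them need a full dynamic-programming alignment, which is one plausible reason for using a trie on highly similar sequences.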

  11. Experiments • Data • Human mtGenome • 16S rRNA • Measurement • Running time • Average SP score (for MSA)
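The slides do not define the SP (sum-of-pairs) score, so the sketch below uses a common penalty-style formulation as an assumption: every pair of rows is compared column by column, a match costs 0, a mismatch 1, a residue against a gap 2, and two gaps are not counted. Averaging over the number of sequence pairs is likewise only one possible reading of "average SP score". The function names and default penalties are hypothetical.

```python
from itertools import combinations


def sp_score(alignment, match=0, mismatch=1, gap=2):
    """Sum-of-pairs penalty of an alignment given as equal-length strings.

    Assumption: penalty-style scoring (lower is better) with the default
    costs above; the slides do not specify the scheme.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # gap against gap carries no cost
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


def average_sp_score(alignment, **scoring):
    """SP penalty divided by the number of sequence pairs (an assumption)."""
    n = len(alignment)
    if n < 2:
        raise ValueError("need at least two sequences")
    return sp_score(alignment, **scoring) / (n * (n - 1) // 2)


if __name__ == "__main__":
    rows = ["ACGT", "A-GT", "ACGA"]
    print(sp_score(rows), average_sp_score(rows))
```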

  12. Experiments---phylogenetic tree

  13. Experiments---MSA(mtDNA)

  14. Experiments---MSA(16s rRNA)

  15. Experiments---Running time comparison between aligned and unaligned data

  16. Software • HAlign: http://datamining.xmu.edu.cn/software/halign/ • Quan Zou, et al. HAlign: Fast Multiple Similar DNA/RNA Sequence Alignment based on Center Star Strategy. Bioinformatics. doi:10.1093/bioinformatics/btv177. • Phylogenetic tree: http://datamining.xmu.edu.cn/software/Phylogenetic_tree/
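The HAlign citation names the center star strategy. As a rough illustration of its first step only, the sketch below picks the "center" as the sequence with the smallest total edit distance to all the others, using plain Levenshtein distance. This is the textbook formulation, not HAlign's own code, and the function names are hypothetical. In the full strategy every other sequence is then aligned pairwise to the center and the pairwise alignments are merged.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete from a
                current[j - 1] + 1,                    # insert into a
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def choose_center(sequences):
    """Return (index, total_distance) of the sequence closest to all others.

    Assumption: ties are broken in favor of the earliest sequence.
    """
    best_index, best_total = None, None
    for i, candidate in enumerate(sequences):
        total = sum(edit_distance(candidate, other)
                    for j, other in enumerate(sequences) if j != i)
        if best_total is None or total < best_total:
            best_index, best_total = i, total
    return best_index, best_total


if __name__ == "__main__":
    print(choose_center(["ACGT", "ACGA", "TCGT", "ACGTT"]))
```

This selection step compares all pairs of sequences, so its cost grows quadratically with the number of sequences, which hints at why a naive version struggles with ultra-large inputs.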

  17. Discussion • Summary • MSA with Hadoop • NJ phylogenetic tree with Hadoop • From DNA to Protein • RNA secondary structure is ignored • Several complex issues in evolution are ignored
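For readers unfamiliar with the NJ (neighbor-joining) method named in the summary above, here is a minimal single-machine sketch of the classic algorithm. It is not the Hadoop implementation from the talk. The function name `neighbor_joining`, the Newick output and the tie-breaking rule (first minimal pair wins) are choices made for this illustration.

```python
def neighbor_joining(names, matrix):
    """Build an unrooted tree from a symmetric distance matrix.

    Returns a Newick string. Assumption: at least two taxa, and ties in the
    Q criterion are broken by taking the first minimal pair.
    """
    labels = list(names)
    d = [row[:] for row in matrix]
    while len(labels) > 2:
        n = len(labels)
        row_sums = [sum(row) for row in d]
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the joined pair to the new internal node.
        len_i = 0.5 * d[i][j] + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        merged = f"({labels[i]}:{len_i:g},{labels[j]}:{len_j:g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new node to every remaining node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]]
             for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        labels = [labels[k] for k in keep] + [merged]
    # Two nodes remain; the last edge carries their remaining distance.
    return f"({labels[0]},{labels[1]}:{d[0][1]:g});"


if __name__ == "__main__":
    distances = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(neighbor_joining("abcde", distances))
```

Each round scans all remaining pairs, so the sequential algorithm is cubic in the number of taxa.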

  18. Thanks ! • Quan Zou(PH.D. & Prof.) • Tianjin Univ, School of Computer • zouquan@nclab.net • http://cs.tju.edu.cn/faculty/zouquan/
