1 / 14

MCB3895-004 Lecture # 11 Sept 30/14

MCB3895-004 Lecture # 11 Sept 30/14. De novo genome assembly using Velvet; using Perl to execute system commands. Velvet. Velvet was one of the first widely used de novo assembly programs http ://www.ebi.ac.uk/~ zerbino/velvet/Manual.pdf

Download Presentation

MCB3895-004 Lecture # 11 Sept 30/14

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. MCB3895-004 Lecture #11Sept 30/14 De novo genome assembly using Velvet; using Perl to execute system commands

  2. Velvet • Velvet was one of the first widely used de novo assembly programs • http://www.ebi.ac.uk/~zerbino/velvet/Manual.pdf • It is still quite common, although frequently superseded

  3. Velvet step #1: De brujin graph construction Red bases = sequencing error http://en.wikipedia.org/wiki/Velvet_assembler

  4. Velvet step #2: graph simplification • Unbranched paths collapsed into single nodes http://en.wikipedia.org/wiki/Velvet_assembler

  5. Velvet step #3: tip removal • Dead end paths removed http://en.wikipedia.org/wiki/Velvet_assembler

  6. Velvet step #4: bubble popping • "Bubbles" caused by sequencing error resolved http://en.wikipedia.org/wiki/Velvet_assembler

  7. Running velvet: single end • Step #1: velveth to set up database $ velvethSRR826450_1 21 -short -fastqSRR826450_1.fastq • Parameters for directory name, kmer, read type, input file name • Step #2: velvetg to make graph $ velvetg SRR826450_1 -exp_covauto • Parameters for directory name, automatic coverage detection • Note: the max kmer size is set during software compilation. In our case the max is kmer=61 • Note: only odd kmer numbers are allowed

  8. Running velvet: paired end • In a fit of cussedness, velvet requires its paired-end data in a single interleaved file • Use shuffleSequences_fastq.pl • Copy from /opt/bioinformatics2/velvet_1.2.10_31kmer/contrib/shuffleSequences_fasta/ into your directory • Running velvet is otherwise similar $ velveth SRR826450 21 -shortPaired-fastqSRR826450_paired.fastq $ velvetgSRR826450 -exp_covauto

  9. Velvet output • contigs.fa - contains output contigs • If paired end, contigs are joined into scaffolds using strings of Ns • Log - contains what commands have actually been run along with parameters, results summary

  10. Today's assignment part 1 • Compare a single-end and paired-end assembly of SRR826450 • Use a small kmer (e.g., 21) to keep things computationally simple for now

  11. Running commands using Perl • Perl can be used to run terminal commands, just like Bash • Use the systemcommand to run whatever follows as a terminal command • e.g., system"ls -l > list"; # lists all of the files in the # directory where the script was # run and stores this information # in a new file "list"

  12. Using system to run repetitive analyses • A useful way to use this sort of command is in a loop to run the same program multiple times using slightly different parameters • e.g., for (my$kmer= 15; $kmer>= 31; $kmer=$kmer+ 2){ print"kmer=$kmer\n"; system"velvethinfile_$kmer $kmer-shortPaired-fastqinfile"; system"velvetginfile_$kmer-exp_cov auto"; }

  13. Hints for system loops • Use print statements can be helpful to keep track of loop • Run the loop first with all system commands converted to print statements • Run only a single iteration of the entire loop

  14. Today's assignment part 2 • Answer the question: what is the optimal kmer size to assemble the SRR826450 paired end reads? • For Thursday's class, read and be ready to discuss: • Magoc et al. 2013 Bioinformatics 29:1718-1725 • Bradnam et al. GigaScience 2:10

More Related