
Assembly S.O.P.


monty

Presentation Transcript


  1. Overlap Layout Consensus Assembly S.O.P.

  2. Reference Assembly
     • Align reads to a reference sequence
     • ???
     • PROFIT!!!!!

  3. Reference Assembly by Newbler (from The Genome Sequencer Data Analysis Software Manual, p.147)
     • For each read, search for a suitable alignment, or alignments, of the read to the reference sequence(s) (a read may align to multiple positions in the reference sequence); this is done in "nucleotide" space
     • Construct contigs and compute a consensus basecall sequence from the signals of the aligned reads (performed in "flowspace")
     • Identify the positions in the aligned reads (consensus) that differ from the reference sequence(s); alternatively, identify subsets of the aligned reads that are identical within each subset but differ between subsets (these are the "putative differences")
     • Evaluate the list of putative differences to identify High-Confidence differences
     • Output the following information:
        • contig consensus sequence(s) and associated quality values;
        • alignments of the reads and contigs to the reference, position-by-position metrics of the depth and consensus accuracy (quality values) for each position in the aligned reference;
        • and the positions and alignments of identified differences
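The consensus, depth, and difference-reporting steps above can be illustrated with a small sketch. This is not Newbler's method: Newbler computes its consensus in flowspace from signal values, whereas the sketch below swaps in a plain majority vote over bases in nucleotide space. It also assumes the reads have already been placed on the reference without gaps. The function name `pileup_consensus` and the `(start, sequence)` read format are hypothetical, chosen only for illustration.

```python
from collections import Counter


def pileup_consensus(reference, placed_reads):
    """Majority-vote consensus over reads already placed on the reference.

    placed_reads is a list of (start, sequence) pairs (hypothetical format),
    where start is the 0-based reference position of the read's first base.
    Assumes ungapped placements; this is a simplification.
    """
    # One base counter per reference position.
    columns = [Counter() for _ in reference]
    for start, seq in placed_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                columns[pos][base] += 1

    # Depth is the number of reads covering each position.
    depth = [sum(col.values()) for col in columns]

    # Most frequent base per column; "N" where no read covers the position.
    consensus = "".join(
        col.most_common(1)[0][0] if col else "N" for col in columns
    )

    # Putative differences: covered positions where consensus != reference.
    differences = [
        (pos, reference[pos], consensus[pos])
        for pos in range(len(reference))
        if depth[pos] > 0 and consensus[pos] != reference[pos]
    ]
    return consensus, depth, differences


reference = "ACGTACGT"
reads = [(0, "ACGA"), (2, "GAAC"), (3, "AACGT")]
consensus, depth, differences = pileup_consensus(reference, reads)
print(consensus)
print(depth)
print(differences)
```

A real pipeline would also weigh base qualities and apply a confidence filter before reporting a difference, which corresponds to the "High-Confidence differences" step on the slide.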

  4. Reference Assembly by AMOScmp
     • AMOS Is Not An Assembler
     • AMOScmp uses NUCmer to align reads to a reference sequence

  5. The AMOScmp pipeline script

     #!/usr/local/bin/amos-2.0.4/bin/runAmos -C
     # `AMOScmp' - The AMOS Comparative Assembler Pipeline

     #--------------------------------------- USER DEFINED VALUES ------------------#
     TGT = $(PREFIX).afg
     REF = $(PREFIX).1con
     #------------------------------------------------------------------------------#

     BINDIR=/usr/local/bin/amos-2.0.4/bin
     NUCMER=/usr/local/bin/MUMmer3.21/nucmer

     SEQS = $(PREFIX).seq
     BANK = $(PREFIX).bnk
     ALIGN = $(PREFIX).delta
     LAYOUT = $(PREFIX).layout
     CONFLICT = $(PREFIX).conflict
     CONTIG = $(PREFIX).contig
     FASTA = $(PREFIX).fasta

     INPUTS = $(TGT) $(REF)
     OUTPUTS = $(CONTIG) $(FASTA)

     ## Building AMOS bank
     10: $(BINDIR)/bank-transact -c -z -b $(BANK) -m $(TGT)

     ## Collecting clear range sequences
     20: $(BINDIR)/dumpreads $(BANK) > $(SEQS)

     ## Running nucmer
     30: $(NUCMER) --maxmatch --prefix=$(PREFIX) $(REF) $(SEQS)

     ## Running layout
     40: $(BINDIR)/casm-layout -U $(LAYOUT) -C $(CONFLICT) -b $(BANK) $(ALIGN)

     ## Running consensus
     50: $(BINDIR)/make-consensus -B -b $(BANK)

     ## Outputting contigs
     60: $(BINDIR)/bank2contig $(BANK) > $(CONTIG)

     ## Outputting fasta
     70: $(BINDIR)/bank2fasta -b $(BANK) > $(FASTA)
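To see what the script expands to for a given prefix, the variable substitution can be traced with a short sketch. The helper name `amoscmp_commands` is hypothetical; the file suffixes, step numbers, and flags are copied from the script on the slide, and the default paths are the ones shown there (they may differ on other systems). The function only builds command strings and does not run anything.

```python
def amoscmp_commands(prefix,
                     bindir="/usr/local/bin/amos-2.0.4/bin",
                     nucmer="/usr/local/bin/MUMmer3.21/nucmer"):
    """Return (step, command) pairs mirroring the AMOScmp script on the slide.

    Hypothetical helper for illustration: it expands the $(...) variables
    for one prefix so the data flow between steps is easy to follow.
    """
    tgt = f"{prefix}.afg"        # input reads (AMOS message file)
    ref = f"{prefix}.1con"       # reference sequence
    seqs = f"{prefix}.seq"       # clear-range reads dumped from the bank
    bank = f"{prefix}.bnk"       # AMOS bank
    align = f"{prefix}.delta"    # nucmer alignment output
    layout = f"{prefix}.layout"
    conflict = f"{prefix}.conflict"
    contig = f"{prefix}.contig"
    fasta = f"{prefix}.fasta"

    return [
        (10, f"{bindir}/bank-transact -c -z -b {bank} -m {tgt}"),
        (20, f"{bindir}/dumpreads {bank} > {seqs}"),
        (30, f"{nucmer} --maxmatch --prefix={prefix} {ref} {seqs}"),
        (40, f"{bindir}/casm-layout -U {layout} -C {conflict} -b {bank} {align}"),
        (50, f"{bindir}/make-consensus -B -b {bank}"),
        (60, f"{bindir}/bank2contig {bank} > {contig}"),
        (70, f"{bindir}/bank2fasta -b {bank} > {fasta}"),
    ]


for step, command in amoscmp_commands("demo"):
    print(f"{step}: {command}")
```

Reading the expanded commands in order shows the hand-offs: the bank built in step 10 feeds `dumpreads`, whose output is aligned by nucmer in step 30, and the resulting `.delta` file drives the layout in step 40.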

  6. NUCmer
     • MUM: maximal unique match
     • A MUM is a subsequence that occurs in two exactly matching copies, once in each input sequence, and that cannot be extended in either direction
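The MUM definition above can be made concrete with a brute-force sketch. MUMmer itself finds matches with a suffix-tree data structure; the code below swaps in a naive quadratic scan that is only practical for toy inputs. The names `find_mums`, `count_occurrences`, and the `min_len` parameter are assumptions made for illustration.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def find_mums(a, b, min_len=3):
    """Return (pos_in_a, pos_in_b, length) for each maximal unique match.

    Naive illustration of the definition, not MUMmer's algorithm.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip if the match could be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the sequences agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k < min_len:
                continue
            match = a[i:i + k]
            # Unique: exactly one copy in each input sequence.
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, k))
    return mums


print(find_mums("GATTACA", "CATTAGA"))
```

In this toy input the only match of length 3 or more is `ATTA`, which is flanked by mismatches on both sides and occurs once in each sequence.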

  7. NUCmer alignment procedure
     • Create a map of all contig positions within each of the multi-fasta files
     • Concatenate the two files separately
     • Run MUMmer to find all exact matches between the two genomes.
     • Map the resulting matches back to the separate contigs.
     • Run a clustering algorithm for all the MUMs along each contig. MUMs are clustered together if they are separated by no more than a user-specified distance.
     • Run a modified Smith-Waterman dynamic programming alignment algorithm to align the sequences between the MUMs. In order to avoid excessive computation in this step, the algorithm permits only limited mismatches in these gaps between MUMs. The exact amount of mismatch is specified by the user.
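The clustering step above can be sketched as follows. This is a simplified stand-in rather than NUCmer's actual clustering code: it chains MUMs in reference order and starts a new cluster whenever the gap to the previous MUM exceeds the user-specified distance in either sequence. The function name `cluster_mums`, the `max_gap` parameter, and the `(ref_start, qry_start, length)` tuple format are hypothetical.

```python
def cluster_mums(mums, max_gap):
    """Group MUMs whose gaps to the previous MUM are at most max_gap.

    Each MUM is a (ref_start, qry_start, length) tuple (hypothetical format).
    Simplified sketch: only adjacent MUMs in reference order are compared,
    and both sequences must advance in the same direction.
    """
    clusters = []
    for mum in sorted(mums):
        ref_start, qry_start, _ = mum
        if clusters:
            prev_ref, prev_qry, prev_len = clusters[-1][-1]
            # Unmatched bases between the end of the previous MUM and this one.
            ref_gap = ref_start - (prev_ref + prev_len)
            qry_gap = qry_start - (prev_qry + prev_len)
            if 0 <= ref_gap <= max_gap and 0 <= qry_gap <= max_gap:
                clusters[-1].append(mum)
                continue
        # Too far from the previous MUM (or first MUM): start a new cluster.
        clusters.append([mum])
    return clusters


mums = [(500, 480, 12), (0, 0, 10), (15, 14, 8)]
for cluster in cluster_mums(mums, max_gap=90):
    print(cluster)
```

The stretches between consecutive MUMs inside one cluster are the regions the final dynamic programming step would then align, which is why keeping `max_gap` small limits the alignment work.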
