Computational Biology

Presentation Transcript


  1. Computational Biology Dr. Jens Allmer Lecture Slides Week 3

  2. Sequence Alignment • Exact • simple • Approximate • More difficult • [Figure: a pattern aligned against a target, for exact and for approximate matching]

  3. Sequence Alignment • Exact pattern matching • Naive method aligns pattern with each location of the target • Boyer-Moore indexes the pattern to skip some alignments • Wu-Manber indexes many patterns and skips some alignments • Indexing • Suffix tree indexes target and then quickly finds each pattern • Many other methods
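The two exact-matching strategies above can be sketched in Python. This is a minimal illustration, not course code: `naive_search` tries the pattern at every offset of the target, and `horspool_search` uses the Horspool simplification of Boyer-Moore (bad-character shifts only, no good-suffix rule). Both function names are hypothetical.

```python
def naive_search(target: str, pattern: str) -> list[int]:
    """Try the pattern at every offset of the target (O(n*m) worst case)."""
    n, m = len(target), len(pattern)
    return [i for i in range(n - m + 1) if target[i:i + m] == pattern]


def horspool_search(target: str, pattern: str) -> list[int]:
    """Boyer-Moore-Horspool: index the pattern once, then skip alignments."""
    n, m = len(target), len(pattern)
    if m == 0 or m > n:
        return []
    # For each character in pattern[:-1], the distance from its rightmost
    # occurrence to the last pattern position (always >= 1).
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits = []
    i = 0
    while i <= n - m:
        if target[i:i + m] == pattern:
            hits.append(i)
        # Shift by the table entry for the target character under the
        # pattern's last position; unseen characters allow a full jump of m.
        i += shift.get(target[i + m - 1], m)
    return hits
```

Both functions report overlapping occurrences; the Horspool version simply visits fewer alignments when the target contains characters that are rare in the pattern.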

  4. Sequence Alignment • Approximate pattern matching • Pairwise • Local • Smith-Waterman • BLAST • FASTA • Global • Needleman-Wunsch • Multiple • T-Coffee • ClustalW • ...
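A compact sketch of the dynamic-programming recurrence shared by Needleman-Wunsch (global) and Smith-Waterman (local). It assumes a linear gap penalty and an illustrative scoring scheme (match +1, mismatch -1, gap -2); real tools use substitution matrices such as BLOSUM62 and affine gaps. This version returns only the optimal score, without the traceback.

```python
def align_score(a: str, b: str, local: bool = False,
                match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global (Needleman-Wunsch) or local (Smith-Waterman) score."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, rows):
            F[i][0] = i * gap
        for j in range(1, cols):
            F[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
            if local:
                # Local alignment: a cell never drops below zero, and the best
                # cell anywhere in the matrix is the answer.
                score = max(score, 0)
                best = max(best, score)
            F[i][j] = score
    return best if local else F[-1][-1]
```

For example, the global score of `ACGT` against `AGT` is 1 (three matches and one gap), while the local score of `TTACGTT` against `GGACGGG` is 3 (the shared `ACG`).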

  5. Basic Local Alignment Search Tool • Input • Pattern • Target • Search parameters and settings • Output • Alignments in various formats • XML • Help • http://www.ncbi.nlm.nih.gov/books/NBK1763/

  6. BLAST • Target • Needs to be indexed • Cannot be searched as a plain FASTA file • Must match the query type and the BLAST variant • A protein target and a protein query can be searched using blastp • Target indexing • makeblastdb, in the BLAST package, can index FASTA files • Needs sequence input (e.g. FASTA, ASN.1) • Needs the sequence type to be provided, e.g. protein

  7. BLAST • blastp • Needs indexed database • Needs query sequence (can be unindexed FASTA) • Produces alignments

  8. BLAST flavors (query type vs. database type) • BlastN: nt query vs. nt database • BlastP: protein query vs. protein database • BlastX: translated nt query (6 frames) vs. protein database • tBlastN: protein query vs. translated nt database (6 frames) • tBlastX: translated nt query vs. translated nt database (both 6 frames)
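Translating a nucleotide sequence in all six reading frames (three forward, three on the reverse complement) is what BlastX and tBlastX do conceptually before comparing at the protein level. The sketch below assumes the standard genetic code, writes stop codons as `*` and unknown codons as `X`; the function names are hypothetical and this is not BLAST's internal code.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_frame(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become X."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Frames +1..+3 on the given strand, -1..-3 on the reverse complement."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate_frame(dna[offset:])
        frames[f"-{offset + 1}"] = translate_frame(reverse_complement[offset:])
    return frames
```

For instance, `six_frames("ATGGCC")["+1"]` is `"MA"` and the `-1` frame (reverse complement `GGCCAT`) is `"GH"`.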

  9. BLAST Output • XML • -outfmt 5 • This switch leads to XML output
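A minimal sketch for reading `-outfmt 5` results with Python's standard library. It assumes the usual BLAST XML element names (`Iteration`, `Iteration_query-def`, `Hit`, `Hit_def`, `Hsp`, `Hsp_bit-score`, `Hsp_evalue`) and collects one tuple per HSP; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def blast_hits(xml_text: str) -> list[tuple[str, str, float, float]]:
    """Return (query, hit description, bit score, E-value) for every HSP."""
    root = ET.fromstring(xml_text)
    hits = []
    for iteration in root.iter("Iteration"):          # one per query sequence
        query = iteration.findtext("Iteration_query-def", default="")
        for hit in iteration.iter("Hit"):             # one per database match
            hit_def = hit.findtext("Hit_def", default="")
            for hsp in hit.iter("Hsp"):               # local alignments of that hit
                hits.append((query, hit_def,
                             float(hsp.findtext("Hsp_bit-score")),
                             float(hsp.findtext("Hsp_evalue"))))
    return hits
```

For a file on disk, reading it as bytes (e.g. `open("result.xml", "rb").read()`) and passing that to `ET.fromstring` lets the parser honour the encoding declaration.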

  10. End Theory I • 5 min mindmapping • 10 min break

  11. Download BLAST • http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download • Get blastp and makeblastdb from mbg404, since you are not allowed to install anything • Download a FASTA file (e.g. a proteome or another collection of protein sequences in FASTA format) • The database must consist of amino acid sequences, since we only have access to blastp today • Use makeblastdb from the BLAST package to index the file • Several files will be created when you do it right

  12. MakeDB • Example • makeblastdb -in seq.fasta -dbtype prot -out seqBl -title seqBlastDB • More information? • Go to the doc folder of BLAST • Documentation is there • http://www.ncbi.nlm.nih.gov/books/NBK1763/
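If you prefer to script the indexing and search steps, the commands can be driven from Python. This sketch mirrors the makeblastdb example (seq.fasta, seqBl, seqBlastDB) and adds a blastp call with XML output (`-outfmt 5`); `query.fasta` and `result.xml` are hypothetical file names, and the calls are skipped unless the executables and the input files are present.

```python
import os
import shutil
import subprocess


def makeblastdb_cmd(fasta: str, out: str, title: str) -> list[str]:
    """Argument list for indexing a protein FASTA file."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "prot",
            "-out", out, "-title", title]


def blastp_cmd(query: str, db: str, out: str) -> list[str]:
    """Argument list for a protein search that writes XML output."""
    return ["blastp", "-query", query, "-db", db, "-outfmt", "5", "-out", out]


if __name__ == "__main__":
    tools_found = shutil.which("makeblastdb") and shutil.which("blastp")
    inputs_found = os.path.exists("seq.fasta") and os.path.exists("query.fasta")
    if tools_found and inputs_found:
        subprocess.run(makeblastdb_cmd("seq.fasta", "seqBl", "seqBlastDB"), check=True)
        subprocess.run(blastp_cmd("query.fasta", "seqBl", "result.xml"), check=True)
```

Passing arguments as a list (rather than one shell string) avoids quoting problems with file names that contain spaces.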

  13. BLAST • Now that we have an indexed database try to run BLAST • Read documentation and try to solve the simplest case • You will need the indexed database and you will need a FASTA file as query • You could create queries from the database and slightly change them • Good luck
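One way to follow the suggestion of creating slightly changed queries: take a database sequence, substitute a few randomly chosen residues, and write the result as a FASTA record. These are hypothetical helpers, not part of BLAST; the alphabet of 20 standard amino acids and the fixed default seed (for reproducible queries) are assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def mutate(seq: str, n_subs: int, seed: int = 0) -> str:
    """Return seq with n_subs distinct positions replaced by other residues."""
    rng = random.Random(seed)
    residues = list(seq)
    for pos in rng.sample(range(len(residues)), n_subs):
        choices = [aa for aa in AMINO_ACIDS if aa != residues[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)


def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format one record as FASTA, wrapping the sequence at `width` columns."""
    lines = [">" + header] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```

Writing, for example, `to_fasta("query1", mutate(seq, 2))` to `query.fasta` gives a query that should still find its source entry with a high score, while the score drops as more substitutions are added.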

  14. End Practice I • 15 min break

  15. Theory II

  16. Mass Spectra Recording (e.g. Triple Play) • [Figure: example recording; axis ticks 4500 and 4505]

  17. Fragmentation Spectrum

  18. MS/MS spectra • MS/MS spectra can be assigned a peptide sequence (PSM) • Database search • De novo sequencing

  19. PepNovo • Performs de novo sequencing of MS/MS spectra • Takes a single spectrum as input • Needs a mathematical model (e.g. tryp_model.txt) for scoring • Displays the results in the console • You will therefore need to redirect the output • Example • ?>PepNovo.exe -dta MSMSSpectrum.dta -model tryp_model.txt
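To inspect the single-spectrum input before running PepNovo, a small reader helps. This sketch assumes the common SEQUEST-style DTA layout (first line: precursor [M+H]+ mass and charge; every following line: an m/z and an intensity); check it against the course test file. The function name is hypothetical.

```python
def read_dta(text: str) -> tuple[float, int, list[tuple[float, float]]]:
    """Parse a DTA spectrum into (precursor [M+H]+, charge, [(m/z, intensity)])."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    precursor_mh = float(rows[0][0])
    charge = int(float(rows[0][1]))      # tolerate "2" as well as "2.0"
    peaks = [(float(r[0]), float(r[1])) for r in rows[1:]]
    return precursor_mh, charge, peaks
```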

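  20. De Novo Sequencing • [Figure: fragmentation spectrum annotated with L Y D E E L Q A I A K in one direction and K A I A Q L E E D Y L in the other] • Mass differences between neighbouring peaks identify residues: 1016.4 - 901.6 = 114.8 ~ D (115.02) and 901.6 - 772.4 = 129.2 ~ E (129.1)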
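The residue-reading step can be sketched as: sort the peak m/z values of one ion series, take the differences between neighbours, and look each difference up in a table of monoisotopic residue masses. The 0.5 Da tolerance is an assumption suited to low-resolution fragment spectra, and I and L cannot be told apart by mass. This is an illustration, not how PepNovo scores candidates.

```python
MONO_RESIDUE = {  # monoisotopic residue masses in Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks: list[float], tol: float = 0.5) -> list[tuple[float, str]]:
    """Match each gap between neighbouring peaks to residues within tol (Da)."""
    ordered = sorted(peaks)
    calls = []
    for low, high in zip(ordered, ordered[1:]):
        delta = high - low
        hits = sorted(aa for aa, mass in MONO_RESIDUE.items()
                      if abs(mass - delta) <= tol)
        calls.append((round(delta, 2), "/".join(hits) or "?"))
    return calls
```

With the three peaks from the slide, `read_ladder([772.4, 901.6, 1016.4])` returns `[(129.2, 'E'), (114.8, 'D')]`. Read from low to high mass, a b-ion ladder gives the sequence N- to C-terminal, a y-ion ladder the reverse.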

  21. MS/MS spectra • MS/MS spectra can be assigned a peptide sequence (PSM) • Database search • De novo sequencing

  22. Correlation • [Figure: correlation plot; axis ticks 0 to 6 and -0.10 to 0.10] • Database selection >1080ZR IAAYPGVSPGLMIHYNIGR >1137RZ AAYPGATQPGATELARRLGK >1152RZ GSGDAAYPGGPFFNLFNLGK >1152ZR GSGDAAYPGGPFFNLFNLGK >2360RZ VDSGWGGVVVVALAPYNLGR >240RZ HPGVVCRPGRGGGCSRHIGK HPGVVCCSRHRRSHTIGK
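Database entries such as those listed above follow the FASTA format: a `>` header line followed by one or more sequence lines. A minimal reader, assuming the first whitespace-separated token of each header is a unique identifier (duplicate identifiers would overwrite each other); the function name is hypothetical.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Map each record identifier to its (unwrapped) sequence."""
    parts: dict[str, list[str]] = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = (line[1:].split() or [""])[0]   # identifier = first token
            parts[header] = []
        elif header is not None:
            parts[header].append(line)               # sequence may span lines
    return {h: "".join(p) for h, p in parts.items()}
```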

  23. Initialization Files • X!Tandem • taxonomy.xml • default_input.xml • input.xml • Running X!Tandem • ?>tandem.exe input.xml • That was easy • But behold, what about the input?

  24. Taxonomy XML <?xml version="1.0" ?> <bioml label="x! taxon-to-file matching list"> <taxon label="chlamy"> <file format="peptide" URL="test_chlre2.fasta.pro" /> </taxon> </bioml>
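The taxonomy file can also be generated rather than edited by hand. This sketch reproduces the structure of the example (a `bioml` root, one `taxon` per label, one `file` element with `format="peptide"` and a `URL` per FASTA file); the function name is hypothetical and only the element and attribute names shown on the slide are used.

```python
import xml.etree.ElementTree as ET


def taxonomy_xml(taxa: dict[str, list[str]]) -> str:
    """Build taxonomy.xml text mapping each taxon label to its FASTA files."""
    root = ET.Element("bioml", label="x! taxon-to-file matching list")
    for taxon, files in taxa.items():
        node = ET.SubElement(root, "taxon", label=taxon)
        for path in files:
            ET.SubElement(node, "file", format="peptide", URL=path)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode")
```

`taxonomy_xml({"chlamy": ["test_chlre2.fasta.pro"]})` yields the same content as the example above, so the taxon name `chlamy` can then be referenced from input.xml.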

  25. Input.xml <?xml version="1.0" ?> <bioml> <note>Each one of the parameters for x! tandem is entered as a labeled note node. Any of the entries in the default_input.xml file can be over-ridden by adding a corresponding entry to this file. This file represents a minimum input file, with only entries for the default settings, the output file and the input spectra file name. See the taxonomy.xml file for a description of how FASTA sequence list files are linked to a taxon name.</note> <note type="input" label="list path, default parameters">default_input.xml</note> <note type="input" label="list path, taxonomy information">taxonomy.xml</note> <note type="input" label="protein, taxon">chlamy</note> <note type="input" label="spectrum, path">test_spectra.mgf</note> <note type="input" label="output, path">output.xml</note> </bioml> • Another input file • Personally, I don’t approve of the XML used here
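The override rule described in the note ("any of the entries in the default_input.xml file can be over-ridden") can be made concrete by collecting every `<note type="input" label="...">` into a dictionary and letting input.xml win. This is a sketch of the documented behaviour, not X!Tandem's own code; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def input_notes(xml_text: str) -> dict[str, str]:
    """Collect label -> value for every <note type="input" label="...">."""
    root = ET.fromstring(xml_text)
    return {note.get("label"): (note.text or "").strip()
            for note in root.iter("note")
            if note.get("type") == "input" and note.get("label")}


def effective_parameters(default_xml: str, input_xml: str) -> dict[str, str]:
    """Start from the defaults, then apply the entries found in input.xml."""
    params = input_notes(default_xml)
    params.update(input_notes(input_xml))   # input.xml entries take precedence
    return params
```

Plain `<note>` elements without a `type` (the explanatory comments) are skipped, which mirrors how the files use them.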

  26. Default-input XML <?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="tandem-input-style.xsl"?> <bioml> <note>list path parameters</note> <note type="input" label="list path, default parameters">default_input.xml</note> <note>This value is ignored when it is present in the default parameter list path.</note> <note type="input" label="list path, taxonomy information">taxonomy.xml</note> <note>spectrum parameters</note> <note type="input" label="spectrum, fragment monoisotopic mass error">0.4</note> <note type="input" label="spectrum, parent monoisotopic mass error plus">100</note> <note type="input" label="spectrum, parent monoisotopic mass error minus">100</note> <note type="input" label="spectrum, parent monoisotopic mass isotope error">yes</note> <note type="input" label="spectrum, fragment monoisotopic mass error units">Daltons</note> <note>The value for this parameter may be 'Daltons' or 'ppm': all other values are ignored</note> <note type="input" label="spectrum, parent monoisotopic mass error units">ppm</note> <note>The value for this parameter may be 'Daltons' or 'ppm': all other values are ignored</note> <note type="input" label="spectrum, fragment mass type">monoisotopic</note> <note>values are monoisotopic|average </note> <note>spectrum conditioning parameters</note> <note type="input" label="spectrum, dynamic range">100.0</note> <note>The peaks read in are normalized so that the most intense peak is set to the dynamic range value. All peaks with values of less that 1, using this normalization, are not used. This normalization has the overall effect of setting a threshold value for peak intensities.</note> <note type="input" label="spectrum, total peaks">50</note> <note>If this value is 0, it is ignored. If it is greater than zero (lets say 50), then the number of peaks in the spectrum with be limited to the 50 most intense peaks in the spectrum. X! 
tandem does not do any peak finding: it only limits the peaks used by this parameter, and the dynamic range parameter.</note> <note type="input" label="spectrum, maximum parent charge">4</note> <note type="input" label="spectrum, use noise suppression">yes</note> <note type="input" label="spectrum, minimum parent m+h">500.0</note> <note type="input" label="spectrum, minimum fragment mz">150.0</note> <note type="input" label="spectrum, minimum peaks">15</note> <note type="input" label="spectrum, threads">1</note> <note type="input" label="spectrum, sequence batch size">1000</note> <note>residue modification parameters</note> ........ </bioml>
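The notes for "spectrum, dynamic range" and "spectrum, total peaks" describe a simple conditioning step. The sketch below is one reading of that description (scale so the most intense peak equals the dynamic range, drop peaks below 1, then keep the N most intense); it is not X!Tandem's source, which may apply further steps such as noise suppression. The function name is hypothetical.

```python
def condition_peaks(peaks: list[tuple[float, float]],
                    dynamic_range: float = 100.0,
                    total_peaks: int = 50) -> list[tuple[float, float]]:
    """Normalise (m/z, intensity) pairs and keep at most total_peaks of them."""
    if not peaks:
        return []
    top = max(intensity for _, intensity in peaks)
    if top <= 0:
        return []
    # Most intense peak becomes `dynamic_range`; values below 1 are discarded.
    scaled = [(mz, i * dynamic_range / top) for mz, i in peaks]
    kept = [(mz, i) for mz, i in scaled if i >= 1.0]
    if total_peaks > 0:  # 0 means "no limit", as in the note
        kept = sorted(kept, key=lambda p: p[1], reverse=True)[:total_peaks]
    return sorted(kept)  # back in m/z order
```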

  27. Beautifying XML • XML • Only describes data • Formatting of XML • Additional files can be linked to beautify the display • Transformation (XSLT) • Translates XML into HTML • XML styling (CSS) • Describes the formatting of the elements and attributes used in the XML file • The stylesheet file is linked at the beginning of the XML file with an <?xml-stylesheet ...?> instruction (as in default_input.xml)

  28. XML • What is an element? • What is an attribute? • Design a Person • What are attributes of a person? • Use elements for logical grouping • Use attributes for specific information

  29. Styling • Connect the example style • Nothing will be styled ;) • Examine the CSS file and rename the styles such that your person XML will be somewhat styled
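A possible answer to the "Design a Person" exercise, generated with Python's standard library: elements group related data (`name`, `address`) and an attribute carries a specific datum (`id`). The stylesheet link is an `<?xml-stylesheet type="text/css" ...?>` instruction at the top of the file; `person.css` and all element names are hypothetical choices, not the course's reference solution.

```python
import xml.etree.ElementTree as ET


def person_xml(person_id: str, first: str, last: str, city: str, country: str,
               css_href: str = "person.css") -> str:
    """Build a small person document linked to a CSS stylesheet."""
    person = ET.Element("person", id=person_id)    # attribute: specific datum
    name = ET.SubElement(person, "name")             # element: logical group
    ET.SubElement(name, "first").text = first
    ET.SubElement(name, "last").text = last
    address = ET.SubElement(person, "address")       # element: logical group
    ET.SubElement(address, "city").text = city
    ET.SubElement(address, "country").text = country
    prolog = ('<?xml version="1.0"?>\n'
              f'<?xml-stylesheet type="text/css" href="{css_href}"?>\n')
    return prolog + ET.tostring(person, encoding="unicode")
```

Selectors such as `person`, `name` or `city` in the CSS file then apply to these elements when the XML is opened in a browser.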

  30. End Theory II • 5 min mindmapping • 10 min break

  31. Practice II

  32. View Spectra and Sequence • To view matching peaks of the PepNovo prediction and the spectrum at the same time • Use the DtaViewer from http://www.biolnk.com

  33. Download • Download PepNovo • http://www-cse.ucsd.edu/groups/bioinformatics/software.html#pepnovo • http://bioinformatics.allmer.de/tools • Download test file • http://bioinformatics.allmer.de/tools

  34. Try PepNovo • Try to run PepNovo • Use the given input • Use the help information • Use the lecture slides • Use the lecture notes • Aim • Store the result in a text file

  35. PepNovo • Results are displayed in the console • We need to redirect the output into a file • ?>PepNovo.exe -dta MSMSSpectrum.dta -model tryp_model.txt > result.txt
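The same redirection can be done from a script, which is handy when many spectra have to be processed one after another. This sketch is the Python counterpart of `program args > file`; the PepNovo call reuses the slide's file names and only runs if `PepNovo.exe` is found on the PATH. The function name is hypothetical.

```python
import shutil
import subprocess


def run_to_file(cmd: list[str], out_path: str) -> int:
    """Run a console program and write its standard output to out_path."""
    with open(out_path, "w", encoding="utf-8") as out:
        completed = subprocess.run(cmd, stdout=out)
    return completed.returncode


if __name__ == "__main__":
    if shutil.which("PepNovo.exe"):
        run_to_file(["PepNovo.exe", "-dta", "MSMSSpectrum.dta",
                     "-model", "tryp_model.txt"], "result.txt")
```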

  36. X!Tandem • Unzip the folder and check that it contains • MGF-formatted spectra (file) • Database file (FASTA) • tandem-win32-10-12-01-1 folder • The .xml configuration files used (default_input.xml, input.xml and taxonomy.xml) • To reproduce the output given in the zip folder: • Replace the configuration files in the «tandem-win\bin» folder with the ones in the «used» folder • Also copy the database file to the «fasta» folder and the .mgf file to «bin» in «tandem-win»
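To check what the MGF file contains before running X!Tandem, a small reader is enough. This sketch assumes the usual MGF layout: each spectrum sits between `BEGIN IONS` and `END IONS`, header lines have the form `KEY=value` (e.g. `TITLE`, `PEPMASS`, `CHARGE`), and peak lines hold an m/z and an optional intensity. The function name is hypothetical.

```python
def read_mgf(text: str) -> list[dict]:
    """Return one {"params": {...}, "peaks": [(m/z, intensity)]} per spectrum."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line:                      # header line, e.g. PEPMASS=500.25
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:                                # peak line: m/z [intensity]
                fields = line.split()
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                current["peaks"].append((float(fields[0]), intensity))
    return spectra
```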

  37. X!Tandem Console Application

  38. X!Tandem Default Input • Parameters such as mass tolerances, enzyme type and the maximum parent charge to search can be reset in default_input.xml

  39. X!Tandem Input.xml • In the input.xml file, you should specify the path of: • taxonomy.xml • default_input.xml • Spectra file name • Output file name • NOTE: Here input.xml and all files above are in the same folder (directory)

  40. X!Tandem Taxonomy • In the taxonomy file, you should specify the «database file path». In this example, the database file is in the «fasta» folder inside the «Xtandem\tandem-win32-10-12-01-1» folder.

  41. X!Tandem Output
