Basic Overview of Bioinformatics Tools and Biocomputing Applications I

Basic Overview of Bioinformatics Tools and Biocomputing Applications I. Dr Tan Tin Wee, Director, Bioinformatics Centre. Software Tools: data stored in retrievable forms in database systems; data generated by machines, DNA / protein sequencers, automated systems.

Presentation Transcript


  1. Basic Overview of Bioinformatics Tools and Biocomputing Applications I
     Dr Tan Tin Wee, Director, Bioinformatics Centre

  2. Software Tools
     • Data stored in retrievable forms in database systems
     • Data generated by machines, DNA / protein sequencers, automated systems
     [Diagram labels: Automated Machines, Research Labs, Biological Data, Analytical Tools, Databases, New Knowledge]

  3. Common Computational Analyses
     • Sequence assembly
     • Simple sequence analysis
     • Translation and reverse complement, ORF
     • Composition statistics (protein & DNA)
     • Molecular mass
     • Total charge and pI; local hydropathy
     • Simple determination of secondary structures
     • Restriction site analysis
     • Internal repeat analysis
     • Detection of active sites, functional residues, characteristic structures, substrates, and processing signals

  4. Common Computational Analyses
     • Database sequence search
     • Multiple alignment
     • 2° and 3° structure prediction; transmembrane helix detection
     • Structure modeling
     • Docking prediction and design
     • Hidden Markov model searches

  5. Sequence Assembly
     • Fragmented data from DNA sequencers
     • Detection of overlap
     • Merging of contigs
     • Assembly into continuous sequence
     [Diagram: strand ends labelled 3' and 5']
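
The overlap-and-merge step can be sketched in Python as below. This is only a minimal illustration, assuming error-free reads on the same strand and exact suffix/prefix matches; the function names and the `min_overlap` parameter are hypothetical, and real assemblers also handle sequencing errors and reverse-complemented reads.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest = min(len(left), len(right))
    # Try the longest candidate first so the first hit is the best overlap.
    for size in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_fragments(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Merge two fragments into one contig, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep `left` whole and append only the non-overlapping tail of `right`.
    return left + right[size:]


contig = merge_fragments("ATCTCGATAC", "GATACTACTACG")
```

Repeating the merge over all fragment pairs, always taking the longest overlap first, gives a simple greedy assembly.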

  6. Sequence Format Interconversion
     • DNA / protein and other sequence data come in different formats
     • Annotations
     • Different programs use different formats
     • Interconversion utility tools, e.g. READSEQ, TOGCG, TOSTADEN, etc.

  7. Simple Sequence Analysis
     1. Linear sequence, e.g. DNA / protein
     2. Open a window: n = 1, n = variable, n = sliding
     3. Calculate based on a list of criteria
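
The window idea on this slide can be sketched in Python as follows. GC fraction is used as the criterion purely as an example; the function names and the `step` parameter are illustrative assumptions, not part of any program named in the slides.

```python
def sliding_windows(sequence, size, step=1):
    """Yield (start, window) pairs for every full window of `size` residues."""
    for start in range(0, len(sequence) - size + 1, step):
        yield start, sequence[start:start + size]


def gc_fraction(window):
    """One example criterion: fraction of G and C bases in the window."""
    return sum(base in "GC" for base in window) / len(window)


# Compute the criterion for each window position along the sequence.
profile = [(start, gc_fraction(window))
           for start, window in sliding_windows("ATGCGCAT", 4)]
```

Swapping `gc_fraction` for another scoring function (hydropathy, charge, and so on) reuses the same window loop.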

  8. Some Simple Sequence Analysis Applications
     • DNA complementary strand, e.g. COMPLEMENT & REVERSE
       • Open a window of size 1
       • Substitute A -> T, C -> G, T -> A, G -> C
       • Slide to the next window of 1
       • Proceed to the end of the sequence
       • Reverse the order of the complement
     5' ...ATCTCGATACTACTACG...3'
           |||||||||||||||||
     3' ...TAGAGCTATGATGATGC...5'
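
A minimal Python sketch of the complement and reverse steps, assuming an unambiguous A/C/G/T sequence (IUPAC ambiguity codes are not handled). The function names are illustrative and are not the COMPLEMENT or REVERSE programs themselves.

```python
# Translation table implementing A->T, C->G, G->C, T->A in one pass.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna):
    """Complement each base, keeping the original left-to-right order."""
    return dna.upper().translate(COMPLEMENT)


def reverse_complement(dna):
    """Complement, then reverse, so the result reads 5' to 3'."""
    return complement(dna)[::-1]


bottom_strand = complement("ATCTCGATACTACTACG")
```

Applied to the slide's example, `complement` gives the lower strand as drawn (3' to 5'), and `reverse_complement` gives the same strand written 5' to 3'.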

  9. Some Simple Sequence Analysis Applications
     • DNA to protein sequence translation, e.g. TRANSLATE
       • Open a window of 3 bases
       • Look up the codon in the genetic code table
       • Assign the amino acid residue
       • Slide the window to the next 3 bases
       • Proceed until a stop codon is detected
       • Repeat the whole procedure for six frames
     [Figure: ATACTACTGAGATCTAGGCTAGTACTGCGTGCG with Frames 1-3 on the given strand and Frames 4-6 on the complement]
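
The translation procedure can be sketched in Python as below, assuming the standard genetic code and a DNA (not RNA) input. The function names and the `to_stop` flag are illustrative; unknown codons are reported as "X" by assumption.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): residue
               for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna, to_stop=False):
    """Translate consecutive 3-base windows; '*' marks a stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE.get(dna[pos:pos + 3].upper(), "X")
        if residue == "*" and to_stop:
            break
        protein.append(residue)
    return "".join(protein)


def six_frames(dna):
    """Frames 1-3 read the given strand; frames 4-6 read its reverse complement."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[offset + 4] = translate(reverse[offset:])
    return frames


frames = six_frames("ATACTACTGAGATCTAGGCTAGTACTGCGTGCG")
```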

  10. Some Simple Sequence Analysis Applications
     • Detect open reading frames, e.g. ORF
       • Translate the sequence and report long stretches between a start codon and a stop codon
     • Compositional analysis
       • e.g. calculate total A, T, G, C
       • e.g. calculate total molecular mass of a protein, analyse percentages of amino acids
       • e.g. total charge composition, pI
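
A minimal Python sketch of ORF detection and base composition. It assumes ATG as the only start codon, scans the three forward frames only, and uses a hypothetical `min_codons` cutoff; a fuller tool would also scan the reverse complement.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for each ATG...stop stretch; `end` is exclusive."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos                      # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame + 1))
                start = None                     # look for the next ORF in this frame
    return orfs


def base_composition(dna):
    """Count of each of A, C, G, T in the sequence."""
    counts = Counter(dna.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


orfs = find_orfs("CCATGGCCAAATAGTT")
```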

  11. Some Simple Sequence Analysis Applications
     • Simple prediction of secondary structure of a protein sequence
       • Decide a window size
       • Compute, for each window of amino acids, the statistical potential to form helix, beta sheet, turn, etc. (Chou-Fasman, GOR, etc. algorithms)
       • Use a statistical potential chart
       • Plot potentials in graphical or pictorial format

  12. Some Simple Sequence Analysis Applications
     • Restriction mapping, e.g. MAP, MAPPLOT, MAPSORT, PLASMIDMAP, etc.
       • Table of restriction enzymes and cut sites, e.g. EcoRI, BamHI, AluI and their cut sites, e.g. GAATTC, AATT
       • Take a DNA sequence
       • Pattern match against the list of cut sites
       • For each match, assign the restriction enzyme
       • Calculate the distance between cut sites
       • Display in a table, graphically, or as a restriction map, etc.
     [Figures: gel, plasmid map]
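
The pattern-matching steps can be sketched in Python as follows. The three recognition sequences are the standard ones for EcoRI, BamHI and AluI; everything else is an assumption made for illustration: a linear molecule, positions reported at the start of the recognition site rather than at the exact cut offset, and hypothetical function names.

```python
import re

# Recognition sequences for the enzymes named on the slide.
RECOGNITION_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "AluI": "AGCT"}


def find_sites(dna, enzymes=RECOGNITION_SITES):
    """Return sorted (position, enzyme) pairs for every recognition site found."""
    dna = dna.upper()
    hits = []
    for enzyme, site in enzymes.items():
        # A lookahead keeps overlapping occurrences from being skipped.
        for match in re.finditer(f"(?={site})", dna):
            hits.append((match.start(), enzyme))
    return sorted(hits)


def fragment_lengths(dna_length, positions):
    """Distances between consecutive sites on a linear molecule."""
    cuts = [0] + sorted(positions) + [dna_length]
    return [right - left for left, right in zip(cuts, cuts[1:])]


dna = "AAGAATTCTTAGCTGGATCCAA"
sites = find_sites(dna)
lengths = fragment_lengths(len(dna), [position for position, _ in sites])
```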

  13. Some Simple Sequence Analysis Applications
     • Protein sequence motif pattern matching, e.g. PROSITEMAP, MOTIFS, BLOCKS, etc.
       • Table / database of sequence patterns or motifs and their signature sequences, e.g. Arg-Gly-Asp (RGD) or a consensus sequence (e.g. PROSITE, BLOCKS db)
       • Take a protein sequence
       • Pattern match against the list of signature sites
       • For each match, assign a potential function according to the database
       • Display in a table, graphically, or hyperlinked
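
A minimal Python sketch of motif matching, assuming a small subset of PROSITE pattern syntax: `x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, and `(n)` or `(n,m)` repeat counts. Anchors such as `<` and `>` are not handled, and the two-entry motif table and function names are illustrative only.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        repeat = ""
        if element.endswith(")"):
            # Split "x(2,4)" into the element "x" and the repeat count "2,4".
            element, _, count = element[:-1].partition("(")
            repeat = "{" + count + "}"
        if element == "x":
            element = "."
        elif element.startswith("{"):
            element = "[^" + element[1:-1] + "]"
        parts.append(element + repeat)
    return "".join(parts)


# Illustrative motif table: the RGD tripeptide and the N-glycosylation site pattern.
MOTIFS = {"cell attachment (RGD)": "R-G-D",
          "N-glycosylation site": "N-{P}-[ST]-{P}"}


def find_motifs(protein, motifs=MOTIFS):
    """Return (name, start, matched_text) for every motif occurrence."""
    hits = []
    for name, pattern in motifs.items():
        for match in re.finditer(prosite_to_regex(pattern), protein):
            hits.append((name, match.start(), match.group()))
    return hits


hits = find_motifs("MKRGDSNGTA")
```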

  14. Some Simple Sequence Analysis Applications
     • Peptide cleavage maps, e.g. PEPTIDESORT, PEPTIDEMAP
       • Table of proteases vs cleavage sites, e.g. trypsin, chymotrypsin, and chemical cleavage agents such as cyanogen bromide
       • Pattern match against the entire protein sequence
       • Calculate the size of the peptide fragments
       • Sort and map; plot as electrophoretic patterns on a log-linear simulated digest
       • Compute partial digest patterns
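
A minimal Python sketch of a complete digest, assuming the commonly used simplified rules: trypsin cuts after K or R unless the next residue is P, and cyanogen bromide cuts after M. The function name and parameters are hypothetical, and partial digests are not modelled.

```python
def digest(protein, cut_after="KR", blocked_by="P"):
    """Split `protein` after each residue in `cut_after`, unless the
    following residue is in `blocked_by`. Defaults model trypsin."""
    fragments = []
    start = 0
    for index, residue in enumerate(protein):
        next_residue = protein[index + 1:index + 2]   # "" at the C-terminus
        if residue in cut_after and next_residue and next_residue not in blocked_by:
            fragments.append(protein[start:index + 1])
            start = index + 1
    fragments.append(protein[start:])                 # remaining C-terminal peptide
    return fragments


tryptic = digest("MKRPLKAAR")
# Sort by size, largest first, as a stand-in for a simulated gel pattern.
by_size = sorted(tryptic, key=len, reverse=True)
# Cyanogen bromide: cut after Met, with no blocking residue.
cnbr = digest("AMGMK", cut_after="M", blocked_by="")
```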

  15. Some Simple Sequence Analysis Applications
     • DOTPLOT - self-comparison
       • Take a window size
       • Compare against the entire length of the sequence itself
       • Report matches above a threshold
       • Plot on a graph
       • Slide the window, repeat till the end of the sequence
     • Detection of internal repeats
     • Pairwise comparison - detection of homology
     [Figure: dot plot with Sequence A on both axes]
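
The window-and-threshold comparison can be sketched in Python as below. Scoring by counting identical positions in each window pair is an assumption made for simplicity; the function name and defaults are illustrative, and the result is a list of coordinates rather than a drawn plot.

```python
def dotplot(seq_a, seq_b, window=3, threshold=3):
    """Return (i, j) for every window pair with at least `threshold` identities."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(a == b for a, b in
                          zip(seq_a[i:i + window], seq_b[j:j + window]))
            if matches >= threshold:
                dots.append((i, j))
    return dots


sequence = "ACGTTTACG"
dots = dotplot(sequence, sequence)
# In a self-comparison, dots off the main diagonal point to internal repeats.
repeats = [(i, j) for i, j in dots if i != j]
```

Passing two different sequences instead of the same one twice gives the pairwise comparison used to look for homology.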

  16. Some Simple Sequence Analysis Applications
     • RNA secondary structure analysis
       • Mfold, PlotFold, FoldRNA, Squiggles, Circles, Domes, Mountains, StemLoop
       • Folding of RNA into stems and loops
       • Calculation of energy - prediction of stability of structure
       • Display of structure and alternatives
     [Figure: RNA stem-loop diagram]
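
As a rough illustration of folding by dynamic programming, the sketch below uses Nussinov-style base-pair maximisation. This is a deliberately simpler stand-in: it counts pairs instead of computing free energies, so it is not the energy-based method the slide refers to. Allowing G-U pairs and requiring at least three unpaired bases in a hairpin loop are assumptions, as are the names.

```python
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna, min_loop=3):
    """Maximum number of nested base pairs, with at least `min_loop`
    unpaired bases between the two partners of any pair."""
    n = len(rna)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                    # case: base j is unpaired
            for k in range(i, j - min_loop):          # case: base j pairs with k
                if (rna[k], rna[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


stem_loop_pairs = max_base_pairs("GGGAAAUCC")
```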

  17. Database Searching
     • Text-based database searching - using a text string to match an annotation in a sequence database record, i.e. keyword search
     • Sequence-based database searching - using a biological sequence to match the whole or parts of its sequence against the sequence of every record in the database

  18. Text-Based Database Searching
     • Examples: Entrez, SRS, DBGET, AceDB - common integrated database systems
     • Search concepts
       • Boolean search - AND, OR, NOT
       • Broadening the search
       • Narrowing the search
       • Proximity searching, soundex
       • Wild card, stemming, e.g. Thala* for thalasemia, thalassemia, thalassemic
     • Use standard string search algorithms and Boolean operations, vocabulary matches
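
The Boolean and wildcard concepts can be sketched in Python over a toy in-memory set of records. The records, their identifiers, and the function names below are made up for illustration; this does not reflect how Entrez, SRS, DBGET or AceDB are implemented.

```python
import re
from fnmatch import fnmatchcase


def matches_term(text, term):
    """True if any word in `text` matches `term`; '*' acts as a wildcard."""
    words = re.findall(r"\w+", text.lower())
    return any(fnmatchcase(word, term.lower()) for word in words)


def search(records, all_of=(), any_of=(), none_of=()):
    """Boolean search: all_of ~ AND, any_of ~ OR, none_of ~ NOT."""
    hits = []
    for record_id, text in records.items():
        if not all(matches_term(text, term) for term in all_of):
            continue
        if any_of and not any(matches_term(text, term) for term in any_of):
            continue
        if any(matches_term(text, term) for term in none_of):
            continue
        hits.append(record_id)
    return hits


# Hypothetical annotation records, invented for this example.
records = {
    "rec1": "human period homolog",
    "rec2": "Drosophila period gene",
    "rec3": "beta thalassemia mutation in human",
}
narrowed = search(records, all_of=["human", "period"])
wildcard = search(records, any_of=["thala*"])
```

Adding terms to `all_of` or `none_of` narrows the search; moving terms to `any_of` broadens it.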

  19. Text-based Database Searching
     • Example: to find the human homolog of the Drosophila per gene
     • Procedure
       • Go to Entrez on the Web
       • All Fields: enter "human" "per"
       • Hits returned are irrelevant - broaden the search
       • "human" "period" - more hits
       • Check every one, find the human RIGUI gene
     • Hit and miss, clever guesswork; free form or controlled vocabulary (MeSH terms)? Use Boolean searches?

  20. Sequence-based Database Searching
     • Homology search
     • Global or local sequence alignment
       • Needleman-Wunsch algorithm
       • Smith-Waterman algorithm
       • Lipman-Pearson FASTA
       • Altschul's BLAST
     • Take a sequence, do a pairwise comparison with each sequence in the database

  21. Sequence-based Database Searching
     • Basic assumptions:
       • Sequences of homologous genes / proteins diverge over time even though structure and/or function change little
       • Significant sequence similarity is inferred as potential structural / functional similarity or common evolutionary origin
       • Based on a well-characterised protein, infer the function of an unknown sequence at the gene or protein sequence level

  22. Sequence-based Database Searching
     • Global alignment forces complete alignment of the pairwise comparison of the two input sequences
     • Local alignment looks for local stretches of similarity and tries to align the most similar segments
     • Algorithms used may be similar, but the output differs; statistics are needed to assess results

  23. Sequence-based Database Searching
     • Alignment scoring
       • Substitution score and substitution matrix: PAM, BLOSUM
       • Affine gap costs / gap penalty and gap scores
     • Optimal alignments, dynamic programming: Needleman-Wunsch algorithm, Smith-Waterman algorithm (SSEARCH)
     • Additional heuristics - FASTA, BLAST
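
The dynamic programming idea behind both optimal alignment algorithms can be sketched in Python as below. To keep it short, the sketch makes two simplifications that differ from the slide: a flat match/mismatch score stands in for a PAM or BLOSUM matrix, and a linear gap penalty stands in for affine gap costs. It returns only the optimal score, not the alignment, and the function name and default scores are illustrative.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Optimal alignment score by dynamic programming.

    local=False: global alignment (Needleman-Wunsch style).
    local=True:  local alignment (Smith-Waterman style).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment charges for gaps at the sequence ends.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)      # a local alignment may restart anywhere
            score[i][j] = cell
            best = max(best, cell)
    return best if local else score[-1][-1]


global_score = alignment_score("ACGT", "AGT")
local_score = alignment_score("TTTGATTACATTT", "CCGATTACACC", local=True)
```

The only differences between the two modes are the first row and column, the floor at zero, and where the final score is read from, which is why the same code serves both.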
