BioInformatics - What and Why?

The following PowerPoint presentation is designed to give some background information on bioinformatics. This presentation is modified from information supplied by Dr. Bruno Gaeta, with permission from eBioInformatics Pty Ltd (c) Copyright.

Presentation Transcript


  1. BioInformatics - What and Why? The following PowerPoint presentation is designed to give some background information on bioinformatics. This presentation is modified from information supplied by Dr. Bruno Gaeta, with permission from eBioInformatics Pty Ltd (c) Copyright.

  2. The need for bioinformaticians. The number of entries in gene sequence databases is increasing exponentially. Bioinformaticians are needed to understand and use this information. [Figure: GenBank growth, 1982 to 1999]

  3. Genome sequencing projects, including the Human Genome Project, are producing vast amounts of information. The challenge is to put this information to good use.
     Publicly available genomes (April 1998)
     COMPLETE/PUBLIC: Aquifex aeolicus, Pyrococcus horikoshii, Bacillus subtilis, Treponema pallidum, Borrelia burgdorferi, Helicobacter pylori, Archaeoglobus fulgidus, Methanobacterium thermo., Escherichia coli, Mycoplasma pneumoniae, Synechocystis sp. PCC6803, Methanococcus jannaschii, Saccharomyces cerevisiae, Mycoplasma genitalium, Haemophilus influenzae
     COMPLETE/PENDING PUBLICATION: Rickettsia prowazekii, Pseudomonas aeruginosa, Pyrococcus abyssi, Bacillus sp. C-125, Ureaplasma urealyticum, Pyrobaculum aerophilum
     ALMOST/PUBLIC: Pyrococcus furiosus, Mycobacterium tuberculosis H37Rv, Mycobacterium tuberculosis CSU93, Neisseria gonorrhoeae, Neisseria meningitidis, Streptococcus pyogenes
     Source: Terry Gaasterland, Siv Andersson, Christoph Sensen, http://www.mcs.anl.gov/home/gaasterl/genomes.html

  4. “Towards a paradigm shift in biology” Bioinformatics impacts on all aspects of biological research. “...We must hook our individual computers into the worldwide network that gives us access to daily changes in the databases and also makes immediate our communications with each other. The programs that display and analyze the material for us must be improved - and we must learn to use them more effectively. Like the purchased kits, they will make our life easier, but also like the kits, we must understand enough of how they work to use them effectively...” Walter Gilbert (1991) “Towards a paradigm shift in biology” Nature News and Views 349:99

  5. Promises of genomics and bioinformatics
     • Medicine
       • Knowledge of protein structure facilitates drug design
       • Understanding of genomic variation allows the tailoring of medical treatment to the individual’s genetic make-up
       • Genome analysis allows the targeting of genetic diseases
       • The effect of a disease or of a therapeutic on RNA and protein levels can be elucidated
     • The same techniques can be applied to biotechnology, crop and livestock improvement, etc.

  6. What is bioinformatics?
     • Application of information technology to the storage, management and analysis of biological information
     • Facilitated by the use of computers

  7. What is bioinformatics?
     • Sequence analysis
       • Geneticists/molecular biologists analyse genome sequence information to understand disease processes
     • Molecular modeling
       • Crystallographers/biochemists design drugs using computer-aided tools
     • Phylogeny/evolution
       • Geneticists obtain information about the evolution of organisms by looking for similarities in gene sequences
     • Ecology and population studies
       • Bioinformatics is used to handle large amounts of data obtained in population studies
     • Medical informatics
       • Personalised medicine

  8. Sequence analysis: overview [Figure: flowchart of sequence analysis tasks]
     • Sequence entry
       • Sequencing project management
       • Sequence database browsing
       • Manual sequence entry
     • Nucleotide sequence analysis (starting from a nucleotide sequence file)
       • Search for protein coding regions
       • Search databases for similar sequences
       • Design further experiments: restriction mapping, PCR planning
       • Coding regions: translate into protein
       • Non-coding regions: search for known motifs, RNA structure prediction, sequence comparison
     • Protein sequence analysis (starting from a protein sequence file)
       • Search databases for similar sequences
       • Search for known motifs
       • Predict secondary structure
       • Predict tertiary structure
       • Sequence comparison
     • Multiple sequence analysis
       • Create a multiple sequence alignment
       • Edit the alignment
       • Format the alignment for publication
       • Molecular phylogeny
       • Protein family analysis
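
The "translate into protein" step in this overview can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code and a single forward reading frame; the names `CODON_TABLE` and `translate` are chosen here for illustration and do not come from the presentation.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA (or RNA) string in one forward frame.

    Stop codons are written as '*'; codons containing ambiguous
    bases (for example N) are written as 'X'.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATGAAATTTGGGTAA"))
```

A complete tool would also translate the three frames of the reverse complement and support alternative genetic codes.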

  9. Gene sequencing: Automated chemical sequencing methods allow rapid generation of large databases of gene sequences

  10. Database similarity searching: The BLAST program was written to allow rapid comparison of a new gene sequence with the hundreds of thousands of gene sequences in databases

     Sequences producing significant alignments:                         (bits)  Value
     gnl|PID|e252316 (Z74911) ORF YOR003w [Saccharomyces cerevisiae]        112  7e-26
     gi|603258 (U18795) Prb1p: vacuolar protease B [Saccharomyces ce...     106  5e-24
     gnl|PID|e264388 (X59720) YCR045c, len:491 [Saccharomyces cerevi...      69  7e-13
     gnl|PID|e239708 (Z71514) ORF YNL238w [Saccharomyces cerevisiae]         30  0.66
     gnl|PID|e239572 (Z71603) ORF YNL327w [Saccharomyces cerevisiae]         29  1.1
     gnl|PID|e239737 (Z71554) ORF YNL278w [Saccharomyces cerevisiae]         29  1.5

     gnl|PID|e252316 (Z74911) ORF YOR003w [Saccharomyces cerevisiae]
     Length = 478
     Score = 112 bits (278), Expect = 7e-26
     Identities = 85/259 (32%), Positives = 117/259 (44%), Gaps = 32/259 (12%)

     Query: 2   QSVPWGISRVQAPAAHNRG---------LTGSGVKVAVLDTGIST-HPDLNIRGG-ASFV 50
                +  PWG+ RV        G           G GV   VLDTGI T H D   R    + +
     Sbjct: 174 EEAPWGLHRVSHREKPKYGQDLEYLYEDAAGKGVTSYVLDTGIDTEHEDFEGRAEWGAVI 233

     Query: 51  PGEPSTQDGNGHGTHVAGTIAALNNSIGVLGVAPSAELYXXXXXXXXXXXXXXXXXQGLE 110
                P      D NGHGTH AG I + +      GVA + ++                  +G+E
     Sbjct: 234 PANDEASDLNGHGTHCAGIIGSKH-----FGVAKNTKIVAVKVLRSNGEGTVSDVIKGIE 288
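
The "Identities" and "Gaps" figures in a BLAST report are counts over the columns of the alignment. The following Python sketch shows how such counts can be derived from one pair of aligned rows. It is an illustration only: the function name `alignment_stats` is hypothetical, and "Positives" is left out because it depends on a substitution matrix that the slide does not specify.

```python
def alignment_stats(query, subject, gap="-"):
    """Count identities and gap columns in one pairwise alignment block.

    Both rows must already be aligned, so they have the same length.
    Returns (identities, gap_columns, alignment_length).
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    identities = sum(
        1 for q, s in zip(query, subject) if q == s and q != gap
    )
    gap_columns = sum(
        1 for q, s in zip(query, subject) if q == gap or s == gap
    )
    return identities, gap_columns, len(query)


# First alignment block from the slide (Query 2-50, Sbjct 174-233).
query = "QSVPWGISRVQAPAAHNRG---------LTGSGVKVAVLDTGIST-HPDLNIRGG-ASFV"
subject = "EEAPWGLHRVSHREKPKYGQDLEYLYEDAAGKGVTSYVLDTGIDTEHEDFEGRAEWGAVI"
print(alignment_stats(query, subject))
```

The totals in the report (85/259 identities, 32/259 gaps) cover the whole alignment, of which the slide shows only the first two blocks.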

  11. Sequence comparison: Gene sequences can be aligned to see similarities between genes from different sources

     768 TT....TGTGTGCATTTAAGGGTGATAGTGTATTTGCTCTTTAAGAGCTG 813
         ||     ||   ||  |  | ||| |  |||| |||||    |||  |||
      87 TTGACAGGTACCCAACTGTGTGTGCTGATGTA.TTGCTGGCCAAGGACTG 135

     814 AGTGTTTGAGCCTCTGTTTGTGTGTAATTGAGTGTGCATGTGTGGGAGTG 863
         |  | |              |  |||||| |   ||||  | || |   |
     136 AAGGATC.............TCAGTAATTAATCATGCACCTATGTGGCGG 172

     864 AAATTGTGGAATGTGTATGCTCATAGCACTGAGTGAAAATAAAAGATTGT 913
         ||| | ||| || || |||   |   |||||||||  ||   |||||| |
     173 AAA.TATGGGATATGCATGTCGA...CACTGAGTG..AAGGCAAGATTAT 216
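
The slide does not say which program or scoring scheme produced this alignment. As a minimal sketch of how a gapped pairwise alignment can be computed, the Python code below uses Needleman-Wunsch global alignment. The scores (match +1, mismatch -1, gap -2) and the function names are assumptions chosen for illustration, not the settings behind the slide.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences.

    Returns (score, aligned_a, aligned_b), with '.' marking gaps
    as in the slide.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic programming table.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(".")
            i -= 1
        else:
            out_a.append(".")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


def match_line(aligned_a, aligned_b):
    """Build the row of '|' marks shown between the two sequences."""
    return "".join(
        "|" if x == y and x != "." else " " for x, y in zip(aligned_a, aligned_b)
    )


total, top, bottom = needleman_wunsch("ACGT", "AGT")
print(top)
print(match_line(top, bottom))
print(bottom)
```

Production tools typically use affine gap penalties, where opening a gap costs more than extending it, which is why real alignments tend to show a few long gaps rather than many short ones.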

  12. Restriction mapping: Genes can be analysed to detect gene sequences that can be cleaved with restriction enzymes
     [Figure: restriction map, positions 50 to 250]

     Enzyme     Cuts  Recognition site
     AceIII     1     CAGCTCnnnnnnn’nnn...
     AluI       2     AG’CT
     AlwI       1     GGATCnnnn’n_
     ApoI       2     r’AATT_y
     BanII      1     G_rGCy’C
     BfaI       2     C’TA_G
     BfiI       1     ACTGGG
     BsaXI      1     ACnnnnnCTCC
     BsgI       1     GTGCAGnnnnnnnnnnn...
     BsiHKAI    1     G_wGCw’C
     Bsp1286I   1     G_dGCh’C
     BsrI       2     ACTG_Gn’
     BsrFI      1     r’CCGG_y
     CjeI       2     CCAnnnnnnGTnnnnnn...
     CviJI      4     rG’Cy
     CviRI      1     TG’CA
     DdeI       2     C’TnA_G
     DpnI       2     GA’TC
     EcoRI      1     G’AATT_C
     HinfI      2     G’AnT_C
     MaeIII     1     ’GTnAC_
     MnlI       1     CCTCnnnnnn_n’
     MseI       2     T’TA_A
     MspI       1     C’CG_G
     NdeI       1     CA’TA_TG
     Sau3AI     2     ’GATC_
     SstI       1     G_AGCT’C
     TfiI       2     G’AwT_C
     Tsp45I     1     ’GTsAC_
     Tsp509I    3     ’AATT_
     TspRI      1     CAGTGnn’
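
Recognition sites such as r’AATT_y mix fixed bases with IUPAC ambiguity codes (r = A or G, y = C or T, n = any base). One way to locate them, sketched here in Python, is to expand each site into a regular expression. The function names are hypothetical, and the sketch assumes that the ’ and _ characters only mark cut positions and can be ignored when matching. It searches the given strand only.

```python
import re

# IUPAC nucleotide ambiguity codes used in the recognition sites above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "D": "[AGT]", "H": "[ACT]", "N": "[ACGT]",
}


def site_to_regex(site):
    """Turn a recognition site such as "r'AATT_y" into a regex string."""
    for mark in "'’_":  # cut-position markers, not part of the sequence
        site = site.replace(mark, "")
    return "".join(IUPAC[base] for base in site.upper())


def find_sites(sequence, site):
    """Return 1-based start positions of every match, overlaps included."""
    # A lookahead keeps overlapping matches, which a plain search would skip.
    pattern = re.compile("(?=" + site_to_regex(site) + ")")
    return [m.start() + 1 for m in pattern.finditer(sequence.upper())]


dna = "GGAATTCAATTG"
for enzyme, site in [("EcoRI", "G'AATT_C"), ("ApoI", "r'AATT_y"), ("Tsp509I", "'AATT_")]:
    print(enzyme, find_sites(dna, site))
```

For palindromic sites such as EcoRI, scanning one strand is enough. Non-palindromic sites such as BsrI would also need a scan of the reverse complement.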

  13. PCR primer design: Oligonucleotides for use in the polymerase chain reaction can be designed using computer-based programs
     OPTIMAL primer length --> 20
     MINIMUM primer length --> 18
     MAXIMUM primer length --> 22
     OPTIMAL primer melting temperature --> 60.000
     MINIMUM acceptable melting temp --> 57.000
     MAXIMUM acceptable melting temp --> 63.000
     MINIMUM acceptable primer GC% --> 20.000
     MAXIMUM acceptable primer GC% --> 80.000
     Salt concentration (mM) --> 50.000
     DNA concentration (nM) --> 50.000
     MAX no. unknown bases (Ns) allowed --> 0
     MAX acceptable self-complementarity --> 12
     MAXIMUM 3' end self-complementarity --> 8
     GC clamp how many 3' bases --> 0
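
The length, melting temperature, GC% and unknown-base limits listed above can be applied as a simple filter. The Python sketch below is a rough illustration, not the program shown on the slide: it estimates melting temperature with the Wallace rule, Tm = 2(A+T) + 4(G+C), which ignores the salt and DNA concentrations, and it does not check self-complementarity or a GC clamp. All function names are hypothetical.

```python
def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees C (Wallace rule).

    An approximation for short oligonucleotides; the slide's program
    most likely uses a more detailed, concentration-dependent formula.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def is_acceptable(primer, min_len=18, max_len=22, min_tm=57.0, max_tm=63.0,
                  min_gc=20.0, max_gc=80.0, max_ns=0):
    """Apply the length, N, GC% and Tm limits from the slide."""
    primer = primer.upper()
    return (
        min_len <= len(primer) <= max_len
        and primer.count("N") <= max_ns
        and min_gc <= gc_percent(primer) <= max_gc
        and min_tm <= wallace_tm(primer) <= max_tm
    )


def candidate_primers(template, length=20, **limits):
    """List (1-based start, sequence) for every acceptable window."""
    template = template.upper()
    windows = (
        (i + 1, template[i:i + length])
        for i in range(len(template) - length + 1)
    )
    return [(pos, seq) for pos, seq in windows if is_acceptable(seq, **limits)]


print(candidate_primers("ATGCATGCATGCATGCATGCA"))
```

With the Wallace rule, a 20-base primer at 50% GC gives 60 degrees C, which matches the optimal values on the slide.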

  14. Gene discovery: Computer programs can be used to recognise the protein coding regions in DNA. [Figure: plot created using codon preference (GCG); three panels, each scaled 0.0 to 2.0, over positions 0 to 4,000]
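
Codon preference, the method named on the slide, is a statistical measure and is not reproduced here. A simpler technique for locating candidate coding regions is an open reading frame (ORF) scan, sketched below in Python. The function name and the `min_codons` threshold are assumptions for illustration, and only the three forward frames are scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Find ATG...stop open reading frames in the three forward frames.

    Returns a sorted list of (start, end, frame) with 1-based inclusive
    coordinates; the end coordinate includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3, frame + 1))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


print(find_orfs("CCATGAAATTTGGGTAACC"))
```

An ORF scan finds uninterrupted reading frames but cannot handle introns, so for eukaryotic genomic DNA it is usually only a first pass.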

  15. RNA structure prediction: Structural features of RNA can be predicted. [Figure: predicted RNA secondary structure]
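
The slide does not name the prediction method. As a minimal sketch of the idea, the Python code below uses the Nussinov algorithm, which finds the largest number of non-crossing base pairs by dynamic programming. It allows Watson-Crick and G-U wobble pairs and assumes a minimum hairpin loop of three unpaired bases. Energy-based tools used in practice are considerably more elaborate.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(rna, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence.

    min_loop is the smallest number of unpaired bases allowed
    between the two partners of a pair.
    """
    rna = rna.upper()
    n = len(rna)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within rna[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j is unpaired
            # case 2: base j pairs with some base k to its left
            for k in range(i, j - min_loop):
                if (rna[k], rna[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


print(nussinov("GGGAAAUCC"))
```

Counting pairs treats every pair as equally favourable. Replacing the count with stacking and loop energies leads to minimum free energy folding, which follows the same recursion pattern.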

  16. Protein structure prediction: Particular structural features can be recognised in protein sequences. [Figure: plots over residues 1 to 100 with panels: KD Hydrophobicity (-5.0 to 5.0), Surface Prob. (0.0 to 10), Flexibility (0.8 to 1.2), Antigenic Index (-1.7 to 1.7), CF Turns, CF Alpha Helices, CF Beta Sheets, GOR Turns, GOR Alpha Helices, GOR Beta Sheets, Glycosylation Sites]
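
The "KD Hydrophobicity" panel refers to the Kyte-Doolittle hydropathy scale. A profile of this kind can be sketched in Python as a sliding-window average over the sequence. The window length of 9 and the function name are assumptions made here; the slide does not state the settings that were used.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Average Kyte-Doolittle score for each window along the protein.

    Returns one value per window position, so the result has
    len(protein) - window + 1 entries (empty if the protein is shorter
    than the window).
    """
    scores = [KD[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


profile = hydropathy_profile("LLLLLRRRRR", window=5)
print([round(value, 2) for value in profile])
```

Positive stretches indicate hydrophobic regions, which may be buried or membrane-spanning, while negative stretches are more likely to be exposed on the surface.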

  17. Protein structure: The 3-D structure of proteins is used to understand protein function and to design new drugs

  18. Multiple sequence alignment: Sequences of proteins from different organisms can be aligned to see similarities and differences. [Figure: alignment formatted using MacBoxshade]

  19. Phylogeny inference: Analysis of sequences allows evolutionary relationships to be determined. [Figure: phylogenetic tree of E.coli, C.botulinum, C.cadavers, C.butyricum, B.subtilis and B.cereus, constructed using the Phylip package]

  20. Large-scale bioinformatics: genome projects
     • Mapping: Identifying the location of clones and markers on the chromosome by genetic linkage analysis and physical mapping
     • Sequencing: Assembling clone sequence reads into large (eventually complete) genome sequences
     • Gene discovery: Identifying coding regions in genomic DNA by database searching and other methods
     • Function assignment: Using database searches, pattern searches, protein family analysis and structure prediction to assign a function to each predicted gene
     • Data mining: Searching for relationships and correlations in the information
     • Genome comparison: Comparing different complete genomes to infer evolutionary history and genome rearrangements

  21. Challenges in bioinformatics
     • Explosion of information
       • Need for faster, automated analysis to process large amounts of data
       • Need for integration between different types of information (sequences, literature, annotations, protein levels, RNA levels etc.)
       • Need for “smarter” software to identify interesting relationships in very large data sets
     • Lack of “bioinformaticians”
       • Software needs to be easier to access, use and understand
       • Biologists need to learn about the software, its limitations, and how to interpret its results
