
REMINDERS


yair

Presentation Transcript


  1. REMINDERS • 2nd Exam on • Coverage: • Central Dogma of DNA • Replication • Transcription • Translation • Recombinant DNA technology and molecular biology • Protein analysis

  2. BIOINFORMATICS

  3. BIOINFORMATICS • Study of the structure of biological information and biological systems • Integrates theories and tools of mathematics/statistics, computer science and information technology • Involves the use of hardware and software to study vast amounts of biological data

  4. What is Bioinformatics? • the field of science in which biology, computer science, and information technology merge to form a single discipline • application of information technology to the storage, management and analysis of biological information • facilitated by the use of computers

  5. FUNCTIONS • Data Management • Storage • Retrieval • Data Analysis *Literature/Bibliography, Sequence, Structure, Taxonomy, Expression, etc.

  6. BIOLOGICAL DATABASES • Systematic data storage/retrieval • Maintained on a regular basis • Can contain various types of data (integration) • Sequence • Structure • Other pertinent information • Nucleic acids and proteins are most common

  7. DATABASES • a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system • Biological databases consist usually of the nucleic acid sequences of the genetic material of various organisms as well as protein sequences and structures

  8. DATABASES • e.g. nucleotide sequence database typically contains information such as • contact name • the input sequence with a description of the type of molecule • the scientific name of the source organism from which it was isolated • additional requirements • easy access to the information • a method for extracting only that information needed to answer a specific biological question
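As a rough illustration of the record fields listed on slide 8, the sketch below models a nucleotide sequence entry and a query that extracts only the information needed. The class `SequenceRecord`, its field names, the function `find_by_organism`, and the sample entries are all hypothetical; they do not reflect the schema of GenBank or any other real database.

```python
from dataclasses import dataclass


@dataclass
class SequenceRecord:
    """Hypothetical record mirroring the fields named on the slide."""
    contact_name: str   # who submitted the entry
    molecule_type: str  # description of the type of molecule
    organism: str       # scientific name of the source organism
    sequence: str       # the input sequence itself


def find_by_organism(records, organism):
    """Return only the sequences whose source organism matches."""
    return [r.sequence for r in records if r.organism == organism]


# Made-up entries, for illustration only.
records = [
    SequenceRecord("A. Researcher", "DNA", "Saccharomyces cerevisiae", "ATGGCT"),
    SequenceRecord("B. Researcher", "mRNA", "Homo sapiens", "AUGGCC"),
]

print(find_by_organism(records, "Homo sapiens"))
```

The query returns just the sequence strings, which reflects the slide's second requirement: extracting only the information needed to answer a specific question.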

  9. DATABASES • Sequence • GenBank, European Nucleotide Archive (ENA) and DNA Data Bank of Japan (DDBJ); coordinated through the International Nucleotide Sequence Database Collaboration (INSDC) • UniGene • Saccharomyces Genome Database (SGD) • UniProtKB (UniProtKB/Swiss-Prot or UniProtKB/TrEMBL) • ExPASy

  10. DATABASES • Structure • Nucleic Acid Database (NDB) • Protein Data Bank (PDB) • Worldwide Protein Data Bank (wwPDB) • ExPASy

  11. DATA MINING • Process by which testable hypotheses are created regarding function/structure of gene/protein of interest through identifying similar sequences in “more established” organisms • Tools: • Text-term search • Sequence similarity search

  12. Machine Learning • Studies methods for designing computer programs that improve based on past experience • Why? • New methods are being introduced • Old ones should be improved

  13. “Units” of Information • DNA (genome) • RNA (transcriptome) • Protein (proteome)

  14. What is Being Analyzed? • Sequence • Structure • Interactions • Pathways • Mutations/Evolution

  15. Why? • Increasing amount of biological information entails • Organization • Archiving • Global unification/harmonization • More biological discoveries • Functional/Structural similarities • Phylogenetic/Evolutionary patterns

  16. Applications • Medicine • Pharmaceuticals • Biotechnology • Agriculture

  17. STRUCTURE DATABASES

  18. Molecular Data • When you draw a molecule, • You start with atoms • Then proceed with the structure • And the three-dimensional data • What can be stored? • Coordinates • Sequences • Chemical graphs • Atoms and bonds
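One way to picture what slide 18 says can be stored is a small data structure holding atoms with coordinates plus a chemical graph of bonds. This is only a sketch: the classes `Atom` and `Molecule` and the toy coordinates below are assumptions made for illustration, not a real structure-database format and not experimental data.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Atom:
    element: str
    x: float
    y: float
    z: float


@dataclass
class Molecule:
    atoms: list = field(default_factory=list)  # coordinates per atom
    bonds: list = field(default_factory=list)  # chemical graph: pairs of atom indices

    def bond_length(self, i, j):
        """Euclidean distance between atoms i and j."""
        a, b = self.atoms[i], self.atoms[j]
        return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))

    def neighbors(self, i):
        """Indices of atoms bonded to atom i, read from the bond list."""
        result = []
        for a, b in self.bonds:
            if a == i:
                result.append(b)
            elif b == i:
                result.append(a)
        return result


# Toy three-atom molecule with made-up unit coordinates.
toy = Molecule(
    atoms=[Atom("O", 0.0, 0.0, 0.0), Atom("H", 1.0, 0.0, 0.0), Atom("H", 0.0, 1.0, 0.0)],
    bonds=[(0, 1), (0, 2)],
)

print(toy.bond_length(0, 1), toy.neighbors(0))
```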

  19. Databases • Protein Data Bank (PDB) • Molecular Modeling Database (MMDB)

  20. Techniques in the Laboratory • X-ray Crystallography • Nuclear Magnetic Resonance

  21. Formats • PDB • mmCIF • MMDB

  22. Structure Viewers • Cn3D • RasMol • WebMol • Mage • VRML • CAD • Swiss PDB Viewer

  23. Promises of bioinformatics • Medicine • Knowledge of protein structure facilitates drug design • Understanding of genomic variation allows the tailoring of medical treatment to the individual’s genetic make-up • Genome analysis allows the targeting of genetic diseases • The effect of a disease or of a therapeutic on RNA and protein levels can be elucidated • The same techniques can be applied to biotechnology, crop and livestock improvement, etc...

  24. Challenges in bioinformatics • Explosion of information • Need for faster, automated analysis to process large amounts of data • Need for integration between different types of information (sequences, literature, annotations, protein levels, RNA levels etc…) • Need for “smarter” software to identify interesting relationships in very large data sets • Lack of “bioinformaticians” • Software needs to be easier to access, use and understand • Biologists need to learn about the software, its limitations, and how to interpret its results

  25. SEQUENCE ALIGNMENT

  26. Two or More Sequences • Measure similarity • Determine correspondences between residues • Find patterns of conservation • Derive evolutionary relationships

  27. Alignment • Correspondences of nucleotides/amino acids in two sequences or more are assigned • An assignment of correspondences that preserves the order of the residues within the sequences is an alignment • Gaps are used to achieve this • Sequence alignment refers to the identification of residue-residue correspondences

  28. Uses • Homology • Similarities • “Ancestry” • Genome annotation • Assigning structure and function to genes • Database queries • For newly-discovered/unknown sequences

  29. Tools • Dot Plots • Diagonal lines of dots showing similarities between two sequences • Scoring Matrices • Score reflects quality of each possible alignment; best possible score is identified • Scoring scheme is crucial • PAM (Point Accepted Mutation) and BLOSUM (BLOcks SUbstitution Matrix) • Dynamic Programming • Algorithmic technique that reuses previous computations
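The dot plot idea from slide 29 can be sketched in a few lines: mark every position pair where the two sequences share the same residue, so that similar regions show up as diagonal runs. The function names `dot_plot` and `render` are illustrative choices, and this minimal version compares single residues without the windowing or filtering that practical tools usually add.

```python
def dot_plot(seq_a, seq_b):
    """Boolean matrix: cell [i][j] is True when seq_a[i] == seq_b[j]."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(matrix):
    """Draw the matrix as text, '*' for a match and '.' otherwise."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in matrix
    )


print(render(dot_plot("GAT", "GAT")))
```

For two identical sequences with no repeated residues, the only marks fall on the main diagonal.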

  30. Scoring • Penalties/Scores • Match (e.g. A – A) • Mismatch (e.g. A – C) • Gap (e.g. A – _) • Linear Gap Penalty: uniform cost for every gap position • Affine Gap Penalty: separate costs for opening a gap vs. extending it
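The difference between the two gap penalty schemes on slide 30 is easiest to see numerically. The sketch below uses made-up default costs, and it assumes the convention in which the opening cost covers the first gap position and each additional position pays the extension cost; other sources charge the extension cost for every position, so treat the formula as one common variant rather than the definition.

```python
def linear_gap_penalty(length, per_position=2):
    """Every gap position costs the same amount."""
    return per_position * length


def affine_gap_penalty(length, gap_open=5, gap_extend=1):
    """Opening a gap is expensive; extending it is cheap.

    Assumed convention: gap_open pays for the first position,
    gap_extend for each position after that.
    """
    if length == 0:
        return 0
    return gap_open + gap_extend * (length - 1)


for n in (1, 3, 10):
    print(n, linear_gap_penalty(n), affine_gap_penalty(n))
```

With these illustrative costs, a single long gap is cheaper under the affine scheme than under the linear one, while several short gaps are more expensive, which is why affine penalties favor fewer, longer gaps.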

  31. Local vs. Global Alignments • Global Alignment • Similarities between majority of two sequences • Local Alignment • Similarities between specific parts of two sequences
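To make the contrast on slide 31 concrete, the sketch below scores the same pair of sequences both ways. The global mode follows the Needleman-Wunsch recurrence; the local mode uses the Smith-Waterman variant, which the slides do not name, where cell scores are floored at zero and the best cell anywhere in the matrix is reported. The function name `alignment_score` and the scoring values are assumptions chosen for illustration.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Score-only dynamic programming for global or local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment does not.
        for i in range(rows):
            score[i][0] = i * gap
        for j in range(cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)      # a local alignment may restart anywhere
                best = max(best, cell)   # and may end anywhere
            score[i][j] = cell
    return best if local else score[-1][-1]


print(alignment_score("ACGT", "TTACGTTT", local=True))
print(alignment_score("ACGT", "TTACGTTT", local=False))
```

Here the short sequence occurs intact inside the longer one, so the local score is high, while the global score is dragged down by the gaps needed to span the full length of both sequences.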

  32. Programs • Pairwise Sequence Alignment • BLAST • VAST (structure-based) • FASTA • Multiple Sequence Alignment • MAFFT

  33. Needleman-Wunsch Algorithm • Can be used for global alignments • Maximum-value function • A simple scoring scheme is assumed • Three steps: • Initialization • Matrix fill (scoring) • Traceback (alignment)
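The three steps named on slide 33 can be sketched as follows. This is a minimal version under stated assumptions: a simple scoring scheme with a linear gap penalty, illustrative default values for `match`, `mismatch`, and `gap`, and a traceback that returns just one optimal alignment when several exist.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1

    # Step 1: initialization. First row and column hold cumulative gap costs.
    score = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        score[i][0] = i * gap
    for j in range(cols):
        score[0][j] = j * gap

    def sub(i, j):
        """Match or mismatch score for a[i-1] against b[j-1]."""
        return match if a[i - 1] == b[j - 1] else mismatch

    # Step 2: matrix fill. Each cell takes the maximum of three moves,
    # reusing the already computed neighboring cells.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Step 3: traceback from the bottom-right corner to the origin.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))
```

Storing the full matrix keeps the traceback simple; it costs memory proportional to the product of the two sequence lengths, which is acceptable for short sequences like these.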
