
Introduction to Computational Biosciences and Bioinformatics

Introduction to Computational Biosciences and Bioinformatics. Alex Ropelewski ropelews@psc.edu Pittsburgh Supercomputing Center National Resource for Biomedical Supercomputing http://staff.psc.edu/ropelews/jsu/Begin_CS_Jackson_State_Intro_Computational_BioScience.ppt




  1. Introduction to Computational Biosciences and Bioinformatics Alex Ropelewski ropelews@psc.edu Pittsburgh Supercomputing Center National Resource for Biomedical Supercomputing http://staff.psc.edu/ropelews/jsu/Begin_CS_Jackson_State_Intro_Computational_BioScience.ppt http://compbio.jsums.edu/awareness/week1.html These materials were developed with funding from the US National Institutes of Health grant #2T36 GM008789 to the Pittsburgh Supercomputing Center

  2. Computational Biosciences The application of computer science, engineering, physical science and mathematics to the way in which plants, animals and humans function

  3. Computational Bioscience Fields • Bioinformatics • Structural biology • Genetic databases • Quantitative ecology • Physiological modeling • Medical informatics • Image processing and visualization • Medical imaging • Biomedical instrumentation • Biomathematics • Neuroscience • Telemedicine • Biomedical engineering • Other related areas

  4. Bioinformatics The interdisciplinary science of using computational approaches to analyze, classify, collect, represent and store biological data with the goal of accelerating and enhancing the understanding of DNA, RNA and Protein sequences.

  5. Structural Biology The branch of the sciences concerned with the molecular structure of biological macromolecules such as proteins and nucleic acids, how they acquire the structures they have, and how alterations in their structures affect their function.

  6. Physiological Modeling The study of the mechanical, physical, and biochemical functions of living organisms through the use and creation of mathematical models of physiological systems. Examples include models of components of organisms, such as particular organs or cell systems.

  7. Image Processing and Visualization The science of organizing, displaying and analyzing image data taken from any living organism in a realistic, life-like manner.

  8. Computational Neuroscience and Signal Processing Applying mathematical and computational methods to understand the signaling, control and other networks in living organisms

  9. Who Employs Computational Bioscientists? • Pharmaceuticals & Biotechnology (Bayer, Schering-Plough, Amgen, Merck, Eli Lilly, etc.) • Hospitals (particularly research hospitals) • Agriculture (Monsanto, Pioneer, etc.) • Academia (particularly research universities/institutes) • Government • NIH (many institutes, including NLM, NCBI, NCI) • CDC • DOE (national labs) • Department of Defense (including the Army Corps of Engineers) • Departments of Agriculture and Veterans Affairs, NSF • Government contractors (such as Computercraft, SRA)

  10. Computational Biosciences Job Growth Engineers, Life and Physical Scientists and Related Occupations. Occupational Outlook Handbook, 2008-09 Edition. Department of Labor, Bureau of Labor Statistics

  11. Computational Biosciences Salaries National Occupational Employment and Wage Estimates Department of Labor, Bureau of Labor Statistics, May 2007

  12. Computational Biosciences • Interdisciplinary skills are required • Knowledge is required in the following areas: • Biology • Chemistry • Computer Science • Mathematics • Statistics • Physics • Engineering

  13. Computational Biosciences Required Skill Sets • Agricultural and food scientists need “…the ability to apply statistical techniques, and the ability to use computers to analyze data and to control biological and chemical processing.” • Biological scientists “…usually study allied disciplines such as mathematics, physics, engineering and computer science. Computer courses are beneficial for modeling and simulating biological processes, operating some laboratory equipment and performing research in the emerging field of bioinformatics” • “Computer skills are essential for prospective environmental scientists and hydrologists. Students who have some experience with computer modeling, data analysis and integration, digital mapping, remote sensing and Geographic Information Systems will be the most prepared to enter the job market” • Medical scientists “in addition to required courses in chemistry and biology undergraduates should study allied disciplines such as mathematics, engineering, physics, and computer science…” Engineers, Life and Physical Scientists and Related Occupations. Occupational Outlook Handbook, 2008-09 Edition. Department of Labor, Bureau of Labor Statistics

  14. Computational Biosciences Required Skill Sets • “Developments in the field of chemistry that involve life sciences will expand, resulting in more interaction among biologists, engineers, computer specialists and chemists.” Chemistry majors “usually study biological sciences; mathematics; physics; and increasingly computer science. Computer courses are essential because employers prefer job applicants who are able to apply computer skills to modeling and simulation tasks and operate computerized laboratory equipment. This is increasingly important as combinatorial chemistry and advanced screening techniques are more widely applied. Courses in statistics are useful because chemists… need the ability to apply basic statistical techniques.” “Chemists should experience employment growth in pharmaceutical and biotechnology research as recent advances in genetics open new avenues of treatment for diseases…. Job growth for chemists is expected to be strongest in pharmaceutical and biotechnology firms.” Engineers, Life and Physical Scientists and Related Occupations. Occupational Outlook Handbook, 2008-09 Edition. Department of Labor, Bureau of Labor Statistics

  15. Bioinformatics The interdisciplinary science of using computational approaches to analyze, classify, collect, represent and store biological data with the goal of accelerating and enhancing the understanding of DNA, RNA and Protein sequences.

  16. What is a Sequence? • A sequence is a way to represent a protein, DNA, or RNA molecule as a character string. Phospholipase A2 - Bos taurus (Bovine). MRLLVLAALLTVGAGQAGLNSRALWQFNGMIKCKIPSSEPLLDFNNYGCYCGLGGSGTPV DDLDRCCQTHDNCYKQAKKLDSCKVLVDNPYTNNYSYSCSNNEITCSSENNACEAFICNC DRNAAICFSKVPYNKEHKNLDKKNC
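
Because a sequence is just a character string, ordinary string operations apply to it directly. The following is a minimal Python sketch, assuming the three displayed lines on the slide are consecutive pieces of one sequence (as in the usual 60-characters-per-line display). It joins them and counts each residue; the function names are illustrative, not from any particular tool.

```python
from collections import Counter

# The bovine phospholipase A2 sequence from the slide, one string per displayed line.
PLA2_LINES = [
    "MRLLVLAALLTVGAGQAGLNSRALWQFNGMIKCKIPSSEPLLDFNNYGCYCGLGGSGTPV",
    "DDLDRCCQTHDNCYKQAKKLDSCKVLVDNPYTNNYSYSCSNNEITCSSENNACEAFICNC",
    "DRNAAICFSKVPYNKEHKNLDKKNC",
]


def join_sequence(lines):
    """Join display lines into one uppercase string, dropping any whitespace."""
    return "".join("".join(line.split()) for line in lines).upper()


def composition(seq):
    """Count how often each one-letter code occurs in the sequence."""
    return Counter(seq)


pla2 = join_sequence(PLA2_LINES)
counts = composition(pla2)
print(len(pla2))              # 145
print(counts.most_common(3))  # the three most frequent residues and their counts
```

The line breaks are only a display convention, so they are discarded before any analysis.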

  17. Molecular Alphabet • DNA/RNA Sequences: Letters represent bases (nucleotides): • A - Adenine • C - Cytosine • G - Guanine • T - Thymine (DNA) • U - Uracil (RNA) • X or N (Unknown) Image from Wikipedia Commons: http://en.wikipedia.org/wiki/File:DNA_chemical_structure.svg
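
The alphabet above is enough to check whether a string is a plausible DNA or RNA sequence. A small Python sketch, assuming the letters listed on the slide (with N or X for an unknown base) are the only allowed characters. Replacing T with U here is only a change of alphabet: it gives the RNA sequence that matches the coding strand, not a model of transcription.

```python
# Letters from the slide; N or X marks an unknown base.
DNA_LETTERS = set("ACGTNX")
RNA_LETTERS = set("ACGUNX")


def is_valid(seq, letters):
    """True if every character of seq (case-insensitive) is in the given alphabet."""
    return set(seq.upper()) <= letters


def dna_to_rna_letters(seq):
    """Rewrite a DNA string in the RNA alphabet by replacing thymine (T) with uracil (U)."""
    return seq.upper().replace("T", "U")


dna = "ATGCNGT"
print(is_valid(dna, DNA_LETTERS))  # True
print(is_valid(dna, RNA_LETTERS))  # False: T is not an RNA letter
print(dna_to_rna_letters(dna))     # AUGCNGU
```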

  18. Molecular Alphabet • Protein Sequences: Letters represent amino acids: • A - Alanine • R - Arginine • N - Asparagine • D - Aspartic acid • C - Cysteine • E - Glutamic acid • Q - Glutamine • G - Glycine • H - Histidine • I - Isoleucine • L - Leucine • K - Lysine • M - Methionine • F - Phenylalanine • P - Proline • S - Serine • T - Threonine • W - Tryptophan • Y - Tyrosine • V - Valine • B - Asparagine or aspartic acid • Z - Glutamine or glutamic acid • J - Leucine or isoleucine • X - Any amino acid • U - Selenocysteine • O - Pyrrolysine (Image: oxytocin, with its residues labeled by one-letter code.) Image from Wikipedia Commons: http://en.wikipedia.org/wiki/File:Oxytocin.jpg
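
The protein alphabet maps naturally onto a lookup table. A minimal Python sketch, assuming exactly the one-letter codes listed on the slide: it translates a protein string into residue names and reports positions that use an ambiguity code (B, Z, J, X). The example uses oxytocin (CYIQNCPLG), the molecule pictured on the slide.

```python
# One-letter codes as listed on the slide.
STANDARD = {
    "A": "Alanine", "R": "Arginine", "N": "Asparagine", "D": "Aspartic acid",
    "C": "Cysteine", "E": "Glutamic acid", "Q": "Glutamine", "G": "Glycine",
    "H": "Histidine", "I": "Isoleucine", "L": "Leucine", "K": "Lysine",
    "M": "Methionine", "F": "Phenylalanine", "P": "Proline", "S": "Serine",
    "T": "Threonine", "W": "Tryptophan", "Y": "Tyrosine", "V": "Valine",
}
SPECIAL = {"U": "Selenocysteine", "O": "Pyrrolysine"}
AMBIGUOUS = {
    "B": "Asparagine or aspartic acid",
    "Z": "Glutamine or glutamic acid",
    "J": "Leucine or isoleucine",
    "X": "Any amino acid",
}
CODES = {**STANDARD, **SPECIAL, **AMBIGUOUS}


def residue_names(seq):
    """Translate a one-letter protein string into residue names; reject unknown letters."""
    names = []
    for letter in seq.upper():
        if letter not in CODES:
            raise ValueError(f"not a protein one-letter code: {letter!r}")
        names.append(CODES[letter])
    return names


def ambiguous_positions(seq):
    """1-based positions whose letter is an ambiguity code rather than a single residue."""
    return [i + 1 for i, letter in enumerate(seq.upper()) if letter in AMBIGUOUS]


print(residue_names("CYIQNCPLG"))  # oxytocin: Cysteine, Tyrosine, Isoleucine, ...
print(ambiguous_positions("MXLB"))  # [2, 4]
```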

  19. What is an Information Library? • A compilation of prior experimental knowledge about biologically relevant molecules, stored in a computer system. • The power of bioinformatics lies in the ability to leverage this prior experimental knowledge and apply it to additional biological problems. • To search prior experimental knowledge effectively, it must be organized in a way that makes sense from both a computer science perspective and a biological point of view.

  20. How is Information Organized? • From a computer science perspective, there are several ways that data can be organized and stored: • In a relational database • In a flat file • In a networked (hyperlinked) model • From a biologist's perspective, there are also several different ways that data can be organized: • Sequence • Structure • Family/Domain • Species • Taxonomy • Function/Pathway • Disease/Variation • Publication/Journal • And many other ways
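
As an illustration of the flat-file option, many sequence libraries can be exported in the FASTA format, where a line starting with ">" holds a description and the following lines hold the sequence. FASTA is an assumption here (the slide does not name a format), and the record names below are hypothetical. This Python sketch reads such a file into (header, sequence) pairs.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted flat file."""
    header, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record flat file held in memory.
example = io.StringIO(">seq1 example DNA\nACGT\nACGT\n>seq2 example RNA\nACGU\n")
records = dict(read_fasta(example))
print(records)
```

Once records are in memory, they can be re-indexed by whatever the biologist cares about (species, family, function), which is the second half of the slide's point.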

  21. Representing Biological Data • Sequence Libraries: • Character based • Classification Libraries (Aligned sets of sequences): • Ambiguous consensus patterns • Weight Matrix • Position Specific Scoring Matrix (Profile) • Hidden Markov Models • Structural Libraries • X, Y, Z coordinates for each atom (e.g., each alpha carbon) • Taxonomy • Tree structure represents the taxonomic lineage
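
Two of the classification-library representations can be built from an aligned set of sequences in a few lines. A minimal Python sketch, assuming a gap-free DNA alignment, a pseudocount of 1 and a uniform background frequency (common textbook choices, not taken from the slide). The consensus takes the most common letter per column; the position specific scoring matrix stores a base-2 log-odds score for each letter at each column, and a candidate sequence is scored by summing its column scores.

```python
import math
from collections import Counter


def consensus(aligned):
    """Most common letter in each alignment column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def build_pssm(aligned, alphabet="ACGT", pseudocount=1.0):
    """Log2-odds score per letter per column versus a uniform background."""
    background = 1.0 / len(alphabet)
    pssm = []
    for col in zip(*aligned):
        total = len(col) + pseudocount * len(alphabet)
        pssm.append({
            letter: math.log2(((col.count(letter) + pseudocount) / total) / background)
            for letter in alphabet
        })
    return pssm


def pssm_score(pssm, seq):
    """Sum of the column scores for the letters of seq."""
    return sum(column[letter] for column, letter in zip(pssm, seq))


# Hypothetical aligned family members.
ALIGNED = ["ACGT", "ACGA", "ACGT", "TCGT"]
PSSM = build_pssm(ALIGNED)
print(consensus(ALIGNED))                                     # ACGT
print(pssm_score(PSSM, "ACGT") > pssm_score(PSSM, "TGCA"))    # True
```

The pseudocount keeps letters that never appear in a column from getting a score of minus infinity.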

  22-23. What does a biologist do with this data? • Search for similar sequences (sequences that share a biological relationship)
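
One classic way to measure similarity between two sequences is a local alignment score (the Smith-Waterman recurrence). The sketch below assumes simple scores (match +2, mismatch -1, gap -2, chosen for illustration) and ranks a small hypothetical library against a query. Tools such as BLAST, mentioned later in the job postings, use faster heuristics instead of this exhaustive computation.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query, library):
    """Rank library entries (name -> sequence) by local alignment score, best first."""
    hits = [(local_alignment_score(query, seq), name) for name, seq in library.items()]
    return sorted(hits, reverse=True)


library = {"hypothetical_hit": "TTACGTAA", "hypothetical_miss": "GGGGGGGG"}
print(search("ACGT", library))  # [(8, 'hypothetical_hit'), (2, 'hypothetical_miss')]
```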

  24. What does a biologist do with this data? • Align groups of sequences that share a biological relationship (family)
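
Aligning a whole family is usually done by programs that build on pairwise alignment, which is too large a topic for a short example. The Python sketch below shows only the pairwise building block, global alignment in the Needleman-Wunsch style, assuming match +1, mismatch -1 and gap -1 (illustrative values). It fills a score table and then traces back to produce the two aligned strings, with "-" marking a gap.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


top, bottom, total = needleman_wunsch("ACGT", "AGT")
print(top)     # ACGT
print(bottom)  # A-GT
print(total)   # 2
```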

  25. What does a biologist do with this data? • Understand phylogenetic relationships of the family.
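
Distance-based approaches to phylogeny start from a table of pairwise differences between aligned sequences. A minimal Python sketch, assuming the p-distance (fraction of ungapped aligned positions that differ) as the distance measure and three hypothetical aligned sequences. Methods such as UPGMA begin by joining the closest pair, which the last function finds.

```python
def p_distance(a, b):
    """Fraction of aligned positions (ignoring gaps) at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def distance_matrix(aligned):
    """All pairwise p-distances for a dict of name -> aligned sequence."""
    return {(p, q): p_distance(aligned[p], aligned[q]) for p in aligned for q in aligned}


def closest_pair(aligned):
    """Names of the two sequences with the smallest p-distance."""
    names = sorted(aligned)
    best = None
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            d = p_distance(aligned[p], aligned[q])
            if best is None or d < best[0]:
                best = (d, p, q)
    return best[1], best[2]


# Hypothetical aligned family members.
FAMILY = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TCGAACGA"}
D = distance_matrix(FAMILY)
print(D[("s1", "s2")], D[("s1", "s3")], D[("s2", "s3")])  # 0.125 0.375 0.25
print(closest_pair(FAMILY))                               # ('s1', 's2')
```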

  26. What does a biologist do with this data? • Understand key positions (residues) of the family.
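
Key positions often show up as columns that stay the same across the family. A small Python sketch, assuming conservation is measured as the fraction of sequences sharing the most common non-gap residue in a column (one simple measure among several). Fully conserved columns are reported as 1-based positions; the alignment is hypothetical.

```python
from collections import Counter


def conservation(aligned):
    """Per column: fraction of sequences carrying the most common non-gap residue."""
    scores = []
    for column in zip(*aligned):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)  # an all-gap column is not conserved
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def key_positions(aligned, threshold=1.0):
    """1-based positions whose conservation meets the threshold."""
    return [i + 1 for i, s in enumerate(conservation(aligned)) if s >= threshold]


# Hypothetical aligned family members.
ALIGNMENT = ["CAGC", "CTGC", "CAGA"]
print(key_positions(ALIGNMENT))  # [1, 3]
```

Lowering the threshold (for example to 0.6) also reports columns where most, but not all, family members agree.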

  27. What does a biologist do with this data? • Understand how key positions affect the structure and function of the molecule being studied

  28. What does a biologist do with this data? • Use structural data for a molecule from one species to model a related molecule from another species.

  29. Job Opportunities in Bioinformatics • This course will teach you many essential skills that are asked for in these job postings. • Let’s look at actual job postings asking for bioinformatics expertise: • Not all jobs will be labeled “bioinformatics” or “sequence analysis”; many are in a related computational bioscience field. • Specific skills required

  30. Summer Internship-Computational Biology • Qualifications: To be eligible for a Computational Biology Summer Scientific Internship, students will have completed their undergraduate sophomore year (by June 2009) • Be majoring in a biological, chemistry or computer science program. • Candidates would have completed at least one programming course before the start of the internship. • All interns must have current authorization to work for any employer within the United States. • Experience with MATLAB, SQL, C++ and/or Perl is desired. http://jobview.monster.com/getjob.aspx?JobID=78206043&JobTitle=Summer+Internship-Computational+Biology&q=computational+biology&cy=us&lid=316&re=0&pg=1&dv=1&AVSDM=2008-12-18+14%3a20%3a00&seq=2&fseo=1&isjs=1&re=1000

  31. Bioinformatics Assembly Analyst Responsibilities: • assembling genome sequence data using a variety of tools and parameters and performing the experiments needed to evaluate sequencing strategies • using existing software and databases to analyze genomic data and correlating assemblies and sequences with a variety of genetic and physical maps and other biological information • identifying problems and serving as point of contact for various groups to propose and implement solutions • proposing and implementing upgrades to existing tools and processes to enhance analysis techniques and quality of results • developing and implementing scripts to manipulate, format, parse, analyze, and display genome sequence data; and developing new strategies for analysis and presentation of results. Requirements: • a bachelor's degree in biology or related field • at least three years of experience in DNA sequencing and sequence analysis. • Must possess solid knowledge of sequencing software and public sequencing databases. • Knowledge of bioinformatics tools helpful. http://sh.webhire.com/servlet/av/jd?ai=631&ji=2285147&sn=I

  32. Bioinformatics Analyst: Responsibilities: • The Bioinformatics Analyst will process sequence data and apply quality control measures for generating high quality raw sequence and assembled data from next generation sequencing technologies. • Will perform whole genome alignments using existing alignment tools, including BLAST, MUMmer and PatternHunter. • Perform mapping and post-mapping analysis with short reads using third-party and internally developed tools. • Responsible for receiving, processing and managing sequence data. • Evaluate new methodologies and tools and improve data processing and quality control protocols. • Develop suitable metrics for reporting the completeness and quality of the sequence delivered to the customers. Requirements: • B.S. in biology, computer science, bioinformatics or related field, or equivalent combination of education and experience • A minimum of 2 years of experience in genomics and bioinformatics-related work. • Proficiency in Unix and experience in one or more of these programming languages (Perl, SQL, Jython, Java) is required. • Familiarity with commonly used sequence analysis tools and genomic databases • Willing to multi-task and respond to new challenges as required. • Excellent communication skills. • Hands-on experience in a research or production environment http://jobview.monster.com/getjob.aspx?JobID=78527133&JobTitle=Bioinformatics+Analyst&brd=1&q=bioinformatics&cy=us&lid=316&re=130&AVSDM=2009-01-09+12%3a56%3a00&pg=1&seq=11&fseo=1&isjs=1&re=1000

  33. Business Systems Analyst: Responsibilities • The ideal candidate should be a highly motivated team player with a strong understanding of informatics solutions to biology and chemistry, especially in the area of data visualization/statistical analysis, and with a proven record of building/integrating effective tools that help scientists in their daily work. • Actively work with scientists/computational biologists in a disease area to understand their needs • Define proper data analysis solution(s) to meet their scientific needs • Perform rapid prototyping to refine the requirements with proper documentation • Work with internal and external software teams, where appropriate, to design/implement proper solutions to meet scientists' needs • Work either as a team member or lead a team to deliver data analysis platforms to scientists/computational biologists • Work effectively with different NITAS groups to ensure a globally consistent implementation scheme. Requirements: • Bachelor's degree in computer science, biology, bioinformatics or comparable qualification • At least 3-5 years of hands-on experience in data analysis in a drug discovery, scientific or biotech environment • Strong communication and interpersonal skills • Proven capabilities interacting with scientists and being customer service oriented • Ability to work independently and/or as part of a team • Familiarity with scientific LIMS such as ActivityBase, and data visualization/analysis tools such as Spotfire • Solid understanding of relational databases and familiarity with Oracle and/or SQL Server • Good understanding of the fundamentals of software engineering.

  34. Summary • Wide variety of jobs • Biology, especially molecular biology and genetics • Some statistics • Computer skills: • UNIX • Bioinformatics Tools • Database (SQL) • Some Programming • Web • Bioinformatics can be a rewarding career path

  35. National Resource for Biomedical Supercomputing
