
Introduction to Genomics and Bioinformatics


Maureen J. Donlin, Departments of Molecular Microbiology & Immunology and Biochemistry & Molecular Biology, 6/3/2014


Goals for the course

  • Find and use publicly available datasets and tools for genomics and bioinformatics

  • Utilize these tools and datasets in your research

  • Interpret the output from various analysis and prediction programs

  • Learn to write a results section for a manuscript

Exercise format

  • Each exercise will consist of 2-4 sections, each representing a biological question to be answered with bioinformatics tools and resources

  • You’ll write each answer in the same format you would use for the results section of a paper

    • Why did you do this experiment or analysis?

    • What did you actually do?

    • What did you observe?

    • What does it mean?


  • Grading:

    • Exercises 70%

    • Final exam 20%

    • Class attendance 10%

  • Grading policy handout

    • Details about late assignments and tests


  • Course website:


  • Contact:

    • Phone: 977-8858

    • Email:

  • Office – DRC 507

    • Call or email.

    • Usually at WashU on Wednesdays

Lecture outline

  • Overview of theme for this course

    • Large datasets = long lists of genes

    • How to interrogate gene lists using publicly available data

  • Introduction to sequence databases

  • Quality control and annotation

Host-pathogen interactions in model organisms

  • Caenorhabditis elegans will be the model organism

  • Various bacteria (S. aureus) and fungi (C. albicans) will be the pathogens

  • Examine data from microarrays, RNA sequencing and proteomic studies

  • Use various public databases and tools to interrogate and analyze the data

Aspects of host-pathogen interactions

  • Pathogen virulence factors

    • High-throughput expression analysis of pathogens during infection

    • Genetic differences between closely related species that differ in their ability to infect & kill C. elegans

  • Host innate immune response

    • High-throughput expression analysis of host during infection

    • Comparison of host response to different pathogens

  • Factors that mediate infection

    • Screen for pathogen & host factors that affect virulence and susceptibility to infection

Types of worm killing

Disease Models & Mech. (2008) 1:205

Curr. Opin. Microbiol. (2008) 11:251

App. & Env. Microbiol. (2012) 78:2075

Dataset 1: Response to fungal infection

  • “Candida albicans Infection of Caenorhabditis elegans Induces Antifungal Immune Defenses.” Pukkila-Worley R, Ausubel FM and Mylonakis E (2011) PLoS Pathogens 7:e1002074. PMID: 21731485

  • Study innate immune response to C. albicans in a model host

    • Live yeast establish intestinal infection but heat-killed yeast are avirulent

    • Identified 313 genes differentially expressed (DE or DEG) with infection by C. albicans

    • 56% of those genes were also DE with heat-killed yeast

    • Not much overlap with genes DE in response to S. aureus or P. aeruginosa
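The overlap figures above come down to set intersection between gene lists. A minimal sketch, assuming each list is held as a set of gene identifiers; the identifiers below are hypothetical placeholders, not genes from the paper:

```python
# Hypothetical gene identifiers for illustration only; not taken from the paper.
de_live = {"gene_a", "gene_b", "gene_c", "gene_d", "gene_e"}
de_heat_killed = {"gene_b", "gene_c", "gene_e", "gene_x"}


def overlap_fraction(reference, other):
    """Return the shared genes and the fraction of `reference` also found in `other`."""
    shared = reference & other
    return shared, len(shared) / len(reference)


shared, fraction = overlap_fraction(de_live, de_heat_killed)
print(sorted(shared), f"{fraction:.0%}")

# With the numbers on the slide: 56% of 313 genes is roughly 175 genes.
approx_shared = round(0.56 * 313)
print(approx_shared)
```

The fraction is computed relative to the first list, which matches how the slide states it: a percentage of the genes DE with live C. albicans.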

Starting point for Exercise 1

  • Supplementary Table 3, which lists the >300 genes DE in response to C. albicans and also gives the overlap with heat-killed C. albicans

  • Goals are to use NCBI to find information about a few genes from the list

  • Use Excel to bring additional data into your list of genes
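Bringing additional data into a gene list is a lookup keyed on the gene identifier, which is what a VLOOKUP does in Excel. For comparison, a minimal Python sketch of the same idea; the column names, gene identifiers and descriptions are hypothetical stand-ins, not the contents of Supplementary Table 3:

```python
import csv
import io

# Hypothetical stand-ins for the gene list and a downloaded annotation table.
gene_list_csv = """gene_id,fold_change
gene_a,4.2
gene_b,-2.1
gene_c,3.0
"""
annotation_csv = """gene_id,description
gene_a,hypothetical antimicrobial peptide
gene_c,hypothetical lectin
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def left_join(genes, annotations, key="gene_id", missing="NA"):
    """Attach annotation columns to every gene row; unmatched genes get `missing`."""
    lookup = {row[key]: row for row in annotations}
    extra_cols = [c for c in annotations[0] if c != key] if annotations else []
    merged = []
    for row in genes:
        match = lookup.get(row[key], {})
        merged.append({**row, **{c: match.get(c, missing) for c in extra_cols}})
    return merged


merged = left_join(read_rows(gene_list_csv), read_rows(annotation_csv))
for row in merged:
    print(row)
```

Every gene in the original list is kept, and genes without a matching annotation are marked "NA" rather than dropped, so the list stays the same length as the starting table.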

Biological Databases

  • DNA -> RNA -> Protein

    • DNA archives – genomes, ESTs (GenBank/EMBL)

    • Annotated mRNAs/Genes (Gene)

    • RNA (miRNAs, snoRNAs, structures)

    • Protein databases

      • Automated translation (GenPept/TrEMBL)

      • Curated (UniProt)

      • Structures (PDB)

Biological databases

  • Store data in a form that allows users to search and retrieve

  • Use defined relationships between data to allow finding related records

    • Genome linked to genes

    • Genes linked to transcript isoforms

    • Each transcript linked to encoded protein

    • GenBank records include cross-database references as active links
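The linked-record idea can be sketched as a small data structure. This is an illustration of the relationships listed above, not the schema of any real database; the class names and accession strings are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Protein:
    accession: str


@dataclass
class Transcript:
    accession: str
    protein: Optional[Protein] = None  # None for a non-coding isoform


@dataclass
class Gene:
    symbol: str
    transcripts: List[Transcript] = field(default_factory=list)


@dataclass
class Genome:
    organism: str
    genes: List[Gene] = field(default_factory=list)


def proteins_for_gene(genome, symbol):
    """Follow genome -> gene -> transcripts -> proteins and return protein accessions."""
    for gene in genome.genes:
        if gene.symbol == symbol:
            return [t.protein.accession for t in gene.transcripts if t.protein]
    return []


# Hypothetical records for illustration only.
genome = Genome(
    organism="Caenorhabditis elegans",
    genes=[
        Gene(
            symbol="gene-1",
            transcripts=[
                Transcript("TX_0001.1", Protein("PR_0001.1")),
                Transcript("TX_0001.2", Protein("PR_0001.2")),
                Transcript("TX_0001.3"),
            ],
        )
    ],
)
print(proteins_for_gene(genome, "gene-1"))
```

Starting from one record, the defined links make it possible to reach every related record without a new search.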

Quality control and annotation

  • GenBank – an archive

    • Users submit data and own exclusive rights for all updates to those records

    • All submissions reviewed/approved by NCBI

Growth of GenBank

Gene annotation

  • Assign or define:

    • Gene name

    • Gene structure

    • Molecular Function

    • Biological process

    • Cellular component

    • Etc.

  • Ideally, this data is known experimentally

    • Curate: pull this data from the literature


  • Time consuming and costly

  • Not keeping pace with rate of genome sequencing

    • 2008: 2nd assembly of C. neoformans type A

    • 2013: Only 1st assembly in GenBank

    • 2014: Refined gene models using NGS data

  • Organism specific databases often have better annotation

  • Curated databases aimed at a particular field

    • EuPath (Eukaryotic pathogens)
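One way to picture an annotation record, and the difference between experimentally known and inferred annotation, is the sketch below. It assumes Gene Ontology style evidence codes (EXP, IDA, IEA and so on), which the slides do not mention; the gene names and annotations are hypothetical:

```python
from dataclasses import dataclass

# Assumption: Gene Ontology evidence codes that denote experimental support.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


@dataclass
class Annotation:
    gene: str
    aspect: str    # "molecular_function", "biological_process" or "cellular_component"
    term: str
    evidence: str  # evidence code, e.g. "IDA" (direct assay) or "IEA" (electronic)


def split_by_evidence(annotations):
    """Separate annotations backed by experiments from those inferred by other means."""
    experimental = [a for a in annotations if a.evidence in EXPERIMENTAL_CODES]
    inferred = [a for a in annotations if a.evidence not in EXPERIMENTAL_CODES]
    return experimental, inferred


# Hypothetical annotations for illustration only.
records = [
    Annotation("gene-1", "biological_process", "defense response to fungus", "IMP"),
    Annotation("gene-1", "cellular_component", "extracellular region", "IEA"),
    Annotation("gene-2", "molecular_function", "carbohydrate binding", "IEA"),
]
experimental, inferred = split_by_evidence(records)
print(len(experimental), "experimental;", len(inferred), "inferred")
```

In a newly sequenced genome most records would likely fall in the inferred group, which is the gap that manual curation tries to close.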

Gene database

  • Gene – derived database

    • Curators at NCBI review submissions/literature and create annotated records of every gene and gene product for a subset of organisms

  • Currently:

    • ~244 million sequence records in GenBank

    • ~16 million records in the Gene database
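The Gene database can also be searched programmatically through NCBI's E-utilities. The sketch below only builds an ESearch URL and makes no network request; the gene symbol is an illustrative choice, and the helper name gene_search_url is hypothetical:

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_search_url(symbol, organism, retmax=5):
    """Build an ESearch URL that looks up a gene symbol in the NCBI Gene database."""
    term = f"{symbol}[Gene Name] AND {organism}[Organism]"
    params = {"db": "gene", "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Illustrative query; any gene symbol from your list could be substituted.
url = gene_search_url("abf-2", "Caenorhabditis elegans")
print(url)
```

Opening the printed URL in a browser should return a JSON list of matching Gene IDs, which can then be looked up on the NCBI website.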

UniProt

  • Curation & annotation of all known proteins

  • Provide […] comprehensive, high-quality and freely accessible resource of protein sequence and functional information.


UniProt databases

  • 545,000 reviewed (UniProtKB/Swiss-Prot)

  • ~56 million not yet reviewed (UniProtKB/TrEMBL)
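A quick calculation puts these two counts in proportion. It uses only the approximate figures from the slide:

```python
# Approximate counts from the slide.
reviewed = 545_000
unreviewed = 56_000_000

# Share of all entries that have been manually reviewed.
fraction_reviewed = reviewed / (reviewed + unreviewed)
print(f"{fraction_reviewed:.2%} of entries are reviewed")
```

By these figures, just under 1% of entries have been manually reviewed.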

Other databases

  • Genome databases (Thursday's topic)

  • Organism specific (Yeast, Drosophila, C. elegans, etc.)

  • Expression patterns

  • Protein domains

  • Metabolic pathway

  • …..

  • NAR Database issue: 1st issue of every year

    • See handout