Introduction to Genomics and Bioinformatics

Maureen J. Donlin

Departments of Molecular Microbiology & Immunology

Biochemistry & Molecular Biology

[email protected]

6/3/2014


Goals for the course

  • Find and use publicly available datasets and tools for genomics and bioinformatics

  • Use these tools and datasets in your research

  • Interpret the output from various analysis and prediction programs

  • Learn to write a results section for a manuscript

Exercise format

  • Each exercise will consist of 2-4 sections, each representing a biological question to be answered with bioinformatics tools/resources

  • You’ll write each answer in the format you would use for the results section of a paper

    • Why did you do this experiment or analysis?

    • What did you actually do?

    • What did you observe?

    • What does it mean?



  • Grading:

    • Exercises: 70%

    • Final exam: 20%

    • Class attendance: 10%

  • Grading policy handout

    • Details about late assignments and tests



  • Course website:


  • Contact:

    • Phone: 977-8858

    • Email: [email protected]

  • Office – DRC 507

    • Call or email.

    • Usually at WashU on Wednesdays

Lecture outline

  • Overview of theme for this course

    • Large datasets = long lists of genes

    • How to interrogate gene lists using publicly available data

  • Introduction to sequence databases

  • Quality control and annotation

Host-pathogen interactions in model organisms

  • Caenorhabditis elegans will be the model organism

  • Various bacteria (S. aureus) and fungi (C. albicans) will be the pathogens

  • Examine data from microarrays, RNA sequencing and proteomic studies

  • Use various public databases and tools to interrogate and analyze the data

Aspects of host-pathogen interactions

  • Pathogen virulence factors

    • High-throughput expression analysis of pathogens during infection

    • Genetic differences between closely related species that differ in their ability to infect & kill C. elegans

  • Host innate immune response

    • High-throughput expression analysis of host during infection

    • Comparison of host response to different pathogens

  • Factors that mediate infection

    • Screen for pathogen & host factors that affect virulence and susceptibility to infection

Types of worm killing

Disease Models & Mech. (2008) 1:205

Curr. Opin. Microbiol. (2008) 11:251

App. & Env. Microbiol. (2012) 78:2075

Dataset 1: Response to fungal infection

  • “Candida albicans Infection of Caenorhabditis elegans Induces Antifungal Immune Defenses.” Pukkila-Worley R, Ausubel FM and Mylonakis E (2011) PLoS Pathogens 7:e1002074. PMID: 21731485

  • Study innate immune response to C. albicans in a model host

    • Live yeast establish intestinal infection but heat-killed yeast are avirulent

    • Identified 313 genes differentially expressed (DE or DEG) with infection by C. albicans

    • 56% of those genes were also DE with heat-killed yeast

    • Little overlap with genes DE in response to S. aureus or P. aeruginosa
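
Overlap figures like these come down to set intersections between lists of DE genes. A minimal sketch in Python of that calculation; the gene identifiers are invented placeholders, not entries from the paper's supplementary tables, and `overlap_fraction` is a hypothetical helper name.

```python
def overlap_fraction(reference, other):
    """Return the fraction of genes in `reference` that also appear in `other`."""
    reference, other = set(reference), set(other)
    if not reference:
        return 0.0
    return len(reference & other) / len(reference)

# Hypothetical gene identifiers, for illustration only
live_yeast_de = ["geneA", "geneB", "geneC", "geneD"]
heat_killed_de = ["geneB", "geneC", "geneX"]

# Genes DE under both conditions, sorted for a stable listing
shared = sorted(set(live_yeast_de) & set(heat_killed_de))
fraction = overlap_fraction(live_yeast_de, heat_killed_de)
print(shared, f"{fraction:.0%}")
```

The same function can be pointed at any pair of lists, for example the C. albicans list against a list from a bacterial infection study, provided both use the same gene identifier scheme.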

Starting point for Exercise 1

  • Supplementary Table 3, which lists the >300 genes DE in response to C. albicans and gives the overlap with heat-killed C. albicans

  • Goals are to use NCBI to find information about a few genes from the list

  • Use Excel to bring additional data into your list of genes
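
Bringing additional data into a gene list is a lookup by gene identifier, much like VLOOKUP in Excel. For anyone who prefers a script, here is a rough Python sketch of the same idea. The column names, gene identifiers and descriptions are all made up for illustration, and the two text blocks stand in for spreadsheets exported as CSV.

```python
import csv
import io

# Hypothetical stand-ins for two spreadsheets exported as CSV
gene_list_csv = """gene_id,fold_change
geneA,4.2
geneB,2.7
geneC,3.1
"""
annotation_csv = """gene_id,description
geneA,hypothetical antimicrobial peptide
geneC,hypothetical lectin
"""

def annotate(gene_rows, annotation_rows, key="gene_id"):
    """Left-join annotation onto the gene list, keeping every gene."""
    lookup = {row[key]: row for row in annotation_rows}
    merged = []
    for row in gene_rows:
        extra = lookup.get(row[key], {})
        combined = dict(row)
        # Genes with no matching annotation get an empty description
        combined["description"] = extra.get("description", "")
        merged.append(combined)
    return merged

genes = list(csv.DictReader(io.StringIO(gene_list_csv)))
annotations = list(csv.DictReader(io.StringIO(annotation_csv)))
annotated = annotate(genes, annotations)
for row in annotated:
    print(row["gene_id"], row["fold_change"], row["description"] or "(no annotation)")
```

Every gene from the original list is kept, so genes without annotation stay visible rather than silently dropping out.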

Biological Databases

  • DNA -> RNA -> Protein

    • DNA archives – genomes, ESTs (GenBank/EMBL)

    • Annotated mRNAs/Genes (Gene)

    • RNA (miRNAs, snoRNAs, structures)

    • Protein databases

      • Automated translation (GenPept/TrEMBL)

      • Curated (UniProt)

      • Structures (PDB)
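
"Automated translation" means the protein record is computed from the annotated coding sequence rather than sequenced directly. A toy Python sketch of that step, assuming the standard genetic code and reading frame 1; the input sequence is invented and this is not the pipeline any particular database uses.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(cds):
    """Translate a coding DNA sequence in frame 1; '*' marks a stop codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    # Codons with ambiguous bases (e.g. N) are reported as 'X'
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))

protein = translate("ATGGCCTAA")  # toy sequence, not from any database record
print(protein)
```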

Biological databases

  • Store data in a form that allows users to search and retrieve

  • Use defined relationships between data to allow finding related records

    • Genome linked to genes

    • Genes linked to transcript isoforms

    • Each transcript linked to encoded protein

    • GenBank records include active links to the related records in other databases
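
The linked-record idea above can be pictured as a small data structure. This is only a sketch in Python of the genome, gene, transcript and protein relationships listed on the slide; the class names and accessions are invented and do not mirror any real database schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Protein:
    accession: str

@dataclass
class Transcript:
    accession: str
    protein: Optional[Protein] = None  # non-coding transcripts have no protein

@dataclass
class Gene:
    symbol: str
    transcripts: List[Transcript] = field(default_factory=list)

@dataclass
class Genome:
    organism: str
    genes: List[Gene] = field(default_factory=list)

def proteins_for_gene(genome, symbol):
    """Follow gene -> transcript -> protein links and return protein accessions."""
    return [
        t.protein.accession
        for g in genome.genes if g.symbol == symbol
        for t in g.transcripts if t.protein is not None
    ]

# All identifiers below are invented placeholders
genome = Genome("Example organism", [
    Gene("gene-1", [
        Transcript("TX_0001", Protein("PR_0001")),
        Transcript("TX_0002", Protein("PR_0002")),
    ]),
    Gene("gene-2", [Transcript("TX_0003")]),
])
print(proteins_for_gene(genome, "gene-1"))
```

Starting from one gene and walking the links is the scripted equivalent of clicking through related records on a database page.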

Quality control and annotation

  • GenBank – an archive

    • Users submit data and retain the exclusive right to update their own records

    • All submissions are reviewed and approved by NCBI

Growth of GenBank


Gene annotation

  • Assign or define:

    • Gene name

    • Gene structure

    • Molecular function

    • Biological process

    • Cellular component

    • Etc.

  • Ideally, this data is known experimentally

    • Curate: pull this data from the literature



  • Time-consuming and costly

  • Not keeping pace with rate of genome sequencing

    • 2008: 2nd assembly of C. neoformans type A

    • 2013: Only 1st assembly in GenBank

    • 2014: Refined gene models using NGS data

  • Organism-specific databases often have better annotation

  • Curated databases aimed at a particular field

    • EuPath (Eukaryotic pathogens)

Gene database

  • Gene – derived database

    • Curators at NCBI review submissions/literature and create annotated records of every gene and gene product for a subset of organisms

  • Currently:

    • ~244 million sequence records in GenBank

    • ~16 million records in the Gene database
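
The Gene database can also be searched programmatically through NCBI's E-utilities. The Python sketch below only builds an ESearch URL and makes no network request. The helper name `gene_search_url` is hypothetical, and the gene symbol is an illustrative choice, not one taken from the course dataset.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ESearch service
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def gene_search_url(symbol, organism):
    """Build an ESearch URL that looks up a gene symbol in the Gene database."""
    term = f"{symbol}[Gene Name] AND {organism}[Organism]"
    params = {"db": "gene", "term": term, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"

# Illustrative symbol only; substitute a gene from your own list
url = gene_search_url("abf-2", "Caenorhabditis elegans")
print(url)
```

Pasting the printed URL into a browser should return the matching Gene record IDs, which can then be opened on the NCBI website.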

UniProt databases

  • Curation & annotation of all known proteins

  • Provide […] comprehensive, high-quality and freely accessible resource of protein sequence and functional information.

  • 545,000 reviewed (UniProtKB/Swiss-Prot)

  • ~56 million not yet reviewed (UniProtKB/TrEMBL)
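
To put those two counts in proportion, a quick calculation in Python using only the figures quoted above (the ~56 million is approximate, so the result is too):

```python
reviewed = 545_000        # reviewed entries quoted above
unreviewed = 56_000_000   # approximate not-yet-reviewed entries quoted above

# Share of all entries that have been manually reviewed
reviewed_fraction = reviewed / (reviewed + unreviewed)
print(f"{reviewed_fraction:.2%} of entries are manually reviewed")
```

On these numbers, roughly one entry in a hundred has been reviewed by a curator.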

Other databases

  • Genome databases (Thursday’s topic)

  • Organism-specific (yeast, Drosophila, C. elegans, etc.)

  • Expression patterns

  • Protein domains

  • Metabolic pathways

  • …..

  • NAR Database issue: 1st issue of every year

    • See handout

