Tools in bioinformatics

Fall 2009-10

Goals

Overview

  • To provide students with practical knowledge of bioinformatics tools and their application in research

Prerequisites

  • The course “Introduction to bioinformatics”

  • Familiarity with topics in molecular biology (cell biology, biochemistry, and genetics)

  • Basic familiarity with computers and the internet

Course website


Administration

A class will be given every two weeks

There are three class groups:

  • Sunday 16:00-18:00

  • Monday 12:00-14:00

  • Monday 14:00-16:00

Computer classroom Sherman 03

Administration (continued)

Instructors:

  • Nimrod Rubinstein (Sundays)

  • Daiana Alaluf (Mondays I)

  • Osnat Penn (Mondays II)

  • Reception hours: email your instructor with any question at any time, or set up an appointment (Britania 405, 6409245)

  • Assignments – 50% of final grade (compulsory)

    • Assignments include classwork and homework:

      • Classwork is meant to be completed during the lesson and handed in at the end of it. It will be checked but not graded.

      • Homework should be handed in at the following lesson (two weeks after it is handed out). It will be checked and graded.

  • Final project – 50% of final grade

    When emailing your instructor (a question, an assignment, or anything else), please state in the “Subject” field: “Tools in Bioinfo”, your IDs, and the CW/HW number (if relevant)

Bioinformatics databases

What’s in a database?

  • Sequences – genes, proteins, etc…

  • Full genomes

  • Expression data

  • Structures

  • Annotation – information about genes/proteins:

    • function

    • cellular location

    • chromosomal location

    • introns/exons

    • phenotypes, diseases

  • Publications

NCBI and Entrez

  • NCBI is one of the largest and most comprehensive database collections. It belongs to the NIH (National Institutes of Health), the primary federal agency for conducting and supporting medical research in the USA

  • Entrez is the search engine of NCBI

  • Search for: genes, proteins, genomes, structures, diseases, publications, and more

PubMed: NCBI’s database of biomedical articles

Yang X, Kurteva S, Ren X, Lee S, Sodroski J. “Subunit stoichiometry of human immunodeficiency virus type 1 envelope glycoprotein trimers during virus entry into host cells”, J Virol. 2006 May;80(9):4388-95.

Use fields!

Yang[AU] AND glycoprotein[TI] AND 2006[DP] AND J virol[TA]

For the full list of field tags, go to Help -> Search Field Descriptions and Tags
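The fielded query above can also be assembled programmatically. The following is a minimal sketch in Python, assuming NCBI's public E-utilities `esearch` endpoint (not mentioned in the slides). The helper names `fielded_query` and `esearch_url` are hypothetical, and the code only builds the request URL without sending anything over the network.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (an assumption: the slides use the web form only).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def fielded_query(terms):
    """Join (value, field_tag) pairs with AND, e.g. ('Yang', 'AU') -> 'Yang[AU]'."""
    return " AND ".join(f"{value}[{tag}]" for value, tag in terms)


def esearch_url(db, query):
    """Build (but do not send) an esearch request URL for the given database."""
    return ESEARCH + "?" + urlencode({"db": db, "term": query})


query = fielded_query([
    ("Yang", "AU"),          # author
    ("glycoprotein", "TI"),  # word in the title
    ("2006", "DP"),          # date of publication
    ("J virol", "TA"),       # journal title abbreviation
])
print(query)
print(esearch_url("pubmed", query))
```

Keeping the query as a list of (value, tag) pairs makes it easy to add or drop a field without retyping the whole search string.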

Example

  • Retrieve all publications in which the first author is Davidovich C and the last author is Yonath A
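One possible way to write this search uses PubMed's author-position tags, `[1AU]` for the first author and `[LASTAU]` for the last author. The short Python sketch below only composes the query string; the variable names are illustrative and not part of the course material.

```python
# PubMed field tags for author position:
#   [1AU]    = first author name
#   [LASTAU] = last author name
first_author = "Davidovich C"
last_author = "Yonath A"

query = f"{first_author}[1AU] AND {last_author}[LASTAU]"
print(query)  # Davidovich C[1AU] AND Yonath A[LASTAU]
```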

Using limits

Retrieve the publications of Yonath A in the journals Nature and Proc Natl Acad Sci U S A from the last 5 years
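The same restriction can be sketched as a single fielded query instead of going through the limits form. This is an illustration under stated assumptions: the function name `limits_query` is hypothetical, and "the last 5 years" is taken as 2005 through 2009 because the course ran in Fall 2009-10.

```python
def limits_query(author, journals, first_year, last_year):
    """Author AND (journal1 OR journal2 ...) AND a publication-year range."""
    journal_part = " OR ".join(f'"{name}"[TA]' for name in journals)
    return f"{author}[AU] AND ({journal_part}) AND {first_year}:{last_year}[DP]"


# Assumed reference year: 2009, so the five-year window is 2005:2009.
query = limits_query(
    "Yonath A",
    ["Nature", "Proc Natl Acad Sci U S A"],
    2005,
    2009,
)
print(query)
```

The parentheses matter: without them the OR between the two journals would not be grouped as a single condition alongside the author and date terms.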

Google Scholar

GenBank: NCBI’s gene & protein database

  • GenBank is an annotated collection of all publicly available DNA sequences (and their amino-acid translations)

  • Holds ~106.5 billion bases in ~108.5 million sequence records (Oct. 2009)

Searching NCBI for the protein human CD4

Search demonstration

Using field descriptions, qualifiers, and boolean operators

  • Cd4[GENE] AND human[ORGN], or equivalently Cd4[gene name] AND human[organism]

  • List of field codes:

    • Boolean operators: AND, OR, NOT

      Note: do not use the Protein name field [PROT], only [GENE]!
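To show how the three Boolean operators combine with field qualifiers, here is a small Python sketch that composes Entrez-style query strings. The helper names (`term`, `all_of`, `any_of`, `but_not`) are hypothetical, and the mouse example is added purely for illustration; only the `[GENE]` and `[ORGN]` qualifiers come from the slide.

```python
def term(value, field):
    """One fielded search term, e.g. term('CD4', 'GENE') -> 'CD4[GENE]'."""
    return f"{value}[{field}]"


def all_of(*parts):
    """Every part must match (AND)."""
    return "(" + " AND ".join(parts) + ")"


def any_of(*parts):
    """At least one part must match (OR)."""
    return "(" + " OR ".join(parts) + ")"


def but_not(wanted, unwanted):
    """Match the first expression while excluding the second (NOT)."""
    return f"({wanted} NOT {unwanted})"


cd4 = term("CD4", "GENE")
human = term("human", "ORGN")
mouse = term("mouse", "ORGN")  # illustrative addition, not from the slides

human_cd4 = all_of(cd4, human)
human_or_mouse_cd4 = all_of(cd4, any_of(human, mouse))
non_human_cd4 = but_not(cd4, human)

for q in (human_cd4, human_or_mouse_cd4, non_human_cd4):
    print(q)
```

Wrapping every combination in parentheses is a deliberate choice: it keeps the grouping explicit when expressions are nested.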

This time we directly search in the protein database

RefSeq

  • A subcollection of NCBI databases containing only non-redundant, highly annotated entries (genomic DNA, transcripts (RNA), and protein products)

An explanation of GenBank records

FASTA format






Save accession numbers for future use (this makes searching quicker). RefSeq accession number: NP_000607.1
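A saved accession number can be turned directly into a request for the record in FASTA format. The sketch below assumes NCBI's public E-utilities `efetch` endpoint and only builds the URL (no network call). It also includes a small FASTA parser run on a toy record; the helper names, the toy header, and the toy residues are all made up for illustration and are not the real NP_000607.1 entry.

```python
from urllib.parse import urlencode

# NCBI E-utilities fetch endpoint (an assumption: the slides use the web interface).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fasta_url(accession, db="protein"):
    """Build (but do not send) an efetch URL that returns the record as FASTA text."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy record: header and residues are invented for this example.
toy = """>toy_protein hypothetical example
MKTAYIAK
QRQISFVK
"""

print(fasta_url("NP_000607.1"))
print(parse_fasta(toy))
```

A FASTA record is a header line starting with `>` followed by one or more sequence lines, which is why the parser joins the lines until the next header.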


Downloading



Swiss-Prot

  • A protein sequence database which strives to provide a high level of annotation regarding:

    • the function of a protein

    • domain structure

    • post-translational modifications

    • variants

  • One entry for each protein

GenBank vs. Swiss-Prot

Swiss-Prot results

GenBank results

PDB: Protein Data Bank

  • Main database of 3D structures of macromolecules

  • Includes ~61,000 entries (proteins, nucleic acids, complex assemblies)

  • Is highly redundant

Human CD4 in complex with HIV gp120




Accession Numbers

GeneCards

  • All-in-one database of human genes (a project of the Weizmann Institute)

  • Attempts to integrate as many databases, publications, and other sources of available knowledge as possible

Organism-specific databases

  • Model organisms have independent databases:

HIV database

Summary


  • General and comprehensive databases:

    • NCBI, EMBL

  • Genome specific databases (to be discussed):


  • Highly annotated databases:

    • Human genes:

      • GeneCards

    • Proteins:

      • Swiss-Prot, RefSeq

    • Structures:

      • PDB

As important:

  • Google (or any search engine)

And always remember:

  • RT(F)M: read the manual!!! (or the Help/FAQ pages)

GO: Gene Ontology

Gene Ontology

  • Strives to provide consistent descriptions of gene products obtained from different databases

  • GO annotations include three hierarchical ontologies of gene products:

    • cellular component(s) – the environment in which the gene product functions

    • biological process(es) – the biological program/pathway in which the gene product is involved

    • molecular function(s) – the elemental activities of the gene product

  • E.g., cytochrome c:

    • cellular components: mitochondrial matrix and mitochondrial inner membrane

    • biological processes: oxidative phosphorylation and induction of cell death

    • molecular functions: oxidoreductase activity
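The cytochrome c example can be written down as a small data structure, one list of terms per ontology. This is a simplified Python sketch: the class name `GOAnnotation` and its methods are hypothetical, and real GO annotations use term identifiers arranged in a hierarchy rather than the flat lists of names shown here.

```python
from dataclasses import dataclass, field

ONTOLOGIES = ("cellular_component", "biological_process", "molecular_function")


@dataclass
class GOAnnotation:
    """Terms for one gene product, grouped by the three GO ontologies."""
    gene_product: str
    cellular_component: list = field(default_factory=list)
    biological_process: list = field(default_factory=list)
    molecular_function: list = field(default_factory=list)

    def terms(self, ontology):
        """Return the terms recorded under one of the three ontologies."""
        if ontology not in ONTOLOGIES:
            raise ValueError(f"unknown ontology: {ontology}")
        return getattr(self, ontology)

    def has_term(self, name):
        """True if the term appears under any of the three ontologies."""
        return any(name in self.terms(o) for o in ONTOLOGIES)


cytochrome_c = GOAnnotation(
    gene_product="cytochrome c",
    cellular_component=["mitochondrial matrix", "mitochondrial inner membrane"],
    biological_process=["oxidative phosphorylation", "induction of cell death"],
    molecular_function=["oxidoreductase activity"],
)

for ontology in ONTOLOGIES:
    print(ontology, "->", cytochrome_c.terms(ontology))
```

Keeping the three ontologies as separate fields mirrors the point of the slide: the same gene product is described independently by where it acts, what process it takes part in, and what it does at the molecular level.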

AmiGO: the official GO browser

Through NCBI

Enrichment analysis

  • Reference set: N genes in total, of which K have function f

  • Query set: n genes in total, of which k have function f

Is k/n significantly greater than K/N?
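As a first look at the comparison above, the two proportions and their ratio can be computed directly. The Python sketch below uses made-up counts for illustration only, and the function name `fold_enrichment` is hypothetical. A ratio above 1 only suggests enrichment; whether it is significant still requires a statistical test.

```python
def fold_enrichment(k, n, K, N):
    """Ratio of the query proportion k/n to the reference proportion K/N."""
    if n == 0 or K == 0:
        raise ValueError("n and K must be positive")
    # (k/n) / (K/N), rearranged to a single division.
    return (k * N) / (n * K)


# Made-up counts, for illustration only.
N, K = 20000, 400  # reference set: 400 of 20000 genes have function f
n, k = 200, 12     # query set: 12 of 200 genes have function f

print("query proportion:    ", k / n)
print("reference proportion:", K / N)
print("fold enrichment:     ", fold_enrichment(k, n, K, N))
```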

Statistical significance testing

Problem formulation:

In a group of N genes there are K “special” ones

If we sample n genes out of the N (without replacement) and find k “special” ones, could that be considered a random outcome?

Mathematically, we use the hypergeometric distribution to compute the probability of obtaining k or more “special” genes in a sample of n
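The probability described above is the upper tail of the hypergeometric distribution: the sum, over i from k up to min(n, K), of C(K, i) * C(N - K, n - i) / C(N, n). A minimal Python sketch using only the standard library follows; the function name `hypergeom_tail` is hypothetical, the small numbers are a toy example, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which are "special", and X counts the special genes drawn."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    upper = min(n, K)  # cannot draw more special genes than exist or than sampled
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1)
    )
    return favourable / comb(N, n)


# Toy example: 10 genes, 4 special, sample 3, observe at least 2 special.
# (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = (36 + 4) / 120
p = hypergeom_tail(k=2, n=3, K=4, N=10)
print(p)
```

A small tail probability means that finding k or more special genes by chance alone would be unlikely, which is the sense in which the query set is called enriched. Exact integer arithmetic is used for the counts, so the only rounding happens in the final division.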

Materials & Methods

21,121 siRNA knockdown assays, covering the entire coding-sequence part of the genome

Results

273 HIV-dependency factors (HDFs) were discovered

Biological processes

Molecular functions

Subcellular localizations

Observations

  • Nuclear pore complex components: their loss may impede HIV nuclear access

  • Mediator members (Mediator couples TFs to Pol II): a requirement for activators to bind HIV LTRs

  • Enzymes involved in glycosylation: HIV’s envelope protein is heavily glycosylated, which assists the virus in entering cells