Tools in bioinformatics


Fall 2009-10


Goals


  • To provide students with practical knowledge of bioinformatics tools and their application in research


Prerequisites

  • The course “Introduction to bioinformatics”

  • Familiarity with topics in molecular biology (cell biology, biochemistry, and genetics)

  • Basic familiarity with computers & internet

Course website


Administration


A class will be given every two weeks

There are three class groups:

  • Sunday 16:00-18:00

  • Monday 12:00-14:00

  • Monday 14:00-16:00


Computer classroom Sherman 03

Administration (continued)


  • Nimrod Rubinstein (Sundays)

  • Daiana Alaluf (Mondays I)

  • Osnat Penn (Mondays II)

  • Reception hours: email your instructor with any question at any time, or set up an appointment (Britannia 405, 6409245)



  • Assignments – 50% of final grade (compulsory)

    • Assignments include classwork and homework:

      • Classwork is planned to be completed during the lesson and handed in at the end of it. It will be checked but not graded.

      • Homework should be handed in at the following lesson (two weeks after it is handed out). It will be checked and graded.

  • Final project – 50% of final grade

    When emailing your instructor (a question, an assignment, or anything else), please state in the “Subject” field: “Tools in Bioinfo”, your IDs, and the CW/HW number (if relevant)

What’s in a database?

  • Sequences – genes, proteins, etc…

  • Full genomes

  • Expression data

  • Structures

  • Annotation – information about genes/proteins:

    • function

    • cellular location

    • chromosomal location

    • introns/exons

    • phenotypes, diseases

  • Publications

NCBI and Entrez

  • One of the largest and most comprehensive collections of databases, belonging to the NIH (National Institutes of Health, the primary federal agency for conducting and supporting medical research in the USA)

  • Entrez is the search engine of NCBI

  • Search for: genes, proteins, genomes, structures, diseases, publications, and more

PubMed: NCBI’s database of biomedical articles

Yang X, Kurteva S, Ren X, Lee S, Sodroski J. “Subunit stoichiometry of human immunodeficiency virus type 1 envelope glycoprotein trimers during virus entry into host cells”, J Virol. 2006 May;80(9):4388-95.

Use fields!

Yang[AU] AND glycoprotein[TI] AND 2006[DP] AND J virol[TA]

For the full list of field tags: go to help -> Search Field Descriptions and Tags
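A fielded query such as the one above can also be assembled programmatically. The following is a minimal sketch: `build_query` is a hypothetical helper written for illustration (it is not part of any NCBI library), and it only joins term/tag pairs with AND without contacting PubMed.

```python
def build_query(terms):
    """Join (term, tag) pairs into a fielded PubMed query string.

    Hypothetical helper for illustration only.
    """
    return " AND ".join(f"{term}[{tag}]" for term, tag in terms)


query = build_query([
    ("Yang", "AU"),          # author
    ("glycoprotein", "TI"),  # word in the title
    ("2006", "DP"),          # date of publication
    ("J virol", "TA"),       # journal title abbreviation
])
print(query)
```

The resulting string can be pasted into the PubMed search box as-is.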

Example

  • Retrieve all publications in which the first author is Davidovich C and the last author is Yonath A

Using limits

Retrieve the publications of Yonath A in the journals Nature and Proc Natl Acad Sci U S A from the last 5 years
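The same restriction can probably also be written directly as a query string instead of through the Limits page. The sketch below assumes that PubMed accepts a relative date range of the form `"last 5 years"[DP]`; check this against the Search Field Descriptions and Tags help page before relying on it. `any_of` is a hypothetical helper name.

```python
def any_of(terms, tag):
    """Combine several terms under one field tag with OR, in parentheses."""
    return "(" + " OR ".join(f"{term}[{tag}]" for term in terms) + ")"


# Either of the two journals is acceptable, so they are OR-ed together.
journals = any_of(["Nature", "Proc Natl Acad Sci U S A"], "TA")

# The relative-date syntax is an assumption; verify it in the PubMed help.
query = f'Yonath A[AU] AND {journals} AND "last 5 years"[DP]'
print(query)
```

The parentheses matter: without them the AND and OR terms would not be grouped as intended.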

Google Scholar

GenBank: NCBI’s gene & protein database

  • GenBank is an annotated collection of all publicly available DNA sequences (and their amino-acid translations)

  • Holds ~106.5 billion bases in ~108.5 million sequence records (Oct. 2009)

Searching NCBI for the protein human CD4

Search demonstration

Using field descriptions, qualifiers, and boolean operators

  • Cd4[GENE] AND human[ORGN], or equivalently: Cd4[gene name] AND human[organism]

  • List of field codes:

    • Boolean operators: AND, OR, NOT

      Note: do not use the field Protein name [PROT], only GENE!
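The query above can also be sent to NCBI from a script through the Entrez E-utilities. The sketch below only builds the ESearch request URL and does not fetch it; `esearch_url` is a hypothetical helper, and the choice of the `protein` database and of `retmax=20` are assumptions made for illustration.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Base address of the NCBI ESearch utility.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, retmax=20):
    """Build (but do not fetch) an ESearch URL for a database and query."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


url = esearch_url("protein", "Cd4[GENE] AND human[ORGN]")
print(url)

# Round-trip check: the query survives URL encoding unchanged.
print(parse_qs(urlparse(url).query)["term"][0])
```

`urlencode` takes care of escaping the spaces and square brackets in the query.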

RefSeq

  • Subcollection of NCBI databases with only non-redundant, highly annotated entries (genomic DNA, transcript (RNA), and protein products)

FASTA format
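A FASTA record consists of a header line that starts with ">" followed by one or more lines of sequence. The following is a minimal parsing sketch; the two records are made-up toy examples (not real database entries), and `parse_fasta` is a hypothetical helper name.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Toy input: hypothetical records, for illustration only.
example = """>toy_protein_1 hypothetical example record
MKTAYIAKQR
QISFVKSHFS
>toy_protein_2 another hypothetical record
MNRGV
"""

records = parse_fasta(example)
for header, sequence in records:
    print(header, len(sequence))
```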






Save accession numbers for future use (makes searching quicker):

RefSeq accession number: NP_000607.1


Swiss-Prot

  • A protein sequence database which strives to provide a high level of annotation regarding:

    • the function of a protein

    • domain structure

    • post-translational modifications

    • variants

  • One entry for each protein

GenBank vs. Swiss-Prot

Swiss-Prot results

GenBank results

PDB: Protein Data Bank

  • Main database of 3D structures of macromolecules

  • Includes ~61,000 entries (proteins, nucleic acids, complex assemblies)

  • Is highly redundant

GeneCards

  • All-in-one database of human genes (a project by the Weizmann Institute)

  • Attempts to integrate as many databases and publications as possible, together with all other available knowledge

Organism specific databases

  • Model organisms have independent databases:

HIV database

Summary

  • General and comprehensive databases:

    • NCBI, EMBL

  • Genome specific databases (to be discussed):


  • Highly annotated databases:

    • Human genes

      • GeneCards

    • Proteins:

      • Swiss-Prot, RefSeq

    • Structures:

      • PDB

As important:

  • Google (or any search engine)

And always remember:

  • RT(F)M - Read the manual! (or the help pages / FAQ)

GO: Gene Ontology

Gene Ontology

  • Strives to provide consistent descriptions of gene products obtained from different databases

  • GO annotations include three hierarchical ontologies of gene products:

    • cellular component(s) – the environment in which the gene product functions

    • biological process(es) – the biological program/pathway in which the gene product is involved

    • molecular function(s) – the elemental activities of the gene product

  • E.g., cytochrome c:

    • cellular components: mitochondrial matrix and mitochondrial inner membrane

    • biological processes: oxidative phosphorylation and induction of cell death

    • molecular functions: oxidoreductase activity
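The three-ontology annotation of a gene product can be pictured as a small record. The sketch below is a simplified illustration that reuses the cytochrome c terms listed above; the `GOAnnotation` class and the `has_term` function are hypothetical names, and real GO annotations use term identifiers arranged in a hierarchy, which this flat model ignores.

```python
from dataclasses import dataclass, field


@dataclass
class GOAnnotation:
    """Flat, simplified view of one gene product's GO annotation."""
    gene_product: str
    cellular_component: set = field(default_factory=set)
    biological_process: set = field(default_factory=set)
    molecular_function: set = field(default_factory=set)


def has_term(annotation, term):
    """Return True if the term appears in any of the three ontologies."""
    ontologies = (
        annotation.cellular_component,
        annotation.biological_process,
        annotation.molecular_function,
    )
    return any(term in terms for terms in ontologies)


# Terms taken from the cytochrome c example above.
cyt_c = GOAnnotation(
    gene_product="cytochrome c",
    cellular_component={"mitochondrial matrix", "mitochondrial inner membrane"},
    biological_process={"oxidative phosphorylation", "induction of cell death"},
    molecular_function={"oxidoreductase activity"},
)

print(has_term(cyt_c, "oxidative phosphorylation"))
```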

AmiGO: the official GO browser






Enrichment analysis

Reference set:

  • Total – N genes

  • Function f – K genes

Query set:

  • Total – n genes

  • Function f – k genes

Is k/n significantly greater than K/N?

Statistical significance testing

Problem formulation:

In a group of N genes there are K “special” ones

If we sample n genes out of N (without replacement) and find k “special” ones, would that be considered a random outcome?

Mathematically, we use the hypergeometric distribution to compute the probability of obtaining k or more “special” ones in a sample of n
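With the notation above, the probability in question is P(X >= k) = sum over i from k to min(n, K) of C(K, i) * C(N - K, n - i) / C(N, n). The following is a minimal sketch of that calculation using only the standard library; `hypergeom_tail` is a hypothetical function name, and the numbers in the usage example are invented for illustration, not taken from any dataset.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k): probability of drawing k or more "special" genes when
    sampling n genes without replacement from N genes, K of them special."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(n, K)       # cannot draw more special genes than exist
    lower = max(k, 0)
    total = comb(N, n)      # number of possible samples of size n
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(lower, upper + 1)
    )
    return favourable / total


# Invented numbers: 20,000 genes, 400 with function f,
# and a query set of 273 genes of which 15 have function f.
p_value = hypergeom_tail(N=20000, K=400, n=273, k=15)
print(p_value)
```

A small value means that seeing k or more “special” genes in the sample would be unlikely by chance alone. When many functions are tested, a multiple-testing correction is normally applied as well.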

Materials & Methods

21,121 siRNA knockdown assays, literally covering the entire coding-sequence part of the genome

Results

273 HIV-dependency factors (HDFs) were discovered

Biological processes


Molecular functions

Subcellular localizations

Observations

  • Nuclear pore complex components: their loss may impede HIV nuclear access

  • Mediator members (Mediator couples TFs to Pol II): suggests a requirement for activators to bind the HIV LTRs

  • Enzymes involved in glycosylation: HIV’s envelope protein is heavily glycosylated, which assists the virus’s entry into cells