Tools in bioinformatics

Fall 2009-10


Goals

Overview

  • To provide students with practical knowledge of bioinformatics tools and their application in research

Prerequisites

  • The course “Introduction to bioinformatics”

  • Familiarity with topics in molecular biology (cell biology, biochemistry, and genetics)

  • Basic familiarity with computers & internet


Course website

Administration

http://ibis.tau.ac.il/intro_bioinfo/tools.html


Administration

Classes:

A class will be given every two weeks

There are three class groups:

  • Sunday 16:00-18:00

  • Monday 12:00-14:00

  • Monday 14:00-16:00

Location:

Computer classroom Sherman 03


Administration

Teachers:

  • Nimrod Rubinstein [email protected] (Sundays)

  • Daiana Alaluf [email protected] (Mondays I)

  • Osnat Penn [email protected] (Mondays II)

  • Reception hours: email your instructor any question at any time, or set up an appointment (Britania 405, 6409245)


Requirements

  • Assignments – 50% of final grade (compulsory)

    • Assignments include classwork and homework:

      • Classwork is planned to be completed during the lesson and handed in at the end of it. It will be checked but not graded.

      • Homework should be handed in at the following lesson (two weeks after it is handed out). It will be checked and graded.

  • Final project – 50% of final grade

    When emailing your instructor (a question, an assignment, or anything else), please state in the “Subject” field: “Tools in Bioinfo”, your IDs, and the CW/HW number (if relevant)


BIOINFORMATICS DATABASES


What’s in a database?

  • Sequences – genes, proteins, etc…

  • Full genomes

  • Expression data

  • Structures

  • Annotation – information about genes/proteins:

    • function

    • cellular location

    • chromosomal location

    • introns/exons

    • phenotypes, diseases

  • Publications


NCBI and Entrez

  • One of the largest and most comprehensive database collections, belonging to the NIH (National Institutes of Health, the primary federal agency for conducting and supporting medical research in the USA)

  • Entrez is the search engine of NCBI

  • Search for: genes, proteins, genomes, structures, diseases, publications, and more

http://www.ncbi.nlm.nih.gov


PubMed: NCBI’s database of biomedical articles

Yang X, Kurteva S, Ren X, Lee S, Sodroski J. “Subunit stoichiometry of human immunodeficiency virus type 1 envelope glycoprotein trimers during virus entry into host cells”, J Virol. 2006 May;80(9):4388-95.


Use fields!

Yang[AU] AND glycoprotein[TI] AND 2006[DP] AND J virol[TA]

For the full list of field tags: go to help -> Search Field Descriptions and Tags
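
The fielded query above can also be assembled programmatically. The following is a minimal sketch, not part of the original slides: it assumes the query will be sent to the NCBI E-utilities `esearch` service, and the helper names `fielded_query` and `esearch_url` are hypothetical. It only builds the request URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities "esearch" service (assumed target of the query).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def fielded_query(terms):
    """Join (value, field_tag) pairs with AND, e.g. ("Yang", "AU") -> Yang[AU]."""
    return " AND ".join(f"{value}[{tag}]" for value, tag in terms)


def esearch_url(query, db="pubmed", retmax=20):
    """Build (but do not send) an esearch request URL for the given query."""
    params = {"db": db, "term": query, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


# Hypothetical helper calls reproducing the query from the slide.
query = fielded_query([
    ("Yang", "AU"),          # author
    ("glycoprotein", "TI"),  # word in the title
    ("2006", "DP"),          # date of publication
    ("J virol", "TA"),       # journal title abbreviation
])
print(query)
print(esearch_url(query))
```

Pasting the printed query into the PubMed search box should behave the same as typing it by hand; the URL form is only useful when scripting searches.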


Example

  • Retrieve all publications in which the first author is Davidovich C and the last author is Yonath A


Using limits

Retrieve the publications of Yonath A in the journals Nature and Proc Natl Acad Sci U S A from the last 5 years
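
Both retrieval exercises above can also be written as plain query strings instead of using the Limits page. The sketch below rests on several assumptions that should be checked against the PubMed help page: that `[1AU]` and `[LASTAU]` are the first-author and last-author tags, that a `first:last[DP]` range restricts the publication date, and that "the last 5 years" means 2005 to 2009 for a course given in Fall 2009. The function names are hypothetical.

```python
def first_last_author_query(first_author, last_author):
    """Query for papers with a given first author and a given last author."""
    return f"{first_author}[1AU] AND {last_author}[LASTAU]"


def journal_date_query(author, journals, first_year, last_year):
    """Query for an author's papers in any of the journals within a year range."""
    journal_part = " OR ".join(f'"{name}"[TA]' for name in journals)
    return f"{author}[AU] AND ({journal_part}) AND {first_year}:{last_year}[DP]"


# Exercise 1: first author Davidovich C, last author Yonath A.
print(first_last_author_query("Davidovich C", "Yonath A"))

# Exercise 2: Yonath A in Nature or PNAS, assumed year window 2005-2009.
print(journal_date_query(
    "Yonath A",
    ["Nature", "Proc Natl Acad Sci U S A"],
    2005,
    2009,
))
```

The parentheses around the journal part matter: without them, the OR would combine with the author and date terms in an unintended way.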


Google scholar

http://scholar.google.com/


GenBank: NCBI’s gene & protein database

  • GenBank is an annotated collection of all publicly available DNA sequences (and their amino-acid translations)

  • Holds ~106.5 billion bases in ~108.5 million sequence records (Oct. 2009)


Searching NCBI for the protein human CD4

Search demonstration


Using field descriptions, qualifiers, and boolean operators

  • Cd4[GENE] AND human[ORGN], or equivalently Cd4[gene name] AND human[organism]

  • List of field codes: http://www.ncbi.nlm.nih.gov/entrez/query/static/help/Summary_Matrices.html#Search_Fields_and_Qualifiers

    • Boolean operators: AND, OR, NOT

      Note: do not use the field Protein name [PROT], only GENE!
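
To make the equivalence of the short and long field forms concrete, here is a small sketch that rewrites abbreviated tags into their long names. The mapping holds only the two tags used on this slide, and the function name `expand_field_tags` is hypothetical. Tags not in the mapping are left untouched.

```python
import re

# Short tag -> long field name, limited to the two tags shown on the slide.
FIELD_NAMES = {"GENE": "gene name", "ORGN": "organism"}


def expand_field_tags(query):
    """Replace known short tags such as [GENE] with their long form."""
    def replace(match):
        tag = match.group(1)
        return "[" + FIELD_NAMES.get(tag.upper(), tag) + "]"

    # Match anything between square brackets and rewrite it if it is known.
    return re.sub(r"\[([^\]]+)\]", replace, query)


print(expand_field_tags("Cd4[GENE] AND human[ORGN]"))
```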


This time we directly search in the protein database


RefSeq

  • Subcollection of NCBI databases with only non-redundant, highly annotated entries (genomic DNA, transcript (RNA), and protein products)


An explanation on GenBank records


Fasta format

A FASTA record is a header line followed by the sequence. The header starts with “>” and holds the ID/accession, followed by a description:

> gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLTKGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLTLTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSIVYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPLHLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAKVSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCVRCRHRRRQAERMSQIKRLLSEKKTCQCPHRFQKTCSPI

Save accession numbers for future use (makes searching quicker). RefSeq accession number: NP_000607.1
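
As an illustration of the FASTA layout described above, the following sketch splits FASTA text into identifier, description, and sequence, and pulls the RefSeq accession out of a `gi|...|ref|...|` style identifier. The function names are hypothetical, and the example record contains only the first 60 residues of the CD4 sequence, split over two lines to show that sequence lines are joined.

```python
def parse_fasta(text):
    """Yield (identifier, description, sequence) for each record in FASTA text."""
    identifier, description, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                yield identifier, description, "".join(chunks)
            header = line[1:].strip()
            identifier, _, description = header.partition(" ")
            chunks = []
        elif identifier is not None:
            chunks.append(line)
    if identifier is not None:
        yield identifier, description, "".join(chunks)


def refseq_accession(identifier):
    """Return the field after 'ref' in a pipe-separated identifier, if present."""
    fields = identifier.split("|")
    return fields[fields.index("ref") + 1] if "ref" in fields else identifier


# Truncated example record (first 60 residues only).
example = """>gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]
MNRGVPFRHLLLVLQLALLPAATQGKKVVL
GKKGDTVELTCTASQKKSIQFHWKNSNQIK
"""

for ident, desc, seq in parse_fasta(example):
    print(refseq_accession(ident), desc, len(seq))
```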


Downloading


Swissprot

  • A protein sequence database which strives to provide a high level of annotation regarding:

    • the function of a protein

    • domain structure

    • post-translational modifications

    • variants

  • One entry for each protein

http://www.expasy.ch/sprot


GenBank Vs. Swissprot

Swiss-Prot results

GenBank results


PDB: Protein Data Bank

  • Main database of 3D structures of macromolecules

  • Includes ~61,000 entries (proteins, nucleic acids, complex assemblies)

  • Is highly redundant

http://www.rcsb.org


Human CD4 in complex with HIV gp120

PDB ID 1G9M

gp120

CD4


Accession Numbers


GeneCards

  • All-in-one database of human genes (a project by the Weizmann institute)

  • Attempts to integrate as many databases, publications, and as much available knowledge as possible

http://www.genecards.org


Organism specific databases

  • Model organisms have independent databases:

HIV database http://hiv-web.lanl.gov/content/index


Summary

  • General and comprehensive databases:

    • NCBI, EMBL

  • Genome specific databases (to be discussed):

    • UCSC, ENSEMBL

  • Highly annotated databases:

    • Human genes

      • Genecards

    • Proteins:

      • Swissprot, RefSeq

    • Structures:

      • PDB


As important:

  • Google (or any search engine)


And always remember:

  • RT(F)M -Read the manual!!! (/help/FAQ)


GO: Gene Ontology


Gene Ontology

  • Strives to provide consistent descriptions of gene products obtained from different databases

  • GO annotations include three hierarchical ontologies of gene products:

    • cellular component(s) – the environment in which the gene product functions

    • biological process(es) – the biological program/pathway in which the gene product is involved

    • molecular function(s) – the elemental activities of the gene product

  • E.g., cytochrome c:

    • cellular components: mitochondrial matrix and mitochondrial inner membrane

    • biological processes: oxidative phosphorylation and induction of cell death

    • molecular functions: oxidoreductase activity
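
One simple way to hold such an annotation in code is sketched below, using the cytochrome c terms listed above. This is a deliberately flat simplification: the real Gene Ontology is hierarchical and identifies terms by IDs, which this sketch ignores. The class name `GOAnnotation` and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GOAnnotation:
    """Flat view of a gene product's GO terms, one set per ontology."""
    gene_product: str
    cellular_components: set = field(default_factory=set)
    biological_processes: set = field(default_factory=set)
    molecular_functions: set = field(default_factory=set)

    def all_terms(self):
        """Return every term attached to this gene product."""
        return (self.cellular_components
                | self.biological_processes
                | self.molecular_functions)


# The cytochrome c example from the slide.
cytochrome_c = GOAnnotation(
    gene_product="cytochrome c",
    cellular_components={"mitochondrial matrix", "mitochondrial inner membrane"},
    biological_processes={"oxidative phosphorylation", "induction of cell death"},
    molecular_functions={"oxidoreductase activity"},
)

print(sorted(cytochrome_c.all_terms()))
```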


AmiGO: the official GO browser


Through NCBI


Enrichment analysis

Reference set: N genes in total, K of them with function f

Query set: n genes in total, k of them with function f

Is k/n significantly greater than K/N?


Statistical significance testing

Problem formulation:

In a group of N genes there are K “special” ones

If we sample n genes out of N (without replacement) and find k “special” ones, would that be considered a random outcome?

Mathematically, we use the hypergeometric distribution to compute the probability of obtaining k or more “special” ones in a sample of n
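
The hypergeometric tail probability described above can be computed directly with the standard library. The sketch below sums the probability P(X = i) = C(K, i) · C(N − K, n − i) / C(N, n) for i from k up to min(n, K). The function names and the small example numbers are illustrative only and do not come from any real data set.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k): chance of at least k special genes in a sample of n,
    drawn without replacement from N genes of which K are special."""
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def fold_enrichment(N, K, n, k):
    """Ratio of the query-set fraction k/n to the reference fraction K/N."""
    return (k / n) / (K / N)


# Toy numbers: 10 genes, 4 special; a sample of 3 contains 2 special ones.
p_value = hypergeom_tail(N=10, K=4, n=3, k=2)
print(f"p-value: {p_value:.3f}")
print(f"fold enrichment: {fold_enrichment(10, 4, 3, 2):.2f}")
```

A small p-value means that observing k or more special genes by chance is unlikely. When many functions are tested at once, a multiple-testing correction is normally applied on top of this.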


Materials & Methods

21,121 siRNA knockdown assays, literally covering the entire coding-sequence part of the genome


Results

273 HIV-dependency factors (HDFs) were discovered

Biological processes


Molecular functions

Subcellular localizations


Observations

  • Nuclear pore complex components: their loss may impede HIV nuclear access

  • Mediator members (which couple TFs to Pol II): a requirement for activators to bind HIV LTRs

  • Enzymes involved in glycosylation: HIV’s envelope protein is heavily glycosylated, which assists the virus’s entry into cells

