Tools in bioinformatics

Fall 2009-10


Goals

Overview

  • To provide students with practical knowledge of bioinformatics tools and their application in research

Prerequisites

  • The course “Introduction to bioinformatics”

  • Familiarity with topics in molecular biology (cell biology, biochemistry, and genetics)

  • Basic familiarity with computers & internet


Course website

Administration

http://ibis.tau.ac.il/intro_bioinfo/tools.html


Administration

Classes:

A class will be given every two weeks

There are three class groups:

  • Sunday 16:00-18:00

  • Monday 12:00-14:00

  • Monday 14:00-16:00

Location:

Computer classroom Sherman 03


Administration

Teachers:

  • Nimrod Rubinstein [email protected] (Sundays)

  • Daiana Alaluf [email protected] (Mondays I)

  • Osnat Penn [email protected] (Mondays II)

  • Reception hours: email your instructor with any question at any time, or set an appointment (Britania 405, 6409245)



Requirements

  • Assignments – 50% of final grade (compulsory)

    • Assignments include classwork and homework:

      • Classwork is planned to be completed during the lesson and handed in at the end of it. It will be checked but not graded.

      • Homework should be handed in at the following lesson (two weeks after it is handed out). It will be checked and graded.

  • Final project – 50% of final grade

    When emailing your instructor (a question, your assignment, or anything else), please state in the “Subject” field: “Tools in Bioinfo”, your IDs, and the CW/HW number (if relevant)



What’s in a database?

  • Sequences – genes, proteins, etc…

  • Full genomes

  • Expression data

  • Structures

  • Annotation – information about genes/proteins:

    • function

    • cellular location

    • chromosomal location

    • introns/exons

    • phenotypes, diseases

  • Publications


NCBI and Entrez

  • One of the largest and most comprehensive database collections, belonging to the NIH (National Institutes of Health, the primary federal agency for conducting and supporting medical research in the USA)

  • Entrez is the search engine of NCBI

  • Search for: genes, proteins, genomes, structures, diseases, publications, and more

http://www.ncbi.nlm.nih.gov


PubMed: NCBI’s database of biomedical articles

Yang X, Kurteva S, Ren X, Lee S, Sodroski J. “Subunit stoichiometry of human immunodeficiency virus type 1 envelope glycoprotein trimers during virus entry into host cells”, J Virol. 2006 May;80(9):4388-95.


Use fields!

Yang[AU] AND glycoprotein[TI] AND 2006[DP] AND J virol[TA]

For the full list of field tags: go to help -> Search Field Descriptions and Tags
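The fielded query above can also be assembled programmatically. The following is a minimal sketch, not part of the original slides: it assumes NCBI's public E-utilities ESearch endpoint, and the helper names `fielded_query` and `esearch_url` are hypothetical. It only builds the request URL and does not send it.

```python
from urllib.parse import urlencode

# Assumption: the public NCBI E-utilities ESearch endpoint (not mentioned in the slides).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def fielded_query(terms):
    """Join (value, field_tag) pairs with AND, e.g. ("Yang", "AU") -> Yang[AU]."""
    return " AND ".join(f"{value}[{tag}]" for value, tag in terms)


def esearch_url(query, db="pubmed", retmax=20):
    """Build (but do not send) an ESearch request URL for the given query."""
    params = {"db": db, "term": query, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


query = fielded_query([
    ("Yang", "AU"),          # author
    ("glycoprotein", "TI"),  # word in the title
    ("2006", "DP"),          # date of publication
    ("J virol", "TA"),       # journal title abbreviation
])
print(query)
print(esearch_url(query))
```

Pasting the printed query into the PubMed search box should behave the same as typing it by hand; the URL form is only useful when searches need to be scripted.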


Example

  • Retrieve all publications in which the first author is Davidovich C and the last author is Yonath A


Using limits

Retrieve the publications of Yonath A in the journals Nature and Proc Natl Acad Sci U S A from the last 5 years
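Both retrieval exercises can also be written as typed queries instead of being set through the Limits page. This is one possible formulation, not the course's official answer: it assumes the PubMed tags `[1AU]` (first author) and `[LASTAU]` (last author) and the relative date form `"last 5 years"[DP]`, all of which should be checked against "Search Field Descriptions and Tags". The helper `any_of` is hypothetical.

```python
def any_of(values, tag):
    """OR together several quoted values for one field tag, wrapped in parentheses."""
    return "(" + " OR ".join(f'"{value}"[{tag}]' for value in values) + ")"


# Exercise 1: first author Davidovich C, last author Yonath A.
# Assumption: [1AU] and [LASTAU] are the first/last author tags.
first_and_last = "Davidovich C[1AU] AND Yonath A[LASTAU]"

# Exercise 2: Yonath A in two journals, restricted to the last 5 years.
# Assumption: "last 5 years"[DP] is accepted as a relative date range.
journals = any_of(["Nature", "Proc Natl Acad Sci U S A"], "TA")
recent_in_journals = f'Yonath A[AU] AND {journals} AND "last 5 years"[DP]'

print(first_and_last)
print(recent_in_journals)
```

The parentheses around the journal terms matter: without them the OR would combine with the surrounding AND terms in an unintended way.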


Google Scholar

http://scholar.google.com/


GenBank: NCBI’s gene & protein database

  • GenBank is an annotated collection of all publicly available DNA sequences (and their amino-acid translations)

  • Holds ~106.5 billion bases in ~108.5 million sequence records (Oct. 2009)


Searching NCBI for the protein human CD4

Search demonstration


Using field descriptions, qualifiers, and boolean operators

  • Cd4[GENE] AND human[ORGN] or, equivalently, Cd4[gene name] AND human[organism]

  • List of field codes: http://www.ncbi.nlm.nih.gov/entrez/query/static/help/Summary_Matrices.html#Search_Fields_and_Qualifiers

    • Boolean operators: AND, OR, NOT

      Note: do not use the field Protein name [PROT], only GENE!



RefSeq

  • Subcollection of NCBI databases with only non-redundant, highly annotated entries (genomic DNA, transcript (RNA), and protein products)



FASTA format

A FASTA record consists of a header line, which starts with ">" and holds the ID/accession followed by a description, and then the sequence:

> gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLTKGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLTLTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSIVYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPLHLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAKVSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCVRCRHRRRQAERMSQIKRLLSEKKTCQCPHRFQKTCSPI

Save accession numbers for future use (it makes searching quicker). RefSeq accession number: NP_000607.1
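To make the header/sequence split concrete, here is a minimal FASTA reader sketch in plain Python. It is an illustration rather than a course-provided tool: the function names are hypothetical, `split_header` assumes the legacy `gi|number|db|accession| description` header layout shown above, and the example sequence is truncated to its first 120 residues for brevity.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # finish the previous record
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def split_header(header):
    """Split a header into (accession, description).

    Assumes the 'gi|number|db|accession| description' layout; otherwise
    falls back to treating the first word as the identifier.
    """
    fields = header.split("|")
    if len(fields) >= 5:
        return fields[3], fields[4].strip()
    identifier, _, description = header.partition(" ")
    return identifier, description


# Example record (sequence truncated for brevity).
example = """> gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIK
ILGNQGSFLTKGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQL
"""

for header, sequence in parse_fasta(example):
    accession, description = split_header(header)
    print(accession, "|", description, "|", len(sequence), "residues")
```

For real work an established library parser is usually a safer choice; the sketch only shows how little structure the format has.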



Swiss-Prot

  • A protein sequence database that strives to provide a high level of annotation regarding:

    • the function of a protein

    • domain structure

    • post-translational modifications

    • variants

  • One entry for each protein

http://www.expasy.ch/sprot


GenBank vs. Swiss-Prot

Swiss-Prot results

GenBank results


PDB: Protein Data Bank

  • Main database of 3D structures of macromolecules

  • Includes ~61,000 entries (proteins, nucleic acids, complex assemblies)

  • Is highly redundant

http://www.rcsb.org




GeneCards

  • All-in-one database of human genes (a project by the Weizmann institute)

  • Attempts to integrate as many databases, publications, and other sources of available knowledge as possible

http://www.genecards.org


Organism specific databases

  • Model organisms have independent databases:

HIV database http://hiv-web.lanl.gov/content/index


Summary

  • General and comprehensive databases:

    • NCBI, EMBL

  • Genome specific databases (to be discussed):

    • UCSC, ENSEMBL

  • Highly annotated databases:

    • Human genes:

      • GeneCards

    • Proteins:

      • Swiss-Prot, RefSeq

    • Structures:

      • PDB


As important:

  • Google (or any search engine)


And always remember:

  • RT(F)M - Read the manual!!! (/help/FAQ)


GO: Gene Ontology


Gene Ontology

  • Strives to provide consistent descriptions of gene products obtained from different databases

  • GO annotations include three hierarchical ontologies of gene products:

    • cellular component(s) – the environment in which the gene product functions

    • biological process(es) – the biological program/pathway in which the gene product is involved

    • molecular function(s) – the elemental activities of the gene product

  • E.g., cytochrome c:

    • cellular components: mitochondrial matrix and mitochondrial inner membrane

    • biological processes: oxidative phosphorylation and induction of cell death

    • molecular functions: oxidoreductase activity


AmiGO: the official GO browser




Enrichment analysis

Reference set: N genes in total, of which K have function f

Query set: n genes in total, of which k have function f

Is k/n significantly greater than K/N?


Statistical significance testing

Problem formulation:

In a group of N genes there are K “special” ones

If we sample n genes out of N (without replacement) and find k “special” ones, would that be considered a random outcome?

Mathematically, we use the hypergeometric distribution to compute the probability of obtaining k or more “special” ones in a sample of n
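The upper-tail probability described above can be computed directly from binomial coefficients: P(X ≥ k) is the sum over i from k to min(n, K) of C(K, i)·C(N−K, n−i) / C(N, n). The following is a minimal sketch using only the Python standard library; the function name `enrichment_p_value` and the toy numbers are illustrative assumptions, not values from the course.

```python
from fractions import Fraction
from math import comb


def enrichment_p_value(N, K, n, k):
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the reference set, K: "special" genes among them,
    n: genes sampled (query set), k: "special" genes observed in the sample.
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("require 0 <= K <= N, 0 <= n <= N and 0 <= k <= n")
    total = comb(N, n)  # number of equally likely samples of size n
    upper = min(n, K)   # cannot observe more special genes than this
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    # Exact integer arithmetic first, converted to float only at the end.
    return float(Fraction(tail, total))


# Toy example (hypothetical numbers): 10 genes, 5 special, sample of 5.
print(enrichment_p_value(10, 5, 5, 4))
```

A small p-value means that seeing k or more "special" genes in the sample would be unlikely under random sampling. When many functions are tested at once, a multiple-testing correction is normally applied on top of this.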


Materials & Methods

21,121 siRNA knockdown assays, literally covering the entire coding-sequence part of the genome


Results

273 HIV-dependency factors (HDFs) were discovered

Biological processes



Molecular functions

Subcellular localizations


Observations

  • Nuclear pore complex components: their loss may impede HIV nuclear access

  • Mediator members (Mediator couples TFs to Pol II): activators are required to bind the HIV LTRs

  • Enzymes involved in glycosylation: HIV’s envelope protein is heavily glycosylated, which assists the virus’s entry into cells

