Tools in bioinformatics

Fall 2009-10


Goals

Overview

  • To provide students with practical knowledge of bioinformatics tools and their application in research

Prerequisites

  • The course “Introduction to bioinformatics”

  • Familiarity with topics in molecular biology (cell biology, biochemistry, and genetics)

  • Basic familiarity with computers & internet


Course website

Administration

http://ibis.tau.ac.il/intro_bioinfo/tools.html


Administration

Classes:

A class will be given every two weeks

There are three class groups:

  • Sunday 16:00-18:00

  • Monday 12:00-14:00

  • Monday 14:00-16:00

Location:

Computer classroom Sherman 03


Administration

Teachers:

  • Nimrod Rubinstein rubi@post.tau.ac.il (Sundays)

  • Daiana Alaluf daianaal@post.tau.ac.il (Mondays I)

  • Osnat Penn penn@post.tau.ac.il (Mondays II)

  • Reception hours: email your instructor with any question at any time, or make an appointment (Britania 405, 6409245)


Requirements

  • Assignments – 50% of final grade (compulsory)

    • Assignments include classwork and homework:

      • Classwork assignments are meant to be completed during the lesson and handed in at its end. They will be checked but not graded.

      • Homework assignments should be handed in at the following lesson (two weeks after they are handed out). They will be checked and graded.

  • Final project – 50% of final grade

    When emailing your instructor (with a question, an assignment, or anything else), please state in the “Subject” field: “Tools in Bioinfo”, your IDs, and the CW/HW number (if relevant)


BIOINFORMATICS DATABASES


What’s in a database?

  • Sequences – genes, proteins, etc…

  • Full genomes

  • Expression data

  • Structures

  • Annotation – information about genes/proteins:

    • function

    • cellular location

    • chromosomal location

    • introns/exons

    • phenotypes, diseases

  • Publications


NCBI and Entrez

  • One of the largest and most comprehensive collections of databases, belonging to the NIH (National Institutes of Health, the primary federal agency for conducting and supporting medical research in the USA)

  • Entrez is the search engine of NCBI

  • Search for: genes, proteins, genomes, structures, diseases, publications, and more

http://www.ncbi.nlm.nih.gov


PubMed: NCBI’s database of biomedical articles

Yang X, Kurteva S, Ren X, Lee S, Sodroski J. “Subunit stoichiometry of human immunodeficiency virus type 1 envelope glycoprotein trimers during virus entry into host cells”, J Virol. 2006 May;80(9):4388-95.


Use fields!

Yang[AU] AND glycoprotein[TI] AND 2006[DP] AND J virol[TA]

For the full list of field tags, go to Help -> Search Field Descriptions and Tags
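A field-tagged query is simply a string of term[TAG] pieces joined by Boolean operators. The short Python sketch below assembles the example query above; the helper name `tagged_query` is hypothetical, and it just joins the pieces with AND, without quoting or validating the terms.

```python
def tagged_query(pairs):
    """Join (term, field_tag) pairs into one query string, combined with AND.

    Hypothetical helper: it does no quoting or validation of the terms.
    """
    return " AND ".join(f"{term}[{tag}]" for term, tag in pairs)


query = tagged_query([
    ("Yang", "AU"),          # author
    ("glycoprotein", "TI"),  # word in the title
    ("2006", "DP"),          # publication date
    ("J virol", "TA"),       # journal title abbreviation
])
print(query)  # Yang[AU] AND glycoprotein[TI] AND 2006[DP] AND J virol[TA]
```

The resulting string can be pasted into the PubMed search box as-is.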


Example

  • Retrieve all publications in which the first author is Davidovich C and the last author is Yonath A


Using limits

Retrieve the publications of Yonath A in the journals Nature and Proc Natl Acad Sci U S A from the last 5 years
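One possible way to write the two exercises above as typed queries instead of using the Limits page, sketched in Python. It assumes PubMed's [1AU] (first author) and [LASTAU] (last author) field tags, and it expresses “the last 5 years” as a calendar-year range ending in a year you supply (2009 here, the year of this course); the Limits page may define the window differently. The function names are hypothetical.

```python
def first_last_author_query(first, last):
    # [1AU] restricts to the first author, [LASTAU] to the last author
    return f"{first}[1AU] AND {last}[LASTAU]"


def author_journals_years_query(author, journals, end_year, years=5):
    # Journals are OR-ed and grouped in parentheses so the AND applies to the whole group;
    # quotes keep multi-word journal abbreviations together as one phrase
    journal_clause = " OR ".join(f'"{j}"[TA]' for j in journals)
    start_year = end_year - years + 1  # e.g. 2005..2009 for 5 calendar years ending in 2009
    return f"{author}[AU] AND ({journal_clause}) AND {start_year}:{end_year}[DP]"


q1 = first_last_author_query("Davidovich C", "Yonath A")
q2 = author_journals_years_query("Yonath A", ["Nature", "Proc Natl Acad Sci U S A"], 2009)
print(q1)
print(q2)
```

Grouping the OR-ed journals in parentheses is the important design choice: without it, the date and author restrictions would not apply to both journals.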


Google scholar

http://scholar.google.com/


GenBank: NCBI’s gene & protein database

  • GenBank is an annotated collection of all publicly available DNA sequences (and their amino-acid translations)

  • Holds ~106.5 billion bases in ~108.5 million sequence records (Oct. 2009)


Searching NCBI for the protein human CD4

Search demonstration


Using field descriptions, qualifiers, and boolean operators

  • CD4[GENE] AND human[ORGN], or equivalently: CD4[gene name] AND human[organism]

  • List of field codes: http://www.ncbi.nlm.nih.gov/entrez/query/static/help/Summary_Matrices.html#Search_Fields_and_Qualifiers

    • Boolean operators (typed in uppercase): AND, OR, NOT

      Note: do not use the Protein Name [PROT] field; use only [GENE]!
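The Boolean operators act on the sets of records each term retrieves: AND keeps records found by both terms, OR keeps records found by either, and NOT removes records found by the second term. A small Python illustration, using made-up record IDs (not real NCBI identifiers):

```python
# Hypothetical result sets with made-up record IDs
gene_hits = {101, 102, 103, 104}   # pretend these match CD4[GENE]
human_hits = {102, 104, 105}       # pretend these match human[ORGN]

and_result = gene_hits & human_hits   # AND -> intersection
or_result = gene_hits | human_hits    # OR  -> union
not_result = gene_hits - human_hits   # NOT -> difference (in the first set, not the second)

print(sorted(and_result))  # [102, 104]
print(sorted(or_result))   # [101, 102, 103, 104, 105]
print(sorted(not_result))  # [101, 103]
```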


This time we directly search in the protein database
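The same protein-database search can also be issued programmatically through NCBI's E-utilities. The sketch below only builds an ESearch request URL with db=protein and the CD4 query; it does not send it. The helper name and the retmax value are illustrative choices; check the E-utilities documentation for the full parameter list and NCBI's request-rate guidelines before querying the server.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL; brackets and spaces in the term are percent/plus-encoded."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}?{urlencode(params)}"


url = esearch_url("protein", "CD4[GENE] AND human[ORGN]")
print(url)
```

Opening the printed URL in a browser returns an XML list of matching record IDs, which can then be fetched individually.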


RefSeq

  • Subcollection of NCBI databases with only non-redundant, highly annotated entries (genomic DNA, transcript (RNA), and protein products)


An explanation of GenBank records


Fasta format

A header line, starting with ">" and containing the ID/accession and a description, followed by the sequence:

>gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLTKGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLTLTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSIVYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPLHLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAKVSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCVRCRHRRRQAERMSQIKRLLSEKKTCQCPHRFQKTCSPI

Save accession numbers for future use (makes searching quicker): RefSeq accession number: NP_000607.1
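A FASTA record is easy to take apart in code: a line starting with ">" opens a new record, and every following line up to the next ">" is sequence. The Python sketch below parses FASTA text and splits an old-style NCBI header of the form gi|NUMBER|ref|ACCESSION.VERSION| description into its identifiers and description. The helper names are hypothetical, and the example uses only the first 25 residues of the CD4 sequence (wrapped over two lines) to keep it short.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # the sequence may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_ncbi_header(header):
    """Split 'gi|NUM|ref|ACC.VER| description' into ({db: id}, description)."""
    id_part, _, description = header.partition(" ")
    fields = id_part.split("|")  # ['gi', '10835167', 'ref', 'NP_000607.1', '']
    ids = dict(zip(fields[0::2], fields[1::2]))  # pair each database tag with its ID
    return ids, description


fasta_text = """>gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]
MNRGVPFRHLLLV
LQLALLPAATQG
"""

header, seq = parse_fasta(fasta_text)[0]
ids, description = split_ncbi_header(header)
accession, _, version = ids["ref"].partition(".")  # NP_000607 and version 1
print(ids["ref"], accession, version, description, len(seq))
```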



Downloading



Swissprot

  • A protein sequence database that strives to provide a high level of annotation regarding:

    • the function of a protein

    • domain structure

    • post-translational modifications

    • variants

  • One entry for each protein

http://www.expasy.ch/sprot


GenBank vs. Swiss-Prot

Swiss-Prot results

GenBank results


PDB: Protein Data Bank

  • Main database of 3D structures of macromolecules

  • Includes ~61,000 entries (proteins, nucleic acids, complex assemblies)

  • Is highly redundant (the same molecule may have many entries)

http://www.rcsb.org


Human CD4 in complex with HIV gp120

PDB ID 1G9M

[Figure: 3D structure of the complex, with gp120 and CD4 labeled]
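Structure files can also be fetched directly by their PDB ID. The Python sketch below only builds a download URL following RCSB's file-download address pattern (https://files.rcsb.org/download/ID.pdb, or .cif for the mmCIF format); the helper name is hypothetical, and it accepts only the classic four-character ID form (a digit from 1 to 9 followed by three letters or digits).

```python
def pdb_download_url(pdb_id, fmt="pdb"):
    """Build an RCSB download URL for a classic 4-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or pdb_id[0] not in "123456789" or not pdb_id.isalnum():
        raise ValueError(f"not a classic 4-character PDB ID: {pdb_id!r}")
    if fmt not in ("pdb", "cif"):
        raise ValueError("fmt must be 'pdb' or 'cif'")
    return f"https://files.rcsb.org/download/{pdb_id}.{fmt}"


print(pdb_download_url("1g9m"))  # https://files.rcsb.org/download/1G9M.pdb
```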


Accession Numbers


GeneCards

  • All-in-one database of human genes (a project of the Weizmann Institute)

  • Attempts to integrate as many databases, publications, and other sources of knowledge as possible

http://www.genecards.org


Organism specific databases

  • Many organisms, model organisms in particular, have their own dedicated databases, e.g.:

HIV database http://hiv-web.lanl.gov/content/index


Summary

  • General and comprehensive databases:

    • NCBI, EMBL

  • Genome specific databases (to be discussed):

    • UCSC, ENSEMBL

  • Highly annotated databases:

    • Human genes:

      • GeneCards

    • Proteins:

      • Swissprot, RefSeq

    • Structures:

      • PDB


As important:

  • Google (or any search engine)


And always remember:

  • RT(F)M: Read the manual! (help pages/FAQ)


GO: Gene Ontology


Gene Ontology

  • Strives to provide consistent descriptions of gene products obtained from different databases

  • GO comprises three hierarchical ontologies describing gene products:

    • cellular component(s) – the environment in which the gene product functions

    • biological process(es) – the biological program/pathway in which the gene product is involved

    • molecular function(s) – the elemental activities of the gene product

  • E.g., cytochrome c:

    • cellular components: mitochondrial matrix and mitochondrial inner membrane

    • biological processes: oxidative phosphorylation and induction of cell death

    • molecular functions: oxidoreductase activity
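Because the ontologies are hierarchical, a gene product annotated to a specific term is implicitly annotated to all of that term's more general ancestors as well; this matters when counting genes per term, e.g., in enrichment analysis. The Python sketch below walks a tiny, hand-made child-to-parent map loosely based on the cytochrome c cellular components above. It is a toy: the real GO graph is a directed acyclic graph with term IDs, many more terms, and several relationship types (such as is_a and part_of).

```python
from collections import deque

# Toy, hand-made hierarchy (child -> parents); NOT the real GO graph
TOY_PARENTS = {
    "mitochondrial inner membrane": ["mitochondrial membrane"],
    "mitochondrial matrix": ["mitochondrion"],
    "mitochondrial membrane": ["mitochondrion"],
    "mitochondrion": ["organelle"],
    "organelle": ["cellular component"],
}


def ancestors(term, parents=TOY_PARENTS):
    """Return every term reachable upward from `term` (breadth-first search)."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        t = queue.popleft()
        if t not in seen:
            seen.add(t)
            queue.extend(parents.get(t, []))
    return seen


def propagate(annotations, parents=TOY_PARENTS):
    """Add all ancestors of each directly annotated term."""
    result = set(annotations)
    for t in annotations:
        result |= ancestors(t, parents)
    return result


propagated = propagate({"mitochondrial matrix", "mitochondrial inner membrane"})
print(sorted(propagated))
```

Because a term can have several parents, the search keeps a `seen` set so shared ancestors are visited only once.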


AmiGO: the official GO browser




Through NCBI




Enrichment analysis

Reference set: N genes in total, of which K have function f

Query set: n genes in total, of which k have function f

Is k/n significantly greater than K/N?


Statistical significance testing

Problem formulation:

In a group of N genes there are K “special” ones

If we sample n genes out of the N (without replacement) and find k “special” ones among them, could that be considered a random outcome?

Mathematically, we use the hypergeometric distribution to compute the probability of obtaining k or more “special” ones in a sample of n
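Written out, with X the number of “special” genes in a random sample of n, this is the upper tail P(X ≥ k) = Σ_{i=k}^{min(n,K)} C(K,i)·C(N−K,n−i) / C(N,n). A direct Python sketch using exact integer binomial coefficients (math.comb, Python 3.8+) follows; the example numbers are toy values, not data from the screen described below.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes, K of them special."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    # Sum the counts of samples with exactly i special genes, for i = k .. min(n, K);
    # math.comb returns 0 when asked to choose more items than are available.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), min(n, K) + 1))
    return favorable / comb(N, n)


# Toy example: 10 genes, 4 with function f; a query set of 3 genes contains 2 of them
p = hypergeom_upper_tail(k=2, N=10, K=4, n=3)
print(p)  # 40/120, i.e. about 0.333
```

SciPy users can get the same value with scipy.stats.hypergeom.sf(k - 1, N, K, n); note that SciPy's argument order is (total, number special, sample size), and sf(k - 1) gives P(X ≥ k).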


Materials & Methods

21,121 siRNA knockdown assays, covering essentially all protein-coding genes in the genome


Results

273 HIV-dependency factors (HDFs) were discovered

Biological processes


Molecular functions

Subcellular localizations


Observations

  • Nuclear pore complex components: their loss may impede HIV's access to the nucleus

  • Mediator members (Mediator couples transcription factors to Pol II): required by the activators that bind the HIV LTRs

  • Enzymes involved in glycosylation: HIV’s envelope protein is heavily glycosylated, which assists virus entry into cells

