Tools in bioinformatics

Fall 2009-10

PowerPoint Slideshow about 'Tools in bioinformatics' - lola


Presentation Transcript
Goals

Overview

  • To provide students with practical knowledge of bioinformatics tools and their application in research

Prerequisites

  • The course “Introduction to bioinformatics”
  • Familiarity with topics in molecular biology (cell biology, biochemistry, and genetics)
  • Basic familiarity with computers & internet
Course website

Administration

http://ibis.tau.ac.il/intro_bioinfo/tools.html

Administration

Classes:

A class will be given every two weeks

There are three class groups:
  • Sunday 16:00-18:00
  • Monday 12:00-14:00
  • Monday 14:00-16:00

Location:

Computer classroom Sherman 03

Administration

Teachers:


Requirements

  • Assignments – 50% of final grade (compulsory)
    • Assignments include classwork (CW) and homework (HW):
      • Classwork is meant to be completed during the lesson and handed in at its end. It will be checked but not graded.
      • Homework should be handed in at the following lesson (two weeks after it is handed out). It will be checked and graded.
  • Final project – 50% of final grade

When emailing your instructor (a question, your assignment, or anything else), please state in the “Subject” field: “Tools in Bioinfo”, your IDs, and the CW/HW number (if relevant).

What’s in a database?
  • Sequences – genes, proteins, etc…
  • Full genomes
  • Expression data
  • Structures
  • Annotation – information about genes/proteins:
    • function
    • cellular location
    • chromosomal location
    • introns/exons
    • phenotypes, diseases
  • Publications
NCBI and Entrez
  • One of the largest and most comprehensive databases, belonging to the NIH (National Institutes of Health, the primary federal agency for conducting and supporting medical research in the USA)
  • Entrez is the search engine of NCBI
  • Search for: genes, proteins, genomes, structures, diseases, publications, and more

http://www.ncbi.nlm.nih.gov

PubMed: NCBI’s database of biomedical articles

Yang X, Kurteva S, Ren X, Lee S, Sodroski J. “Subunit stoichiometry of human immunodeficiency virus type 1 envelope glycoprotein trimers during virus entry into host cells”, J Virol. 2006 May;80(9):4388-95.

Use fields!

Yang[AU] AND glycoprotein[TI] AND 2006[DP] AND J virol[TA]

For the full list of field tags, go to Help -> Search Field Descriptions and Tags
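The same field-tagged query can also be sent to PubMed from a script through NCBI's E-utilities. This is a minimal sketch. It assumes the public `esearch.fcgi` endpoint and its `db`, `term`, and `retmax` parameters. The helper name `esearch_url` is hypothetical, and the code only builds the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Public E-utilities search endpoint (assumed; check NCBI's E-utilities documentation).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, db: str = "pubmed", retmax: int = 20) -> str:
    """Build (but do not fetch) an ESearch URL for a field-tagged Entrez query.

    urlencode escapes the spaces and the [ ] brackets of the field tags,
    so the query survives intact as a URL parameter.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


query = "Yang[AU] AND glycoprotein[TI] AND 2006[DP] AND J virol[TA]"
print(esearch_url(query))
```

Fetching that URL returns an XML list of matching PubMed IDs. Keeping URL construction separate from the network call makes the query easy to inspect before it is sent.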

Example
  • Retrieve all publications in which the first author is Davidovich C and the last author is Yonath A
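As a typed query, this exercise reduces to two author-position field tags. The sketch assumes PubMed's `[1AU]` (first author) and `[LASTAU]` (last author) tags, as listed under Search Field Descriptions and Tags. The helper `author_position_query` is hypothetical.

```python
def author_position_query(first: str, last: str) -> str:
    """Combine a first-author and a last-author restriction with AND.

    Assumes PubMed's [1AU] (first author) and [LASTAU] (last author) field tags.
    """
    return f"{first}[1AU] AND {last}[LASTAU]"


print(author_position_query("Davidovich C", "Yonath A"))
# Davidovich C[1AU] AND Yonath A[LASTAU]
```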
Using limits

Retrieve the publications of Yonath A in the journals Nature and Proc Natl Acad Sci U S A from the last 5 years
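The Limits options can roughly be written out as extra field-tagged terms. This sketch makes three assumptions. It uses the journal-abbreviation tag `[TA]` and PubMed's `start:end[DP]` publication-date range. It also treats "the last 5 years", as of Fall 2009, as 2005 to 2009. The helper `limited_author_query` is hypothetical.

```python
def limited_author_query(author: str, journals: list[str],
                         first_year: int, last_year: int) -> str:
    """Restrict an author search to a set of journals and a publication-year range.

    Journals are OR'ed together (any of them may match); the author, journal,
    and date parts are AND'ed (all must match).
    """
    journal_part = " OR ".join(f"{j}[TA]" for j in journals)
    return f"{author}[AU] AND ({journal_part}) AND {first_year}:{last_year}[DP]"


print(limited_author_query("Yonath A", ["Nature", "Proc Natl Acad Sci U S A"], 2005, 2009))
# Yonath A[AU] AND (Nature[TA] OR Proc Natl Acad Sci U S A[TA]) AND 2005:2009[DP]
```

The parentheses matter here. Without them, AND binds the terms differently, and the search would return other authors' papers from one of the journals.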

Google Scholar

http://scholar.google.com/

GenBank: NCBI’s gene & protein database
  • GenBank is an annotated collection of all publicly available DNA sequences (and their amino-acid translations)
  • Holds ~106.5 billion bases in ~108.5 million sequence records (Oct. 2009)
Using field descriptions, qualifiers, and boolean operators
  • Cd4[GENE] AND human[ORGN], or equivalently: Cd4[gene name] AND human[organism]
  • List of field codes: http://www.ncbi.nlm.nih.gov/entrez/query/static/help/Summary_Matrices.html#Search_Fields_and_Qualifiers
  • Boolean operators: AND, OR, NOT

Note: do not use the Protein Name [PROT] field; use only [GENE]!

RefSeq
  • A subcollection of NCBI databases containing only non-redundant, highly annotated entries (genomic DNA, transcripts (RNA), and protein products)
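RefSeq accessions encode the record type in a two-letter prefix, so genomic, transcript, and protein entries can be told apart at a glance. The sketch below covers only a partial list of the prefixes NCBI documents. The helper `refseq_type` is hypothetical.

```python
# Common RefSeq accession prefixes (partial list; see NCBI's RefSeq documentation).
REFSEQ_PREFIXES = {
    "NC_": "genomic (complete molecule)",
    "NG_": "genomic (region)",
    "NM_": "transcript (mRNA)",
    "NR_": "transcript (non-coding RNA)",
    "NP_": "protein",
    "XM_": "transcript (predicted mRNA)",
    "XR_": "transcript (predicted non-coding RNA)",
    "XP_": "protein (predicted)",
}


def refseq_type(accession: str) -> str:
    """Return the record type implied by a RefSeq accession's prefix."""
    prefix = accession[:3]
    if prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"unrecognised RefSeq prefix in {accession!r}")
    return REFSEQ_PREFIXES[prefix]


print(refseq_type("NP_000607.1"))  # protein
```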
Fasta format

A FASTA entry consists of a header line, which starts with “>” and holds the ID/accession followed by a description, and the sequence on the following line(s):

>gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLTKGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLTLTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSIVYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPLHLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAKVSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCVRCRHRRRQAERMSQIKRLLSEKKTCQCPHRFQKTCSPI

Save accession numbers for future use (it makes searching quicker). RefSeq accession number: NP_000607.1
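Because the format is this simple, FASTA files are easy to read with a few lines of code. This is a minimal sketch with two hypothetical helpers, `parse_fasta` and `parse_ncbi_header`. The header parser assumes the old-style `gi|<number>|ref|<accession>|` layout shown above, so other header styles will need a different split.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs.

    A header line starts with '>'; all following lines up to the next header
    are sequence lines and are concatenated (FASTA sequences may be wrapped).
    """
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def parse_ncbi_header(header: str) -> dict[str, str]:
    """Split an old-style header 'gi|<gi>|<db>|<accession>| <description> [<organism>]'."""
    fields = header.split("|")
    if len(fields) < 5:
        raise ValueError(f"unexpected header layout: {header!r}")
    description = "|".join(fields[4:]).strip()
    organism = ""
    # The organism name, if present, is the trailing [bracketed] part.
    if description.endswith("]") and "[" in description:
        start = description.rindex("[")
        organism = description[start + 1:-1]
        description = description[:start].strip()
    return {"gi": fields[1], "db": fields[2], "accession": fields[3],
            "description": description, "organism": organism}


# The CD4 entry from above, with its sequence truncated to two wrapped lines.
example = """>gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]
MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQ
IKILGNQGSFLTKGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKE
"""
header, sequence = parse_fasta(example)[0]
info = parse_ncbi_header(header)
print(info["accession"], info["organism"])  # NP_000607.1 Homo sapiens
```

For real projects, an established parser such as Biopython's `SeqIO` is usually a safer choice than hand-rolled code.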


Swiss-Prot
  • A protein sequence database that strives to provide a high level of annotation regarding:
    • the function of the protein
    • domain structure
    • post-translational modifications
    • variants
  • One entry for each protein

http://www.expasy.ch/sprot

GenBank vs. Swiss-Prot

Swiss-Prot results

GenBank results

PDB: Protein Data Bank
  • Main database of 3D structures of macromolecules
  • Includes ~61,000 entries (proteins, nucleic acids, complex assemblies)
  • Is highly redundant

http://www.rcsb.org

GeneCards
  • All-in-one database of human genes (a project of the Weizmann Institute)
  • Attempts to integrate as many databases and publications as possible, and all available knowledge

http://www.genecards.org

Organism specific databases
  • Model organisms have independent databases:

HIV database http://hiv-web.lanl.gov/content/index

Summary
  • General and comprehensive databases:
    • NCBI, EMBL
  • Genome specific databases (to be discussed):
    • UCSC, ENSEMBL
  • Highly annotated databases:
    • Human genes
      • Genecards
    • Proteins:
      • Swiss-Prot, RefSeq
    • Structures:
      • PDB
As important:
  • Google (or any search engine)
And always remember:
  • RT(F)M – Read the manual!!! (see the Help and FAQ pages)
Gene Ontology
  • Strives to provide consistent descriptions of gene products obtained from different databases
  • GO annotations include three hierarchical ontologies of gene products:
    • cellular component(s) – the environment in which the gene product functions
    • biological process(es) – the biological program/pathway in which the gene product is involved
    • molecular function(s) – the elemental activities of the gene product
  • E.g., cytochrome c:
    • cellular components: mitochondrial matrix and mitochondrial inner membrane
    • biological processes: oxidative phosphorylation and induction of cell death
    • molecular functions: oxidoreductase activity
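Because the three ontologies are hierarchical, a gene product annotated with a specific term is implicitly annotated with all of that term's more general ancestors. This is a minimal sketch over a tiny, hand-made hierarchy. The parent links are a simplification for illustration: real GO uses several relation types (such as is_a and part_of) and many more intermediate terms. `ancestors` and `propagate` are hypothetical helpers.

```python
# Toy, simplified "child -> parents" links (illustrative only, not real GO data).
PARENTS = {
    "mitochondrial inner membrane": ["mitochondrial membrane"],
    "mitochondrial membrane": ["mitochondrion"],
    "mitochondrial matrix": ["mitochondrion"],
    "mitochondrion": ["cytoplasm"],
    "cytoplasm": ["cellular component"],  # root of this toy ontology
}


def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every term reachable by following parent links upward (excluding `term`)."""
    seen: set[str] = set()
    stack = list(parents.get(term, []))
    while stack:  # iterative depth-first walk; a term may have several parents
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(direct: set[str], parents: dict[str, list[str]]) -> set[str]:
    """Extend a gene product's direct annotations with all of their ancestors."""
    full = set(direct)
    for term in direct:
        full |= ancestors(term, parents)
    return full


# Cellular-component terms from the cytochrome c example above.
print(sorted(propagate({"mitochondrial matrix", "mitochondrial inner membrane"}, PARENTS)))
```

Walking up to the ancestors matters because a gene annotated only with "mitochondrial matrix" should still be counted when asking how many genes are associated with the mitochondrion.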
Enrichment analysis

  • Reference set: N genes in total, K of them with function f
  • Query set: n genes in total, k of them with function f

Is k/n significantly greater than K/N?
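The comparison itself is simple arithmetic: the fraction of query genes with function f against the fraction in the reference set. This is a minimal sketch on made-up toy numbers. The helper `enrichment` is hypothetical. The ratio alone does not say whether the difference is statistically significant.

```python
def enrichment(k: int, n: int, K: int, N: int) -> tuple[float, float, float]:
    """Return (observed fraction k/n, expected fraction K/N, fold enrichment)."""
    if n == 0 or K == 0:
        raise ValueError("fold enrichment is undefined when n or K is zero")
    observed = k / n            # fraction of query genes annotated with f
    expected = K / N            # fraction of reference genes annotated with f
    fold = (k * N) / (n * K)    # observed / expected, with a single division
    return observed, expected, fold


# Toy numbers: 6 of 20 query genes vs. 10 of 100 reference genes have function f.
print(enrichment(k=6, n=20, K=10, N=100))  # (0.3, 0.1, 3.0)
```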

Statistical significance testing

Problem formulation:

In a group of N genes there are K “special” ones

If we sample n genes out of N (without replacement) and find k “special” ones among them, could that be considered a random outcome?

Mathematically, we use the hypergeometric distribution to compute the probability of obtaining k or more “special” genes in a sample of n.
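The upper-tail probability can be computed directly by counting. The number of ways to draw i special genes and n − i ordinary ones is C(K, i)·C(N − K, n − i), out of C(N, n) possible samples. Summing from i = k upward gives the p-value. This is a minimal sketch using only the standard library. The helper name `hypergeom_sf` is hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when drawing n of N genes without replacement, K of them special.

    P(X = i) = C(K, i) * C(N - K, n - i) / C(N, n); the tail sums i = k .. min(n, K).
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    upper = min(n, K)  # cannot draw more special genes than exist or than were sampled
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)


# Toy example: 10 genes, 4 special; sample 3 and observe 2 or more special ones.
print(hypergeom_sf(k=2, n=3, K=4, N=10))  # 0.3333333333333333
```

If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` should give the same value. SciPy orders the parameters as population size, number of special items, and number of draws. When many functions are tested at once, the resulting p-values usually also need a multiple-testing correction.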

Materials & Methods

21,121 siRNA knockdown assays, covering virtually the entire protein-coding part of the genome

Results

273 HIV-dependency factors (HDFs) were discovered

Biological processes


Molecular functions

Subcellular localizations

Observations
  • Nuclear pore complex components: their loss may impede HIV's access to the nucleus
  • Mediator members (which couple TFs to Pol II): required for activators to bind the HIV LTRs
  • Enzymes involved in glycosylation: HIV’s envelope protein is heavily glycosylated, which assists virus entry into cells