Understanding protein lists from proteomics studies
This presentation is the property of its rightful owner.
Sponsored Links
1 / 25

Understanding protein lists from proteomics studies PowerPoint PPT Presentation


  • 59 Views
  • Uploaded on
  • Presentation posted in: General

Understanding protein lists from proteomics studies . Bing Zhang Department of Biomedical Informatics Vanderbilt University [email protected] A typical comparative shotgun proteomics study. IPI00375843 IPI00171798 IPI00299485 IPI00009542 IPI00019568 IPI00060627 IPI00168262

Download Presentation

Understanding protein lists from proteomics studies

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Understanding protein lists from proteomics studies

Understanding protein lists from proteomics studies

Bing Zhang

Department of Biomedical Informatics

Vanderbilt University

[email protected]


A typical comparative shotgun proteomics study

A typical comparative shotgun proteomics study

IPI00375843

IPI00171798

IPI00299485

IPI00009542

IPI00019568

IPI00060627

IPI00168262

IPI00082931

IPI00025084

IPI00412546

IPI00165528

IPI00043992

IPI00384992

IPI00006991

IPI00021885

IPI00377045

IPI00022471

…….

Li et.al. JPR, 2010

BCHM352, Spring 2011


Omics technologies generate gene protein lists

Omics technologies generate gene/protein lists

  • Genomics

    • Genome Wide Association Study (GWAS)

  • Transcriptomics

    • mRNA profiling

      • Microarrays

      • Serial analysis of gene expression (SAGE)

      • RNA-Seq

    • Protein-DNA interaction

      • Chromatin immunoprecipitation

  • Proteomics

    • Protein profiling

      • LC-MS/MS

    • Protein-protein interaction

      • Yeast two hybrid

      • Affinity pull-down/LC-MS/MS

BCHM352, Spring 2011


Sample files

Sample files

  • Samples files can be downloaded from

    • http://bioinfo.vanderbilt.edu/zhanglab/?q=node/410

  • Significant proteins

    • hnscc_sig_proteins.txt

  • Significant proteins with log fold change

    • hnscc_sig_withLogRatio.txt

  • All proteins identified in the study

    • hnscc_all_proteins.txt

BCHM352, Spring 2011


Understanding a protein list

Understanding a protein list

  • Level I

    • What are the proteins/genes behind the IDs and what do we know about the functions of the proteins/genes?

  • Level II

    • Which biological processes and pathways are the most interesting in terms of the experimental question?

  • Level III

    • How do the proteins work together to form a network?

BCHM352, Spring 2011


Level one information retrieval

Level one: information retrieval

Query interface (http://www.ebi.ac.uk/IPI)

Output

  • One-protein-at-a-time

  • Time consuming

  • Information is local and isolated

  • Hard to automate the information retrieval process

BCHM352, Spring 2011


Biomart a batch information retrieval system

Biomart: a batch information retrieval system

  • Biomart is a query-oriented data management system.

  • Batch information retrieval for complex queries

  • Particularly suited for providing 'data mining' like searches of complex descriptive data such as those related to genes and proteins

  • Open source and can be customized

  • Originally developed for the Ensembl genome databases

  • Adopted by many other projects including UniProt, InterPro, Reactome, Pancreatic Expression Database, and many others (see a complete list and get access to the tools from http://www.biomart.org/ )

BCHM352, Spring 2011


Biomart basic concepts

BioMart: basic concepts

  • Dataset

  • Filter

  • Attribute

  • From Prof. Kevin Schey (Biochemistry):“I’ve attached a spreadsheet of our proteomics results comparing 5 Vehicle and 5 Aldosterone treated patients. We’ve included only those proteins whose summed spectral counts are >30 in one treatment group. Would it be possible to get the GO annotationsfor these? The Uniprot name is listed in column Aand the gene name is listed in column R.  If this is a time consuming task (and I imagine that it is), can you tell me how to do it?”

  • From all human genes, selected those with the listed Uniprot IDs, and retrieve GO annotations.

BCHM352, Spring 2011


Ensembl biomart analysis

EnsemblBiomart analysis

  • Choose dataset

    • Choose database: Ensembl Genes 61

    • Choose dataset: Homo sapiens genes (GRch37)

  • Set filters

    • Gene: a list of genes/proteins identified by various database IDs (e.g. IPI IDs)

    • Gene Ontology: filter for proteins with specific GO terms (e.g. cell cycle)

    • Protein domains: filter for proteins with specific protein domains (e.g. SH2 domain)

    • Region: filter for genes in a specific chromosome region (e.g. chr1 1:1000000 or 11q13)

    • Others

  • Select output attributes

    • Gene annotation information in the Ensembl database, e.g. gene description, chromosome name, gene start, gene end, strand, band, gene name, etc.

    • External data: Gene Ontology, IDs in other databases

    • Expression: anatomical system, development stage, cell type, pathology

    • Protein domains: SMART, PFAM, Interpro, etc.

BCHM352, Spring 2011


Ensembl biomart query interface

EnsemblBioMart: query interface

Perl API

Count

Help

Results

Choose dataset

Set filters

Select output attributes

BCHM352, Spring 2011


Ensembl biomart sample output

EnsemblBiomart: sample output

Export all results to a file

BCHM352, Spring 2011


Understanding a protein list1

Understanding a protein list

  • Level I

    • What are the proteins/genes behind the IDs and what do we know about the functions of the proteins/genes?

  • Level II

    • Which biological processes and pathways are the most interesting in terms of the experimental question?

  • Level III

    • How do the proteins work together to form a network?

BCHM352, Spring 2011


Re organization of the information

Re-organization of the information

Differential genes

Functional category

Differential genes

Functional category

BCHM352, Spring 2011


Enrichment analysis

Enrichment analysis

MMP9

SERPINF1

A2ML1

F2

FN1

LYZ

TNXB

FGG

MPO

FBLN1

THBS1

HDLBP

GSN

FBN1

CA2

P11

CCL21

FGB

……

  • Enrichment analysis: is a functional group (e.g. cell cycle) significantly associated with the experimental question?

Random

Observed

9.2

22

IPI00375843IPI00171798IPI00299485IPI00009542IPI00019568IPI00060627IPI00168262IPI00082931IPI00025084IPI00412546IPI00165528IPI00043992IPI00384992IPI00006991IPI00021885…...

180

180

83

83

1305

1305

Compare

annotated

All identified proteins (1733)

Filter for significant proteins

Differentially expressed protein list (260 proteins)

Extracellular space

(83 proteins)

BCHM352, Spring 2011


Enrichment analysis hypergeometric test

Enrichment analysis: hypergeometric test

Hypergeometric test:given a total of m proteins where j proteins are in the functional group, if we pick n proteins randomly, what is the probability of having k or more proteins from the group?

Observed

k

n

j

m

Zhang et.al. Nucleic Acids Res. 33:W741, 2005

BCHM352, Spring 2011


Commonly used functional groups

Commonly used functional groups

  • Gene Ontology (http://www.geneontology.org)

    • Structured, precisely defined, controlled vocabulary for describing the roles of genes and gene products

    • Three organizing principles: molecular function, biological process, and cellular component

  • Pathways

    • KEGG (http://www.genome.jp/kegg/pathway.html)

    • Pathway commons (http://www.pathwaycommons.org)

    • WikiPathways (http://www.wikipathways.org)

  • Cytogenetic bands

BCHM352, Spring 2011


Webgestalt web based gene set analysis toolkit

WebGestalt: Web-based Gene Set Analysis Toolkit

WebGestalt

8 organisms

132 ID types

http://bioinfo.vanderbilt.edu/webgestalt

73,986 functional groups

Zhang et.al. Nucleic Acids Res. 33:W741, 2005

Duncan et al. BMC Bioinformatics. 11(Suppl 4) :P10, 2010

BCHM352, Spring 2011


Webgestalt analysis

WebGestalt analysis

  • Select the organism of interest.

  • Upload a gene/protein list in the txt format, one ID per row. Optionally, a value can be provided for each ID. In this case, put the ID and value in the same row and separate them by a tab. Then pick the ID type that corresponds to the list of IDs.

  • Categorize the uploaded ID list based upon GO Slim (a simplified version of Gene Ontology that focuses on high level classifications).

  • Analyze the uploaded ID list for for enrichment in various biological contexts. You will need to select an appropriate predefined reference set or upload a reference set. If a customized reference set is uploaded, ID type also needs to be selected. After this, select the analysis parameters (e.g., significance level, multiple test adjustment method, etc.).

  • Retrieve enrichment results by opening the respective results files. You may also open and/or download a TSV file, or download the zipped results to a directory on your desktop.

BCHM352, Spring 2011


Webgestalt id mapping

WebGestalt: ID mapping

  • Input list

    • 260 significant proteins identified in the HNSCC study (hnscc_sig_withLogRatio.txt)

  • Mapping result

    • Total number of User IDs: 260. Unambiguously mapped User IDs to Entrez IDs: 229. Unique User Entrez IDs: 224. The Enrichment Analysis will be based upon the unique IDs.

BCHM352, Spring 2011


Webgestalt goslim classification

WebGestalt: GOSlim classification

Molecular function

Biological process

Cellular component

BCHM352, Spring 2011


Webgestalt top 10 enriched go biological processes

WebGestalt: top 10 enriched GO biological processes

Reference list:

CSHL2010_hnscc_all_proteins.txt

BCHM352, Spring 2011


Webgestalt top 10 enriched wikipathways

WebGestalt: top 10 enriched WikiPathways

BCHM352, Spring 2011


Understanding a protein list2

Understanding a protein list

  • Level I

    • What are the proteins/genes behind the IDs and what do we know about the functions of the proteins/genes?

  • Level II

    • Which biological processes and pathways are the most interesting in terms of the experimental question?

  • Level III

    • How do the proteins work together to form a network?

BCHM352, Spring 2011


Resources

Resources

  • GeneMANIA

    • http://genemania.org

  • String

    • http://string-db.org/

  • Genes2Networks

    • http://actin.pharm.mssm.edu/genes2networks/

BCHM352, Spring 2011


Understanding a protein list summary

Understanding a protein list: summary

  • Level I

    • What are the proteins/genes behind the IDs and what do we know about the functions of the proteins/genes?

    • Biomart (http://www.biomart.org/)

  • Level II

    • Which biological processes and pathways are the most interesting in terms of the experimental question?

    • WebGestalt (http://bioinfo.vanderbilt.edu/webgestalt)

    • Related tools: DAVID (http://david.abcc.ncifcrf.gov/), GenMAPP (http://www.genmapp.org/)

  • Level III

    • How do the proteins work together to form a network?

    • GeneMANIA (http://genemania.org)

    • Related tools: Cytoscape (http://www.cytoscape.org/), STRING (http://string.embl.de/), Genes2Networks (http://actin.pharm.mssm.edu/genes2networks), Ingenuity (http://www.ingenuity.com/), Pathway Studio (http://www.ariadnegenomics.com/products/pathway-studio/)

BCHM352, Spring 2011


  • Login