ArrayExpress and Expression Atlas: Mining Functional Genomics data


Gabriella Rustici, PhD

Functional Genomics Team, EBI-EMBL

gabry@ebi.ac.uk

What is functional genomics (FG)?
  • The aim of FG is to understand the function of genes and other parts of the genome
  • FG experiments typically utilize genome-wide assays to measure and track many genes (or proteins) in parallel under different conditions
  • High-throughput technologies such as microarrays and high-throughput sequencing (HTS) are frequently used in this field to interrogate the transcriptome
What biological questions is FG addressing?
  • When and where are genes expressed?
  • How do gene expression levels differ in various cell types and states?
  • What are the functional roles of different genes and in what cellular processes do they participate?
  • How are genes regulated?
  • How do genes and gene products interact?
  • How is gene expression changed in various diseases or following a treatment?

Components of a FG experiment

FG public repositories: ArrayExpress
  • Is a public repository for FG data, which provides easy access to well annotated data in a structured and standardized format
  • Serves the scientific community as an archive for data supporting publications, together with GEO at NCBI and CIBEX at DDBJ
  • Facilitates the sharing of experimental information associated with the data, such as microarray designs, experimental protocols, etc.
  • Based on community standards: MIAME guidelines & MAGE-TAB format for microarray, MINSEQE guidelines for HTS data (http://www.mged.org/minseqe/)
Community standards for data requirement
  • MIAME = Minimal Information About a Microarray Experiment
  • MINSEQE = Minimal Information about a high-throughput Nucleotide SEQuencing Experiment
  • The checklist:
Standards for microarray & sequencing: MAGE-TAB format
  • MAGE-TAB is a simple spreadsheet format that uses a number of different files to capture information about a microarray or HTS experiment
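Because MAGE-TAB files are tab-delimited spreadsheets, they can be read with an ordinary CSV parser. The sketch below parses a small table in the style of the MAGE-TAB sample and data file (the SDRF) and groups data files by experimental factor value. The rows, file names, and the `read_sdrf` and `files_by_factor` helpers are invented for illustration; real files carry many more columns.

```python
import csv
import io

# Hypothetical, heavily trimmed SDRF-style table (tab-delimited).
# The rows and file names are invented for illustration only.
SDRF_TEXT = (
    "Source Name\tCharacteristics[organism]\tFactor Value[disease]\tArray Data File\n"
    "sample 1\tHomo sapiens\tnormal\tsample1.CEL\n"
    "sample 2\tHomo sapiens\tsarcoma\tsample2.CEL\n"
    "sample 3\tHomo sapiens\tsarcoma\tsample3.CEL\n"
)


def read_sdrf(text):
    """Parse tab-delimited SDRF-style text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def files_by_factor(rows, factor_column, file_column="Array Data File"):
    """Group data file names by the value of one experimental factor column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[factor_column], []).append(row[file_column])
    return groups


rows = read_sdrf(SDRF_TEXT)
groups = files_by_factor(rows, "Factor Value[disease]")
print(groups)
```

Grouping by a factor column is often the first step when re-analyzing a downloaded experiment, since it tells you which data files belong to which condition.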

What is the difference between them?

ArrayExpress Archive

  • Central object: experiment
  • Query to retrieve experimental information and associated data

Expression Atlas

  • Central object: gene/condition
  • Query for gene expression changes across experiments and across platforms

ArrayExpress Archive – when to use it?
  • Find FG experiments that might be relevant to your research
  • Download data and re-analyze it.

Often data deposited in public repositories can be used to answer different biological questions from the one asked in the original experiments.

  • Submit microarray or HTS data that you want to publish.

Major journals will require data to be submitted to a public repository like ArrayExpress as part of the peer-review process.

HTS data in AE Archive (as of mid-September 2012)

Microarray vs HTS

RNA-, DNA-, ChIP-seq breakdown

Browsing the AE Archive

The date when the data were loaded in the Archive

Number of assays

Species investigated

Curated title of experiment

AE unique experiment ID

Loaded in Atlas flag

Raw sequencing data available in ENA

The direct link to raw and processed data. An icon indicates that this type of data is available.

The total number of experiments and assays retrieved

The list of experiments retrieved can be printed, saved in tab-delimited format, or exported to Excel or as an RSS feed

Searching AE with the Experimental Factor Ontology (EFO)
  • Application-focused ontology modeling the relationships between experimental factors (EFs) in AE
  • Developed to:
      • increase the richness of annotations that are currently made in the AE Archive
      • promote consistency
      • facilitate automatic annotation and integrate external data
  • EFs are transformed into an ontological representation, forming classes and relationships between those classes
  • Combines terms from a subset of well-maintained and compatible ontologies, e.g. Gene Ontology, NCBI Taxonomy


Building EFO

An example

  • Take all experimental factors
  • Find the logical connection between them
  • Organize them in an ontology

Relationships between the example terms:

  • disease is the parent term
  • neoplasm is a type of disease
  • cancer is a synonym of neoplasm
  • sarcoma is a type of cancer
  • Kaposi’s sarcoma is a type of sarcoma
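The example terms (disease, neoplasm, cancer, sarcoma, Kaposi's sarcoma) and their "is a type of" and "is synonym of" relationships can be sketched as a small data structure. This is only an illustration of the idea, not how EFO is actually stored: the triple representation, the `is_a` and `synonym_of` labels, and the `build_ontology` and `render` helpers are assumptions made for this sketch.

```python
# Term relationships from the slide example, as (term, relation, target) triples.
# The relation labels "is_a" and "synonym_of" are hypothetical names.
RELATIONS = [
    ("neoplasm", "is_a", "disease"),
    ("cancer", "synonym_of", "neoplasm"),
    ("sarcoma", "is_a", "cancer"),
    ("Kaposi's sarcoma", "is_a", "sarcoma"),
]


def build_ontology(relations):
    """Return (children, canonical) maps built from relation triples."""
    canonical = {}  # synonym -> preferred term
    for term, relation, target in relations:
        if relation == "synonym_of":
            canonical[term] = target
    children = {}  # preferred parent term -> list of preferred child terms
    for term, relation, target in relations:
        if relation == "is_a":
            parent = canonical.get(target, target)
            child = canonical.get(term, term)
            children.setdefault(parent, []).append(child)
    return children, canonical


def render(term, children, depth=0):
    """Return the subtree under `term` as a list of indented lines."""
    lines = ["  " * depth + term]
    for child in children.get(term, []):
        lines.extend(render(child, children, depth + 1))
    return lines


children, canonical = build_ontology(RELATIONS)
tree_lines = render("disease", children)
print("\n".join(tree_lines))
```

In this sketch a synonym is folded into its preferred term, so "sarcoma is a type of cancer" ends up under "neoplasm". That is a design choice of the example rather than a statement about EFO itself.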


Exploring EFO

An example

More information at: http://www.ebi.ac.uk/efo

AE Archive query output
  • Matches to exact terms are highlighted in yellow
  • Matches to synonyms are highlighted in green
  • Matches to child terms in the EFO are highlighted in pink
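The three kinds of match can be illustrated with a short sketch that classifies an experiment annotation against a query term as an exact, synonym, or child match. The mini-ontology, the `classify_match` function, and the sample annotations are hypothetical; the real search works on the full EFO.

```python
# Hypothetical mini-ontology; terms and relationships are illustrative only.
CHILDREN = {
    "disease": ["neoplasm"],
    "neoplasm": ["sarcoma"],
    "sarcoma": ["Kaposi's sarcoma"],
}
SYNONYMS = {"neoplasm": ["cancer"]}

# Highlight colours as described in the query output.
HIGHLIGHT = {"exact": "yellow", "synonym": "green", "child": "pink"}


def descendants(term):
    """Collect all terms below `term` in the hierarchy."""
    found = []
    for child in CHILDREN.get(term, []):
        found.append(child)
        found.extend(descendants(child))
    return found


def classify_match(query, annotation):
    """Return 'exact', 'synonym', 'child', or None for one annotation."""
    if annotation == query:
        return "exact"
    if annotation in SYNONYMS.get(query, []):
        return "synonym"
    if annotation in descendants(query):
        return "child"
    return None


annotations = ["neoplasm", "cancer", "Kaposi's sarcoma", "normal"]
matches = {a: classify_match("neoplasm", a) for a in annotations}
for annotation, kind in matches.items():
    print(annotation, kind, HIGHLIGHT.get(kind))
```

A query for "neoplasm" therefore also retrieves experiments annotated with a synonym or with any more specific term, which is the main benefit of ontology-driven search over plain text matching.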

Expression Atlas – when to use it?
  • Find out if the expression of a gene (or a group of genes with a common gene attribute, e.g. GO term) change(s) across all the experiments available in the Expression Atlas;
  • Discover which genes are differentially expressed in a particular biological condition that you are interested in.
Expression Atlas construction: experiment selection criteria during curation
  • Array (platform) designs relating to the experiment must be provided. Probe annotation must be adequate to enable re-annotation of external references (e.g. Ensembl gene ID, UniProt ID)
  • At least 3 replicates for each value of the experimental factor
  • Maximum 4 experimental factors
  • Adequate sample annotation using EFO terms
  • Presence of data files: CEL raw data files for Affymetrix assays, processed data files for non-Affymetrix ones
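Two of these criteria are purely numerical (at least 3 replicates per factor value, at most 4 experimental factors) and can be checked mechanically. The sketch below shows one way to do that. The sample representation (one dictionary of factor name to factor value per sample) and the `check_experiment` function are assumptions made for illustration, not the curation team's actual tooling.

```python
from collections import Counter

MIN_REPLICATES = 3  # from the criteria: at least 3 replicates per factor value
MAX_FACTORS = 4     # from the criteria: maximum 4 experimental factors


def check_experiment(samples):
    """Return a list of failed criteria for one experiment.

    `samples` is a list of dicts mapping factor name -> factor value,
    one dict per sample (a hypothetical representation).
    """
    problems = []
    factors = sorted({name for sample in samples for name in sample})
    if len(factors) > MAX_FACTORS:
        problems.append(f"too many experimental factors: {len(factors)}")
    for factor in factors:
        counts = Counter(s[factor] for s in samples if factor in s)
        for value, n in sorted(counts.items()):
            if n < MIN_REPLICATES:
                problems.append(f"{factor}={value} has only {n} replicate(s)")
    return problems


good = [{"disease": "normal"}] * 3 + [{"disease": "sarcoma"}] * 3
bad = [{"disease": "normal"}] * 3 + [{"disease": "sarcoma"}] * 2
print(check_experiment(good))
print(check_experiment(bad))
```

The remaining criteria (array design availability, probe annotation quality, EFO annotation, presence of data files) need curator judgment or file inspection and are not covered here.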

Expression Atlas construction: analysis pipeline
  • Input data: Affymetrix CEL files, or processed data for non-Affymetrix platforms
  • Linear model* (Bioconductor limma)
  • Output: 2-D matrix of genes by conditions (Cond.1, Cond.2, Cond.3), where 1 = differentially expressed and 0 = not differentially expressed

* More information about the statistical methodology: http://nar.oxfordjournals.org/content/38/suppl_1/D690.full


“Is gene X differentially expressed in condition 1 in this experiment?”

  • Each data point = a single expression value for gene X
  • For each condition (Cond.1, Cond.2, Cond.3), compare the condition mean with the mean of all samples and calculate a statistic
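The comparison of each condition mean with the mean of all samples can be illustrated with a deliberately simplified sketch. It is not the Atlas pipeline: the deck names a linear model fitted with Bioconductor limma, whereas the code below uses a plain t-like statistic as a stand-in, with no moderation and no multiple-testing correction. The expression values, the threshold of 4.0, and the function names are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical log-scale expression values for one gene in one experiment.
gene_x = {
    "Cond.1": [9.0, 9.4, 9.2],
    "Cond.2": [7.0, 7.6, 7.3],
    "Cond.3": [5.2, 5.6, 5.4],
}


def condition_statistics(values_by_condition):
    """Compare each condition mean with the mean of all samples.

    Returns a t-like statistic per condition: the difference between the
    condition mean and the overall mean, divided by the standard error of
    the condition mean. Assumes each condition has at least two replicates
    with non-zero variance.
    """
    all_values = [v for values in values_by_condition.values() for v in values]
    overall = mean(all_values)
    stats = {}
    for condition, values in values_by_condition.items():
        standard_error = stdev(values) / sqrt(len(values))
        stats[condition] = (mean(values) - overall) / standard_error
    return stats


def call_differential(stats, threshold=4.0):
    """Turn statistics into 1 (differentially expressed) or 0 (not)."""
    return {c: int(abs(t) > threshold) for c, t in stats.items()}


stats = condition_statistics(gene_x)
calls = call_differential(stats)
print(calls)
```

Repeating this for every gene yields one row of 1/0 calls per gene, which is the genes-by-conditions matrix the pipeline produces for each experiment.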


Expression Atlas construction

  • Exp. 1 (Cond.1, Cond.2, Cond.3): statistical test on each gene
  • Exp. 2 (Cond.4, Cond.5, Cond.6): statistical test on each gene
  • Exp. n (Cond.X, Cond.Y, Cond.Z): statistical test on each gene

Each experiment has its own “verdict” or “vote” on whether a gene is differentially expressed or not under a certain condition

Expression Atlas construction

Summary of the “verdicts” from different experiments
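Summarizing the verdicts amounts to counting, for each gene and condition, how many experiments voted each way. A minimal sketch of that tally follows. The experiment names, gene names, and the "up", "down", and "none" labels are invented for illustration, and the real Atlas summary may weigh or present the votes differently.

```python
from collections import Counter

# Hypothetical per-experiment verdicts for (gene, condition) pairs:
# "up", "down", or "none" (not differentially expressed).
verdicts = [
    ("Exp.1", "geneX", "sarcoma", "up"),
    ("Exp.2", "geneX", "sarcoma", "up"),
    ("Exp.3", "geneX", "sarcoma", "down"),
    ("Exp.4", "geneX", "sarcoma", "none"),
    ("Exp.1", "geneY", "sarcoma", "none"),
]


def summarize(verdicts):
    """Count verdicts per (gene, condition) across experiments."""
    summary = {}
    for _experiment, gene, condition, verdict in verdicts:
        summary.setdefault((gene, condition), Counter())[verdict] += 1
    return summary


summary = summarize(verdicts)
for (gene, condition), votes in summary.items():
    print(gene, condition, dict(votes))
```

Keeping the direction of change in the tally is what makes it possible to restrict a query to up-regulated or down-regulated genes.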

Expression Atlas

Atlas home page: http://www.ebi.ac.uk/gxa/

Restrict query by direction of differential expression

Query for genes

Query for conditions

The ‘advanced query’ option allows building more complex queries

Atlas experiment page

Atlas heatmap view

Atlas advanced search


A glimpse of what’s coming…

“Differential atlas”

“Is gene X differentially expressed in condition 1 in this experiment?”

  • Each data point = a single expression value for gene X
  • For each condition (Cond.1, Cond.2, Cond.3), compare the condition mean with the mean of all samples and calculate a statistic


“Differential atlas” mock-up (1)


“Differential atlas” mock-up (2)


“Baseline atlas”

  • Gene expression in normal tissues, not looking for differentially expressed genes based on different conditions
  • E.g. “Give me all the genes expressed in normal human kidney”
  • Can also filter genes by expression level (e.g. FPKM values)
  • Start with Illumina Body Map 2.0 RNA-seq data
  • 16 tissues: adrenal, adipose, brain, breast, colon, heart, kidney, liver, lung, lymph, ovary, prostate, skeletal muscle, testes, thyroid, and white blood cells
  • We are working on something similar for mouse
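A baseline query such as “Give me all the genes expressed in normal human kidney”, with an optional expression-level filter, can be sketched as a lookup over a gene-by-tissue table of FPKM values. The gene names, FPKM numbers, default cutoff of 1.0, and the `expressed_in` function are invented for illustration and do not come from the Illumina Body Map data.

```python
# Hypothetical FPKM values; gene names and numbers are invented for illustration.
fpkm = {
    "geneA": {"kidney": 52.0, "liver": 0.4, "brain": 3.1},
    "geneB": {"kidney": 0.2, "liver": 88.5, "brain": 0.0},
    "geneC": {"kidney": 7.5, "liver": 6.9, "brain": 8.2},
}


def expressed_in(tissue, table, min_fpkm=1.0):
    """Return genes whose FPKM in `tissue` is at least `min_fpkm`, highest first."""
    hits = [
        (gene, levels[tissue])
        for gene, levels in table.items()
        if levels.get(tissue, 0.0) >= min_fpkm
    ]
    return [gene for gene, _ in sorted(hits, key=lambda hit: hit[1], reverse=True)]


kidney_genes = expressed_in("kidney", fpkm)
kidney_high = expressed_in("kidney", fpkm, min_fpkm=10.0)
print(kidney_genes, kidney_high)
```

Unlike the differential queries, nothing is compared between conditions here: the answer depends only on the expression level in the chosen tissue and the cutoff.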


“Baseline atlas” mock-up display

Data submission to AE

www.ebi.ac.uk/microarray/submissions.html

Submission of HTS data
  • ArrayExpress acts as a “broker” for the submitter:
    • Meta-data and processed data: ArrayExpress
    • Raw sequence reads* (e.g. fastq, bam): ENA

*See http://www.ebi.ac.uk/ena/about/sra_data_format for accepted read file formats
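The split between the two archives can be pictured as a simple routing rule: raw sequence reads go to ENA, everything else (meta-data and processed data) stays in ArrayExpress. The sketch below decides by file extension. The extension list covers only the two formats named above, and the `destination` function and file names are hypothetical; the actual brokering is handled by the submission system, not by the submitter.

```python
from pathlib import PurePath

# Raw read formats named above; treating extensions as the test is an assumption.
RAW_READ_SUFFIXES = {".fastq", ".bam"}
COMPRESSION_SUFFIXES = {".gz", ".bz2"}


def destination(filename):
    """Return which archive would hold a submitted file."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    # Ignore trailing compression suffixes such as ".gz".
    while suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes.pop()
    if suffixes and suffixes[-1] in RAW_READ_SUFFIXES:
        return "ENA"
    return "ArrayExpress"


files = ["sample1.fastq.gz", "aligned.bam", "experiment.sdrf.txt", "counts.txt"]
routing = {name: destination(name) for name in files}
print(routing)
```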

Find out more
  • Visit our eLearning portal, Train online, at http://www.ebi.ac.uk/training/online/ for courses on ArrayExpress and Atlas
  • Watch this short YouTube video on how to navigate the MAGE-TAB submission tool: http://youtu.be/KVpCVGpjw2Y
  • Email us at: miamexpress@ebi.ac.uk
  • Atlas mailing list: arrayexpress-atlas@ebi.ac.uk
