Microarray Data Standardisation

Microarray Gene Expression Database group -- MGED

December, 2000


Public data repositories for microarray data

There is a growing consensus in the life science community on the need for public repositories of gene expression data, analogous to DDBJ/EMBL/GenBank for sequences


Some of the reasons:

  • Gradually building up gene expression profiles for various organisms, tissues, cell types, developmental stages and states, and under the influence of various compounds

  • Building up systematic knowledge about gene functions and networks through links to other genomics databases

  • Comparison of profiles, and access to and analysis of the data by third parties

  • Cross-validation of results and platforms (quality control)


Systematic gene expression profiling initiatives in public domain

The International Life Sciences Institute (ILSI) is coordinating a program undertaken by ~25 pharmaceutical and food companies to generate toxicity-related gene expression data under defined experimental conditions, in order to:

  • evaluate gene expression profiles in standardised test systems following exposure to toxicants

  • relate changes in gene expression to other measures of toxicity


Microarray data handling and analysis - a major bottleneck (Calculations by Jerry Lanfear)

  • Experiments:

    • 100 000 genes in human

    • 320 cell types

    • 2000 compounds

    • 3 time points

    • 2 concentrations

    • 2 replicates

  • Data:

    • 8 x 10^11 data points

    • 1 x 10^15 bytes = 1 petabyte of data
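The data-point figure above is simply the product of the experiment dimensions. A minimal Python sketch of that arithmetic follows; the variable names are our own, and the bytes-per-point value is only what the slide's 1 petabyte total implies, since the slide does not say what each data point stores.

```python
# Dimensions of the hypothetical large-scale experiment listed above
genes = 100_000
cell_types = 320
compounds = 2_000
time_points = 3
concentrations = 2
replicates = 2

data_points = genes * cell_types * compounds * time_points * concentrations * replicates
print(f"{data_points:.2e} data points")  # 7.68e+11, i.e. roughly 8 x 10^11

# Storage per data point implied by the slide's estimate of 10^15 bytes
bytes_per_point = 1e15 / data_points
print(f"about {bytes_per_point:.0f} bytes per data point")
```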


Expression data repository projects

  • Public repositories in the making:

    • GEO - NCBI

    • GeneX - NCGR

    • ArrayExpress - EBI

  • In-house databases - Stanford, MIT, University of Pennsylvania

  • Organism-specific databases: mouse at the Jackson Laboratory

  • Proprietary databases - Gene Logic, NCI


Difficulties

  • Raw data are images

  • What is needed for higher-level analysis and mining is a gene expression matrix (genes/samples/gene expression levels); building one is hampered by:

    • lack of standard measurement units for gene expression

    • lack of standards for sample annotation


Raw data - images

  • Treated sample labeled red (Cy5); control sample labeled green (Cy3)

  • Competitive hybridization onto the chip

  • Red spot: gene overexpressed in the treated sample

  • Green spot: gene underexpressed in the treated sample

  • Yellow spot: gene equally expressed

  • Intensity: “absolute” expression level

  • red/green: ratio of expression (2 = 2x overexpressed, 0.5 = 2x underexpressed)

  • log2(red/green): “log ratio” (1 = 2x overexpressed, -1 = 2x underexpressed)

[Figure: cDNA spotted microarray, Stanford University (yeast, 1997)]
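The ratio and log-ratio conventions above can be sketched in Python as follows. This is an illustration only: it assumes the red (Cy5) and green (Cy3) intensities for a spot are already background-corrected positive numbers, a step the slide does not cover, and `log_ratio` is a hypothetical helper name.

```python
import math

def log_ratio(red: float, green: float) -> float:
    """Return log2(red/green) for one spot; positive means higher in the treated (Cy5) sample."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive to take a log ratio")
    return math.log2(red / green)

# 2x overexpressed -> 1, 2x underexpressed -> -1, equally expressed (yellow) -> 0
for red, green in [(2000.0, 1000.0), (500.0, 1000.0), (800.0, 800.0)]:
    print(red, green, log_ratio(red, green))
```

The log transform is what makes 2x over- and 2x underexpression symmetric around zero (1 and -1), whereas the raw ratios 2 and 0.5 are not.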


Gene expression matrix

[Figure: gene expression matrix, with genes and samples as its dimensions and gene expression levels as its entries]


Gene expression levels

  • What we would like to have:

    • gene expression levels expressed in some standard units (e.g., molecules per cell)

    • a reliability measure associated with each value (e.g., standard deviation)

  • What we actually have:

    • each experiment using different units

    • no reliability information
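One way to attach a reliability measure to each value, as wished for above, is to store replicate statistics alongside the level. A minimal sketch, assuming replicate measurements of the same gene in the same units are available; `ExpressionValue` and `summarise_replicates` are hypothetical names, and using the mean and sample standard deviation is our choice for illustration.

```python
from dataclasses import dataclass
from statistics import mean, stdev

@dataclass(frozen=True)
class ExpressionValue:
    """An expression level paired with a reliability measure."""
    level: float    # mean of the replicate measurements
    std_dev: float  # sample standard deviation across replicates
    n: int          # number of replicates the estimate is based on

def summarise_replicates(values: list[float]) -> ExpressionValue:
    """Collapse replicate measurements of one gene into a value plus its spread."""
    if len(values) < 2:
        raise ValueError("need at least two replicates to estimate a standard deviation")
    return ExpressionValue(level=mean(values), std_dev=stdev(values), n=len(values))

print(summarise_replicates([1.0, 1.2, 0.8]))
```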


Comparing expression data

[Figure]


Measurement units

  • Longer-term solution:

    • standard controls for experiments (on chips and in the samples)

    • replicate measurements

  • Temporary solution:

    • storing intermediate analysis results (including the images) together with annotations of how they were obtained, i.e., the evidence


Comparing expression data - problem 2

  • How do gene names in different data matrices relate to each other?

  • How do samples in different data matrices relate to each other?
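A common way to tackle the gene-naming question is to map each source's identifiers to a shared identifier before joining. A minimal sketch; every identifier, value and the cross-reference tables below are invented for illustration, and `rekey` and `merge_on_common_ids` are hypothetical helpers.

```python
# Two expression matrices from different sources, each keyed by its own gene IDs
lab_a = {"a_001": [0.5, 1.1], "a_002": [-0.3, 0.2]}
lab_b = {"b_x": [0.7], "b_z": [1.9]}

# Hypothetical cross-reference tables mapping each source's IDs to a shared ID
to_common_a = {"a_001": "G1", "a_002": "G2"}
to_common_b = {"b_x": "G1", "b_z": "G3"}

def rekey(matrix, mapping):
    """Re-key a matrix by the shared identifier, dropping genes with no mapping."""
    return {mapping[g]: row for g, row in matrix.items() if g in mapping}

def merge_on_common_ids(a, map_a, b, map_b):
    """Join two matrices on shared gene IDs; columns of a come before columns of b."""
    ra, rb = rekey(a, map_a), rekey(b, map_b)
    shared = sorted(ra.keys() & rb.keys())
    return {g: ra[g] + rb[g] for g in shared}

print(merge_on_common_ids(lab_a, to_common_a, lab_b, to_common_b))
# {'G1': [0.5, 1.1, 0.7]}
```

This only addresses naming: placing the values side by side is meaningful only if both matrices use comparable units, which is the measurement-unit problem discussed above.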


Sample annotation

  • Gene expression data are meaningful only in the context of the experimental conditions of the target system

  • Controlled vocabularies and ontologies (species, cell types, compound nomenclature, treatments, etc.) are needed for unambiguous sample annotation

  • Sample annotations in current public databases are typically useless
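Checking annotations against controlled vocabularies can be automated at submission time. A minimal sketch; the vocabularies below are tiny hypothetical stand-ins for real ontologies, and `validate_annotation` is a hypothetical helper.

```python
# Hypothetical controlled vocabularies; real ones would come from agreed ontologies
VOCABULARIES = {
    "species": {"Homo sapiens", "Mus musculus", "Saccharomyces cerevisiae"},
    "cell_type": {"hepatocyte", "fibroblast"},
}

def validate_annotation(annotation: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the annotation passes."""
    problems = []
    for field, allowed in VOCABULARIES.items():
        value = annotation.get(field)
        if value is None:
            problems.append(f"missing field: {field}")
        elif value not in allowed:
            problems.append(f"{field}: '{value}' is not a controlled term")
    return problems

print(validate_annotation({"species": "Homo sapiens", "cell_type": "hepatocyte"}))  # []
print(validate_annotation({"species": "human"}))
```

Free-text values such as "human" are exactly what makes annotations hard to query; rejecting them at submission forces the unambiguous term.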


In the longer term

  • Standard units for gene expression measurements

  • Standards for sample annotation


More immediate actions

  • Understand what information about microarray experiments should be captured to make their descriptions reasonably self-contained

  • Develop a data exchange format able to capture this minimum information

  • Develop recommendations on how data should be normalised and which controls should be used
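As an example of the kind of method a normalisation recommendation might cover, here is global median centering of log ratios, one widely used simple approach for two-colour arrays. It is an illustration only: the slide does not say which method MGED would recommend, and the underlying assumption that most genes are unchanged does not hold for every experiment.

```python
from statistics import median

def median_center(log_ratios: list[float]) -> list[float]:
    """Shift one array's log ratios so that their median becomes 0.

    Assumes most genes are not differentially expressed, so an overall
    offset is treated as technical bias (e.g. unequal dye labelling).
    """
    m = median(log_ratios)
    return [x - m for x in log_ratios]

print(median_center([0.3, 0.5, 0.4, 2.4, -1.6]))
```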


MGED group

The MGED group is an open discussion group initially established at the Microarray Gene Expression Database meeting MGED 1 (14-15 November 1999, Cambridge, UK). The goal of the group is to facilitate the adoption of standards for DNA-array experiment annotation and data representation, as well as the introduction of standard experimental controls and data normalisation methods. The underlying aim is to facilitate the establishment of gene expression data repositories, the comparability of gene expression data from different sources, and the interoperability of different gene expression databases and data analysis software. Since 1999 the group has held two general meetings, and a third is planned for 2001.

For more information, see www.mged.org


MGED participants, including:

  • Affymetrix

  • Berkeley

  • DDBJ

  • DKFZ

  • EMBL

  • Gene Logic

  • Incyte

  • Max Planck Institute

  • NCBI

  • NCGR

  • NHGRI

  • Sanger Centre

  • Stanford

  • University of Pennsylvania

  • University of Washington

  • Whitehead Institute


Working groups

  • Microarray experiment annotations and minimum information standards (A. Brazma)

  • XML-data communication standards and interfaces (P. Spellman)

  • Ontology for sample description (M. Bittner)

  • Cross-platform comparison and normalisation (F. Holstege, R. Bumgarner)

  • Future user group - queries, query languages and data mining (M. Vingron)


MGED state of the art

  • Formulation of the “minimum information about a microarray experiment” (MIAME) needed to ensure that an experiment can be interpreted and reproduced

  • A data exchange format based on XML, the microarray markup language (MAML), submitted to the OMG in November


MIAME: six parts

1. Experimental design: the set of the hybridisation experiments as a whole

2. Array design: each array used and each element (spot) on the array

3. Samples: samples used, the extract preparation and labeling

4. Hybridizations: procedures and parameters

5. Measurements: images, quantitation, specifications

6. Controls: types, values, specifications

see www.mged.org for details
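A submission tool could check completeness against the six parts above. A minimal sketch; the dictionary keys are our own shorthand for the six parts, not field names defined by MIAME, and the check only tests that each part is present and non-empty, not whether its content is adequate.

```python
# The six MIAME parts listed above; the key names are our own shorthand
MIAME_PARTS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridisations",
    "measurements",
    "controls",
)

def missing_miame_parts(submission: dict) -> list[str]:
    """Return the MIAME parts that are absent or empty in a submission."""
    return [part for part in MIAME_PARTS if not submission.get(part)]

# Hypothetical draft submission with only the first three parts filled in
draft = {
    "experimental_design": "time course after compound exposure",
    "array_design": {"platform": "cDNA spotted array"},
    "samples": ["treated", "control"],
}
print(missing_miame_parts(draft))  # ['hybridisations', 'measurements', 'controls']
```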


MIAME concepts

  • MIAME is aimed at the co-operative data submitter

  • Concept of “qualifier, value, source” lists, where the source is either user-defined or an external reference

  • Reusable information (array descriptions, standard protocols) can be referenced, but should be provided at least once

  • Raw data should be reported, together with the authors' interpretations
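The “qualifier, value, source” idea maps naturally onto a small record type. A minimal sketch; the `Qualifier` class is hypothetical, and "NCBI Taxonomy" is used only as an example of an external reference, not as a source MIAME prescribes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Qualifier:
    """One entry of a "qualifier, value, source" list."""
    qualifier: str
    value: str
    source: Optional[str] = None  # None means user-defined; otherwise an external reference

    @property
    def is_user_defined(self) -> bool:
        return self.source is None

# Hypothetical sample description
sample = [
    Qualifier("organism", "Mus musculus", source="NCBI Taxonomy"),
    Qualifier("cell type", "hepatocyte"),
    Qualifier("treatment", "compound X, 2 h"),
]

for q in sample:
    origin = "user-defined" if q.is_user_defined else f"from {q.source}"
    print(f"{q.qualifier} = {q.value} ({origin})")
```

Recording the source explicitly lets a repository tell controlled terms apart from free text without guessing.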


MAML

  • MAML is an XML-based data exchange format able to capture MIAME-compliant information

  • The work is still in progress; the first draft has been submitted to the OMG as a data exchange standard for microarray data


MAML concepts

  • Annotations + data; data can be given as a set of external 2D matrices

  • Data format independent of any particular scanner or image analysis software

  • Samples and treatments can be represented as a directed acyclic graph (DAG)

  • Concept of composite images and composite spots
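The “annotations + data” split, with data held in external 2D matrices, can be illustrated with Python's standard `xml.etree.ElementTree`. This is a toy sketch: every tag and attribute name below is hypothetical and is NOT the MAML schema, and `build_submission` and the matrix file name are invented for illustration.

```python
import xml.etree.ElementTree as ET

def build_submission(sample_id, qualifiers, matrix_file):
    """Build a toy XML document: annotations inline, data as an external 2D matrix.

    Tag and attribute names are hypothetical, NOT the MAML schema.
    """
    root = ET.Element("submission")
    sample = ET.SubElement(root, "sample", id=sample_id)
    for qualifier, value, source in qualifiers:
        attrs = {"qualifier": qualifier, "value": value}
        if source is not None:  # omit source for user-defined entries
            attrs["source"] = source
        ET.SubElement(sample, "annotation", attrs)
    # The numeric data live outside the XML; only a reference is stored here
    ET.SubElement(root, "matrix", href=matrix_file)
    return ET.tostring(root, encoding="unicode")

xml_text = build_submission(
    "sample_1",
    [("organism", "Mus musculus", "NCBI Taxonomy"), ("cell type", "hepatocyte", None)],
    "sample_1_matrix.tsv",
)
print(xml_text)
```

One motivation for keeping matrices external is that the annotation stays readable while large numeric tables are not wrapped value by value in markup.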


Sample and treatment representation

[Figure: Samples 1, 2 and 3 linked through treatments to Array 1 and Array 2]
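The figure's exact edges are not recoverable, so the graph below is hypothetical: two samples pooled into one extract, a third sample in a second extract, each extract labelled with both dyes, and two arrays hybridised in a dye-swap arrangement. Because one extract feeds both arrays, the structure is a DAG rather than a tree. The sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+), which raises `CycleError` on cycles, plus a small traversal that recovers which samples ended up on each array.

```python
from graphlib import TopologicalSorter

# Hypothetical provenance graph: each node maps to the set of its direct inputs
graph = {
    "extract_A": {"sample_1", "sample_2"},  # pooled extract from two samples
    "extract_B": {"sample_3"},
    "extract_A_Cy5": {"extract_A"},
    "extract_A_Cy3": {"extract_A"},
    "extract_B_Cy5": {"extract_B"},
    "extract_B_Cy3": {"extract_B"},
    "array_1": {"extract_A_Cy5", "extract_B_Cy3"},
    "array_2": {"extract_B_Cy5", "extract_A_Cy3"},  # dye swap
}

# static_order() raises graphlib.CycleError if the graph is not acyclic
order = list(TopologicalSorter(graph).static_order())

def ancestors(node, graph):
    """Return every node upstream of `node`."""
    seen = set()
    stack = [node]
    while stack:
        for parent in graph.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

samples_on = {
    array: sorted(n for n in ancestors(array, graph) if n.startswith("sample_"))
    for array in ("array_1", "array_2")
}
print(samples_on)
```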


Expression matrix - raw and processed

[Figure: raw data (images, spots, spot/image quantitations) and the processed gene expression matrix (genes, samples, gene expression levels)]
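Going from raw to processed data includes collapsing spot-level quantitations into one value per gene. A minimal sketch, assuming each gene is represented by replicate spots whose log ratios are simply averaged; the spot IDs, the spot-to-gene design mapping and the choice of the mean are all illustrative, and `spots_to_genes` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data for one hybridisation: spot ID -> log ratio
spot_values = {"spot_01": 1.0, "spot_02": 0.8, "spot_03": -0.5, "spot_04": -0.7}

# Hypothetical array design: the gene each spot represents (replicate spots share a gene)
spot_to_gene = {"spot_01": "gene_X", "spot_02": "gene_X",
                "spot_03": "gene_Y", "spot_04": "gene_Y"}

def spots_to_genes(values, design):
    """Average the values of all spots that map to the same gene."""
    grouped = defaultdict(list)
    for spot, value in values.items():
        grouped[design[spot]].append(value)
    return {gene: mean(vals) for gene, vals in grouped.items()}

# One column (this sample) of the processed gene expression matrix
print(spots_to_genes(spot_values, spot_to_gene))
```

Keeping the spot-level values as well as the gene-level result preserves the evidence for how each processed value was obtained.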


Microarray image analysis data representation

[Figure: spots (primary and composite spots), images (primary and composite images, e.g., green/red ratios) and their quantitations]


MAML future

  • The NOMAD microarray LIMS system will export data in MAML format

  • ArrayExpress and GEO will import data in MAML format

  • We hope that the OMG will accept MAML as the industry standard

  • We hope that MAML will become a de facto standard


MGED steering committee

  • Meeting in Bethesda on 17 Nov 2000

  • MIAME accepted; a publication urging journals and funding agencies to adopt it will be prepared

  • MGED will become an ISCB Special Interest Group

  • Next general MGED meeting at Stanford, March 29-31


