dbgg – database for genetical genomics update

dbgg – database for genetical genomics update. Morris Swertz ( [email protected] ), Braunschweig CASIMIR meeting, July 2, 2008. Objective: share genotype/phenotype data and tools.

Presentation Transcript
dbgg – database for genetical genomics update

Morris Swertz ([email protected])

Braunschweig CASIMIR meeting

July 2, 2008

Objective
  • Share genotype/phenotype data and tools:
Complicated experiments

[Figure: experiment workflow diagram with a numeric scale annotation on each node. Legend: main work flow, data dependency, biomaterial/result, lab/analysis process, scale of information, associated data files. Diagram labels: strains, genome, markers, inbreed, individuals, genotype, genotypes, map, QTL profiles, correlate, microarrays, probes, hybridize, expressions, preprocess, norm exprs., network.]

Barriers to sharing data

[Figure: the experiment workflow diagram divided among Collaborator 1, Collaborator 2, and Collaborator 3, annotated "Incompatible data!" and "Incomplete data!".]

Barriers to sharing data

[Figure: three copies of the experiment workflow diagram, labeled Investigation 1, Investigation 2, and Investigation 3, annotated "Incomplete and/or incompatible data!".]

Barriers to sharing software tools

[Figure: three copies of the experiment workflow diagram; the "QTL profiles" result (10,000) is highlighted in each copy.]

Hard to find and reuse tools

Use a standard tool?

[Figure: the experiment workflow diagram for microarrays. Diagram labels: strains, genome, markers, inbreed, individuals, genotype, genotypes, map, QTL profiles, correlate, microarrays, probes, hybridize, expressions, preprocess, norm exprs., network.]

More biotechnologies, more protocols

Use a standard tool? Yes, if it could be easily adapted! (and they can’t)

[Figure: variant of the experiment workflow diagram for other technologies. Legend: main work flow, data dependency, biomaterial/result, lab/analysis process, scale of information, associated data files. Diagram labels: strains, genome, SNP arrays, inbreed, individuals, genotype, genotypes, map, QTL profiles, correlate, LC/MS, mass peaks, preprocess, aligned peaks, network.]

Objectives
  • Share genotype/phenotype data and tools:
    • Interoperable software
        • Simple flat file exchange format
        • Database server
        • R/web-service interfaces
        • A procedure to extend the software
    • Build on extensible data model
        • Data
        • Annotations
        • Investigations
        • Integration references
  • Next steps
The software
  • Share genotype/phenotype data and tools:
    • Interoperable software
        • Simple flat file exchange format
        • Database server
        • R/web-service interfaces
        • A procedure to extend the software
    • Build on extensible data model
        • Data
        • Annotations
        • Investigations
        • Integration references
  • Next steps
Software: flat file exchange format
  • Raw and processed data in matrix form

E.g. microarray data: rows = individuals, columns = Affy probes.

Software: flat file exchange format
  • Annotation info in tabular form

E.g. probe annotation data: rows = probes, columns = attributes of each probe.
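The two flat-file shapes described above (a data matrix and an annotation table) can be sketched in Python as follows. This is only an illustration under stated assumptions: the tab delimiter, the empty top-left header cell, the column names (`name`, `gene`, `chromosome`), and all sample values are hypothetical and are not taken from the dbGG format specification.

```python
import csv
import io

# Hypothetical file contents; the real dbGG layout may differ.
MATRIX_TSV = "\tprobe_1\tprobe_2\nind_A\t7.1\t5.3\nind_B\t6.8\t5.9\n"
ANNOTATION_TSV = "name\tgene\tchromosome\nprobe_1\tgeneX\t1\nprobe_2\tgeneY\t4\n"


def read_matrix(text):
    """Return (row_names, col_names, values) from a tab-delimited matrix."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    col_names = rows[0][1:]  # the first header cell is assumed to be empty
    row_names = [r[0] for r in rows[1:]]
    values = [[float(v) for v in r[1:]] for r in rows[1:]]
    return row_names, col_names, values


def read_annotations(text):
    """Return {element name: attribute dict} from a tab-delimited table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["name"]: row for row in reader}


individuals, probes, expression = read_matrix(MATRIX_TSV)
probe_info = read_annotations(ANNOTATION_TSV)

# Every matrix column should have a matching annotation row.
missing = [p for p in probes if p not in probe_info]
```

Keeping the matrix and the annotations in separate files means the column names of the matrix act as keys into the annotation table, which is what the `missing` check relies on.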

Software: exchange an experiment

Described on http://gbic.biol.rug.nl/dbgg

[Figure: annotations and raw and processed data are loaded into the dbGG database with the dbGG Import tool and extracted again with the dbGG Export tool.]

Software: web user interface

http://gbicserver1.biol.rug.nl:8080/dbgg/molgenis.do

Software: interface to R

source("http://localhost:8080/molgenis4gg/R")

# download data
use.experiment(name="metanetwork")  # set default
traits <- get.metabolitedata(name="mytraits")
genotypes <- get.markerdata(name="mygenotypes")

# calculate mQTLs
library("MetaNetwork")
qtls <- qtlMapTwoPart(genotypes=genotypes, traits=traits, spike=4)

# upload results for others to use
add.mqtldata(qtls, name="myqtls")

inspect

MetaNetwork protocol:

Fu, Swertz, Keurentjes, Jansen, Nature Protocols, 2007.

This enables automatic processing (see also CASIMIR ‘use case 1’)

dbGG

Smedley, Swertz, Wolstencroft et al, Submitted.

Use BioMART and MOLGENIS to access data and Taverna to automate the workflows

[Figure: web services (ws) providing gene symbols, SNPs, strain SNP alleles, pathways, and your dbGG.]

Smedley, Swertz, Wolstencroft et al, Submitted.

Software: extension procedure (using MOLGENIS)

Little language

Domain specific language

<!-- entity organization -->
<entity name="Experiment" label="Experiment">
  <field name="ExperimentID" key="1" readonly="true"
         label="ExperimentID(autonum)"/>
  <field name="Medium" type="xref"
         xref_field="Medium.name"/>
  <field name="Protocol"
         label="Experiment Protocol"/>
  <field name="Temperature" type="int"

Reusable assets and generator/interpreter

+

dbGG v1: for microarrays

dbGG v2: for mass spectrometry
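To make the "little language plus generator" idea concrete, here is a minimal Python sketch that reads an entity description in the style of the slide and emits a SQL table definition. It is not the MOLGENIS generator: the type mapping, the default type, the choice of SQL as output, and the closing of the truncated `Temperature` field are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Model in the style of the slide; the Temperature field is closed here
# by assumption because the slide text is cut off at that point.
MODEL = """
<entity name="Experiment" label="Experiment">
  <field name="ExperimentID" key="1" readonly="true" label="ExperimentID(autonum)"/>
  <field name="Protocol" label="Experiment Protocol"/>
  <field name="Temperature" type="int"/>
</entity>
"""

# Assumed mapping from model types to SQL types (hypothetical).
SQL_TYPES = {"int": "INTEGER", "string": "VARCHAR(255)"}


def generate_ddl(model_xml):
    """Generate a CREATE TABLE statement from one <entity> description."""
    entity = ET.fromstring(model_xml)
    columns = []
    for field in entity.findall("field"):
        # Fields without a type attribute are treated as strings (assumption).
        sql_type = SQL_TYPES[field.get("type", "string")]
        column = f"{field.get('name')} {sql_type}"
        if field.get("key") == "1":
            column += " PRIMARY KEY"
        columns.append(column)
    body = ",\n  ".join(columns)
    return f"CREATE TABLE {entity.get('name')} (\n  {body}\n);"


ddl = generate_ddl(MODEL)
print(ddl)
```

The same model file could feed further generators (for a web form or an R interface, say), which is presumably what makes moving from a microarray variant to a mass spectrometry variant a matter of editing the model rather than the code.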

Website: demos and downloads

http://gbic.biol.rug.nl/dbgg

Outline
  • To share genotype/phenotype data and tools:

1. Interoperable software

        • Flat file exchange format
        • Database server
        • R/web-service interfaces
        • A procedure to extend the software

2. Build on extensible data model

        • Data
        • Annotations
        • Investigations
        • Integration references
  • Next steps
Data
  • Simple and close to current practice:

Genotype data

Subjects: STRAINS

Traits: MARKERS

DATA ELEMENTS

TRAIT × SUBJECT

Data
  • Simple and close to current practice:

Genotype data

Expression data

Subjects: INDIVIDUALS

Traits: PROBES

DATA ELEMENTS

TRAIT × SUBJECT

Data
  • Simple and close to current practice:

Genotype data

Expression data

Classic phenotype data

Metabolite abundance data

Protein abundance data

And so on…

TRAIT × SUBJECT

Data with any Dimension Type

SUBJECT
  • Individual
  • Strain
  • Sample

TRAIT
  • Probe
  • Marker
  • Mass Peak

DATA ELEMENT

TRAIT × SUBJECT

Data
  • Simple and close to current practice:

What about QTL data?

Traits: MARKERS

Traits: PROBES

DATA

Data
  • Simple and close to current practice:

What about QTL data?

Probe association data?

Interaction network data?

Traits: MARKERS

Traits: PROBES

DATA

TRAIT × TRAIT

SUBJECT × SUBJECT

Data with any Dimension Type
  • Minimal data model

[Figure: SUBJECT and TRAIT are both kinds of DIMENSION ELEMENT; each DATA ELEMENT refers to one dimension element for its row and one for its column.]
The data model
  • To share genotype/phenotype data and tools:
    • Extensible data model
        • Data
        • Annotations
        • Investigations
        • Integration references
Annotations
  • Simple and close to current practice

Probe annotations

  • PROBE IS A VARIANT OF TRAIT
  • HAVING:
  • Name
  • Gene
  • Chromosome
  • Locus
Annotation extends Trait or Subject

SUBJECT
  • STRAIN
      • Name
      • Type: CSS, RIL, ...
      • Parent Strains
  • INDIVIDUAL
      • Name
      • Strain
      • Mother
      • Father
      • Sex
  • SAMPLE
      • Name
      • Individual
      • Tissue

And so on

TRAIT
  • PROBE
      • Name
      • Gene
      • Chromosome
      • Locus
  • MARKER
      • Name
      • Allele
      • Chromosome
      • Locus
  • MASSPEAK
      • Name
      • MZ
      • RetentionTime

And so on

[Figure: SUBJECT and TRAIT are DIMENSION ELEMENTs; each DATA ELEMENT refers to one as its row and one as its column.]
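One way to picture "annotation extends Trait or Subject" is plain class inheritance. The Python sketch below is an assumption-laden illustration: the attribute names mirror the slide, while the types, defaults, the `describe` helper, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DimensionElement:
    name: str


@dataclass
class Trait(DimensionElement):
    pass


@dataclass
class Subject(DimensionElement):
    pass


@dataclass
class Probe(Trait):
    gene: Optional[str] = None
    chromosome: Optional[str] = None
    locus: Optional[float] = None


@dataclass
class MassPeak(Trait):
    mz: Optional[float] = None
    retention_time: Optional[float] = None


@dataclass
class Strain(Subject):
    type: Optional[str] = None  # e.g. "CSS" or "RIL"
    parent_strains: Tuple[str, ...] = ()


def describe(element: DimensionElement) -> str:
    """Generic code only needs the shared base class."""
    kind = "trait" if isinstance(element, Trait) else "subject"
    return f"{element.name} ({kind}, {type(element).__name__})"


probe = Probe("probe1", gene="geneX", chromosome="1")
strain = Strain("strainA", type="RIL", parent_strains=("P1", "P2"))
summary = [describe(probe), describe(strain)]
```

Tools written against `Trait` and `Subject` keep working when a new subtype such as `MassPeak` is added, which matches the "and so on" on the slide.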

Annotation simple in practice

[Figure: three data matrices built from DATA ELEMENTs. Genotype data: STRAIN by MARKER. Expression data: INDIVIDUAL by PROBE. QTL data: MARKER by PROBE.]

Extensions are automatic “under the hood”: PROBE isa TRAIT isa DIMENSION ELEMENT

Data and annotations

DATA ELEMENTS

PROBES

The data model
  • To share genotype/phenotype data and tools:
    • Extensible data model
        • Data
        • Annotations
        • Investigations
        • Integration references
Investigation workflow in the lab

[Figure: the genotype data (STRAIN by MARKER), expression data (INDIVIDUAL by PROBE), and QTL data (MARKER by PROBE) matrices, connected by question marks.]

Investigation building on FuGE

[Figure: the genotype, expression, and QTL data sets linked by protocol applications. Labels: Affy Array, Affy M430 protocol, Affy M430 platform, Bioconductor Norm., QTL Mapping, Mapping protocol, R software, SNP Array, Illumina protocol, Illumina BeadStudio. FuGE concepts shown: DATA, protocol application, Protocol, Equipment, Software.]

FuGE: Jones et al Nature Biotech 25, 1127-1133
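The idea of linking data sets through protocol applications can be sketched as a small provenance structure. The Python below is a deliberately simplified illustration, not the FuGE object model: the class fields, the `lineage` helper, and the example protocol and data names are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Protocol:
    name: str
    software: str = ""
    equipment: str = ""


@dataclass
class Data:
    name: str


@dataclass
class ProtocolApplication:
    """One use of a protocol that turns input data into an output data set."""
    protocol: Protocol
    inputs: List[Data]
    output: Data


def lineage(target, applications):
    """Return the protocol names that led to `target`, upstream steps first."""
    steps = []
    for app in applications:
        if app.output is target:
            for source in app.inputs:
                steps.extend(lineage(source, applications))
            steps.append(app.protocol.name)
    return steps


raw = Data("raw expression")
expression = Data("normalized expression")
genotypes = Data("genotypes")
qtl = Data("QTL profiles")

applications = [
    ProtocolApplication(Protocol("normalization", software="Bioconductor"),
                        [raw], expression),
    ProtocolApplication(Protocol("QTL mapping", software="R"),
                        [genotypes, expression], qtl),
]

history = lineage(qtl, applications)
```

With such links recorded, the question marks between data sets become explicit answers: each data set can report which protocols, software, and equipment produced it.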

Summary of data model

[Figure: data model overview. Entities: SUBJECT (STRAIN, INDIVIDUAL), TRAIT (PROBE, MARKER), DIMENSION ELEMENT, DATA, DATA ELEMENT (row, column), INVESTIGATION, PROTOCOL, PROTOCOL APPLICATION, Software, Equipment.]

The data model
  • To share genotype/phenotype data and tools:
    • Extensible data model
        • Data
        • Annotations
        • Investigations
        • Integration references
References for integration
  • Ontology references and database references

[Figure: INVESTIGATION 1 and INVESTIGATION 2, linked through shared references and a Hyperlink.]

Incompatible naming:
  • GENE: Name = Mip1alpha
  • GENE: Name = Mip1a

Map mouse on human ontologies:
  • ONTOLOGY ENTRY: Id = 0005615, Term = ABC, Ontology = GO
  • ONTOLOGY ENTRY: Id = MP:0005385, Term = cardiovascular, Ontology = MP

Compatible identifiers:
  • DATABASE REFERENCE: Id = ENSMUS098, Db = ENSEMBL
  • DATABASE REFERENCE: Id = ENSMU0S98, Db = ENSEMBL
  • DATABASE REFERENCE: Id = ENSMUS98, Db = ENSEMBL
  • DATABASE REFERENCE: Id = 1419561_AT, Db = AFFY 430

FuGE: Jones et al Nature Biotech 25, 1127-1133
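How database references can reconcile incompatible names is sketched below in Python. The gene names and the Affy probe id come from the slide; the Ensembl-style identifiers, the assignment of genes to investigations, the record layout, and the `group_by_reference` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: each gene carries (database, id) reference pairs.
genes = [
    {"investigation": 1, "name": "Mip1alpha",
     "db_refs": {("ENSEMBL", "ENSMUSG_EXAMPLE")}},
    {"investigation": 2, "name": "Mip1a",
     "db_refs": {("ENSEMBL", "ENSMUSG_EXAMPLE"), ("AFFY 430", "1419561_AT")}},
    {"investigation": 2, "name": "OtherGene",
     "db_refs": {("ENSEMBL", "ENSMUSG_OTHER")}},
]


def group_by_reference(records):
    """Map each shared (database, id) reference to the names that carry it."""
    groups = defaultdict(list)
    for rec in records:
        for ref in rec["db_refs"]:
            groups[ref].append(rec["name"])
    # Keep only references that actually link two or more records.
    return {ref: names for ref, names in groups.items() if len(names) > 1}


matches = group_by_reference(genes)
```

Here the two differently named genes are recognized as the same entity because they share one database reference, without either investigation having to rename anything.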

Summary of data model

[Figure: data model overview, extensible to more experiments. Entities: SUBJECT (STRAIN, INDIVIDUAL), TRAIT (PROBE, MARKER), DIMENSION ELEMENT, DATA, DATA ELEMENT (row, column), INVESTIGATION, PROTOCOL, PROTOCOL APPLICATION, Software, Equipment, ONTOLOGY ENTRY, Hyperlink, DATABASE REFERENCE.]

Todo
  • Publication: submitted
  • Building a catalog of tools on top of dbGG
      • Experiments: in Braunschweig and Groningen
        • Illumina, Affy, Metabolites
      • Tool ‘plug-ins’
        • QTL graphs, import of annotations etc.
  • Exploit interoperability
      • E.g. integrate mouse & human with ontologies
      • Load annotations from other dbGG/BioMARTs
      • Build on and extend R/Taverna interaction
Summary and questions
  • Share genotype/phenotype data and tools:
    • Interoperable software
        • Simple flat file exchange format
        • Database server
        • R/web-service interfaces
        • A procedure to extend the software
    • Build on extensible data model
        • Data
        • Annotations
        • Investigations
        • Integration references
  • Next steps
[email protected]

Morris A. Swertz

Bruno M. Tesson

Richard A. Scheltema

Gonzalo Vera

Rudi Alberts

Damian Smedley

Katy Wolstencroft

Andrew R. Jones

Klaus Schughart

John M. Hancock

Helen E. Parkinson

Engbert O. de Brock

Carole Goble

Paul Schofield

Ritsert C. Jansen

the GEN2PHEN consortium

the CASIMIR consortium

Thank you
Describe in little language

probes

individuals

expressions

Case GG: Generate and evaluate

http://gbic.biol.rug.nl/supplementary/2007/molgenis_showcase

Describe in little language

http://gbic.biol.rug.nl/supplementary/2007/molgenis_showcase