dbGG: database for genetical genomics (update)

Morris Swertz

Braunschweig CASIMIR meeting

July 2, 2008


Objective

  • Share genotype/phenotype data and tools:


Complicated experiments

[Figure: main workflow of a genetical genomics experiment. Biomaterials and results: strains, genome, markers, individuals, genotypes, microarrays, probes, expressions, normalized expressions, QTL profiles, network. Lab/analysis processes: inbreed, genotype, hybridize, preprocess, map, correlate. Each step is annotated with its scale of information, from 10 up to 10,000,000. Legend: main work flow, data dependency, biomaterial/result, lab/analysis process, scale of information, associated data files.]


Barriers to sharing data

[Figure: the same workflow, with its parts held by Collaborator 1, Collaborator 2 and Collaborator 3. Callouts: "Incompatible data!" and "Incomplete data!"]


Barriers to sharing data

[Figure: three investigations (Investigation 1, 2 and 3), each running its own copy of the workflow. Callout: "Incomplete and/or incompatible data!"]


Barriers to sharing software tools

[Figure: three investigations, each running its own copy of the workflow.]




Barriers to sharing software tools

  • Hard to find and reuse tools

[Figure: three separate "QTL profiles" analysis steps, one per investigation.]


Use a standard tool?

[Figure: the main workflow, from strains and markers through microarrays to QTL profiles and a network.]


More biotechnologies, more protocols

  • Yes, if it could be easily adapted! (and they can't)

[Figure: the workflow extended with new technologies: SNP arrays for genotyping, and LC/MS producing mass peaks that are preprocessed into aligned peaks before network analysis.]


Objectives

  • Share genotype/phenotype data and tools:

    • Interoperable software

      • Simple flat file exchange format

      • Database server

      • R/web-service interfaces

      • A procedure to extend the software

    • Build on extensible data model

      • Data

      • Annotations

      • Investigations

      • Integration references

  • Next steps


The software


    Software: flat file exchange format

    • Raw and processed data in matrix form

    E.g. microarray data: rows = individuals, columns = Affy probes.
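A minimal sketch of reading such a matrix file, assuming a tab-delimited layout in which the header row lists the column names (probes) after an empty corner cell and each data row starts with a row name (individual). This layout and the function name `read_matrix` are assumptions for illustration, not the documented dbGG file format.

```python
import csv
import io


def read_matrix(text):
    """Parse a tab-delimited matrix.

    Header row: an empty corner cell followed by the column names (traits).
    Data rows: a row name (subject) followed by one value per column.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    col_names = header[1:]  # skip the empty top-left cell
    row_names, values = [], []
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        row_names.append(fields[0])
        # empty cells become None, so missing values stay visible
        values.append([float(v) if v else None for v in fields[1:]])
    return row_names, col_names, values


# Hypothetical example: rows = individuals, columns = probes
example = (
    "\tprobe_A\tprobe_B\n"
    "ind1\t7.25\t5.5\n"
    "ind2\t8.0\t\n"
)
rows, cols, data = read_matrix(example)
print(rows)  # ['ind1', 'ind2']
print(cols)  # ['probe_A', 'probe_B']
print(data)  # [[7.25, 5.5], [8.0, None]]
```

Keeping missing cells as `None` rather than dropping them makes incomplete data explicit, which is one of the sharing barriers named earlier in the talk.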


    Software: flat file exchange format

    • Annotation info in tabular form

    E.g. probe annotation data: rows = probes, columns = attributes of each probe.
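Annotation tables can be read in the same spirit. The sketch below assumes a tab-delimited file with a header row of attribute names (here `name`, `gene`, `chromosome`, `locus`, taken from the probe annotation slide later in the talk) and keys each row by its `name` column, so that the columns of a data matrix can be looked up in it. All probe and gene values are hypothetical.

```python
import csv
import io


def read_annotations(text, key="name"):
    """Parse a tab-delimited annotation table (one row per probe, one
    column per attribute) into a dict keyed by the `key` column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row[key]: row for row in reader}


# Hypothetical probe annotations
example = (
    "name\tgene\tchromosome\tlocus\n"
    "probe_A\tgeneA\t11\t1200\n"
    "probe_B\tgeneB\t2\t3400\n"
)
probes = read_annotations(example)

# Look up the gene behind each column of a matrix that uses these probes
matrix_columns = ["probe_B", "probe_A"]
genes = [probes[c]["gene"] for c in matrix_columns]
print(genes)  # ['geneB', 'geneA']
```

All attribute values come back as strings; converting `locus` or `chromosome` is left to the caller.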


    Software: exchange an experiment

    Described on http://gbic.biol.rug.nl/dbgg

    [Figure: annotation files and raw and processed data files go into the dbGG database through the dbGG Import tool and come back out through the dbGG Export tool.]
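To illustrate the import/export round trip, the following sketch uses an in-memory SQLite database as a stand-in for the dbGG database and stores a matrix as one record per value. The table name `data_element`, its columns and both functions are hypothetical; the slides do not describe dbGG's actual schema or tools at this level.

```python
import sqlite3


def import_matrix(conn, data_name, row_names, col_names, values):
    """Store a matrix as one database record per (row, column) value."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_element ("
        " data TEXT, row_name TEXT, col_name TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO data_element VALUES (?, ?, ?, ?)",
        [
            (data_name, r, c, v)
            for r, row in zip(row_names, values)
            for c, v in zip(col_names, row)
        ],
    )


def export_matrix(conn, data_name, row_names, col_names):
    """Rebuild the matrix in the requested row and column order."""
    cells = {
        (r, c): v
        for r, c, v in conn.execute(
            "SELECT row_name, col_name, value FROM data_element WHERE data = ?",
            (data_name,),
        )
    }
    return [[cells.get((r, c)) for c in col_names] for r in row_names]


conn = sqlite3.connect(":memory:")
import_matrix(conn, "expr", ["ind1", "ind2"], ["p1", "p2"], [[7.25, 5.5], [8.0, None]])
print(export_matrix(conn, "expr", ["ind1", "ind2"], ["p1", "p2"]))
# [[7.25, 5.5], [8.0, None]]
```

Storing one record per value keeps the table layout independent of how many rows or columns a given data set has, at the cost of a larger table.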


    Software

    Software: web user interface

    http://gbicserver1.biol.rug.nl:8080/dbgg/molgenis.do


    Software: interface to R

    source("http://localhost:8080/molgenis4gg/R")

    # download data
    use.experiment(name="metanetwork")  # set default experiment
    traits <- get.metabolitedata(name="mytraits")
    genotypes <- get.markerdata(name="mygenotypes")

    # calculate mQTLs
    library("MetaNetwork")
    qtls <- qtlMapTwoPart(genotypes=genotypes, traits=traits, spike=4)

    # upload results for others to use
    add.mqtldata(qtls, name="myqtls")


    MetaNetwork protocol:

    Fu, Swertz, Keurentjes, Jansen, Nature Protocols, 2007.
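The R session above delegates the mQTL mapping to MetaNetwork's `qtlMapTwoPart` (a two-part model; see the cited protocol for its details). As a deliberately simpler stand-in, and not the MetaNetwork method, a single-marker scan for inbred lines compares trait values between the two genotype classes at each marker with a Welch t statistic. The 0/1 genotype coding, marker names and trait values are hypothetical.

```python
from statistics import mean, variance


def marker_scan(genotypes, trait):
    """Single-marker scan: for each marker, a Welch t statistic comparing
    trait values of lines carrying allele 0 with lines carrying allele 1.

    genotypes: dict marker -> list of 0/1 calls (one per line)
    trait: list of trait values in the same line order
    """
    result = {}
    for marker, calls in genotypes.items():
        a = [t for g, t in zip(calls, trait) if g == 0]
        b = [t for g, t in zip(calls, trait) if g == 1]
        if len(a) < 2 or len(b) < 2:
            result[marker] = None  # too few lines in one genotype class
            continue
        se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
        result[marker] = (mean(b) - mean(a)) / se if se > 0 else None
    return result


# Hypothetical recombinant inbred lines: two markers, one metabolite trait
genotypes = {
    "m1": [0, 0, 0, 1, 1, 1],
    "m2": [0, 1, 0, 1, 0, 1],
}
trait = [1.0, 1.2, 0.9, 2.1, 2.0, 2.2]
scores = marker_scan(genotypes, trait)
best = max((m for m in scores if scores[m] is not None), key=lambda m: abs(scores[m]))
print(best)  # m1
```

A real mQTL analysis would also need a significance threshold (e.g. by permutation) and multiple-testing control across thousands of traits.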


    Software: interface to Taverna

    add dbGG interface


    Software: interface to Taverna

    Use data in dbGG


    This enables automatic processing (see also CASIMIR 'use case 1').

    Smedley, Swertz, Wolstencroft et al, Submitted.


    Use BioMart and MOLGENIS to access data and Taverna to automate the workflows

    [Figure: a Taverna workflow chaining web services (ws) that retrieve gene symbols, SNPs, strain SNP alleles and pathways, including from your own dbGG.]

    Smedley, Swertz, Wolstencroft et al, Submitted.


    Software: extension procedure (using MOLGENIS)

    Little language, i.e. a domain-specific language:

    <!-- entity organization -->
    <entity name="Experiment" label="Experiment">
      <field name="ExperimentID" key="1" readonly="true"
             label="ExperimentID(autonum)"/>
      <field name="Medium" type="xref" xref_field="Medium.name"/>
      <field name="Protocol" label="Experiment Protocol"/>
      <field name="Temperature" type="int"/>
      <!-- … -->
    </entity>

    [Figure: the little language, combined with reusable assets and a generator/interpreter, yields dbGG v1: for microarrays, and dbGG v2: for mass spectrometry.]
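The generator idea can be sketched in a few lines: read the entity description and emit a database table for it. This is a toy stand-in for what MOLGENIS generates, not its actual generator; the type mapping (`SQL_TYPES`), the treatment of untyped key fields as integers and the emitted SQL are all assumptions.

```python
import xml.etree.ElementTree as ET

# Entity description from the slide, reduced to the fields shown there
MODEL = """
<entity name="Experiment" label="Experiment">
  <field name="ExperimentID" key="1" readonly="true" label="ExperimentID(autonum)"/>
  <field name="Medium" type="xref" xref_field="Medium.name"/>
  <field name="Protocol" label="Experiment Protocol"/>
  <field name="Temperature" type="int"/>
</entity>
"""

# Hypothetical mapping from model field types to SQL column types
SQL_TYPES = {"string": "VARCHAR(255)", "int": "INTEGER", "xref": "VARCHAR(255)"}


def field_type(field):
    """Return the model type of a field, with assumed defaults."""
    ftype = field.get("type")
    if ftype is None:
        # Assumption: untyped key fields (labelled "autonum") are integers,
        # other untyped fields are strings.
        ftype = "int" if field.get("key") == "1" else "string"
    return ftype


def generate_table(entity_xml):
    """Emit a CREATE TABLE statement for one <entity> element."""
    entity = ET.fromstring(entity_xml.strip())
    columns, constraints = [], []
    for field in entity.findall("field"):
        name, ftype = field.get("name"), field_type(field)
        columns.append(f"  {name} {SQL_TYPES[ftype]}")
        if field.get("key") == "1":
            constraints.append(f"  PRIMARY KEY ({name})")
        if ftype == "xref":
            # xref_field has the form "Entity.field"
            table, column = field.get("xref_field").split(".")
            constraints.append(f"  FOREIGN KEY ({name}) REFERENCES {table}({column})")
    body = ",\n".join(columns + constraints)
    return f"CREATE TABLE {entity.get('name')} (\n{body}\n);"


sql = generate_table(MODEL)
print(sql)
```

Adding a field to the XML and rerunning the generator changes the table without hand-editing SQL, which is the point of the extension procedure; the slides indicate MOLGENIS generates more than a schema from the same description.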


    Software: extension procedure


    Website: demos and downloads

    http://gbic.biol.rug.nl/dbgg


    Outline

    • To share genotype/phenotype data and tools:

      1. Interoperable software

        • Flat file exchange format

        • Database server

        • R/web-service interfaces

        • A procedure to extend the software

      2. Build on extensible data model

        • Data

        • Annotations

        • Investigations

        • Integration references

    • Next steps


    Data

    • Simple and close to current practice:

      Genotype data

    [Figure: genotype data as a matrix of DATA ELEMENTS with subjects = STRAINS and traits = MARKERS (a TRAIT × SUBJECT matrix).]


    Data

    • Simple and close to current practice:

      Genotype data

      Expression data

    [Figure: expression data as a matrix of DATA ELEMENTS with subjects = INDIVIDUALS and traits = PROBES (a TRAIT × SUBJECT matrix).]


     Data

    • Simple and close to current practice:

      Genotype data

      Expression data

      Classic phenotype data

      Metabolite abundance data

      Protein abundance data

      And so on…

    (each a TRAIT × SUBJECT matrix)


    Data with any Dimension Type

    • SUBJECT: Individual, Strain, Sample

    • TRAIT: Probe, Marker, Mass Peak

    [Figure: a DATA ELEMENT matrix whose dimensions can be any SUBJECT type and any TRAIT type (TRAIT × SUBJECT).]
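One way to express "data with any dimension type" is a data header that records which kind of element indexes its rows and its columns, checked against the allowed SUBJECT and TRAIT types. The class `DataHeader` and its fields are assumptions based on this slide, not dbGG's implementation.

```python
from dataclasses import dataclass

# Dimension types named on the slide (Mass Peak written without a space)
SUBJECT_TYPES = {"Individual", "Strain", "Sample"}
TRAIT_TYPES = {"Probe", "Marker", "MassPeak"}


@dataclass
class DataHeader:
    """Describes one data matrix: its name and the element type that
    indexes its rows and its columns."""
    name: str
    row_type: str
    col_type: str

    def __post_init__(self):
        for t in (self.row_type, self.col_type):
            if t not in SUBJECT_TYPES | TRAIT_TYPES:
                raise ValueError(f"unknown dimension type: {t}")

    def kind(self):
        """Classify the matrix as SUBJECT/TRAIT by SUBJECT/TRAIT."""
        def group(t):
            return "TRAIT" if t in TRAIT_TYPES else "SUBJECT"
        return f"{group(self.row_type)} x {group(self.col_type)}"


expression = DataHeader("expression", row_type="Individual", col_type="Probe")
genotypes = DataHeader("genotypes", row_type="Strain", col_type="Marker")
qtls = DataHeader("qtls", row_type="Marker", col_type="Probe")
for d in (expression, genotypes, qtls):
    print(d.name, d.kind())
# expression SUBJECT x TRAIT
# genotypes SUBJECT x TRAIT
# qtls TRAIT x TRAIT
```

Because the header only names the dimension types, the same structure also covers trait-by-trait results such as QTL data.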


    Data

    • Simple and close to current practice:

      What about QTL data?

    [Figure: QTL data as a DATA matrix whose dimensions are both traits: MARKERS and PROBES.]


    Data

    • Simple and close to current practice:

      What about QTL data?

      Probe association data?

      Interaction network data?

    [Figure: the QTL matrix (traits: MARKERS × PROBES), alongside the further combinations TRAIT × TRAIT and SUBJECT × SUBJECT.]
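As an example of data whose two dimensions are both traits, a TRAIT × TRAIT correlation matrix can be derived from a SUBJECT × TRAIT expression matrix, e.g. as a starting point for an interaction network. Pearson correlation is used here only as an illustration; the function names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def trait_by_trait(col_names, values):
    """From a SUBJECT x TRAIT matrix (rows = subjects), build a
    TRAIT x TRAIT matrix of pairwise correlations."""
    columns = list(zip(*values))  # one tuple of values per trait
    return {
        (a, b): pearson(columns[i], columns[j])
        for i, a in enumerate(col_names)
        for j, b in enumerate(col_names)
    }


# Hypothetical expression matrix: 3 individuals x 3 probes
probes = ["p1", "p2", "p3"]
expr = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 2.0],
]
net = trait_by_trait(probes, expr)
print(net[("p1", "p2")], net[("p1", "p3")])  # 1.0 -0.5
```

The result is indexed by (trait, trait) pairs, so it fits the same "any dimension type" model as the original matrix.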


    Data with any Dimension Type

    • Minimal data model

    [Figure: DATA has rows and columns, each referring to a dimension ELEMENT that is either a SUBJECT or a TRAIT; the values are stored as DATA ELEMENTs.]


    The data model

    • To share genotype/phenotype data and tools:

      • Extensible data model

        • Data

        • Annotations

        • Investigations

        • Integration references


    Annotations

    • Simple and close to current practice:

      Probe annotations

    • PROBE is a variant of TRAIT

    • having:

      • Name

      • Gene

      • Chromosome

      • Locus


    Annotation extends Trait or Subject

    • SUBJECT

      • STRAIN: Name, Type (CSS, RIL, …), Parent Strains

      • INDIVIDUAL: Name, Strain, Mother, Father, Sex

      • SAMPLE: Name, Individual, Tissue

      • And so on

    • TRAIT

      • PROBE: Name, Gene, Chromosome, Locus

      • MARKER: Name, Allele, Chromosome, Locus

      • MASSPEAK: Name, MZ, RetentionTime

      • And so on

    [Figure: SUBJECT and TRAIT are both kinds of dimension ELEMENT, indexing the rows and columns of a DATA ELEMENT matrix.]
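The annotation hierarchy above maps naturally onto class inheritance. A minimal sketch using Python dataclasses, with attribute names taken from the slide (renamed to Python style, e.g. `strain_type` for Type) and everything else assumed; it is not dbGG's generated code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DimensionElement:
    """Anything that can index a row or column of a data matrix."""
    name: str


@dataclass
class Subject(DimensionElement):
    pass


@dataclass
class Trait(DimensionElement):
    pass


@dataclass
class Strain(Subject):
    strain_type: str = ""  # e.g. "CSS" or "RIL"
    parent_strains: tuple = ()


@dataclass
class Individual(Subject):
    strain: Optional[Strain] = None
    mother: Optional["Individual"] = None
    father: Optional["Individual"] = None
    sex: str = ""


@dataclass
class Sample(Subject):
    individual: Optional[Individual] = None
    tissue: str = ""


@dataclass
class Probe(Trait):
    gene: str = ""
    chromosome: str = ""
    locus: int = 0


@dataclass
class Marker(Trait):
    allele: str = ""
    chromosome: str = ""
    locus: int = 0


@dataclass
class MassPeak(Trait):
    mz: float = 0.0
    retention_time: float = 0.0


# Hypothetical records
ril = Strain("RIL1", strain_type="RIL")
ind = Individual("ind1", strain=ril, sex="F")
peak = MassPeak("peak7", mz=180.06, retention_time=12.3)
print(isinstance(ind, Subject), isinstance(peak, Trait))  # True True
```

Code written against `Trait` or `Subject` keeps working when a new subtype such as `MassPeak` is added, which is the practical benefit of extending these two base types.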


    Annotation simple in practice

    • Genotype data: STRAIN × MARKER

    • Expression data: INDIVIDUAL × PROBE

    • QTL data: MARKER × PROBE

    Extensions are automatic "under the hood": PROBE isa TRAIT isa DIMENSION ELEMENT


    Data and annotations

    [Screenshot: DATA ELEMENTS shown together with their PROBE annotations.]




    Investigation workflow in the lab

    [Figure: genotype data (STRAIN × MARKER), expression data (INDIVIDUAL × PROBE) and QTL data (MARKER × PROBE), with the steps connecting them marked "?".]


    Investigation building on FuGE

    [Figure: each DATA item is linked to the protocol application that produced it. Expression data: Affy array, Affy M430 protocol, Affy M430 platform, Bioconductor normalization. Genotype data: SNP array, Illumina protocol, Illumina BeadStudio. QTL data: QTL mapping, mapping protocol, R software. Legend: DATA, protocol application, Protocol, Equipment, Software.]

    FuGE: Jones et al., Nature Biotechnology 25, 1127-1133
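The FuGE pattern on this slide, where each data item points to the protocol application that produced it and protocols refer to equipment and software, can be sketched as follows. The classes are a heavily simplified, hypothetical illustration inspired by the slide, not the FuGE object model itself, and the assignment of equipment and software to each protocol is illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Protocol:
    name: str
    equipment: List[str] = field(default_factory=list)
    software: List[str] = field(default_factory=list)


@dataclass
class Data:
    name: str
    produced_by: Optional["ProtocolApplication"] = None


@dataclass
class ProtocolApplication:
    """One run of a protocol, consuming zero or more input data items."""
    protocol: Protocol
    inputs: List[Data] = field(default_factory=list)


def lineage(data, depth=0):
    """Yield indented lines tracing how a data item was produced."""
    yield "  " * depth + data.name
    app = data.produced_by
    if app is None:
        return
    tools = ", ".join(app.protocol.equipment + app.protocol.software)
    yield "  " * (depth + 1) + f"<- {app.protocol.name} [{tools}]"
    for item in app.inputs:
        yield from lineage(item, depth + 2)


# Illustrative assignment of equipment/software to protocols
genotypes = Data("Genotype data", ProtocolApplication(
    Protocol("Illumina protocol", equipment=["Illumina"], software=["BeadStudio"])))
expression = Data("Expression data", ProtocolApplication(
    Protocol("Affy M430 protocol", equipment=["Affy M430 platform"],
             software=["Bioconductor"])))
qtls = Data("QTL data", ProtocolApplication(
    Protocol("Mapping protocol", software=["R"]), inputs=[genotypes, expression]))

print("\n".join(lineage(qtls)))
# QTL data
#   <- Mapping protocol [R]
#     Genotype data
#       <- Illumina protocol [Illumina, BeadStudio]
#     Expression data
#       <- Affy M430 protocol [Affy M430 platform, Bioconductor]
```

Recording the producing step with every data item is what replaces the "?" links of the previous slide with traceable provenance.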


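    Summary of data model

    [Figure: entities and their relations. INVESTIGATION; DATA, produced by a PROTOCOL APPLICATION of a PROTOCOL that uses Software and Equipment; DATA ELEMENTs whose row and column are dimension ELEMENTs; dimension ELEMENT specialized into SUBJECT (STRAIN, INDIVIDUAL) and TRAIT (PROBE, MARKER).]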




    References for integration

    • Ontology references and database references

    • Incompatible naming: Investigation 1 and Investigation 2 name the same GENE differently (Name = Mip1alpha vs. Name = Mip1a)

    • Compatible identifiers: via hyperlinks, both GENE records carry the same DATABASE REFERENCE (Db = ENSEMBL, Id = ENSMUS98), plus a DATABASE REFERENCE with Db = AFFY 430, Id = 1419561_AT

    • ONTOLOGY ENTRY examples: Id = 0005615, Term = ABC, Ontology = GO; Id = MP:0005385, Term = cardiovascular, Ontology = MP

    • Map mouse on human ontologies
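The integration idea, matching records on shared identifiers rather than on names, can be sketched as grouping genes by their database reference. The classes are hypothetical, which investigation uses which gene name is assumed, and `ENSMUS98` is the illustrative identifier shown on the slide, not a real Ensembl ID.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes references hashable dict keys
class DatabaseReference:
    db: str
    id: str


@dataclass
class Gene:
    investigation: str
    name: str
    xref: DatabaseReference


def group_by_reference(genes):
    """Group gene records by shared database reference, so that records
    named differently in different investigations can still be matched."""
    groups = defaultdict(list)
    for gene in genes:
        groups[gene.xref].append(f"{gene.investigation}: {gene.name}")
    return dict(groups)


ref = DatabaseReference(db="ENSEMBL", id="ENSMUS98")  # identifier as shown on the slide
genes = [
    Gene("Investigation 1", "Mip1alpha", ref),  # hypothetical assignment of
    Gene("Investigation 2", "Mip1a", ref),      # names to investigations
]
matches = group_by_reference(genes)
print(matches[ref])  # ['Investigation 1: Mip1alpha', 'Investigation 2: Mip1a']
```

Ontology entries could be used the same way: two records annotated with the same ontology Id are treated as comparable even if their free-text terms differ.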


    Summary of data model

    [Figure: the data model extended with ONTOLOGY ENTRY and DATABASE REFERENCE, attached via Hyperlink. Entities: INVESTIGATION, DATA, PROTOCOL, PROTOCOL APPLICATION, Software, Equipment, DATA ELEMENT, dimension ELEMENT, SUBJECT (STRAIN, INDIVIDUAL), TRAIT (PROBE, MARKER). Caption: extensible to more experiments…]


    What is on the to-do list


    To do

    • Publication: submitted

    • Building a catalog of tools on top of dbGG

      • Experiments: in Braunschweig and Groningen

        • Illumina, Affy, Metabolites

      • Tool 'plug-ins'

        • QTL graphs, import of annotations, etc.

    • Exploit interoperability

      • E.g. integrate mouse & human with ontologies

      • Load annotations from other dbGG/BioMart instances

      • Build on and extend R/Taverna interaction


  Summary and questions

    • Share genotype/phenotype data and tools:

      • Interoperable software

        • Simple flat file exchange format

        • Database server

        • R/web-service interfaces

        • A procedure to extend the software

      • Build on extensible data model

        • Data

        • Annotations

        • Investigations

        • Integration references

  • Next steps



    Morris A. Swertz

    Bruno M. Tesson

    Richard A. Scheltema

    Gonzalo Vera

    Rudi Alberts

    Damian Smedley

    Katy Wolstencroft

    Andrew R. Jones

    Klaus Schughart

    John M. Hancock

    Helen E. Parkinson

    Engbert O. de Brock

    Carole Goble

    Paul Schofield

    Ritsert C. Jansen

    the GEN2PHEN consortium

    the CASIMIR consortium

    Thank you


    Appendix: Procedure to (re)generate a MOLGENIS


    MOLGENIS for data


    Describe in little language

    probes

    individuals

    expressions



    Case GG: Generate and evaluate

    http://gbic.biol.rug.nl/supplementary/2007/molgenis_showcase



