Dbgg database for genetical genomics update

dbgg – database for genetical genomics: update

Morris Swertz ([email protected])

Braunschweig CASIMIR meeting

July 2, 2008


Objective

  • Share genotype/phenotype data and tools:


Complicated experiments

[Figure: workflow of a genetical genomics experiment. Legend: main work flow, data dependency, biomaterial/result, lab/analysis process, scale of information, associated data files. Biomaterials and results: strains, genome, markers, individuals, genotypes, microarrays, probes, expressions, norm exprs., QTL profiles, network. Processes: inbreed, genotype, hybridize, preprocess, map, correlate. Scale annotations on the nodes range from 10 to millions of items.]


Barriers to sharing data

  • Incompatible data!

  • Incomplete data!

[Figure: the same workflow diagram, annotated with Collaborator 1, Collaborator 2 and Collaborator 3.]


Barriers to sharing data

  • Incomplete and/or incompatible data!

[Figure: the workflow diagram repeated three times, labeled Investigation 1, Investigation 2 and Investigation 3.]


Barriers to sharing software tools

[Figure: the workflow diagram repeated three times.]




Barriers to sharing software tools

  • Hard to find and reuse tools

[Figure: three copies of the QTL profiles node, each annotated with 10,000.]


Use a standard tool?

[Figure: the workflow diagram, shown once: strains, genome, markers, individuals, genotypes, microarrays, probes, expressions, norm exprs., QTL profiles, network.]


More biotechnologies, more protocols

  • Yes, if it could be easily adapted! (and they can't)

[Figure: the workflow diagram with its legend, now showing SNP arrays and an LC/MS branch (mass peaks, preprocess, aligned peaks) next to strains, genome, individuals, genotypes, QTL profiles and network.]


Objectives

  • Share genotype/phenotype data and tools:

    • Interoperable software

      • Simple flat file exchange format

      • Database server

      • R/web-service interfaces

      • A procedure to extend the software

  • Build on extensible data model

    • Data

    • Annotations

    • Investigations

    • Integration references

  • Next steps


The software

    • Share genotype/phenotype data and tools:

      • Interoperable software

        • Simple flat file exchange format

        • Database server

        • R/web-service interfaces

        • A procedure to extend the software

    • Build on extensible data model

      • Data

      • Annotations

      • Investigations

      • Integration references

  • Next steps


Software: flat file exchange format

  • Raw and processed data in matrix form. E.g. microarray data: rows = individuals, columns = Affy probes.

  • Annotation info in tabular form. E.g. probe annotation data: rows = probes, columns = attributes of each probe.
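As an illustration of the two layouts above, here is a minimal Python sketch that reads a data matrix and an annotation table from tab-separated text. The tab delimiter, the header conventions and every name in the sample (`ind1`, `probe_a`, `geneA`, ...) are assumptions made for this example; the actual dbGG file format is the one described on the dbGG website.

```python
import csv
import io

# Hypothetical sample files; the tab delimiter and header layout are assumptions.
MATRIX_TXT = "\tprobe_a\tprobe_b\nind1\t7.1\t5.3\nind2\t6.8\t5.9\n"
ANNOTATION_TXT = "name\tgene\tchromosome\nprobe_a\tgeneA\t1\nprobe_b\tgeneB\t4\n"


def read_matrix(text):
    """Return (row_names, col_names, values) for a matrix with a header row."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    col_names = rows[0][1:]               # first header cell is empty
    row_names = [r[0] for r in rows[1:]]
    values = [[float(x) for x in r[1:]] for r in rows[1:]]
    return row_names, col_names, values


def read_annotations(text):
    """Return {name: {attribute: value}} for a table with one row per item."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["name"]: dict(row) for row in reader}


individuals, probes, expression = read_matrix(MATRIX_TXT)
probe_info = read_annotations(ANNOTATION_TXT)

# Every matrix column should have a matching annotation row.
missing = [p for p in probes if p not in probe_info]
```

Keeping the matrix and the annotation table in separate files means the same probe annotations can be reused for several data matrices.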


Software: exchange an experiment

[Figure: annotations and raw and processed data pass through the dbGG Import tool into the dbGG database, and back out through the dbGG Export tool.]

Described on http://gbic.biol.rug.nl/dbgg


Software: web user interface

http://gbicserver1.biol.rug.nl:8080/dbgg/molgenis.do


Software: interface to R

    source("http://localhost:8080/molgenis4gg/R")

    # download data
    use.experiment(name="metanetwork")  # set default
    traits <- get.metabolitedata(name="mytraits")
    genotypes <- get.markerdata(name="mygenotypes")

    # calculate mQTLs
    library("MetaNetwork")
    qtls <- qtlMapTwoPart(genotypes=genotypes, traits=traits, spike=4)

    # upload results for others to use
    add.mqtldata(qtls, name="myqtls")

    inspect

    MetaNetwork protocol:

    Fu, Swertz, Keurentjes, Jansen, Nature Protocols, 2007.
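The "calculate mQTLs" step can be pictured with a much simpler stand-in. The Python sketch below scores every marker/trait pair by the absolute Pearson correlation between genotype codes and trait values. This is not the two-part model behind `qtlMapTwoPart`; the toy data, the 0/1 genotype coding and the function names are assumptions, chosen only to show the shape of the inputs (markers by individuals, traits by individuals) and of the output (traits by markers).

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0                        # constant vector: no association
    return sxy / math.sqrt(sxx * syy)


def association_scores(genotypes, traits):
    """Return {trait: {marker: |r|}} for dicts of per-individual value lists."""
    return {
        t: {m: abs(pearson(g, v)) for m, g in genotypes.items()}
        for t, v in traits.items()
    }


# Toy data: 6 individuals, genotypes coded 0/1 for the two parental alleles.
genotypes = {"m1": [0, 0, 0, 1, 1, 1], "m2": [0, 1, 0, 1, 0, 1]}
traits = {"metabolite_x": [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]}

scores = association_scores(genotypes, traits)
best_marker = max(scores["metabolite_x"], key=scores["metabolite_x"].get)
```

In this toy data the trait follows marker `m1` closely, so `m1` gets the highest score.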


Software: interface to Taverna

  • Add dbGG interface

  • Use data in dbGG

This enables automatic processing (see also CASIMIR use 'case 1')

Smedley, Swertz, Wolstencroft et al, Submitted.


Use BioMART and MOLGENIS to access data and Taverna to automate the workflows

[Figure: web services (ws) giving access to gene symbols, SNPs, strain SNP alleles, pathways and your dbGG.]

Smedley, Swertz, Wolstencroft et al, Submitted.


Software: extension procedure (using MOLGENIS)

Little language (domain specific language):

    <!-- entity organization -->
    <entity name="Experiment" label="Experiment">
      <field name="ExperimentID" key="1" readonly="true" label="ExperimentID(autonum)"/>
      <field name="Medium" type="xref" xref_field="Medium.name"/>
      <field name="Protocol" label="Experiment Protocol"/>
      <field name="Temperature" type="int"

    (the listing is cut off on the slide)

Domain specific language + reusable assets and generator/interpreter:

  • dbGG v1: for microarrays

  • dbGG v2: for mass spectrometry
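To make the generator idea concrete, the following Python sketch reads a small model in the spirit of the listing above and emits a SQL `CREATE TABLE` statement. It is an illustration only, not the MOLGENIS generator: the `<molgenis>` wrapper element, the type mapping, the default of `string` for fields without a `type`, the treatment of `xref` as an integer column and the closed-off `Temperature` field are all assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed, well-formed variant of the slide's model (the slide listing is cut off).
MODEL = """
<molgenis>
  <entity name="Experiment" label="Experiment">
    <field name="ExperimentID" key="1" readonly="true" label="ExperimentID(autonum)"/>
    <field name="Medium" type="xref" xref_field="Medium.name"/>
    <field name="Protocol" label="Experiment Protocol"/>
    <field name="Temperature" type="int"/>
  </entity>
</molgenis>
"""

# Hypothetical mapping from model types to SQL column types.
SQL_TYPES = {"string": "VARCHAR(255)", "int": "INTEGER", "xref": "INTEGER"}


def entity_to_sql(entity):
    """Build one CREATE TABLE statement from an <entity> element."""
    columns = []
    for field in entity.findall("field"):
        sql_type = SQL_TYPES[field.get("type", "string")]
        column = f'{field.get("name")} {sql_type}'
        if field.get("key") == "1":
            column += " PRIMARY KEY"
        columns.append(column)
    body = ",\n  ".join(columns)
    return f'CREATE TABLE {entity.get("name")} (\n  {body}\n);'


root = ET.fromstring(MODEL)
statements = [entity_to_sql(e) for e in root.findall("entity")]
print(statements[0])
```

Because the model is the only input, adding an entity for a new technology means editing the XML and regenerating, rather than hand-writing new database code.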


Software: extension procedure


Website: demos and downloads

    http://gbic.biol.rug.nl/dbgg


Outline

  • To share genotype/phenotype data and tools:

    1. Interoperable software

      • Flat file exchange format

      • Database server

      • R/web-service interfaces

      • A procedure to extend the software

    2. Build on extensible data model

      • Data

      • Annotations

      • Investigations

      • Integration references

  • Next steps


Data

  • Simple and close to current practice:

    • Genotype data (subjects: STRAINS, traits: MARKERS)

    • Expression data (subjects: INDIVIDUALS, traits: PROBES)

    • Classic phenotype data

    • Metabolite abundance data

    • Protein abundance data

    • And so on…

  • DATA ELEMENTS form a matrix: TRAIT × SUBJECT


Data with any Dimension Type

  • SUBJECT: Individual, Strain, Sample, ...

  • TRAIT: Probe, Marker, Mass Peak, ...

  • DATA ELEMENT: TRAIT × SUBJECT


Data

  • Simple and close to current practice:

    • What about QTL data? (traits: MARKERS against traits: PROBES)

    • Probe association data?

    • Interaction network data?

  • DATA can also be TRAIT × TRAIT or SUBJECT × SUBJECT


Data with any Dimension Type

  • Minimal data model

[Figure: minimal data model with the boxes SUBJECT, TRAIT, DIMENSION ELEMENT and DATA ELEMENT; DATA ELEMENT is linked to DIMENSION ELEMENT through "rows" and "columns".]
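One way to read the minimal model is sketched below in Python: rows and columns of a data set can be any dimension element, so trait-by-subject and trait-by-trait matrices share one structure. The class and method names are assumptions made for this sketch and are not taken from the dbGG code base.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DimensionElement:
    """Anything that can label a row or a column."""
    name: str


class Subject(DimensionElement):
    """E.g. an individual, strain or sample."""


class Trait(DimensionElement):
    """E.g. a probe, marker or mass peak."""


@dataclass
class Data:
    """A matrix whose rows and columns are arbitrary dimension elements."""
    name: str
    rows: list
    columns: list
    elements: dict = field(default_factory=dict)  # (row, column) -> value

    def put(self, row, column, value):
        if row not in self.rows or column not in self.columns:
            raise KeyError("unknown row or column")
        self.elements[(row, column)] = value

    def get(self, row, column):
        return self.elements[(row, column)]


probe, marker = Trait("probe_a"), Trait("marker_1")
mouse = Subject("ind1")

expression = Data("expression", rows=[probe], columns=[mouse])  # TRAIT x SUBJECT
expression.put(probe, mouse, 7.1)

qtl = Data("qtl", rows=[marker], columns=[probe])               # TRAIT x TRAIT
qtl.put(marker, probe, 3.4)
```

The same `Data` class holds both matrices; only the kind of dimension element on each axis differs.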


    The data model

    • To share genotype/phenotype data and tools:

      • Extensible data model

        • Data

        • Annotations

        • Investigations

        • Integration references


Annotations

  • Simple and close to current practice, e.g. probe annotations

  • PROBE is a variant of TRAIT, having:

    • Name

    • Gene

    • Chromosome

    • Locus


Annotation extends Trait or Subject

SUBJECT and TRAIT are both dimension elements, which label the rows and columns of the DATA ELEMENTs.

SUBJECT:

  • STRAIN: Name; Type (CSS, RIL, ...); Parent Strains

  • INDIVIDUAL: Name; Strain; Mother; Father; Sex

  • SAMPLE: Name; Individual; Tissue

  • And so on

TRAIT:

  • PROBE: Name; Gene; Chromosome; Locus

  • MARKER: Name; Allele; Chromosome; Locus

  • MASSPEAK: Name; MZ; RetentionTime

  • And so on
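The "extends" relationship on this slide maps naturally onto class inheritance. The Python sketch below shows a few of the listed annotation types as subclasses; the attribute types, the defaults and the sample values are assumptions for illustration, not the dbGG schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DimensionElement:
    name: str


@dataclass
class Trait(DimensionElement):
    pass


@dataclass
class Subject(DimensionElement):
    pass


@dataclass
class Probe(Trait):
    gene: Optional[str] = None
    chromosome: Optional[str] = None
    locus: Optional[int] = None


@dataclass
class MassPeak(Trait):
    mz: Optional[float] = None
    retention_time: Optional[float] = None


@dataclass
class Strain(Subject):
    type: Optional[str] = None            # e.g. "CSS" or "RIL"
    parent_strains: tuple = ()


@dataclass
class Individual(Subject):
    strain: Optional[Strain] = None
    sex: Optional[str] = None


# Hypothetical sample records.
ril = Strain("RIL_12", type="RIL", parent_strains=("ParentA", "ParentB"))
mouse = Individual("ind1", strain=ril, sex="F")
probe = Probe("probe_a", gene="geneA", chromosome="11")
```

Code that only needs a row or column label can accept any `DimensionElement`, while annotation-specific code can rely on the extra attributes of `Probe` or `Strain`.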


Annotation simple in practice

  • Genotype data: DATA ELEMENTs for MARKER × STRAIN

  • Expression data: DATA ELEMENTs for PROBE × INDIVIDUAL

  • QTL data: DATA ELEMENTs for MARKER × PROBE

  • Extensions are automatic "under the hood": PROBE is a TRAIT, which is a DIMENSION ELEMENT

Data and annotations

[Figure: DATA ELEMENTS shown together with their PROBES.]


    The data model

    • To share genotype/phenotype data and tools:

      • Extensible data model

        • Data

        • Annotations

        • Investigations

        • Integration references


Investigation workflow in the lab

[Figure: the genotype data (STRAIN, MARKER), expression data (INDIVIDUAL, PROBE) and QTL data (MARKER, PROBE) sets, each marked DATA, with question marks on the links between them.]


Investigation building on FuGE

[Figure: the genotype, expression and QTL DATA sets linked by protocol applications. Labels on the slide: SNP array, Illumina protocol, Illumina Bead Studio; Affy array, Affy M430 protocol, Affy M430 platform; Bioconductor norm.; QTL mapping, mapping protocol, R software. FuGE concepts shown: protocol application, protocol, equipment, software.]

FuGE: Jones et al Nature Biotech 25, 1127-1133
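The idea of recording how each data set was produced can be sketched as follows in Python. The classes are loosely named after the FuGE concepts on the slide (protocol, protocol application, equipment, software), but the attributes, the `provenance` helper and the example chain of steps are assumptions for illustration and do not reproduce the FuGE object model.

```python
from dataclasses import dataclass, field


@dataclass
class Protocol:
    name: str
    software: list = field(default_factory=list)
    equipment: list = field(default_factory=list)


@dataclass
class DataSet:
    name: str


@dataclass
class ProtocolApplication:
    """One run of a protocol that turns input data sets into output data sets."""
    protocol: Protocol
    inputs: list
    outputs: list


def provenance(data_name, applications):
    """Walk depth-first from a data set back to its sources, listing protocols."""
    trail = []
    for app in applications:
        if any(d.name == data_name for d in app.outputs):
            trail.append(app.protocol.name)
            for source in app.inputs:
                trail.extend(provenance(source.name, applications))
    return trail


raw = DataSet("raw expression")
norm = DataSet("normalized expression")
geno = DataSet("genotypes")
qtl = DataSet("QTL profiles")

apps = [
    ProtocolApplication(Protocol("Hybridization", equipment=["Affy M430 platform"]), [], [raw]),
    ProtocolApplication(Protocol("Normalization", software=["Bioconductor"]), [raw], [norm]),
    ProtocolApplication(Protocol("Genotyping", software=["Illumina Bead Studio"]), [], [geno]),
    ProtocolApplication(Protocol("QTL mapping", software=["R"]), [geno, norm], [qtl]),
]
trail = provenance("QTL profiles", apps)
```

With such records, the question marks between data sets in the lab workflow become explicit, queryable links.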


Summary of data model

[Figure: entity diagram with INVESTIGATION, DATA, DATA ELEMENT, DIMENSION ELEMENT (row, column), SUBJECT (STRAIN, INDIVIDUAL), TRAIT (PROBE, MARKER), PROTOCOL APPLICATION, PROTOCOL, Software and Equipment.]


    The data model

    • To share genotype/phenotype data and tools:

      • Extensible data model

        • Data

        • Annotations

        • Investigations

        • Integration references


References for integration

  • Ontology references and database references

  • Incompatible naming: INVESTIGATION 1 and INVESTIGATION 2 each have a GENE, one with Name = Mip1alpha and one with Name = Mip1a

  • Compatible identifiers through DATABASE REFERENCE entries (with hyperlink):

    • Id = ENSMUS098, Db = ENSEMBL

    • Id = ENSMU0S98, Db = ENSEMBL

    • Id = ENSMUS98, Db = ENSEMBL

    • Id = 1419561_AT, Db = AFFY 430

  • Map mouse on human ontologies through ONTOLOGY ENTRY records:

    • Id = 0005615, Term = ABC, Ontology = GO

    • Id = MP:0005385, Term = cardiovascular, Ontology = MP

FuGE: Jones et al Nature Biotech 25, 1127-1133
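How shared identifiers bridge incompatible names can be shown with a few lines of Python. The records below are hypothetical: they assume that both gene names point to one and the same ENSEMBL identifier, and the identifier `ENSMUS777` is invented for the example.

```python
from collections import defaultdict

# Hypothetical records: two investigations name the same gene differently
# but both point to the same external database entry.
genes = [
    {"investigation": 1, "name": "Mip1alpha", "xrefs": [("ENSEMBL", "ENSMUS098")]},
    {"investigation": 2, "name": "Mip1a", "xrefs": [("ENSEMBL", "ENSMUS098")]},
    {"investigation": 2, "name": "OtherGene", "xrefs": [("ENSEMBL", "ENSMUS777")]},
]


def group_by_reference(records):
    """Group gene names that share a (database, identifier) reference."""
    groups = defaultdict(set)
    for rec in records:
        for ref in rec["xrefs"]:
            groups[ref].add(rec["name"])
    return dict(groups)


groups = group_by_reference(genes)
same_gene = groups[("ENSEMBL", "ENSMUS098")]
```

Matching on the (database, identifier) pair rather than on the free-text name is what lets the two investigations be combined.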


Summary of data model

  • Extensible to more experiments…

[Figure: the entity diagram with INVESTIGATION, DATA, DATA ELEMENT, DIMENSION ELEMENT (row, column), SUBJECT (STRAIN, INDIVIDUAL), TRAIT (PROBE, MARKER), PROTOCOL APPLICATION, PROTOCOL, Software and Equipment, now extended with ONTOLOGY ENTRY, DATABASE REFERENCE and Hyperlink.]


What is on the todo

  • Publication: submitted

  • Building a catalog of tools on top of dbGG

    • Experiments: in Braunschweig and Groningen

      • Illumina, Affy, Metabolites

    • Tool 'plug-ins'

      • QTL graphs, import of annotations etc.

  • Exploit interoperability

    • E.g. integrate mouse & human with ontologies

    • Load annotations from other dbGG/BioMARTs

    • Build on and extend R/Taverna interaction


Summary and questions

    • Share genotype/phenotype data and tools:

      • Interoperable software

        • Simple flat file exchange format

        • Database server

        • R/web-service interfaces

        • A procedure to extend the software

    • Build on extensible data model

      • Data

      • Annotations

      • Investigations

      • Integration references

  • Next steps


Thank you

[email protected]

    Morris A. Swertz

    Bruno M. Tesson

    Richard A. Scheltema

    Gonzalo Vera

    Rudi Alberts

    Damian Smedley

    Katy Wolstencroft

    Andrew R. Jones

    Klaus Schughart

    John M. Hancock

    Helen E. Parkinson

    Engbert O. de Brock

    Carole Goble

    Paul Schofield

    Ritsert C. Jansen

    the GEN2PHEN consortium

    the CASIMIR consortium



Appendix: Procedure to (re)generate a MOLGENIS

MOLGENIS for data

Describe in little language

[Figure labels: probes, individuals, expressions.]

Case GG: Generate and evaluate

http://gbic.biol.rug.nl/supplementary/2007/molgenis_showcase

