dbGG: database for genetical genomics (update)

Morris Swertz ([email protected])

Braunschweig CASIMIR meeting

July 2, 2008


Objective

  • Share genotype/phenotype data and tools:

[Figure: "Complicated experiments". Workflow of a genetical genomics study. Legend: main work flow, data dependency, biomaterial/result, lab/analysis process, scale of information, associated data files. Biomaterials and results: strains, genome, markers, individuals, genotypes, microarrays, probes, expressions, normalized expressions, QTL profiles, network. Processes: inbreed, genotype, hybridize, preprocess, map, correlate. Scale labels range from 10 to about 10,000,000.]


Barriers to sharing data

[Figure: the workflow (strains, markers, individuals, genotypes, microarrays, probes, expressions, QTL profiles, network) labelled with Collaborator 1, Collaborator 2 and Collaborator 3 and the callouts "Incompatible data!" and "Incomplete data!".]


Barriers to sharing data

[Figure: three copies of the workflow, labelled Investigation 1, Investigation 2 and Investigation 3, with the callout "Incomplete and/or incompatible data!".]


Barriers to sharing software tools

[Figure: the workflow diagram repeated three times.]


Barriers to sharing software tools

Hard to find and reuse tools

[Figure: three "QTL profiles" results, each with scale label 10,000.]


Use a standard tool?

[Figure: a single workflow for microarray experiments (strains, markers, individuals, genotypes, microarrays, probes, expressions, normalized expressions, QTL profiles, network).]


More biotechnologies, more protocols

Yes, if it could be easily adapted! (and they can't)

[Figure: the workflow redrawn for other technologies: SNP arrays instead of markers, and LC/MS mass peaks preprocessed into aligned peaks instead of microarray expressions. Legend: main work flow, data dependency, biomaterial/result, lab/analysis process, scale of information, associated data files.]


Objectives

  • Share genotype/phenotype data and tools:

    • Interoperable software

      • Simple flat file exchange format

      • Database server

      • R/web-service interfaces

      • A procedure to extend the software

  • Build on extensible data model

    • Data

    • Annotations

    • Investigations

    • Integration references

  • Next steps


    The software

    • Share genotype/phenotype data and tools:

      • Interoperable software

        • Simple flat file exchange format

        • Database server

        • R/web-service interfaces

        • A procedure to extend the software

    • Build on extensible data model

      • Data

      • Annotations

      • Investigations

      • Integration references

  • Next steps


    Software: flat file exchange format

    • Raw and processed data in matrix form

    E.g. microarray data: rows = individuals, columns = Affy probes.
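A minimal sketch of reading such a matrix file, assuming a tab-separated layout with the probe names in the header line and one individual per further line. The actual dbGG exchange format is described on the project website and may differ; the function name `read_matrix` and all identifiers in the sample are made up for illustration.

```python
import csv
import io


def read_matrix(text, delimiter="\t"):
    """Parse a matrix flat file into (row_names, col_names, values).

    Assumed layout (hypothetical, not the official dbGG format):
    first line = empty corner cell followed by column names (probes),
    each further line = row name (individual) followed by numeric values.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    col_names = header[1:]
    row_names, values = [], []
    for record in reader:
        if not record:
            continue  # skip blank lines
        if len(record) != len(header):
            raise ValueError(
                f"row {record[0]!r} has {len(record) - 1} values, "
                f"expected {len(col_names)}"
            )
        row_names.append(record[0])
        values.append([float(cell) for cell in record[1:]])
    return row_names, col_names, values


# Made-up example: two individuals measured on three probes.
example = (
    "\tprobe_1\tprobe_2\tprobe_3\n"
    "ind_A\t7.1\t8.4\t6.0\n"
    "ind_B\t7.3\t8.1\t5.9\n"
)
rows, cols, data = read_matrix(example)
print(rows, cols, data[1][2])
```

Keeping row names, column names and values separate mirrors the slide's split between individuals, probes and the measurements themselves.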


    Software: flat file exchange format

    • Annotation info in tabular form

    E.g. probe annotation data: rows = probes, columns = attributes of each probe.
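An annotation table of this kind could be read as follows. This is only a sketch under assumptions: a tab-separated file with a header line, and the attribute columns name, gene, chromosome and locus that the talk lists for probes. The function `read_annotations` and the sample values are hypothetical.

```python
import csv
import io


def read_annotations(text, key="name", delimiter="\t"):
    """Parse a tabular annotation file into {key value: {attribute: value}}.

    Assumed layout (hypothetical): a header line naming the attributes,
    then one line per probe.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if key not in (reader.fieldnames or []):
        raise ValueError(f"missing key column {key!r}")
    table = {}
    for record in reader:
        if record[key] in table:
            raise ValueError(f"duplicate {key}: {record[key]!r}")
        table[record[key]] = dict(record)
    return table


# Made-up example: two probes with four attributes each.
example = (
    "name\tgene\tchromosome\tlocus\n"
    "probe_1\tgeneA\t1\t12.5\n"
    "probe_2\tgeneB\tX\t3.0\n"
)
probes = read_annotations(example)
print(probes["probe_2"]["chromosome"])
```

Indexing the table by probe name makes it easy to look up the annotation that belongs to a column of the data matrix.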


    Software: exchange an experiment

    Described on http://gbic.biol.rug.nl/dbgg

    [Figure: annotations and raw and processed data are loaded into the dbGG database with the dbGG Import tool and retrieved from it with the dbGG Export tool.]


    Software: web user interface

    http://gbicserver1.biol.rug.nl:8080/dbgg/molgenis.do


    Software: interface to R

    source("http://localhost:8080/molgenis4gg/R")

    # download data
    use.experiment(name="metanetwork")  # set default
    traits <- get.metabolitedata(name="mytraits")
    genotypes <- get.markerdata(name="mygenotypes")

    # calculate mQTLs
    library("MetaNetwork")
    qtls <- qtlMapTwoPart(genotypes=genotypes, traits=traits, spike=4)

    # upload results for others to use
    add.mqtldata(qtls, name="myqtls")

    MetaNetwork protocol: Fu, Swertz, Keurentjes, Jansen, Nature Protocols, 2007.


    Software: interface to Taverna

    add dbGG interface

    This enables automatic processing (see also CASIMIR use case 1)

    [Figure, labelled "dbGG".]

    Smedley, Swertz, Wolstencroft et al., submitted.


    Use BioMART and MOLGENIS to access data and Taverna to automate the workflows

    [Figure: a workflow that reaches gene symbols, SNPs, strain SNP alleles, pathways and your dbGG through web services (ws).]

    Smedley, Swertz, Wolstencroft et al., submitted.


    Software: extension procedure (using MOLGENIS)

    Little language (domain specific language):

    <!-- entity organization -->
    <entity name="Experiment" label="Experiment">
      <field name="ExperimentID" key="1"
             readonly="true"
             label="ExperimentID(autonum)"/>
      <field name="Medium" type="xref"
             xref_field="Medium.name"/>
      <field name="Protocol"
             label="Experiment Protocol"/>
      <field name="Temperature" type="int"

    + Reusable assets and generator/interpreter

    dbGG v1: for microarrays

    dbGG v2: for mass spectrometry
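To illustrate the idea of generating software from a little-language description, the sketch below turns an entity description into a SQL `CREATE TABLE` statement. It is not the MOLGENIS generator: the type mapping, the function `entity_to_sql` and the generated SQL dialect are assumptions, and the closing of the `Temperature` field and of the `entity` element has been added here because the snippet on the slide is cut off.

```python
import xml.etree.ElementTree as ET

# Entity description based on the slide; the last field and the closing
# </entity> tag are completed here only so that the sketch can be parsed.
MODEL = """
<entity name="Experiment" label="Experiment">
  <field name="ExperimentID" key="1" readonly="true" label="ExperimentID(autonum)"/>
  <field name="Medium" type="xref" xref_field="Medium.name"/>
  <field name="Protocol" label="Experiment Protocol"/>
  <field name="Temperature" type="int"/>
</entity>
"""

# Assumed type mapping for this sketch; fields without a type become text.
SQL_TYPES = {"int": "INTEGER", "xref": "VARCHAR(255)", None: "VARCHAR(255)"}


def entity_to_sql(xml_text):
    """Turn one <entity> description into a CREATE TABLE statement."""
    entity = ET.fromstring(xml_text)
    columns, constraints = [], []
    for field in entity.findall("field"):
        name = field.get("name")
        columns.append(f"  {name} {SQL_TYPES[field.get('type')]}")
        if field.get("key") == "1":
            constraints.append(f"  PRIMARY KEY ({name})")
        if field.get("type") == "xref":
            # xref_field has the form "Entity.field"
            table, column = field.get("xref_field").split(".")
            constraints.append(
                f"  FOREIGN KEY ({name}) REFERENCES {table}({column})"
            )
    body = ",\n".join(columns + constraints)
    return f"CREATE TABLE {entity.get('name')} (\n{body}\n);"


sql = entity_to_sql(MODEL)
print(sql)
```

The same description could drive other generators (user interface, R functions), which is what makes a new dbGG variant a matter of editing the model rather than rewriting code.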




    Website: demos and downloads

    http://gbic.biol.rug.nl/dbgg


    Outline

    • To share genotype/phenotype data and tools:

      1. Interoperable software

        • Flat file exchange format

        • Database server

        • R/web-service interfaces

        • A procedure to extend the software

      2. Build on extensible data model

        • Data

        • Annotations

        • Investigations

        • Integration references

    • Next steps


    Data

    • Simple and close to current practice:

      Genotype data

    [Figure: data matrix. Traits: MARKERS. Subjects: STRAINS. Cells: DATA ELEMENTS.]

    TRAIT × SUBJECT


    Data

    • Simple and close to current practice:

      Genotype data

      Expression data

    [Figure: data matrix. Traits: PROBES. Subjects: INDIVIDUALS. Cells: DATA ELEMENTS.]

    TRAIT × SUBJECT


    Data

    • Simple and close to current practice:

      Genotype data

      Expression data

      Classic phenotype data

      Metabolite abundance data

      Protein abundance data

      And so on…

    TRAIT × SUBJECT


    Data with any Dimension Type

    SUBJECT

    • Individual

    • Strain

    • Sample

    TRAIT

    • Probe

    • Marker

    • Mass Peak

    DATA ELEMENT

    TRAIT × SUBJECT


    Data

    • Simple and close to current practice:

      What about QTL data?

    [Figure: data matrix. Traits: MARKERS. Traits: PROBES. Cells: DATA.]


    Data

    • Simple and close to current practice:

      What about QTL data?

      Probe association data?

      Interaction network data?

    [Figure: data matrix. Traits: MARKERS. Traits: PROBES. Cells: DATA.]

    TRAIT × TRAIT

    SUBJECT × SUBJECT


    Data with any Dimension Type

    • Minimal data model

    [Figure: a DATA ELEMENT is linked to DIMENSION ELEMENTs through "rows" and "columns"; SUBJECT and TRAIT are kinds of DIMENSION ELEMENT.]
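The minimal data model can be sketched in a few classes. The class and attribute names below follow the slide labels, but the code itself is an illustration written for this transcript, not dbGG source code; `dimension_types` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionElement:
    name: str


@dataclass(frozen=True)
class Subject(DimensionElement):
    pass


@dataclass(frozen=True)
class Trait(DimensionElement):
    pass


@dataclass(frozen=True)
class DataElement:
    row: DimensionElement
    column: DimensionElement
    value: float


def dimension_types(elements):
    """Return the (row kind, column kind) shared by all elements of a data set."""
    kinds = {(type(e.row).__name__, type(e.column).__name__) for e in elements}
    if len(kinds) != 1:
        raise ValueError("mixed dimension types in one data set")
    return kinds.pop()


# Made-up values: one genotype (trait x subject) and one QTL score (trait x trait).
genotypes = [DataElement(Trait("marker_1"), Subject("strain_A"), 1.0)]
qtls = [DataElement(Trait("marker_1"), Trait("probe_1"), 3.2)]
print(dimension_types(genotypes))
print(dimension_types(qtls))
```

Because rows and columns are both plain dimension elements, the same structure holds trait × subject data (genotypes, expressions) as well as trait × trait data (QTL profiles, networks).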


    The data model

    • To share genotype/phenotype data and tools:

      • Extensible data model

        • Data

        • Annotations

        • Investigations

        • Integration references


    Annotations

    • Simple and close to current practice

      Probe annotations

    • PROBE is a variant of TRAIT, having:

      • Name

      • Gene

      • Chromosome

      • Locus


    Annotation extends Trait or Subject

    SUBJECT

    • STRAIN

      • Name

      • Type: CSS, RIL..

      • Parent Strains

    • INDIVIDUAL

      • Name

      • Strain

      • Mother

      • Father

      • Sex

    • SAMPLE

      • Name

      • Individual

      • Tissue

    • And so on

    TRAIT

    • PROBE

      • Name

      • Gene

      • Chromosome

      • Locus

    • MARKER

      • Name

      • Allele

      • Chromosome

      • Locus

    • MASSPEAK

      • Name

      • MZ

      • RetentionTime

    • And so on

    [Figure: SUBJECT and TRAIT are kinds of DIMENSION ELEMENT; a DATA ELEMENT is linked to DIMENSION ELEMENTs through "row" and "column".]


    Annotation: simple in practice

    [Figure: three example data sets. Genotype data: DATA ELEMENTs indexed by STRAIN and MARKER. Expression data: DATA ELEMENTs indexed by INDIVIDUAL and PROBE. QTL data: DATA ELEMENTs indexed by MARKER and PROBE.]

    Extensions are automatic "under the hood": PROBE isa TRAIT isa DIMENSION ELEMENT.
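The "isa" chain can be pictured as ordinary class inheritance. The sketch below is an assumption about how such an extension might look, using the attribute lists from the slides; it is not generated dbGG code, and `on_chromosome` is a made-up helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionElement:
    name: str


@dataclass(frozen=True)
class Trait(DimensionElement):
    pass


@dataclass(frozen=True)
class Probe(Trait):
    gene: str
    chromosome: str
    locus: float


@dataclass(frozen=True)
class Marker(Trait):
    allele: str
    chromosome: str
    locus: float


def on_chromosome(elements, chromosome):
    """Names of the elements annotated with the given chromosome."""
    return [
        e.name for e in elements
        if getattr(e, "chromosome", None) == chromosome
    ]


# Made-up annotations; a plain Trait carries no chromosome at all.
traits = [
    Probe("probe_1", "geneA", "1", 12.5),
    Marker("marker_1", "A", "1", 10.0),
    Trait("plain_trait"),
]
print(on_chromosome(traits, "1"))
print(all(isinstance(t, DimensionElement) for t in traits))
```

Code that only needs rows and columns can treat every element as a `DimensionElement`, while annotation-aware code can use the extra attributes of `Probe` or `Marker`.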


    Data and annotations

    [Figure: DATA ELEMENTS and PROBES.]


    The data model

    • To share genotype/phenotype data and tools:

      • Extensible data model

        • Data

        • Annotations

        • Investigations

        • Integration references


    Investigation: workflow in the lab

    [Figure: the genotype data (STRAIN × MARKER), expression data (INDIVIDUAL × PROBE) and QTL data (MARKER × PROBE) sets, with question marks between them.]


    Investigation: building on FuGE

    [Figure: the genotype, expression and QTL data sets connected by protocol applications. Labels: Affy Array (Affy M430 protocol, Affy M430 platform), Bioconductor Norm., QTL Mapping (Mapping protocol, R software), SNP Array (Illumina protocol, Illumina Bead Studio). Legend: DATA, protocol application, Protocol, Equipment, Software.]

    FuGE: Jones et al., Nature Biotech 25, 1127-1133


    Summary of data model

    [Figure: data model. Entities: INVESTIGATION, DATA, DATA ELEMENT (linked by row and column to DIMENSION ELEMENT), SUBJECT (STRAIN, INDIVIDUAL), TRAIT (PROBE, MARKER), PROTOCOL APPLICATION, PROTOCOL, Software, Equipment.]


    The data model

    • To share genotype/phenotype data and tools:

      • Extensible data model

        • Data

        • Annotations

        • Investigations

        • Integration references


    References for integration

    • Ontology references and database references

    [Figure: GENE entries from INVESTIGATION 1 and INVESTIGATION 2 with incompatible naming (Name = Mip1alpha, Name = Mip1a), connected through compatible identifiers and hyperlinks. Callout: "Map mouse on human ontologies".]

    • ONTOLOGY ENTRY: Id = 0005615, Term = ABC, Ontology = GO

    • ONTOLOGY ENTRY: Id = MP:0005385, Term = cardiovascular, Ontology = MP

    • DATABASE REFERENCE: Id = ENSMUS098, Db = ENSEMBL

    • DATABASE REFERENCE: Id = ENSMU0S98, Db = ENSEMBL

    • DATABASE REFERENCE: Id = ENSMUS98, Db = ENSEMBL

    • DATABASE REFERENCE: Id = 1419561_AT, Db = AFFY 430

    FuGE: Jones et al., Nature Biotech 25, 1127-1133
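A small sketch of why shared database references help: genes that carry different local names in two investigations can still be matched when they point to the same external identifier. The gene names come from the slide; the identifiers `ID_001` and `ID_002`, the third gene and the function `group_by_reference` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical records: (investigation, local gene name, database, identifier).
genes = [
    ("Investigation 1", "Mip1alpha", "ENSEMBL", "ID_001"),
    ("Investigation 2", "Mip1a", "ENSEMBL", "ID_001"),
    ("Investigation 2", "OtherGene", "ENSEMBL", "ID_002"),
]


def group_by_reference(records):
    """Group genes by (database, identifier), ignoring their local names."""
    groups = defaultdict(set)
    for investigation, name, database, identifier in records:
        groups[(database, identifier)].add((investigation, name))
    # Keep only the references that link more than one record.
    return {ref: members for ref, members in groups.items() if len(members) > 1}


shared = group_by_reference(genes)
for reference, members in shared.items():
    print(reference, sorted(members))
```

Matching on the reference rather than on the name is what lets the two investigations be integrated without renaming anything.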


    Summary of data model

    [Figure: data model. Entities: INVESTIGATION, DATA, DATA ELEMENT (linked by row and column to DIMENSION ELEMENT), SUBJECT (STRAIN, INDIVIDUAL), TRAIT (PROBE, MARKER), PROTOCOL APPLICATION, PROTOCOL, Software, Equipment, ONTOLOGY ENTRY, DATABASE REFERENCE, Hyperlink.]

    extensible to more experiments…


    Todo

    • Publication: submitted 

    • Building a catalog of tools on top of dbGG

      • Experiments: in Braunschweig and Groningen

        • Illumina, Affy, Metabolites

      • Tool ‘plug-ins’

        • QTL graphs, import of annotations etc.

  • Exploit interoperability

    • E.g. integrate mouse & human with ontologies

    • Load annotations from other dbGG/BioMARTs

    • Build on and extend R/Taverna interaction


    Summary and questions

    • Share genotype/phenotype data and tools:

      • Interoperable software

        • Simple flat file exchange format

        • Database server

        • R/web-service interfaces

        • A procedure to extend the software

    • Build on extensible data model

      • Data

      • Annotations

      • Investigations

      • Integration references

  • Next steps



    Appendix: Procedure to (re)generate a MOLGENIS

    MOLGENIS for data

    Describe in little language

    [Figure: entities probes, individuals, expressions.]

    Case GG: Generate and evaluate

    http://gbic.biol.rug.nl/supplementary/2007/molgenis_showcase