Data integration issues in biodiversity research
Jessie Kennedy

Shawn Bowers, Matthew Jones, Josh Madin, Robert Peet, Deana Pennington, Mark Schildhauer, Aimee Stewart



SEEK

  • Science Environment for Ecological Knowledge

    • Research and develop information technology to radically improve the type and scale of ecological science that can be addressed

Visual Tools for Managing Taxonomic Concepts


Science and Scientific Data are Complex

[Figure: separate scientific domains: Climatology, Hydrology, Meteorology, Geography, Oceanography, Geology, Ecology, Paleontology, Genomics, Taxonomy, Proteomics, Morphology, Nomenclature, Biochemistry]


[Figure: the same domains, linked through shared concepts: Temperature, Depth, Location, Organism, Taxon concept, Gene sequence, Name, Protein, Pathway]


Scientific Community: complex

[Figure: Individual Scientist, Scientific Laboratory, Small Scientific Community, Large Scientific Community]


[Figure: the diagram of domains and shared concepts repeated four times]


Science & Scientific Data are Continually Changing

[Figure: the cycle of hypothesis, experiment, observation and conclusion]

  • Conclusions become foundations for new hypotheses

  • New experiments invalidate existing knowledge

  • Knowledge is open to interpretation

    • Different opinions

  • Need to build this into our technological solutions


Exploiting Scientific Data

  • To support scientists in

    • Discovery

    • Access

    • Sharing

    • Integration/Linking

    • Analysis

  • Scientists can then improve their potential for new scientific discovery


Data Integration/Linking: approaches

  • Metadata

    • to describe the data sets and how to interpret them

  • Ontologies

    • to define the terminology used, show how data might be related, and aid automatic transformation of the data

  • Standardisation of formats

    • for data exchange and to ease integration

  • LSIDs

    • to uniquely identify things and know when two things are the same

  • Workflows

    • to enable specification, refinement and repetition of integration/analysis

  • Provenance of data

    • to record where the data has come from and what has happened to it en route.
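LSIDs follow the URN form `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`. The sketch below parses that form and checks whether two identifiers name the same thing. It is an illustration only, not a full implementation of the LSID specification (no resolution service is contacted); the authority `example.org` and the identifiers are hypothetical, and treating only the `urn:lsid` prefix and the authority as case-insensitive is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LSID:
    authority: str                  # issuing authority, e.g. a DNS name
    namespace: str                  # collection within that authority
    object_id: str                  # the identified object
    revision: Optional[str] = None  # optional version of the object

    def __str__(self) -> str:
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.revision is not None:
            parts.append(self.revision)
        return ":".join(parts)


def parse_lsid(text: str) -> LSID:
    """Parse 'urn:lsid:<authority>:<namespace>:<object>[:<revision>]'."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"wrong number of parts in {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"missing 'urn:lsid:' prefix in {text!r}")
    if any(part == "" for part in parts[2:]):
        raise ValueError(f"empty component in {text!r}")
    # Assumption: the authority is compared case-insensitively, the rest exactly.
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2].lower(), parts[3], parts[4], revision)


def same_thing(a: str, b: str) -> bool:
    """True if two LSID strings identify the same object (and revision)."""
    return parse_lsid(a) == parse_lsid(b)


if __name__ == "__main__":
    # Hypothetical identifiers for taxon concepts of the Aus example.
    x = "URN:LSID:Example.org:concepts:C1.3"
    y = "urn:lsid:example.org:concepts:C1.3"
    print(parse_lsid(x))   # urn:lsid:example.org:concepts:C1.3
    print(same_thing(x, y))  # True
    print(same_thing(y, "urn:lsid:example.org:concepts:C1.5"))  # False
```

Comparing parsed identifiers rather than raw strings means trivially different spellings of the same LSID are recognised as one thing.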

Projects in most sciences:

[Figure: examples of projects, including ESG]

Ecological Science - Analysis

  • Ecological niche modeling of species distributions

[Figure: maps asking "Where do species occur now?" and "Where will they occur in the future?" (image from http://www.lifemapper.org)]

Ecological Niche Modeling

[Figure: known species locations are combined with environmental characteristics from gridded GIS layers (a temperature layer and many other layers, forming a multidimensional ecological space with dimensions D1 = Temperature, D2 … Dn) to develop a model. The model yields three kinds of prediction: the native distribution, from the environmental characteristics of the surrounding geographic area; an invasion area, from the environmental characteristics of a different geographic area; and environmental change, from future scenarios of environmental characteristics.]
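The "develop model" step can take many forms. As a deliberately simple stand-in, the sketch below uses a rectangular environmental envelope in the spirit of BIOCLIM-style models (not necessarily the method used by SEEK or Lifemapper): it records the range of each environmental layer at the known species locations and predicts suitability wherever every layer falls inside those ranges. The layers, grid values and locations are hypothetical toy data.

```python
from typing import Dict, List, Sequence, Tuple

Grid = List[List[float]]                   # one gridded GIS layer: rows x columns
Envelope = Dict[str, Tuple[float, float]]  # layer name -> (min, max)


def fit_envelope(layers: Dict[str, Grid],
                 locations: Sequence[Tuple[int, int]]) -> Envelope:
    """Min/max of each layer (one dimension D1..Dn) over known (row, col) locations."""
    envelope: Envelope = {}
    for name, grid in layers.items():
        values = [grid[r][c] for r, c in locations]
        envelope[name] = (min(values), max(values))
    return envelope


def predict(layers: Dict[str, Grid], envelope: Envelope) -> List[List[bool]]:
    """A cell is suitable if every layer's value lies inside the fitted range."""
    first = layers[next(iter(envelope))]
    rows, cols = len(first), len(first[0])
    return [
        [all(lo <= layers[name][r][c] <= hi for name, (lo, hi) in envelope.items())
         for c in range(cols)]
        for r in range(rows)
    ]


if __name__ == "__main__":
    temperature = [[10, 12, 14], [11, 13, 30], [9, 20, 25]]
    rainfall = [[100, 110, 120], [105, 115, 40], [95, 200, 60]]
    current = {"temperature": temperature, "rainfall": rainfall}
    known = [(0, 0), (1, 1)]  # hypothetical known species locations

    env = fit_envelope(current, known)
    print(env)  # {'temperature': (10, 13), 'rainfall': (100, 115)}
    print(predict(current, env))  # present-day suitable cells

    # Hypothetical future scenario: the same area, 2 degrees warmer.
    future = {"temperature": [[t + 2 for t in row] for row in temperature],
              "rainfall": rainfall}
    print(predict(future, env))
```

Applying the same fitted envelope to layers for a different area or a future scenario is what turns one model into the invasion and environmental-change predictions.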


Sources of Scientific Data

  • Data are massively dispersed

    • Ecological field stations and research centers (100s)

    • Natural history museums and biocollection facilities (100s)

    • Agency data collections (10s to 100s)

    • Individual scientists (1000s)

  • Data are heterogeneous

    • Syntax (format)

    • Schema (model)

    • Semantics (meaning)


Challenge: Data Integration


SEEK Components


Semantic Annotation – SEEK ontologies

  • Integration/merge

    • Concept mapping

    • Units conversion

    • Spatial & temporal scaling

  • Data discovery

    • Finding relevant data sets

    • Understanding data set content
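As a small illustration of concept mapping combined with units conversion, the sketch below assumes each data column carries a semantic annotation of the form (concept, unit), and that an ontology supplies a conversion from each unit to one canonical unit per concept. The concept names, units, conversion table and column names are hypothetical; real ontologies are far richer than a lookup table.

```python
from typing import Callable, Dict, Tuple

# Hypothetical ontology fragment: conversions to one canonical unit per concept.
TO_CANONICAL: Dict[Tuple[str, str], Callable[[float], float]] = {
    ("Biomass", "g/m^2"): lambda v: v,
    ("Biomass", "kg/ha"): lambda v: v * 0.1,            # 1 kg/ha = 0.1 g/m^2
    ("Temperature", "degC"): lambda v: v,
    ("Temperature", "degF"): lambda v: (v - 32) * 5 / 9,
}


def harmonize(row: Dict[str, object],
              annotations: Dict[str, Tuple[str, str]]) -> Dict[str, object]:
    """Rename annotated columns to their concept and convert to the canonical unit.

    Columns without an annotation are passed through unchanged.
    """
    out: Dict[str, object] = {}
    for column, value in row.items():
        if column in annotations:
            concept, unit = annotations[column]
            out[concept] = TO_CANONICAL[(concept, unit)](value)
        else:
            out[column] = value
    return out


if __name__ == "__main__":
    # Two data sets recording the same quantities under different names and units.
    site_a = {"plot": "a", "bm_kgha": 50.0, "temp_f": 68.0}
    site_b = {"plot": "c", "biomass": 5.0, "t": 20.0}
    ann_a = {"bm_kgha": ("Biomass", "kg/ha"), "temp_f": ("Temperature", "degF")}
    ann_b = {"biomass": ("Biomass", "g/m^2"), "t": ("Temperature", "degC")}
    print(harmonize(site_a, ann_a))  # {'plot': 'a', 'Biomass': 5.0, 'Temperature': 20.0}
    print(harmonize(site_b, ann_b))  # {'plot': 'c', 'Biomass': 5.0, 'Temperature': 20.0}
```

After harmonization, both rows use the same column names and units, so they can be compared or merged directly.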


Smart (Data) Integration: Merge

  • Discover data of interest

  • … connect to merge actor

  • … “compute merge”

Smart Merge …

[Figure: two source data sets are merged. In the first (columns a1 a2 a3 a4), a1 is annotated as Site and a3 as Biomass; in the second (columns a5 a6 a7 a8), a8 is annotated as Site and a6 as Biomass. The mappings a1 = a8 and a3 = a6 are found, and a4 has no counterpart. The merge result:]

a1   a3    a4
a    5.0   10
b    6.0   11
a    0.1
c    0.2
d    0.3

  • Semantic type annotations and ontology definitions are used to find mappings between sources

  • Executing the merge actor results in an integrated data product (via “outer union”)
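A minimal sketch of the two steps above, assuming each source is a list of rows (dicts) plus a map from column name to semantic type: mappings are found by matching semantic types, and the merge is an outer union that keeps every row from both sources and leaves missing values empty (`None`). The data loosely follow the slide's example (a2 is left out because its values are not shown); the function names are hypothetical and are not the Kepler merge actor's interface.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


def find_mappings(types_a: Dict[str, str], types_b: Dict[str, str]) -> Dict[str, str]:
    """Map each column of source B to the column of source A with the same semantic type."""
    # If A has several columns of one type, the last one wins (a simplification).
    column_for_type = {t: col for col, t in types_a.items()}
    return {col_b: column_for_type[t] for col_b, t in types_b.items() if t in column_for_type}


def outer_union(columns_a: List[str], rows_a: List[Row], rows_b: List[Row],
                mapping: Dict[str, str]) -> Tuple[List[str], List[List[Optional[object]]]]:
    """Rename B's columns via the mapping, then stack all rows, padding gaps with None."""
    renamed_b = [{mapping.get(k, k): v for k, v in row.items()} for row in rows_b]
    columns = list(columns_a)
    for row in renamed_b:  # unmapped B columns are appended, not dropped
        for k in row:
            if k not in columns:
                columns.append(k)
    table = [[row.get(c) for c in columns] for row in rows_a + renamed_b]
    return columns, table


if __name__ == "__main__":
    types_1 = {"a1": "Site", "a3": "Biomass"}
    types_2 = {"a8": "Site", "a6": "Biomass"}
    rows_1 = [{"a1": "a", "a3": 5.0, "a4": 10}, {"a1": "b", "a3": 6.0, "a4": 11}]
    rows_2 = [{"a8": "a", "a6": 0.1}, {"a8": "c", "a6": 0.2}, {"a8": "d", "a6": 0.3}]

    mapping = find_mappings(types_1, types_2)  # {'a8': 'a1', 'a6': 'a3'}
    columns, table = outer_union(["a1", "a3", "a4"], rows_1, rows_2, mapping)
    print(columns)  # ['a1', 'a3', 'a4']
    for r in table:
        print(r)    # the last three rows have None for a4
```

An outer union, unlike a join, never discards rows: records from either source that have no counterpart simply carry empty values.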


Challenges of Taxonomic Data

Scientific names change in meaning over time and across geographical regions. This has consequences for the conclusions drawn from analysis of data integrated on names.

What is Abies lasiocarpa?

[Figure: the subalpine fir as treated by USDA Plants & ITIS and by Flora North America; the treatments involve Abies lasiocarpa, Abies lasiocarpa var. lasiocarpa, var. arizonica and Abies bifolia]

Changes in meaning of names

Taxonomic history of the imaginary genus Aus L. 1758: 5 revisions of Aus and 1 name spelling change.

  • Linnaeus 1758: Aus L. 1758; Aus aus L. 1758

  • Archer 1965: Aus L. 1758; Aus aus L. 1758; Aus bea Archer 1965

  • Fry 1989: Aus L. 1758; Aus aus L. 1758; Aus bea Archer 1965; Aus cea Fry 1989

  • Pyle 1990: Aus bea and Aus cea noted as invalid names and replaced with Aus beus and Aus ceus

  • Tucker 1991: Aus L. 1758; Aus aus L. 1758; Aus cea Fry 1989

  • Pargiter 2003: Aus L. 1758; Aus aus L. 1758; Aus ceus Fry 1989; Xus Pargiter 2003; Xus beus (Archer) Pargiter 2003

  • 8 names

    • 2 genera: Aus, Xus

    • 6 species: Aus aus, Aus bea, Aus cea, Aus beus, Aus ceus, Xus beus


Each name has many concepts or meanings

  • N0 - Aus L.1758

    • C0.1 - Aus L.1758 sec. Linnaeus 1758

    • C0.2 - Aus L.1758 sec. Archer 1965

    • C0.3 - Aus L.1758 sec. Fry 1989

    • C0.4 - Aus L.1758 sec. Tucker 1991

    • C0.5 - Aus L.1758 sec. Pargiter 2003

  • N1 - Aus aus L.1758

    • C1.1 - Aus aus L.1758 sec. Linnaeus 1758

    • C1.2 - Aus aus L.1758 sec. Archer 1965

    • C1.3 - Aus aus L.1758 sec. Fry 1989

    • C1.4 - Aus aus L.1758 sec. Tucker 1991

    • C1.5 - Aus aus L.1758 sec. Pargiter 2003

  • N2 - Aus bea Archer 1965

    • C2.2 - Aus bea Archer 1965 sec. Archer 1965

    • C2.3 - Aus bea Archer 1965 sec. Fry 1989

  • N3 - Aus cea Fry 1989

    • C3.3 - Aus cea Fry 1989 sec. Fry 1989

    • C3.4 - Aus cea Fry 1989 sec. Tucker 1991

  • N4 - Aus beus Archer 1965

  • N5 - Aus ceus Fry 1989

    • C5.5 - Aus ceus Fry 1989 sec. Pargiter 2003

  • N6 - Xus beus Pargiter 2003

    • C6.5 - Xus beus Pargiter 2003 sec. Pargiter 2003

  • N7 - Xus Pargiter 2003

    • C7.5 - Xus Pargiter 2003 sec. Pargiter 2003

8 names, 17 concepts
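The name/concept distinction can be captured with a small record type: a concept pairs a name with the revision it is used according to ("sec."). The sketch below rebuilds the 8 names and 17 concepts of the Aus example, assuming the Cn.r numbering in which n identifies the name and r the revision (1 = Linnaeus 1758 through 5 = Pargiter 2003); the class, constant and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Concept:
    id: str       # e.g. "C1.3": name N1 as used in revision 3
    name_id: str  # the name this concept is attached to
    sec: str      # the revision the concept is "according to"


REVISIONS = ["Linnaeus 1758", "Archer 1965", "Fry 1989", "Tucker 1991", "Pargiter 2003"]

NAMES: Dict[str, str] = {
    "N0": "Aus L.1758", "N1": "Aus aus L.1758", "N2": "Aus bea Archer 1965",
    "N3": "Aus cea Fry 1989", "N4": "Aus beus Archer 1965", "N5": "Aus ceus Fry 1989",
    "N6": "Xus beus Pargiter 2003", "N7": "Xus Pargiter 2003",
}

# Revisions (1-based) that contain a concept for each name; N4 has none.
USED_IN: Dict[str, List[int]] = {
    "N0": [1, 2, 3, 4, 5], "N1": [1, 2, 3, 4, 5], "N2": [2, 3], "N3": [3, 4],
    "N4": [], "N5": [5], "N6": [5], "N7": [5],
}

CONCEPTS: List[Concept] = [
    Concept(f"C{name_id[1:]}.{r}", name_id, REVISIONS[r - 1])
    for name_id, revisions in USED_IN.items()
    for r in revisions
]


def concepts_for(name_id: str) -> List[Concept]:
    """All concepts (meanings) that share one name."""
    return [c for c in CONCEPTS if c.name_id == name_id]


def label(c: Concept) -> str:
    """Render a concept as '<name> sec. <revision>'."""
    return f"{NAMES[c.name_id]} sec. {c.sec}"


if __name__ == "__main__":
    print(len(NAMES), "names,", len(CONCEPTS), "concepts")  # 8 names, 17 concepts
    for c in concepts_for("N1"):
        print(c.id, "-", label(c))
```

Keying data to a concept rather than to a bare name keeps "Aus aus sec. Fry 1989" distinct from "Aus aus sec. Tucker 1991".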


Find data sets containing Aus aus

Many possible interpretations of Aus aus (N1 - Aus aus L.1758, with concepts C1.1 to C1.5):

  • Original concept: C1.1

  • Most recent concept: C1.5

  • Preferred authority (e.g. Fry 1989): C1.3

  • Everything ever named N1: Union(C1.1, C1.2, C1.3, C1.4, C1.5)

  • Best fit according to some matching algorithm: Best(C1.1, C1.2, C1.3, C1.4, C1.5)

  • New concept containing only those features common to all concepts with the name N1: Intersection(C1.1, C1.2, C1.3, C1.4, C1.5)

Is it appropriate to link or merge data sets returned on the scientific names? It depends on the user’s purpose and the level of precision required.
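The interpretations of Aus aus can be expressed as functions over the concepts attached to one name. The sketch below assumes, purely for illustration, that each concept can be represented by the set of specimens it includes (the specimen IDs s1 to s5 are invented), and it uses the Jaccard index as one possible "matching algorithm" for the best fit; neither choice comes from the slides.

```python
from dataclasses import dataclass
from functools import reduce
from typing import FrozenSet, List


@dataclass(frozen=True)
class Concept:
    id: str
    year: int
    sec: str                 # the revision the concept is "according to"
    members: FrozenSet[str]  # hypothetical: specimens included in the concept


def original(cs: List[Concept]) -> Concept:
    return min(cs, key=lambda c: c.year)


def most_recent(cs: List[Concept]) -> Concept:
    return max(cs, key=lambda c: c.year)


def preferred(cs: List[Concept], authority: str) -> Concept:
    return next(c for c in cs if c.sec == authority)


def union_of(cs: List[Concept]) -> FrozenSet[str]:
    """Everything ever named with this name."""
    return frozenset().union(*(c.members for c in cs))


def intersection_of(cs: List[Concept]) -> FrozenSet[str]:
    """Only what all concepts with this name have in common."""
    return reduce(lambda a, b: a & b, (c.members for c in cs))


def best_fit(cs: List[Concept], query: FrozenSet[str]) -> Concept:
    """Concept most similar to the query by Jaccard index (one possible choice)."""
    return max(cs, key=lambda c: len(c.members & query) / len(c.members | query))


N1 = [
    Concept("C1.1", 1758, "Linnaeus 1758", frozenset({"s1", "s2", "s3"})),
    Concept("C1.2", 1965, "Archer 1965", frozenset({"s1", "s2"})),
    Concept("C1.3", 1989, "Fry 1989", frozenset({"s1", "s2", "s4"})),
    Concept("C1.4", 1991, "Tucker 1991", frozenset({"s1", "s2", "s4", "s5"})),
    Concept("C1.5", 2003, "Pargiter 2003", frozenset({"s1", "s2", "s5"})),
]

if __name__ == "__main__":
    print(original(N1).id, most_recent(N1).id, preferred(N1, "Fry 1989").id)  # C1.1 C1.5 C1.3
    print(sorted(union_of(N1)))         # ['s1', 's2', 's3', 's4', 's5']
    print(sorted(intersection_of(N1)))  # ['s1', 's2']
    print(best_fit(N1, frozenset({"s1", "s4"})).id)  # C1.3
```

The union is the broadest (least precise) reading and the intersection the narrowest, which is why the right choice depends on the user's purpose.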


Information from literature on synonymy

Taxonomists record which names their concepts are synonymous with, and any name changes.

[Figure: the 8 names (N0 to N7) and 17 concepts (C0.1 to C7.5) of the Aus example, linked by the parent-child relationships in the 5 revisions, the names for each of the concepts, and the synonymy and name-change links]


Find data sets with Aus aus (N1)

[Figure: the name/concept graph of the Aus example, with N1 and its concepts C1.1 to C1.5 highlighted]


[Figure, next step: the highlight extends to N2 and its concepts C2.2 and C2.3]


[Figure, next step: the highlight extends to N3 and its concepts C3.3 and C3.4]


[Figure, next step: the highlight extends to N4, N6 and concept C6.5]


[Figure, final step: the highlight extends to N5 and concept C5.5]

This results in everything returned for Aus aus by traversing the synonymy and name links.
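The traversal of synonymy and name links can be sketched as a breadth-first search over an undirected graph of names and concepts. The concept-to-name links below follow the Aus example; the synonymy and name-change links (C1.1 with C2.2, C1.3 with C3.3, N2 with N4, N3 with N5, N4 with N6) are assumptions chosen to reproduce the highlighted result, since the exact edges are not given in the text. Genus-level parent-child links are deliberately left out, which is why N0 and N7 are not reached.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple

# Concept -> the name it uses (from the Aus example).
NAME_OF: Dict[str, str] = {
    **{f"C0.{r}": "N0" for r in range(1, 6)},
    **{f"C1.{r}": "N1" for r in range(1, 6)},
    "C2.2": "N2", "C2.3": "N2", "C3.3": "N3", "C3.4": "N3",
    "C5.5": "N5", "C6.5": "N6", "C7.5": "N7",
}

# Hypothetical synonymy (concept-concept) and name-change (name-name) links.
LINKS = [("C1.1", "C2.2"), ("C1.3", "C3.3"), ("N2", "N4"), ("N3", "N5"), ("N4", "N6")]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:  # links are followed in both directions
        graph[a].add(b)
        graph[b].add(a)
    return graph


def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """Breadth-first search: every name or concept connected to `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    graph = build_graph(list(NAME_OF.items()) + LINKS)
    found = reachable(graph, "N1")
    print(sorted(n for n in found if n.startswith("N")))  # ['N1', 'N2', 'N3', 'N4', 'N5', 'N6']
    print(sorted(c for c in found if c.startswith("C")))  # 11 concepts
```

Following every link without regard to what kind of relationship it records is exactly what makes this result so broad.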


Information to improve data sets returned

We can build systems that return data fit for purpose. Minimally, what we need are set relationships from concepts in any taxonomy to earlier concepts, and name changes related to earlier names.

[Figure: the Aus name/concept graph annotated with set relationships between concepts, marked with symbols such as "="]
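If two concepts could be described by the sets of specimens they include, the set relationship between them would follow directly from set comparison. The sketch below classifies a pair into five basic relationships (congruent, includes, included in, overlaps, excludes). In practice taxonomists assert such relationships rather than compute them, because complete member lists rarely exist; the specimen sets are hypothetical, and every symbol except "=" is an arbitrary choice of this sketch.

```python
from enum import Enum
from typing import FrozenSet


class SetRelation(Enum):
    CONGRUENT = "="
    INCLUDES = ">"
    INCLUDED_IN = "<"
    OVERLAPS = "><"
    EXCLUDES = "!"


def relate(a: FrozenSet[str], b: FrozenSet[str]) -> SetRelation:
    """Relationship of concept a to concept b, judged by their members."""
    if a == b:
        return SetRelation.CONGRUENT
    if a > b:  # proper superset
        return SetRelation.INCLUDES
    if a < b:  # proper subset
        return SetRelation.INCLUDED_IN
    if a & b:  # share some, but not all, members
        return SetRelation.OVERLAPS
    return SetRelation.EXCLUDES


if __name__ == "__main__":
    # Hypothetical members of an earlier concept, compared with later ones.
    earlier = frozenset({"s1", "s2", "s3"})
    print(relate(frozenset({"s1", "s2", "s3"}), earlier).name)  # CONGRUENT
    print(relate(frozenset({"s1", "s2"}), earlier).name)        # INCLUDED_IN
    print(relate(frozenset({"s3", "s4"}), earlier).name)        # OVERLAPS
    print(relate(frozenset({"s5"}), earlier).name)              # EXCLUDES
```

With such a relationship recorded for each concept against earlier ones, a query can follow only congruent links for a precise answer, or add the broader links when the purpose allows.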



Real Biological Taxonomies

  • Larger and change more frequently than the Aus example

  • German mosses

    • 14 classifications in 73 years

    • covering 1548 taxa

    • only 35% thought to be stable concepts

      • 65% of names used in legacy data sets are ambiguous

  • Taxonomic revisions of the genus Alteromonas over 34 years (1972 to 2006)

    • At the species level

      • 18 “emendations”

      • 19 species reassigned to 4 genera

        • 3 new combinations

        • 6 synonyms

        • 2 species to subspecies

        • 2 subspecies to species

      • 21 new species


SEEK Taxon Approach

  • Use Taxon Concepts for referring to organisms

    • Aus aus L. 1758 sec. Tucker 1991

    • Abies lasiocarpa (Hook.) Nutt. sec. FNA 1997

  • Taxon Concept/Name Resolution

    • International data exchange schema

      • TCS (Taxonomic Concept Transfer Schema)

    • Concept Repository and Resolution web service

    • Linked to Kepler workflow system

    • Globally unique identifiers (LSIDs)

    • Visualization software for comparing taxonomies and asserting concept relationships

Taxon Object Server

[Figure: the Taxon Object Server (TOS) and its inputs: taxonomic data providers (e.g. Mammal Species of the World) via a Database to TCS Mapping Tool, and taxonomic literature via a Concept Extraction Tool, both exchanging TCS; the SEEK Cache and the Concept Mapper are also shown]

Taxonomic Object Service: SEEK

[Figure: the TOS, Concept Mapper, SEEK Cache and LSID Authority exchange TCS with tasks such as identifying species, marking up datasets (Morpho, EML datasets, EML(TCS)) and data analysis. Operations shown: Get Best Concept, Get Synonymous Concepts, Find All Concepts. Service address: http://seek.nhm.ku.edu/TaxObjServ/services]


Recap…

  • Re-emphasised the problems with Taxonomic Names

    • not good identifiers for organisms

    • problem extends to most areas

      • characters, countries, habitats, vegetation types, genes…

  • Shown that taxonomic concepts are better for referring to organisms, specimens, observations…

  • But we need better systems for resolving taxonomic names/concepts

    • which require better information


Provide better tools for users

  • To help taxonomists create better quality data

    • Better access to reference/legacy data

    • Explore differences/similarities in existing taxonomies

    • Create relationships between concepts

    • Improved data can be made available to the general biology community for incorporating into bio-referenced databases.

  • To help end users understand and use the data

    • and its limitations

    • Biologists can use tools to understand the impact of using particular data on their analysis


Conclusion

  • Science is complex (and therefore split into specialisms)

    • Identify the overlaps/linkages in the different domains

      • Need useful approximations of things to simplify linked domains

      • Need to understand the approximations or linking points well

    • Support re-composition, linking or building on the components

  • Science is inherently changing

    • Science is full of legacy data

      • Today’s scientific research is tomorrow’s legacy data

    • Track the changes in the data

      • know when components or links have changed

  • Provide long-term persistent storage

    • The data behind any published scientific discovery should be stored as evidence

    • Data needs to be accurately annotated

      • Sufficient to repeat analyses to test hypotheses


Acknowledgements

  • Colleagues on the SEEK project

  • NSF and EPSRC funding

  • e-Science Centre funding

  • Colleagues in TDWG


Thank You

Questions…

