LSID and TCS deployment in the Catalogue of Life

TDWG 2007, Bratislava

Richard J. White, Andrew C. Jones & Ewen R. Orme

Cardiff University, UK

e.r.orme | andrew.c.jones | r.j.white@cs.cf.ac.uk

TDWG Infrastructure Project
  • This talk describes work in progress on a project being carried out at Cardiff University and by Species 2000, supported by GBIF, titled "TCS and LSID deployment in the Catalogue of Life"
  • This talk has a similar title to the project, with a small concession to the fact that the presentation will be in the LSID session
Objectives of the project

To add support for Life Sciences Identifiers (LSIDs) and the Taxon Concept Schema (TCS) to the Catalogue of Life (CoL), by implementing a limited set of changes to the CoL which will

  • act as a test-bed or demonstrator and
  • inform discussion based on this prototype
  • leading to a plan for deployment of LSIDs and TCS in the CoL from 2008

It is hoped this project will lead to

  • increased use of TDWG standards,
  • accelerated LSID deployment and uptake of TCS,

which will in turn

  • assist providers and users to ascribe data unambiguously to specified taxon concepts, and
  • speed the growth of shared biodiversity data resources
The Catalogue of Life

The CoL partners, Species 2000 and ITIS, have

  • built a checklist to contain all the world's species, which
  • acts as a framework for the organisation and enrichment of species biodiversity data by individuals, institutions and projects (some of which were mentioned by Frank Bisby in his talk yesterday)
  • is delivered as the Annual and Dynamic Checklists (AC and DC)
Project participants

Cardiff University, UK

to investigate feasible solution(s), implement, set up an experimental system as a basis for discussion and planning:

  • Ewen Orme (technical and implementation issues)
  • Andrew Jones (ideas and concepts)
  • Richard White (blame for failure)

Species 2000

(Secretariat at the University of Reading, UK)

to survey needs and capabilities in the light of the demonstration system, test it, help CoL decide how and when to deploy LSIDs in a staged manner:

  • Frank Bisby (policy)
  • Yuri Roskov (testing)
Objectives of this talk
  • to tell you about our progress so far
  • and what we plan to do between now and December 2007
  • to give an example of a data integrator adopting LSIDs and TCS
  • to stimulate discussion which will help us with the project
  • and ensure optimal cooperation between the CoL, TDWG and our data providers and consumers

Therefore many of the slides pose questions rather than provide answers!

Progress (1)

We have

  • set up experimental installations of
    • the CoL Spice hub software and cache
    • the Annual Checklist (database and user interfaces)
  • addressed some Spice portability issues
  • established an LSID resolution service to support the use of CoL LSIDs
    • currently available for testing
    • the exact LSIDs are provisional and are NOT for real use yet!
  • designed and implemented provisional RDF/TCS responses from the LSID resolver, generated from the experimental AC
Progress (2)
  • We will shortly modify the experimental AC database and software to issue provisional LSIDs for taxon concepts
  • Later we will modify the Spice hub and cache to do the same for the Dynamic Checklist
Generating CoL LSIDs
  • An extra LSID field in the experimental copy of AC 2007
  • Making the LSIDs available to users
    • on local AC client
    • on AC web site
  • Current LSID format is urn:lsid:lsid.sp2000.org:ac2007:159044
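
The general LSID layout is urn:lsid:authority:namespace:object, with an optional trailing revision. As a minimal sketch (the `Lsid` type and `parse_lsid` function are illustrative names, not part of the CoL software), the provisional identifier above can be split into its parts like this:

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """The components of an LSID (hypothetical helper type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = text.split(":")
    # The "urn:lsid" prefix is compared case-insensitively.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    # Authority, namespace and object must all be non-empty.
    if not all(parts[2:5]):
        raise ValueError(f"empty LSID component in: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


example = parse_lsid("urn:lsid:lsid.sp2000.org:ac2007:159044")
print(example)
```

Here the authority is lsid.sp2000.org, the namespace identifies the checklist edition (ac2007) and the object is the record number.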
Access to our resolver
  • Directly (if you know the IP address and port details)
  • via Firefox plug-in
  • via TDWG proxy (in due course)
  • We wish to encourage discussion of how users will resolve LSIDs
Demo of the Firefox plug-in

(OK, it’s not a demo, it’s a set of screen-shots)

  • The TDWG Firefox LSID resolver plug-in
    • converts an LSID to a URL
    • forwards it to a (configured known) resolver
    • creates a digest from the returned RDF
  • Enter: lsidres:urn:lsid:sp2000.cs.cf.ac.uk:AC2007:159044
  • Later, it will permit lsidres:urn:lsid:lsid.sp2000.org:ac2007:159044
Direct access to the resolver

(with a listing of the raw RDF)

  • One can enter this in the URL field of any browser: http://stilgar.cs.cf.ac.uk:8090/authority/metadata?lsid=urn:lsid:sp2000.cs.cf.ac.uk:AC2007:159044
  • which produces the following response in RDF
  • Here is a response for Chlorosarcinopsis negevensis.
  • I have split it up into sections to make it easier to read
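
The conversion from an LSID to a resolver URL can be sketched as follows. This is only an illustration that assumes the URL pattern shown above (`/authority/metadata?lsid=...`); the function name `metadata_url` and the handling of the `lsidres:` prefix are assumptions, not the plug-in's actual code, and no network request is made.

```python
from urllib.parse import quote


def metadata_url(resolver_base: str, lsid: str) -> str:
    """Build a metadata request URL for an LSID (hypothetical helper)."""
    # Assumption: an "lsidres:" prefix, as typed into the plug-in, is simply dropped.
    if lsid.startswith("lsidres:"):
        lsid = lsid[len("lsidres:"):]
    # Keep ':' unescaped so the result matches the form shown on the slide.
    return f"{resolver_base.rstrip('/')}/authority/metadata?lsid={quote(lsid, safe=':')}"


url = metadata_url(
    "http://stilgar.cs.cf.ac.uk:8090",
    "lsidres:urn:lsid:sp2000.cs.cf.ac.uk:AC2007:159044",
)
print(url)
```

With the experimental host and LSID used in this talk, the function reproduces the URL given in the first bullet above.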
Vocabularies used

First, the vocabularies (name spaces) used (RDF and three TCS elements) are declared:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:TaxonConcept="http://rs.tdwg.org/ontology/voc/TaxonConcept#"
    xmlns:TaxonName="http://rs.tdwg.org/ontology/voc/TaxonName#"
    xmlns:TaxonRank="http://rs.tdwg.org/ontology/voc/TaxonRank#">

[In what follows I've abbreviated "http://rs.tdwg.org/ontology/voc/" to "..."]

Accepted name

The resolver first returns two names. The first is the accepted name and the second is a synonym, but there is nothing to indicate that yet - these are just their names:

<rdf:Description rdf:nodeID="A0">
  <rdf:type rdf:resource="...TaxonName#TaxonName"/>
  <TaxonName:rank rdf:resource="...TaxonRank#Species"/>
  <TaxonName:nameComplete>Chlorosarcinopsis negevensis</TaxonName:nameComplete>
  <TaxonName:genusPart>Chlorosarcinopsis</TaxonName:genusPart>
  <TaxonName:specificEpithet>negevensis</TaxonName:specificEpithet>
  <TaxonName:authorship>I. Friedmann &amp; R. Ocampo-Paus</TaxonName:authorship>
</rdf:Description>

Synonym

<rdf:Description rdf:nodeID="A1">
  <rdf:type rdf:resource="...TaxonName#TaxonName"/>
  <TaxonName:rank rdf:resource="...TaxonRank#Species"/>
  <TaxonName:nameComplete>Neochlorosarcina negevensis</TaxonName:nameComplete>
  <TaxonName:genusPart>Neochlorosarcina</TaxonName:genusPart>
  <TaxonName:specificEpithet>negevensis</TaxonName:specificEpithet>
  <TaxonName:authorship>(I. Friedmann &amp; R. Ocampo-Paus) S. Watanabe</TaxonName:authorship>
</rdf:Description>

Taxon concept

The resolver then returns a set of concepts.

The first is the concept for the species to which the LSID refers (the LSID populates the "about" attribute). It has an accepted name and two relationships, referred to by nodeIDs, which ultimately link the species to its synonym and genus:

<rdf:Description rdf:about="urn:lsid:lsid.sp2000.org:ac2007:174752">
  <rdf:type rdf:resource="...TaxonConcept#TaxonConcept"/>
  <TaxonConcept:hasName rdf:nodeID="A0"/>
  <TaxonConcept:hasRelationship rdf:nodeID="A3"/>
  <TaxonConcept:hasRelationship rdf:nodeID="A5"/>
</rdf:Description>

Synonym concept

The concept for the synonym, which has a name and a relationship (to its accepted taxon concept, defined later):

<rdf:Description rdf:nodeID="A4">
  <rdf:type rdf:resource="...TaxonConcept#TaxonConcept"/>
  <TaxonConcept:hasName rdf:nodeID="A1"/>
  <TaxonConcept:hasRelationship rdf:nodeID="A2"/>
</rdf:Description>

HasSynonym

The remaining three descriptions are relationships.

The "HasSynonym" relationship from the accepted concept to the synonym concept:

<rdf:Description rdf:nodeID="A3">
  <rdf:type rdf:resource="...TaxonConcept#Relationship"/>
  <TaxonConcept:relationshipCategory rdf:resource="...TaxonConcept#HasSynonym"/>
  <TaxonConcept:fromTaxon rdf:resource="urn:lsid:lsid.sp2000.org:ac2007:174752"/>
  <TaxonConcept:toTaxon rdf:nodeID="A4"/>
</rdf:Description>

IsSynonymFor

The reverse "IsSynonymFor" relationship from the synonym concept to the accepted concept:

<rdf:Description rdf:nodeID="A2">
  <rdf:type rdf:resource="...TaxonConcept#Relationship"/>
  <TaxonConcept:relationshipCategory rdf:resource="...TaxonConcept#IsSynonymFor"/>
  <TaxonConcept:fromTaxon rdf:nodeID="A4"/>
  <TaxonConcept:toTaxon rdf:resource="urn:lsid:lsid.sp2000.org:ac2007:174752"/>
</rdf:Description>

Genus

Finally a relationship from the accepted species concept to its genus concept (which, being a separate taxon with its own LSID, is not contained in this document):

<rdf:Description rdf:nodeID="A5">
  <rdf:type rdf:resource="...TaxonConcept#Relationship"/>
  <TaxonConcept:relationshipCategory rdf:resource="...TaxonConcept#IsChildTaxonOf"/>
  <TaxonConcept:fromTaxon rdf:resource="urn:lsid:lsid.sp2000.org:ac2007:174752"/>
  <TaxonConcept:toTaxon rdf:resource="urn:lsid:lsid.sp2000.org:ac2007:10868"/>
</rdf:Description>

</rdf:RDF>
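
To show how a client might consume a response of this shape, here is a rough sketch using Python's standard xml.etree.ElementTree. `SAMPLE` is a trimmed-down stand-in for the response above with the vocabulary URIs spelled out in full; the `summarise` function and its result layout are illustrative assumptions, not part of the resolver or of any TDWG library.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TC = "http://rs.tdwg.org/ontology/voc/TaxonConcept#"
TN = "http://rs.tdwg.org/ontology/voc/TaxonName#"

# Trimmed stand-in for the resolver response (assumption: same structure as above).
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}"
    xmlns:TaxonConcept="{TC}" xmlns:TaxonName="{TN}">
  <rdf:Description rdf:nodeID="A0">
    <TaxonName:nameComplete>Chlorosarcinopsis negevensis</TaxonName:nameComplete>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A1">
    <TaxonName:nameComplete>Neochlorosarcina negevensis</TaxonName:nameComplete>
  </rdf:Description>
  <rdf:Description rdf:about="urn:lsid:lsid.sp2000.org:ac2007:174752">
    <TaxonConcept:hasName rdf:nodeID="A0"/>
    <TaxonConcept:hasRelationship rdf:nodeID="A3"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A4">
    <TaxonConcept:hasName rdf:nodeID="A1"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A3">
    <TaxonConcept:relationshipCategory rdf:resource="{TC}HasSynonym"/>
    <TaxonConcept:fromTaxon rdf:resource="urn:lsid:lsid.sp2000.org:ac2007:174752"/>
    <TaxonConcept:toTaxon rdf:nodeID="A4"/>
  </rdf:Description>
</rdf:RDF>"""


def summarise(rdf_xml: str) -> dict:
    """Return the LSID, accepted name and synonym names found in a response."""
    root = ET.fromstring(rdf_xml)
    # Index each rdf:Description by its LSID (rdf:about) or blank-node label (rdf:nodeID).
    nodes = {}
    for desc in root.findall(f"{{{RDF}}}Description"):
        key = desc.get(f"{{{RDF}}}about") or desc.get(f"{{{RDF}}}nodeID")
        nodes[key] = desc

    def ref(elem):
        # A property points either at a URI (rdf:resource) or a blank node (rdf:nodeID).
        return elem.get(f"{{{RDF}}}resource") or elem.get(f"{{{RDF}}}nodeID")

    def name_of(concept_key):
        has_name = nodes[concept_key].find(f"{{{TC}}}hasName")
        return nodes[ref(has_name)].findtext(f"{{{TN}}}nameComplete").strip()

    # Assumption: the one description identified by an LSID is the accepted concept.
    accepted = next(k for k in nodes if k and k.startswith("urn:lsid:"))
    synonyms = []
    for desc in nodes.values():
        category = desc.find(f"{{{TC}}}relationshipCategory")
        if category is not None and ref(category) == TC + "HasSynonym":
            if ref(desc.find(f"{{{TC}}}fromTaxon")) == accepted:
                synonyms.append(name_of(ref(desc.find(f"{{{TC}}}toTaxon"))))
    return {"lsid": accepted, "accepted_name": name_of(accepted), "synonyms": synonyms}


result = summarise(SAMPLE)
print(result)
```

The sketch follows the blank-node references by hand, which is enough for a document this small; a real client would more likely load the response into an RDF library and query the resulting graph.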

Questions to be resolved

For what kinds of entities will the CoL issue LSIDs?

(Note that this is not the same question as which entities will be represented as <TaxonConcepts> in the RDF)

  • only accepted species taxon concepts [yes] or
  • all names (including synonyms)? [no, to be done by the nomenclator projects?]
  • taxa at other levels
    • lower ("infra-specific") taxa (CoL limits itself to only one infra-specific level, i.e. a species may have either subspecies or varieties but not both)
    • higher taxa (CoL limits itself to genus, family, order, class, phylum; but in practice also permits subgenera, superfamilies, and the eight "top-level" nodes above the phylum level)
What is returned by the resolver?
  • Should it return all available data? [yes]
  • or just enough to let the user query the AC or DC for more information? [no]
  • If only accepted taxa have LSIDs, the resolver response can include all synonyms
  • if all names get LSIDs (e.g. from nomenclators),
    • should all the names relating to a taxon be returned in one response? [yes]
    • or return only one concept per response? [no, need to call the resolver again]

How are higher taxa linked?

  • should the resolver return TCS about a single accepted taxon concept [yes, call the resolver again to "navigate"]
  • or embed all linked taxa (above and below) [no, could be huge]
Types of synonymy in the CoL

and how they are handled in RDF/TCS (provisional!)

Propagating LSIDs and other GUIDs

We will demonstrate

  • receiving GUIDs from a data provider
  • and making them available in the metadata returned by the LSID resolver.

We will summarise the options for doing this in a preliminary plan to be refined by the Catalogue of Life partners in November 2007

Decisions about deployment
  • to be made by discussion with Sp2000 and ITIS
Timetable for completion of the project
  • We will complete the project by the end of December 2007, including
  • drafting updated documentation of the enhanced CDM and schemas
Questions (1)

To be resolved during the rest of the project, and interesting areas for wider consideration and future research, some of which will be described in the presentation.

How to express the different types of CoL LSIDs within the allowed LSID syntax:

  • urn:lsid:sp2000.org:ac2007:159044
  • urn:ac2007.lsid:sp2000.org:taxon:159044
  • urn:dc.lsid:sp2000.org:taxon:159044
Questions (2)
  • How will users (human or software) obtain LSIDs in the first place? Do we need to set up a modified version of the AC (and eventually DC) Web Service? [yes]
  • How GUIDs (not necessarily LSIDs) supplied by the data providers will be propagated through the hub
  • The expectations of users concerning the activities that LSIDs will help them with
TCS “Completeness”

Can all the information in the Sp2000 standard data set (CDM) be translated into TCS?

  • so that it could be turned back into a Sp2000 data set without loss of information?
  • could Sp2000 usefully employ any features of TCS not currently in the CDM?
  • such as finer distinctions between types of synonymy and other relationships between names and taxa?
Taxonomic hierarchy issues

Whether and how the taxonomic hierarchy will be navigable using LSIDs

  • It is natural to assign LSIDs to higher taxa
  • Using the LSID Resolution Service to navigate up and down the taxonomic hierarchy raises some interesting issues
    • going up is straightforward (at least while only one hierarchy is considered)
    • going down may involve subtaxa at different levels
  • the children of one node may not all be at the same level
  • for example some genera in an order may not have been classified in any family ("incertae sedis")
  • you can’t give an LSID to an unnamed node, especially because you don't know whether two unnamed sister nodes are supposed to be the same or not
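
The "going up" case can be sketched as a loop that repeatedly asks for the parent of the current taxon. In this illustration the resolver is replaced by an in-memory table (`PARENT_OF`, a hypothetical stand-in); a real client would instead resolve each LSID and read the IsChildTaxonOf relationship from the returned RDF. The table stops at the genus because no higher LSIDs appear in this talk.

```python
from typing import Callable, Dict, List, Optional

SPECIES = "urn:lsid:lsid.sp2000.org:ac2007:174752"
GENUS = "urn:lsid:lsid.sp2000.org:ac2007:10868"

# Hypothetical stand-in for resolver calls: child LSID -> parent LSID (None = no parent known).
PARENT_OF: Dict[str, Optional[str]] = {
    SPECIES: GENUS,
    GENUS: None,
}


def lineage(
    lsid: str,
    parent_of: Callable[[str], Optional[str]],
    max_depth: int = 50,
) -> List[str]:
    """Follow parent links upward from lsid, one lookup per step."""
    chain = [lsid]
    seen = {lsid}
    while len(chain) <= max_depth:
        parent = parent_of(chain[-1])
        if parent is None:
            break  # reached the top of what is known
        if parent in seen:
            raise ValueError(f"cycle detected at {parent}")
        chain.append(parent)
        seen.add(parent)
    return chain


path = lineage(SPECIES, PARENT_OF.get)
print(path)
```

Going down has no equally simple loop: each step yields a set of children that may sit at different ranks, so a client would need to decide how to present mixed levels and unplaced taxa.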
More questions
  • How will users (human or software) obtain LSIDs for entities of interest?
  • Users’ expectations concerning tasks that LSIDs might assist,
    • including navigating the taxonomic hierarchy and
    • linking data to taxa
  • The role of CoL LSIDs in building the biodiversity information systems of the future
Summary

We hope to:

  • improve the compatibility of the protocols and public software interfaces used by Species 2000 with TDWG standards
  • increase the usefulness of the CoL to users, including GBIF, by
    • improving the CoL’s compatibility with other biodiversity tools,
    • supplying its information to clients expressed as taxon concepts
    • enhancing interoperability between data providers and consumers by means of LSIDs referring to these concepts
  • The updated Spice protocol, documentation and enhanced Spice software will be available for use by other projects to build species information systems for their own purposes.
Homework for TDWG delegates
  • Informal meeting at 18:00 on Tuesday (today) in the vestibule (at the registration desk end) - all interested persons welcome!
  • Further information about this project and its progress, updated periodically, will be placed at http://spice.cs.cf.ac.uk/lsid/ (but don’t look there just yet!)
  • Email us at e.r.orme | andrew.c.jones | r.j.white@cs.cf.ac.uk