Richard White

Presentation Transcript

  1. Biodiversity Informatics Richard White

  2. Part One • An introduction to biodiversity data

  3. Outline Biodiversity: what is it? Definitions: is biodiversity: A resource? Something which can be measured? How to measure it Who is it for? Data providers Researchers Users Biodiversity Informatics Research into techniques for handling data

  4. Threats to the planet Human activities Economics Habitat conservation Ecology Ecological diversity Exploitation Conservation Management Species diversity Evolution Species conservation Information services Genetic diversity Genetics Genetic resources Molecular biology Legal issues

  5. Biodiversity data types Kinds of biodiversity information: Data about areas, habitats, etc. Data about individual specimens Data about species Biodiversity data dimensions Species Diverse information types Descriptive, geographical, chemical, genomic etc.

  6. Data about areas, habitats, etc. Species lists, for Conservation Management Legal obligations Ecological processes Modelling ecosystems Predicting impacts

  7. Information about individual specimens Curatorial data about the management of each specimen Data describing characteristics of the specimen itself (which can also describe an entire species)

  8. Curatorial information Collection event Date and place of collection Collector’s name Identifications (determinations) Species name (see data about individual specimens and species) Who identified it, date, etc. Management information Location within the specimen collection (storage) Treatments given to specimen, etc.

  9. Data about specimens and species curatorial data nomenclatural data descriptive data geographical data, maps images bibliographic data

  10. Data describing specimens and species (1) Genetic diversity Allele and chromosome frequencies Molecular bioinformatics Molecular data – enzyme properties, etc. Molecular sequences – DNA, protein, polysaccharides, etc. “Traditional” data used in taxonomy etc. See next slide

  11. Data describing specimens and species (2) Nomenclature – accepted name, synonyms Taxonomy – higher taxa Geographical data – distribution (range) Chemical constituents (especially in plants) Behavioural information (animals) Descriptive data Anatomical and morphological descriptors Images Bibliographic data (source references, especially for species data)

  12. Geographical data - storage Database may store: Individual locations of specimens or sightings Status in an area based on a number of specimens or sightings: (present, absent, introduced, etc.) Locations may be stored as Area names (languages, synonyms, hierarchies, overlaps) Grid coordinates (various systems)

  13. Geographical data - use May be used to generate summary distributions (e.g. for species distribution from specimen data) Maps (point locations or shaded areas) May be used to allow searching by location or area – user may specify a point or an area name
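As a minimal sketch of the two uses on this slide, the Python below derives a summary distribution (areas per species) from individual specimen records and supports a simple search by location. The record layout, the function names and the sample coordinates are all assumptions made for illustration; they are not taken from any of the databases discussed in this talk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecimenRecord:
    species: str
    area: str          # area name (illustrative values only)
    latitude: float    # decimal degrees (approximate, made up for the example)
    longitude: float


def summarise_distribution(records):
    """Return {species: sorted list of areas with at least one specimen}."""
    areas_by_species = defaultdict(set)
    for rec in records:
        areas_by_species[rec.species].add(rec.area)
    return {sp: sorted(areas) for sp, areas in areas_by_species.items()}


def species_in_box(records, min_lat, max_lat, min_lon, max_lon):
    """Search by location: species with a specimen inside a bounding box."""
    return sorted({
        r.species for r in records
        if min_lat <= r.latitude <= max_lat and min_lon <= r.longitude <= max_lon
    })


# Invented sample records, for illustration only.
records = [
    SpecimenRecord("Abrus precatorius", "India", 19.1, 72.9),
    SpecimenRecord("Abrus precatorius", "Kenya", -1.3, 36.8),
    SpecimenRecord("Abrus precatorius", "India", 13.1, 80.3),
    SpecimenRecord("Vicia faba", "Spain", 40.4, -3.7),
]
distribution = summarise_distribution(records)
```

A shaded-area map would be drawn from `distribution`, while a point map would use the individual coordinates directly.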

  14. Descriptive data Should be carefully designed, because it is complex and may be used for many purposes. It should be: structured; consistently applied. It may include data types suitable for statistical and multivariate analysis. Special problems exist

  15. Descriptive data Structured, for Querying Classification, phylogenetic analysis Identification Documentation and dissemination

  16. Descriptive data Consistency and comparability: Consistent terminology (cf. attempts to standardise terms for indexing purposes, as in BioCASE Thesaurus); Same characters for all specimens or taxa; Characters precisely defined: Discontinuous - set of character states; Continuous – units, precision
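One possible way to enforce these consistency rules in software is sketched below: discontinuous characters carry a fixed set of states, continuous characters carry a unit and a precision, and every description is checked against the same character list. The class names, the `describe` helper and the example characters are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DiscreteCharacter:
    name: str
    states: Tuple[str, ...]   # the allowed character states

    def validate(self, value):
        if value not in self.states:
            raise ValueError(f"{value!r} is not a state of {self.name!r}")
        return value


@dataclass(frozen=True)
class ContinuousCharacter:
    name: str
    unit: str
    precision: int            # number of decimal places recorded

    def validate(self, value):
        return round(float(value), self.precision)


def describe(characters, raw):
    """Validate one description against the shared character list."""
    missing = [c.name for c in characters if c.name not in raw]
    if missing:
        raise ValueError(f"missing characters: {missing}")
    return {c.name: c.validate(raw[c.name]) for c in characters}


# Hypothetical character list applied to every specimen or taxon.
characters = [
    DiscreteCharacter("flower colour", ("white", "yellow", "red")),
    ContinuousCharacter("leaf length", "mm", 1),
]
description = describe(characters, {"flower colour": "red", "leaf length": 12.34})
```

Rejecting unknown states and missing characters at entry time is what keeps descriptions comparable across taxa later on.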

  17. Descriptive data – special problems Variability: specimens within a species; repeated structures within a specimen. Character dependence (inapplicable characters). Taxonomic hierarchy issues, e.g. Is the data for a species in agreement with the data for a genus? Can the data be stored at the appropriate taxonomic level only?

  18. Images Type Bitmap files, e.g. JPEG Vector graphics, e.g. drawings, diagrams Location In the local database Elsewhere in a separate image bank The Web makes the latter option easy – just store the URL in the database

  19. Biodiversity organisations Database level: ILDIS (International Legume Database and Information Service) - www.ildis.org Data portal level: Species 2000 - www.sp2000.org GBIF (Global Biodiversity Information Facility) - www.gbif.org International agencies: CBD, CITES, WCMC, etc. ... Standards, etc.: TDWG (Taxonomic Databases Working Group) Lots more

  20. Practical session In the practical session, we will Look at what some of the various biodiversity organisations are doing Try some of their data portals Evaluate some of the biodiversity information systems available (introduced in Part Two of this talk, to follow), from the points of view of scientific and professional users and the general public

  21. Part Two • Biodiversity information systems • (Some of this material appeared in the Computing for Bioinformatics module)

  22. Thoughts • Role of biodiversity data in bioinformatics • assisting with organising and retrieving bioinformatic (molecular) data • a separate area with different users (taxonomy, ecology, conservation, resource management …) • Demand from users for taxonomic and species diversity information on the Web • Pressure on the taxonomic community to deliver • Demand for more sophisticated use of available data: interoperability = online analysis, not just browsing

  23. Assembling biodiversity information sources Delivering species diversity information by assembling, merging & linking databases and publishing on the Web, with special emphasis on linking

  24. Issues in assembling and linking biodiversity information sources Assembling a web-site (ERMS) Assembling databases by merging (ILDIS) Linking on-line databases through a gateway (Species 2000 and SPICE) Onward links to related information Checking the reliability of links (LITCHI) Intelligent linking Persistent identifiers

  25. Assembling species databases First of all, before we start merging and linking databases, let’s assemble a database from scratch: ERMS (European Register of Marine Species) Now at www.marbef.org/data/erms.php

  26. ERMS

  27. Incoming data Approximately 100 separate lists for different taxonomic groups Mostly compiled as spreadsheets Scientific names, synonyms, geography (at least Atlantic or Mediterranean) Some optional fields Objective to create a book and a web-site, partially supported by a database

  28. List conversion was carried out in several stages: Excel spreadsheets were exported to text files; the tab-delimited text files were imported into a client-server database (MySQL); database query results are passed through templates to generate either RTF (for the printed publication) or HTML (for the Web site)
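The conversion stages on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ERMS code: SQLite (from the standard library) stands in for the MySQL server, the column names are invented, and the two sample rows are typed in by hand rather than exported from a real spreadsheet.

```python
import csv
import html
import io
import sqlite3
from string import Template

# Stand-in for a tab-delimited text file exported from a spreadsheet.
EXPORTED_TEXT = (
    "genus\tspecies\tauthority\n"
    "Chelonia\tmydas\t(Linnaeus, 1758)\n"
    "Caretta\tcaretta\t(Linnaeus, 1758)\n"
)

# Hypothetical output template: one HTML list item per species.
ROW_TEMPLATE = Template("<li><i>$genus $species</i> $authority</li>")


def load_rows(conn, text):
    """Import tab-delimited text into a database table."""
    conn.execute("CREATE TABLE species (genus TEXT, species TEXT, authority TEXT)")
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    conn.executemany(
        "INSERT INTO species VALUES (:genus, :species, :authority)", rows
    )


def render_html(conn):
    """Pass query results through the template to build an HTML list."""
    cursor = conn.execute(
        "SELECT genus, species, authority FROM species ORDER BY genus, species"
    )
    items = [
        ROW_TEMPLATE.substitute(
            genus=html.escape(genus),
            species=html.escape(species),
            authority=html.escape(authority),
        )
        for genus, species, authority in cursor
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


conn = sqlite3.connect(":memory:")  # SQLite used here in place of MySQL
load_rows(conn, EXPORTED_TEXT)
page = render_html(conn)
```

Swapping `ROW_TEMPLATE` for an RTF template would produce the print version from the same query, which is the point of keeping the database between the spreadsheets and the output.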

  29. Variations on a theme Fields may be combined or separated e.g. genus species authority date Higher taxa may be: repeated in fields of the species record given once in separate preceding records in various different formats Synonyms may be: in a separate field of the species record, or mixed with other remarks, with various delimiters and separators in separate records, linked by code or by name or even abbreviated implied, e.g. Genus1 specname (Smith as Genus2) Geographical information is often free text
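To illustrate the "combined or separated" problem, here is a rough sketch that splits a combined name string into genus, species, authority and date. The regular expressions and the `split_name` helper are assumptions for this example; real incoming lists are far less regular, and a pattern this simple would need extending for subspecies, multiple authors and the other variations listed above.

```python
import re

# Genus (capitalised) + species epithet (lower case) + optional remainder.
NAME_PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)\s+(?P<species>[a-z-]+)(?:\s+(?P<rest>.+))?$"
)
YEAR_PATTERN = re.compile(r"\b(\d{4})\b")


def split_name(text):
    """Split 'Genus species Authority, year' into separate fields."""
    match = NAME_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised name format: {text!r}")
    rest = (match.group("rest") or "").strip()
    year_match = YEAR_PATTERN.search(rest)
    year = int(year_match.group(1)) if year_match else None
    # Remove the year, then trim separators and brackets from both ends.
    authority = YEAR_PATTERN.sub("", rest).strip(" ,()") or None
    return {
        "genus": match.group("genus"),
        "species": match.group("species"),
        "authority": authority,
        "year": year,
        # Brackets round the authority carry meaning, so record them.
        "in_parentheses": rest.startswith("("),
    }


parsed = split_name("Caretta caretta (Linnaeus, 1758)")
```

Names that do not fit the pattern raise an error instead of being guessed at, so they can be set aside for manual checking.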

  30. ERMS book page

  31. Osteichthyes: brief checklist

  32. Reptilia: full details

  33. Taxonomic hierarchy for Reptilia

  34. Merging versus linking Merging databases to create a single larger database Linking databases to create a distributed information system

  35. Merging species databases 1 The original databases are physically copied into a new combined database. 2 The user interacts with the new combined database.

  36. Linking 1 The user interacts with an access system which does not itself contain data. 2 When the user requests data, it is fetched from the appropriate database.
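The contrast between merging and linking can be shown with a toy sketch. Plain dictionaries stand in for the source databases, and the class names `MergedDatabase` and `LinkingGateway` are invented for this illustration; neither reflects how ILDIS or Species 2000 is actually implemented.

```python
import copy


class MergedDatabase:
    """Merging: source records are physically copied into one combined store."""

    def __init__(self, sources):
        self.records = {}
        for source in sources:
            self.records.update(copy.deepcopy(source))

    def lookup(self, name):
        return self.records.get(name)


class LinkingGateway:
    """Linking: the access system holds no data, only references to sources."""

    def __init__(self, sources):
        self.sources = sources

    def lookup(self, name):
        # Fetch from the appropriate source at request time.
        for source in self.sources:
            if name in source:
                return source[name]
        return None


# Toy source databases (dictionaries keyed by species name).
legumes = {"Vicia faba": {"family": "Fabaceae"}}
reptiles = {"Caretta caretta": {"family": "Cheloniidae"}}

merged = MergedDatabase([legumes, reptiles])
gateway = LinkingGateway([legumes, reptiles])

# A source database is updated after the merge took place.
legumes["Abrus precatorius"] = {"family": "Fabaceae"}
```

After the update, the gateway sees the new record straight away, whereas the merged copy stays as it was until the merge is repeated. That trade-off between freshness and self-containment is the main design choice between the two approaches.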

  37. Assembling databases by merging Now we have some databases, let’s build a bigger one by merging: ILDIS (International Legume Database and Information Service)

  38. ILDIS International Legume Database and Information Service International collaborative project 10 Regional Centres 30 Taxonomic Coordinators Its goals include building, maintaining and enhancing the ILDIS World Database of Legumes designing and providing services from it to users, including: ILDIS LegumeWeb via Species 2000

  39. ILDIS World Database of Legumes v. 7.00 Taxa: Species 15,500; Subspecies 1,600; Varieties 2,400; total 19,500. Names: Accepted names 19,500; Synonyms 19,000; total 39,500

  40. ILDIS’s data model: core data A core taxonomic checklist, assembled from regional data sets and nearing completion, provides a consensus taxonomy - a unified taxonomic treatment or backbone on which other data can be hung Various kinds of additional data may be attached to this backbone (see later)

  41. Features of ILDIS LegumeWeb We’ll look at examples of the use of LegumeWeb, to show a couple of features: Two-stage access with “synonymic indexing” A gateway to external information - “onward links” (direct species name links) to further sources of information

  42. User access to LegumeWeb: Step 1 The user types in a name, which may be incomplete (or wrong!) LegumeWeb responds by showing a list of the species names which fit the user’s specification
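Step 1 might be approximated as follows: try the user's text as a name prefix first (for incomplete names), then fall back to approximate matching (for misspelt names). This is only a sketch using Python's standard `difflib` module; it is an assumption for illustration and not a description of how LegumeWeb itself matches names. The short name list and the 0.8 similarity cutoff are arbitrary choices.

```python
import difflib

# A tiny illustrative name list; a real system would query the database.
NAMES = ["Abrus cyaneus", "Abrus precatorius", "Vicia faba", "Vicia sativa"]


def candidate_names(query, names, limit=10):
    """Return names that fit an incomplete or misspelt query."""
    q = query.strip().lower()
    # Incomplete input: treat it as a prefix of the full name.
    prefix_hits = sorted(n for n in names if n.lower().startswith(q))
    if prefix_hits:
        return prefix_hits[:limit]
    # Wrong input: fall back to approximate string matching.
    by_lower = {n.lower(): n for n in names}
    close = difflib.get_close_matches(q, list(by_lower), n=limit, cutoff=0.8)
    return [by_lower[c] for c in close]


incomplete = candidate_names("Abrus", NAMES)
misspelt = candidate_names("Abrus precatorious", NAMES)
```

The user then picks one name from the returned list, which leads to Step 2.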

  43. User access to LegumeWeb: Step 2 The user chooses one of the species names provided (which may be a synonym or an accepted name) In this example, the user chooses Abrus cyaneus (a synonym for Abrus precatorius) LegumeWeb responds by showing a standard set of information about the chosen species

  44. Synonymic indexing Automated synonymic indexing: synonym entered → accepted name found (name → taxon); taxon found → synonyms listed. Types of synonyms: Unambiguous; Ambiguous (pro parte, homonyms, misapplied names). In these cases an explanation is offered to the user
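A minimal sketch of synonymic indexing is given below: every name, accepted or synonym, is indexed to the taxon or taxa it can refer to, so entering a synonym leads to the accepted name and the full synonym list. The `Taxon` class and both functions are hypothetical, and the "Exemplum" names are invented purely to show an ambiguous (pro parte) case; only the Abrus example comes from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Taxon:
    accepted_name: str
    synonyms: List[str] = field(default_factory=list)


def build_name_index(taxa):
    """Map every name (accepted or synonym) to the taxa it may refer to."""
    index = {}
    for taxon in taxa:
        for name in [taxon.accepted_name, *taxon.synonyms]:
            index.setdefault(name, []).append(taxon)
    return index


def resolve(name, index):
    """Name -> taxon -> accepted name plus synonyms; flag ambiguous names."""
    taxa = index.get(name, [])
    return [
        {
            "accepted_name": t.accepted_name,
            "synonyms": sorted(t.synonyms),
            "ambiguous": len(taxa) > 1,  # the user should be given an explanation
        }
        for t in taxa
    ]


taxa = [
    Taxon("Abrus precatorius", ["Abrus cyaneus"]),
    # Invented names: one synonym applied in part to two different taxa.
    Taxon("Exemplum primum", ["Exemplum dubium"]),
    Taxon("Exemplum secundum", ["Exemplum dubium"]),
]
index = build_name_index(taxa)
unambiguous = resolve("Abrus cyaneus", index)
ambiguous = resolve("Exemplum dubium", index)
```

Returning a list instead of a single taxon is what allows the ambiguous cases to be presented to the user with an explanation instead of being resolved silently.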

  45. Assembling databases by linking Now we have some biggish databases, let’s build something even bigger by linking databases together: Species 2000 SPICE Species 2000 Europa

  46. Linking 1 The user interacts with an access system which does not itself contain data. 2 When the user requests data, it is fetched from the appropriate database.