EcoTerm IV NBII/EioNet Demo of Federated KOS Search

EcoTerm IV NBII/EioNet Demo of Federated KOS Search. Mike Frame Vienna, Austria April 2007. Discussion Topics…. Project Background NBII Thesaurus GEMET Thesaurus Prototype Client Sample Query Results Including no, 1, or both thesauri Overall Findings.

Presentation Transcript


  1. EcoTerm IV NBII/EioNet Demo of Federated KOS Search • Mike Frame • Vienna, Austria • April 2007

  2. Discussion Topics… • Project Background • NBII Thesaurus • GEMET Thesaurus • Prototype Client • Sample Query Results • Including no, 1, or both thesauri • Overall Findings

  3. Biocomplexity Thesaurus http://thesaurus.nbii.gov

  4. EIONET GEMET Thesaurus http://www.eionet.europa.eu/gemet/webservices?langcode=en

  5. NBII/EIONET Thesaurus Web-service • Background - collaboration through Ecoinformatics TWG • Primary Goal – access distributed multi-lingual thesauri • Results – SKOS web-service & client 1
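
A minimal sketch of what a client might do with a SKOS response, included only to illustrate the "SKOS web-service & client" result. The sample RDF/XML, the `example.org` concept URIs, and the function name `parse_concepts` are assumptions made for illustration; the actual NBII and GEMET service responses are not shown in the slides and may differ.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for SKOS, RDF, and the xml:lang attribute.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical SKOS document; the concept URIs are made up for this sketch.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/concept/endangered-species">
    <skos:prefLabel xml:lang="en">endangered species</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/concept/species"/>
    <skos:broader rdf:resource="http://example.org/concept/taxa"/>
    <skos:related rdf:resource="http://example.org/concept/rare-species"/>
  </skos:Concept>
</rdf:RDF>"""


def parse_concepts(rdf_xml):
    """Return {concept URI: labels by language plus BT/NT/RT target URIs}."""
    root = ET.fromstring(rdf_xml)
    concepts = {}
    for node in root.iter(f"{{{SKOS}}}Concept"):
        uri = node.get(f"{{{RDF}}}about")
        # Preferred labels keyed by language code (one entry per language).
        labels = {
            el.get(XML_LANG): el.text
            for el in node.findall(f"{{{SKOS}}}prefLabel")
        }
        # skos:broader, skos:narrower, skos:related correspond to BT, NT, RT.
        relations = {
            rel: [
                el.get(f"{{{RDF}}}resource")
                for el in node.findall(f"{{{SKOS}}}{rel}")
            ]
            for rel in ("broader", "narrower", "related")
        }
        concepts[uri] = {"labels": labels, **relations}
    return concepts


concepts = parse_concepts(SAMPLE)
```

Keying labels by language is one simple way to support the multi-lingual goal: a client can pick the label for the user's language and still follow the same relationship links.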

  6. Latest Client & Service capabilities • Access to both NBII and GEMET • Single language capability • Results are provided by source • All documentation is completed http://thesaurus.nbii.gov

  7. Demo Client

  8. Initial Challenges Identified • Thesaurus scope, intent, purpose, and coverage differ • NBII = sub-discipline of environment • Endangered species • Broader Terms: Species, Special status species, Taxa • EIONET = broad environment • Broader Terms: environmental protection

  9. Current State • Users • Most aren’t aware of the underlying vocabulary • Vocabularies are often unique to an organization and serve “categorization” more than retrieval • Goal • Include all vocabularies and let the search engine handle results

  10. Demonstration Search Retrieval • Created demonstration datasets • NBII Cataloged Resources • ~30,000 web sites, publications, images, maps, etc. • XML structured data – controlled subject • NBII FGDC Metadata • ~22,000 resources on research studies • 150-200 elements • Semi-structured with no controlled vocabulary

  11. NBII Catalog Records • Based on the Dublin Core + • 18 elements, of which 10 are mandatory • In place since 2002 • Used by distributed content managers

  12. NBII Metadata CH

  13. Process • Added thesaurus capabilities to Development Search Engine for: • NBII Thesaurus • EIONET GEMET Thesaurus • Used BT, RT, NT relationships & weighting • Performed sample queries within the test repositories for: • No thesaurus • GEMET only aided searching • NBII only aided searching • GEMET+NBII aided searching (X)
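
The process above can be sketched as weighted query expansion. This is an illustrative sketch only, not the development search engine's implementation: the weight values, the dictionary layout, and the name `expand` are assumptions. The broader terms for "endangered species" are the ones listed on slide 8.

```python
# Hypothetical relationship weights; the slides do not give the real values.
WEIGHTS = {"BT": 0.5, "NT": 0.7, "RT": 0.3}

# Tiny in-memory thesauri holding only the slide 8 example.
NBII = {
    "endangered species": {
        "BT": ["species", "special status species", "taxa"],
    },
}
GEMET = {
    "endangered species": {
        "BT": ["environmental protection"],
    },
}


def expand(query, thesauri):
    """Map each search term to a weight; the original query gets 1.0."""
    key = query.lower()
    terms = {key: 1.0}
    for thesaurus in thesauri:
        for relation, related_terms in thesaurus.get(key, {}).items():
            for term in related_terms:
                # If two thesauri supply the same term, keep the higher weight.
                terms[term] = max(terms.get(term, 0.0), WEIGHTS[relation])
    return terms


# The four test configurations from the slide.
CONFIGS = {
    "no thesaurus": [],
    "GEMET only": [GEMET],
    "NBII only": [NBII],
    "GEMET+NBII": [GEMET, NBII],
}

for name, thesauri in CONFIGS.items():
    print(name, expand("endangered species", thesauri))
```

The weighted terms would then be handed to the search engine as an OR query, so that matches on the original phrase rank above matches on an expanded term.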

  14. Test Repository 1 • NBII Resource Catalog (Dublin Core)

  15. No Thesauri – “invasive species”

  16. NBII Thesaurus – “invasive species”

  17. GEMET Thesaurus – “invasive species”

  18. No Thesauri – “Endangered Species”

  19. NBII Thesaurus – “endangered species”

  20. GEMET Only – “endangered species”

  21. No Thesaurus – “rare species”

  22. NBII Thesaurus – “rare species”

  23. GEMET Thesaurus – “rare species”

  24. GEMET Thesaurus – “rare species” (expanded degrees of relevance)

  25. No Thesauri – “protected species”

  26. NBII Thesaurus – “protected species”

  27. GEMET Thesaurus – “protected species”

  28. Results – NBII Catalog Resources

  29. Results – NBII Resource Catalog

  30. Test Repository 2 • NBII FGDC Metadata

  31. Sample Queries – No vocabularies • Metadata CH • “invasive species”

  32. Sample Queries – NBII only • Metadata CH • “invasive species”

  33. Sample Queries – GEMET only • Metadata CH • “invasive species”

  34. Sample Queries – No vocabularies • Metadata CH • “endangered species”

  35. Sample Queries – NBII only • Metadata CH • “endangered species”

  36. Sample Queries – GEMET only • Metadata CH • “endangered species”

  37. No Thesauri – Metadata CH • “rare species”

  38. NBII Thesaurus – Metadata CH • “rare species”

  39. GEMET Thesaurus – Metadata CH • “rare species”

  40. Sample Queries – No vocabularies • Metadata CH • “protected species”

  41. Sample Queries – NBII only • Metadata CH • “protected species”

  42. Sample Queries – GEMET only • Metadata CH • “protected species”

  43. Results – FGDC Metadata

  44. Results – NBII Resource Catalog

  45. Overall Results: General Findings • The assumption that a thesaurus improves the “number” of results is valid • The degree varies by term and mappings • Since users search from a number of perspectives, backgrounds, and levels of expertise, multiple thesauri do improve the number of results

  46. Overall Results: Using only GEMET Terminology • Terms that were in GEMET but not in the NBII thesaurus improved search results • GEMET’s strength of broad coverage aided searches • In general, for the Metadata repository • Results varied somewhat, but the top 10 results were often the same

  47. Overall Results: General Findings • “No thesaurus” tests produced poorer #1 results • Thesaurus use reordered the results list more for the structured set than for the unstructured set (Metadata)

  48. Issues • “Integrating” thesauri of differing scope and purpose presents challenges: • Can’t turn the effort into a thesaurus project • Degrees of relevance of terms are an issue • Concept matching or different intent • Differing classification (RT vs. NT) across thesauri • Differing “weighting” algorithms

  49. Further Study Options 1.) Take multiple thesauri “as is” 2.) Do some “attempted” concept matching, e.g. “endangered animal species” – “endangered animal” 3.) If no match is present, add term and relationship as is 4.) Obtain terms from XMDR
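
Options 2 and 3 can be sketched as follows. The slides do not say how concept matching would be done, so the word-subset rule used here is a stand-in heuristic chosen for illustration, and the names `find_match` and `merge_terms` are hypothetical.

```python
def tokens(term):
    """Lower-cased set of words in a term."""
    return frozenset(term.lower().split())


def find_match(term, candidates):
    """Return the first candidate whose words contain, or are contained in,
    the words of `term`; return None if there is no such candidate.

    Assumed heuristic only: it pairs "endangered animal" with
    "endangered animal species", but it would also pair a bare "species"
    with that term, which is the kind of over-matching a real effort
    would need to control.
    """
    wanted = tokens(term)
    for candidate in candidates:
        have = tokens(candidate)
        if wanted <= have or have <= wanted:
            return candidate
    return None


def merge_terms(base, incoming):
    """Option 2: try to match each incoming term against the merged list.
    Option 3: if no match is present, add the term as is."""
    merged = list(base)
    mapping = {}
    for term in incoming:
        match = find_match(term, merged)
        if match is None:
            merged.append(term)
        else:
            mapping[term] = match
    return merged, mapping


merged, mapping = merge_terms(
    ["endangered animal species"],
    ["endangered animal", "wetlands"],
)
print(merged)
print(mapping)
```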

  50. Further Study Options – cont. • Follow up with additional repositories • Repeat with other query terms • Re-look at weighting algorithms • Do queries with subset of terms • Repeat with completely integrated thesaurus as compared to>>>>>>> • Repeat queries with machine integration • Complete by June
