Controlled Vocabulary Working Group – Report September 2009

Long-Term Ecological Research. Working Group: "Synthesis through data discovery and use: Past, Present and Future", Wed. 10am-12pm. Controlled Vocabulary Working Group page: http://intranet.lternet.edu/im/news/committees/working_groups/controlled_vocabulary

Presentation Transcript


  1. Controlled Vocabulary Working Group – Report September 2009
     Long-Term Ecological Research
     Working Group: "Synthesis through data discovery and use: Past, Present and Future", Wed. 10am-12pm
     http://intranet.lternet.edu/im/news/committees/working_groups/controlled_vocabulary

  2. Agenda for Vocab Working Group
     • Background and past activities
     • Finalizing the list – who approves?
     • Procedures for managing the list
     • Next steps
        • Tool development
        • Keywording
        • Searching
        • Hierarchies/polytaxonomies/thesauri/ontologies

  3. The Problem
     • For past activities, see the reports at: http://intranet.lternet.edu/im/node/114 and http://intranet.lternet.edu/archives/documents/Newsletters/DataBits/06spring/
     • Summary:
        • Eclectic keywords make searching difficult – most terms are used only once!
        • There is no easy way to group or organize similar datasets to facilitate "browse" searches

  4. Steps Taken
     • Assembled a list of LTER EML keywords
     • Cross-linked that list to:
        • NBII Thesaurus words
        • GCMD keywords
        • Metacat searches
     • Edited:
        • Changed words to preferred forms (kept track of synonyms)
        • Removed specific places and taxonomic names
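
The editing step above (mapping variants to a preferred form while keeping track of synonyms, and dropping specific places and taxonomic names) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure the working group actually used: the SYNONYMS table and the PLACES and TAXA exclusion sets are hypothetical stand-ins for the real synonym list, a gazetteer and a taxonomic list.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical synonym table (variant -> preferred form); not the actual LTER list.
SYNONYMS: Dict[str, str] = {
    "soil water content": "soil moisture",
    "npp": "net primary production",
    "net primary productivity": "net primary production",
}
# Hypothetical exclusion sets standing in for a gazetteer and a taxonomic list.
PLACES = {"konza prairie", "harvard forest"}
TAXA = {"quercus rubra", "bouteloua gracilis"}


def clean(keyword: str) -> str:
    """Case-fold and collapse internal whitespace."""
    return " ".join(keyword.lower().split())


def normalize(keyword: str) -> Optional[str]:
    """Return the preferred form of a keyword, or None for place/taxon names."""
    term = clean(keyword)
    if term in PLACES or term in TAXA:
        return None
    return SYNONYMS.get(term, term)


def build_list(raw: List[str]) -> Tuple[List[str], Dict[str, str]]:
    """Return the sorted preferred terms and the synonyms actually encountered."""
    preferred = set()
    seen_synonyms: Dict[str, str] = {}
    for kw in raw:
        term = normalize(kw)
        if term is None:
            continue  # excluded: specific place or taxonomic name
        preferred.add(term)
        if clean(kw) != term:
            seen_synonyms[clean(kw)] = term  # keep track of the synonym
    return sorted(preferred), seen_synonyms


terms, synonyms = build_list(
    ["NPP", "Net Primary Productivity", "Soil  Water Content",
     "Konza Prairie", "Quercus rubra", "soil moisture"]
)
print(terms)  # ['net primary production', 'soil moisture']
```

Returning the encountered synonyms alongside the preferred terms mirrors the slide's point that synonyms were tracked rather than discarded, so they can later feed search and keywording tools.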

  5. Steps Taken
     • Selected:
        • Keywords shared with GCMD and NBII, or
        • Keywords used at more than one LTER site
     • Reviewed:
        • Removals and additions were suggested
        • Voting via SurveyMonkey
     • Edited:
        • Added words voted for
        • Removed words voted against
        • When the vote was close, kept the term's current status
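
The selection and voting rules above are simple enough to express directly. The sketch below is a hypothetical illustration: it assumes "shared with GCMD and NBII" means the term appears in either vocabulary (passed in as the `external` set), and because the slides do not define a "close" vote, the CLOSE_DIFF threshold is an invented placeholder.

```python
from typing import Dict, Set

CLOSE_DIFF = 2  # hypothetical: a vote is "close" if yes and no differ by <= 2


def select_candidates(site_usage: Dict[str, Set[str]], external: Set[str]) -> Set[str]:
    """Keep terms found in an external vocabulary (GCMD or NBII) or used at 2+ sites."""
    return {term for term, sites in site_usage.items()
            if term in external or len(sites) > 1}


def apply_vote(on_list: bool, yes: int, no: int) -> bool:
    """Return whether a term is on the list after a vote."""
    if abs(yes - no) <= CLOSE_DIFF:
        return on_list  # close vote: keep the current status
    return yes > no     # clear vote: add if voted for, remove if voted against


# Hypothetical usage data: term -> set of sites using it.
usage = {
    "soil moisture": {"site-a", "site-b"},  # used at two sites
    "drought": {"site-c"},                  # one site, but in an external vocabulary
    "plot 7 notes": {"site-d"},             # one site, not external
}
print(sorted(select_candidates(usage, external={"drought"})))  # ['drought', 'soil moisture']
print(apply_vote(on_list=True, yes=6, no=5))    # True (close vote, status kept)
print(apply_vote(on_list=False, yes=12, no=3))  # True (added)
```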

  6. The List
     • 640 keywords
     • 148 synonyms
     • 201 NBII keywords
     • 21 GCMD keywords

  7. LTER Science Keyword List 1.0?
     • Is additional editing required?
     • Who decides if it is an LTER "official" list?
        • And what does it mean if it is?
     • What procedures should be followed for subsequent editing of the list?
     • Who should manage the list database?
        • Term
        • Scope
        • Definition
        • Synonyms

  8. Next Steps – Tools
     • Autocomplete search tool (Duane Costa)
     • Autocomplete keywording tool (Duane Costa)
     • Update-document-keywords tool?
     • Advanced search tool?
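
An autocomplete tool of the kind listed here could rest on a sorted prefix search. The sketch below is not Duane Costa's tool, whose internals these slides do not describe; it assumes a list of preferred terms plus a synonym table (variant -> preferred term), so that typing a synonym still suggests the preferred term. The class name and sample terms are hypothetical.

```python
import bisect
from typing import Dict, List


class Autocompleter:
    """Prefix completion over preferred terms and their synonyms (illustrative sketch)."""

    def __init__(self, preferred: List[str], synonyms: Dict[str, str]) -> None:
        # Each entry pairs typeable text with the preferred term to suggest.
        entries = [(t.lower(), t) for t in preferred]
        entries += [(s.lower(), p) for s, p in synonyms.items()]
        entries.sort()
        self._keys = [k for k, _ in entries]
        self._targets = [p for _, p in entries]

    def complete(self, prefix: str, limit: int = 10) -> List[str]:
        """Return up to `limit` preferred terms whose text or synonym starts with prefix."""
        prefix = prefix.lower()
        start = bisect.bisect_left(self._keys, prefix)  # first key >= prefix
        results: List[str] = []
        for key, target in zip(self._keys[start:], self._targets[start:]):
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            if target not in results:  # a synonym and its term collapse to one suggestion
                results.append(target)
                if len(results) >= limit:
                    break
        return results


ac = Autocompleter(
    ["net primary production", "nitrogen", "soil moisture"],
    {"npp": "net primary production", "soil water content": "soil moisture"},
)
print(ac.complete("n"))     # ['net primary production', 'nitrogen']
print(ac.complete("soil"))  # ['soil moisture']
```

Suggesting only preferred terms, even when the user types a synonym, is one way such a tool could steer new metadata toward the shared list instead of adding more one-off keywords.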

  9. Next Steps – Hierarchies
     • There is general agreement that keywords are most useful when they can be tied to other keywords
     • How do we create the needed keyword taxonomy (or taxonomies)?
     • Barbara Benson has done some work looking at other hierarchies (KNB, GCMD)
     • Giri Palanisamy has sent us the broader, narrower and related terms for the ~1/3 of the words that are also in the NBII thesaurus

  10. Hierarchies – Rationale
      • The existing KNB browse hierarchy is rather limited (the LTER version, which gives the number of hits, is a good feature)
      • A browse hierarchy could be useful to sites developing one of their own
      • It could be hooked into any tools that are developed to assist in assigning keywords to datasets
      • It could be used in a tool that enables the creation of a browse hierarchy from a keyword list
      • It could assist keyword searches by offering an option to go up a level from the keyword to a broader concept, and thus yield a higher number of hits
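
The last rationale point, going up a level from a keyword to a broader concept to widen a search, can be illustrated with a small hierarchy stored as a child-to-parent map. This is a minimal sketch: the BROADER fragment and the sample datasets are hypothetical, not an actual LTER, KNB or NBII hierarchy.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical hierarchy fragment: term -> broader term.
BROADER: Dict[str, str] = {
    "soil moisture": "soil properties",
    "soil temperature": "soil properties",
    "soil properties": "soils",
    "net primary production": "productivity",
}


def narrower_index(broader: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the child -> parent map into parent -> children."""
    index: Dict[str, List[str]] = defaultdict(list)
    for child, parent in broader.items():
        index[parent].append(child)
    return index


def subtree(term: str, index: Dict[str, List[str]]) -> Set[str]:
    """The term plus every narrower term below it."""
    found: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in found:
            found.add(t)
            stack.extend(index.get(t, []))
    return found


def broaden(term: str, broader: Dict[str, str]) -> Set[str]:
    """Go up one level: the broader concept together with all of its narrower terms."""
    parent = broader.get(term)
    if parent is None:
        return {term}  # already at the top; nothing broader to offer
    return subtree(parent, narrower_index(broader))


def hits(query: Set[str], datasets: Dict[str, Set[str]]) -> List[str]:
    """IDs of datasets tagged with at least one query term."""
    return sorted(d for d, kws in datasets.items() if kws & query)


datasets = {
    "ds1": {"soil moisture"},
    "ds2": {"soil temperature"},
    "ds3": {"net primary production"},
}
print(hits({"soil moisture"}, datasets))                  # ['ds1']
print(hits(broaden("soil moisture", BROADER), datasets))  # ['ds1', 'ds2']
```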

  11. Next Steps – Other Lists
      • Taxonomic and place keywords were excluded from the science keywords
         • Do we need a gazetteer for places?
         • Do we need taxonomic lists and tools for taxonomic information?
      • Are there other types of lists that are needed?

  12. Discussion Topics
      • Feedback on tools
      • Ideas for additional tools
      • Hierarchy

  13. Around the Room – Next Steps
      • LTER words emerging organically
      • Not just general search
      • Other efforts:
         • The vegetation ecology community is interested in ontologies for vegetation traits
         • LTER words are not specialized
         • It would be good to keep in touch with other efforts
         • SONET: intercommunication (Gries) is critical
         • Rob Raskin is taking GCMD and ontologizing it
         • NASA is developing "Suite", an upper-level ontology
         • Semtools (O'Brien): using Morpho and making it a better database management system, using subsumption hierarchies in OWL
         • OWL allows the use of generic applications (JENA); it is a standard format

  14. Around the Room
      • Autocompletion tools are helpful for NEW EML
         • But tools are also needed for updating existing metadata
      • Having a first cut of recommendations would help
      • A tool that makes suggestions based on document content would be helpful
      • Semantic annotation
      • Hook to parents, children and related terms
      • Educating PIs on using the list is important
      • Just the availability of the list is important
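
One simple reading of "a tool that makes suggestions based on document content" is to scan a dataset's title or abstract for list terms and their synonyms. The function below is a hypothetical sketch using whole-phrase matching only; a real tool would likely need stemming or other text analysis, and the sample terms are invented for illustration.

```python
import re
from typing import Dict, List


def suggest_keywords(text: str, preferred: List[str], synonyms: Dict[str, str]) -> List[str]:
    """Suggest preferred terms whose wording, or a synonym's, occurs in the text."""
    lowered = " ".join(text.lower().split())  # case-fold and collapse whitespace
    candidates = {t.lower(): t for t in preferred}
    candidates.update({s.lower(): p for s, p in synonyms.items()})
    found = set()
    for phrase, target in candidates.items():
        # \b anchors keep a short term from matching inside a longer word
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            found.add(target)
    return sorted(found)


abstract = "Aboveground NPP and soil water content were measured weekly."
print(suggest_keywords(
    abstract,
    ["net primary production", "soil moisture", "nitrogen"],
    {"npp": "net primary production", "soil water content": "soil moisture"},
))
# ['net primary production', 'soil moisture']
```

Because suggestions are always mapped to preferred terms, the same function could serve as a "first cut of recommendations" for updating existing metadata, which the discussion flagged as a gap.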

  15. Next Steps
      • Automatic annotation with broader terms
      • Identify "unfindable" datasets: which datasets have no LTER keywords or synonyms?
         • Go dataset by dataset and see which have no hits
      • EML is limited in how it assigns keyword lists
         • Could target tools at the keyword set
         • Namespacing control could be relaxed to go beyond "theme" and "place"
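
The first two items above (finding "unfindable" datasets and annotating keywords with their broader terms) can be sketched together. Everything here is hypothetical: PREFERRED, SYNONYMS and BROADER are small invented fragments, and matching is on whole keyword strings after case-folding, which is only one possible definition of a "hit".

```python
from typing import Dict, Iterable, List, Set

# Hypothetical vocabulary fragments for illustration only.
PREFERRED = ["soil moisture", "soil properties", "soils", "net primary production"]
SYNONYMS = {"npp": "net primary production", "soil water content": "soil moisture"}
BROADER = {"soil moisture": "soil properties", "soil properties": "soils"}


def unfindable(datasets: Dict[str, List[str]]) -> List[str]:
    """IDs of datasets with no keyword matching a list term or synonym."""
    vocab = {t.lower() for t in PREFERRED} | {s.lower() for s in SYNONYMS}
    return sorted(ds for ds, kws in datasets.items()
                  if not any(k.strip().lower() in vocab for k in kws))


def add_broader(keywords: Iterable[str]) -> Set[str]:
    """Annotate keywords with every broader term above them in the hierarchy."""
    result = set(keywords)
    for kw in list(result):
        parent = BROADER.get(kw)
        while parent is not None and parent not in result:  # also stops on cycles
            result.add(parent)
            parent = BROADER.get(parent)
    return result


catalog = {"ds1": ["Soil Moisture"], "ds2": ["site 3 misc"], "ds3": ["NPP"]}
print(unfindable(catalog))                     # ['ds2']
print(sorted(add_broader(["soil moisture"])))  # ['soil moisture', 'soil properties', 'soils']
```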

  16. Next Steps
      • Ecotrends predated the LTER list
         • It would have been good to have the LTER list
         • Eventually would like to integrate
         • May be able to exploit synonym rings
      • When the title and dataset don't match (e.g., the title says "Productivity" but the attribute is "biomass"), the dataset needs to be examined holistically
      • Linking terms to definitions is needed
      • A taxonomic database would also be useful for "bugs" (true bugs vs. insects)

  17. Next Steps
      • Practices in design
         • When developing tools, always think about how they are tied to organizational routines
         • Think proactively about how to make it routine, getting people to think in categories
      • Pursue polytaxonomies based on Barbara's list
      • Develop the synonym list further
      • See how keyword lists match
         • AND has a 3-level hierarchy
      • Start at top or bottom in adding…
