
Controlled Vocabulary



Presentation Transcript


  1. Controlled Vocabulary
     Don Henshaw, US Forest Service Research, Corvallis, OR; Andrews Forest LTER
     With materials from: John Porter (Virginia Coast LTER, Univ. Virginia), Deanna Pennington (SEEK, Univ. New Mexico), Eric Landis (consultant, Natural Resource Information Management)
     EAP LTER Workshop - 9 July 2007

  2. Keywords
     • Keyword = Key ideas/concepts expressed as words
     • Keywording = The practice of selecting the most appropriate keywords to describe an object, image, or work

  3. Controlled Vocabulary
     • Groups different ways of describing a concept under a single word or phrase
     • Makes a database easier to search
     • Makes searching more efficient and precise
     • Saves the time otherwise spent searching under every synonym for a term
     • Requires consistency from the person indexing the database and the use of predetermined terms
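The grouping described on this slide can be sketched as a lookup table that maps each variant to one preferred term. This is a minimal illustration only: the synonym table and the function names `normalize` and `index_keywords` are hypothetical and do not come from the presentation.

```python
from __future__ import annotations

# Hypothetical synonym table: every variant maps to one preferred term.
PREFERRED_TERM = {
    "rain": "precipitation",
    "rainfall": "precipitation",
    "precipitation": "precipitation",
    "discharge": "streamflow",
    "stream flow": "streamflow",
    "streamflow": "streamflow",
}


def normalize(keyword: str) -> str:
    """Return the preferred term for a keyword, or raise if it is not in the vocabulary."""
    key = " ".join(keyword.lower().split())  # ignore case and extra whitespace
    try:
        return PREFERRED_TERM[key]
    except KeyError:
        raise ValueError(f"'{keyword}' is not in the controlled vocabulary") from None


def index_keywords(raw_keywords: list[str]) -> list[str]:
    """Normalize a resource's keywords, dropping duplicates while keeping order."""
    seen: dict[str, None] = {}
    for keyword in raw_keywords:
        seen.setdefault(normalize(keyword), None)
    return list(seen)
```

Rejecting unknown keywords, rather than silently accepting them, is one way to enforce the consistency the slide asks of the indexer; a real system might instead queue unknown terms for review.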

  4. Retrieval Performance: Controlled vocabulary vs natural language systems
     • Free text or natural language systems (e.g., the Google search engine) often provide more results in a shorter time span because you are searching all the fields of a given database
     • Free text searches work well for very specific searches; when a topic is older or broader in scope, however, you will likely retrieve irrelevant hits
     • Free text searches may miss some records relevant to your search because the proper search term is not used
     • Searching a database requires striking a balance between precision and generating enough hits to make the search successful
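The trade-off on this slide can be seen in a toy comparison. The three records below are invented for illustration, and `free_text_search` and `keyword_search` are hypothetical helpers, not part of any LTER system.

```python
# Invented sample records: free text plus keywords assigned from a controlled vocabulary.
RECORDS = [
    {"id": 1, "text": "Daily rainfall totals at the headquarters station",
     "keywords": {"precipitation"}},
    {"id": 2, "text": "Precipitation chemistry of bulk samples",
     "keywords": {"precipitation"}},
    {"id": 3, "text": "Rainfall simulator maintenance log",
     "keywords": {"methods"}},
]


def free_text_search(term):
    """Return ids of records whose text contains the term (case-insensitive)."""
    return [r["id"] for r in RECORDS if term.lower() in r["text"].lower()]


def keyword_search(term):
    """Return ids of records annotated with the controlled keyword."""
    return [r["id"] for r in RECORDS if term in r["keywords"]]
```

With this sample, a free text search for "precipitation" finds only record 2 and misses record 1, while a search for "rainfall" also returns the maintenance log. The keyword search for "precipitation" returns records 1 and 2.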

  5. Thesaurus
     • A structured list of approved subject headings (also known as "preferred terms") that shows the relationships among those terms
     • Relationships include broader/parent terms, narrower/child terms, and related terms
     • Acts as a controlled vocabulary that specifies non-preferred terms (terms that should not be used for indexing or retrieval) and references the preferred concept that should be used instead
     • Includes term definitions and/or scope notes that explain the particular context in which a term is being used
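One possible in-memory shape for such a thesaurus is sketched below. The class and method names are hypothetical, and the sample entries ("streamflow", "discharge", and the scope note) are illustrative assumptions rather than terms taken from an actual thesaurus.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """A preferred term with its relationships and an optional scope note."""
    name: str
    broader: set[str] = field(default_factory=set)
    narrower: set[str] = field(default_factory=set)
    related: set[str] = field(default_factory=set)
    scope_note: str = ""


class Thesaurus:
    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}
        self.use_instead: dict[str, str] = {}  # non-preferred -> preferred

    def add_preferred(self, name: str, broader: str | None = None,
                      scope_note: str = "") -> None:
        term = self.terms.setdefault(name, Term(name))
        term.scope_note = scope_note or term.scope_note
        if broader is not None:
            # Record the parent/child link in both directions.
            parent = self.terms.setdefault(broader, Term(broader))
            term.broader.add(broader)
            parent.narrower.add(name)

    def add_related(self, a: str, b: str) -> None:
        self.terms[a].related.add(b)
        self.terms[b].related.add(a)

    def add_non_preferred(self, name: str, preferred: str) -> None:
        if preferred not in self.terms:
            raise KeyError(preferred)
        self.use_instead[name] = preferred

    def lookup(self, name: str) -> Term:
        """Resolve a non-preferred term to its preferred entry, then return it."""
        return self.terms[self.use_instead.get(name, name)]


# Illustrative entries only.
thesaurus = Thesaurus()
thesaurus.add_preferred("Hydrology/Water")
thesaurus.add_preferred("streamflow", broader="Hydrology/Water",
                        scope_note="Volume of water moving past a gauging point")
thesaurus.add_preferred("precipitation")
thesaurus.add_related("streamflow", "precipitation")
thesaurus.add_non_preferred("discharge", "streamflow")
```

Here `thesaurus.lookup("discharge")` resolves to the "streamflow" entry, which carries its broader term, related term, and scope note.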

  6. Use of Keyword/Terms in the LTER Data Catalog and Bibliography

  7. Common Methods to Achieve Semantic Clarification
     • Keywords - Assign keywords (often called annotating) to resources
       • More efficient than searching through the entire text of the resource
       • Generally uncontrolled and inconsistent
     • Data dictionary - Provides a defined list of keywords
       • Clarifies what terms may be searched and what those terms explicitly mean
     • Controlled vocabulary - Controls and limits terms that may be used
       • Incorporates the use of synonyms for more efficient searches. Synonyms are terms that represent the same concept
     • Thesaurus - Allows synonyms and specifies the link between them and, in addition, shows other relationships, e.g., “related to”
       • Searches return resources annotated to the search term
       • Shows how terms are associated, e.g., antonym
     • Taxonomy - Adds a classification hierarchy
       • Relationships are “vertical”, meaning they are limited to broader or narrower searches (e.g., parent-child relationships)
     • Ontology - Encodes a conceptual model describing and defining the relationships among terms
       • Allows searches by a term’s properties. Properties are the defining characteristics of each concept
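The "vertical" relationships of a taxonomy allow a search on a broad term to also return resources annotated with narrower terms. The sketch below assumes a simple child-to-parent table. Several term names are borrowed from the Andrews keyword list later in this deck, but the parent-child arrangement, the third-level term, and the dataset annotations are hypothetical.

```python
from collections import defaultdict

# Child -> parent links (assumed arrangement, for illustration only).
PARENT = {
    "Aquatic/Riparian habitat": "Habitat/Environment",
    "Terrestrial/Upland habitat": "Habitat/Environment",
    "Hydrology/Water": "Discipline/Approach",
    "Soils": "Discipline/Approach",
    "Stream temperature": "Aquatic/Riparian habitat",  # hypothetical third-level term
}

# Invert the table so each parent lists its children.
CHILDREN = defaultdict(list)
for child, parent in PARENT.items():
    CHILDREN[parent].append(child)

# Hypothetical resources and the keywords they were annotated with.
ANNOTATIONS = {
    "dataset A": {"Stream temperature"},
    "dataset B": {"Soils"},
    "dataset C": {"Terrestrial/Upland habitat"},
}


def narrower_terms(term):
    """Return the term plus every term below it in the hierarchy."""
    found = [term]
    stack = [term]
    while stack:
        for child in CHILDREN.get(stack.pop(), []):
            found.append(child)
            stack.append(child)
    return found


def search(term):
    """Return resources annotated with the term or any narrower term."""
    wanted = set(narrower_terms(term))
    return sorted(name for name, kws in ANNOTATIONS.items() if kws & wanted)
```

A search on "Habitat/Environment" then returns datasets A and C, while a search on "Aquatic/Riparian habitat" returns only dataset A.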

  8. Semantic Methods and Characteristics (Deanna Pennington, LTER DataBits, Spring 2006)
     [Table not preserved in transcript; first column header: Attribute]

  9. Semantic Clarification (Deanna Pennington, LTER DataBits, Spring 2006)

  10. Examples of Other Resources
     • Global Change Master Directory
       • http://gcmd.nasa.gov/
     • GEMET, the GEneral Multilingual Environmental Thesaurus
       • http://www.eionet.europa.eu/gemet
       • The idea is to use the best of the presently available multilingual thesauri to define a core of general terminology for the environment
     • WordNet
       • http://wordnet.princeton.edu/
     • Knowledge Network for Biocomplexity (KNB)
       • http://knb.ecoinformatics.org/index.jsp

  11. Andrews LTER Theme Keywords
     • Controlled vocabulary of preferred keywords
       • Provides primary basis for information search
       • Developed locally by committee in a time-consuming process
       • Existing vocabularies considered too general
         • E.g., Global Change Master Directory (GCMD)
       • Lacks interoperability with other vocabularies
     • Non-preferred keywords
       • Provide a mechanism to maintain legacy keywords
       • Link provided to preferred keyword
     • Hierarchic structure - 3 levels
       • Improves search capability
       • Imposes additional maintenance overhead

  12. Andrews LTER Theme Keywords: Top-level keywords
     • Habitat/Environment
       • Aquatic/Riparian habitat, Estuarine habitat, Marine habitat, Terrestrial/Upland habitat
     • Discipline/Approach
       • Climate/Meteorology, Conservation biology, Data & Information management, Disturbance, Ecology, Genetics, Geology/geomorphology, History, Human dimension, Landscape ecology, Hydrology/Water, Invertebrates, Methods, Microbiota, Modeling, Plants, Program administration, Remote sensing, Resource management, Silviculture, Soils, Taxonomy/Systematics, Vertebrates
     • Principle/Process
       • Biological diversity, Biomass, Ecosystem processes, Physiological processes, Pollution, Population dynamics, Trophic relations

  13. Andrews LTER Theme Keywords - Database Design
     Tables and columns:
     • Catalog: Catalog_id, Catalog_type_id
     • Theme_keyword: Theme_keyword_id, Parent_keyword_id, Keyword_name, Keyword_type_id, Is_preferred_keyword
     • Keyword_type: Keyword_type_id, Keyword_type
     • Catalog_theme_keyword: Catalog_id, Theme_keyword_id
     • Related_theme_keyword: Theme_keyword_id, Related_theme_keyword_id, Is_synonym
     Lookup values:
     • Catalog_type: Database, GIS database, Image (planned)
     • Keyword_type: Theme Keyword, Temporal Keyword, Stratum Keyword, Taxonomic Keyword, Methodology Keyword
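The table and column names on this slide can be sketched as an SQLite schema. Only the names come from the slide. The column types, keys, and constraints are assumptions, as are the `catalog_type` name column, the direction of the synonym link (non-preferred keyword pointing to its preferred keyword), the sample rows, and the helper `find_catalog_ids`.

```python
import sqlite3

# Names follow the slide; types, keys, and constraints are assumed.
SCHEMA = """
CREATE TABLE catalog_type (
    catalog_type_id INTEGER PRIMARY KEY,
    catalog_type    TEXT NOT NULL UNIQUE
);
CREATE TABLE keyword_type (
    keyword_type_id INTEGER PRIMARY KEY,
    keyword_type    TEXT NOT NULL UNIQUE
);
CREATE TABLE catalog (
    catalog_id      INTEGER PRIMARY KEY,
    catalog_type_id INTEGER NOT NULL REFERENCES catalog_type(catalog_type_id)
);
CREATE TABLE theme_keyword (
    theme_keyword_id     INTEGER PRIMARY KEY,
    parent_keyword_id    INTEGER REFERENCES theme_keyword(theme_keyword_id),
    keyword_name         TEXT NOT NULL,
    keyword_type_id      INTEGER NOT NULL REFERENCES keyword_type(keyword_type_id),
    is_preferred_keyword INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE catalog_theme_keyword (
    catalog_id       INTEGER NOT NULL REFERENCES catalog(catalog_id),
    theme_keyword_id INTEGER NOT NULL REFERENCES theme_keyword(theme_keyword_id),
    PRIMARY KEY (catalog_id, theme_keyword_id)
);
CREATE TABLE related_theme_keyword (
    theme_keyword_id         INTEGER NOT NULL REFERENCES theme_keyword(theme_keyword_id),
    related_theme_keyword_id INTEGER NOT NULL REFERENCES theme_keyword(theme_keyword_id),
    is_synonym               INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (theme_keyword_id, related_theme_keyword_id)
);
"""


def build_demo():
    """Create an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO catalog_type VALUES (1, 'Database')")
    conn.execute("INSERT INTO keyword_type VALUES (1, 'Theme Keyword')")
    conn.executemany(
        "INSERT INTO theme_keyword VALUES (?, ?, ?, ?, ?)",
        [
            (1, None, "Discipline/Approach", 1, 1),
            (2, 1, "Hydrology/Water", 1, 1),
            (3, None, "Water", 1, 0),  # hypothetical legacy (non-preferred) keyword
        ],
    )
    # Assumed direction: non-preferred keyword 3 is a synonym of preferred keyword 2.
    conn.execute("INSERT INTO related_theme_keyword VALUES (3, 2, 1)")
    conn.execute("INSERT INTO catalog VALUES (100, 1)")
    conn.execute("INSERT INTO catalog_theme_keyword VALUES (100, 2)")
    return conn


def find_catalog_ids(conn, keyword_name):
    """Find catalog entries by keyword, following a synonym link if one exists."""
    sql = """
        SELECT DISTINCT ctk.catalog_id
        FROM theme_keyword AS tk
        LEFT JOIN related_theme_keyword AS rel
               ON rel.theme_keyword_id = tk.theme_keyword_id
              AND rel.is_synonym = 1
        JOIN catalog_theme_keyword AS ctk
               ON ctk.theme_keyword_id =
                  COALESCE(rel.related_theme_keyword_id, tk.theme_keyword_id)
        WHERE tk.keyword_name = ?
        ORDER BY ctk.catalog_id
    """
    return [row[0] for row in conn.execute(sql, (keyword_name,))]
```

In this sketch a search on the legacy keyword "Water" reaches the same catalog entry as a search on "Hydrology/Water", which is the role the earlier slide assigns to non-preferred keywords.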

  14. Strategy for Implementation of a Controlled Vocabulary
     • Goal: contribute to the credibility, usability, accessibility, and persistence of information products
     • Considerations:
       • Cost
       • Sustainability
       • Acceptability

  15. Strategy for Implementation of a Controlled Vocabulary
     • Options:
       • Do nothing
       • Establish a task force or committee
       • Adopt an existing vocabulary
       • Adopt an existing vocabulary with modifications
       • Build your own vocabulary
     • Considerations:
       • Cost, Sustainability, Acceptability
