
Controlled vocabularies : Thesauri and information retrieval

Michael Middleton, QUT School of Information Systems, Brisbane, Australia (m.middleton@qut.edu.au), for STIMULATE 5, Vrije Universiteit Brussel, Brussels, Belgium, July 2005.


Presentation Transcript


  1. Michael Middleton, QUT School of Information Systems, Brisbane, Australia, m.middleton@qut.edu.au, for STIMULATE 5, Vrije Universiteit Brussel, Brussels, Belgium, July 2005 • Controlled vocabularies: Thesauri and information retrieval

  2. Introduction • Context ….. History • Vocabulary principles • Thesaurus software • Thesaurus building …. application • Thesaurus evaluation • The future

  3. Context: Information life cycle • [Diagram; stage labels: create, organise, maintain, distribute, dispose, store, use, reuse, recall]

  4. Context: Information management Domains • Operational • Analytical • Strategic

  5. Context: indexing • Producing representations of records or documents that constitute a finding aid to the records in a database or to part of a document • Assigned indexing • Derived indexing

  6. Indexer qualities • The ‘Art’ of assigned indexing: • Empathy • Meticulousness • Consistency • General knowledge • Patience

  7. Indexing guidelines • Conceptual analysis and assigning • Aboutness • Elements of the document to consider • Exhaustivity • Specificity • Index what is in the item • Co-ordination

  8. Assigned index representations • Alphabetical Subject • Classified • Alphabetical • Notation • Chain

  9. Indexing exercise How consistent is database indexing? Example: the same paper in multiple databases: Middleton, M Skills expectations of library graduates http://eprints.qut.edu.au/archive/00000094/ • Index it yourself • Compare your indexing with others • Compare the indexing in ERIC and INSPEC

  10. Context: metadata • Agent • Document description • Responsibility • Administrative • Provenance • Connections • Conditions of use

  11. Context: metadata • Content • Topic (application of vocabulary control) • Coverage • Role

  12. Controlled vocabulary • Thesaurus • A controlled vocabulary of terms in natural language that are designed for post-coordination • Classification scheme • A scheme for organisation by categories in a systematic manner; this may involve grouping by subject, function or other criteria, or determining document naming conventions • Often involves notation
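Post-coordination means that single-concept terms are assigned at indexing time and combined only at search time. The following is a minimal sketch of that idea, assuming a toy inverted index; the document ids and term assignments are invented for illustration and do not come from any real database.

```python
# Hypothetical inverted index: thesaurus term -> ids of documents indexed with it.
INDEX = {
    "CAMERAS": {1, 2, 3, 5},
    "UNDERWATER CAMERAS": {2, 5},
    "DIVING": {2, 4},
    "PHOTOGRAPHY": {1, 3, 5},
}


def search_all(index, *terms):
    """Return ids of documents indexed with every given term (Boolean AND)."""
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


def search_any(index, *terms):
    """Return ids of documents indexed with at least one given term (Boolean OR)."""
    return set().union(*(index.get(term, set()) for term in terms))
```

The searcher, not the indexer, decides which terms to combine: `search_all(INDEX, "CAMERAS", "DIVING")` coordinates two independently assigned terms at query time.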

  13. Purpose • Indexing by translating diverse natural language to consistent terminology • Establishing relationships among terms • Information retrieval improving precision and recall

  14. History • Bibliographic databases • Many applications; a list of associated online thesauri and classification schemes is at http://sky.fit.qut.edu.au/~middletm/cont_voc.html • Standards • ISO 2788; ISO 5964 • ANSI Z39.19

  15. Thesaurus principles • Term relationships • Continuing evolution • Internally consistent hierarchies to support database searching

  16. The Thesaurus • The vocabulary of a controlled indexing language formally organised so that the a priori relationships between concepts are made explicit. • A thesaurus is an example of metadata

  17. Thesaurus extract (ISO sample)

      35 mm CAMERAS
        BT MINIATURE CAMERAS
      CAMERAS
        BT OPTICAL EQUIPMENT
        NT MOVING PICTURE CAMERAS
           STEREO CAMERAS
           STILL CAMERAS
           UNDERWATER CAMERAS
        RT PHOTOGRAPHY
      CINE CAMERAS
        BT MOVING PICTURE CAMERAS
        NT UNDERWATER CINE CAMERAS
        RT CINEMA
      CINEMA
        RT CINE CAMERAS
      DIVING
        RT UNDERWATER CAMERAS
      INSTANT PICTURE CAMERAS
        SN Cameras which produce a finished print directly
        BT STILL CAMERAS
      Land cameras
        USE VIEW CAMERAS
      MICROSCOPES
        BT OPTICAL EQUIPMENT
      MINIATURE CAMERAS
        BT STILL CAMERAS
        NT 35 mm CAMERAS
      MOVING PICTURE CAMERAS
        BT CAMERAS
        NT CINE CAMERAS
           TELEVISION CAMERAS
      OPTICAL EQUIPMENT
        NT CAMERAS
           MICROSCOPES
      PHOTOGRAPHY
        RT CAMERAS
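The BT and NT links in the extract above can be followed mechanically to recover a term's full hierarchy. The sketch below transcribes only those two link types from the extract; because it is an extract, some terms (for example STILL CAMERAS) have no entry of their own, so the function names and the decision to infer missing BT links from the inverse of NT links are assumptions made for illustration.

```python
# BT and NT links transcribed from the extract; SN, RT and USE are left out here.
BT = {
    "35 mm CAMERAS": ["MINIATURE CAMERAS"],
    "CAMERAS": ["OPTICAL EQUIPMENT"],
    "CINE CAMERAS": ["MOVING PICTURE CAMERAS"],
    "INSTANT PICTURE CAMERAS": ["STILL CAMERAS"],
    "MICROSCOPES": ["OPTICAL EQUIPMENT"],
    "MINIATURE CAMERAS": ["STILL CAMERAS"],
    "MOVING PICTURE CAMERAS": ["CAMERAS"],
}
NT = {
    "CAMERAS": ["MOVING PICTURE CAMERAS", "STEREO CAMERAS",
                "STILL CAMERAS", "UNDERWATER CAMERAS"],
    "CINE CAMERAS": ["UNDERWATER CINE CAMERAS"],
    "MINIATURE CAMERAS": ["35 mm CAMERAS"],
    "MOVING PICTURE CAMERAS": ["CINE CAMERAS", "TELEVISION CAMERAS"],
    "OPTICAL EQUIPMENT": ["CAMERAS", "MICROSCOPES"],
}


def broader_map(bt, nt):
    """Merge explicit BT links with the inverse of the NT links."""
    merged = {term: set(parents) for term, parents in bt.items()}
    for parent, children in nt.items():
        for child in children:
            merged.setdefault(child, set()).add(parent)
    return merged


def chain_to_top(term, broader):
    """Follow broader terms upward until a term with no broader term is reached.

    Assumes a single parent per term; if there are several, the
    alphabetically first is taken. Stops if a term would repeat.
    """
    chain = [term]
    while broader.get(chain[-1]):
        parent = sorted(broader[chain[-1]])[0]
        if parent in chain:
            break
        chain.append(parent)
    return chain
```

Calling `chain_to_top("35 mm CAMERAS", broader_map(BT, NT))` walks from the most specific term up to OPTICAL EQUIPMENT, which is the top term of this extract.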

  18. Standardising the Vocabulary • Types of entities & forms of terms • Singular vs plural • Homonyms • Choice of terms • Scope notes and history notes

  19. Compound terms • Terms should be factored into simpler elements to improve user’s understanding. • Semantic factoring • Syntactic factoring

  20. Semantic Relationships • Equivalence • Establishing relationships between preferred (postable) and non-preferred (non-postable) terms • Hierarchical • Establishing relationships between subordinate and superordinate terms. These may be distinguished as: • Generic • Whole-part • Instance • Associative • Establishing relationships between terms that are mentally associated, but not equivalent or hierarchical
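Each of the three relationship types is reciprocal: BT pairs with NT, RT with RT, and USE with UF (use for). A minimal sketch of a structure that records both directions in one step is given below; the class and method names are hypothetical and not taken from any thesaurus package, and the hierarchical subtypes (generic, whole-part, instance) are not distinguished.

```python
from collections import defaultdict

# Each relationship tag paired with its reciprocal, as used in thesaurus displays.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


class Thesaurus:
    def __init__(self):
        # term -> relationship tag -> set of target terms
        self.links = defaultdict(lambda: defaultdict(set))

    def relate(self, term, tag, target):
        """Record a relationship and its reciprocal together."""
        self.links[term][tag].add(target)
        self.links[target][RECIPROCAL[tag]].add(term)

    def preferred(self, term):
        """Resolve a non-preferred term to its preferred term.

        Returns the term unchanged if it has no USE reference.
        Assumes at most one USE target per non-preferred term.
        """
        use = self.links.get(term, {}).get("USE", set())
        return next(iter(use), term)
```

For example, after `t.relate("Land cameras", "USE", "VIEW CAMERAS")`, the entry for VIEW CAMERAS carries the matching UF reference, and `t.preferred("Land cameras")` returns the postable term.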

  21. … but, the functions thesaurus • Whereas agenda papers might have the broader term documents • In a functions thesaurus, agenda papers might have the broader term meetings

  22. Applying a functional thesaurus Top Term • PERSONNEL Scope Notes The function of managing all employees …… Related Terms • COMPENSATION • ESTABLISHMENT • INDUSTRIAL RELATIONS etc, etc Narrower Terms • ALLOWANCES • APPEALS (Decisions) • APPOINTMENT • ARRANGEMENTS • AUTHORISATION • COMMITTEES • COMPLIANCE etc, etc Use For Terms • Employees • Public Servants • Staff

  23. Thesaurus Display • Alphabetical hierarchies • One level above and below entry term • Complete hierarchy for each term or separate TT display • Permuted term lists • Combination with classification notation • Graphic Displays
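A permuted term list gives every word of a multiword term its own entry point, so that STILL CAMERAS can be found under CAMERAS as well as under STILL. The sketch below is one simple way to produce such a list, assuming terms are split on whitespace and no stop words are removed; the function name is illustrative.

```python
def permuted_list(terms):
    """Return (entry word, full term) pairs, one per word of each term.

    The pairs are sorted by entry word, then by full term.
    """
    entries = []
    for term in terms:
        for word in term.split():
            entries.append((word.upper(), term))
    return sorted(entries)


def format_permuted(entries):
    """Render the pairs as aligned lines for display."""
    width = max((len(word) for word, _ in entries), default=0)
    return "\n".join(f"{word:<{width}}  {term}" for word, term in entries)
```

A production display would normally suppress function words and might show the entry word in context rather than in a separate column.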

  24. Applying a thesaurus Download Term Tree from http://www.termtree.com.au Free trial download from

  25. Thesaurus software • Assigned • Integrated database • Deriving terminology

  26. Thesaurus software - assigned Terms are assigned by vocabulary specialists in independent database • a.k.a.™ • Synercon Management Consulting • MultiTes • OpenCyc • SuperTHES • from THESmain/THESshow for mono-/multilingual thesauri • Term Tree 2000 • WebChoir • Wordmap

  27. Thesaurus software – integrated database Terms are assigned by specialists, thesaurus works like active data dictionary to control database • BASIS • InMagic Bibliotech PRO • BRS/Search • STAR

  28. Thesaurus software for deriving terminology Terms are created automatically from text • Entrieva • SemioTagger™, SemioMap™ and SemioSkyline™ for viewing • Intology • taxonomy builder • Verity • Thematic Mapping • Autonomy • taxonomy generation & categorization

  29. Thesaurus Building - 1 • Users • Define • Identify needs • Define Thesaurus range & depth • Raw vocabulary building • Identify sources • Collect and record terms

  30. Thesaurus Building - 2 • Vocabulary organisation • Cluster terms • Establish relationships using symbols • Maintenance

  31. Business application • Not long term collaborative efforts of classification specialists • Instead, adapt to business changes • Not just descriptions of present business processes • Instead, reflect strategic planning, competitors • Not necessarily a single taxonomy • Instead, multiple overlapping taxonomies

  32. Content management • Describe content as it’s being created rather than classify after creation • User-needs orientation

  33. Integrating taxonomies • Accurate reporting • Exchange of data • Assist resource discovery • Information retrieval

  34. Thesaurus evaluation • Qualities • Information retrieval evaluation

  35. Thesaurus Qualities • Scope and features description • Display forms • Correctness of hierarchies • Use of scope, history and qualification • Adherence to standards • Syndetic measures • Connectedness • Accessibility

  36. Thesauri & Retrieval evaluation • Cranfield experiments & since • Recall and precision • Influence on indexing • Conceptual analysis • Translation failure • Omissions • Exhaustivity/Specificity • Syntax and ‘false drops’ • Maintenance costs
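Recall is the proportion of relevant documents that a search retrieves, and precision is the proportion of retrieved documents that are relevant. A short sketch of both measures follows, assuming the set of relevant documents is known in advance as in a test collection; the function names and document ids are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    An empty denominator yields 0.0 rather than an error.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def false_drops(retrieved, relevant):
    """Return the retrieved documents that are not relevant."""
    return set(retrieved) - set(relevant)
```

With four documents retrieved, of which two are among three relevant ones, precision is 2/4 and recall is 2/3.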

  37. Post-controlled vocabularies • Use of a ‘Hedge’ of terms to represent a broad concept, eg: • ‘psychological aspects of..........’ • ‘........in Australia’ • ‘....review items on.....’
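A hedge can be stored as a list of natural-language terms that are ORed together at search time and then combined with the rest of the query. The sketch below assumes a generic Boolean syntax with quoted phrases, not the query language of any particular system, and the terms in the example hedge are invented.

```python
def hedge_query(terms):
    """Join the terms of a hedge into one parenthesised Boolean OR expression.

    Multiword terms are wrapped in double quotes.
    """
    quoted = ['"{}"'.format(term) if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


# Hypothetical hedge standing in for the broad concept 'psychological aspects of ...'.
PSYCHOLOGICAL_ASPECTS = ["psychology", "psychological", "attitudes", "mental health"]


def with_hedge(topic, hedge_terms):
    """Combine a topic with a hedge using Boolean AND."""
    return topic + " AND " + hedge_query(hedge_terms)
```

Because the hedge is maintained separately from the indexing, it can be extended with new terms without reindexing any records.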

  38. Still to come …… Research areas • Metathesauri • Super – interlinked vocabularies (e.g. NLM) • Semantic Web • Enhancing word association with usage statistics like links (e.g. THESUS)

  39. Review • Controlled vocabulary types • Software support • Business processes • Website • http://sky.fit.qut.edu.au/~middletm/cont_voc.html • (about to move to database driven site – redirection will be applied)

  40. Questions?
