Controlled vocabularies: Thesauri and information retrieval


Michael Middleton, QUT School of Information Systems, Brisbane, Australia ([email protected]), for STIMULATE 5, Vrije Universiteit Brussel, Brussels, Belgium, July 2005.

Presentation Transcript

Michael Middleton

QUT School of Information Systems, Brisbane, Australia

[email protected]

for

STIMULATE 5

Vrije Universiteit Brussel

Brussels, Belgium

July, 2005

Controlled vocabularies: Thesauri and information retrieval

Introduction
  • Context ….. History
  • Vocabulary principles
  • Thesaurus software
  • Thesaurus building …. application
  • Thesaurus evaluation
  • The future
Context: Information life cycle

  • Organise to maintain
  • Stages labelled in the cycle diagram (in extraction order, not necessarily cycle order): create, distribute, dispose, store, use, reuse, maintain, recall

Context: Information management

Domains

  • Operational
  • Analytical
  • Strategic
Context: indexing
  • Producing representations of records or documents that constitute a finding aid to the records in a database or to part of a document
    • Assigned indexing
    • Derived indexing
Indexer qualities
  • The ‘Art’ of assigned indexing:
    • Empathy
    • Meticulousness
    • Consistency
    • General knowledge
    • Patience
Indexing guidelines
  • Conceptual analysis and assigning
  • Aboutness
  • Elements of the document to consider
  • Exhaustivity
  • Specificity
  • Index what is in the item
  • Co-ordination
Assigned index representations
  • Alphabetical Subject
  • Classified
    • Alphabetical
    • Notation
  • Chain
Indexing exercise

How consistent is database indexing?

Example: the same paper in multiple databases:

Middleton, M Skills expectations of library graduates http://eprints.qut.edu.au/archive/00000094/

  • Index it yourself
  • Compare your indexing with others
  • Compare the indexing in ERIC and INSPEC
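One way to put a number on the comparison in this exercise is a simple overlap ratio: terms both indexers assigned, divided by all distinct terms either assigned. This is a sketch only. The function name and the sample terms are invented for illustration and are not the actual ERIC or INSPEC descriptors for the paper.

```python
def indexing_consistency(terms_a, terms_b):
    """Overlap ratio between two sets of index terms (0.0 to 1.0).

    Computed as |A and B| / |A or B|, after normalising case and spaces.
    """
    a = {t.strip().lower() for t in terms_a}
    b = {t.strip().lower() for t in terms_b}
    if not a and not b:
        return 1.0  # two empty indexings agree trivially
    return len(a & b) / len(a | b)


# Hypothetical terms assigned to the same paper by two indexers.
indexer_a = ["Librarians", "Job skills", "Graduates"]
indexer_b = ["job skills", "Library education", "Graduates"]

score = indexing_consistency(indexer_a, indexer_b)
print(f"Consistency: {score:.2f}")
```

A low score does not by itself show that either indexer is wrong; differences in exhaustivity and specificity between databases also lower the overlap.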
Context: metadata
  • Agent
    • Document description
    • Responsibility
    • Administrative
    • Provenance
    • Connections
    • Conditions of use
Context: metadata
  • Content
    • Topic (application of vocabulary control)
    • Coverage
    • Role
Controlled vocabulary
  • Thesaurus
    • A controlled vocabulary of terms in natural language that are designed for post-coordination
  • Classification scheme
    • A scheme for organisation by categories in a systematic manner; this may involve grouping by subject, function or other criteria, or determining document naming conventions
    • Often involves notation
Purpose
  • Indexing by translating diverse natural language to consistent terminology
  • Establishing relationships among terms
  • Information retrieval: improving precision and recall
History
  • Bibliographic databases
    • Many applications; a list of associated online thesauri and classification schemes is at http://sky.fit.qut.edu.au/~middletm/cont_voc.html
  • Standards
    • ISO 2788; ISO 5964
    • ANSI Z39.19
Thesaurus principles
  • Term relationships
  • Continuing evolution
  • Internally consistent hierarchies to support database searching
The Thesaurus
  • The vocabulary of a controlled indexing language formally organised so that the a priori relationships between concepts are made explicit.
  • A thesaurus is an example of metadata
Thesaurus extract (ISO sample)

35 mm CAMERAS
    BT MINIATURE CAMERAS
CAMERAS
    BT OPTICAL EQUIPMENT
    NT MOVING PICTURE CAMERAS
       STEREO CAMERAS
       STILL CAMERAS
       UNDERWATER CAMERAS
    RT PHOTOGRAPHY
CINE CAMERAS
    BT MOVING PICTURE CAMERAS
    NT UNDERWATER CINE CAMERAS
    RT CINEMA
CINEMA
    RT CINE CAMERAS
DIVING
    RT UNDERWATER CAMERAS
INSTANT PICTURE CAMERAS
    SN Cameras which produce a finished print directly
    BT STILL CAMERAS
Land cameras USE VIEW CAMERAS
MICROSCOPES
    BT OPTICAL EQUIPMENT
MINIATURE CAMERAS
    BT STILL CAMERAS
    NT 35 mm CAMERAS
MOVING PICTURE CAMERAS
    BT CAMERAS
    NT CINE CAMERAS
       TELEVISION CAMERAS
OPTICAL EQUIPMENT
    NT CAMERAS
       MICROSCOPES
PHOTOGRAPHY
    RT CAMERAS
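The BT/NT/RT entries in the ISO sample extract can be held as simple (term, tag, target) records, which makes it easy to check that each relationship has its reciprocal (BT against NT, RT against RT). The sketch below is illustrative only: the record layout and the function name are assumptions, not part of any standard or thesaurus package. Because the slide shows only an extract, some reciprocals are expected to be absent (for example, STILL CAMERAS has no entry of its own here).

```python
# Relationship records transcribed from the extract: (term, tag, target).
RELATIONS = [
    ("35 mm CAMERAS", "BT", "MINIATURE CAMERAS"),
    ("CAMERAS", "BT", "OPTICAL EQUIPMENT"),
    ("CAMERAS", "NT", "MOVING PICTURE CAMERAS"),
    ("CAMERAS", "NT", "STEREO CAMERAS"),
    ("CAMERAS", "NT", "STILL CAMERAS"),
    ("CAMERAS", "NT", "UNDERWATER CAMERAS"),
    ("CAMERAS", "RT", "PHOTOGRAPHY"),
    ("CINE CAMERAS", "BT", "MOVING PICTURE CAMERAS"),
    ("CINE CAMERAS", "NT", "UNDERWATER CINE CAMERAS"),
    ("CINE CAMERAS", "RT", "CINEMA"),
    ("CINEMA", "RT", "CINE CAMERAS"),
    ("DIVING", "RT", "UNDERWATER CAMERAS"),
    ("INSTANT PICTURE CAMERAS", "BT", "STILL CAMERAS"),
    ("MICROSCOPES", "BT", "OPTICAL EQUIPMENT"),
    ("MINIATURE CAMERAS", "BT", "STILL CAMERAS"),
    ("MINIATURE CAMERAS", "NT", "35 mm CAMERAS"),
    ("MOVING PICTURE CAMERAS", "BT", "CAMERAS"),
    ("MOVING PICTURE CAMERAS", "NT", "CINE CAMERAS"),
    ("MOVING PICTURE CAMERAS", "NT", "TELEVISION CAMERAS"),
    ("OPTICAL EQUIPMENT", "NT", "CAMERAS"),
    ("OPTICAL EQUIPMENT", "NT", "MICROSCOPES"),
    ("PHOTOGRAPHY", "RT", "CAMERAS"),
]

# Each tag and the tag its reciprocal entry should carry.
INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT"}


def missing_reciprocals(relations):
    """Return the reciprocal records that are not present in `relations`."""
    have = set(relations)
    missing = set()
    for source, tag, target in relations:
        reciprocal = (target, INVERSE[tag], source)
        if reciprocal not in have:
            missing.add(reciprocal)
    return sorted(missing)


for term, tag, target in missing_reciprocals(RELATIONS):
    print(f"{term}: expected {tag} {target}")
```

In a full thesaurus the same check would flag genuine maintenance errors, which is one reason thesaurus software usually creates reciprocal entries automatically.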
Standardising the Vocabulary
  • Types of entities & forms of terms
  • Singular vs plural
  • Homonyms
  • Choice of terms
  • Scope notes and history notes
Compound terms
  • Terms should be factored into simpler elements to improve user’s understanding.
  • Semantic factoring
  • Syntactic factoring
Semantic Relationships
  • Equivalence
    • Establishing relationships between preferred (postable) and non-preferred (non-postable) terms
  • Hierarchical
    • Establishing relationships between subordinate and superordinate terms. These may be distinguished as:
      • Generic
      • Whole-part
      • Instance
  • Associative
    • Establishing relationships between terms that are mentally associated, but not equivalent or hierarchical
… but, the Functions thesaurus

Whereas

  • agenda papers might have
    • broader term: documents

In a functions thesaurus

  • agenda papers might have
    • broader term: meetings
Applying a functional thesaurus

Top Term

  • PERSONNEL

Scope Notes The function of managing all employees ……

Related Terms

  • COMPENSATION
  • ESTABLISHMENT
  • INDUSTRIAL RELATIONS etc, etc

Narrower Terms

  • ALLOWANCES
  • APPEALS (Decisions)
  • APPOINTMENT
  • ARRANGEMENTS
  • AUTHORISATION
  • COMMITTEES
  • COMPLIANCE etc, etc

Use For Terms

  • Employees
  • Public Servants
  • Staff
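The Use For terms above act as an entry vocabulary: a searcher or indexer who starts from "Staff" should be led to the preferred term PERSONNEL. A minimal sketch of that lookup follows. The dictionary layout and the function name are assumptions made for illustration; only terms shown on the slide are used.

```python
# Non-preferred (Use For) terms mapped to their preferred term.
USE = {
    "employees": "PERSONNEL",
    "public servants": "PERSONNEL",
    "staff": "PERSONNEL",
}

# Preferred terms that appear on the slide (top term and related terms).
PREFERRED = {"PERSONNEL", "COMPENSATION", "ESTABLISHMENT", "INDUSTRIAL RELATIONS"}


def to_preferred(term):
    """Translate a natural-language term to a preferred term, or None."""
    cleaned = term.strip()
    if cleaned.lower() in USE:
        return USE[cleaned.lower()]
    if cleaned.upper() in PREFERRED:
        return cleaned.upper()
    return None  # not in the vocabulary: a candidate for review


for entry in ["Staff", "public servants", "personnel", "Payroll"]:
    print(entry, "->", to_preferred(entry))
```

Terms that return None are worth logging, since repeated misses suggest lead-in terms the thesaurus should add.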
Thesaurus Display
  • Alphabetical hierarchies
    • One level above and below entry term
    • Complete hierarchy for each term or separate TT display
  • Permuted term lists
  • Combination with classification notation
  • Graphic Displays
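A permuted term list gives every significant word of a multi-word term its own access point, so that UNDERWATER CINE CAMERAS can be found under UNDERWATER, CINE and CAMERAS. The sketch below builds such a list in its simplest form. The function name is hypothetical, and a real display would also handle stop words, numerals and layout.

```python
def permuted_index(terms):
    """Return sorted (access word, full term) pairs, one per word of each term."""
    entries = []
    for term in terms:
        for word in term.split():
            entries.append((word.upper(), term))
    return sorted(entries)


terms = ["CINE CAMERAS", "STILL CAMERAS", "UNDERWATER CINE CAMERAS"]
for word, term in permuted_index(terms):
    print(f"{word:<12}{term}")
```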
Applying a thesaurus

Download Term Tree from http://www.termtree.com.au

Free trial download from

Thesaurus software
  • Assigned
  • Integrated database
  • Deriving terminology
Thesaurus software - assigned

Terms are assigned by vocabulary specialists in an independent database

  • a.k.a.™
    • Synercon Management Consulting
  • MultiTes
  • OpenCyc
  • SuperTHES
    • from THESmain/THESshow for mono-/multilingual thesauri
  • Term Tree 2000
  • WebChoir
  • Wordmap
Thesaurus software – integrated database

Terms are assigned by specialists; the thesaurus works like an active data dictionary to control the database

  • BASIS
  • InMagic Bibliotech PRO
  • BRS/Search
  • STAR
Thesaurus software for deriving terminology

Terms are created automatically from text

  • Entrieva
    • SemioTagger™, SemioMap™ and SemioSkyline™ for viewing
  • Intology
    • taxonomy builder
  • Verity
    • Thematic Mapping
  • Autonomy
    • taxonomy generation & categorization
Thesaurus Building - 1
  • Users
    • Define
    • Identify needs
    • Define Thesaurus range & depth
  • Raw vocabulary building
    • Identify sources
    • Collect and record terms
Thesaurus Building - 2
  • Vocabulary organisation
    • Cluster terms
    • Establish relationships using symbols
  • Maintenance
Business application
  • Not long term collaborative efforts of classification specialists
    • Instead, adapt to business changes
  • Not just descriptions of present business processes
    • Instead, reflect strategic planning, competitors
  • Not necessarily a single taxonomy
    • Instead, multiple overlapping taxonomies
Content management
  • Describe content as it’s being created rather than classify after creation
  • User-needs orientation
Integrating taxonomies
  • Accurate reporting
  • Exchange of data
  • Assist resource discovery
    • Information retrieval
Thesaurus evaluation
  • Qualities
  • Information retrieval evaluation
Thesaurus Qualities
  • Scope and features description
  • Display forms
  • Correctness of hierarchies
  • Use of scope, history and qualification
  • Adherence to standards
  • Syndetic measures
    • Connectedness
    • Accessibility
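The two syndetic measures can be given a rough numerical form. The sketch below assumes one common reading, which the slide itself does not spell out: connectedness as the fraction of terms linked to at least one other term, and accessibility as the average number of terms that point to a given term. The function name and the small sample (a subset of the camera extract, with DIVING left unlinked on purpose) are illustrative.

```python
def syndetic_measures(terms, links):
    """Return (connectedness, accessibility) for directed cross-references.

    `links` is a list of (source, target) pairs; pairs naming unknown
    terms are ignored.
    """
    known = set(terms)
    linked = set()
    incoming = {t: set() for t in known}
    for source, target in links:
        if source in known and target in known and source != target:
            linked.update((source, target))
            incoming[target].add(source)
    connectedness = len(linked) / len(known)
    accessibility = sum(len(s) for s in incoming.values()) / len(known)
    return connectedness, accessibility


terms = ["CAMERAS", "OPTICAL EQUIPMENT", "MICROSCOPES", "PHOTOGRAPHY", "DIVING"]
links = [
    ("CAMERAS", "OPTICAL EQUIPMENT"),      # BT
    ("OPTICAL EQUIPMENT", "CAMERAS"),      # NT
    ("OPTICAL EQUIPMENT", "MICROSCOPES"),  # NT
    ("MICROSCOPES", "OPTICAL EQUIPMENT"),  # BT
    ("CAMERAS", "PHOTOGRAPHY"),            # RT
    ("PHOTOGRAPHY", "CAMERAS"),            # RT
]

connectedness, accessibility = syndetic_measures(terms, links)
print(f"connectedness={connectedness:.2f} accessibility={accessibility:.2f}")
```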
Thesauri & Retrieval evaluation
  • Cranfield experiments & since
  • Recall and precision
  • Influence on indexing
    • Conceptual analysis
    • Translation failure
    • Omissions
    • Exhaustivity/Specificity
    • Syntax and ‘false drops’
  • Maintenance costs
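Recall and precision are simple ratios over the set of retrieved documents and the set of relevant documents: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. A minimal sketch, with made-up document identifiers and a hypothetical function name:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: four documents retrieved, three relevant in the collection.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Irrelevant items in the retrieved set ("false drops") lower precision, while translation failures and omissions at indexing time lower recall.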
Post-controlled vocabularies
  • Use of a ‘Hedge’ of terms to represent a broad concept, e.g.:
    • ‘psychological aspects of..........’
    • ‘........in Australia’
    • ‘....review items on.....’
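A hedge can be stored as a reusable list of search terms that is ORed together and then combined with the topic of interest at search time. The sketch below shows the idea for an "in Australia" hedge. The place names chosen, the function name and the generic Boolean syntax are all assumptions, and the output is not tied to any particular retrieval system.

```python
def hedge_query(hedge_terms, topic):
    """Combine a hedge (ORed terms) with a topic using generic Boolean syntax."""
    ored = " OR ".join(f'"{term}"' for term in hedge_terms)
    return f'({ored}) AND "{topic}"'


# Hypothetical hedge standing for the broad concept "in Australia".
in_australia = ["Australia", "Queensland", "New South Wales"]

print(hedge_query(in_australia, "thesauri"))
```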
Still to come ……

Research areas

  • Metathesauri
    • Super – interlinked vocabularies (e.g. NLM)
  • Semantic Web
    • Enhancing word association with usage statistics like links (e.g. THESUS)
Review
  • Controlled vocabulary types
  • Software support
  • Business processes
  • Website
    • http://sky.fit.qut.edu.au/~middletm/cont_voc.html
    • (about to move to database driven site – redirection will be applied)