Controlled vocabularies: Thesauri and information retrieval

Michael Middleton, QUT School of Information Systems, Brisbane, Australia ([email protected]), for STIMULATE 5, Vrije Universiteit Brussel, Brussels, Belgium, July 2005.


Michael Middleton

QUT School of Information Systems, Brisbane, Australia

[email protected]

for

STIMULATE 5

Vrije Universiteit Brussel

Brussels, Belgium

July, 2005

Controlled vocabularies: Thesauri and information retrieval


Introduction

  • Context ….. History

  • Vocabulary principles

  • Thesaurus software

  • Thesaurus building …. application

  • Thesaurus evaluation

  • The future


Context: Information life cycle

[Cycle diagram. Labels: create, organise to maintain, distribute, dispose, store, use, reuse, maintain, recall]


Context: Information management

Domains

  • Operational

  • Analytical

  • Strategic


Context: indexing

  • Producing representations of records or documents that constitute a finding aid to the records in a database or to part of a document

    • Assigned indexing

    • Derived indexing
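
As a rough illustration of the two approaches: derived indexing takes index terms from the words of the document itself, while assigned indexing translates the document's concepts into terms from a controlled vocabulary. The sketch below is a toy under stated assumptions. The stop-word list and the word-to-term mapping are hypothetical, and real assigned indexing relies on a human indexer's conceptual analysis rather than word lookup.

```python
import re

# Hypothetical stop words and controlled vocabulary, for illustration only.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}
CONTROLLED_VOCABULARY = {
    "graduates": "COLLEGE GRADUATES",
    "skills": "JOB SKILLS",
}


def tokenize(text):
    """Split text into lower-case alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def derived_index(text):
    """Derived indexing: index terms are the document's own words."""
    return sorted({w for w in tokenize(text) if w not in STOP_WORDS})


def assigned_index(text):
    """Assigned indexing (simulated): words are translated to controlled terms."""
    return sorted(
        {CONTROLLED_VOCABULARY[w] for w in tokenize(text) if w in CONTROLLED_VOCABULARY}
    )


title = "Skills expectations of library graduates"
print(derived_index(title))
print(assigned_index(title))
```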


Indexer qualities

  • The ‘Art’ of assigned indexing:

    • Empathy

    • Meticulousness

    • Consistency

    • General knowledge

    • Patience


Indexing guidelines

  • Conceptual analysis and assigning

  • Aboutness

  • Elements of the document to consider

  • Exhaustivity

  • Specificity

  • Index what is in the item

  • Co-ordination


Assigned index representations

  • Alphabetical Subject

  • Classified

    • Alphabetical

    • Notation

  • Chain


Indexing exercise

How consistent is database indexing?

Example: the same paper in multiple databases:

Middleton, M Skills expectations of library graduates http://eprints.qut.edu.au/archive/00000094/

  • Index it yourself

  • Compare your indexing with others

  • Compare the indexing in ERIC and INSPEC
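
One simple way to put a number on the comparison is an overlap ratio: the terms two indexers share, divided by all distinct terms either of them used. The sketch below assumes that measure. The term lists are invented for illustration and are not the actual ERIC or INSPEC indexing of the paper.

```python
def indexing_consistency(terms_a, terms_b):
    """Shared terms divided by all distinct terms (1.0 means identical indexing)."""
    a = {t.strip().lower() for t in terms_a}
    b = {t.strip().lower() for t in terms_b}
    if not a and not b:
        return 1.0  # two empty term sets are trivially consistent
    return len(a & b) / len(a | b)


# Hypothetical term assignments for the same paper in two databases.
database_1 = ["Librarians", "Job skills", "Employment qualifications"]
database_2 = ["job skills", "Library education", "Employment qualifications", "Graduates"]

print(indexing_consistency(database_1, database_2))
```

With these invented lists, two terms are shared out of five distinct terms, so the ratio is 0.4.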


Context: metadata

  • Agent

    • Document description

    • Responsibility

    • Administrative

    • Provenance

    • Connections

    • Conditions of use


Context: metadata

  • Content

    • Topic (application of vocabulary control)

    • Coverage

    • Role


Controlled vocabulary

  • Thesaurus

    • A controlled vocabulary of terms in natural language that are designed for post-coordination

  • Classification scheme

    • A scheme for organisation by categories in a systematic manner; this may involve grouping by subject, function or other criteria, or determining document naming conventions

    • Often involves notation
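
Post-coordination means that single-concept terms are assigned at indexing time and combined only at search time. A minimal sketch, assuming a postings table that maps each thesaurus term to the set of documents indexed with it (the document numbers are hypothetical):

```python
# Hypothetical postings: thesaurus term -> identifiers of documents indexed with it.
postings = {
    "CAMERAS": {1, 2, 5},
    "DIVING": {2, 3, 4},
    "PHOTOGRAPHY": {1, 2, 3},
}


def search_all(postings, *terms):
    """Return documents indexed with every given term (Boolean AND at search time)."""
    sets = [postings.get(term, set()) for term in terms]
    return set.intersection(*sets) if sets else set()


print(search_all(postings, "CAMERAS", "DIVING"))
```

A pre-coordinated scheme would instead need a combined heading such as "underwater cameras" to exist before the search is made.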


Purpose

  • Indexing by translating diverse natural language to consistent terminology

  • Establishing relationships among terms

  • Improving precision and recall in information retrieval


History

  • Bibliographic databases

    • Many applications, list of online associated thesauri and classification schemes at http://sky.fit.qut.edu.au/~middletm/cont_voc.html

  • Standards

    • ISO 2788; ISO 5964

    • ANSI Z39.19


Thesaurus principles

  • Term relationships

  • Continuing evolution

  • Internally consistent hierarchies to support database searching


The Thesaurus

  • The vocabulary of a controlled indexing language formally organised so that the a priori relationships between concepts are made explicit.

  • A thesaurus is an example of metadata


Thesaurus extract (ISO sample)

35 mm CAMERAS
    BT MINIATURE CAMERAS

CAMERAS
    BT OPTICAL EQUIPMENT
    NT MOVING PICTURE CAMERAS
       STEREO CAMERAS
       STILL CAMERAS
       UNDERWATER CAMERAS
    RT PHOTOGRAPHY

CINE CAMERAS
    BT MOVING PICTURE CAMERAS
    NT UNDERWATER CINE CAMERAS
    RT CINEMA

CINEMA
    RT CINE CAMERAS

DIVING
    RT UNDERWATER CAMERAS

INSTANT PICTURE CAMERAS
    SN Cameras which produce a finished print directly
    BT STILL CAMERAS

Land cameras USE VIEW CAMERAS

MICROSCOPES
    BT OPTICAL EQUIPMENT

MINIATURE CAMERAS
    BT STILL CAMERAS
    NT 35 mm CAMERAS

MOVING PICTURE CAMERAS
    BT CAMERAS
    NT CINE CAMERAS
       TELEVISION CAMERAS

OPTICAL EQUIPMENT
    NT CAMERAS
       MICROSCOPES

PHOTOGRAPHY
    RT CAMERAS
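
In the extract, BT, NT and RT mark broader, narrower and related terms, SN a scope note, and USE a pointer from a non-preferred term to a preferred one. Each BT entry is expected to have a matching NT entry under the other term, and each RT entry is expected to be mutual. The sketch below encodes a few entries from the extract and checks that reciprocity. The dictionary layout and the function name are assumptions made for illustration, not a format prescribed by the standard, and the NT list for CAMERAS is cut to one term to keep the example small.

```python
# Each relationship type and the relationship expected in the opposite direction.
INVERSE = {"BT": "NT", "NT": "BT", "RT": "RT"}

# A subset of the extract, stored as term -> {relationship: [target terms]}.
thesaurus = {
    "CAMERAS": {
        "BT": ["OPTICAL EQUIPMENT"],
        "NT": ["MOVING PICTURE CAMERAS"],
        "RT": ["PHOTOGRAPHY"],
    },
    "MICROSCOPES": {"BT": ["OPTICAL EQUIPMENT"]},
    "MOVING PICTURE CAMERAS": {"BT": ["CAMERAS"]},
    "OPTICAL EQUIPMENT": {"NT": ["CAMERAS", "MICROSCOPES"]},
    "PHOTOGRAPHY": {"RT": ["CAMERAS"]},
}


def missing_reciprocals(thesaurus):
    """List (term, relationship, target) links whose target lacks the inverse link."""
    problems = []
    for term, relations in thesaurus.items():
        for rel, targets in relations.items():
            if rel not in INVERSE:
                continue  # e.g. scope notes have no inverse relationship
            for target in targets:
                entry = thesaurus.get(target)
                if entry is None:
                    continue  # target has no entry in this subset
                if term not in entry.get(INVERSE[rel], []):
                    problems.append((term, rel, target))
    return problems


print(missing_reciprocals(thesaurus))
```

An empty list means every link in the subset is reciprocated. Deleting the RT entry under PHOTOGRAPHY would make the check report the link from CAMERAS to PHOTOGRAPHY.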


Standardising the Vocabulary

  • Types of entities & forms of terms

  • Singular vs plural

  • Homonyms

  • Choice of terms

  • Scope notes and history notes


Compound terms

  • Terms should be factored into simpler elements to improve user’s understanding.

  • Semantic factoring

  • Syntactic factoring


Semantic Relationships

  • Equivalence

    • Establishing relationships between preferred (postable) and non-preferred (non-postable) terms

  • Hierarchical

    • Establishing relationships between subordinate and superordinate terms. These may be distinguished as:

      • Generic

      • Whole-part

      • Instance

  • Associative

    • Establishing relationships between terms that are mentally associated, but not equivalent or hierarchical
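
Two of these relationship types translate directly into search aids. Equivalence lets a system switch a searcher's non-preferred term to the preferred one, and hierarchy lets a search on a broad term be widened to include all of its narrower terms. The sketch below uses terms from the ISO sample extract. The table layout and function names are assumptions made for illustration.

```python
# Equivalence: non-preferred term (lower-cased) -> preferred term.
USE = {"land cameras": "VIEW CAMERAS"}

# Hierarchy: preferred term -> its immediate narrower terms.
NARROWER = {
    "OPTICAL EQUIPMENT": ["CAMERAS", "MICROSCOPES"],
    "CAMERAS": [
        "MOVING PICTURE CAMERAS",
        "STEREO CAMERAS",
        "STILL CAMERAS",
        "UNDERWATER CAMERAS",
    ],
    "MOVING PICTURE CAMERAS": ["CINE CAMERAS", "TELEVISION CAMERAS"],
}


def preferred(term):
    """Switch a non-preferred entry term to its preferred term, if one is recorded."""
    return USE.get(term.lower(), term.upper())


def expand(term):
    """Return the term together with all of its narrower terms, at every level."""
    result, stack, seen = [], [term], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against accidental cycles in the hierarchy
        seen.add(current)
        result.append(current)
        stack.extend(NARROWER.get(current, []))
    return result


print(preferred("Land cameras"))
print(sorted(expand("MOVING PICTURE CAMERAS")))
```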


… but, the functions thesaurus

Whereas

  • agenda papers might have

    • broader term: documents

In a functions thesaurus

  • agenda papers might have

    • broader term: meetings


Applying a functional thesaurus

Top Term

  • PERSONNEL

Scope Notes

  • The function of managing all employees ……

Related Terms

  • COMPENSATION

  • ESTABLISHMENT

  • INDUSTRIAL RELATIONS etc, etc

Narrower Terms

  • ALLOWANCES

  • APPEALS (Decisions)

  • APPOINTMENT

  • ARRANGEMENTS

  • AUTHORISATION

  • COMMITTEES

  • COMPLIANCE etc, etc

Use For Terms

  • Employees

  • Public Servants

  • Staff


Thesaurus Display

  • Alphabetical hierarchies

    • One level above and below entry term

    • Complete hierarchy for each term or separate TT display

  • Permuted term lists

  • Combination with classification notation

  • Graphic Displays
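
A permuted term list gives every significant word of a multi-word term its own access point, so that STILL CAMERAS can also be found under CAMERAS. A minimal sketch, assuming the simplest form in which every word is an access point and no stop words are filtered out:

```python
def permuted_list(terms):
    """Return (access word, full term) pairs, sorted by access word and then term."""
    entries = []
    for term in terms:
        for word in term.split():
            entries.append((word, term))
    return sorted(entries)


for word, term in permuted_list(["STILL CAMERAS", "CINE CAMERAS", "OPTICAL EQUIPMENT"]):
    print(f"{word:<12}{term}")
```

Both camera terms appear together under CAMERAS, even though neither starts with that word in the alphabetical display.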


Applying a thesaurus

Free trial download of Term Tree from http://www.termtree.com.au


Thesaurus software

  • Assigned

  • Integrated database

  • Deriving terminology


Thesaurus software - assigned

Terms are assigned by vocabulary specialists in an independent database

  • a.k.a.™

    • Synercon Management Consulting

  • MultiTes

  • OpenCyc

  • SuperTHES

    • from THESmain/THESshow for mono-/multilingual thesauri

  • Term Tree 2000

  • WebChoir

  • Wordmap


Thesaurus software – integrated database

Terms are assigned by specialists; the thesaurus works like an active data dictionary to control the database

  • BASIS

  • InMagic Bibliotech PRO

  • BRS/Search

  • STAR


Thesaurus software for deriving terminology

Terms are created automatically from text

  • Entrieva

    • SemioTagger™, SemioMap™ and SemioSkyline™ for viewing

  • Intology

    • taxonomy builder

  • Verity

    • Thematic Mapping

  • Autonomy

    • taxonomy generation & categorization


Thesaurus Building - 1

  • Users

    • Define

    • Identify needs

    • Define Thesaurus range & depth

  • Raw vocabulary building

    • Identify sources

    • Collect and record terms


Thesaurus Building - 2

  • Vocabulary organisation

    • Cluster terms

    • Establish relationships using symbols

  • Maintenance


Business application

  • Not long-term collaborative efforts of classification specialists

    • Instead, adapt to business changes

  • Not just descriptions of present business processes

    • Instead, reflect strategic planning, competitors

  • Not necessarily a single taxonomy

    • Instead, multiple overlapping taxonomies


Content management

  • Describe content as it’s being created rather than classify after creation

  • User-needs orientation


Integrating taxonomies

  • Accurate reporting

  • Exchange of data

  • Assist resource discovery

    • Information retrieval


Thesaurus evaluation

  • Qualities

  • Information retrieval evaluation


Thesaurus Qualities

  • Scope and features description

  • Display forms

  • Correctness of hierarchies

  • Use of scope notes, history notes and qualifiers

  • Adherence to standards

  • Syndetic measures

    • Connectedness

    • Accessibility
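
Syndetic measures describe how well the cross-reference structure ties the vocabulary together. The sketch below assumes one common reading of the two measures named above: connectedness as the share of terms that carry at least one cross-reference, and accessibility as the average number of other terms that lead to a term. Both definitions and the toy data (loosely based on the ISO sample, with DIVING left unlinked on purpose) are assumptions made for illustration.

```python
# Toy cross-reference table: term -> terms it refers to (BT, NT or RT alike).
references = {
    "CAMERAS": {"OPTICAL EQUIPMENT", "PHOTOGRAPHY"},
    "OPTICAL EQUIPMENT": {"CAMERAS"},
    "PHOTOGRAPHY": {"CAMERAS"},
    "DIVING": set(),
}


def connectedness(references):
    """Share of terms that have at least one cross-reference to another term."""
    if not references:
        return 0.0
    linked = sum(1 for targets in references.values() if targets)
    return linked / len(references)


def accessibility(references):
    """Average number of other terms whose entries lead to a given term."""
    if not references:
        return 0.0
    incoming = {term: 0 for term in references}
    for term, targets in references.items():
        for target in targets:
            if target in incoming and target != term:
                incoming[target] += 1
    return sum(incoming.values()) / len(references)


print(connectedness(references), accessibility(references))
```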


Thesauri & Retrieval evaluation

  • Cranfield experiments & since

  • Recall and precision

  • Influence on indexing

    • Conceptual analysis

    • Translation failure

    • Omissions

    • Exhaustivity/Specificity

    • Syntax and ‘false drops’

  • Maintenance costs
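
Recall and precision are computed from two sets: the documents a search retrieved and the documents judged relevant to the request. Precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. A minimal sketch with hypothetical document numbers:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search against one set of relevance judgements."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search result and relevance judgements.
retrieved = {1, 2, 3, 4}
relevant = {2, 4, 5, 6, 7}

print(precision_recall(retrieved, relevant))
print(sorted(retrieved - relevant))  # retrieved but not relevant: the false drops
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the five relevant documents were found (recall 0.4).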


Post-controlled vocabularies

  • Use of a ‘Hedge’ of terms to represent a broad concept, eg:

    • ‘psychological aspects of..........’

    • ‘........in Australia’

    • ‘....review items on.....’
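
In practice a hedge can be stored as a named set of natural-language terms that are OR-ed together at search time, so the searcher asks for the broad concept once rather than listing every variant. The sketch below assumes a hypothetical hedge for '........in Australia' built from a few place names. The names chosen and the single-word matching are simplifications.

```python
import re

# Hypothetical hedge: broad concept -> free-text words that stand for it.
HEDGES = {
    "in australia": {"australia", "australian", "queensland", "brisbane", "sydney"},
}


def matches_hedge(text, hedge_name):
    """True if any word of the text belongs to the named hedge (Boolean OR)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & HEDGES[hedge_name])


print(matches_hedge("Skills expectations of Queensland library graduates", "in australia"))
print(matches_hedge("Library education in Belgium", "in australia"))
```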


Still to come ……

Research areas

  • Metathesauri

    • Super – interlinked vocabularies (e.g. NLM)

  • Semantic Web

    • Enhancing word association with usage statistics like links (e.g. THESUS)


Review

  • Controlled vocabulary types

  • Software support

  • Business processes

  • Website

    • http://sky.fit.qut.edu.au/~middletm/cont_voc.html

    • (about to move to database driven site – redirection will be applied)


