Tamas Doszkocs, Ph.D., Computer Scientist, doszkocs@nlm.nih (PowerPoint presentation)

Controlled Vocabularies in Searching

Controlled Vocabularies in Searching. Tamas Doszkocs, Ph.D., Computer Scientist, doszkocs@nlm.nih.gov. Topics: Definition; Purpose and Role; A Brief History; Who is in Control?; Spell Checkers; Folksonomies; Tagging; Search Focus; Search Refinement; Web X.Y.

Presentation Transcript

Controlled Vocabularies in Searching

Tamas Doszkocs, Ph.D., Computer Scientist, doszkocs@nlm.nih.gov

Controlled Vocabularies

  • Definition

  • Purpose and Role

  • A Brief History

  • Who is in Control?

  • Spell Checkers

  • Folksonomies

  • Tagging

  • Search Focus

  • Search Refinement

  • Web X.Y

Related Topics (that we won’t talk about)

Definition and Purpose

  • A controlled vocabulary is a list of terms that have been explicitly enumerated.

  • In library and information science, a controlled vocabulary is a carefully selected list of words and phrases that are used to tag units of information so that they can be retrieved more easily by a search. The terms are chosen and organized by trained professionals (including librarians and information scientists) who possess expertise in the subject area. Controlled vocabulary terms can accurately describe what a given document is actually about, even if the terms themselves do not occur within the document's text. Fully developed controlled vocabulary systems, such as the Library of Congress Subject Headings, are often published in a reference work called a thesaurus. Controlled vocabularies form part of a larger universe of nomenclatural approaches to data classification called metadata. (Wikipedia)
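
The key point is that assigned terms can retrieve a document even when those words never appear in its text. The minimal Python sketch below illustrates this. The documents, the descriptors, and the helper names are hypothetical. Real indexing, such as MEDLINE records indexed with MeSH, is far richer.

```python
from collections import defaultdict

# Hypothetical records: free text plus descriptors assigned by a human indexer.
DOCUMENTS = {
    "doc1": {"text": "Aspirin lowers the risk of heart attack.",
             "descriptors": {"Myocardial Infarction", "Aspirin"}},
    "doc2": {"text": "Outcomes after cardiac arrest in older adults.",
             "descriptors": {"Heart Arrest", "Aged"}},
    "doc3": {"text": "A trial of acetylsalicylic acid in stroke.",
             "descriptors": {"Stroke", "Aspirin"}},
}

def build_descriptor_index(docs):
    """Invert the descriptor assignments: descriptor -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for descriptor in doc["descriptors"]:
            index[descriptor].add(doc_id)
    return index

def search_by_descriptor(index, descriptor):
    """Return the ids of documents indexed with the descriptor, sorted."""
    return sorted(index.get(descriptor, set()))

def search_free_text(docs, word):
    """Naive case-insensitive substring match on the text, for comparison."""
    return sorted(d for d, doc in docs.items() if word.lower() in doc["text"].lower())

index = build_descriptor_index(DOCUMENTS)
print(search_by_descriptor(index, "Aspirin"))   # ['doc1', 'doc3']
print(search_free_text(DOCUMENTS, "aspirin"))   # ['doc1']
```

The free-text search misses doc3, which says "acetylsalicylic acid". The descriptor search finds it because an indexer assigned "Aspirin".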

More Information

  • Bridging the gap between languages used by authors, search systems and users:

  • http://sky.fit.qut.edu.au/~middletm/cont_voc.html

  • http://www.controlledvocabulary.com/

  • http://php.iupui.edu/~kcmcreyn/su03/control.html

  • http://www.hsl.creighton.edu/hsl/Searching/c-vocab1.html

  • http://www.dlese.org/Metadata/vocabularies/term_expln.htm

A Brief History

  • The 1970s and 1980s: bloody battles and casualties

    • Controlled vocabularies vs. natural language

    • Command languages vs. free-form queries

    • CVs vs. abstracts vs. full text

    • Librarians vs. end users

  • The 1990s and the Web: natural language for the masses

  • The 21st Century: the best of both worlds

Vocabulary Control for Information Retrieval, 1972

  • by F. Wilfrid Lancaster

  • About this title. Contents:

    • Why Vocabulary Control?

    • Pre-coordinate & Post-coordinate Systems

    • Vocabulary Structure & Display

    • Gathering the Raw Material

    • Standards & Guidelines

    • Organization of Terms: The Hierarchical Relationship

    • Organization of Terms: The Associative Relationship

    • Terms: Form & Compounding

    • The Entry Vocabulary

    • Homography & Scope Notes

    • Thesaurus Display

    • Vocabulary Growth Updating

    • The Role of the Computer

    • Identifiers & Checklists

    • The Influences of Vocabulary on the Performance of a Retrieval System

    • Evaluation of Thesauri

    • Natural-language Searching & the Post-controlled Vocabulary

    • Hybrid Systems

    • Compatibility & Convertibility

    • Multilingual Aspects

    • Automatic Approaches to Thesaurus Construction

    • Some Cost-effectiveness Aspects of Vocabulary Control

    • Bibliography

    • Index

  • "The publisher's announcement claims that the original edition is an information science classic that has emerged as the 'bible' of indexing & retrieval vocabularies, & (is the) first definitive monograph devoted exclusively to controlled vocabularies in information retrieval. .."

An Associative Interactive Dictionary for Online Searching, 1978

  • Title: AID, an Associative Interactive Dictionary for Online Searching.

  • Authors: Doszkocs, Tamas E.

  • Descriptors: Dictionaries; Information Retrieval; Online Systems; Search Strategies; Tables (Data); Word Frequency

  • Source: On-Line Review, v2 n2 p163-73, Jun 1978

  • AID meta-searched MEDLINE, TOXLINE and the Hepatitis Databank and displayed result clusters of keywords and MeSH headings
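
The slide does not give AID's formula. As a rough illustration of an associative dictionary, the sketch below ranks terms by how much more common they are in the retrieved set than in the whole collection. The ratio score, the `min_count` cutoff, and the toy data are assumptions for illustration. They are not AID's actual method.

```python
from collections import Counter

def doc_freq(docs):
    """Count how many documents contain each (lower-cased, whitespace-split) term."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return df

def associated_terms(retrieved, collection, top_n=3, min_count=2, exclude=()):
    """Rank terms that are over-represented in the retrieved set.

    Score = (fraction of retrieved docs containing the term)
            / (fraction of all collection docs containing the term).
    This ratio is an illustrative assumption, not AID's published formula.
    Assumes `retrieved` is a subset of `collection`.
    """
    df_ret = doc_freq(retrieved)
    df_all = doc_freq(collection)
    n_ret, n_all = len(retrieved), len(collection)
    scores = {
        term: (count / n_ret) / (df_all[term] / n_all)
        for term, count in df_ret.items()
        if count >= min_count and term not in exclude
    }
    # Highest score first; alphabetical order breaks ties.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]

# Hypothetical toy collection; the first three documents match a "hepatitis" search.
collection = [
    "hepatitis vaccine liver",
    "hepatitis liver cirrhosis",
    "hepatitis vaccine children",
    "influenza vaccine trial",
    "liver transplant outcomes",
    "influenza virus season",
    "measles vaccine children",
]
retrieved = collection[:3]
print(associated_terms(retrieved, collection, exclude={"hepatitis"}))  # ['liver', 'vaccine']
```

"liver" ranks above "vaccine" even though both occur in two retrieved documents. This is because "vaccine" is also common elsewhere in the collection.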

CITE, 1979

  • Doszkocs T. E., Rapp B. A. Searching Medline in English: A prototype user interface with natural language query, ranked output and relevance feedback. Proc. ASIS Annu. Meet., vol. 16, pp. 131-137, 1979.

  • Automatic suggestion of Medical Subject Headings

  • Used as NLM’s OPAC 1979-1984
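
The citation describes natural-language queries, ranked output, and relevance feedback, but the slide gives no formulas. The sketch below therefore uses common textbook substitutes:

- IDF-weighted term matching for ranking.
- A simplified Rocchio update, with α = 1 and no negative-feedback term, that adds terms from documents the user marks relevant.

Both the weighting and the feedback rule are assumptions, not CITE's documented method.

```python
import math
from collections import Counter

def idf_weights(docs):
    """Inverse document frequency: log(N / df) for each term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc.lower().split()))
    return {term: math.log(n / count) for term, count in df.items()}

def rank(query_weights, docs, idf):
    """Score each doc by summing query weight * idf over the query terms it contains."""
    scored = []
    for i, doc in enumerate(docs):
        terms = set(doc.lower().split())
        score = sum(w * idf.get(t, 0.0) for t, w in query_weights.items() if t in terms)
        scored.append((score, i))
    # Highest score first; ties keep document order. Non-matching docs are dropped.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored if score > 0]

def rocchio_expand(query_weights, relevant_docs, beta=0.5):
    """Add beta * (mean term frequency over relevant docs) to each term's weight."""
    new_weights = Counter(query_weights)
    for doc in relevant_docs:
        for term, tf in Counter(doc.lower().split()).items():
            new_weights[term] += beta * tf / len(relevant_docs)
    return dict(new_weights)

# Hypothetical mini-collection.
docs = [
    "aspirin prevents heart attack",
    "heart failure drug therapy",
    "aspirin stroke prevention trial",
    "stroke rehabilitation exercise therapy",
]
idf = idf_weights(docs)
print(rank({"aspirin": 1.0}, docs, idf))          # [0, 2]
expanded = rocchio_expand({"aspirin": 1.0}, [docs[2]])
print(rank(expanded, docs, idf))                  # [2, 0, 3]
```

Marking document 2 as relevant pulls in "stroke", "prevention", and "trial". As a result, the stroke rehabilitation record (3) now appears in the ranking.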

WebLine, 1994

  • The first Web interface to an online retrieval system

    • Associative Concept Navigation in MEDLINE and other NLM Databases via a Mosaic - Forms - WWW Interface Combining Natural Language Processing, Expert Systems and (un)Conventional Information Retrieval Techniques. Tamas E. Doszkocs, Seth B. Widoff, Bruno M. Vasta, National Library of Medicine. In Proceedings of the Second World Wide Web Conference, Chicago, 1994.

    • http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/Searching/doszkocs/doszkocs.html

  • See also WebCrawler (Brian Pinkerton)

  • The Open Web and the Hidden Web

Jerry’s Guide to the Web, 1994

  • Jerry Yang and David Filo’s Yahoo!, 1995

    • a directory of web sites, organized in a hierarchy of subject descriptors

    • Librarians at Yahoo

      • Surfing is to Yahoo! what the Dewey Decimal System is to libraries. In other words, Surfing is the categorization of websites. It also happens to be how Yahoo! began. Today our Surfing team continues its passion for finding, evaluating, and organizing information on the Internet. They have a voracious appetite for learning about new topics. They are curious individuals who are skilled at intuitively and efficiently analyzing and classifying diverse, unstructured pieces of information across the Yahoo! network. Surfers are critical to the relevance and intuitive nature of information presented on Yahoo!. (Source: http://careers.yahoo.com/job_descriptions.html)
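
A directory "organized in a hierarchy of subject descriptors" is essentially a tree of categories with sites filed at its nodes. The minimal sketch below shows such a tree. The class, its methods, the category names, and the example.org URLs are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Category:
    name: str
    children: dict = field(default_factory=dict)  # child name -> Category
    sites: list = field(default_factory=list)     # sites filed directly here

    def add_site(self, path, url):
        """File a site under a path like 'Health/Medicine', creating nodes as needed."""
        node = self
        for part in path.split("/"):
            if part not in node.children:
                node.children[part] = Category(part)
            node = node.children[part]
        node.sites.append(url)

    def browse(self, path):
        """Return the category at a path; raises KeyError if it does not exist."""
        node = self
        for part in path.split("/"):
            node = node.children[part]
        return node

    def all_sites(self):
        """Collect sites from this category and all subcategories, depth-first."""
        found = list(self.sites)
        for child in self.children.values():
            found.extend(child.all_sites())
        return found

root = Category("Top")
root.add_site("Health/Medicine/Oncology", "https://example.org/cancer")
root.add_site("Health/Medicine", "https://example.org/medlib")
root.add_site("Health/Nutrition", "https://example.org/food")
print(root.browse("Health/Medicine").all_sites())
# ['https://example.org/medlib', 'https://example.org/cancer']
```

Browsing a category lists its own sites before those of its subcategories. This mirrors how a user drills down from broad to narrow subjects.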

The Remains of the Yahoo Directory

Open Directory Project

Transparent Query Mapping to Controlled Vocabulary Terms
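
In transparent query mapping, the user types ordinary words and the system quietly adds the matching controlled vocabulary terms. PubMed's Automatic Term Mapping to MeSH is one well-known real example, though the slide does not say which system it shows. The sketch below uses a hypothetical entry-term table and a longest-match scan. The table contents and the `[cv]` output tag are invented placeholders, not real search syntax.

```python
# Hypothetical entry vocabulary: lower-case user phrases -> preferred CV heading.
ENTRY_TERMS = {
    "heart attack": "Myocardial Infarction",
    "myocardial infarction": "Myocardial Infarction",
    "high blood pressure": "Hypertension",
    "hypertension": "Hypertension",
    "aspirin": "Aspirin",
}
MAX_PHRASE_WORDS = max(len(k.split()) for k in ENTRY_TERMS)

def map_query(query):
    """Scan left to right, preferring the longest entry phrase starting at each word.

    Mapped phrases become '(phrase OR "Heading"[cv])'; other words pass through.
    Parts are joined with AND. The '[cv]' tag is a hypothetical placeholder.
    """
    words = query.lower().split()
    out, i = [], 0
    while i < len(words):
        for n in range(min(MAX_PHRASE_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in ENTRY_TERMS:
                out.append(f'({phrase} OR "{ENTRY_TERMS[phrase]}"[cv])')
                i += n
                break
        else:
            out.append(words[i])  # no entry phrase starts at this word
            i += 1
    return " AND ".join(out)

print(map_query("aspirin heart attack"))
# (aspirin OR "Aspirin"[cv]) AND (heart attack OR "Myocardial Infarction"[cv])
```

The mapping keeps the user's own words alongside the heading. Documents that are not yet indexed can still match, while indexed ones match the heading even when their text says "myocardial infarction".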

Spell Checking as a “Controlled Vocabulary” Application

Correct spelling, correct results
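
A spell checker can treat its dictionary as a controlled vocabulary. It accepts known words and, for unknown ones, suggests the closest known entries. The minimal sketch below uses Levenshtein edit distance. The word list is a hypothetical stand-in for a real dictionary or index vocabulary. Production spellers use much faster candidate generation than comparing against every entry.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if letters match)
            ))
        prev = curr
    return prev[-1]

def suggest(word, vocabulary, max_distance=2):
    """Return [word] if it is known, else known words within max_distance, closest first."""
    word = word.lower()
    if word in vocabulary:
        return [word]
    candidates = sorted((edit_distance(word, v), v) for v in vocabulary)
    return [v for d, v in candidates if d <= max_distance]

# Hypothetical controlled word list.
VOCAB = {"diabetes", "diabetic", "dialysis", "diagnosis"}
print(suggest("diabtes", VOCAB))  # ['diabetes']
```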

Folksonomies and Social Tagging

Tagging in Flickr
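
A folksonomy has no controlled list. Each user attaches free-form tags, and structure emerges only when the tags are aggregated. The minimal sketch below normalizes case and separators, then counts how many items carry each tag, which is the kind of count a tag cloud displays. The photos, the tags, and the normalization rule are hypothetical. The slide does not describe how Flickr itself processes tags.

```python
import re
from collections import Counter

def normalize_tag(tag):
    """Lower-case and drop spaces, hyphens, and underscores ('New York' -> 'newyork')."""
    return re.sub(r"[\s_-]+", "", tag.strip().lower())

def tag_counts(tagged_items):
    """Count how many items carry each normalized tag (repeats within one item count once)."""
    counts = Counter()
    for tags in tagged_items.values():
        counts.update({normalize_tag(t) for t in tags})
    return counts

# Hypothetical user-tagged photos.
photos = {
    "p1": ["New York", "skyline", "night"],
    "p2": ["new-york", "Skyline"],
    "p3": ["newyork", "bridge", "NIGHT"],
}
counts = tag_counts(photos)
print(counts["newyork"], counts["skyline"], counts["bridge"])  # 3 2 1
```

Without the normalization step, "New York", "new-york", and "newyork" would count as three unrelated tags. This is a small-scale example of the vocabulary problem that controlled vocabularies address.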

Query Refinement with Phrases

Query Refinement with Subject Headings
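
One common way to offer subject-heading refinements works in two steps. First, tally the headings assigned to the retrieved documents. Then offer the most frequent ones as filters to add to the query. The sketch below assumes each result record carries a list of headings. The records and headings are hypothetical, and this is not necessarily how the system on the slide works.

```python
from collections import Counter

def suggest_refinements(results, query_headings=(), top_n=3):
    """Return (heading, count) pairs for the most common headings in the results.

    Headings already in the query are skipped; ties are broken alphabetically.
    """
    counts = Counter(h for r in results for h in r["headings"] if h not in query_headings)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

def refine(results, heading):
    """Keep only the results indexed with the chosen heading (an implicit AND)."""
    return [r for r in results if heading in r["headings"]]

# Hypothetical result set for a search on "Asthma".
results = [
    {"id": 1, "headings": ["Asthma", "Child", "Air Pollution"]},
    {"id": 2, "headings": ["Asthma", "Adult", "Smoking"]},
    {"id": 3, "headings": ["Asthma", "Child", "Allergens"]},
    {"id": 4, "headings": ["Asthma", "Air Pollution", "Child"]},
]
print(suggest_refinements(results, query_headings={"Asthma"}))
# [('Child', 3), ('Air Pollution', 2), ('Adult', 1)]
print([r["id"] for r in refine(results, "Air Pollution")])  # [1, 4]
```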

Focusing in Search Results with Topical Clusters

Clustering of Search Results with Phrases
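
Phrase-based clustering groups results that share a multi-word phrase and uses the phrase as the cluster label. The sketch below is a simplified stand-in, loosely in the spirit of suffix tree clustering, with these assumed limits:

- Only two-word phrases are considered.
- A small hypothetical stopword list is applied.
- Each cluster needs a minimum number of documents.
- A document may appear in several clusters.

The titles and thresholds are also hypothetical.

```python
from collections import defaultdict

STOPWORDS = {"a", "an", "and", "in", "of", "the", "for", "with"}

def bigrams(text):
    """Adjacent word pairs that contain no stopword."""
    words = text.lower().split()
    return {f"{a} {b}" for a, b in zip(words, words[1:])
            if a not in STOPWORDS and b not in STOPWORDS}

def phrase_clusters(titles, min_docs=2):
    """Map each shared two-word phrase to the indices of the titles containing it."""
    clusters = defaultdict(list)
    for i, title in enumerate(titles):
        for phrase in bigrams(title):
            clusters[phrase].append(i)
    # Keep phrases shared by enough documents; largest clusters first, then alphabetical.
    kept = {p: ids for p, ids in clusters.items() if len(ids) >= min_docs}
    return dict(sorted(kept.items(), key=lambda kv: (-len(kv[1]), kv[0])))

# Hypothetical result titles.
titles = [
    "Gene therapy for cystic fibrosis",
    "Cystic fibrosis in adults",
    "Gene therapy trials in cancer",
    "Lung function in cystic fibrosis",
]
print(phrase_clusters(titles))
# {'cystic fibrosis': [0, 1, 3], 'gene therapy': [0, 2]}
```

Document 0 lands in both clusters. Overlapping clusters are a common choice for search results, because a single record can be about several topics at once.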

Clustering and Search Refinement with Natural Language and Controlled Vocabularies

Clustering with Multiple Criteria

Analyzing Search Results

Visualizing Search Results

Multi-faceted Clustering in an OPAC
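
A multi-faceted display groups the same result set along several independent facets at once, such as subject, format, and year. Each selection narrows the counts shown for all the other facets. The minimal sketch below uses hypothetical catalog records and facet names.

```python
from collections import Counter

def facet_counts(records, facets):
    """For each facet name, count how many records have each value."""
    return {f: Counter(r[f] for r in records if f in r) for f in facets}

def drill_down(records, **selected):
    """Keep records that match every selected facet value."""
    return [r for r in records if all(r.get(f) == v for f, v in selected.items())]

# Hypothetical catalog records.
records = [
    {"title": "A", "subject": "Genetics", "format": "Book", "year": 2005},
    {"title": "B", "subject": "Genetics", "format": "Journal", "year": 2006},
    {"title": "C", "subject": "Oncology", "format": "Book", "year": 2006},
    {"title": "D", "subject": "Genetics", "format": "Book", "year": 2006},
]
print(facet_counts(records, ["subject", "format"]))
books = drill_down(records, format="Book")
print(facet_counts(books, ["subject", "year"]))
print([r["title"] for r in drill_down(records, subject="Genetics", year=2006)])  # ['B', 'D']
```

After the user picks "Book", the subject and year counts are recomputed over the narrowed set only. This is what lets each facet refine the others.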

AllPlus Web 2.0 Content Mashup

AllPlus Dynamic Cluster Visualization
