Tamas Doszkocs, Ph.D. Computer Scientist doszkocs@nlm.nih

Controlled Vocabularies in Searching. Tamas Doszkocs, Ph.D. Computer Scientist doszkocs@nlm.nih.gov. Definition Purpose and Role A Brief History Who is in Control? Spell Checkers. Folksonomies Tagging Search Focus Search refinement Web X.Y. Controlled Vocabularies.


Presentation Transcript


  1. Controlled Vocabularies in Searching Tamas Doszkocs, Ph.D. Computer Scientist doszkocs@nlm.nih.gov

  2. Definition • Purpose and Role • A Brief History • Who is in Control? • Spell Checkers • Folksonomies • Tagging • Search Focus • Search Refinement • Web X.Y • Controlled Vocabularies

  3. Related Topics (that we won’t talk about)

  4. Definition and Purpose • A controlled vocabulary is a list of terms that have been enumerated explicitly. • In library and information science, a controlled vocabulary is a carefully selected list of words and phrases, which are used to tag units of information so that they may be more easily retrieved by a search. The terms are chosen and organized by trained professionals (including librarians and information scientists) who possess expertise in the subject area. Controlled vocabulary terms can accurately describe what a given document is actually about, even if the terms themselves do not occur within the document's text. Fully developed controlled vocabulary systems, such as the Library of Congress Subject Headings, are often published in a reference work that is called a thesaurus. Controlled vocabularies form part of a larger universe of nomenclatural approaches to data classification called metadata. (Wikipedia)
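
The definition above can be illustrated with a minimal Python sketch. It is not taken from the presentation: the vocabulary entries, the sample documents, and the names `VOCABULARY`, `ENTRY_INDEX`, `DOCUMENTS`, and `search` are all hypothetical. It assumes each preferred term has a few entry terms (synonyms), and that an indexer has already tagged each document with preferred terms that need not appear in its text.

```python
# Hypothetical mini-vocabulary: preferred term -> entry terms (synonyms).
VOCABULARY = {
    "Myocardial Infarction": ["heart attack", "cardiac infarction"],
    "Neoplasms": ["cancer", "tumor", "tumour"],
}

# Reverse lookup: any known wording (lowercased) -> preferred term.
ENTRY_INDEX = {}
for preferred, entries in VOCABULARY.items():
    ENTRY_INDEX[preferred.lower()] = preferred
    for entry in entries:
        ENTRY_INDEX[entry.lower()] = preferred

# Hypothetical documents. The tags are assigned by an indexer and
# do not occur anywhere in the document text.
DOCUMENTS = [
    {"id": 1, "text": "Outcomes after acute coronary occlusion.",
     "tags": {"Myocardial Infarction"}},
    {"id": 2, "text": "Screening programs for malignant growths.",
     "tags": {"Neoplasms"}},
]


def search(query):
    """Map the query to a preferred term, then match on assigned tags."""
    preferred = ENTRY_INDEX.get(query.strip().lower())
    if preferred is None:
        return []  # the query wording is not in the vocabulary
    return [doc["id"] for doc in DOCUMENTS if preferred in doc["tags"]]
```

In this sketch a query such as "heart attack" retrieves document 1 even though neither the query wording nor the preferred term occurs in that document's text, which is the retrieval benefit the definition describes.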

  5. More Information • Bridging the gap between languages used by authors, search systems and users: • http://sky.fit.qut.edu.au/~middletm/cont_voc.html • http://www.controlledvocabulary.com/ • http://php.iupui.edu/~kcmcreyn/su03/control.html • http://www.hsl.creighton.edu/hsl/Searching/c-vocab1.html • http://www.dlese.org/Metadata/vocabularies/term_expln.htm

  6. A Brief History • The 1970s and 1980s: bloody battles and casualties • Controlled vocabularies vs. natural language • Command languages vs. free-form queries • CVs vs. abstracts vs. full text • Librarians vs. end users • The 1990s and the Web: natural language for the masses • The 21st Century: the best of both worlds

  7. Vocabulary Control for Information Retrieval, 1972 • by F. Wilfrid Lancaster • About this title: Contents: Why Vocabulary Control? * Pre-coordinate & Post-coordinate Systems * Vocabulary Structure & Display * Gathering the Raw Material * Standards & Guidelines * Organization of Terms: The Hierarchical Relationship * Organization of Terms: The Associative Relationship * Terms: Form & Compounding * The Entry Vocabulary * Homography & Scope Notes * Thesaurus Display * Vocabulary Growth Updating * The Role of the Computer * Identifiers & Checklists * The Influences of Vocabulary on the Performance of a Retrieval System * Evaluation of Thesauri * Natural-language Searching & the Post-controlled Vocabulary * Hybrid Systems * Compatibility & Convertibility * Multilingual Aspects * Automatic Approaches to Thesaurus Construction * Some Cost-effectiveness Aspects of Vocabulary Control * Bibliography * Index. "The publisher's announcement claims that the original edition is an information science classic that has emerged as the 'bible' of indexing & retrieval vocabularies, & (is the) first definitive monograph devoted exclusively to controlled vocabularies in information retrieval. ..

  8. An Associative Interactive Dictionary for Online Searching, 1978 • Title: AID, an Associative Interactive Dictionary for Online Searching. • Authors: Doszkocs, Tamas E. • Descriptors: • Dictionaries - Information Retrieval - Online Systems - Search Strategies - Tables (Data) - Word Frequency • Source: On-Line Review, v2 n2 p163-73, Jun 1978 • AID meta-searched MEDLINE, TOXLINE and the Hepatitis Databank and displayed result clusters of keywords and MeSH headings

  9. CITE, 1979 • Doszkocs T. E., Rapp B. A. Searching Medline in English: A prototype user interface with natural language query, ranked output and relevance feedback. Proc. ASIS Annu. Meet. Vol 16 pp 131-137 1979. • Automatic suggestion of Medical Subject Headings • Used as NLM’s OPAC 1979-1984

  10. WebLine, 1994 • The first Web interface to an online retrieval system • Associative Concept Navigation in MEDLINE and other NLM Databases via a Mosaic - Forms - WWW Interface Combining Natural Language Processing, Expert Systems and (un)Conventional Information Retrieval Techniques; Tamas E. Doszkocs, Seth B. Widoff, Bruno M. Vasta, National Library of Medicine, in Proceedings of the Second World Wide Web Conference, Chicago 1994 • http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/Searching/doszkocs/doszkocs.html • see also WebCrawler (Brian Pinkerton) • The Open Web and the Hidden Web

  11. Jerry’s Guide to the Web, 1994 • Jerry Yang and David Filo’s Yahoo! 1995 • a directory of web sites, organized in a hierarchy of subject descriptors • Librarians at Yahoo • Surfing is to Yahoo! what the Dewey Decimal System is to libraries. In other words, Surfing is the categorization of websites. It also happens to be how Yahoo! began. Today our Surfing team continues its passion for finding, evaluating, and organizing information on the Internet. They have a voracious appetite for learning about new topics. They are curious individuals who are skilled at intuitively and efficiently analyzing and classifying diverse, unstructured pieces of information across the Yahoo! network. Surfers are critical to the relevance and intuitive nature of information presented on Yahoo!. http://careers.yahoo.com/job_descriptions.html

  12. The Remains of the Yahoo Directory

  13. Open Directory Project

  14. Transparent Query Mapping to Controlled Vocabulary Terms

  15. Spell Checking as a “Controlled Vocabulary” Application
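
One way to read this slide title is that a spell checker's word list acts as an explicitly enumerated term list, and a misspelled query word is mapped to the closest listed term. The Python sketch below illustrates that reading only; it is not the presenter's method. The term list `TERMS`, the function name `suggest`, and the 0.8 similarity cutoff are assumptions, and the matching uses the standard library's `difflib.get_close_matches`.

```python
import difflib

# Hypothetical enumerated term list playing the role of the vocabulary.
TERMS = ["hepatitis", "toxicology", "thesaurus", "vocabulary"]


def suggest(word, cutoff=0.8):
    """Return the listed term closest to `word`, or None if nothing is close."""
    word = word.lower()
    if word in TERMS:
        return word  # already a listed term, no correction needed
    # get_close_matches ranks candidates by similarity ratio, best first.
    matches = difflib.get_close_matches(word, TERMS, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With this list, a misspelling such as "hepatitus" would be mapped to "hepatitis", while a word with no near neighbor in the list yields no suggestion. The cutoff is a tuning choice: a higher value gives fewer but safer corrections.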

  16. Correct spelling, correct results

  17. Folksonomies and Social Tagging

  18. Tagging in Flickr

  19. Query Refinement with Phrases

  20. Query Refinement with Subject Headings

  21. Focusing in Search Results with Topical Clusters

  22. Clustering of Search Results with Phrases

  23. Clustering and Search Refinement with Natural Language and Controlled Vocabularies

  24. Clustering with Multiple Criteria

  25. Analyzing Search Results

  26. Visualizing Search results

  27. Multi-faceted Clustering in an OPAC

  28. AllPlus Web 2.0 Content Mashup

  29. AllPlus Dynamic Cluster Visualization

  30. Controlled Vocabularies in Searching Tamas Doszkocs, Ph.D. Computer Scientist doszkocs@nlm.nih.gov
