Taxonomies and Indexing: A Technical Strategy

Diane Vizine-Goetz, Office of Research, OCLC Online Computer Library Center, Inc.

Presentation Transcript


  1. Taxonomies and Indexing: A Technical Strategy Diane Vizine-Goetz Office of Research OCLC Online Computer Library Center, Inc.

  2. Context • Techniques and approaches developed by & for libraries and other institutions responsible for preserving the human record • Broad scope • Long tradition of information organization

  3. Why organize information?
     • For
       • Search and retrieval
       • Use
       • Preservation & disposition

  4. Why Organize Information by Subject?
     • Find information on a particular subject
       • Only and all relevant information
         • precision
         • recall
     • Find related information
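
The two measures named on this slide can be made concrete with a small calculation. The sketch below uses the standard definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant). The function name and the document IDs are hypothetical and are not part of the presentation.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: IDs the search returned
    relevant:  IDs a human judged to be on the subject
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents returned, 3 truly relevant, 2 in common.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

"Only relevant information" corresponds to high precision, and "all relevant information" to high recall.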

  5. How?
     • Subject analysis
       • Conceptual analysis--Determining what an information object is “about”
       • Translate concepts into knowledge organization (KO) scheme
         • e.g., Subject indexes
         • Thesauri
         • Classification scheme
     • Automated, Semi-automated, Human/Intellectual

  6. Automation & Subject Analysis

  7. Automated Concept Identification
     • Automated Indexing
       • Ranges from simply identifying words in a document, to
       • Sophisticated analyses that identify key names, words, and phrases
       • WordSmith Project http://orc.rsch.oclc.org:5061/
     • Automated Classification
       • Automated assignment of documents to categories or classes
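
As a rough illustration of the simple end of that range, the sketch below counts repeated words and short phrases in a text. It is not the WordSmith method, which the slides do not describe. The stopword list, the thresholds, and the sample text are assumptions chosen only for the example.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list for the example.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "for", "is", "by"}


def candidate_phrases(text, max_len=3):
    """Yield word n-grams (1..max_len words) that do not start or end
    with a stopword. Sentence boundaries are ignored for simplicity."""
    words = re.findall(r"[a-z]+", text.lower())
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)


def index_terms(text, min_count=2):
    """Keep candidate phrases that occur at least min_count times."""
    counts = Counter(candidate_phrases(text))
    return {phrase: c for phrase, c in counts.items() if c >= min_count}


sample = ("The federal reserve raised rates. Critics of the federal reserve "
          "said the federal government should act.")
terms = index_terms(sample)
print(terms)
```

Frequency counting of this kind is cheap and picks up new terminology, but it does nothing to merge variants such as "programmes" and "programs".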

  8. Political News Concepts Extracted by WordSmith
     • fair housing
     • fair housing act
     • family planning
     • family planning programmes
     • family planning programs
     • family planning services
     • federal government
     • federal government deficit
     • federal reserve
     • federal reserve bank
     • federal reserve board
     • federal reserve chairman alan greenspan
     • federal reserve system

  9. Advantages of automatic concept identification • Inexpensive • Suitable for indexing/categorizing large quantities of text • Can identify popular and emerging concepts and terminology

  10. Why use knowledge organization schemes? • Knowledge organization schemes such as subject heading lists, thesauri, & classification schemes are specialized languages designed for retrieving information • Goal--to reduce ambiguities that cause precision & recall failures

  11. WordSmith
      • family planning
      • family planning programmes
      • family planning programs
      • family planning services
      Library of Congress Subject Headings (LCSH)
      Birth control clinics
        UF Family planning services
           Planned parenthood services
        BT Clinics
        19860211
      Free text vs. controlled subject retrieval language

  12. Family Planning
        Note: Programs or services designed to assist the family in controlling reproduction by either improving or diminishing fertility.
        Entry Term
           Birth Control
           Planned Parenthood
           Basal Body Temperature Method
           Birth Limiting
           Births Averted
           Family Planning Surveys
           ...
      Birth control (19880919)
        UF Family planning
           Planned parenthood
           Population control
           Pregnancy--Prevention
        BT Hygiene, Sexual
           Sexual ethics
        RT Contraception
           Family size
        NT Abortion
           Birth Intervals
           Childlessness
           ...
      MeSH Heading vs. LCSH

  13. Characteristics of subject retrieval languages • Terminology is often domain specific • Medicine > MeSH; Engineering > INSPEC; Agriculture > Agrovoc • Control vocabulary (synonyms & homonyms) • Express relationships between terms

  14. Ei Thesaurus™
      Bank protection
        UF Coastal engineering--Bank protection
           Inland waterways--Bank protection
        SN Protection of river banks and lake shores. For seacoasts, use SHORE PROTECTION
        DT January 1993
        BT Protection
        RT Banks (bodies of water)
           Coastal engineering
           Environmental engineering
           Erosion
           Inland waterways
           River control
           Shore protection
           Slope protection
           Soil conservation
        MC 407.2; 407.3
        OC 914.1
      Within a domain, terms are context independent

  15. Controlled Vocabulary
      • Preferred way of expressing a concept
        • e.g., Popular vs. technical
          • Heart attack vs. Myocardial infarction
      • Non-used vocabulary often included
        • Synonyms
          • Current/Outdated terms > Disabled/Handicapped
        • Lexical variants
          • Phrase/Inverted forms > Bilingual education/Education, Bilingual
        • Quasi-Synonyms
          • Synonyms/Antonyms > Literacy/Illiteracy
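
One way to picture vocabulary control is as a lookup from every entry (non-preferred) term to the single preferred heading. The sketch below is a minimal illustration, not how any library system is actually implemented. The headings and UF references are taken from the LCSH examples on slides 11 and 12, while the function and variable names are hypothetical.

```python
# Preferred heading -> its "used for" (UF) entry terms, from slides 11 and 12.
USED_FOR = {
    "Birth control clinics": ["Family planning services",
                              "Planned parenthood services"],
    "Birth control": ["Family planning", "Planned parenthood",
                      "Population control", "Pregnancy--Prevention"],
}


def build_entry_index(used_for):
    """Invert the UF lists so any variant leads to its preferred heading."""
    index = {}
    for preferred, variants in used_for.items():
        index[preferred.casefold()] = preferred
        for variant in variants:
            index[variant.casefold()] = preferred
    return index


ENTRY_INDEX = build_entry_index(USED_FOR)


def preferred_term(query):
    """Return the preferred heading for a query term, or None if the
    term is not in the vocabulary. Matching ignores case and extra spaces."""
    key = " ".join(query.split()).casefold()
    return ENTRY_INDEX.get(key)


print(preferred_term("family planning services"))  # Birth control clinics
print(preferred_term("Planned  Parenthood"))       # Birth control
print(preferred_term("road rage"))                 # None
```

A searcher who types any of the entry terms is led to the one heading under which the material has been gathered.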

  16. Relationships
      • Equivalence
        • Synonymous terms
      • Hierarchy
        • Generic relationship (kind)
        • Whole-part relationship
        • Instance relationship (example)
      • Association
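
The hierarchical and associative relationships can be sketched as a small data structure in which each link is stored together with its reciprocal (BT/NT) or symmetrically (RT). The class and method names are hypothetical, and the sample terms come from the Ei Thesaurus record on slide 14. This is an illustrative sketch under those assumptions, not a description of any production thesaurus software.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal store for broader (BT), narrower (NT) and related (RT) terms."""

    def __init__(self):
        self.bt = defaultdict(set)  # term -> broader terms
        self.nt = defaultdict(set)  # term -> narrower terms
        self.rt = defaultdict(set)  # term -> related terms

    def add_broader(self, term, broader):
        # Hierarchy is reciprocal: a BT link implies the inverse NT link.
        self.bt[term].add(broader)
        self.nt[broader].add(term)

    def add_related(self, a, b):
        # Association is symmetric.
        self.rt[a].add(b)
        self.rt[b].add(a)

    def ancestors(self, term):
        """All terms reachable by following BT links upward."""
        seen, stack = set(), [term]
        while stack:
            for parent in self.bt.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


t = Thesaurus()
t.add_broader("Bank protection", "Protection")
t.add_related("Bank protection", "Shore protection")
t.add_related("Bank protection", "Erosion")
print(sorted(t.nt["Protection"]))  # ['Bank protection']
print(sorted(t.rt["Erosion"]))     # ['Bank protection']
```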

  17. Subject Retrieval using a controlled vocabulary

  18. Related Terms in LCSH

  19. Classification / Categorization System • A systematic arrangement of knowledge into useful categories • General schemes & special schemes • DDC, LCC, UDC & AGRIS, MSC • Present a generalized view of knowledge at varying levels of depth • May be enumerative or synthetic

  20. Some Advantages of Traditional Schemes • Meaningful notation • Well-developed hierarchies • Well-defined categories • Rich network of relationships

  21. Meaningful Notation (DDC)
      005.1 Programming
      005.1 Programmation
      005.1 Программирование
      005.1 Programación

  22. DDC Notation Indicates Hierarchy
      600 Technology
        630 Agriculture
          633 Field and plantation crops
            633.1 Cereals
              633.11 Wheat
              633.12 Buckwheat
              633.13 Oats
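
Because the notation carries the hierarchy, broader classes can be derived from a number by shortening it. The sketch below assumes a strictly regular notation of the kind shown on this slide (three digits, optionally followed by decimals). Real DDC numbers do not always follow this pattern, so treat it as an illustration only. The function name is hypothetical.

```python
def ddc_ancestors(notation):
    """Return the broader class numbers implied by a DDC-style number,
    nearest first, assuming the notation is strictly hierarchical."""
    whole, _, decimals = notation.partition(".")
    if len(whole) != 3 or not (whole + decimals).isdigit():
        raise ValueError(f"not a DDC-style number: {notation!r}")

    chain = []
    # Step 1: drop decimal digits one at a time (633.11 -> 633.1 -> 633).
    while decimals:
        decimals = decimals[:-1]
        chain.append(f"{whole}.{decimals}" if decimals else whole)

    # Step 2: zero out the three-digit base from the right (633 -> 630 -> 600).
    digits = list(whole)
    for pos in (2, 1):
        if digits[pos] != "0":
            digits[pos] = "0"
            chain.append("".join(digits))
    return chain


print(ddc_ancestors("633.11"))  # ['633.1', '633', '630', '600']
```

A search on 633.1 can therefore be widened to 633 or 630, or narrowed to 633.11, without consulting the captions.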

  23. Well-developed Hierarchies

  24. Hierarchies & Categories • Hierarchical from general to specific • Categories have superordinate, coordinate, subordinate relationships in hierarchy • Subcategories must be mutually exclusive

  25. Hierarchies & Categories • Top > Recreation > Automotive > Driving > Road Rage • Social Problems > Public Safety > Traffic Hazards > Highways > Road Rage

  26. Hierarchies, Categories, Relationships
      500 Science
        510 Mathematics
          512 Algebra, number theory
            512.3 Fields
              Class here field theory, Galois theory
              Class linear algebra in 512.5; class number theory in 512.7

  27. Advantages of Category Schemes • Facilitate retrieval based on concepts not simply keywords • Provide context for search terms (disambiguates) • Facilitate browsing & search refinement

  28. Advantages & Disadvantages of Formal KO Schemes
      + Bring like items together
      + Provide context & show relationships
      + Support browsing
      + May accommodate multilingual usage
      - Reactive to emerging topics
      - Terminology may not match the terms users search with
      - Not practical to apply to everything

  29. Advantages & Disadvantages of Free Text
      + Latest terminology
      + Application not an issue
      - User must supply synonyms and relationships
      - Limited browsing
      - Little multilingual support

  30. Other Solutions • Combine approaches • Map among KO schemes • Map free text terms to KO schemes • Produce supplemental browsable indexes from free text

  31. Resources
      • ANSI/NISO Z39.19-1993 (Revision of ANSI Z39.19-1980) Guidelines for the Construction, Format, and Management of Monolingual Thesauri <http://www.niso.org/stantech.html#z3919>
      • Controlled vocabularies, thesauri and classification systems available in the WWW. DC Subject <http://www.lub.lu.se/metadata/subject-help.html>
      • The Intellectual Foundation of Information Organization by Elaine Svenonius. MIT Press; ISBN: 0262194333
      • List of Web Subject Resources <http://www.loc.gov/catdir/pcc/saco/resources.html>
      • The Organization of Information (Library and Information Science Text Series) by Arlene G. Taylor. Libraries Unlimited; ISBN: 1563084988
      • Resources for Indexers <http://www.asindexing.org/asires.shtml>
