Taxonomies, Lexicons and Organizing Knowledge

Presentation Transcript


  1. Taxonomies, Lexicons and Organizing Knowledge • Wendi Pohs, IBM Software Group

  2. Agenda • Benefits, business and technical • A few definitions • Planning • Issues • Measuring value • Futures • Q&A

  3. The Mantra • Knowledge is in the eye of the beholder, but reflecting end-user needs is as critical as representing texts ... and it takes work!

  4. Business Benefits • “If only I could find information to help me do my job better ...” • Mergers and acquisitions • Research and development • Industries: Consulting, Pharmaceuticals, Financial services, Legal

  5. Technical Benefits • Site creation • Navigation/search • Personalization • Defining areas of expertise

  6. Definitions: Taxonomy • “The science, laws or principles of classification” (From the Greek: rules of arrangement) • Biology (Linnaeus) • Education (Bloom) • A hierarchical collection of categories and documents • Structure and content
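
The "hierarchical collection of categories and documents" on this slide can be pictured as a tree whose nodes hold both child categories and assigned documents. The following is a minimal sketch, not anything taken from the presentation: the `Category` class, its method names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One taxonomy node: a label, child categories, and assigned documents.

    Hypothetical structure for illustration only.
    """
    name: str
    children: list["Category"] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)

    def add_child(self, name: str) -> "Category":
        """Create a sub-category under this node and return it."""
        child = Category(name)
        self.children.append(child)
        return child

    def paths(self, prefix: tuple[str, ...] = ()):
        """Yield (path, documents) for this category and every descendant."""
        here = prefix + (self.name,)
        yield here, self.documents
        for child in self.children:
            yield from child.paths(here)


# Made-up sample data, loosely following the Linnaean example on the slide.
root = Category("Animalia")
chordata = root.add_child("Chordata")
mammalia = chordata.add_child("Mammalia")
mammalia.documents.append("field-notes-on-bats.txt")

for path, docs in root.paths():
    print(" > ".join(path), docs)
```

Keeping the structure (the tree) separate from the content (the document lists) mirrors the slide's "structure and content" distinction: categories can be edited without touching the documents assigned to them.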

  7. Definitions: Directory • More general than taxonomy • Natural structure • Wide vs deep • Category structure less controlled • File system • Yahoo (http://www.yahoo.com) • Yellow Pages • Corporate Web sites (http://www.ibm.com)

  8. Definitions: Thesaurus • Controlled vocabulary • Subject headings, labels • Synonyms (U, UF) • Relation types (TT, BT, NT, SN, HN, RT, SA) • Examples: http://www.loc.gov/flicc/wg/taxonomy.html
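
The relation codes on this slide follow common thesaurus practice: U/UF (use / used for) link synonyms to a preferred term, BT/NT link broader and narrower terms, RT links related terms, and TT names the top term of a hierarchy. A rough sketch of how those links might be stored follows; the `Thesaurus` class, its methods, and the sample terms are hypothetical, scope and history notes (SN, HN) are omitted, and the BT hierarchy is assumed to contain no cycles.

```python
from collections import defaultdict


class Thesaurus:
    """Hypothetical in-memory controlled vocabulary (illustration only)."""

    def __init__(self):
        self.use = {}                     # U:  non-preferred term -> preferred term
        self.used_for = defaultdict(set)  # UF: preferred term -> non-preferred terms
        self.broader = defaultdict(set)   # BT: term -> broader terms
        self.narrower = defaultdict(set)  # NT: term -> narrower terms
        self.related = defaultdict(set)   # RT: symmetric association

    def add_synonym(self, preferred, variant):
        self.use[variant] = preferred
        self.used_for[preferred].add(variant)

    def add_hierarchy(self, broader, narrower):
        self.broader[narrower].add(broader)
        self.narrower[broader].add(narrower)

    def add_related(self, a, b):
        self.related[a].add(b)
        self.related[b].add(a)

    def preferred(self, term):
        """Resolve a synonym to its preferred term (U); other terms pass through."""
        return self.use.get(term, term)

    def top_terms(self, term):
        """TT: follow BT links upward until no broader term remains."""
        term = self.preferred(term)
        parents = self.broader.get(term)
        if not parents:
            return {term}
        tops = set()
        for parent in parents:
            tops |= self.top_terms(parent)
        return tops


# Made-up sample vocabulary.
t = Thesaurus()
t.add_synonym("Automobiles", "Cars")
t.add_hierarchy("Vehicles", "Automobiles")
t.add_hierarchy("Automobiles", "Sports cars")
t.add_related("Automobiles", "Roads")

print(t.preferred("Cars"))
print(t.top_terms("Sports cars"))
```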

  9. Definitions: Meta-data and tagging • Meta-data • Properties, attributes: information describing types of data [Crandall] • The ‘energy’ required to keep things organized [Earley] • Tagging • <META>, <Source> • Document Properties

  10. Definitions: Classification • Analyzing documents and assigning them to predefined categories • Rule-based vs natural • Classification schemes • Dewey • Library of Congress • Industry-specific
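
One way to picture the rule-based side of "assigning documents to predefined categories" is a keyword-matching pass. The sketch below is only an assumption about what such rules could look like: the `RULES` table, the `classify` function, and the keyword sets are invented for illustration (the category names are borrowed from the industries listed on slide 4).

```python
import re

# Hypothetical rules: each predefined category is triggered by a keyword set.
RULES = {
    "Legal": {"contract", "litigation", "patent"},
    "Financial services": {"loan", "portfolio", "equity"},
    "Pharmaceuticals": {"clinical", "dosage", "trial"},
}


def classify(text, rules=RULES, min_hits=1):
    """Return matching categories, best match first.

    A category matches when at least `min_hits` of its keywords appear
    in the text. Ties are broken alphabetically.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(words & keywords) for cat, keywords in rules.items()}
    matches = [cat for cat, score in scores.items() if score >= min_hits]
    return sorted(matches, key=lambda cat: (-scores[cat], cat))


print(classify("The clinical trial adjusted the dosage."))
print(classify("Patent litigation over a loan contract"))
```

A document can land in several categories or in none, which is one reason the later planning slides ask who edits categories and approves terms.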

  11. Definitions: Clustering • Automatically generating groups of similar documents based on distance or proximity measures • "Bags of words" • Vector analysis determines boundaries • Adaptive, but not abstract
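
The bag-of-words and proximity ideas on this slide can be sketched with word-count vectors and cosine similarity. The grouping step below is a deliberately simple single-pass threshold scheme chosen for brevity, not the method of any particular product; the function names, the `threshold` value, and the sample documents are assumptions.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Represent a document as word counts, ignoring word order."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.3):
    """Single-pass clustering: join the most similar existing cluster if its
    similarity reaches `threshold`, otherwise start a new cluster.

    Returns a list of clusters, each a list of document indexes.
    """
    clusters = []  # each entry: {"centroid": summed word counts, "members": [indexes]}
    for i, text in enumerate(docs):
        vec = bag_of_words(text)
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["members"].append(i)
            best["centroid"] += vec  # cosine ignores scale, so a sum works as a centroid
        else:
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return [c["members"] for c in clusters]


# Made-up sample documents.
docs = [
    "taxonomy categories for search and navigation",
    "search navigation with taxonomy categories",
    "clinical trial dosage results",
    "dosage results from the clinical trial",
]
print(cluster(docs))
```

The groups come back as bare lists of document indexes with no category names attached, which is one reading of the slide's "adaptive, but not abstract": the grouping adapts to the content, but someone (or a separate labeling step) still has to name the result.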

  12. Develop a Plan • Determine user information needs • Information audit, Content audit • Select appropriate sources • Create initial taxonomy • Edit categories • Categorize new documents • Test the UI • Train the taxonomy

  13. Plan: Information audit • What is the objective of the system? • Who owns the project? • What do users need? • What do content creators need? • What do system managers need?

  14. Plan: Content audit • Is there an existing taxonomy? • How clean is the meta-data? • Is the content suited to automatic classification techniques? • Good example: Notes discussion databases • Not-so-good example: Web site with little text, lots of links • Is a subset of a source better than the whole?

  15. Plan: Select sources • Which sources? • Who owns them? • Which sources do users access most often? • How do users access these sources? • What is the lifecycle of the content? • Who identifies the most current content?

  16. Plan: Maintenance • Resources • Centralized or department-level • Who decides when new content is added? • Term approval process • How do new concepts get into the taxonomy?

  17. Identify issues • Getting user involvement and buy-in • Maintenance resources • Directory versus taxonomy • Meta-data • Globalization and regionalization • Hidden vs published taxonomies

  18. Understand the BIG issues • Organizational “perfection complex” [Chait] • Multiple taxonomies • Automated versus manual categorization

  19. Multiple taxonomies • Many editors • Term approval process, synonyms • Standard tools across the enterprise • Federated taxonomies • Taxonomy links, “cross-connections,” facets, views • Taxonomy mapping

  21. Measuring value • NCR Corporation - Support Organization • Needed to convince organization of the value of captured content • Managers resisted diverting resources to maintaining content • Current measure: Time per incident • How could the value of a knowledge classification system be demonstrated?

  22. Measuring value • NCR developed a new parameter: • Knowledge helpful (the answer was in the support database and was used to solve the problem) • Knowledge not effective (the answer sent them in the wrong direction, did not help to address the issue) • Knowledge not available (nothing available to assist in solving the problem) • Knowledge not required (problem solved without the use of the knowledge base)
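
The four outcome codes on this slide lend themselves to a simple tally per closed incident. The slides do not say how NCR aggregated or reported the parameter, so the sketch below is purely an assumption: the `KnowledgeOutcome` enum, the `outcome_shares` function, and the sample incident log are all hypothetical.

```python
from collections import Counter
from enum import Enum


class KnowledgeOutcome(Enum):
    """The four outcome codes listed on the slide."""
    HELPFUL = "knowledge helpful"
    NOT_EFFECTIVE = "knowledge not effective"
    NOT_AVAILABLE = "knowledge not available"
    NOT_REQUIRED = "knowledge not required"


def outcome_shares(incidents):
    """Return the fraction of incidents recorded under each outcome code."""
    counts = Counter(incidents)
    total = sum(counts.values())
    return {o: (counts[o] / total if total else 0.0) for o in KnowledgeOutcome}


# Made-up incident log: one outcome code per closed support incident.
sample = (
    [KnowledgeOutcome.HELPFUL] * 5
    + [KnowledgeOutcome.NOT_EFFECTIVE] * 1
    + [KnowledgeOutcome.NOT_AVAILABLE] * 2
    + [KnowledgeOutcome.NOT_REQUIRED] * 2
)

for outcome, share in outcome_shares(sample).items():
    print(f"{outcome.value}: {share:.0%}")
```

Tracked over time, a share like "knowledge not available" would point at gaps in the captured content, while "knowledge not effective" would point at content that needs correcting; both interpretations go beyond what the slide states.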

  23. Futures • Methods: • Feature extraction, statistical analysis, rules-based, label generation • Starter taxonomies, imports • Taxonomy mapping • Interfaces: Visualization, better training tools

  24. Q&A
