Thesaurus Management Tools

jeffery

Presentation Transcript


    1. Thesaurus Management Tools
       Introduction
       Who am I and what do I do
       --taxonomy development
       --information retrieval
       Why this presentation
       --needed a tool
       --not enough engineering staff for development, so given go-ahead to purchase out-of-the-box software
       --created spreadsheet comparing characteristics of candidate products
       Plan
       --discuss controlled vocabulary concepts
       --why buy?
       --what to consider
       --product taste testing

    2. Definitions

    3. Why control? People just don’t say or spell things the same way

    4. From a recent gathering of variants in our query logs

    5. Relationships

    6. Hierarchical

    7. Equivalent

    8. Associative

    9. Classification

    10. Classification

    11. Classification
        IE Call letters
        IE Controlled Vocabularies

    12. Controlled Vocabularies

    13. Authority Lists

    14. Gazetteers

    15. Gazetteers

    16. Glossaries

    17. Taxonomies vs. Thesauri
        Both are hierarchical (trees) and usually have associative and equivalent relationships as well.
        Both have applications for indexing, navigation, and search.
        Both are typically built with a specific topic area or collection in mind.
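
The three relationship types that taxonomies and thesauri share can be pictured as fields on a term record. The following is only a minimal sketch: the `Term` class, its field names, the `build_lookup` helper, and the sample data are assumptions made for illustration and do not come from any product in this presentation. The BT/NT/RT/UF abbreviations in the comments follow common thesaurus notation.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term and its relationships (hypothetical structure)."""
    label: str
    broader: set[str] = field(default_factory=set)   # hierarchical: BT
    narrower: set[str] = field(default_factory=set)  # hierarchical: NT
    related: set[str] = field(default_factory=set)   # associative: RT
    used_for: set[str] = field(default_factory=set)  # equivalent: UF (variants)


def build_lookup(terms: list[Term]) -> dict[str, str]:
    """Map each preferred label and variant (lowercased) to its preferred label."""
    lookup: dict[str, str] = {}
    for term in terms:
        lookup[term.label.lower()] = term.label
        for variant in term.used_for:
            lookup[variant.lower()] = term.label
    return lookup


# Sample data, invented for illustration only.
music = Term("1970s Music", narrower={"Classic Rock"})
rock = Term("Classic Rock", broader={"1970s Music"})
gov = Term("Government and Politics", used_for={"Politics and Government"})

lookup = build_lookup([music, rock, gov])
print(lookup["politics and government"])  # Government and Politics
```

Resolving a variant to one preferred label is the equivalence relationship at work: whichever form a searcher or indexer types, both end up at the same term.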

    18. Taxonomies vs. Thesauri

    19. Information Access

    20. Information Access

    21. Information Retrieval

    22. Trees & Webs

    23. Topic Maps
        Based on traditional indexing concepts.
        --Knowledge structures: topics and relations/associations
        --Information resources: occurrences
        Just as SGML was originally developed for print publishing (this is a header, this is body text), topic maps were originally conceived for representing indexes for complex information. They evolved into a navigational aid that encompasses the characteristics of taxonomies and thesauri, with particular utility for electronic documentation.
        The topic map sits above the occurrences. It is not built in response to a body of documents; it is a stand-alone structure to which occurrences attach.
        Nodes.

    24. Topic Maps

    25. Topic Maps
        Variety of relationships/associations available. Not limited to the three traditional types (hierarchical, equivalent, associative).

    26. Why buy?
        Government and Politics vs. Politics and Government
        Classic Rock 1970s Music Classic Rock

    27. Why buy?

    28. Why buy?
        Variety of products starting at less than $500
        Average full-time worker:
        --between $50,000 and $100,000 per year, or $4,167 to $8,333 per month
        --Bureau of Labor Statistics National Compensation Survey: $28-$39 per hour (plus benefits, capital expenses, and other forms of compensation)
        Building in-house can quickly exceed budget
        Can be painful, as you have to live with a product that is still in development
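
The salary figures on this slide can be checked with a few lines of arithmetic. This is a rough sketch only: the function names are made up for illustration, and the comparison counts wages alone, leaving out the benefits and other costs the slide mentions.

```python
def monthly_cost(annual_salary: float) -> int:
    """Annual salary spread over 12 months, rounded to the nearest dollar."""
    return round(annual_salary / 12)


def hours_equivalent(product_price: float, hourly_rate: float) -> float:
    """Hours of staff time that cost the same as the product (wages only)."""
    return product_price / hourly_rate


print(monthly_cost(50_000), monthly_cost(100_000))  # 4167 8333

# A $500 product set against the $28-$39 hourly range quoted on the slide.
for rate in (28, 39):
    print(f"${rate}/hour: {hours_equivalent(500, rate):.1f} hours")
```

On wages alone, a $500 product costs about as much as 13 to 18 hours of one person's time, which is the core of the buy-over-build argument.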

    29. Choices
        --Automated:
        *Little or no human intervention; usually uses rules or training sets and derives the vocabulary from the collection itself
        *Sometimes comes with its own built-in vocabulary (very broad)
        --Manual:
        *You do all the work
        *The only automatic features are cross-checking references, global changes, report generation, and sometimes spell-checking, etc.
        --Bundled:
        *Vocabulary module is part of a larger classification/management package
        *However, sometimes the vocabulary module can be purchased separately
        --Stand-alone:
        *The product does vocabulary management only
        --Single-user:
        *Can mean only one workstation (or client)
        *Can mean data generally is stored on that workstation (although it can be stored on a server)
        *Can mean that only one user at a time can use the tool (no collision monitoring available)
        --Multi-user:
        *Many users at one time (collisions detected and managed)
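
One common way a multi-user tool can detect collisions is to keep a version number on each record and reject any save based on a stale copy. The sketch below assumes that approach; it is not a description of how any particular product works, and `TermStore`, `CollisionError`, and the sample notes are hypothetical names invented for illustration.

```python
class CollisionError(Exception):
    """Raised when a save is based on a stale copy of a term."""


class TermStore:
    """In-memory store that tracks a version number per term (hypothetical)."""

    def __init__(self) -> None:
        # label -> (version, scope note)
        self._records: dict[str, tuple[int, str]] = {}

    def read(self, label: str) -> tuple[int, str]:
        """Return (version, scope note); unknown terms start at version 0."""
        return self._records.get(label, (0, ""))

    def save(self, label: str, based_on_version: int, scope_note: str) -> int:
        """Store the note only if the caller edited the latest version."""
        current_version, _ = self.read(label)
        if based_on_version != current_version:
            raise CollisionError(
                f"{label!r} changed since version {based_on_version}"
            )
        new_version = current_version + 1
        self._records[label] = (new_version, scope_note)
        return new_version


store = TermStore()
version, _ = store.read("Classic Rock")  # both editors start from version 0
store.save("Classic Rock", version, "Editor A's note")
try:
    store.save("Classic Rock", version, "Editor B's note")  # stale version
except CollisionError as err:
    print("collision detected:", err)
```

A single-user tool has no need for this check, which is why the slide lists collision monitoring only under multi-user products.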

    30. Tasks
        Vocabulary construction and maintenance
        --Obvious
        --Editing, creating
        Reporting
        --Term usage
        --Term history
        Search and indexing
        --Exposed to end users for querying and browsing
        --Exposed to indexers for term assignment
        --Candidate term suggestion

    31. Criteria
        Technical
        *Operating system, platform
        *Database software or off-site storage
        *Technical support: availability?
        *Who is the developer? Are there IS people on staff?
        Pricing and licenses
        *One-time purchase or yearly
        *Maintenance fees
        *Price of new versions or other updates?
        *Extra services at a cost? Customization? Formatting and importing an existing thesaurus?
        Acceptance
        *Who uses it? Widely adopted?
        *Is it a new product? Well-tested?
        *Product reviews
        *Can you contact current users?

    32. Criteria
        Documentation
        *Printed?
        *Online? Searchable?
        *Call center? 24/7?
        User experience
        *Interface: can you look at it all day?
        *Usability: easy to use, not needing a million clicks to accomplish a task, navigation
        *Input style: drag and drop? All manual typing?
        *Accessibility for disabled persons?
        *Error and feedback messaging understandable? Cryptic?
        *Confirmation messages before major changes?
        Data integrity
        *Backup copies to roll back?
        *Administrative access levels: read only, limit who can add and delete?

    33. Criteria
        Structural
        *Field character limits and data types
        *Pre-defined fields and relationship types
        *User-defined fields and relationships?
        *Notation?
        *Limit on levels (depth)?
        *Polyhierarchy or multiple relationships between terms, such as a term being synonymous with more than one preferred term?
        Editing
        *How easy is it to change the status or relationships of a term?
        *Deletion: global? Is the term archived or completely removed?
        *Automatic relationship validation? Spell-checking?
        Importing, Exporting, Reports
        *Special import format?
        *Mapping for heterogeneous or multilingual vocabularies
        *Import/export formats: proprietary or standard? MARC? ASCII? XML?
        *Report configurations: KWIC & KWOC? Alpha, hierarchical? By date added or last edited? By notation?
        *User/use statistics?
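
As one illustration of what "automatic relationship validation" might involve, the sketch below checks that every broader-term link has a matching narrower-term link in the other direction. It assumes relationships are held in plain dictionaries; the function name and the sample terms (including "Disco") are invented for the example, and a real product may validate quite differently.

```python
def missing_reciprocals(
    broader: dict[str, set[str]],
    narrower: dict[str, set[str]],
) -> list[tuple[str, str]]:
    """Return (child, parent) pairs where the child names the parent as a
    broader term but the parent does not list the child as a narrower term."""
    problems = []
    for child, parents in broader.items():
        for parent in parents:
            if child not in narrower.get(parent, set()):
                problems.append((child, parent))
    return sorted(problems)


# Sample data, invented for illustration only.
broader = {"Classic Rock": {"1970s Music"}, "Disco": {"1970s Music"}}
narrower = {"1970s Music": {"Classic Rock"}}  # "Disco" is missing here

print(missing_reciprocals(broader, narrower))  # [('Disco', '1970s Music')]
```

When evaluating a product, it is worth asking whether it blocks such one-sided links at entry time or only reports them afterwards.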

    34. Products Referenced in This Presentation

    35. The End
