
Taxonomies in Electronic Records Management Systems



Presentation Transcript


  1. Taxonomies in Electronic Records Management Systems
     May 21, 2002

  2. Terms
     • Controlled Vocabulary
        • A collection of terms that indicates which terms are preferred and which are variants of the preferred terms.
     • Thesaurus
        • A type of controlled vocabulary that shows the hierarchical (parent-child), associative (related terms), and equivalent (synonymous) relationships among terms.
     • Taxonomy
        • A hierarchical classification of elements within a domain. One type of taxonomy is a File Plan.
     • Ontology
        • A hierarchical classification that is more complex and subtle than a taxonomy. It describes relationships between objects by mapping relationship types such as “part of” or “located in”. Also called knowledge mapping.
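The three thesaurus relationships, together with the preferred/variant mapping that defines a controlled vocabulary, can be pictured as a small data structure. The following is a minimal Python sketch, not something taken from the presentation: the `Term` and `Thesaurus` classes, the method names, and the sample terms are illustrative assumptions. The field comments use the conventional thesaurus abbreviations BT (broader term), NT (narrower term), RT (related term), and UF (used for).

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    label: str
    broader: set = field(default_factory=set)   # hierarchical: parent terms (BT)
    narrower: set = field(default_factory=set)  # hierarchical: child terms (NT)
    related: set = field(default_factory=set)   # associative: related terms (RT)
    variants: set = field(default_factory=set)  # equivalent: non-preferred synonyms (UF)

class Thesaurus:
    def __init__(self):
        self.terms = {}      # preferred label -> Term
        self.lookup = {}     # lowercased preferred or variant label -> preferred label

    def add(self, label, variants=()):
        term = self.terms.setdefault(label, Term(label))
        self.lookup[label.lower()] = label
        for v in variants:
            term.variants.add(v)
            self.lookup[v.lower()] = label
        return term

    def add_broader(self, child, parent):
        # Record the parent-child link in both directions.
        self.terms[child].broader.add(parent)
        self.terms[parent].narrower.add(child)

    def add_related(self, a, b):
        # Associative links are symmetric.
        self.terms[a].related.add(b)
        self.terms[b].related.add(a)

    def preferred(self, word):
        # Controlled-vocabulary lookup: map any variant to its preferred term.
        return self.lookup.get(word.lower())

# Illustrative (hypothetical) terms only.
t = Thesaurus()
t.add("Records Management")
t.add("File Plan", variants=["Filing Scheme"])
t.add("Retention Schedule", variants=["Records Schedule"])
t.add_broader("File Plan", "Records Management")
t.add_related("File Plan", "Retention Schedule")

print(t.preferred("filing scheme"))            # File Plan
print(t.terms["Records Management"].narrower)  # {'File Plan'}
```

A taxonomy in the slide's sense keeps only the broader/narrower links; the related and variant links are what make it a thesaurus.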

  3. Why Use a Taxonomy
     • Management of Records
     • Structure for Classification
     • Navigational Tool
     • Reduced Burden on Users
     • More Consistent Than Humans
     • Sheer Volume of Information
     • Document Level Vs Folder Level
     • High Speed Processing
     • More Than 80% of All Information Is Unstructured

  4. Example: FirstGov.gov

  5. Example: File Plan

  6. Example: Visual Map

  7. How Do Taxonomy Tools Work?
     • General
        • Understand Relevance to Categories
        • Create Knowledge Clusters
        • Enable Types to Be Combined
     • Training-Based
        • Require Representative Samples
        • Identify Patterns
        • Create Statistical Models
     • Rule-Based
        • Process Rules Devised and Hand-Coded by Humans
        • Contain Keywords and Logical Relationships
     • Linguistics-Based
        • Use Algorithms
        • Understand Linguistic and Semantic Elements
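A rule-based tool, as listed above, applies keywords combined with logical relationships that humans write by hand. The Python sketch below illustrates only that idea; the `RULES` table, its AND/OR/NOT structure, and the category names and keywords are hypothetical, and commercial tools (including the training-based and linguistics-based kinds) work differently.

```python
import re

# Hypothetical hand-coded rules: "all" keywords must appear (AND), at least one
# "any" keyword must appear if that set is non-empty (OR), and no "none"
# keyword may appear (NOT).
RULES = {
    "Contracts": {"all": {"contract"}, "any": {"award", "modification", "solicitation"}, "none": {"draft"}},
    "Personnel": {"all": set(), "any": {"hiring", "payroll", "leave"}, "none": set()},
}

def tokenize(text):
    # Lowercase alphabetic words only; a real tool would also stem and handle phrases.
    return set(re.findall(r"[a-z]+", text.lower()))

def categorize(text, rules=RULES):
    words = tokenize(text)
    matches = []
    for category, rule in rules.items():
        if not rule["all"] <= words:
            continue                  # a required keyword is missing
        if rule["any"] and not (rule["any"] & words):
            continue                  # none of the alternative keywords appear
        if rule["none"] & words:
            continue                  # an excluding keyword appears
        matches.append(category)
    return matches

print(categorize("Contract award for janitorial services"))  # ['Contracts']
print(categorize("Draft contract modification"))             # []
```

Documents that match no rule come back with an empty list, which is where the human intervention mentioned in the case studies would come in.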

  8. Taxonomy Uses in Electronic Recordkeeping Systems
     • Auto Categorization
     • Searching and Browsing
     • File Plan Creation and Maintenance

  9. Auto Categorization

  10. Auto Categorization Case Studies
     • National Archives and Records Administration
        • 12,000 Documents
        • Granular File Plan
        • Single Repository
     • University of Nevada for the Department of Energy
        • 150,000 Documents
        • 99.5% Accuracy in Identifying Non-Records
        • Fewer Than 1 in 20 Documents Required Human Intervention
     • Department of Education
        • 90,000 Documents
        • Accuracy Enhanced by Narrowing Categories
        • 100% Accuracy Categorizing to Retention Periods

  11. Auto Categorization Anecdotes
     • Factiva
        • 1,500 Topics
        • Target of 45% Accuracy
        • Achieving 60–80% Accuracy
     • Gartner Group Findings
        • Typical Accuracy Is 80–95% When Broad, Non-Overlapping Categories Are Used
     • One Vendor’s Literature
        • 75–80% Accuracy Is Typical

  12. Common Themes
     • Mutually Exclusive Categories Increase Accuracy
     • Big Bucket Theory
     • Easy Retrieval Vs Easy Filing
     • Stovepiping Vs Open System
     • Human Effort Necessary
        • Select Training Set
        • Quality Control
        • Fine-Tune

  13. Comments on Accuracy
     • No Case Study Achieved 100% in Categorization
     • Accuracy Rises With Fewer Categories
     • Short Documents Can Have Too Little Content
     • Long Documents Can Cover Too Many Topics
     • Fly in the Ointment
        • Accuracy Diminishes at Each Level Down in the File Plan
        • In a System Where Auto Categorization Is 80% Accurate at Each Level, the Expected Accuracy for Assigning a Document Correctly Three Levels Down Is About 51% (0.8 × 0.8 × 0.8 ≈ 0.51)
     • Critical Element: Records Management
        • Control of File Plan
        • Understanding of Technology
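The 51% figure follows if each level of the file plan is treated as an independent classification decision with the same 80% accuracy, so the probability of getting every level right is the product of the per-level accuracies. A minimal Python check of that arithmetic; the independence assumption is not stated on the slide, and `expected_accuracy` is a hypothetical helper name.

```python
def expected_accuracy(per_level: float, depth: int) -> float:
    """Probability that a document is filed correctly `depth` levels down,
    assuming each level is an independent decision with accuracy `per_level`."""
    return per_level ** depth

for depth in range(1, 4):
    print(depth, round(expected_accuracy(0.80, depth), 3))
# 1 0.8
# 2 0.64
# 3 0.512
```

The same formula shows why both fewer levels and higher per-level accuracy matter: at 95% per level, three levels still give only about 86%.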

  14. Searching and Browsing

  15. Searching and Browsing
     “The only thing harder than finding something is finding it again.”
     • Searching
        • Looking for Something You Know About
        • Generally Easy in Electronic Documents
        • The Document Comes to You
     • Browsing
        • Looking Through a Collection to See What Is There
        • Generally Difficult in Electronic Documents
        • You Go to the Document(s)
     • Contextual Browsing
        • Accessing Other Relevant Content Related to the Content Being Viewed
        • Other Objects May Not Have Been Grouped Together
        • Prospective Navigation
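The difference between searching, browsing, and contextual browsing can be sketched over a tiny collection in which a taxonomy tool has already assigned categories. In this minimal Python example the document IDs, texts, and categories are hypothetical, and "related" is assumed to mean "shares at least one category with the document being viewed"; real tools use richer similarity measures.

```python
from collections import defaultdict

# Hypothetical collection: document id -> (text, categories assigned by a taxonomy tool)
DOCS = {
    "d1": ("travel voucher for site visit", {"Travel"}),
    "d2": ("site visit trip report", {"Travel", "Reports"}),
    "d3": ("quarterly budget report", {"Budget", "Reports"}),
}

index = defaultdict(set)        # word -> document ids (supports searching)
by_category = defaultdict(set)  # category -> document ids (supports browsing)
for doc_id, (text, cats) in DOCS.items():
    for word in text.split():
        index[word].add(doc_id)
    for cat in cats:
        by_category[cat].add(doc_id)

def search(query):
    # Searching: you know what you want; documents containing every query word come to you.
    sets = [index.get(w, set()) for w in query.lower().split()]
    return sorted(set.intersection(*sets)) if sets else []

def browse(category):
    # Browsing: you go to a category to see what is there.
    return sorted(by_category.get(category, set()))

def related(doc_id):
    # Contextual browsing: other documents sharing a category with the one being viewed.
    cats = DOCS[doc_id][1]
    return sorted({d for c in cats for d in by_category[c]} - {doc_id})

print(search("site visit"))  # ['d1', 'd2']
print(browse("Reports"))     # ['d2', 'd3']
print(related("d2"))         # ['d1', 'd3']
```

Note that `related("d2")` surfaces the budget report even though it shares no words with the trip report, which is the kind of association a user would not find by searching alone.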

  16. The Beauty of a Taxonomy Tool
     • Delivers Information You Did Not Know You Had
     • Identifies Unknown Associations Between Documents
     • Summarizes or Abstracts Content
     • Uses Visual Maps
     • Does Not Require User to Know Location of the Information

  17. Visual Map

  18. Visual Map Drilled to Document Level

  19. File Plan Creation

  20. File Plan Creation Using a Taxonomy Tool
     • Information Architecture Based on Content
     • Electronically Generated File Plan
     “It is possible to produce affinities through automatic categorization without a pre-existing taxonomy. These categories can then be edited and renamed. Once categories have been created by humans, documents and other information objects can be automatically assigned to those categories.” (Gartner Group)
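The Gartner quote describes deriving categories from content before any taxonomy exists. As a simple stand-in for whatever a commercial tool actually does, the Python sketch below uses single-pass ("leader") clustering on word overlap measured by Jaccard similarity; the stopword list, the 0.2 threshold, and the sample document titles are assumptions chosen for illustration.

```python
import re

STOPWORDS = frozenset({"the", "a", "of", "and", "for", "to", "in"})

def terms(text):
    # Lowercased words minus a small, illustrative stopword list.
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def jaccard(a, b):
    # Shared words divided by all distinct words; 0.0 when both sets are empty.
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(docs, threshold=0.2):
    # Each document joins the first cluster whose seed document is similar
    # enough; otherwise it becomes the seed of a new cluster.
    clusters = []  # list of (seed_terms, [document indices])
    for i, text in enumerate(docs):
        t = terms(text)
        for seed, members in clusters:
            if jaccard(t, seed) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((t, [i]))
    return [members for _, members in clusters]

docs = [
    "travel voucher for site visit",
    "travel authorization for site visit",
    "budget request for fiscal year",
    "fiscal year budget execution report",
]
print(cluster(docs))  # [[0, 1], [2, 3]]
```

Humans would then review and rename the resulting groups (for example, as "Travel" and "Budget") before new documents are assigned to them automatically.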

  21. Feasibility of Using Taxonomy Software for File Plan Creation
     • Feasible to Develop a True Records Management File Plan Using Software
     • Feasible to Populate an RMA With an Electronically Generated File Plan
     • Feasible to Compile a Sufficient Quantity of Quality Documents to Mine for Creating the Taxonomy

  22. Then Why Hasn’t It Been Done?
     • Existing Retention Schedules Not Built This Way
        • Map Required File Plan Elements to Appropriate Retention Classification, OR
        • Re-Engineer Retention Schedules
     • Usability for File Plan Development Untested
        • Statistically Correct, BUT
        • May Not Appear Natural to Users

  23. Scenario
     • Humans Create Top Level of File Plan
     • Software Mines Data: Free Categorization
     • Software Forms Category Patterns
     • Humans Use Results to Create One Subsidiary Level in File Plan
     • Humans Associate Retention Schedules at Secondary Level of File Plan
     • Software Auto Categorizes Documents Into File Plan
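The scenario's division of labor can be modeled as a two-level file plan tree: humans define the top level and the secondary level derived from the mined categories, attach retention schedules at the secondary level, and software files each document under a secondary category so that it inherits that schedule. A minimal Python sketch; the category names, the keyword matcher standing in for the auto-categorization engine, and the schedule labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FilePlanNode:
    name: str
    retention: Optional[str] = None  # attached by humans at the secondary level
    children: dict = field(default_factory=dict)
    documents: list = field(default_factory=list)

    def add(self, name: str, retention: Optional[str] = None) -> "FilePlanNode":
        node = FilePlanNode(name, retention)
        self.children[name] = node
        return node

# Human-defined structure: top level, one subsidiary level derived from the
# mined categories, and retention schedules at the secondary level.
root = FilePlanNode("File Plan")
admin = root.add("Administration")
admin.add("Travel", retention="Schedule item A (hypothetical)")
admin.add("Budget", retention="Schedule item B (hypothetical)")

# Hypothetical keyword rules standing in for the auto-categorization engine.
KEYWORDS = {"Travel": {"travel", "voucher"}, "Budget": {"budget", "fiscal"}}

def file_document(root: FilePlanNode, text: str):
    """Place a document under the first matching secondary category and
    return its file plan path and inherited retention schedule."""
    words = set(text.lower().split())
    for top in root.children.values():
        for sub in top.children.values():
            if KEYWORDS.get(sub.name, set()) & words:
                sub.documents.append(text)
                return f"{top.name}/{sub.name}", sub.retention
    return None, None  # no match: route to a records manager for review

print(file_document(root, "Travel voucher for site visit"))
# ('Administration/Travel', 'Schedule item A (hypothetical)')
print(file_document(root, "Staff meeting notes"))
# (None, None)
```

Attaching retention at the secondary level means the software never decides retention directly; it only decides placement, and the schedule follows from the human-maintained plan.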

  24. Hybrid Solution

  25. Conclusion
     • Use for Support – Not Full Automation
     • Ongoing Human Commitment to Plan, Create, and Maintain
     • Consider Portfolio Approach – Mixing Products
     • Very Effective for Searching and Browsing
     • Capture and Search Legacy Documents That Otherwise Would Be Too Costly to Process
     • Integrate With Document Imaging System
     • Potential Is Huge

  26. Resources

  27. Web Sites With Energy Glossaries/Thesauri
     • www.eia.doe.gov
     • http://www.nerc.com/glossary/
     • http://www.eren.doe.gov/consumerinfo/glossary/
     • http://www.naruc.org/resources/glossary.shtml
     • www.powermarketers.com/glossary.htm
     • http://hilt.cdlr.strath.ac.uk/Sources/thesauri.html

  28. Cool Stuff
     • Thesaurus Management Tools
        • www.multites.com
        • www.synaptica.com
        • www.pmei.com/lexico.html
     • Books
        • Content Management Bible, Bob Boiko
        • Information Architecture for the World Wide Web, Louis Rosenfeld & Peter Morville
     • Free Search Engine for Your Web Site
        • http://www.freefind.com/

  29. More Cool Stuff
     • DOE-Related Use of a Taxonomy Tool for Searching and Browsing
        • www.lsnnet.gov
     • Controlled Vocabularies, Thesauri and Classification Systems Available on the Web
        • www.lub.lu.se/metadata/subject-help.html
        • http://sky.fit.qut.edu.au/~middletm//cont_voc.html
     • Information Architecture White Papers and Publications
        • http://argus-acia.com/index.html
     • Virtual Library
        • www.vlib.org/overview.html

  30. THANK YOU!
     Angela Tayfun, CRM
     AT&T Government Solutions, Inc.
     1900 Gallows Road
     Vienna, VA 22182
     Ph: 703.506.5562
     E-mail: atayfun@att.com
