Text Analytics Workshop – Evaluation of Software

Tom Reamy, Chief Knowledge Architect

KAPS Group

Knowledge Architecture Professional Services

http://www.kapsgroup.com

Agenda
  • Features, Varieties, Vendors
  • Enterprise Context
    • Start with Self-Knowledge
    • Text Analytics Team
  • Evaluation Process
    • Features and Capabilities – Filter
    • Proof of Concept / Pilot
Text Analytics Software – Features
  • Entity Extraction
    • Multiple types, custom classes – entities, concepts, events
  • Auto-categorization – Taxonomy Structure
    • Training sets – Bayesian, Vector space
    • Terms – literal strings, stemming, dictionary of related terms
    • Rules – simple – position in text (title, body, URL)
    • Boolean – full search syntax – AND, OR, NOT
    • Advanced – NEAR (#), PARAGRAPH, SENTENCE (see the sketch below)
  • Advanced Features
    • Facts / ontologies / Semantic Web – RDF+
    • Sentiment Analysis
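To make the rule types above concrete, here is a minimal sketch in generic Python of how term, Boolean, and proximity (NEAR) conditions might be combined; it is not any vendor's rule syntax, and the category, terms, and window size are invented for illustration.

```python
import re

def tokens(text):
    """Lowercase word tokens; a stand-in for a vendor's tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())

def near(text, a, b, window=5):
    """True if terms a and b occur within `window` tokens of each other."""
    toks = tokens(text)
    pos_a = [i for i, t in enumerate(toks) if t == a]
    pos_b = [i for i, t in enumerate(toks) if t == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

def matches_category(body):
    """Toy rule: (merger OR acquisition) AND NEAR(regulatory, approval, 4) AND NOT rumor."""
    toks = set(tokens(body))
    has_topic = "merger" in toks or "acquisition" in toks        # OR
    in_context = near(body, "regulatory", "approval", window=4)  # NEAR (#)
    return has_topic and in_context and "rumor" not in toks      # AND / NOT

doc = "The acquisition awaits regulatory approval from the commission."
print(matches_category(doc))  # True
```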
Varieties of Taxonomy/Text Analytics Software
  • Taxonomy Management
    • Synaptica, SchemaLogic
  • Full Platform
    • SAP-Inxight, ClearForest, SAS-Teragram, Data Harmony, Concept Searching, IBM
  • Content Management
    • Nstein, Interwoven, Documentum, etc.
  • Embedded – Search
    • FAST, Autonomy, Endeca, Exalead, etc.
  • Specialty
    • Sentiment Analysis - Lexalytics
Vendors of Taxonomy/Text Analytics Software
  • Attensity
  • Business Objects – Inxight
  • Clarabridge
  • ClearForest
  • Data Harmony / Access Innovations
  • GATE (Open Source)
  • IBM Content Analyst
  • Lexalytics
  • Multi-Tes
  • Nstein
  • SAS - Teragram
  • SchemaLogic
  • Smart Logic
  • Synaptica
  • Wikionomy
  • Wordmap
  • Lots More
Evaluating Taxonomy/Text Analytics Software – Start with Self-Knowledge
  • Strategic and Business Context
  • Info Problems – what, how severe
  • Strategic questions – why do it, what value will the taxonomy/text analytics deliver, and how will you use it?
  • Formal process – KA audit of content, users, technology, business and information behaviors, and applications – or an informal process for a smaller organization
  • Text Analytics Strategy/Model – forms, technology, people
    • Existing taxonomic resources, software
  • You need this foundation both to evaluate and to develop
Evaluating Taxonomy/Text Analytics Software – Start with Self-Knowledge
  • Do you need it – and what blend if so?
  • Taxonomy Management Only
    • Multiple taxonomies, languages, authors-editors
  • Technology Environment – ECM, Enterprise Search – where is it embedded
  • Publishing process – where and how metadata is being added, now and in the projected future
    • Can it utilize auto-categorization, entity extraction, summarization
  • Is the current search adequate – can it utilize text analytics?
  • Applications – text mining, BI, CI, Alerts?
Design of the Text Analytics Selection Team
  • Traditional Candidates - IT
  • Experience with large software purchases
    • Search/Categorization is unlike other software
  • Experience with needs assessments
    • More is needed – knowing what questions to ask, a knowledge audit
  • Objective criteria
    • Looking where there is light?
    • Asking IT to select taxonomy software is like asking a construction company to select the design of your house.
  • They have the budget
    • OK, they can play.
Design of the Text Analytics Selection Team
  • Traditional Candidates - Business Owners
  • Understand the business
    • But don’t understand information behavior
  • Focus on business value, not technology
    • Focus on semantics is needed
  • They can get executive sponsorship, support, and budget.
    • OK, they can play
Design of the Text Analytics Selection Team
  • Traditional Candidates - Library
  • Understand information structure
    • But not how it is used in the business
  • Experts in search experience and categorization
    • Suitable for experts, not regular users
  • Experience with variety of search engines, taxonomy software, integration issues
    • OK, they can play
Design of the Text Analytics Selection Team
  • Interdisciplinary Team, headed by Information Professionals
  • Relative Contributions
    • IT – Set necessary conditions, support tests
    • Business – provide input into requirements, support project
    • Library – provide input into requirements, add understanding of search semantics and functionality
  • Much more likely to make a good decision
  • Create the foundation for implementation
Evaluating Text Analytics Software – Process
  • Start with Self Knowledge
  • Eliminate the unfit
    • Filter One – ask experts – reputation, research (Gartner, etc.)
      • Market strength of vendor, platforms, etc.
      • Feature scorecard – minimum and must-have features; filter to top 3 (see the sketch below)
    • Filter Two – technology filter – match to your overall scope and capabilities (a filter, not a focus)
    • Filter Three – focus group, one-day visit – 3-4 vendors
  • Deep pilot (2) / POC – advanced, integration, semantics
  • Focus on working relationship with vendor.
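As a rough illustration of the scorecard filter mentioned in the list above, here is a minimal sketch; the feature names, weights, and vendor scores are hypothetical placeholders, not recommendations.

```python
# Hypothetical scorecard: each feature has a weight and a "must have" flag.
FEATURES = {
    "entity_extraction":   {"weight": 3, "must_have": True},
    "auto_categorization": {"weight": 3, "must_have": True},
    "skos_import_export":  {"weight": 2, "must_have": False},
    "sentiment_analysis":  {"weight": 1, "must_have": False},
}

# Hypothetical vendor scores (0-5) gathered from analyst research and demos.
vendors = {
    "Vendor A": {"entity_extraction": 5, "auto_categorization": 4, "skos_import_export": 3, "sentiment_analysis": 2},
    "Vendor B": {"entity_extraction": 4, "auto_categorization": 0, "skos_import_export": 5, "sentiment_analysis": 4},
    "Vendor C": {"entity_extraction": 3, "auto_categorization": 5, "skos_import_export": 2, "sentiment_analysis": 0},
}

def passes_minimum(scores):
    """A vendor is dropped if any must-have feature scores zero."""
    return all(scores[f] > 0 for f, spec in FEATURES.items() if spec["must_have"])

shortlist = sorted(
    ((sum(scores[f] * spec["weight"] for f, spec in FEATURES.items()), name)
     for name, scores in vendors.items() if passes_minimum(scores)),
    reverse=True)[:3]
print(shortlist)  # weighted totals for the vendors that pass the must-have filter
```

Whether a zero on a must-have feature disqualifies a vendor outright or merely lowers its total is a judgment call for the selection team.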
Evaluating Text Analytics Software – Feature Checklist and Score: Basic Features, Admin
  • New, copy, rename, delete, merge
    • Branches not just nodes
  • Scope Notes
  • Spell check
  • Search – across all fields or selected ones (e.g., taxonomy nodes only)
  • Names and Identifiers for terms and nodes
  • Check for duplicates (see the node sketch below)
  • Versioning, multiple authors
  • Analytical reports – structure, application to documents
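To make the admin operations above concrete, here is a minimal sketch of a taxonomy node structure supporting rename, merge, and duplicate checking; the class and its fields are illustrative assumptions, not any product's data model.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A taxonomy node: a label, an optional scope note, and child branches."""
    label: str
    scope_note: str = ""
    children: list = field(default_factory=list)

    def rename(self, new_label):
        self.label = new_label

    def merge(self, other):
        """Fold another branch into this one, keeping both sets of children."""
        self.children.extend(other.children)

    def duplicates(self, seen=None):
        """Yield labels that appear more than once anywhere in the branch."""
        seen = set() if seen is None else seen
        if self.label.lower() in seen:
            yield self.label
        seen.add(self.label.lower())
        for child in self.children:
            yield from child.duplicates(seen)

root = Node("Text Analytics", children=[Node("Entity Extraction"), Node("entity extraction")])
print(list(root.duplicates()))  # ['entity extraction'] – a duplicate label flagged
```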
Evaluating Text Analytics Software – Feature Checklist and Score: Usability
  • Ease of use – copy, paste, rename, merge, etc.
  • Documentation – user manuals, online help, training, and tutorials
  • Visualization
    • File structure, tree, hierarchical, and alphabetical views
  • Automatic Taxonomy/Node & Rule Generation
    • Nonsense at the whole-taxonomy level
    • Node level – suggestions for sub-categories and rules
  • Variety of node relationships – child-parent, related
Evaluating Text Analytics Software – Feature Checklist and Score: Additional Features
  • Language support – international, if you have a need for it
  • Scalability – Size of taxonomy rarely important
    • More important for auto-categorization
  • Import/export – XML and SKOS (see the export sketch below)
  • Support for standards – NISO, etc.; mapping between taxonomies
  • API / SDK
  • Security, Access Rights, Roles
  • Advanced Features – future growth
    • Facts / ontologies / Semantic Web – RDF+
    • Sentiment Analysis
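For the SKOS import/export item above, a minimal export sketch using the open-source rdflib library might look like the following; the namespace URI and concept labels are made up for illustration.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

# Hypothetical namespace for an internal taxonomy.
TAX = Namespace("http://example.org/taxonomy/")

g = Graph()
g.bind("skos", SKOS)

parent = TAX["text-analytics"]
child = TAX["entity-extraction"]

g.add((parent, RDF.type, SKOS.Concept))
g.add((parent, SKOS.prefLabel, Literal("Text Analytics", lang="en")))
g.add((child, RDF.type, SKOS.Concept))
g.add((child, SKOS.prefLabel, Literal("Entity Extraction", lang="en")))
g.add((child, SKOS.broader, parent))  # child-parent relationship

# Serialize to RDF/XML so the branch can be exchanged with other tools.
print(g.serialize(format="xml"))
```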
Evaluating Text Analytics Software – Advanced Features: Text Analytics as Platform
  • Entity Extraction
    • Multiple types, custom classes
  • Summarization
    • Customizable rules, map to different content
  • Auto-categorization
    • Training sets (see the classifier sketch below)
    • Terms – literal strings, stemming, dictionary of related terms
    • Rules – simple – position in text (title, body, URL)
    • Advanced – saved search queries (full search syntax)
    • NEAR, SENTENCE, PARAGRAPH
    • Boolean – X NEAR Y AND NOT Z
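As a sketch of the training-set style (Bayesian, vector space) named in the list above, the following uses scikit-learn's multinomial Naive Bayes over TF-IDF vectors with a tiny invented training set; it only illustrates the approach, not any vendor's engine.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny, made-up training set: documents already assigned to taxonomy nodes.
train_docs = [
    "quarterly earnings revenue profit forecast",
    "merger acquisition regulatory approval antitrust",
    "product launch new release feature roadmap",
    "revenue growth margin guidance earnings call",
]
train_labels = ["Finance", "M&A", "Product", "Finance"]

# Bayesian categorizer over a vector-space representation of the text.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_docs, train_labels)

print(model.predict(["the acquisition still needs regulatory approval"]))  # likely ['M&A'] on this toy data
```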
Evaluating Taxonomy Software – POC
  • Quality of results is the essential factor
  • Six-week POC – bake-off, or a short pilot
  • Real life scenarios, categorization with your content
  • Preparation:
    • Preliminary analysis of content and users' information needs
    • Set up software in lab – relatively easy
    • Train taxonomist(s) on software(s)
    • Develop taxonomy if none available
  • Six-week POC – three rounds of develop, test, refine (not out-of-the-box)
  • Need SMEs as test evaluators – also to do an initial categorization of content (see the scoring sketch below)
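Here is a sketch of how a tool's output from a POC round might be scored against the SMEs' initial categorization, with per-category precision and recall; the document IDs and categories are invented.

```python
from collections import defaultdict

# Hypothetical SME "gold" categorization and one vendor's output: doc_id -> category.
gold = {"d1": "Finance", "d2": "M&A", "d3": "Product", "d4": "Finance", "d5": "M&A"}
tool = {"d1": "Finance", "d2": "Finance", "d3": "Product", "d4": "Finance", "d5": "M&A"}

tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
for doc, truth in gold.items():
    guess = tool.get(doc)
    if guess == truth:
        tp[truth] += 1
    else:
        fn[truth] += 1
        if guess is not None:
            fp[guess] += 1

for cat in sorted(set(gold.values())):
    precision = tp[cat] / (tp[cat] + fp[cat]) if tp[cat] + fp[cat] else 0.0
    recall = tp[cat] / (tp[cat] + fn[cat]) if tp[cat] + fn[cat] else 0.0
    print(f"{cat}: precision={precision:.2f} recall={recall:.2f}")
```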
Evaluating Taxonomy Software – POC
  • Majority of time is on auto-categorization
  • Need to balance uniformity of results with vendor-unique capabilities – this has to be determined at POC time
  • Risks – getting software installed and working, getting the right content, initial categorization of content
  • Elements:
    • Content
    • Search terms / search scenarios
    • Training sets
    • Test sets of content (see the split sketch below)
  • Taxonomy Developers – expert consultants plus internal taxonomists
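A minimal sketch of holding back part of the SME-labeled content as a test set, assuming a simple random 70/30 split; the document IDs and category labels are generated placeholders.

```python
import random

random.seed(7)  # keep the split reproducible across POC rounds

# Hypothetical pool of SME-categorized documents: (doc_id, category) pairs.
labeled = [(f"d{i:02d}", random.choice(["Finance", "M&A", "Product"])) for i in range(40)]

random.shuffle(labeled)
cut = int(len(labeled) * 0.7)
training_set = labeled[:cut]   # used to build categorizers / tune rules
test_set = labeled[cut:]       # held back to score each vendor's output

print(len(training_set), len(test_set))  # 28 12
```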
Evaluating Taxonomy Software – POC

  • Test Cases:
    • Auto-categorization to the existing taxonomy – variety of content
    • Clustering – automatic node generation
    • Summarization
    • Entity extraction – build a number of catalogs, chosen by projected needs – for example, privacy info (SSN, phone, etc.) (see the catalog sketch below)
    • Entity examples – people, organizations, methods, etc.
  • Evaluate usability in action by taxonomists

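For the privacy-info catalog test case, a regex-only sketch of entity extraction follows; the patterns are simplified US-style formats for illustration, not production-grade detectors.

```python
import re

# Simplified, illustrative patterns for a "privacy info" entity catalog.
CATALOG = {
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def extract_entities(text):
    """Return {entity_type: [matches]} for every pattern that hits."""
    return {name: pat.findall(text) for name, pat in CATALOG.items()
            if pat.findall(text)}

sample = "Call 555-867-5309 or write to jane.doe@example.com; SSN 123-45-6789."
print(extract_entities(sample))
# {'ssn': ['123-45-6789'], 'phone': ['555-867-5309'], 'email': ['jane.doe@example.com']}
```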
Evaluating Taxonomy Software – POC Issues
  • Quality of content
  • Quality of initial human categorization
  • Normalize among different test evaluators (see the agreement sketch below)
  • Quality of taxonomists – experience with text analytics software and/or experience with content and information needs and behaviors
  • Quality of taxonomy
    • General issues – structure (too flat or too deep)
    • Overlapping categories
    • Differences in use – browse, index, categorize
    • IMPORTANT!!!
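One way to check whether test evaluators need normalization is an agreement measure such as Cohen's kappa; the sketch below computes it for two hypothetical evaluators over ten invented documents.

```python
from collections import Counter

# Hypothetical categorizations of the same ten documents by two SME evaluators.
rater_a = ["Finance", "M&A", "Product", "Finance", "M&A", "Product", "Finance", "M&A", "Finance", "Product"]
rater_b = ["Finance", "M&A", "Finance", "Finance", "M&A", "Product", "Finance", "Product", "Finance", "Product"]

n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement from each rater's label distribution.
count_a, count_b = Counter(rater_a), Counter(rater_b)
expected = sum(count_a[c] * count_b[c] for c in set(rater_a) | set(rater_b)) / n ** 2

kappa = (observed - expected) / (1 - expected)
print(f"observed={observed:.2f} expected={expected:.2f} kappa={kappa:.2f}")
```

Low agreement suggests the evaluators are interpreting the taxonomy differently and their judgments should be reconciled before scoring vendors.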
Conclusion
  • Start with self-knowledge – what will you use it for?
    • Current Environment – technology, information
  • Basic Features are only filters, not scores
  • Integration – need an integrated team (IT, Business, KA)
    • For evaluation and development
  • POC – your content, real world scenarios – not scores
  • Foundation for development, experience with software
    • Development is better, faster, cheaper
  • Categorization is essential, and time-consuming
  • For categorization, the essential issue is the complexity of language
  • For entity extraction, the essential issue is scale

Questions?

Tom Reamy, [email protected]

KAPS Group

Knowledge Architecture Professional Services

http://www.kapsgroup.com
