text analytics summit text analytics evaluation l.
Skip this Video
Loading SlideShow in 5 Seconds..
Text Analytics Summit Text Analytics Evaluation PowerPoint Presentation
Download Presentation
Text Analytics Summit Text Analytics Evaluation

Loading in 2 Seconds...

play fullscreen
1 / 33

Text Analytics Summit Text Analytics Evaluation - PowerPoint PPT Presentation

  • Uploaded on

Text Analytics Summit Text Analytics Evaluation. Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com. Agenda. Features, Varieties, Vendors Evaluation Process Start with Self-Knowledge Text Analytics Team

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Text Analytics Summit Text Analytics Evaluation' - milo

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
text analytics summit text analytics evaluation

Text Analytics SummitText AnalyticsEvaluation

Tom ReamyChief Knowledge Architect

KAPS Group

Knowledge Architecture Professional Services


  • Features, Varieties, Vendors
  • Evaluation Process
    • Start with Self-Knowledge
    • Text Analytics Team
    • Features and Capabilities – Filter
  • Proof of Concept/Pilot
    • Themes and Issues
    • Case Study
  • Conclusion
kaps group general
KAPS Group: General
  • Knowledge Architecture Professional Services
  • Virtual Company: Network of consultants – 8-10
  • Partners – SAS, SAP, FAST, Smart Logic, Concept Searching, etc.
  • Consulting, Strategy, Knowledge architecture audit
  • Services:
    • Taxonomy/Text Analytics development, consulting, customization
    • Technology Consulting – Search, CMS, Portals, etc.
    • Evaluation of Enterprise Search, Text Analytics
    • Metadata standards and implementation
    • Knowledge Management: Collaboration, Expertise, e-learning
    • Applied Theory – Faceted taxonomies, complexity theory, natural categories
introduction to text analytics text analytics features
Introduction to Text AnalyticsText Analytics Features
  • Noun Phrase Extraction
    • Catalogs with variants, rule based dynamic
    • Multiple types, custom classes – entities, concepts, events
    • Feeds facets
  • Summarization
    • Customizable rules, map to different content
  • Fact Extraction
    • Relationships of entities – people-organizations-activities
    • Ontologies – triples, RDF, etc.
  • Sentiment Analysis
    • Statistical, rules – full categorization set of operators
introduction to text analytics text analytics features5
Introduction to Text AnalyticsText Analytics Features
  • Auto-categorization
    • Training sets – Bayesian, Vector space
    • Terms – literal strings, stemming, dictionary of related terms
    • Rules – simple – position in text (Title, body, url)
    • Semantic Network – Predefined relationships, sets of rules
    • Boolean– Full search syntax – AND, OR, NOT
    • Advanced – NEAR (#), PARAGRAPH, SENTENCE
  • This is the most difficult to develop
  • Build on a Taxonomy
  • Combine with Extraction
    • If any of list of entities and other words
varieties of taxonomy text analytics software
Varieties of Taxonomy/ Text Analytics Software
  • Taxonomy Management
    • Synaptica, SchemaLogic
  • Full Platform
    • SAS-Teragram, SAP-Inxight, Clarabridge, Smart Logic, Linguamatics, Concept Searching, Expert System, IBM, GATE
  • Embedded – Search or Content Management
    • FAST, Autonomy, Endeca, Exalead, etc.
    • Nstein, Interwoven, Documentum, etc.
  • Specialty / Ontology (other semantic)
    • Sentiment Analysis – Lexalytics, Lots of players
    • Ontology – extraction, plus ontology
evaluating taxonomy text analytics software start with self knowledge
Evaluating Taxonomy/Text Analytics Software Start with Self Knowledge
  • Strategic and Business Context
  • Info Problems – what, how severe
  • Strategic Questions – why, what value from the taxonomy/text analytics, how are you going to use it
  • Formal Process - KA audit – content, users, technology, business and information behaviors, applications - Or informal for smaller organization,
  • Text Analytics Strategy/Model – forms, technology, people
    • Existing taxonomic resources, software
  • Need this foundation to evaluate and to develop
evaluating taxonomy text analytics software start with self knowledge14
Evaluating Taxonomy/Text Analytics Software Start with Self Knowledge
  • Do you need it – and what blend if so?
  • Taxonomy Management Only
    • Multiple taxonomies, languages, authors-editors
  • Technology Environment – Text Mining, ECM, Enterprise Search – where is it embedded
  • Publishing Process – where and how is metadata being added – now and projected future
    • Can it utilize auto-categorization, entity extraction, summarization
  • Is the current search adequate – can it utilize text analytics?
  • Applications – text mining, BI, CI, Alerts?
design of the text analytics selection team
Design of the Text Analytics Selection Team
  • Traditional Candidates - IT
  • Experience with large software purchases
    • Search/Categorization is unlike other software
  • Experience with needs assessments
    • Need more – know what questions to ask, knowledge audit
  • Objective criteria
    • Looking where there is light?
    • Asking IT to select taxonomy software is like asking a construction company to select the design of your house.
  • They have the budget
    • OK, they can play.
design of the text analytics selection team16
Design of the Text Analytics Selection Team
  • Traditional Candidates - Business Owners
  • Understand the business
    • But don’t understand information behavior
  • Focus on business value, not technology
    • Focus on semantics is needed
  • They can get executive sponsorship, support, and budget.
    • OK, they can play
design of the text analytics selection team17
Design of the Text Analytics Selection Team
  • Traditional Candidates – Library, KM, Data Analysis
  • Understand information structure
    • But not how it is used in the business
  • Experts in search experience and categorization
    • Suitable for experts, not regular users
  • Experience with variety of search engines, taxonomy software, integration issues
    • OK, they can play
design of the text analytics selection team18
Design of the Text Analytics Selection Team
  • Interdisciplinary Team, headed by Information Professionals
  • Relative Contributions
    • IT – Set necessary conditions, support tests
    • Business – provide input into requirements, support project
    • Library – provide input into requirements, add understanding of search semantics and functionality
  • Much more likely to make a good decision
  • Create the foundation for implementation
evaluating text analytics software process
Evaluating Text Analytics Software – Process
  • Start with Self Knowledge
  • Eliminate the unfit
    • Filter One- Ask Experts - reputation, research – Gartner, etc.
      • Market strength of vendor, platforms, etc.
      • Feature scorecard – minimum, must have, filter to top 3
    • Filter Two – Technology Filter – match to your overall scope and capabilities – Filter not a focus
    • Filter Three – Focus Group one day visit – 3-4 vendors
  • Deep pilot (2) / POC – advanced, integration, semantics
  • Focus on working relationship with vendor.
evaluating text analytics software feature checklist and score
Evaluating Text Analytics SoftwareFeature Checklist and Score
  • Basic Features, Taxonomy Admin
    • New, copy, rename, delete, merge, node relationships
    • Scope Notes, spell check, versioning, node ID
    • Analytical reports – structure, application to documents
  • Usability, user documentation, training
  • Visualization – taxonomy structure
  • Language support
  • API/SDK, Import-Export – XML & SKOS
  • Standards, security, access roles & rights
  • Clustering – taxonomy node generation, sentiment
initial evaluation example outcomes
Initial Evaluation Example Outcomes
  • Filter One:
    • Company A, B – sentiment analysis focus, weak categorization
    • Company C – Lack of full suite of text analytics
    • Company D – business concerns, support
    • Open Source – license issues
    • Ontology Vendors – missing categorization capabilities
  • 4 Demos
    • Saw a variety of different approaches, but
    • Company X – lacking sentiment analysis, require 2 vendors
    • Company Y – lack of language support, development cost
evaluating taxonomy software poc
Evaluating Taxonomy SoftwarePOC
  • Quality of results is the essential factor
  • 6 weeks POC – bake off / or short pilot
  • Real life scenarios, categorization with your content
  • Preparation:
    • Preliminary analysis of content and users information needs
    • Set up software in lab – relatively easy
    • Train taxonomist(s) on software(s)
    • Develop taxonomy if none available
  • Six week POC – 3 rounds of development, test, refine / Not OOB
  • Need SME’s as test evaluators – also to do an initial categorization of content
evaluating taxonomy software poc23
Evaluating Taxonomy SoftwarePOC
  • Majority of time is on auto-categorization and/or sentiment
  • Need to balance uniformity of results with vendor unique capabilities – have to determine at POC time
  • Risks – getting software installed and working, getting the right content, initial categorization of content
  • Elements:
    • Content
    • Search terms / search scenarios
    • Training sets
    • Test sets of content
  • Taxonomy Developers – expert consultants plus internal taxonomists
evaluating taxonomy software poc test cases
Evaluating Taxonomy SoftwarePOC: Test Cases
  • Auto-categorization to existing taxonomy – variety of content
  • Clustering – automatic node generation
  • Summarization
  • Entity extraction – build a number of catalogs – design which ones based on projected needs – example privacy info (SS#, phone, etc.)
    • Entity example –people, organization, methods, etc.
  • Sentiment – Best Buy phones
  • Evaluate usability in action by taxonomists
  • Integration – with ontologies
  • Output – XML, API’s
evaluating taxonomy software poc issues
Evaluating Taxonomy SoftwarePOC - Issues
  • Quality of content
  • Quality of initial human categorization
  • Normalize among different test evaluators
  • Quality of taxonomists – experience with text analytics software and/or experience with content and information needs and behaviors
  • Quality of taxonomy
    • General issues – structure (too flat or too deep)
    • Overlapping categories
    • Differences in use – browse, index, categorize
  • Categorization essential issue is complexity of language
    • Good sentiment is based on categorization
  • Entity Extraction essential issue is scale and disambiguation
case study telecom service
Case Study: Telecom Service
  • Company History, Reputation
  • Full Platform –Categorization, Extraction, Sentiment
  • Integration – java, API-SDK, Linux
  • Multiple languages
  • Scale – millions of docs a day
  • Total Cost of Ownership
  • Ease of Development - new
  • Vendor Relationship – OEM, etc.
  • Expert Systems
  • IBM
  • SAS - Teragram
  • Smart Logic
  • Option – Multiple vendors – Sentiment & Platform
poc design discussion evaluation criteria
POC Design Discussion: Evaluation Criteria
  • Basic Test Design – categorize test set
    • Score – by file name, human testers
  • Categorization – Call Motivation
    • Accuracy Level – 80-90%
    • Effort Level per accuracy level
  • Sentiment Analysis
    • Accuracy Level – 80-90%
    • Effort Level per accuracy level
  • Quantify development time – main elements
  • Comparison of two vendors – how score?
    • Combination of scores and report
text analytics poc outcomes vendor comparisons
Text Analytics POC OutcomesVendor Comparisons
  • Categorization Results – both good, edge to SAS on precision
    • Use of Relevancy to set thresholds
  • Development Environment
    • IBM as toolkit provides more flexibility but it also increases development effort
  • Methodology – IBM enforces good method, but takes more time
    • SAS can be used in exactly the same way
  • SAS has a much more complete set of operators – NOT, DIST, START
text analytics poc outcomes vendor comparisons functionality
Text Analytics POC OutcomesVendor Comparisons - Functionality
  • Sentiment Analysis – SAS has workbench, IBM would require more development
    • SAS also has statistical modeling capabilities
  • Entity and Fact extraction – seems basically the same
    • SAS and use operators for improved disambiguation –
  • Summarization – SAS has built-in
    • IBM could develop using categorization rules – but not clear that would be as effective without operators
  • Conclusion: Both can do the job, edge to SAS
poc and early development risks and issues
POC and Early Development: Risks and Issues
  • CTO Problem –This is not a regular software process
  • Semantics is messy not just complex
    • 30% accuracy isn’t 30% done – could be 90%
  • Variability of human categorization
  • Categorization is iterative, not “the program works”
    • Need realistic budget and flexible project plan
  • Anyone can do categorization
    • Librarians often overdo, SME’s often get lost (keywords)
  • Meta-language issues – understanding the results
    • Need to educate IT and business in their language
  • Start with self-knowledge – what will you use it for?
    • Current Environment – technology, information
  • Basic Features are only filters, not scores
  • Integration – need an integrated team (IT, Business, KA)
    • For evaluation and development
  • POC – your content, real world scenarios – not scores
  • Foundation for development, experience with software
    • Development is better, faster, cheaper
  • Categorization is essential, time consuming
  • Sentiment / VOC without categorization will fail


Tom Reamytomr@kapsgroup.com

KAPS Group

Knowledge Architecture Professional Services