
Information Quality in Practice: The Good, the Bad and the Ugly


Presentation Transcript


  1. Information Quality in Practice: The Good, the Bad and the Ugly. November 16, 2005. Leon Schwartz, Ph.D., Informed Decisions Group.

  2. What Me Worry? Business Champions for TDQM Programs are scarce, because Data Quality is difficult to define & measure, even though Poor Data Quality costs Billions of dollars.

  3. Information Quality in Practice • Prologue: Poor Data Costs $Billions • The Good: You Can Clean it Up • The Bad: The Cost of Avoidance • The Ugly: The Pogo Effect • Epilogue: What is Data Quality, anyway?

  4. Poor Data Quality Costs $Billions • Data quality problems cost U.S. businesses $611 billion a year. • 40% of firms have suffered losses. • 2% of customer records become obsolete within one month. • Customer duplication rates range from 5% to 20%. • The Web is increasing data entry errors. Source: Data Warehouse Institute Study, 2002

  5. Effects of Bad Customer Data • Low credibility among customers & suppliers • Poor decision making • Lost customers/clients • Unnecessary printing & postage • Poor customer service • Lost business opportunities • Inefficient utilization of staff

  6. Data Affects Your Success. Relative influence of People, Process, Politics, DATA, and Algorithm on an OR/MS project.

  7. Room for Improvement • Only 11% have implemented a DQ program* • 48% have no plan for a program • 26% have purchased a data quality tool* • 52% have no plans to buy one • Still very far from 6 Sigma! • Easy to improve Quality, if… *Source: Data Warehouse Institute Study, 2002

  8. …You Can Answer the Following • How good is good enough? • How often is often enough? • How much is it worth? Information must be Useful.

  9. Information Quality in Practice • Prologue: Poor Data costs $billions • The Good: You can Clean it up • The Bad: The Cost of Avoidance • The Ugly: The Pogo Effect • Epilogue: What is Data Quality, anyway?

  10. Data Quality Starts with Access • Data does not exist anywhere • Exists, but you can’t find it • You found it, but you can’t get to it • You can get to it, but you don’t have authority to use it • You can use it, but it is a total MESS (“I never realized HOW BAD!”) • Data Warehouse NIRVANA! It’s dirty, but useful. (CDI)

  11. Data Quality & the Data Warehouse “I never realized HOW BAD our data is!” Integrating data can improve Quality, if you… • Quality Control the Match • Measure & Improve Integrity • Flag “out of range” Values • Manually examine BIG “leftovers” • Audit a random sample of Customers

  12. Matching Improves Quality. Diagram: customer keys (Group ID, Account ID, DUNS) are matched on Name, Address, and Phone using match Rules; the matching Operations are Cleanse, Transform, and Consolidate.
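
As a concrete (and heavily simplified) illustration of the cleanse / transform / consolidate flow, here is a minimal Python sketch, assuming customer records are plain dictionaries and that candidate duplicates are grouped on a normalized name + address + phone key. The field names and normalization rules are assumptions for illustration, not the matching rules behind this slide.

```python
import re
from collections import defaultdict

def normalize(record):
    """Cleanse/transform: build a crude match key from name, address, and phone.
    The rules (lowercasing, stripping punctuation, keeping the last 4 phone digits)
    are illustrative only, not the rules used in the deck."""
    name = re.sub(r"[^a-z0-9 ]", "", record["name"].lower()).strip()
    addr = re.sub(r"\s+", " ", re.sub(r"[^a-z0-9 ]", "", record["address"].lower())).strip()
    phone = re.sub(r"\D", "", record.get("phone", ""))[-4:]
    return (name, addr, phone)

def consolidate(records):
    """Consolidate: group records that share a match key into one candidate duplicate group."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize(rec)].append(rec)
    return groups

customers = [
    {"id": 1, "name": "Acme Corp.", "address": "12 Main St",  "phone": "555-010-1234"},
    {"id": 2, "name": "ACME Corp",  "address": "12 Main St.", "phone": "(555) 010 1234"},
]
for key, group in consolidate(customers).items():
    if len(group) > 1:
        print("duplicate candidates:", [r["id"] for r in group])
```

In practice the grouping key would be replaced by vendor match rules and survivorship logic; the point of the sketch is only that matching requires an explicit, repeatable cleanse/transform step before consolidation.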

  13. Establish Q. A. Procedures • Use a common sample • Establish replicable process • Document carefully • Realize the subjectivity • Train the Vendor • Audit the Vendor

  14. Quality Control Your Match

  15. Document Integrity Rules and UPDATE

  16. Measure & Reduce Violations
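
To show what measuring rule violations can look like in practice, here is a minimal sketch, assuming integrity rules are written as named, documented predicates over a customer record. The rules and field names (missing_name, bad_zip, negative_balance) are hypothetical.

```python
from collections import Counter

# Hypothetical integrity rules, each a documented, named predicate over one customer record.
RULES = {
    "missing_name": lambda r: not r.get("name"),
    "bad_zip": lambda r: not (str(r.get("zip", "")).isdigit() and len(str(r.get("zip", ""))) == 5),
    "negative_balance": lambda r: r.get("balance", 0) < 0,
}

def count_violations(records):
    """Tally violations per rule so the counts can be tracked (and driven down) over time."""
    tally = Counter()
    for rec in records:
        for rule_name, is_broken in RULES.items():
            if is_broken(rec):
                tally[rule_name] += 1
    return tally

sample = [
    {"name": "Acme", "zip": "44114", "balance": 100},
    {"name": "", "zip": "4411", "balance": -5},
]
print(count_violations(sample))  # Counter({'missing_name': 1, 'bad_zip': 1, 'negative_balance': 1})
```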

  17. Flag “out of range” Values. Looking at counts saves the day.
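
A minimal sketch of the "look at the counts" idea: tally every distinct value in a column and flag anything outside the documented valid set. The status column and its valid codes are assumptions for illustration.

```python
from collections import Counter

VALID_STATUS = {"A", "I", "P"}  # hypothetical valid codes: Active, Inactive, Prospect

status_values = ["A", "A", "I", "Z", "A", "", "P", "Z"]  # a column pulled from the warehouse

counts = Counter(status_values)
print("value counts:", counts)

# Anything outside the documented valid set (or blank) gets flagged for investigation.
out_of_range = {value: n for value, n in counts.items() if value not in VALID_STATUS}
print("out-of-range values to investigate:", out_of_range)
```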

  18. Manually Examine BIG “Leftovers” (Pareto).
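
To illustrate the Pareto view of the big "leftovers", here is a minimal sketch, assuming each unmatched record has already been tagged with a defect category; the categories and counts below are invented for illustration.

```python
# Hypothetical counts of leftover (unmatched) records by defect category.
leftovers = {
    "blank address": 4200,
    "foreign format": 900,
    "name only": 650,
    "test records": 300,
    "other": 150,
}

total = sum(leftovers.values())
cumulative = 0
print(f"{'category':<16}{'count':>8}{'cum %':>8}")
for category, count in sorted(leftovers.items(), key=lambda kv: kv[1], reverse=True):
    cumulative += count
    print(f"{category:<16}{count:>8}{cumulative / total:>8.0%}")
# The few biggest categories usually cover most of the leftovers,
# so manual examination starts at the top of this list.
```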

  19. Ensuring Data Quality: Begins and ends with the CUSTOMER • Focus on the PROCESS (TQM) • Define Quality Metrics (KPIs) • Use Data Cleansing Tools • NCOA • Type “data cleansing” in Google for a list • Document everything • Audit regularly • Test, test, test • Who is using the data? How?

  20. Information Quality in Practice • Prologue: Poor Data costs $billions • The Good: You can Clean it up • The Bad: The Cost of Avoidance • The Ugly: The Pogo Effect • Epilogue: What is Data Quality, anyway?

  21. Who’s Cleaning Up? (CDI) • Data Quality Software Vendors • IBM (acquired Ascential, which had acquired Vality) • SAS (acquired DataFlux) • Harte-Hanks (acquired Trillium) • Firstlogic, Unitech, Innovative Systems • Similarity Systems (acquired Evoke Software) • Address Matching & Cleansing Vendors • Pitney Bowes acquired Group 1 (4/05) and Firstlogic (???) • Plus 100s of service bureaus • Specialty houses • E.g., Comanage for telecom companies …and the data is still dirty.

  22. Information Requirements are Relative • Strategic objectives or goals • Who are the clients (THEY) • What THEY need • When they need it • Where they need it • How they need it "...I'll know it when I see it."

  23. Data Quality Programs are Rare. Scope the Effort • Data Discovery: Information Inventory; “as-is” processes; Information Priorities • Develop DQ Rules: Data Description; Simple Data Checks; Data Mining • Categorize Data Defects: integrity, retention, refresh, reliability; classify defects & causes • Define DQ Program, Launch & Track: Metrics, KPIs

  24. Dealing with DENIAL is Daunting • Expose shoddy business processes • Change business practices • Agree on common definitions, rules, roles • Train employees • Tackle political/cultural issues

  25. Information Quality in Practice • Prolog: Poor Data costs $billions • The Good: You can Clean it up • The Bad: The Cost of Avoidance • The Ugly: The Pogo Effect • Epilogue: What is Data Quality, anyway?

  26. Sources of Errors • Technical: careless calculations; poor programming • Process: human error; negligence; intent (policy) • Political

  27. KPI: Fix the Basics: Customer Master
  Target (level 1): Cleanse 6.9 million root records; eliminate duplicate customer records (est x %); eliminate inactive customer records (est x %); reduce business processes creating incorrect Customer Information; populate and interface SAP Customer Master; Customer Master live by Dec. 31, 2002.
  Target (level 3): Customer Master live by 1Q 03.
  Actual / Forecast:
  • To-Be business processes complete
  • Data quality activities: 1.6 MM obsolete records identified and purged; 2.3 MM duplicates identified, 325K identified for elimination (customers confirmed the need for 1.95 MM duplicates, based upon current capabilities); cleansed 3.6 M U.S. records (via Finalist, Customer Contact); D&B DUNS linkage in process, with 577 K duplicates and 2.9 M unique DUNS customers identified
  • Analyzed and improving processes which create bad data: identified and documented sources of create/update/delete to legacy customer records; removed change authorization from 2,940 employees, primarily Sales, Service, Product Supply, and PBCC New Business Operations; identified and corrected 4 significant (and numerous minor) legacy systems problems creating incorrect and/or duplicate customer information
  • Conversion to SAP environment: production environment complete; 34 interface and conversion development activities; Customer Master live (converted from IMS to SAP) on track for December 6
  • User training: user and power-user training developed; Power User Boot Camp training completed November 22; end-user training (1,300 users) scheduled for January

  28. Avoiding Errors (“This customer already is in our database.”) • Technical: Error Trapping; CMM program • Process: Edit checks; Training; Streamlining • Political: Culture change
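
A minimal sketch of the "this customer already is in our database" edit check, assuming a simple in-memory store keyed on a normalized company name; the key, class names, and error handling are illustrative, not a description of any production system.

```python
class DuplicateCustomerError(Exception):
    """Raised by the edit check when a matching customer already exists."""

class CustomerRegistry:
    """A toy in-memory customer store with a duplicate-trapping edit check on create."""

    def __init__(self):
        self._by_key = {}

    @staticmethod
    def _key(name):
        # Illustrative normalization; a real edit check would also use address, phone, DUNS, etc.
        return " ".join(name.lower().split())

    def add(self, name):
        key = self._key(name)
        if key in self._by_key:
            raise DuplicateCustomerError(
                f"This customer already is in our database: {self._by_key[key]!r}")
        self._by_key[key] = name
        return name

registry = CustomerRegistry()
registry.add("Acme Corp")
try:
    registry.add("  ACME   corp ")
except DuplicateCustomerError as err:
    print(err)  # the edit check traps the duplicate before a second record is created
```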

  29. Unreliable Cancellation Data Creates a “Lose-Lose-Lose” • Suspect cancellations identified • Audit reports sent to the field • VP of Sales fired • Customer Retention • Executive focus • The Pogo Effect • Fix the Basics • “Software enhancement” • Order to Cash • “All fixed for 2005”

  30. Taking I.Q. to the Next Level • Merge/Purge/Address Hygiene no longer good enough • Move from Repair to Correct to Prevent • Organizational Change, Compromise and Accountability impact program budget • How to JUSTIFY $$ when I.Q. is so fuzzy??

  31. Information Quality in Practice • Prologue: Poor Data costs $billions • The Good: You can Clean it up • The Bad: The Cost of Avoidance • The Ugly: The Pogo Effect • Epilogue: What is Data Quality, anyway?

  32. It’s All About Perception • We’ve had this problem for 20 years. • We’ve known we had this problem for 10 years. • Every organization has the problem. • We know it will cost to improve it. • How much of an improvement can I buy? • What is the ROI? • Can I believe what you tell me? "...I'll know it when I see it."

  33. Wang & Strong identify 179 I.Q. attributes • Ponniah defines 17 • Redman defines 27 • Marakas defines 11

  34. Where to Start? • Too many definitions: no clarity • Need to focus! • Most include ACCURACY as one dimension • Even Accuracy is a fuzzy concept • What are “errors”? • What are “true” values? “false”? “suspect”? • Can we even measure accuracy “accurately”?

  35. Even the Lexicon of Terms is Fuzzy: Quality >> Accuracy >> Error-free • Direct observation of “errors”: subjective; unreliable; impractical even with moderate-size data sets; high cost • Automated error reports: who creates the rules?; needs to be audited; misses subtleties; lower cost • A Major Research Challenge

  36. You be the JUDGE Find the Errors

  37. The Impact of Context is Clear

  38. What about Cognition?

  39. The 3 C’s: Content, Context, Cognition. (Diagram labels: Performance, Preference, Perception, Functional, Analytical, Experience, SME Aptitude.)

  40. Thank you for your time. Leon Schwartz, www.informeddecisionsgroup.com. The Data Quality Perception Research Website: http://www.xkimo.com/dqpresearch/

  41. Omit the Analysts

  42. Research Design • Samples created with 0-15 errors (17% max) • Samples randomly presented (see website) • Practice session (6 samples) • Respondents asked to rate 16 samples on 1-30 scale (modified Magnitude Estimation) • Double anchors used • 63 students (grad & undergrad) attempted
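
For readers who want to see the sample-construction step concretely, here is a minimal sketch, assuming a small clean table of numbers into which 0-15 errors are seeded at random positions before the samples are shuffled into a random presentation order. The table contents and the error type (nudging a digit) are assumptions, not the study's actual stimuli.

```python
import copy
import random

random.seed(42)  # reproducibility for this sketch only

def make_clean_table(rows=6, cols=15):
    """A hypothetical clean 'report': 6 x 15 cells of two-digit numbers
    (90 cells, so 15 seeded errors is roughly the 17% maximum on the slide)."""
    return [[random.randint(10, 99) for _ in range(cols)] for _ in range(rows)]

def seed_errors(table, n_errors):
    """Copy the clean table and corrupt n_errors randomly chosen cells."""
    corrupted = copy.deepcopy(table)
    cells = [(r, c) for r in range(len(table)) for c in range(len(table[0]))]
    for r, c in random.sample(cells, n_errors):
        corrupted[r][c] += random.choice([-9, -1, 1, 9])  # a small, plausible-looking mistake
    return corrupted

clean = make_clean_table()
samples = [(n, seed_errors(clean, n)) for n in range(16)]  # 0 through 15 seeded errors
random.shuffle(samples)                                    # randomize presentation order
for n_errors, _table in samples[:3]:
    print("presenting a sample with", n_errors, "seeded errors")
```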

  43. The Simple Task: “Please examine the data/report above, and estimate the accuracy of the information by placing your cursor and clicking on the line below.” Scale anchors: Error Prone / Low Accuracy (too many mistakes to be useful) to Error Free / High Accuracy (no discernible mistakes). [Callout: Anchor Study Fiasco!]

  44. The Perceptual Transfer Function (plot of perceived accuracy, subjective, versus number of errors and error rate, objective).
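
The plot itself is not reproduced here, but the idea of a perceptual transfer function can be sketched as a curve fit relating objective error rate to subjectively perceived accuracy. The data points and the power-function form below are assumptions chosen for illustration (power functions are a common summary for magnitude-estimation data); they are not the study's results.

```python
import numpy as np

# Hypothetical (error_rate, mean perceived-accuracy rating on the 1-30 scale) pairs.
error_rate = np.array([0.01, 0.02, 0.05, 0.08, 0.12, 0.17])
perceived  = np.array([27.0, 24.0, 18.0, 14.0, 11.0,  9.0])

# Fit perceived ~ a * error_rate**b by linear regression in log-log space.
# (A power function is a common summary for magnitude-estimation data; the form is assumed here.)
b, log_a = np.polyfit(np.log(error_rate), np.log(perceived), 1)
a = np.exp(log_a)
print(f"perceived accuracy ~ {a:.1f} * error_rate ** {b:.2f}")

# The fitted curve is the 'perceptual transfer function': it maps an objective
# error rate to the accuracy level respondents report perceiving.
print(np.round(a * error_rate ** b, 1))
```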

  45. All Graduate Students

  46. Business Professionals
