
Ron Forino DAMA - Washington, DC September 1999 PowerPoint PPT Presentation



Project Driven

Data Quality Improvement

Ron Forino

DAMA - Washington, DC

September 1999


Examples

  • According to DM Review, one European company discovered through an audit that it was not invoicing 4% of its orders. With $2 billion in revenues, that meant $80 million went unpaid.

  • Electronic data audits show that the invalid data values in the typical customer database average around 15 - 20%. Physical audits suggest that this number may be closer to 25 - 30%.

  • In 1992, 96,000 IRS tax refund checks were returned “undeliverable” due to incorrect addresses.

  • This year, incorrect price data in retail databases will cost American consumers as much as $2.5 billion in overcharges.

  • According to organizations like the Data Warehouse Institute, the Gartner Group and MetaGroup, data quality is one of the top three success factors in data warehousing.

  • The average mid-sized company may have 30,000 - 50,000 fields in files, tables, screens, reports, etc. [Platinum Technology]


    Agenda

    • Definitions

    • What is Data Quality?

    • Tactics and the End Game

    • Building Blocks to Data Quality

      • Tactical Initiatives

      • Strategic Initiatives

    • Tactical Data Quality

      • Rule Disclosure

      • Data Quality Measurement, Analysis and Certification

      • Meta Data Creation

      • Validation

      • Quality Improvement


    Definitions

    • Data Transformation - Changing data values to a format consistent with integrity and business rules agreed to by data stakeholders.

    • Data Cleansing - Consolidation of redundant customer records. The term describes the “merging and purging” of customer lists to reduce duplicate or inaccurate customer records.

    • Data Quality Improvement - The process of improving data quality to the level desired to support the enterprise information demand.

    • Data Quality - definition to follow…
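
The “merging and purging” described under Data Cleansing can be illustrated with a minimal sketch. The match key (normalized name plus ZIP code), the field names and the sample records are all assumptions made for illustration; the presentation does not specify a matching rule, and real merge/purge tools use far more elaborate matching.

```python
def match_key(record):
    """Build a match key from name and ZIP (hypothetical matching rule)."""
    name = record["name"].lower().replace(".", "").replace(",", "")
    name = " ".join(name.split())  # collapse repeated whitespace
    return (name, record["zip"].strip())


def merge_purge(records):
    """Keep one record per match key, filling blank fields from duplicates."""
    merged = {}
    for rec in records:
        key = match_key(rec)
        if key not in merged:
            merged[key] = dict(rec)  # first occurrence becomes the survivor
            continue
        survivor = merged[key]
        for field, value in rec.items():
            # Merge: only fill fields the survivor is missing
            if not survivor.get(field) and value:
                survivor[field] = value
    return list(merged.values())


# Hypothetical customer list with one duplicate pair
customers = [
    {"name": "John  Smith", "zip": "20001", "phone": ""},
    {"name": "john smith.", "zip": "20001 ", "phone": "555-0100"},
    {"name": "Mary Jones", "zip": "22201", "phone": "555-0199"},
]
deduped = merge_purge(customers)
print(len(customers), "records in,", len(deduped), "records out")
```

The surviving record keeps the first spelling of the name and picks up the phone number from its duplicate, which is the "merge" half of the process; dropping the second record is the "purge" half.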


    Data Quality Improvement Decision Tree

    [Decision tree diagram. Column headings: Task, Process. Node labels: Transform, Conform to Business Rule, Data Reengineering, Data Quality Improvement, Process Reengineering, Standardize, Validate, Match, Dedupe, Integrate, Enrich, Match & Dedupe, Data Cleansing.]


    Tactics and The End Game

    “We need better data quality...”

    Enterprise Initiative

    Select Project

    Data Quality Assessment

    Report & Recommendations

    Source System Clean-up Initiative


    Tactics and The End Game

    “We need better data quality...”

    Data Warehouse

    Enterprise Initiative

    Select Project

    Data Quality Assessment

    Data Quality Assessment

    Staging Specifications

    Report

    Report & Recommendations

    Source System Clean-up Initiative

    Source System Clean-up Initiative


    What is [Good] Data Quality?


    How Can We Know Good Data Quality?

    • Column 1

    • 321453

    • 212392

    • 093255

    • 214421

    • .

    • .

    • .

    Is this Good Data Quality?

    What can we conclude?


    What is Data Quality?

    • Information Quality = f(Definition + Data + Presentation)

    • Definition

      • Defines Data

      • Domain Value Specification

      • Business Rules that Govern the Data

      • Information Architecture Quality

    • Data Content

      • Completeness

      • Validity/Reasonability

    • Data Presentation

      • Accessible

      • Timely

      • Non-ambiguous


    Common Data Quality Problems

    • Data Content

      • Missing Data

      • Invalid Data

      • Data Outside Legal Domain

      • Illogical Combinations of Data

    • Structural

      • Record Key Integrity

      • Referential Integrity

      • Cardinality Integrity

    • Migration/Integration

      • Rationalization Anomalies

      • Duplicate or Lost Entities

    • Definitions and Standards

      • Ambiguous Business Rules

      • Multiple Formats for Same Data Elements

      • Different Meanings for the Same Code Value

      • Multiple Code Values with the Same Meaning

      • Field Used for Unintended Data

      • Data in Filler

      • Y2K Violation


    Building Blocks to Data Quality


    Building Blocks of a Data Quality Program

    Benefits Realization

    • Strategic

      • Defect Prevention

      • DQ Requirements

      • Quality Reengineering

      • Enterprise Cultural Shift

      • QC/Process Auditing

      • Data Stewardship

    • Tactical

      • Validation

      • Quality Improvement

      • Meta Data Creation

      • Measure

      • Analyze & Certify

      • Rule Disclosure


    Tactical Data Quality


    Steps to Tactical Data Quality

    1. Rule Disclosure

    2. Measure Quality

    3. Analyze & Certify

    4. Meta Data Creation

    5. Validation

    6. Quality Improvement


    Rule Disclosure


    Sources of Meta Data

    • Legacy Meta Data

      • Data Models, Process Models

      • Data Dictionary, Definitions, Aliases

      • Glossary of Terms

    • Transformation Meta Data

      • Data Mapping

      • Transformation Rules

      • Error Handling Rules

    • Access Meta Data

      • Data Directory

      • Data Definitions

    • The Subject Matter Expert

    • Database Directory

    • Domain Values, Range of Values

    • Run Books

    • Derived Data Calculations

    • Audit Statistics

    • Source & Transformation


    Acquiring good Meta Data is Essential

    • Collect Documentation → Validate the Meta Data → Assess the Data → Report Findings

    • Collect Documentation → Assess the Data → Report Findings → Validate Findings

    • Preferred: Collect Valid Meta Data → Assess the Data → Report Findings

    Meta Data can be gathered before, during or after the Assessment

    “You can pay me now, or you can pay me later…”


    Measuring Data Quality

    • Techniques

    • Tools

    • Methods


    How can Data Quality be Measured?

    “One accurate measurement is worth a thousand expert opinions”

    [Grace Hopper, Admiral, US Navy]

    • Customer Complaints

    • User Interviews & Feedback

    • Customer Satisfaction Survey

    • Data Quality Requirements Gathering

    • Data Quality Assessments


    Measuring Data Quality - Tools

    • Analysis Tools

      • Specifically designed assessment tools

        • Quality Manager, Migration Architect

        • N & A: Trillium, Group-1, ID Centric, Finalist, etc.

      • Improvisations

        • SAS, Focus, SQL, other query tools

    • Other Necessary Tools

      • File Transfer

      • Data Conversion


    Assessment Measurements

    • Level 1: Completeness

      • Nulls or Blanks

      • Misuse (or overuse) of Default Values

    • Level 2: Validity

      • Data Integrity Anomalies

      • Invalid Data based on Business Rule

    • Level 3: Structural Integrity

      • Primary Key Uniqueness

      • Key Structure (Cardinality, Referential Integrity, Alternate Keys)

    • Level 4: Business Rule Violations

      • Relationship between two or more fields

      • Calculations

    Field Integrity

    Intuitive Integrity Rules

    Business Rule Integrity

    Requiring Meta Data
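
The four assessment levels above can be sketched as simple checks over a set of records. This is an illustration only: the record layout, the default phone value, the list of valid states and the "total = quantity × unit price" rule are all hypothetical, standing in for whatever meta data a real assessment would supply.

```python
def assess(records, defaults, valid_states, known_customers):
    """Count anomalies per assessment level for a list of order dicts."""
    findings = {"completeness": 0, "validity": 0,
                "structural": 0, "business_rule": 0}
    seen_ids = set()
    for rec in records:
        # Level 1: nulls, blanks, or default values standing in for real data
        phone = (rec.get("phone") or "").strip()
        if phone == "" or phone in defaults:
            findings["completeness"] += 1
        # Level 2: value outside the legal domain
        if rec.get("state") not in valid_states:
            findings["validity"] += 1
        # Level 3: primary key uniqueness and referential integrity
        duplicate_key = rec["order_id"] in seen_ids
        orphan = rec["customer_id"] not in known_customers
        if duplicate_key or orphan:
            findings["structural"] += 1
        seen_ids.add(rec["order_id"])
        # Level 4: relationship between fields (a calculation check)
        expected = round(rec["quantity"] * rec["unit_price"], 2)
        if expected != round(rec["total"], 2):
            findings["business_rule"] += 1
    return findings


# Hypothetical sample data
orders = [
    {"order_id": 1, "customer_id": "C1", "phone": "555-0100", "state": "VA",
     "quantity": 2, "unit_price": 5.0, "total": 10.0},
    {"order_id": 2, "customer_id": "C9", "phone": "999-9999", "state": "XX",
     "quantity": 1, "unit_price": 3.0, "total": 4.0},
    {"order_id": 2, "customer_id": "C1", "phone": "", "state": "DC",
     "quantity": 3, "unit_price": 1.5, "total": 4.5},
]
report = assess(orders, defaults={"999-9999"},
                valid_states={"VA", "DC", "MD"},
                known_customers={"C1", "C2"})
print(report)
```

Levels 1 and 3 need little beyond the data itself, while levels 2 and 4 depend on domain lists and rules passed in as arguments, which mirrors the slide's split between intuitive integrity rules and rules requiring meta data.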


    Report Card

    Analyze and Certify

    • Identifying Problems

    • Sizing up Problems

    • “To Certify or Not to Certify…”


    Template - field level

    • Value - the domain occurrence

    • Frequency - the number of occurrences within the data set

    • Percent - the % of the whole set

    • 88 Info - the copybook definition for the value

    • Analysis - comments about our findings
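
A field-level report with these columns can be approximated with a frequency count. The sketch below is one possible shape, not the tool used in the presentation; the status codes and the copybook names are hypothetical, and the Analysis column is left out because it holds the analyst's own comments.

```python
from collections import Counter


def field_profile(values, copybook=None):
    """Return rows of (value, frequency, percent, 88 info), most frequent first."""
    copybook = copybook or {}
    counts = Counter(values)
    total = len(values)
    rows = []
    for value, freq in counts.most_common():
        percent = round(100.0 * freq / total, 1)
        info = copybook.get(value, "(undocumented)")
        rows.append((value, freq, percent, info))
    return rows


# Hypothetical field contents and 88-level names from a copybook
status_codes = ["A", "A", "I", "A", "", "X", "A", "I"]
copybook_88 = {"A": "ACTIVE", "I": "INACTIVE"}

rows = field_profile(status_codes, copybook_88)
for value, freq, percent, info in rows:
    print(f"{value!r:6} {freq:5} {percent:6}%  {info}")
```

Values that appear in the data but not in the copybook show up as "(undocumented)", which gives the analyst a starting point for the Analysis column.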


    Identifying Problems

    [Figure: field-level frequency report with callouts 1, 2 and 3.]

    • Analysis (and Discovery)

    • 1. Is the field required? If so, blanks indicate an anomaly.

    • 2. Are the values “ID206” and “STANG” allowed? (Is this a problem with the data or the Meta Data?)

    • 3. Some values occur in only 1.3% of the records. Is this telling us there is a problem?
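
The three questions above can be turned into automatic flags that point an analyst at values worth a closer look. In this sketch the allowed domain ("FORD", "CHEVY"), the sample counts and the 2% rarity threshold are assumptions; only the values “ID206” and “STANG” come from the slide. A flag is a prompt for investigation, not a verdict, since the problem may lie in the meta data rather than the data.

```python
from collections import Counter


def flag_field(values, required, allowed, rare_pct=2.0):
    """Return {value: [flags]} for values that deserve an analyst's attention."""
    counts = Counter(v.strip() for v in values)
    total = sum(counts.values())
    flags = {}
    for value, freq in counts.items():
        notes = []
        if value == "":
            # Question 1: blanks are anomalies only if the field is required
            if required:
                notes.append("blank in required field")
        elif value not in allowed:
            # Question 2: value missing from the documented domain
            notes.append("not in documented domain")
        # Question 3: unusually rare values
        if 100.0 * freq / total < rare_pct:
            notes.append("rare value")
        if notes:
            flags[value] = notes
    return flags


# Hypothetical field contents: 100 records
values = ["FORD"] * 60 + ["CHEVY"] * 35 + ["STANG"] * 3 + ["ID206"] + [""]
flags = flag_field(values, required=True, allowed={"FORD", "CHEVY"})
for value, notes in flags.items():
    print(repr(value), "->", ", ".join(notes))
```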


    Data Quality Scoring


    Example: Poor Data Quality


    Field Analysis

    In a range of values, in the absence of domain rules, investigate the first and last 0.2%.

    [Figure: bell curve distribution]
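
A minimal sketch of this tail check, assuming the field can be sorted and that at least one value is pulled from each end even for small data sets. The `ages` sample is hypothetical.

```python
import math


def tails(values, fraction=0.002):
    """Return the lowest and highest `fraction` of values for manual review."""
    ordered = sorted(values)
    # Always inspect at least one value at each end
    k = max(1, math.ceil(len(ordered) * fraction))
    return ordered[:k], ordered[-k:]


# Hypothetical age field: mostly plausible values plus a few suspicious ones
ages = list(range(18, 90)) * 20 + [0, 0, 1, 199, 999]
low, high = tails(ages)
print("lowest:", low)
print("highest:", high)
```

With 1,445 values the check pulls three from each end, which is enough here to surface the zero ages and the 199 and 999 entries without any documented domain rule.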


    Management Reporting - Short Engagement


    Management Reporting - Status


    Management Reporting - Anomalies


    Management Reporting - Productivity


    Meta Data Creation


    Example: Data Quality Repository

    Newly Discovered Rules


    Meta Data Supply Chain

    [Diagram: meta data supply chain. Box labels: Definition & Domain Meta Data Gathering; Data Quality Statistical Reports; Knowledge Management; DQ Assessment; Transformation & Edit Recommendations; Data Quality & Definition Validation; SME Validation; Data Cleansing; Update Meta Data; Field Name; Data Inventory; Meta Data Work Groups; Data Requirements.]


    Results Validation


    Report Validation

    SME validation… an opportunity to improve Meta Data

    1. Supply a clear name for the field.

    2. Is there a good definition?

    3. Can the business rules be made public?

    4. Will the SME initiate a data cleansing initiative?

    5. Does the SME recommend edit or data transformation rules?

    6. Are the findings consistent with the SME's expectations?

    Report Sections

    • Identification

    • Field Definition & Rules

    • Score & Explanation

    • Statistical Reports & Analysis

    [Figure: sample report with callouts 1 through 6 keyed to the questions above.]


    Quality Improvement


    Next Steps

    [Diagram: next steps. Box labels: Continued Monitoring; Legacy Data Extractions; Monthly Reports; Information Management Objectives; Initiatives; Data Clean-up; Legacy System Enhancements & Re-engineering; Management Report & Recommendations; Perform Baseline Assessment (Discovered Business Rules); Steering Committee; Data Migration; Transformation & Cleansing Specifications; Metadata, Models, Reports, etc.]


    Lessons Learned - Data Cleanup

    [Chart: cost ($$) of data cleanup, with axes labeled 100% Completeness and 100% Accuracy.]

    • More complete, more error prone

    • Most complete, most accurate, most costly, most timely

    • More accurate, less data


    Summary

    • We made the distinction between:

      - Data Migration

      - Data Quality

      - Data Cleansing

    • We defined what “good” data quality is.

    • We discussed the 10 or more processes that can take place in building a comprehensive data quality program for the enterprise.

      - Tactical should precede the Strategic [or be the 1st step of ]

    • There are 6 steps to an effective tactical data quality initiative:

      - Rule Disclosure

      - Quality Measurement

      - Analyze and Certify

      - Meta Data Creation

      - Validation

      - Quality Improvement


    Reference Material

    • The Deming Management Method (Total Quality Management), Mary Walton

    • Data Quality for the Information Age, Tom Redman

    • The Data Warehouse Challenge: Taming Data Chaos, Michael Brackett

    • Improving Data Warehouse and Business Information Quality, Larry English

    • DM Review Magazine, Information Quality series by Larry English


    Ron Forino

    Director, Business Intelligence

    DMR Consulting Group

    (732)549-4100 X-8292
