slide1
Download
Skip this Video
Download Presentation
Ron Forino DAMA - Washington, DC September 1999

Loading in 2 Seconds...

play fullscreen
1 / 45

Ron Forino DAMA - Washington, DC September 1999 - PowerPoint PPT Presentation


  • 110 Views
  • Uploaded on

Project Driven Data Quality Improvement. Ron Forino DAMA - Washington, DC September 1999. Examples. According to DM Review, one European company discovered through an audit that it was not invoicing 4% of its orders. With $2 billion in revenues, that meant $80 million went unpaid.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Ron Forino DAMA - Washington, DC September 1999' - jemma


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
slide1

Project Driven

Data Quality Improvement

Ron Forino

DAMA - Washington, DC

September 1999

examples
Examples
    • According to DM Review, one European company discovered through an audit that it was not invoicing 4% of its orders. With $2 billion in revenues, that meant $80 million went unpaid.
    • Electronic data audits show that the invalid data values in the typical customer database average around 15 - 20%. Physical audits suggest that this number may be closer to 25 - 30%.
    • In 1992, 96,000 IRS tax refund checks were returned “undeliverable” due to incorrect addresses.
    • This year, incorrect price data in retail databases will cost American consumers as much as $2.5 billion in overcharges.
    • According to organizations like the Data Warehouse Institute, the Gartner Group and MetaGroup - Data Quality is one of the top 1-3 success factors to Data Warehousing.
  • The average mid-sized company may have 30,000 - 50,000 fields in files, tables, screens, reports, etc. [Platinum Technology]
agenda
Agenda
  • Definitions
  • What is Data Quality?
  • Tactics and the End Game
  • Building Blocks to Data Quality
    • Tactical Initiatives
    • Strategic Initiatives
  • Tactical Data Quality
    • Rule Disclosure
    • Data Quality Measurement, Analysis and Certification
    • Meta Data Creation
    • Validation
    • Quality Improvement
definitions
Definitions
  • Data Transformation - Changing data values to a format consistent with integrity and business rules agreed to by data stakeholders.
  • Data Cleansing - Consolidation of redundant customer records. Term used to describe the process of “merging and purging” of customer lists in an effort to reduce duplicate or inaccurate customer records.
  • Data Quality Improvement - The process of improving data quality to the level desired to support the enterprise information demand.
  • Data Quality -definition to follow….
data quality improvement decision tree
Data Quality Improvement Decision Tree

Task Process

Transform

Conform to Business Rule

Data Reengineering

Data Quality

Improvement

Process Reengineering

Standardize

Validate

Match

Dedupe

Integrate

Enrich

Match & Dedupe

Data

Cleansing

tactics and the end game
Tactics and The End Game

“We need better data quality...”

Enterprise Initiative

Select Project

Data Quality Assessment

Report & Recommendations

Source System Clean-up Initiative

tactics and the end game1
Tactics and The End Game

“We need better data quality...”

Data Warehouse

Enterprise Initiative

Select Project

Data Quality Assessment

Data Quality Assessment

Staging Specifications

Report

Report & Recommendations

Source System Clean-up Initiative

Source System Clean-up Initiative

how can we know good data quality
How Can We Know Good Data Quality?
  • Column 1
  • 321453
  • 212392
  • 093255
  • 214421
  • .
  • .
  • .

Is this Good Data Quality?

What can we conclude?

what is data quality
What is Data Quality?
  • Information Quality = f(Definition + Data + Presentation)
  • Definition
    • Defines Data
    • Domain Value Specification
    • Business Rules that Govern the Data
    • Information Architecture Quality
  • Data Content
    • Completeness
    • Validity/Reasonability
  • Data Presentation
    • Accessible
    • Timely
    • Non-ambiguous
common data quality problems
Data Content

Missing Data

Invalid Data

Data Outside Legal Domain

Illogical Combinations of Data

Structural

Record Key Integrity

Referential Integrity

Cardinality Integrity

Migration/Integration

Rationalization Anomalies

Duplicate or Lost Entities

Definitions and Standards

Ambiguous Business Rules

Multiple Formats for Same Data Elements

Different Meanings for the Same Code Value

Multiple Codes Values with the Same Meaning

Field Used for Unintended Data

Data in Filler

Y2K Violation

Common Data Quality Problems
slide13

Building Blocks

to Data Quality

building blocks of a data quality program
Building Blocks of a Data Quality Program

Benefits Realization

Strategic

Defect Prevention

DQ Requirements

Quality Reengineering

Enterprise Cultural Shift

QC/Process Auditing

Data Stewardship

Tactical

Validation

Quality Improvement

Meta Data Creation

Measure

Analyze & Certify

Rule Disclosure

steps to tactical data quality
Steps to Tactical Data Quality

Rule

Disclosure

Measure

Quality

Analyze &

Certify

Meta Data

Creation

Validation

Quality

Improvement

sources of meta data
Sources of Meta Data
  • Legacy Meta Data
    • Data Models, Process Models
    • Data Dictionary, Definitions, Aliases
    • Glossary of Terms
  • Transformation Meta Data
    • Data Mapping
    • Transformation Rules
    • Error Handling Rules
  • Access Meta Data
    • Data Directory
    • Data Definitions
  • The Subject Matter Expert
  • Database Directory
  • Domain Values, Range of Values
  • Run Books
  • Derived Data Calculations
  • Audit Statistics
  • Source & Transformation
acquiring good meta data is essential
Acquiring good Meta Data is Essential

Collect

Documentation

Validate the

Meta Data

Assess the

Data

Report

Findings

Collect

Documentation

Assess the

Data

Report

Findings

Validate

Findings

Preferred

Collect Valid

Meta Data

Assess the

Data

Report

Findings

Meta Data can be gathered before, during or after the Assessment

“You can pay me now, or you can pay me later…”

slide20

Measuring

  • Data Quality
  • Techniques
  • Tools
  • Methods
how can data quality be measured
How can Data Quality be Measured?

“One accurate measurement is worth a thousand expert opinions”

[Grace Hopper, Admiral, US Navy]

  • Customer Complaints
  • User Interviews & Feedback
  • Customer Satisfaction Survey
  • Data Quality Requirements Gathering
  • Data Quality Assessments
measuring data quality tools
Measuring Data Quality - Tools
  • Analysis Tools
    • Specifically designed assessment tools
      • Quality Manager, Migration Architect
      • N & A: Trillium, Group-1, ID Centric, Finalist, etc.
    • Improvisations
      • SAS, Focus, SQL, other query tools
  • Other Necessary Tools
    • File Transfer
    • Data Conversion
assessment measurements
Assessment Measurements
  • Level 1: Completeness
    • Nulls or Blanks
    • Misuse (or overuse) of Default Values
  • Level 2: Validity
    • Data Integrity Anomalies
    • Invalid Data based on Business Rule
  • Level 3: Structural Integrity
    • Primary Key Uniqueness
    • Key Structure (Cardinality, Referential Integrity, Alternate Keys)
  • Level 4: Business Rule Violations
    • Relationship between two or more fields
    • Calculations

Field Integrity

Intuitive Integrity Rules

Business Rule Integrity

Requiring Meta Data

slide24

Report Card

  • Analyze
  • and Certify
  • Identifying Problems
  • Sizing up Problems
  • “To Certify or Not to Certify…”
template field level
Template - field level
  • Value - the domain occurrence
  • Frequency - the number of occurrences within the data set
  • Percent - the % of the whole set
  • 88 Info - the copybook definition for the value
  • Analysis - comments about our findings
identifying problems
Identifying Problems

1

2

3

  • Analysis (and Discovery)
  • 1. Is the field required? If so, blanks indicate an anomaly.
  • 2. Are the values “ID206” and “STANG” allowed? (Is this a problem with the data or the Meta Data?
  • 3.Some values occur in only 1.3% of the records. Is this telling us there is a problem?
field analysis
Field Analysis

In a range of values, in the absence of domain rules,

investigate the first and last .2%

Bell curve distribution

meta data supply chain
Meta Data Supply Chain

Definition &

Domain

Meta Data Gathering

Data Quality

Statistical

Reports

Knowledge

Management

DQ Assessment

Transformation & Edit

Recommendations

Data Quality

& Definition

Validation

SME Validation

Data Cleansing

Update

Meta Data

Field Name

Data Inventory

Meta Data

Work Groups

Data Requirements

slide38

Report Validation

SME validation… an opportunity to improve Meta Data

1. Supply a clear name for the field.

2. Is there a good definition?

3. Make the business rules public?

4. Will the SME initiate a data cleansing initiative?

5. Does the SME recommend edit or data transformation rules?

6. Are the findings consistent with the SMEs expectations?

Report Sections

Identification

1

2

3

Field Definition & Rules

4

5

Score & Explanation

Statistical Reports & Analysis

6

slide39

Quality

Improvement

next steps
Next Steps

Continued

Monitoring

Legacy

Data

Extractions

Monthly

Reports

Information

Management

Objectives

Initiatives

Data Clean-up

Legacy System

Enhancements &

Re-engineering

Management

Report &

Recommendations

Perform

Baseline

Assessment

(Discovered

Business

Rules)

Steering

Committee

Data Migration

Transformation

& Cleansing

Specifications

Metadata,

Models,

Reports, etc.

lessons learned data cleanup
Lessons Learned- Data Cleanup

$$

(More complete,

more error prone)

(Most complete,

most accurate,

most costly,

most timely)

(More accurate,

less data)

100%

Completeness

100%

Accuracy

summary
Summary
  • We made the distinction between:

- Data Migration

- Data Quality

- Data Cleansing

  • We defined what “good” data quality is.
  • We discussed that there could be 10 or more processes that could take place in building a comprehensive data quality program for the enterprise.

- Tactical should precede the Strategic [or be the 1st step of ]

  • There are 6 steps to an effective tactical data quality initiative:

- Rule Disclosure

- Quality Measurement

- Analyze and Certify

- Meta Data Creation

- Validation

- Quality Improvement

reference material
Reference Material
  • The Demings Management Method (Total Quality Management), Mary Walton
  • Data Quality for the Information Age, Tom Redman
  • The Data Warehouse Challenge: Taming Data Chaos, Michael Brackett
  • Improving Data Warehouse and Business Information Quality, Larry English
  • DM Review Magazine, Information Quality series by Larry English
slide44
Ron Forino

Director, Business Intelligence

DMR Consulting Group

(732)549-4100 X-8292

[email protected]

[email protected]

ad