Data cleanup unlock the potential at a corporate scale
This presentation is the property of its rightful owner.
Sponsored Links
1 / 27

Data Cleanup: Unlock the potential at a corporate scale PowerPoint PPT Presentation


  • 80 Views
  • Uploaded on
  • Presentation posted in: General

Data Cleanup: Unlock the potential at a corporate scale. Thibault Dambrine. IT professional for 25 years Network Designer ETL Data Warehouse Analyst Interface Specialist ERP Developer Data Quality Experience:

Download Presentation

Data Cleanup: Unlock the potential at a corporate scale

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Data cleanup unlock the potential at a corporate scale

Data Cleanup: Unlock the potential at a corporate scale


Thibault dambrine

ThibaultDambrine

  • IT professional for 25 years

    • Network Designer

    • ETL Data Warehouse Analyst

    • Interface Specialist

    • ERP Developer

  • Data Quality Experience:

    • Tasked to work on the pre-conversion data cleanup project during the Shell JDE to SAP transition


Introduction

Introduction

  • Premise:

    “What would a Company-wide Data Quality Initiative look like?”

  • Base:

    • Experience setting up a data cleanup team, prior to a JDE to SAP data conversion

    • Realizing the potential for increasing the data value within the corporation


Defining data quality

Defining Data Quality

  • Intrinsic Data Quality:

    • Accuracy, Completeness, Uniqueness

    • Reliability, Security and Accessibility

    • Contextual Data Quality:

    • Timeliness, Relevance

    • Inter-operability, consistency of identifiers

  • Accessibility and Representational Data Quality

    • Ease of understanding

    • Consistency of identifiers

    • Consistency in structures


Quantifying the cost of quality the 1 10 100 rule additional cost less competitive business

$

Quantifying the cost of Quality: The 1-10-100 Rule - Additional cost = Less Competitive Business

$

Prevention Cost

$

Correction Cost

Failure Cost

The 1-10-100 Quality Cost Rule


The 1 10 100 rule a data quality example

The 1-10-100 Rule A Data Quality Example:

The 1-10-100 Quality Rule Applied to mailing Data:

It costs:

  • $1.00 to verify data at data entry time

  • $10.00 to clean the data after the fact

  • $100.00 to mop up errors caused by bad data

    • Packages mailed in the wrong address

    • Lost revenue

    • Lost customers

    • Bad (sloppy) reputation

    • Additional carrying cost for bad data


Data quality the up side

Data Quality: The Up Side!

Trust

  • Consistent data inquiry results build confidence in information systems

  • Tractability across Business and IT domains

  • Consistent data identifiers

    • promotes internal cross-department reporting

    • Consistency

    • Confidence in results

      Productivity

  • Removing redundant or near-redundant data

  • Maximizes re-use of data

  • Reduces the amount of data being processed

  • Reduces errors

    Reliability

    Consistently good quality data is data you can count on!


A data cleanup initiative

A Data Cleanup Initiative

  • Where to start?

  • Who will enforce such quality initiatives?

  • How will the data quality be maintained on on-going basis?


Dqb task 1 identify sponsor and data quality boss

DQB Task 1: Identify Sponsor and Data Quality Boss

To Identify sponsor:

  • Communicate clear understanding of the cost of bad data

  • Use the 1-10-100 rule

  • Initiative has to be backed with

    • Money

    • Authority

    • Responsibility


Identify data quality boss dqb

Identify Data Quality Boss – DQB

  • Must be knowledgeable on data quality

  • Must be knowledgeable on the Business

  • Will be responsible for data quality

  • Will have authority to make changes

    Note:

    Responsibility without authority will not work


Dqb task 2 identify data sets

DQB Task 2: Identify Data Sets

  • Identify/inventory high-level data sets e.g.

    • CMDB

    • ERP

      • Master Data e.g.

        • Customer Master

        • Item Master

      • Transaction Data e.g.

        • PO’s & Invoices

        • Inventory movements

  • Assign data sets to departments, potential lists of Data owners

  • Note: The final data owners may not be the one initially penciled in at this stage


Dqb task 3 identify business side data owners

DQB Task 3: Identify Business Side Data Owners

  • Data Owners will effectively be the local, more granular, Data Quality Bosses. Again, they will need to

    • Be responsible for the data at their level

    • Have authority to request changes at their level

    • Have bottom up knowledge of the data, understand what “should be there”

  • Setup meetings with every department, in line with the Data Sets identified, with aim of coming up with a set of Data Owners

    • Have a presentation ready

    • Look for individuals who have been in the Business for a long time, who are well respected, who understand the data, the dependencies, and know who to talk to, to get answers, from the bottom up


Dqb task 4 request from data owners the data quality specification or dqs

DQB Task 4: Request from Data Owners the “Data Quality Specification” or DQS

  • DQS is a document that spells out the data quality rules e.g.

    • No duplicates or near-duplicates

    • Data older than x years should be purged or archived

    • Data Dependencies such as no detail without a header or no invoice without a PO

    • No duplicates

    • Consistency e.g. data format

    • Quality audit e.g. postal code matches address

    • More…

  • Note: Some rules will apply in all DQS Documents

  • There is value in sharing, reviewing and updating the DQS over time.

  • Data quality issues are not always apparent until a first cut of data is cleaned up


Dqs part of task 4 also look for

DQS (part of Task 4 ) Also look for:

  • Data Islands

    • Lack of consistent identifiers inhibit a single view of the big picture

  • Data Opportunity

    • Could correlated data sets be more useful to the Business?

  • Data Surprises

    • Misplaced Data

    • Information buried in free-form fields


Data cleanup unlock the potential at a corporate scale

DQB Task 5: Build IT Data Quality Team

  • Data Quality

    • Cannot be a “side job” or a part-time task

    • Must be staffed with individuals who understand data. Best candidates

      • Proficient in SQL, data extract techniques

      • Understand ETL tools and techniques

      • Are detailed-oriented

      • Experience: Data Warehouse staff is good fishing grounds for such individuals


Mid presentation recap all the ingredients are now in place the real work can start

Mid-Presentation Recap: All the Ingredients are now in placeThe real work can start!

  • Name Data Quality Sponsor & Data Quality Boss (DQB)

  • Identify Data Sets

  • Data Quality Owners

  • Data Quality Specifications (the DQ Roadmaps)

  • IT Data Cleansing Team


Introducing the data quality cycle

Introducing: the Data Quality Cycle

  • We now have

    • a sponsor

    • Identified data sets and data owners

      • They have produced Data Quality Specifications

    • An IT Team ready to work on the first Data Quality measurements, based on the DQS

  • Next step: Initiate the cleanup

    • Not a single iteration but one that will be repeated in a cyclic fashion


The data quality cycle simple view

The Data Quality Cycle: Simple View

DATA


Data quality cycle corporate version

Data Quality Cycle - Corporate Version

Analyze Data

Improve Data

Continuous Improvement

Monitor Progress

Formalize Schedule

Make Progress VISIBLE


Step 1 identify bad data bad data definition does not adhere to dqs

Step 1: Identify Bad DataBad Data Definition: Does not adhere to DQS

  • Coordinate meetings to translate DQS documents into a suite of repeatable data cleansing procedures

  • Very important that these procedures should be repeatable, schedulable on regular basis

  • Initial Focus: Identify Bad Data

    • Bad data(does not adhere to DQS),

    • Inconsistent data

    • Old Data

    • Note: DQS will spell out rules for “old” and “inconsistent”

  • Ensure results are reported in a format readable by Management at executive level. This initiative has to be VISIBLE


Step 2 data cleansing

Step 2: Data Cleansing

  • Data Cleansing can be done in two ways:

    • Automated, IT based cleanup

    • Business-based, manual cleanup

  • Once the bad data is identified, determine who must do what

    • Business-based,

      • manual cleanup appropriate for more subtle tasks, e.g. to determine which of two duplicates identified should be kept. These tasks may require additional research, phone calls etc.

    • Automated,

      • IT based cleanup good for simpler tasks e.g. making telephone number formats consistent

      • Can be also sub-contracted to specialized data quality companies


It based data cleansing and outsource considerations

IT-based Data Cleansing and Outsource Considerations

  • Data cleansing may take valuable time from the Business, which is not available – Data Cleanup effort may suffer as a result

  • Not all data cleansing is a simple SQL

  • Not all data is most confidential

    When considering data cleansing tasks, look at all possible options

  • Outsourcing some data cleansing tasks may be more economical than doing it all in-house


Step 3 measure progress

Step 3: Measure Progress

  • All programs, procedures written with the aim of identifying data quality issues should

    • Be stored, like any other programming assets

    • Be repeatable and be schedulable

    • Provide aggregate measures to describe the data cleanup status e.g.

      • X duplicates

      • Y old records

      • Z invoices without PO

    • Progress

      • Has to be measured in a published dashboard

      • Has to be visible by the entire organization to provide a sense of value


Step 4 data hygiene schedule the cleanup review tasks ensure results are visible

Step 4: Data Hygiene: * Schedule the Cleanup/Review Tasks * Ensure results are visible

  • Bad data is created EVERY DAY

  • Data quality is an on-going effort

  • Establish, publish a schedule, part of the dashboard

  • Ensure there is visibility and accountability to ensure the levels of bad data

    • are going down with time

    • Or are kept at a minimal level


Step 5 sharpen the saw

Step 5: Sharpen the Saw

Once the data cleanup cycle is established

  • Review Results

  • Review DQS documents periodically (setup schedule)

  • Get Business input

    • Improve process

    • Give input on improvements to be made

  • Ask the Business to come up with performance improvement measures born from the Data Quality initiative


Conclusion

Conclusion

Two sets of Five activities best define the Data Quality

The Foundation Setup

  • Identify Data Quality Sponsor & Data Quality Boss (DQB)

  • Identify Data Sets

  • Identify Business Side Data Owners

  • Define Data Quality Specifications (DQS) - the DQ Roadmaps

  • Appoint IT Data Cleansing Team

    The Data Quality Cycle – Ongoing

  • Identify Bad Data

  • Initiate Data Cleansing

  • Measure Progress

  • Initiate Data Hygiene, Data Cleanup Cycle

    • Schedule Cleanup/Reviews

    • Ensure progress visible

  • Sharpen the Saw


Links

Links

  • How to improve Data Qualityhttp://www.informit.com/articles/article.aspx?p=399325&seqNum=3

  • Predefined data quality rule definitions

    http://pic.dhe.ibm.com/infocenter/iisinfsv/v9r1/index.jsp?topic=%2Fcom.ibm.swg.im.iis.ia.quality.doc%2Ftopics%2Fpdr_predef.html

  • Creating Effective Business Rules: Interview with Graham Witthttp://dataqualitypro.com/data-quality-pro-blog/how-to-create-effective-business-rules-graham-witt

  • Gartner Magic Quadrant on Data Quality Tools – “Demand for data quality tools remains strong”

    http://www.citia.co.uk/content/files/50_161-377.pdf


  • Login