
Enterprise Data Quality Dashboards and Alerts: Holistic Data Quality Jay Zaidi Bonnie O’Neil

Enterprise Data Quality Dashboards and Alerts: Holistic Data Quality Jay Zaidi Bonnie O’Neil (Fannie Mae) Data Governance Winter Conference Ft. Lauderdale, Florida November 16-18, 2011. Agenda. 1. Introduction. 2. Data Quality Challenges and Opportunities. 3.


Presentation Transcript


  1. Enterprise Data Quality Dashboards and Alerts: Holistic Data Quality Jay Zaidi Bonnie O’Neil (Fannie Mae) Data Governance Winter Conference Ft. Lauderdale, Florida November 16-18, 2011

  2. Agenda • 1. Introduction • 2. Data Quality Challenges and Opportunities • 3. Holistic Data Quality (HDQ) • 4. Enterprise Data Quality Solutions Architecture • 5. Enterprise Data Quality Dashboard Example

  3. Meet the Authors – Jay Zaidi • Enterprise Data Quality Program Lead, Fannie Mae • 15+ years in Enterprise Data Management and Solution Architecture • Specialized in Financial Services and Healthcare domains • Contact: 202-590-3131 • jayzaidi@gmail.com

  4. Meet the Authors – Bonnie O’Neil • Technical Data Architect, Fannie Mae • 20+ years as a Data Architect • Author: 3 books • Most recent: Business Metadata • Author, over 50 articles & white papers

  5. Data Quality Management – Challenges and Opportunities (challenge → opportunity) • Data Silos → “Holistic Data Quality (HDQ)” • Data Volumes and Velocity → Data Optimization and Scalability • Complex Data Architectures → Simplify Data Architecture • Real Time Enterprise Requirements → Real Time Data Quality Monitoring • Lack of Accountability → Strong Data Governance • Reactive Mode → Proactive Data Quality Controls • Lack of Straight Through Processing → Automated controls and monitoring • Structured and Unstructured Data (email, video, logs, system events, etc.) → Leverage “Big Data” Solutions A high level of maturity in Data Quality Management is required to address operational challenges.

  6. The Data Quality Maturity Journey STEP THREE STEP TWO STEP ONE CONSTRUCTING THE RAILROAD FOUNDATION & FRAMEWORK EXECUTION • Tool Deployment • Reporting Capabilities • Training • Communication • Change Management • Awareness • Proactive DQ Controls • DQ Continuous Improvement • DQ Services • DQ Use Cases • Solution Architecture • Industry Tool Selection • Consistent DQ Definitions Robust data quality management is required to support Regulatory Compliance, Risk Management, Accounting, Financial reporting and other business functions.

  7. The Data Architecture Spaghetti [Diagram by Arnon Rotem-Gal-Oz, April 2007. Labels: Department One, Department Two, Department Three; Transactional Store, Operational Data Store, Data Mart, Data Warehouses] How do you manage the quality of business-critical data in a dynamic and highly complex environment?

  8. The Information Supply Chain Transparency into quality across the supply chain [Diagram by George Marinos, “The Information Supply Chain: Achieving Business Objectives by Enhancing Critical Business Processes”, April 2005] Each link of the information supply chain is dependent on the others; strong controls are needed to manage business-critical data.

  9. Guiding Principles • Identify and address data quality issues at the point of entry into the ecosystem • Externalize data quality rules from code (rules engine, calculation libraries, derivation logic, etc., with governance and controls) • Manage enterprise-critical data at the enterprise level (enterprise DG and DQ groups) and line-of-business data at the local level (local DG and DQ) • Measure the quality of data at systems of record and critical stores, compare it against thresholds and tolerances, and remediate proactively • EDQ team will monitor and manage
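
One way to read the "externalize data quality rules from code" principle is that rules live as governed data and a small generic evaluator applies them. The following is a minimal Python sketch under that assumption; the JSON rule format, the check types (`not_null`, `regex`) and the field names are hypothetical and do not come from the presentation.

```python
import json
import re

# Hypothetical rule definitions kept outside application code, e.g. in a
# governed JSON file. The rule format and all names are assumptions.
RULES_JSON = """
[
  {"element": "loan_amount", "dimension": "Completeness", "check": "not_null"},
  {"element": "zip_code", "dimension": "Conformity", "check": "regex",
   "pattern": "[0-9]{5}"}
]
"""


def evaluate(rule, record):
    """Return True when the record passes the rule."""
    value = record.get(rule["element"])
    if rule["check"] == "not_null":
        return value is not None and value != ""
    if rule["check"] == "regex":
        return value is not None and re.fullmatch(rule["pattern"], str(value)) is not None
    raise ValueError(f"unknown check type: {rule['check']}")


def failed_rules(record, rules):
    """List (element, dimension) pairs for every rule the record fails."""
    return [(r["element"], r["dimension"]) for r in rules if not evaluate(r, record)]


rules = json.loads(RULES_JSON)
good = {"loan_amount": 250000, "zip_code": "20016"}
bad = {"loan_amount": None, "zip_code": "2001"}
print(failed_rules(good, rules))
print(failed_rules(bad, rules))
```

Because the rules are plain data, they can be reviewed and versioned by a governance group without touching the evaluator.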

  10. Data Quality Maturity

  11. Data Quality Use Cases • Process externally supplied data • Reconcile data between data stores, or between a data store and files • Certify the quality of data • Score the quality of data • Identify anomalies in data (databases, files, XML, etc.)
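
The reconciliation use case above can be sketched as a key-based comparison of two record sets. This is an illustrative Python sketch only: the two stores are modeled as in-memory dictionaries, and the `L001`-style keys and `balance` field are made-up examples rather than anything from the presentation.

```python
def reconcile(source, target):
    """Compare two keyed record sets and report the differences.

    source and target each map a record key to a dict of field values.
    """
    missing_in_target = sorted(source.keys() - target.keys())
    missing_in_source = sorted(target.keys() - source.keys())
    mismatched = sorted(
        key for key in source.keys() & target.keys() if source[key] != target[key]
    )
    return {
        "missing_in_target": missing_in_target,
        "missing_in_source": missing_in_source,
        "mismatched": mismatched,
    }


# Illustrative data only; keys and fields are hypothetical.
system_of_record = {
    "L001": {"balance": 1000},
    "L002": {"balance": 2500},
    "L003": {"balance": 400},
}
data_mart = {
    "L001": {"balance": 1000},
    "L002": {"balance": 2600},
    "L004": {"balance": 75},
}
report = reconcile(system_of_record, data_mart)
print(report)
```

In practice the two sides would be query results or file extracts, but the three outcome buckets (missing on either side, or present with different values) stay the same.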

  12. Data Quality Toolkit • DQ Standards and Policies • DQ Methodology • DQ Dimensional Framework • DQ Development and Support Model (roles, responsibilities, deliverables by team across the SDLC) • DQ Best Practices • Data Quality Requirements Template • Data Quality Metrics Template • DQ tasks inside SDLC Methodology • DQ Solution Architecture • DQ Training Documentation • DQ Business Case Deck with elevator speech • Governance structure – custodians, trustees, stewards, business data lead • Map of critical data, SORs, custodians, trustees and business data leads • Project plan activities related to a DQ project • On-boarding documentation for tools, dashboards, etc. • DQ Deployment Model (Centralized vs. Federated vs. Hybrid) • Lessons Learned/Challenges you will hit • Change Management Plan • Stakeholder Communication Plan • DQ Charter, Strategy, Approach, Sponsorship • DQ Case Studies – business value add • Synergy between DQ and DG • Organizational structure

  13. Conceptual Solution Architecture

  14. Deployment Models • Central vs Federated

  15. Challenges You Will Face and Your Response

  16. Typical Business Scenario • Identify anomalies and remediate issues (Data Quality Tool and EDQ Dashboard) • Analyze data and conduct forensics (Data Quality Tool) • Implement real-time data quality using DQ Services (Data Quality Tool) [Diagram labels: Reports & Executive Dashboards; Internally or Externally Supplied Data; Enterprise Applications; Enterprise Data Stores (Transactional, Operational, Marts and Warehouses)] The Enterprise Data Quality Platform provides the tools, methodologies and best practices to identify and remediate data quality issues.

  17. Issue Logging and Resolution

  18. Holistic Data Quality Our focus should be on addressing systemic issues. This requires a switch from “reactive” to “proactive” approaches to data quality and quality that is not evaluated or managed in silos, but addressed using a holistic cross-silo approach. “Holistic Data Quality (HDQ)” is the term that I have coined to address this need. – Jay Zaidi Implementing HDQ at the enterprise level is a strategic, multi-year effort for mid- to large-sized firms. If done right, the return on investment is manyfold.

  19. Identify data critical for the enterprise Do Not Boil The Ocean. Narrowing the scope of the effort will ensure success. • General population of data elements*: 10,000 to 20,000 • Critical data for a line of business* (“LOB Critical”): 2,000 to 3,000 • Critical data for the enterprise* (“Enterprise Critical”): 400 to 500 (* Estimates only) Initial focus should be on “Enterprise Critical” data. Enterprise-level governance and quality efforts should focus on Enterprise Critical data. Lines of business should govern and manage the quality of their business-critical data.

  20. Dimensions of Data Quality • The concept of Dimensions of Data Quality has been established by many authors in the industry, such as David Loshin and Danette McGilvray: “To be able to correlate data quality issues to business impacts, we must be able to both classify our data quality expectations as well as our business impact criteria.” -David Loshin • Dimensions are facets or specific measurements of data quality, pertaining to specific data elements • The authors propose many variations but the main ones that most agree on are: • Accuracy • Conformity • Completeness • Consistency/Duplication • Timeliness (sometimes called Currency) • Integrity Data Quality Dimensions facilitate the consistent definition of data quality requirements and metrics across various organizations.

  21. Data Quality Development and Support Model

  22. Business Intelligence for Enterprise Data Quality Solution components: • Business intelligence tool (COTS) • Data quality commercial-off-the-shelf (COTS) product • Data quality data mart (custom) • Data quality issue management system • Extract, Transform and Load (ETL) product • Enterprise Service Bus (SOA and Data Quality Services) [Diagram labels: Enterprise Dashboard; Data Quality Tool (Profiling/Rule Execution); Data Quality Rules; Business Intelligence Tool; Data Quality Mart; Data Quality Results; ETL; Data Stores; Files]
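
One plausible reading of these components is that rule-execution results are loaded into the data quality mart and then summarized by the business intelligence tool. The Python sketch below illustrates that idea with an in-memory SQLite database standing in for the mart; the `dq_result` table, its columns and all values are assumptions made for illustration, not the authors' schema.

```python
import sqlite3

# In-memory SQLite stands in for the custom data quality mart.
# Table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dq_result (
        run_date TEXT,
        line_of_business TEXT,
        data_element TEXT,
        dimension TEXT,
        records_checked INTEGER,
        records_failed INTEGER
    )
    """
)

# Rows as a data quality tool might emit them after rule execution
# (made-up values).
results = [
    ("2011-11-01", "Retail", "borrower_name", "Completeness", 1000, 20),
    ("2011-11-01", "Retail", "loan_amount", "Accuracy", 1000, 5),
    ("2011-11-01", "Wholesale", "loan_amount", "Accuracy", 500, 50),
]
conn.executemany("INSERT INTO dq_result VALUES (?, ?, ?, ?, ?, ?)", results)

# The kind of summary a BI tool could chart: pass rate per line of business.
summary = conn.execute(
    """
    SELECT line_of_business,
           ROUND(100.0 * (SUM(records_checked) - SUM(records_failed))
                 / SUM(records_checked), 2) AS pass_rate_pct
    FROM dq_result
    GROUP BY line_of_business
    ORDER BY line_of_business
    """
).fetchall()
print(summary)
```

Storing counts rather than precomputed percentages keeps the mart additive, so the same rows can be rolled up by element, dimension or date.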

  23. Replace Paper Reports with Business Intelligence • Operational Incidents • Audit Findings • Data Quality Issues Report • Regulatory Compliance Issues • Weekly Data Management Status Reports Replace mounds of paper with a business intelligence solution and gain on-demand access to summary and detailed information on key quality indicators.

  24. ENTERPRISE DATA QUALITY DASHBOARD (Enterprise View) [Dashboard panel labels: Overall Health; Health Indicators; Critical Data Breakdown; Quality by Line of Business; Data Quality Maturity; Release 1; Release 2; Wholesale; Retail; Commercial; Trending of Data Quality; Regional Trend; Product Data; Customer Data; Quality Rating for Each Data Element]

  25. ENTERPRISE DATA QUALITY DASHBOARD (Retail Business View) [Dashboard panel labels: Overall Health; Health Indicators; Critical Data Breakdown; Release 1; Release 2; Trending of Data Quality; Data Store Trend; Borrower Data; Loan Data; Quality Rating for Each LOB Data Element; Data Quality Server Utilization]

  26. Continuously Measure and Improve Quality • Step 1 - Define: Define the scope, goal, budget, duration and the data quality problem to be addressed. • Step 2 - Measure: All relevant data quality statistics and measures important to the enterprise are collected at this stage. • Step 3 - Analyze and Improve: The data collected in the previous phase is analyzed and root cause(s) are identified. Data remediation is implemented to improve the quality of data. • Step 4 - Control: Monitor the quality after remediation to ensure that data is defect-free. If further changes are needed, the team makes them and measures the quality again. The Enterprise Data Quality dashboard provides transparency into data quality hotspots that must be addressed proactively.

  27. Lessons Learned • Changing behavior is hard – so use a carrot and stick approach to get people to change • Recognize team members that display the expected behavior and highlight what they did • Roll out the data quality platform (tools, methodologies, best practices) in a phased manner • Educate team members at all levels of the enterprise on the value of strong governance and data quality • Facilitate adoption of the tools and business intelligence offerings by providing them to all organizations free of cost or at a very low cost • Highlight the fact that data is “owned” by the enterprise and not by a particular individual or line of business • Hold people accountable by using operational metrics, data quality metrics and compliance metrics to make your case • Measure the hard savings and business value added by the program and communicate up and down the chain on a regular basis (KPI Dashboard)

  28. Summary • Effective data management provides order out of chaos • Implementing “Holistic Data Quality” provides transparency into data quality issues across the information supply chain and helps in identifying systemic issues • Focus must be on “Enterprise Critical” data initially. Do not try to boil the ocean. • The solution architecture’s core components are the data quality COTS product, a data quality Data Mart and a Business Intelligence tool • Proactive monitoring and measurement of data quality, coupled with an alerting mechanism, significantly reduces operational incidents • Implementing HDQ is a strategic initiative and requires C-level sponsorship and support

  29. Questions!!

  30. Typical Current State Data Flow [Diagram labels: External Data Feeds; Transactional and Operational Stores; Data Warehouse; Data Marts; Potential data quality problem] The current siloed approach to data management is wasteful and doesn’t provide transparency into systemic issues.

  31. Future State Data Flow: Continuous Data Quality Monitoring [Diagram labels: External Data Feeds; Transactional and Operational Stores; Data Warehouse; Data Marts; DQ Monitoring] Enterprise Data Architecture should enable straight through processing and offer operational efficiencies.

  32. Key Process Steps For each data element that you will monitor, do the following (use a template): • Identify the trustee and custodian (DG/DQ) • Identify the system of record (DG/DQ) • Identify the dimensions of data quality that apply (Custodian/Trustee/DQ) • Capture the data quality rules per dimension (Custodian/Trustee/DQ) • Capture the frequency of rule execution (Custodian/Trustee) • Capture the data quality thresholds and tolerances for Red/Yellow/Green status (Custodian/Trustee) • Capture the key metrics that you wish to track (Custodian/Trustee) • Conduct the logical to physical data mapping (to the data source) for the data element (DQ/Technology)
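
The per-element template and the Red/Yellow/Green thresholds described above could be captured in a small record type. This is a hedged Python sketch: the `MonitoredElement` class, its field names and the default threshold values (99% and 95%) are all assumptions for illustration, and the placeholder strings in parentheses stand for details the presentation does not give.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoredElement:
    """One row of a per-element template; all field names are assumptions."""
    name: str
    trustee: str
    custodian: str
    system_of_record: str
    dimensions: list = field(default_factory=list)
    frequency: str = "daily"
    green_min_pct: float = 99.0   # pass rate at or above this is Green
    yellow_min_pct: float = 95.0  # at or above this (but below green) is Yellow

    def status(self, pass_rate_pct: float) -> str:
        """Map a measured pass rate onto Red/Yellow/Green."""
        if pass_rate_pct >= self.green_min_pct:
            return "Green"
        if pass_rate_pct >= self.yellow_min_pct:
            return "Yellow"
        return "Red"


# Hypothetical element; the parenthesized values are placeholders.
element = MonitoredElement(
    name="loan_amount",
    trustee="(business trustee)",
    custodian="(data custodian)",
    system_of_record="(system of record)",
    dimensions=["Accuracy", "Completeness"],
)
for rate in (99.5, 97.0, 90.0):
    print(rate, element.status(rate))
```

Keeping thresholds on the element itself lets custodians and trustees set different tolerances per data element, which matches the ownership shown in the list.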

  33. Dimensions of Data Quality - Explanation • Accuracy: How much does the data conform to the real world? • Completeness: How much required data is missing? • Duplication: Does the same data exist in multiple systems? If so, is it represented the same? • Conformity: How much does the data conform to formats and domain values? • Integrity: Does the data conform to integrity rules appropriately? Are relationships between elements retained? • Currency: How current is the data? When was it last entered or refreshed? There are a dozen or more Data Quality Dimensions that can be defined, but organizations should pick the ones that best meet their needs.
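
A few of these dimensions translate directly into simple measurements. The Python sketch below shows possible metrics for completeness, conformity and duplication over a made-up record set; the function names, field names and sample values are hypothetical, and the duplication check is simplified to repeated keys within one data set rather than across systems.

```python
import re
from collections import Counter

# Made-up records; field names are hypothetical.
records = [
    {"id": 1, "state": "FL", "zip": "33301"},
    {"id": 2, "state": "fl", "zip": "3330"},
    {"id": 3, "state": None, "zip": "33304"},
    {"id": 3, "state": "VA", "zip": "22030"},
]


def completeness(rows, field):
    """Percentage of rows where the field is populated."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return 100.0 * filled / len(rows)


def conformity(rows, field, pattern):
    """Percentage of populated values that match the expected format."""
    values = [r[field] for r in rows if r.get(field) not in (None, "")]
    matches = sum(1 for v in values if re.fullmatch(pattern, str(v)))
    return 100.0 * matches / len(values)


def duplicates(rows, key):
    """Key values that appear more than once (a simplified duplication check)."""
    counts = Counter(r[key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


print(completeness(records, "state"))
print(conformity(records, "zip", r"[0-9]{5}"))
print(duplicates(records, "id"))
```

Accuracy, integrity and currency usually need reference data, relationship checks or timestamps, so they are left out of this sketch.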
