Practical Meta Data Solutions
Download
1 / 43

Practical Meta Data Solutions For the Large Data Warehouse DAMA - MN October 16, 2002 - PowerPoint PPT Presentation


  • 165 Views
  • Uploaded on

Practical Meta Data Solutions For the Large Data Warehouse DAMA - MN October 16, 2002. Agenda. Introduction Enterprise meta data strategy Data warehousing meta data strategy Project approach for a practical solution Meta data architecture Defining ROI Tools/options for moving forward

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Practical Meta Data Solutions For the Large Data Warehouse DAMA - MN October 16, 2002' - blaze


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Practical Meta Data Solutions

For the Large Data Warehouse

DAMA - MN

October 16, 2002


Agenda

Introduction

Enterprise meta data strategy

Data warehousing meta data strategy

Project approach for a practical solution

Meta data architecture

Defining ROI

Tools/options for moving forward

Meta data summary

The data quality cycle

Questions


Everyone knows

Meta data is:

  • Valuable

  • The right thing to do

  • Important for long-term success

Meta

Data

So why isn’t everyone doing it?

Introduction


Why isn’t meta data being addressed?

Don’t know how to demonstrate the ROI

Too complex or we don’t know where to begin

Can’t agree on what should be done

Market is not mature enough – we’ll wait until it settles down

We’ve tried and failed

Where do we start?

How do we justify it?

Introduction


What is the cost of dirty data?

“The cost of poor data may be 10-25 percent of total revenues” - Larry English

Data quality issues torpedoed a $38 million CRM project - Fleet Bank 1996

The Data Warehouse Institute (TDWI) estimates that poor quality customer data costs U.S. businesses a staggering $611 billion a year in postage, printing, and staff overhead

Real Life Insurance Example - $10 million annually

  • 2 million annual claims with 377 data items each

  • Error rate of .001 generates more than 754,000 errors per month and over 9.04 million annually

  • If 10% are critical to fix, there are still over 1 million errors to correct

  • Even at a conservative estimate of $10 per error, the companies risk exposure to poor claim information is $10 million a year

Source: TDWI Data Quality and the Bottom Line

Introduction


Limited exposure to the data

Greatly increased exposure to the data

The data warehouse and Operational Data Stores have revealed the impact of data quality problems on the business

Meta data is the component that ties the data warehouse and ODS to the legacy systems and exposes the data quality problems

Understanding Data Quality Issues

Legacy applications

Introduction


Setting boundaries

You can’t do it all at once!

Enterprise meta data strategy


Data Warehouse and Business Intelligence

Enterprise Architecture

  • EAI

  • ERM

CDISC

HL7

Clinical Trials

Clinical

Patient Care

Component Management

Document Management

Content Management

and

Portals

Business Rules

Establishing a meta data strategy

Select a practical starting point and build on your success!

Enterprise meta data strategy


Starting with the data warehouse

Data Quality

Systems

A practical strategy with real business benefits!

Data Warehousing meta data strategy


Why is the DW a good starting point?

DW typically focuses on the data that most needs to be shared

DW presents the greatest need to understand the data because it is cross-functional

Real business benefit can be obtained for a practical investment

Existing DW are being re-architected

Meta data standards and tools are beginning to have an impact in this area

Challenges are created in a best-of-breed development environment

Built for today - architected for tomorrow

Data Warehousing meta data strategy


MD integration challenges in the DW architecture

Data Warehousing meta data strategy


Build and

Administer

Usage

Key Attributes

Mappings

Performance

Integrity

Scalability

Operator

Modeler

Designer

ETL Jobs Statistics

Data Model

ETL Tool

Navigate

Alias

Canned Reports

Refresh Data

Location

Expedience

Accuracy

New/Casual User

Executive

Frequent User

Data Model

Business Intelligence

Job Schedules / logs

Defining meta data for the DW

The formal approach to managing the processes and information needed by both business and technical associates to define, build, administer and navigate the DW

Building

Blocks

Example

Benefit

User

Source

Define

Business Meaning

Calculations

Lineage

Recognition

Understanding

Trust

Casual User

Power User

New User

Heads and documents

Spreadsheets

ETL mappings

Robust, integrated meta data solutions will aid in using, developing and operating the data warehouse source: META Group

Meta data solutions are also referred to as: Information Catalogs or digital DNA

Data Warehousing meta data strategy


The need for a complete project lifecycle

  • Document today’s meta data environment

    • Identify meta data users and its sources

    • Identify the business drivers

      • Problems, opportunities and associated costs

    • Identify requirements

      • How does meta data address the business drivers

      • What are the savings

    • Define objectives and benefits

  • Develop a meta data architecture

    • Processes and disciplines

    • Integration requirements

    • Delivery and usage requirements

    • Technology component / tool

    • Change management process priorities

  • Develop project plan and cost/benefits

  • Build with an Iterative release approach

Building an Architecture

Process Layer

Processes and disciplines required to generate and sustain complete and accurate meta data

Integration Layer

Exchange of meta data between multiple tools across the data warehousing framework

Technology Layer

Automate processes and integration and provide a single Web based view consolidated across tools

Project Approach for a practical solution


1

2

3

4

What are successful projects addressing?

  • Lineage

    • What data is available, where it came from and how it’s transformed

    • Including definition, currency and accuracy

  • Appropriate information by user type

    • Easy access to meaningful meta data

    • “How is it different from what I’m used to seeing?”

  • Impact analysis across tools and platforms

    • Impossible to do without a formalized meta data technology solution

  • Versioning

    • How has it changed over time?

    • Moving from development, to test, to production

  • Live meta data

    • Meta data is a natural part of the process

    • A function fails if the meta data is not complete and accurate

Project Approach for a practical solution


6

Object Model

File

Systems

Modeling

ETL

RDBMS

Mappings

5

7

Bus Req

Meta data architecture for the DW

Collection Points:

Processes and disciplines

Live meta data concepts

System of Record

Auditing

Balancing and controls

Data to support lineage

Data Cleansing

Householding

Data to support lineage

Change management

Source Control

Configuration management

Source systems

Data warehouse

Staging area / ODS

Data Validation

Data Profiling

DW & ETL

Design

Job

Execution

Migration

Mgmt

Data

Quality

BI

1.Collection

2.Integration

Business and technical

user Interface

Meta Data

Repository

Engine

WEB

3.Usage

Consolidated view across tools

Centralized Meta Data Repository

  • Manage redundancy

  • Provide one view across tools

Physical Repository

RDBMS

Oracle, DB2, others

Meta Data Architecture


Components of a well-architected DW solution

  • Short- and long-term requirements are well-defined displaying clear business benefits

  • Meta tags required to support lineage, balancing and controls, etc., are built into the DW architecture

  • Live meta data concepts are rigorously followed

  • A plug-and-play architecture is used

    • Support for multiple tools in a category, i.e., Informatica and Ab Initio for ETL

    • Simplifies future transitions to new technology and tools

Built for today - architected for tomorrow

Meta Data Architecture


8

How is a meta data investment in the DW justified

  • Reduced total cost of ownership

    • Impact analysis -- 50 percent of development efforts are spent assessing what is impacted by the change

    • Configuration and migration management

    • Eliminate redundant work

  • Improved user acceptance

    • What’s available and how to access it

    • Business user understands the data and where it came from -- how is it different from the operation informational systems?

  • Risk avoidance

    • What’s the impact of not delivering the business benefits used to justify the DW

  • Industry or government regulations

  • Best practices

Defining ROI


The CWM standard as an enabler

  • The Common Warehouse Metamodel

  • Model driven architecture

    • Based on object oriented modeling and development

    • Building blocks: UML, MOF, XMI and OCL

    • The model generates:

      • Repository data structure changes

      • APIs for models to interoperate

      • APIs to load and retrieve meta data

      • CORBA components

  • Meta data exchange format

    • Based on the CWM and Standard DTDs

    • XML data streams following the XMI standard

CWM is supported by:

  • Oracle

  • IBM

  • SAS

  • Adaptive

  • Hyperion

Tools / Options for moving forward


Meta data management repositories

Selection Criteria

  • Manage associations across tools

  • Search and retrieval across tools – ability to display a single consolidated view

    • Lineage

    • Impact analysis

    • Plain English definitions

    • Subject areas

  • Ease of Extensibility

  • Customizable user interface using an industry standards web solution

  • Reduce integration cost in a best-of-breed development environment

  • Enable Plug-and-play tool strategy

  • Available automated bi-directional meta data exchange bridges / adapters

  • Template driven retrieval of meta data

  • Group / role based security

  • Interoperability between metamodels

  • Support of industry standards – CWM, HL7, etc.

  • Support of Federated repositories

  • Ability to expand beyond the DW

  • Versioning – extracting a time slice

Tools / Options for moving forward


Types of Meta data management repositories

  • Enterprise – Supports a broad range of functionality including enterprise architecture, data warehousing, business intelligence, component management and others

    • CA – Advantage(formerly Platinum)

    • ASG - Rochade

    • Adaptive Foundation(formerly Unisys)

  • DW Suite solutions – meta data solutions that are integrated into a suite of tools primarily from a single vendor designed to build and maintain the complete data warehouse framework

    • Microsoft Repository

      • DWSoft – Navigator web browser

    • SAS –Warehouse Administrator

    • Oracle Warehouse Builder

      • OWB Repository

  • Enterprise Data Warehousing and ETL – supports data warehousing, ETL and business intelligence activities typically in a best-of-breed toolset environment

    • Ab Initio

    • Informatica

      • Data Advantage Group – MetaCenter

    • Ascential – Meta Stage

  • Modeling – supports development and versioning activates for ER and object modeling

    • ERwin Suite

    • Rational Rose

    • Oracle Designer

    • Popkin System Architect

Tools / Options for moving forward


Options for moving forward with a DW solution

  • Build from scratch using an RDBMS and custom web delivery application

  • Implement a repository tool and extend it as needed

    • Enterprise repository tool

    • Warehouse suite solutions

    • ETL tool repository

No complete solution exists today. Establish a foundation and gradually develop a complete solution through a series of iterative releases!

Tools / Options for moving forward


Development lifecycle

Strategy

Development

Architecture

Development

Design and

Construction

4 - 6 weeks

3 - 4 weeks

6 - 10 months

Define business objectives, requirements and benefits

Understand standards

Research repository tools

Define technical objectives/requirements

Document capabilities

Relate standards, tools, requirements, architecture

Develop scope and priorities

Define meta data sources

Define repository

Define hardware platform and software requirements

Define meta data integration ETL process

Define meta data delivery/display mechanisms

Iterative release approach

Design and construction of meta data integration ETL

Design and construct logical and physical meta data models

Hardware platform implementation

Test

Rollout

Tools / Options for moving forward


Pitfalls to avoid

Selecting a repository tool without defining requirements first

Expecting the repository tool to solve process problems

Selecting an architecture that is not extensible or can’t scale

Underestimating the effort

Relying too much on manual entry

Selecting an initial project that does not deliver adequate business benefit

Selecting an initial project that is too complex to be practical

Many meta data projects fail because they are either too big to be practical or too small to deliver real benefits

Meta data summary


Web Based Access

DWSoft Web Browser

XML and ASP

Microsoft Meta Data Services

SQL Server 2000

One example of a hybrid solution

Cubes, Hierarchy and Levels

Field-to-Field Derivation

Source-to-Target Mappings, Code Mappings, Business Rules, Data Quality, Contacts and Document Definitions

DATA Modeling

Business

Intelligence

OLAP Cubes

Business Definitions for

DW Attributes

Logical & Physical DB Models

Access Basic

Oracle 8i

Code & Data

Mapping

SQL Server

2000

ER/Win 3.5

Pro Clarity

Custom API

Interfaced

3rd Party/ API

API

Staging

Area

Meta Model

API

Data

Warehouse

Integrated API

Physical RDBMS

Info through OLE DB

Codes Frequency

Min/Max/Averages

Custom API

Oracle 8i

ETL

DML Information

Fast, inexpensive, low-risk approach

implemented at a major insurance company

Ab Initio

Meta data summary


Taxonomy

Meta data

Taxonomy

Meta data

The DW meta data solution within the enterprise meta data architecture

Enterprise Architecture

EAI

ERM

Oracle

Data Warehouse Platform

  • Operational data store

  • Customer information

  • Historical information

  • Security

Document/Email Platforms

  • Documents

  • Catalogs/digital content

  • Email

KM Vendors

  • Autonomy

  • Intraspect

  • Documentum

  • BRS Rule Track

Oracle

Oracle

Metadata

Repository

Oracle

Data

Unstructured

Data

Unstructured text and digital assets

Data

Knowledge Management Engine

  • Business rules

  • Content management

  • Automatic taxonomy generation

  • Neural net search engine/adaptive learning

  • Text mining

  • Personalization

  • XML, Java, HTML LDAP:support

Data

Oracle

Operational

Data

Click stream capture

Customer profile information

Metadata

Repository

Catalogs, content, documents, web links…

ETL Data Management Platform

  • Centralized meta model

  • Complex data transformations

  • Meta data repository

  • Real-time extractions

  • XML, Java, HTML, LDAP

Metadata

Click stream capture

EIP Vendors

  • Viador

  • Hummingbird

  • TopTier

Flat Files

Oracle

Analytical data/application launch

Oracle

MicroStrategy BI Platform

  • Analytical processing

  • Graphical visualization

  • Reports

Personalized

Portal Access

EIP Engine

  • Browser-based access

  • Personalization

  • Common Authentication Proxy

  • Automatic taxonomy generation

  • Structured/unstructured info integration

  • XML, Java, HTML LDAP support

Meta data summary


Increasing the odds of success

A data warehouse project without a formalized meta data facility has only a one in four chance of being highly successful; still, in the heat of the DW battle, rarely is meta data seen on the front line!

Is it worth the risk not to do it?

Source: META Group’s industry study: Data Warehouse Scorecard: Cost of Ownership and Successes in Application of Data Warehouse Technology

Meta data summary


The data quality cycle

  • Identify business drivers

  • Set scope

  • Define metrics

Detect

communicate

communicate

Correctness Defects

Integrity Defects

Presentation Defects

Application Defects *

Correct

/ Repair

Measure /

Make visible

communicate

communicate

Prevent

* How the user applies the data

Data quality cycle


9

Data quality rules progression

Content and structure

Correctness

(value based)

Integrity

(structured based)

Focus on data content

Focus on data structure

Data Quality

Rules

Governing the Processes

Data Cleansing

and

Transformation

Rules

Examine and understand data

Change the data

Inductive Rules

Deductive Rules

Data Profiling

Column profiling Dependency profiling Redundancy profiling

Data quality cycle


The data quality cycle

Additional activities to deliver data quality

  • Consolidate source systems to reduce collection points and minimize system interfaces

    • Consolidate multiple non-integrated legacy application

    • Source independent data marts from the data warehouse

  • Consolidate shared data

    • Use an ODS to shared data across systems

    • Use reference tables and keys to logically integrate data that must remain distributed

  • Implement a hub / ODS for data integration

    • Provide a single source of clean data

    • Reduce system interfaces

Data quality cycle


How meta data supports data quality

  • The meta data solution manages many components needed to support a sustained data quality effort

    • Navigation meta data

    • Lineage

    • Plain English Definitions

    • Calculations and transformations

    • Currency

    • Identify owners and stewards

    • Business rules

    • Help identify redundency

  • The repository captures and exposes data quality statistics to a wide audience

  • The repository web interface can provide a mechanism for soliciting feedback

Data quality cycle


What is the Value?

“Companies that manage their data as a strategic resource and invest in its quality are already pulling ahead in terms of reputation and profitability from those that fail to do so.”

Source: Global Data Management Survey 2001

PricewaterhouseCoopers

Data quality cycle



Easily drill to:

  • Target data

  • Transformations

  • Additional source data

Meta data usage – development samples

Data lineage – from source or target

List the fields in a source file

Example 1a


Meta data usage – development samples

Data lineage – data quality review

Example 1b


Atlas Source System

1,016,575 Monthly Billing Codes

57% - Atlas Billing Codes

24% - Total Data Warehouse

Cyberlife IL

437,296 Monthly Billing Codes

57% - Cyberlife IL Billing Codes

10% - Total Data Warehouse

Meta data usage – development samples

Data lineage – understanding the data

Example 2


Meta data usage – development samples

Impact analysis

Display all the transformations For a Column in the warehouse or a field from a source system

Example 3


DDL

RDBMS

Tables

Columns

ETL

ETL Design

Codes and data mapping application

DDL

RDBMS

Tables

Columns

Code mappings

ETL

Invalid or missing code mappings

Suspense

Meta data usage – development samples

Live meta data

Example 4


Meta data usage – development samples

Integrated view of meta data from multiple sources

Extracted from the repository

Includes data from Oracle

Includes data from ERwin

Example 5a


Meta data usage – development samples

Integrated view of meta data from multiple sources

Example 5b


When is meta data not meta data?

Column

Column level meta data should be incorporated into the centralized meta data solution for easy display to a wide audience

Column definition

% of records from each source system

Counts and % of unique values for a code

Audit and Meta Tags

Row Claim Source System Date Source System

Transaction Identifier Updated Code Value

Row level information should be captured, managed and displayed by the application, i.e., the Data warehouse, data mart or other collection points

Example 6


Limited exposure to the data

An awareness of data quality issues

Impact of problems not easy to see

Few sustained activities to correct the problems

Greatly increased exposure to the data

Impact of problems clearly felt

Few sustained activities to correct data problems

Awareness of meta data problems

Few sustained activities to correct meta data

Greatly increase the exposure to meta data

Navigation meta data increases data access

Lineage

Plain English Definitions

Calculations and transformations

Identify owners and stewards

Business rules

The importance of a meta data repository

Legacy applications

Example 7


ROI for meta data

Issue

Benefit

Business Analysts told us they spend up to 70% of their time locating the right information source and resolving multiple versions of the truth

At least 20% improvement in the time required to locate and validate information

Technical analyst’s and developer’s told us they spend up to 80% of their time finding the data needed to satisfy a request

At least 20% improvement of application development and maintenance activities

Business and technical analysts told us it requires six to nine months for a new associate to become proficient in using data

Reduced learning curve for new associates by 3 months and lower the mentoring required from key resources

Lack of complete automated Impact analysis within and across tools creates a serious risk when implementing changes

At least 20% reduction in the time required to perform a complete impact analysis and reduce risk of errors when migrating changes to production

Quote from the META Group’s series of white papers addressing application delivery strategies

“meta data interchange will improve Application Development and maintenance efficiency by up to 30%, and real-time meta data interoperability will enable up to 50% improvement in Application Development and maintenance efficiency.”

Example 8


What does correct mean

Correctness

Rules

  • Accuracy

  • Consistency

  • Completeness

  • Balancing

  • Continuity

  • Precedence

  • Currency

  • Duration

  • Retention

  • Precision

  • Granularity

Activities

Validation

Does it match the rules

Verification

Does it make sense as applied to other reliable sources

Accurate

Data can be valid but still not be accurate

Inspection

Can be as simple as spot checking or as thorough data-driven discovery inspection using techniques like: pattern recognition, classification and probability

Example 9


ad