1 / 55

room 1

room 3. room 2. garage. room 1. kitchen. Metadata. The Information Blueprint. Definition of Metadata. Metadata is Data about Data The map of the Data Warehouse Defines the construction, health and descriptive information Metadata is not The data itself Master data

Download Presentation

room 1

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. room 3 room 2 garage room 1 kitchen Metadata The Information Blueprint

  2. Definition of Metadata • Metadata is • Data about Data • The map of the Data Warehouse • Defines the construction, health and descriptive information • Metadata is not • The data itself • Master data • External data (depending on the type of data!)

  3. Hmmm, I wonder what this information really means? Management of Data Metadata: the information “yellow pages” What information is available? What does it mean? How was it derived? What was its source? How current is it? Who uses it? How often is it used?

  4. Types of Metadata

  5. Types of Metadata • Technical • Business • Contextual

  6. Technical Metadata • Technical metadata is data about data needed by the technology folks to do their work correctly. This includes the "good ole days" metadata, but adds much more. Technical metadata is used by the IT side to understand how the data warehouse/data mart was constructed. • What is the system of record for a specific piece of data, • What transformations were performed on what source data to produce data in the data warehouse/data mart, • What are the columns in the data warehouse/data mart and what do they mean, • What is used to reconcile the data with the source system, and • When was the last date and time the data was loaded into the data warehouse/data mart.

  7. Business Metadata • Business metadata is data about data needed by the business community to do their work better. Business metadata is used by the business to understand what is available in the data warehouse/data mart and how, intheir terminology, is it built. Business metadata include • Who is the data steward, • What is the confidence level of the dataand its quality, • What algorithm is used to create the values, • What is the definition of this data, and • What reports are available.

  8. Contextual Metadata • Contextual metadata is data that sets "context" of your data. It reallyisn't metadata in the typical sense of the word, but is classified asmetadata nonetheless. Examples of contextual metadata are • Weather reports, • Headlines of the day, and • Social, economic, and political issues. • Contextual metadata is the hardest to collect. • Possible sources are newswire feeds (like AP, Wall Street Journal, Christian ScienceMonitor), Internet sites (http://www.weather.com, http://www.wsj.com),or just plain manual input (which is probably the least desirable). • How does contextual metadata help? • Let's say that your organization is the Department of Energy and you noticed a major jump in spending on security during the late 1990s. Now, in 2009, the spending seems to be trending downward. How do you know why that might be happening? Duringthe late 1990s (see I don't remember the year or the name of the person already!), there was believed to be some breach of security and that classified data was being "stolen". If that information was captured, then when the trend is discovered, we could look at the context of what was happening in the late 1990s to see if it can help understand thetrend.

  9. Importance of Metadata

  10. room 3 room 2 garage room 1 kitchen Importance of Metadata Data Warehouse META DATA Metadata provides the blueprint of the Data Warehouse

  11. Importance of Technical Metadata • Serves the IT community with operational detail about information systems. However • Metadata is not the primary focus of IT • Looked upon as a documentation exercise of minimal value • Often relegated to “nice to have” status

  12. Importance of Business Metadata • Serves the Business Community as a source to discover what and where information exists • Business meaning takes precedence over technical detail (look for commonality) • Looked upon as a key source for knowledge on Operational processes

  13. Importance of Metadata • Serves the Data Warehouse as a key enabler • Of primary importance to DSS Analysts • Metadata is critical to tracking the content and validity of data in the Warehouse • Provides context to the data • Issue: “Knowledge Gap” between OLTP and DW • can affect the success of Data Warehousing • Implementations

  14. Importance of Metadata • What it does • Describes data in operational systems which facilitates • mapping data elements to the DW • data conversion • aggregation & summarization logic • coordinating naming conventions • managing anomalies between physical characteristics of common data across information systems

  15. Importance of Metadata • Metadata provides a new dimension • It allows • Data to be managed over time ( 5 - 10 years) • Data to be managed by context (business meaning and business value will change over time) • Manages structural changes to the DW database (versioning of metadata) • Allows Operational Systems to reinvent themselves by discovering corporate data which exists across systems

  16. Where Do You Find Metadata

  17. Metadata Exists Everywhere! Data Modeling Tool Data Dictionary External Sources Data Warehouse Data Model Physical Database Definition Metadata Manager Operational Data Sources Database Definitions Database Definitions File Definitions COBOL Copybooks Summarization DSS Tool Business Catalog Internal Non-Operational Data Sources Data Extraction Tool Data Dictionary DSS Tool Data Definitions And this is just part of technical metadata…

  18. Metadata and the Data Warehouse Business Rules / Derived Measures Data Access Interface, Transformation, and Load Data Warehouse Data Model Data Aggregation Data Quality Data Warehouse Operations

  19. Interface, Transform, and Load • Describes the interface location and content. • Describes information about the transformation from source system codes to reference data codes. • Describes information about custom transformation, such as, using subsets of the data. • Describes information about the Data Warehouse destination

  20. ETL Metadata Points Data Sources Filter Cleanse Extract Data Warehouse Transform Log/QA

  21. Activity Description Outcomes Extract Pull Data from Operational Systems Raw Data Filter Discard “Noise” data from data set Dirty Data Cleanse Analyze data quality and make corrections Clean Data Transform Rearrange and Summarize Data Useful Data Log/QA Perform Final Check and Build “Yellow Pages” Verified Data Metadata Information Sourcing Activities

  22. DW Data Model • A description of each attribute and entity of the data model. • This is an extract from the CASE tool that manages the data model or has been the output of a data dictionary.

  23. Aggregation • Rules based engine for Aggregation. • States which fields from which DW tables are combined and the algorithm that aggregates the data. • Used to create code or to suggest stored procedures for aggregation.

  24. Data Access - Reporting • Report Generation Metadata • A rules based reporting tool that describes a report format from the header to column and rows. • Report Menu Metadata • Describes the reports that are available. • Describes the Menu that the user is shown for accessing the available reports.

  25. Data Access - Query • Defines canned queries available. • Defines public and private queries. • Allows queries to be combined.

  26. Data Access - End User Data Warehouse Data Model End User Application Interface, Transformation, and Load Application Help files Derived Business Measures

  27. Data Quality • DW Load Statistics • The use of control numbers from the source system, compared to the load data. • DW Quality Rules • Rules that tracked known data trends for report checking

  28. Metadata Components

  29. Metadata Components • Storage in the Warehouse • Operational Mapping • Extract History • Volumetrics • Algorithms • Relationship History • Ownership/Stewardship • External/Reference Data • Business Meaning (Data Models) Storage Mapping History Volumetrics Algorithms Relationships Ownership Reference Data Models

  30. Metadata Components • Storage Requirements • Database Schema • Table Spaces • Database Tables(Dimensions, Facts) • Keys and Indexes • Facts (Attributes) • Information Access (Data Topology) • PC’s • EIS • DSS • Operational Storage Mappin History Volumetrics Algorithms Relationships Ownership Reference Data Models

  31. Metadata Components • Operational Mapping • Location of data sources • Data Element conversion • Physical characteristic conversions • naming changes • default values • encoding • Data Key changes • Logic & Algorithms Mapping Storage History Volumetrics Algorithm Relationships Ownership Reference Data Models

  32. Metadata Components • Extract History • Logged history of data extracts and transformations • Audit logs • Job Scheduling (Batch, On-line) History Mapping Storage Volumetrics Algorithms Relationships Ownership Reference Data Models

  33. Metadata Components • Volumetrics • Number of Tables • Number of Rows • Usage Characteristics • Table Indexing • Aging Criteria History Mapping Storage Volumetrics Algorithms Relationships Ownership Reference Data Models

  34. Metadata Components • Algorithms • Levels of Summarization • Criteria applied to Data Aggregation • Data Derivation Algorithms History Mapping Storage Volumetrics Relationships Ownership Reference Data Models

  35. Metadata Components • Relationship History • Relationship Artifacts • Relationship History • Tables included • Effective Dates • Constraints in Effect • Cardinality in Effect • Description Relationships History Mapping Storage Algorithms Volumetrics Ownership Reference Data Models

  36. Metadata Components • Ownership/Stewardship • Operational Ownership • Updates • Recovery • Accuracy • Data Warehouse Stewardship • Data consistency • Loading • Access History Mapping Storage Algorithms Relationships Volumetrics Reference Data Models Ownership

  37. Metadata Components • External/Reference Data • Location, type and content of external data • Encoded values and changes • Audit log of changes • Date/Time stamps History Mapping Storage Algorithms Relationships Volumetrics Ownership Data Models Reference

  38. Metadata Components • Business Meaning (Data Models) • Data Warehouse Data Model (Logical) • Mapping to Data Warehouse Database Design (Physical) • Mapping to Operational Systems Data Models (Corporate & Business Area) • Mapping to other DW architecture Metadata • EIS/DSS • Data Mining/Data Journalism Mapping History Storage Algorithms Volumetrics Relationships Reference Ownership Data Models

  39. How to use Metadata

  40. How to Use Metadata • Metadata Manager Requirements • Required features include: • GUI • Data Model Management • Model/Data Versioning • Data Access & Security • Integration with the DW DBMS • Integration with DW Architecture • Unstructured Reference Data Management (futures) Storage Mapping History Volumetrics Algorithms Relationships Ownership Reference Data Models

  41. How to Use Metadata However... • Most current Repositories are not extensible • Few specialized tools are available for • Metadata Mining (Data Re-engineering) • There is no standard way to exchange metadata • between various Meta Manager tools • between EIS/DSS tool sets • between OLTP DBMS and DW DBMS • OLTP CASE repositories which manage business models are not geared for Data Warehousing

  42. How to use Metadata • How to support it (Metadata Maintenance) • Care and feeding of Metadata is just as important as the data itself • Other Considerations. • How to get IT and the Business Client to use metadata • Have a single point of contact • Always do it at their terminal (DW or Client) • Always let them do it with your help

  43. What To Look For in a Product

  44. What to Look for in Products • From David Marco’s book Building and Managing the Meta Data Repository: A Full Lifecycle Guide

  45. Vendor Background • Full name and business address of vendor. • Parent Company. • Number of years company has been in business. • Company structure. Is it a corporation, partnership, or privately held? List names associated with structure if different from question # 1. • Public or privately held company. If public, which exchange is company traded on, and what is the company's market symbol? • When did the company go public, or when is it expected to go public? • Total number of employees worldwide? • Total number of U.S. employees? • Web site URL • Number of developers supporting proposed product solution? • Company profit/loss for the last three years (if available).        

  46. Proposed Solution Overview • Summary of the vendor's proposed solution and explain how it meets the needs • What are the names and versions of the product(s) component(s) comprising the vendor's proposed solution? • The repository architect and infrastructure architect need to carefully review all the components in the proposed solution and compare them with the target technical environment and support structure. How do the components communicate? What hardware platforms, DBMS's, Web servers and communications protocols do the components require? How is security and migration handled among the various components? • Number of worldwide production installations using precisely this proposed solution configuration. • Be sure to consider the hardware, DBMS, Web server, etc. • How many other companies are using the same confiruration? Is your company going to be the first? • What hardware, operating system, DBMS and web browser limitations do each of the product(s) component(s) have in the proposed solution on client and server platforms? • Be mindful of any requirements to download. Java applets and/or ActiveX controls to the client. This might be in conflict with your company's web policy or if deployed externally your clients. • What is the release date and version number history of each of the product(s) component(s) for the past 24 months • What was the anticipated release date and new feature list for each of the product(s) features and component(s) over the next 12 months?

  47. Cost of Solution • Total cost of proposed solution. • Cost of consulting services required for installation. • Negotiate consulting time up front to complete staff training and get the repository up and running as quickly as possible. • Cost of consulting services for initial project setup • What is the vendor's daily rate for consulting services without expenses? • Annual maintenance cost/fee • This should range any where from 14 percent to 18 percent of solution price? • Are all new product component releases/upgrades provided while under an annual maintenance agreement? If not, please explain in detail.

  48. Technical Requirements • Are there any database schema design requirements for the DSS data model in order to function with the repository product? • Does the proposed solution require a change in the existing DSS schema design in order to function? • How does the tool control the various versions of the meta data (development, quality assurance and production) stored in the repository? • How is meta data from multiple DSS projects controlled and separated? How can the various projects share meta data? • The answer to this question will determine how you administer the product and provide security. • Describe how meta data repository contents are migrated from one system engineering phase to the next (development, quality assurance and production)? How does this processing sequence differ when dealing with multiple projects on various time lines? • In particular how is meta data migrated through the various design phases? Can a single project or portion of a project be migrated forward? How? • What DBMS privileges does the product support (e.g., roles, accounts, and views)? • Can DBMS-specific SQL statements be incorporated into queries?

  49. Technical Requirements • Describe the security model used with the product? • How does the product use existing infrastructure security systems? • Does the product use any type of single sign-on authentication (e.g., LDAP)? • Where are user security constraints for the product stored? • Can a user have access to the repository tool for one project but no access for another project? • Can a user view the SQL generated by the product? • Is the product Web-enabled? Describe.

  50. Implementation • Describe sequence of events and level of effort recommended for clients to consider in planning their implementation strategy. • What is the typical duration of the implementation cycle? • How many DSS database schema dimensions and facts can the proposed product solution handle? • Provide a sample project plan for implementation of your proposed solution for a single DSS project. • What client resource skill sets need to be in place for installation and implementation?

More Related