1 / 69

Dimensional Modeling

Dimensional Modeling. Dr. Jerry Rosenbaum jrosenba@ix.netcom.com. The Rose Tree Group 410-764-8443. Myriad Solutions 301-476-9190. Agenda. Dimensional Example The Bigger Picture Steps to Build a Dimensional Model. An Understanding of the Problem is key to the solution.

Download Presentation

Dimensional Modeling

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Dimensional Modeling Dr. Jerry Rosenbaum jrosenba@ix.netcom.com The Rose Tree Group 410-764-8443 Myriad Solutions 301-476-9190

  2. Agenda • Dimensional Example • The Bigger Picture • Steps to Build a Dimensional Model

  3. An Understanding of the Problemis key to the solution • To rewire the Empire State Building you must • Understand the current wiring • Understand the goals for the new wiring • Design the new wiring system • Execute your design • Monitor the results • The “Fire, Ready, Aim” approach • can lead to a hole in your foot

  4. Business User Perspective on Data • In the view of the business user, there are only two important things about data • Accessibility • Can I get the data that I need • Am I allowed to get the data I need (security) • Quality • When I get the data, can I trust it • Without quality, business intelligence devolves into business stupidity

  5. Part IDimensional Model Example

  6. Up Front Points • Dimensional Models are generally a star schema, but may be a snow flake • Provides a slice of the total available data and is focused about the needs of a single department or user • Contains a fact table and multiple dimension tables • Attributes may be source data or derived data • A dimensional model is one type of data mart.

  7. Sales Analysis Star Schema From Len Silverston Universal Data Models

  8. Dimensional Model • One Fact table • Customer Sales • Fact is either one data item (or a group of tightly coupled data items of the same granularity) • Multiple Dimension Tables (each with one or more dimensions of a similar type) • Sales Rep • Product • Time by Day • Address • Customer • Customer Demographics • Internal Organization

  9. Business Points • Data in Fact Table, and the Dimension Tables is based on • Business requirements • Data extracted from other databases • Internal or external data • Allows business users to slice and dice the data by any combination of dimensions to produce a business report

  10. Business Points (2) • Design of the Dimensional Model is only one aspect, how about the data to be loaded • Are there any data quality issues with the data that will be loaded • Are there alternate sources of the data that differ from the data source we used • Are we violating any security or privacy issues

  11. Questions • Where did the data come from • How often do we update the data • Is it an update or a complete refresh • Are the rules different for internal and external data sources • Why didn’t we just use the original databases • Was any of the data transformed • Why did we choose those 7 dimensions • What is the quality of the data

  12. Truly Answering These Questions • We must look at • Business drivers • Where dimensional modeling fits into the bigger data picture • How do we do dimensional modeling • In other words, it helps if we understand the bigger picture so we can build the right solution the first time

  13. Some Hidden Issues • This dimensional model can not reasonably help us answer • What was the typical total check out amount for all purchases by a customer • How many items did a typical customer purchase • Adding these items to every row of the fact table is not a good solution This requires a second, but related dimensional model.

  14. Hidden Issues (2) • How does one keep a set of dimensional models in synch • How does one ensure that for a given “fact” and set of dimensions that the sourcing of the data is consistent across multiple models • If a transformation (the T in ETL) is used, how can one ensure that everyone uses a consistent transformation • What about the data quality issues – were they resolved the same way every time.

  15. Part IIThe Bigger Picture

  16. Sagely Advice “Where should I begin your majesty” “Begin at the beginning”, the king said gravely - Louis Carroll Alice’s Adventure in Wonderland

  17. Business Intelligence • On of the earliest applications in the history of computing was a program to generate reports on operating system performance • FAST FORWARD TO TODAY • Today we call this process Business Intelligence (BI) – we gather and analyze data to increase business process efficiency. From J. O’Conner

  18. Business Intelligence • C - Level – Considerations • Costly ERP and Major systems are implemented to provide information to management to make the best decisions relating to improving the “Bottom Line.” • IT Customer satisfaction does not seem to meet expectations – manage expectations • IT takes the blame for bad decisions due to the information available to the executives • UNWARRANTED ???– From J. O’Conner

  19. Business Intelligence • C - Level Executive Considerations • ERP or Major System’s Value is dependent on how the information is disseminated to management (regions, functions, divisions, products, etc.) – but IT needs to understand what data is available and assure its accuracy. • IT must establish test and check points to assure data accuracy using standard data integrity methods. • IT must provide training and support in development of reports for accuracy – become detectives – look for anomalies, bringing them to the attention of the clients. From J. O’Conner

  20. Business Intelligence • C - Level Executive Considerations Information Business Leaders need: • Customer Information – trends in product selection, billing management, order tracking, fulfillment visibility, marketing planning, campaign management , telemarketing, lead generation, lead generation, and custom segmentation. From J. O’Conner

  21. Business Intelligence • C - Level Executive Considerations Information Business Leaders need: • Supplier Information – cost of purchased goods and services, optimization of supplier selection, compress cycle times, align the purchase of goods with the corporate strategy. From J. O’Conner

  22. Business Intelligence • C - Level Executive Considerations Information Business Leaders need: • Product Lifecycle Management data • Supply Chain Management data • Financial Data that links Business controls to Finance and comply with Sarbanes Oxley (SOX404). From J. O’Conner

  23. Key Business Driver: The Need to Improve Business Intelligence • Nine out of 10 executives from the largest U.S. companies say they need stronger business intelligence capabilities that provide better analysis of, and insight into, their operations if they are to grow successfully in an uncertain economic and political environment • Accenture survey of 150 senior executives of Fortune 1000 firms. From J. O’Conner

  24. Top Needs • 91% selected stronger analytical and business intelligence • 84% selected an organizational culture that better accommodates change • 74% selected a more robust information technology infrastructure

  25. Building a Business and IT Foundation for BI • Organizations have never been so eager to adopt business intelligence (BI) technology. Unfortunately, lack of alignment between people, process, and technology has led to many misguided business intelligence deployments. Using a business-centric methodology and process improvement type of approach, organizations can leverage BI efficiently to enable Performance Management • M.A.Smith – Data Management Review

  26. Zachman Framework

  27. DoDAF

  28. Data – What List of Subject Areas about which data is stored Often presented as a taxonomy tree or a multilevel outline Process - How Business functional areas Often presented as a set of functional decompositions Row 1 - Planner • Also important to know • Relationship between data and process • Present, and proposed data and process • Transition plan / Road map

  29. What - Data Conceptual (or Business) Data Model. Has two components Business data objects (forms, reports, etc) and their decomposition into data components and A high level organization of the business data components and their relationships How - System Business work flow End to end (incl. people) Includes all aspects of the target system The business workflow is the structure for organizing the business steps There are rules for moving along the steps in the workflow The business steps communicate to each other via the data Row 2 – Business View/Owner

  30. What - Data Logical Data Model Fully normalized (through fifth normal form) Determine which services use which data elements (entity level and attribute level) How - System For each business object Develop a set of services to meet the business need Search for potential common services User interactions Row 3 – Logical View / Designer Metadata for both data and systems should be collected, Organized, and maintained

  31. What is in a Logical Data Model • Graphic that depicts entities, attributes, primary keys, etc (Data items + structural rules + relationships) Plus • Metadata for Entities, Attributes and Relationships (CRUCIAL FOR USING DATA MARTS) • Definition • Data Domain values set (including possible representation) • Units of Measure • Cardinality (and optionality) • Management of synonyms and antonyms • Semantic rules • Status of an entity, or attribute

  32. Logical Data Model (2) • Primary key, foreign key, uniqueness, use of nulls and default values • Data integrity and business rules (Data Quality Rules) • Originating Data Source • System of Record/Authoritative Source • Lead and steward business domain • Usage of data in an information exchange • Security • Notes for physical data model designer • Example • A data modeler may use abstract design templates (generalization and specialization) and bottom up design based current systems plus new requirements to build a complete model • All other models (especially the process models) are used as potential sources of data elements

  33. Some Data Model Notes • The Conceptual Data Model tends to be a very wide scope, but limited detail (sets the context for data sharing). • Logical Data Model is developed in detail as needed • Physical Data Model is based on all or part of the Logical Data Model AND it may have a similar or very different data structure • Structure is based on planned usage

  34. What - Data Physical Database Design Transaction Path Analysis Analysis of the need for indexes to improve performance Determine Physical Data Structure Determine if any services will be stored procedures How - System Transformation of the business flow to a physical flow Determine groupings of services for implementation Build physical flow based on business flow and services SOA Row 4 – Physical View/Builder Add physical details to the metadata and discovery services

  35. What - Data Determine the layout of the database tables across the disk farm Develop the DCL for the physical database structure Determine backup & recovery strategy Determine SAN strategy Row 5 – Detail Representation / Subcontractor How - System • Determine detail specifications for each element of the physical flow • Write the programs for implementing the flow as well as each element of the flow

  36. What Performance monitoring and adjustments Business continuity Archiving and retrieval SPC & Audit Data Stewardship Row 6 – Information System(Actual physical system How • “Help Desk” • Change management (including data)

  37. Corporate Information Factory Metadata Operational Systems Reporting & BI Systems Data Mart for Accounting AR ETL ETL Data Warehouse AP Data Mart for Sales Order Entry Etcetera Etcetera System Measurements Based on work of C. Imhof

  38. Data Quality Points • You must measure your actual data quality • Quality must start in the production systems • Your ETL processes along with the data warehouse and data marts are not meant to be a sewerage treatment plant for bad data • If the quality is not very high in the operational systems, then • Your quest for business intelligence devolves into producing business stupidity • Solve your DQ problem in the operational systems (root cause) and then go forward. • Bad data can be created faster than you can correct it.

  39. System Measuresfor Data and Systems • Frequency of use • Pattern of use (monthly, weekly, daily, hourly) • Resources consumed • CPU, Disk, Network • For Data both volume and rate of growth • Performance Metrics • Utilization Metrics

  40. Metadatafor Data and Systems • Need Metadata about Data and Systems • Written in Business Terms + Technical Terms • Metadata for data includes information about Entities, Attributes, and Relationships • Definition • Data Domain values set (including possible representation) • Units of Measure • Cardinality (and optionality)

  41. Metadata (2) • Management of synonyms and antonyms • Semantic rules • Status of an Entity and Attribute • Primary key, foreign key, uniqueness, use of nulls and default values • Data integrity and business rules • Originating Data Source • System of Record / Authoritative Source • Lead and steward business domain • Usage of data in an information exchange • Example • AND Business Rules for Data Validity

  42. Metadata (3) • Without Metadata, • You may not be sure what you are looking at • For example what does a length of 6.2 mean (a Mars Lander crashed because of this problem) • You may not be sure what process you should use to execute a business process • Etc

  43. Key Question • Where do you get the data for the data marts. • Inmon: from the data warehouse • Kimball: directly from the operational systems • The problem is that source data may be available from several sources • What do you do if the sources do not have identical data • For derived data, does everyone use the exact same method (including data sources) • Are the semantics the same

  44. A man with one watch knows what time it isA man with two watches is never sure -Louis Carroll

  45. These are the systems that run the business on a daily basis Most Customer interactions are with operational systems Operational Systems Operational Systems AR AP Order Entry Etcetera

  46. Data Warehouse • Serves as the single, officially accepted and approved, valid source for all data needed for business analysis /intelligence • Helps insure that all Data Marts are reading from the same book and using the same rules. • Much easier change management • Fully normalized RDB with summary data added Data Warehouse ---------------- Archive

  47. Geared to meet the business users need Uses range from Simple reports to Data delivered in a manipulability form Word Excel Small Data Warehouse Star Schema Delivery media depends upon planned usage Data Mart Reporting & BI Systems Reporting & BI Systems Data Mart for Accounting Data Mart for Sales Etcetera

  48. Getting the Data In • Vendors promise us that the ETL process for moving data from operational systems to the Data Warehouse is “simple” • But we must deal with • Conflict resolution • Data Quality Issues • Duplicate data • etc

  49. Generally use SQL queries Care must still be taken to keep dimensions consistent Marts sourced from a single Data Warehouse can be merged Getting the Data Out

  50. Other Important Big PictureData Issues

More Related