1 / 43

BI4ALL Demonstration 02 Data Models

BI4ALL Demonstration 02 Data Models. www.InstantBI.com. 1/7/2012. Agenda. Some names you should know Key issues in modeling Traditional DW Design mechanisms Star Schemas Time Variance + Stability Analysis Business Intelligence Modeling Best Practice Terminology Naming Standards

smilner
Download Presentation

BI4ALL Demonstration 02 Data Models

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. BI4ALLDemonstration 02Data Models www.InstantBI.com 1/7/2012

  2. Agenda • Some names you should know • Key issues in modeling • Traditional DW Design mechanisms • Star Schemas • Time Variance + Stability Analysis • Business Intelligence Modeling Best Practice • Terminology • Naming Standards • Each Type of Table • Some details on BI4ALL • Live demonstration of data models • Next Steps

  3. Some Names You Should Know • Bill Inmon • ‘Father of Data Warehousing’ • Prolific writer • Was the ‘public pioneer’ of data warehousing • Pioneers have arrows in their backs!!! • Originator of time variant + stability analysis data models • Data modeling background • Ralph Kimball • Co-inventor of the Xerox Star workstation • This is a whole other story • Co-Founder of Metaphor Computer Systems • Worlds best DSS Software (until IBM killed it) • Solved the problem of DSS on motorola 68020 with 16MB memory and 320 MB disk drive • By implementing dimensional models on an RDBMS • Engineering background

  4. Key Issues in Modelling • Ease of use for the end user • Speed of response for end user • Ease of extensions/expansions for IT • Ease of understanding for IT • Ability to store large amounts of history • Lowest cost development/maintenance of ETL to populate the model • Could list many others…

  5. BI Design Methods • Three different methods widely used • Choice of use depends on many factors • Star Schema • Transaction and ROLAP • Relational Online Analytical Processing • Time Variance + Stability Analysis • Changes to non-txn data eg. Customer • Third Normal Form • Volumes of changes are low • Until recently all ODSs have been implemented as 3NF

  6. Why Use Star Schemas? • Business people understand them • Matches the business model • Often intuitively obvious • Easy to query • Supports complex questions easily • No wrong answers due to join problems • Excellent performance on star schema aware databases • Oracle, Informix, DB2

  7. What Does a Star Schema Look Like?

  8. Why Use Integer Keys? • Performance • Integers is the fastest data type to operate on • Space and throughput • Integers are shorter than account numbers • Disk savings in tables and indexes • Speed from disk to processor • Flexibility • Allows multi-level summary tables • No IT involvement to create new summaries!! • Large Stars!!

  9. Why Use TV + SA? • “Be able to show me the value of any field at any time in the past” • Archival databases • When people really do not know what they want • Not easy to query • Cannot give this to business people

  10. What Does TV + SA Look Like? Each 3NF entity passed to the DW is split into 3 (or more) entities based on stability analysis and access analysis. An element of time is added to the key.

  11. Best Practice Since 1996 Business Users Desktops Highly Summarised (in star schemasor cubes) Structured for Department 3-5 years Meta Data Everywhere Lightly Summarised (in star schemas) Current Detail in archival datastore (3NF time variant model) Structured for Enterprise ODS 1-2 years 3NF integration/ transformation 5-10 years Operational Systems (System of Record) Older detail (3NF time variant model)

  12. New BI4ALL Proposition Business Users Desktops Do not have to build data marts Highly Summarised (in star schemasor cubes) Views for Department 3-5 years Meta Data Everywhere Virtual Lightly Summarised (star schemas) Current Detail in archival datastore (star schema) Structured for Enterprise ODS 3-5 years star schema integration/ transformation 5-10 years Operational Systems (System of Record) Older detail (star schema)

  13. BI4ALL Proposition • Increased query performance • Attributes ‘broken out’ to type 1 dims, directly joined to fact tables • All joins through integer keys • Almost no type 2 dimensions • Same model can support ODS/DW query patterns • You do not have to build data marts • Can answer all your queries directly from the ODS/DW • Approx 40% reduction in ETL Coding/Maintenance • HW/SW cost avoidance for data marts • All data readily available to end user tools • Oracle Discoverer, Business Objects etc. • All star schema • Even the archival data store

  14. Enough of the Background Show me something new!!!

  15. Terminology • Fact Table • A table that stores facts • Fact • A number about the business • Eg. Revenue, profit, costs, price, quantity, interest, txn amount • Dimension Table (Major Dimension) • A table containing dimension fields about a specific major dimension of the business eg. Time, Product, Customer • Dimension Field (Minor Dimension) • Generally a descriptive character field describing some aspect of some ‘object’ that is involved in the business. (Time, Product, customer) • Eg. Time - day, week, month, quarter year • Eg. Product - name, colour, size, weight, height, group, category • Eg. Customer – name, birthdate, age, sex, marital status, life stage

  16. Terminology • Association Table 1 • A table that associates two or more tables to each other. • Eg. Customer 1 is wife of customer 2 • Association Table 2 • A table that provides time variance support within a star schema • Type 1 Dimension Table • Replace the record when some fields change • Type 2 Dimension Table • Close the previous record and open a new record when some specific fields on the dimension table change.

  17. Terminology IntegerKeys Fact Table Fact Minor Dimension Major Dimension

  18. BI4ALL Naming Standards • TD_ : Table - Dimension • TF_ : Table - Fact • VM_ : View – diMension • VF_ : View - Fact

  19. Month – A Type 1 Dimension create view dbo.vm_month as Select a.pk_TD0005 pk_vm_month ,day_date first_day_of_month ,a.month_name ,a.month_name_sdesc ,a.month_in_year ,a.calendar_qtr ,a.month_in_qtr ,a.financial_qtr ,a.month_in_fncl_qtr ,a.year_name ,a.year_num ,a.level_col from dbo.TD0005 a where (a.level_col = 'level2') or a.pk_TD0005 = 0 go • All Table access is via views • Each dimension view has: • A pk_ • Defines a level for the table • Exposes a zero row • Opportunity to rename fields

  20. Demographics – A Type 1 Dimension create view dbo.vm_all_demographic as select a.pk_TD0001 pk_vm_all_demographic ,a.integer_01 demo_type ,a.varchar255_01 demo_type_sdesc ,a.varchar255_02 demo_type_ldesc ,a.money_01 income_low ,a.money_02 income_high ,a.varchar255_03 income_band ,a.integer_02 age_low ,a.integer_03 age_high ,a.char1_01 gender ,a.varchar255_04 age_band ,a.varchar255_05 education_level ,a.varchar255_06 marital_status ,a.varchar255_07 life_stage ,a.char1_02 dependents_flag ,a.integer_04 number_dependents ,a.varchar255_12 org_size_band ,a.integer_05 number_employees_low ,a.integer_06 number_employees_high ,a.varchar255_09 number_employees_band ,a.money_01 revenue_low ,a.money_02 revenue_high ,a.varchar255_03 revenue_band ,a.varchar255_10 industry_classification ,a.integer_07 number_in_household ,a.varchar255_13 head_of_hh_name ,a.level_col ,a.dim_table_number from dbo.TD0001 a where ( a.level_col = 'detail' and a.dim_table_number = 101) or a.pk_TD0001 = 0 • Note more complex dimension: • Exposes all demographics for all types of parties • Contains ‘bands’ as well as high low values • Shows the dimension table number • Shows the renaming of meaningless fields to meaningful fields Dimension table number

  21. Product – A Type 2 Dimension create view dbo.vm_product as select a.pk_TD0200 pk_vm_product ,a.varchar255_01 sku_number ,a.varchar255_02 barcode_num ,a.varchar255_03 prod_name ,a.varchar255_04 prod_sub_group ,a.varchar255_05 prod_group ,a.varchar255_06 prod_category ,a.varchar255_07 prod_class ,a.varchar255_08 prod_line ,a.varchar255_09 prod_size ,a.varchar255_10 prod_colour ,a.varchar255_11 prod_bundle_size ,a.integer_01 shelf_life_days ,a.money_01 unit_alloc_cost ,a.money_02 unit_abc_cost ,a.money_03 list_price_per_unit ,a.varchar255_22 unit_of_measure_code ,a.varchar255_23 unit_of_measure_sdesc ,a.char1_01 discount_allowed ,a.decimal_nn discount_rate_1-5 ,a.varchar255_nn discount_sdesc_1-5 ,a.varchar255_nn discount_ldesc_1-5 ,a.char1_02 returnable_flag ,a.varchar255_26 return_inspection_req_ldesc ,a.char1_03 sellable_flag ,a.char1_04 component_flag ,a.char1_05 requires_shipping_flag ,a.char1_06 taxable_flag ,a.varchar255_24 tax_type_1_sdesc ,a.varchar255_25 tax_type_2_sdesc ,a.date_from date_from ,a.date_to date_to ,a.current_flag current_flag ,a.level_col ,a.dim_table_number from dbo.TD0200 a where a.level_col = 'detail' and a.dim_table_number = 146 • Type 2 dimensions have: • Date from/to • Current flag • Note product dimension has a hierarchy for products • From SKU -> prod line • Actually not an ‘enforced’ hierarchy Date From Date To Current Flag

  22. Reporting Structure – Type 2 Dimension • Company structures change all the time • A type 2 dimension is used to maintain reporting versions of company structure • A separate recursive hierarchy is kept to maintain actual hierarchies in case of ‘missing’ levels of hierarchies create view dbo.vm_comp_reporting_struct as select a.pk_TD0004 pk_vm_comp_reporting_struct ,a.date_from date_from ,a.date_to date_to ,a.current_flag current_flag ,a.varchar255_01 parent_company ,a.varchar255_02 org_name ,a.varchar255_03 region_name ,a.varchar255_04 channel_type ,a.varchar255_05 channel_category ,a.varchar255_06 channel_group ,a.varchar255_07 channel_name ,a.varchar255_08 team_type ,a.varchar255_09 team_category ,a.varchar255_10 team_group ,a.varchar255_11 team_name ,a.varchar255_12 person_type ,a.varchar255_13 person_category ,a.varchar255_14 person_group ,a.varchar255_15 person_name ,a.level_col ,a.dim_table_number from dbo.TD0004 a where ( a.level_col = 'detail' And a.dim_table_number = 115) Or a.pk_TD0004 = 0

  23. Partys ,org_name ,number_employees ,gross_revenue ,gross_profit ,net_profit ,ss_org_key /* source system organisation key */ ,phone_switch_num ,start_up_date ,years_since_startup ,blacklisted_ind /* 1 = Yes, 2 = No */ ,vio_flag /* very important organisation */ ,ss_household_key /* source system household key */ ,vih_flag /* very important household */ ,cust_type_sdesc ,cust_type_ldesc create view dbo.vm_party as select pk_vm_party ,prefix ,first_name ,second_name ,third_name ,surname ,suffix ,concat_name ,address_to_name ,formal_salutation ,informal_salutation ,written_salutation ,email_address ,web_address ,tax_number ,govt_identifier_type ,govt_identifier ,ss_person_key /* source system person key */ ,marital_status ,phone_home_num ,phone_work_num ,phone_mobile_num ,phone_fax_num ,birthdate ,current_age ,vip_flag ,system_assigned_rating ,person_assigned_rating ,customer_flag ,employee_flag ,reseller_flag ,sales_rep_flag ,pmail_allowed_flag /* is paper mail allowed ? */ ,email_allowed_flag /* is e-mail allowed ? */ ,gender • Partys can be people, companies, households or anything else you want to keep as a ‘party’ • Party is a type 1 dimension • Though we may implement as type 2 at ####

  24. Types of Partys • In the ‘as is’ model a Party can be: • vm_person • vm_household • vm_organisation • vm_sales_rep • vm_all_cust • vm_person_cust • vm_org_cust • vm_household_cust • vm_vendor

  25. Account/Address Indicator Tables create view dbo.vm_account_inds As select pk_vm_account_inds ,credit_allowed_flag ,payment_15_days_flag ,payment_30_days_flag ,payment_45_days_flag ,payment_60_days_flag ,payment_90_days_flag ,interest_penalty_flag ,monthly_fee_flag ,annual_fee_flag ,level_col ,dim_table_number • For accounts we might record such things as credit allowed, payment terms and whether fees are levied. • For addresses we might want a short cut to get to addresses that are subject to certain severe weather or other natural dangers. create view dbo.vm_address_inds as select pk_vm_address_inds ,weather_hazard_flag ,flood_hazard_flag ,storm_hazard_flag ,cyclone_hazard_flag ,hurricane_hazard_flag ,drought_hazard_flag ,earthquake_hazard_flag ,level_col ,dim_table_number

  26. Product Cross Holding Inds create view dbo.vm_prod_x_hold_inds as select pk_vm_prod_x_hold_inds ,product_01 ,product_02 ,product_03 ,product_04 ,product_05 ,product_06 ,product_07 ,product_08 ,product_09 ,product_10 ,product_11 ,product_12 ,product_13 ,product_14 ,product_15 ,product_16 ,product_17 ,product_18 ,product_19 ,product_20 ,a.level_col ,a.dim_table_number • Sometimes we want to know who owns a combination of products. • And sometimes we want to know who owns a combination of products but not some other specific product. • This is much easier to do with product cross holding indicators.

  27. Associating Partys • Partys can be associated for: • A reason (code) • In a role • For a period • Specified by an amount • Remember a party can be: • vm_person • vm_household • vm_organisation • vm_sales_rep • vm_all_cust • vm_person_cust • vm_org_cust • vm_household_cust • vm_vendor • Many of these associations are exposed separately through their own views create view dbo.vf_party_party_asoc as select pk_vf_party_party_asoc ,asoc_key_vf_party_party_asoc ,dk_vm_party_1 ,dk_vm_party_2 ,dk_vm_asoc_code ,dk_vm_asoc_role ,dk_vm_date_from ,dk_vm_date_to ,date_from ,date_to ,current_flag ,asoc_code ,asoc_role ,asoc_amount1 ,asoc_amount2 ,party_1_type ,party_2_type ,a.asoc_table_number from dbo.TF0002 a

  28. Partys and Addresses • Partys are associated to addresses for: • A reason (code) • In a role • For a period • Specified by an amount • Addresses are considered independent of Partys • That is, if no-one lives in a house the house still has an address • Many of these associations are exposed separately through their own views create view dbo.vf_party_address_asoc as select pk_vf_party_address_asoc ,asoc_key_vf_party_address_asoc ,dk_vm_party ,dk_vm_address ,dk_vm_asoc_code ,dk_vm_asoc_role ,dk_vm_date_from ,dk_vm_date_to ,date_from ,date_to ,current_flag ,asoc_code ,asoc_role ,asoc_amount1 ,asoc_amount2 ,party_type ,a.asoc_table_number from dbo.TF0002 a

  29. Some Simple Fact Tables • Lets look closely at the fact table • Note a pk_ We put pks on fact tables now • Used to lookup fact rows for updates • Used to link asoc tables to fact tables • Note the set of dimension tables that are linked • Because it is an account close we would want to analyse this by many dimensions and it should run as fast as possible • Note company reporting structure • Note the three keys • dk_vf_account_dims_asoc • dk_vf_customer_dims_asoc • dk_vf_campaign_dims_asoc • More on these later • Then note there are some facts about the closed account create view dbo.vf_account_close_txn as select pk_vf_account_close_txn ,dk_vm_account_txn_date ,dk_vm_account_txn_minute ,dk_vm_account ,dk_vm_sub_campaign ,dk_vm_contact_method ,dk_vm_product ,dk_vm_customer ,dk_vm_customer_demographic ,dk_vm_currency ,dk_vm_unit_of_measure ,dk_vm_geography ,dk_vm_geo_code ,dk_vm_map_reference ,dk_vm_card_type ,dk_vm_card ,dk_vm_txn_type ,dk_vm_sales_rep ,dk_vm_account_close_rsn_code ,dk_vm_comp_reporting_struct ,dk_vf_account_dims_asoc ,dk_vf_customer_dims_asoc ,dk_vf_campaign_dims_asoc ,txn_amount ,txn_units ,txn_tstamp ,appsys_reference_number_str ,txn_sdesc ,txn_ldesc ,a.fact_table_number from dbo.TF0101 a

  30. Invoice Lines • Another very simple fact table • Invoice lines • Note link to invoice header fact row • This is a fact<->fact table link • Note again three columns • dk_vf_account_dims_asoc • dk_vf_customer_dims_asoc • dk_vf_campaign_dims_asoc • More on these later create view dbo.vf_invoice_line as select pk_vf_invoice_line ,dk_vm_invoice_date ,dk_vm_account ,dk_vf_account_dims_asoc ,dk_vm_sub_campaign ,dk_vm_product ,dk_vm_customer ,dk_vf_customer_dims_asoc ,dk_vm_customer_demographic ,dk_vm_currency ,dk_vm_unit_of_measure ,dk_vm_geography ,dk_vm_geo_code ,dk_vm_map_reference ,dk_vm_order_date ,dk_vm_comp_reporting_struct ,dk_vm_account_status ,dk_vm_account_type ,dk_vf_campaign_dims_asoc ,appsys_reference_number_str ,invoice_number_str ,invoice_number ,line_number ,line_number_units ,unit_base_amount ,unit_extra_amount_01 ,unit_tax_amount_01 ,unit_tax_amount_02 ,ext_base_amount ,ext_extra_amount_01 ,ext_tax_amount_01 ,ext_tax_amount_02 ,line_sdesc ,line_ldesc ,line_comments1 ,line_comments2 ,order_date ,scheduled_delivery_date ,scheduled_payment_date ,number_lines ,dk_vf_invoice_header ,a.fact_table_number

  31. Demonstration of Data Models

  32. The Process of BI4ALL Modeling

  33. BI4ALL Modeling • Associations dramatically change the process of modeling • Makes the modelers life MUCH easier • Eliminates some significant problems by design • Dramatically enhances productivity • Dramatically reduces development times

  34. BI4ALL Dimension Modeling • You identify the dimensional data • Define and design dimension tables • Do not worry about what fact tables it will be linked to • Attempt to define unique discrete values only • If there are large combinations of values break the input data up into more smaller dimension tables to keep the number of rows in the dimension tables low • Decide which dimension tables need to be type 2 • There are usually very few, and they usually have very few rows in them, product, geography are two examples • Determine ‘real key’ in the data • Map to a BI4ALL table or define a new table

  35. BI4ALL Fact Modeling • You identify the transaction/key measures data • Define and design fact tables • Initially, only worry about linking dimensions that are unique to the fact row and dimensions obviously required for performance • Do not worry about linking ‘low use’ dimensions • Always put relevant association keys onto fact tables (party/account) • Assume that ‘if it should be linked’ it will be linked • Add extra keys for performance as the reports are designed and as analysis requirements become more clear • Map to a BI4ALL table or define a new table

  36. BI4ALL History Modeling • Identify areas where you want to retain history • Entities/accounts already defined • You might add • End of month outstanding amounts • Daily outstanding balances • Decide how you are going to retain the history • Type 2 dimension table? • Develop a new association? • Develop an event fact table with ‘current/previous’ amounts • Eg. Banking transactions with before/after withdrawal balance

  37. BI4ALL Modeling • BI4ALL Modeling is far faster and easier than other current methods of designing models • It is also produces a faster/cheaper EDW • No loss of functionality

  38. Let’s Just Re-SummariseBecause this is important

  39. Best Practice Since 1996 Business Users Desktops Highly Summarised (in star schemasor cubes) Structured for Department 3-5 years Meta Data Everywhere Lightly Summarised (in star schemas) Current Detail in archival datastore (3NF time variant model) Structured for Enterprise ODS 1-2 years 3NF integration/ transformation 5-10 years Operational Systems (System of Record) Older detail (3NF time variant model)

  40. New BI4ALL Proposition Business Users Desktops Do not have to build data marts Highly Summarised (in star schemasor cubes) Views for Department 3-5 years Meta Data Everywhere Virtual Lightly Summarised (star schemas) Current Detail in archival datastore (star schema) Structured for Enterprise ODS 3-5 years star schema integration/ transformation 5-10 years Operational Systems (System of Record) Older detail (star schema)

  41. Next Steps

  42. Next Steps • View a detailed live demonstration of the BI4ALL models that is confidential • If interested perform a more detailed review of the models to determine fitness for purpose in your organisation

  43. Summary • Some names you should know • Key issues in modeling • Traditional DW Design mechanisms • Star Schemas • Time Variance + Stability Analysis • Business Intelligence Modeling Best Practice • Terminology • Naming Standards • Each Type of Table • Some details on BI4ALL • Live demonstration of data models • Next Steps

More Related