1 / 50

An Introduction to Data Warehousing

An Introduction to Data Warehousing. 1. Business Intelligence.

Download Presentation

An Introduction to Data Warehousing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. An Introduction to Data Warehousing 1

  2. Business Intelligence • Now,if the Estimates made before a Battle indicate Victory,it is because careful calculations show that your conditions are more favorable than those of your enemy;if they indicate defeat ,it is because careful calculations show that the favorable conditions for a Battle are fewer.With more careful calculations one can win ; with less one cannot. How much chance of Victory has one who makes no calculations at all !! --- Sun Tzu , The Art of War • Business these days are ,war minus shooting. -Anonymous

  3. Course Roadmap • Introduction to Datawarehousing • Difference between Operational System and DataWarehouse • Emergence of Decision Support Systems • DataWarehouse Theoretical Architecture • DataWarehouse Technical Architecture • DataWarehouse Bus Architecture • Data Modelling concepts • E-R Modelling for OLTP System • Dimensional Modelling for a Datawarehouse • Scheme generation for Datawarehouse • Star Scheme Design • Snowflake Scheme Design • Key aspects in designing the Dimensional Model • Granularity with respect to the Fact Table in the Schemas • Conformed Facts,Dimensions

  4. Course Roadmap • Fact less Fact Tables,Aggregate Fact Tables • Out Trigger Entities in the Schemas • Types of Relationships to be maintained between Facts • and Dimensions • Dependencies while generating Physical Scheme for • a DataWarehouse • Case Study of design of DataWarehouse for an existing • ERmodel

  5. Objectives • At the end of this session, you will know : • What is Data Warehousing • The evolution of Data Warehousing • Need for Data Warehousing • OLTP Vs Warehouse Applications • Data marts Vs Data Warehouses • Operational Data Stores • Overview of Warehouse Architecture

  6. Objectives At the end of this lesson, you will know : • Data Warehouse Architectures • Components of Data Warehousing Architecture • An overview of each of the components • Considerations for Data Warehouse Design • Common mistakes in Warehouse designs • An overview of Warehouse on the web

  7. What is a DataWarehouse ?

  8. What is a Data Warehouse ? A data warehouse is a subject-oriented,integrated,nonvolatile,time-variant collection of data in support of management's decisions. - WH Inmon WH Inmon - Regarded As Father Of Data Warehousing

  9. Subject-Oriented- Characteristics of a Data Warehouse Data Warehouse Operational Leads Prospects Customers Products Time Quotes Regions Orders Focus is on Subject Areas rather than Applications

  10. Integrated - Characteristics of a Data Warehouse Appl A - m,f Appl B - 1,0 Appl C - male,female m,f Appl A - balance dec fixed (13,2) Appl B - balance pic 9(9)V99 Appl C - balance pic S9(7)V99 comp-3 balance dec fixed (13,2) Appl A - bal-on-hand Appl B - current-balance Appl C - cash-on-hand Current balance Appl A - date (julian) Appl B - date (yymmdd) Appl C - date (absolute) date (julian) Integrated View Is The Essence Of A Data Warehouse

  11. Non-volatile - Characteristics of a Data Warehouse Data Warehouse Operational insert change insert delete load read only access replace change Data Warehouse Is Relatively Static In Nature

  12. Time Variant - Characteristics of a Data Warehouse Data Warehouse Operational • Current Value data • time horizon : 60-90 days • Snapshot data • time horizon : 5-10 years • data warehouse stores historical data Data Warehouse Typically Spans Across Time

  13. Alternate Definitions A collection of integrated, subject oriented databases designed to support the DSS function, where each unit of data is relevant to some moment of time - Imhoff

  14. Alternate Definitions Data Warehouse is a repository of data summarized or aggregated in simplified form from operational systems. End user orientated data access and reporting tools let user get at the data for decision support - Babcock

  15. Evolution of Data Warehousing 1960 - 1985 : MIS Era • Unfriendly • Slow • Dependent on IS programmers • Inflexible • Analysis limited to defined reports Focus on Reporting

  16. Evolution of Data Warehousing 1985 - 1990 : Querying Era Queries that are formulated by the user on the spur of the moment • Adhoc, unstructured access to corporate data • SQL as interface not scalable • Cannot handle complex analysis Focus on Online Querying

  17. Evolution of Data Warehousing 1990 - 20xx : Analysis Era • Trend Analysis • What If ? • Cross Dimensional Comparisons • Statistical profiles • Automated pattern and rule discovery Focus on Online Analysis

  18. Need for Data Warehousing • Better business intelligence for end-users • Reduction in time to locate, access, and analyze information • Consolidation of disparate information sources • Strategic advantage over competitors • Faster time-to-market for products and services • Replacement of older, less-responsive decision support systems • Reduction in demand on IS to generate reports

  19. Business Queries Typical Business Queries • Which product generated maximum revenue over last two quarters in a chosen geographical region, city wise, relative to the previous version of product, compared with the plan • What percent of customer procures product A with B in a chosen region, brokendown by city, season, and income group

  20. OLTP Systems Vs Data Warehouse Remember Between OLTP and Data Warehouse systems users are different data content is different, data structures are different hardware is different Understanding The Differences Is The Key

  21. OLTP Vs Warehouse

  22. OLTP Vs Warehouse

  23. OLTP Vs Warehouse

  24. Capacity Planning Processing Power Time of day Processing Load Peaks During the Beginning and End of Day

  25. Examples Of Some Applications Manufacturers Retailers Customers • Target Marketing • Market Segmentation • Budgeting • Credit Rating Agencies • Financial Reporting and Consolidation • Market Basket Analysis - POS Analysis • Fraud Management • Profitability Management • Event tracking

  26. Do we need a separate database ? • OLTP and data warehousing require two very differently configured systems • Isolation of Production System from Business Intelligence System • Significant and highly variable resource demands of the data warehouse • Cost of disk space no longer a concern • Production systems not designed for query processing

  27. Data Marts • Enterprise wide data warehousing projects have a very large cycle time • Getting consensus between multiple parties may also be difficult • Departments may not be satisfied with priority accorded to them • Sometimes individual departmental needs may be strong enough to warrant a local implementation • Application/database distribution is also an important factor

  28. Data Marts Subject or Application Oriented Business View of Warehouse • Finance, Manufacturing, Sales etc. • Smaller amount of data used for Analytic Processing • Address a single business process A Logical Subset of The Complete Data Warehouse

  29. Data Warehouse and Data Mart

  30. Data Warehouse and Data Mart

  31. Warehouse or Mart First ?

  32. Different kinds of Information Needs • Current • Recent • Historical Is this medicine available in stock What are the tests this patient has completed so far Has the incidence of Tuberculosis increased in last 5 years in Southern region OLTP ODS Data Warehouse

  33. Operational Data Store - Definition A subject oriented, integrated, volatile, current valued data store containing only corporate detailed data Can I see credit report from Accounts, Sales from marketing and open order report from order entry for this customer Data from multiple sources is integrated for a subject Identicalqueries may give different results at different times. Supports analysis requiring current data Data stored only for current period. Old Data is either archived or moved to Data Warehouse

  34. Operational Data Store • Increasingly becoming integrated with the data warehouse • Are nothing but more responsive real time data warehouses • Data Mining has anyway forced Data Warehouses to store transactional level data

  35. OLTP Vs ODS Vs DWH

  36. OLTP Vs ODS Vs DWH

  37. OLTP Vs ODS Vs DWH

  38. Typical Data Warehouse Architecture Data Marts EIS /DSS Metadata Query Tools Select Extract Transform Integrate Maintain Data Warehouse OLAP/ROLAP Web Browsers Operational Systems/Data Middleware/ API Data Mining Data Preparation Multi-tiered Data Warehouse without ODS

  39. Typical Data Warehouse Architecture Data Marts Metadata Metadata Select Extract Transform Integrate Maintain Select Extract Transform Load Data Warehouse ODS Operational Systems/Data Data Preparation Data Preparation Multi-tiered Data Warehouse with ODS

  40. Benefits of DWH These capabilities empower the corporate... • To formulate effective business, marketing and sales strategies. • To precisely target promotional activity. • To discover and penetrate new markets. • To successfully compete in the marketplace from a position of informed strength. • To build predictive rather than retrospective models.

  41. Warehouse Architecture - 1 EIS /DSS Metadata Query Tools Select Extract Transform Integrate Maintain Data Warehouse OLAP/ROLAP Web Browsers Operational Systems/Data Middleware/ API Data Mining Data Preparation Enterprise Data Warehouse

  42. Warehouse Architecture - 2 Metadata Metadata Metadata EIS /DSS Data Mart Data Mart Data Mart Query Tools Select Extract Transform Integrate Maintain OLAP/ROLAP Web Browsers Operational Systems/Data Middleware/ API Data Mining Data Preparation Single Department Data Mart

  43. Warehouse Architecture - 3 Data Marts EIS /DSS Metadata Query Tools Select Extract Transform Integrate Maintain Data Warehouse OLAP/ROLAP Web Browsers Operational Systems/Data Middleware/ API Operational Data Store Data Mining Data Preparation Multi-tiered Data Warehouse

  44. Data Warehouse Architectures There are three schools of thought about DW architectures • One supports Dimensional Modeling all through (Ralph Kimball) • Second supports ER for Data Warehouse and Star Schemas for Data Marts • Third supports ER model for DW (NCR)

  45. Kimball’s View OperationalSystems Each Star is a Data Mart and has both summary and detail data Presentation Server Staging Area LAN Data Warehouse Server Processes • Extract • Scrubbing • Transformation • Load Jobs • Aggregation Jobs • Replication • Monitoring • Management • Meta Data Repository • Meta Data Population • Meta Data Maintenance DW is sum total of all Data Marts DW Bus using Conformed Dimensions Multiple Data Marts With Conformed Dimensions

  46. Inmon’s View Operational Systems Data Warehouse Staging Area Data Marts LAN Data Warehouse Server Processes • Extract • Scrubbing • Transformation • Load Jobs • Aggregation Jobs • Replication • Monitoring • Management • Meta Data Repository • Meta Data Population • Meta Data Maintenance Detail Data in ER format Summarized Data in Star formats Data Warehouse (ER) Feeding Multiple Data Marts (Star Schema)

  47. Components of a Data Warehouse Architecture • Source Databases • Data extraction/transformation/load (ETL) tool • Data warehouse maintenance and administration tools • Data modeling tool or interface to external data models • Warehouse databases • End-user data access and analysis tools

  48. Components of a Data Warehouse Architecture Data Cleansing Tools Data Modeling Tool ROLAP Engine Central Metadata Data Access and Analysis Tools -Managed Query -Desktop OLAP -ROLAP -MOLAP - Data Mining Source Databases ETL Tool Central Warehouse(RDBMS) RDBMS Local meta data MDDB Warehouse Admin Tool Architected Datamarts Warehouse Databases Data Warehouse Is Not Just About Data... But Tools Too

  49. Source Databases - Characteristics • Legacy, relational, text or external sources • Designed for high-speed transaction processing • Real-time, current, volatile data • Fast response for larger numbers of concurrent users • Many short transactions • Update-intensive; modifications by row • Inquiry-oriented; access by keys • High integrity, security, recoverability • Source data is often inconsistent and poorly modeled

  50. Data Cleaning Tools • To clean data at the source • Clean up source data in-place on the host • Business rule discovery tools which analyse the source data and write cleaning rules based on lexical analysis and AI techniques • Poorly integrated with data warehousing tools • ETL tools have limited yet adequate data cleansing functionality

More Related