1 / 291

Data Warehousing Fundamentals

Data Warehousing Fundamentals. Course Objectives. After completing this course, you should be able to do the following: Describe the role of business intelligence (BI) and data warehousing in today’s marketplace

topaz
Download Presentation

Data Warehousing Fundamentals

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Warehousing Fundamentals

  2. Course Objectives • After completing this course, you should be able to do the following: • Describe the role of business intelligence (BI) and data warehousing in today’s marketplace • Describe data warehousing terminology and the various technologies that are required to implement a data warehouse • Explain the implementation and organizational issues surrounding a data warehouse project • Identify data warehouse modeling concepts • Explain the extraction, transformation, and loading processes for building a data warehouse

  3. Course Objectives • Identify management and maintenance processes that are associated with a data warehouse project • Describe methods for refreshing warehouse data • Explain warehouse metadata concepts • Identify tools that can be employed at each stage of the data warehouse project • Describe user profiles and techniques for querying the warehouse • Identify methods and tools for accessing and analyzing warehouse data

  4. Lessons • Business Intelligence and Data Warehousing • Defining Data Warehouse Concepts and Terminology • Planning and Managing the Data Warehouse Project • Modeling the Data Warehouse • Building the Data Warehouse: Extracting Data • Building the Data Warehouse: Transforming Data • Building the Data Warehouse: Loading Warehouse Data • Refreshing Warehouse Data • Leaving a Metadata Trail • Managing and Maintaining the Data Warehouse

  5. Let’s Get Started

  6. Lesson 1 Objectives • After completing this lesson, you should be able to do the following: • Describe the role of business intelligence in today’s marketplace • Describe why an online transaction processing system (OLTP) is not suitable for analytical reporting • Describe how extract processing for decision support querying led to data warehouse solutions that are employed today • Explain why businesses are driven to employ data warehouse technology

  7. What Is Business Intelligence? “Business Intelligence is the process of transforming data into information and through discovery transforming that information into knowledge.” Gartner Group

  8. Decision Knowledge Information Data Purpose of Business Intelligence The purpose of business intelligence is to convert the volume of data into business value through analytical reporting. Value Volume

  9. Ad hoc access Productionplatforms Operational reports Decision makers Early Management Information Systems • MIS systems provided business data. • Reports were developed on request. • Reports provided little analysis capability. • Decision support tools gave personal ad hoc access to data.

  10. Analyzing Data from Operational Systems • Data structures are complex. • Systems are designed for high performance and throughput. • Data is not meaningfully represented. • Data is dispersed. • OLTP systems may be unsuitable for intensive queries. Productionplatforms Operational reports

  11. Why OLTP Is Not Suitable for Analytical Reporting

  12. Operational systems Extracts Decisionmakers Data Extract Processing • End user computing offloaded from the operational environment • User’s own data

  13. Operationalsystems Extracts Decisionmakers Extract Explosion Management Issues with Data Extract Programs

  14. Productivity Issues with Extract Processing • Duplicated effort • Multiple technologies • Obsolete reports • No metadata

  15. Data Quality Issues with Extract Processing • No common time basis • Different calculation algorithms • Different levels of extraction • Different levels of granularity • Different data field names • Different data field meanings • Missing information • No data correction rules • No drill-down capability

  16. Data Warehousing and Business Intelligence Enterprise Data Warehouse Legacy Data Operations Data Analytical Reporting External Data Data Marts

  17. Internal and external systems Datawarehouse Decisionmakers Advantages of Warehouse Processing Environments • Controlled • Reliable • Quality information • Single source of data

  18. Advantages of Warehouse Processing Environments • No duplication of effort • No need for tools to support many technologies • No disparity in data, meaning, or representation • No time period conflict • No algorithm confusion • No drill-down restrictions

  19. Success Factors for a Dynamic Business Environment • Know the business • Reinvent to face new challenges • Invest in products • Invest in customers • Retain customers • Invest in technology • Improve access to business information • Provide superior services and products • Be profitable

  20. Business Drivers for Data Warehouses • Provide supporting information systems • Get quality information: • Reduce costs • Streamline the business • Improve margins

  21. Technological Advances Enabling Data Warehousing • Hardware • Operating system • Database • Query tools • Applications • Large databases • 64-bit architectures • Indexing techniques • Affordable, cost-effective open systems • Robust warehouse tools • Sophisticated end user tools

  22. 两种数据的区别

  23. Summary • In this lesson, you should have learned how to: • Describe the role of business intelligence in today’s marketplace • Describe why an online transaction processing system (OLTP) is not suitable for analytical reporting • Describe how extract processing for decision support querying led to data warehouse solutions employed today • Explain why businesses are driven to employ data warehouse technology

  24. Practice 1-1 Overview • This practice covers the following topics: • Answering questions about data warehousing • Discussing how data warehousing meets business needs

  25. Lesson 2Defining Data Warehouse Concepts and Terminology

  26. Objectives • After completing this lesson, you should be able to do the following: • Identify a common, broadly accepted definition of a data warehouse • Describe the differences of dependent and independent data marts • Identify some of the main warehouse development approaches • Recognize some of the operational properties and common terminology of a data warehouse

  27. Definition of a Data Warehouse • “A data warehouse is a subject oriented, integrated, non-volatile, and time variant collection of data in support of management’s decisions.” • — W.H. Inmon “数据仓库是一个面向主题的、集成的、随时间变化的、非易失的、用于战略决策的数据集合” “Building the Data Warehouse”(1991)

  28. Definition of a Data Warehouse • “…数据仓库无非是所有数据集市的集合...” — Ralph Kimball • “数据仓库是信息数据库的具体实现,用来存储源自业务数据库的共享数据。典型的数据仓库应该是一个主题数据库,支持用户从巨大的运营数据存储中发现信息,支持对业务趋势进行跟踪和响应,实现业务的预测和计划。” — DM Review

  29. Definition of a Data Warehouse “An enterprise structured repository of subject-oriented, time-variant, historical data used for information retrieval and decision support. The data warehouse stores atomic and summary data.” — Oracle’s Data Warehouse Definition “数据仓库是一个过程而不是一个项目” —另一角度描述数据仓库

  30. Subject- oriented Integrated Data Warehouse Nonvolatile Time-variant Data Warehouse Properties

  31. OLTP Applications Data Warehouse Subject Equity Plans Shares Insurance Loans Savings Customer financial information Subject-Oriented • Data is categorized and stored by business subject rather than by application.

  32. Savings Current Accounts Loans Integrated • Data on a given subject is defined and stored once. Customer OLTP Applications Data Warehouse

  33. Data Warehouse Time-Variant • Data is stored as a series of snapshots, each representing a period of time.

  34. Nonvolatile • Typically data in the data warehouse is not updated or deleted. Operational Warehouse Load Insert, Update, Delete, or Read Read

  35. First time load Refresh Refresh Purge or Archive Refresh Changing Warehouse Data Operational Databases Warehouse Database

  36. Data Warehouse Versus OLTP

  37. Usage Curves • Operational system is predictable • Data warehouse: • Variable • Random

  38. Enterprise wide Warehouse • Large scale implementation • Scopes the entire business • Data from all subject areas • Developed incrementally • Single source of enterprisewide data • Synchronized enterprisewide data • Single distribution point to dependent data marts

  39. 数据仓库设计中心思想 • 具有一个合适的粒度或细节以满足所有的数据集市 • 设计不能阻碍在数据集市中使用各种技术,能适应多维集市、统计、挖掘及探索型仓库

  40. Data Marts • 数据仓库数据的一个子集。 BI环境中的大部分分析活动均在数据集市中进行。每个数据集市中的数据通常是为特定的功能所定制,不必对其他的使用有效。

  41. Data Warehouses Versus Data Marts

  42. OperationalSystems Marketing Finance Sales Flat Files Legacy Data External Data Operations Data External Data Dependent Data Mart Data Marts Data Warehouse Marketing Sales Finance HR

  43. OperationalSystems Flat Files Legacy Data External Data Operations Data External Data Independent Data Mart Sales orMarketing

  44. Features of a Data Mart • Not Real-Time Data • Consolidation and Cleansing

  45. Warehouse Development Approaches • “Big bang” approach • Incremental approach: • Top-down incremental approach • Bottom-up incremental approach

  46. Analyze enterprise requirements Build enterprise data warehouse Report in subsets or store in data marts “Big Bang” Approach

  47. Top-Down Approach • Analyze requirements at the enterprise level • Develop conceptual information model • Identify and prioritize subject areas • Complete a model of selected subject area • Map to available data • Perform a source system analysis • Implement base technical architecture • Establish metadata, extraction, and load processes for the initial subject area • Create and populate the initial subject area data mart within the overall warehouse framework

  48. Bottom-Up Approach • Define the scope and coverage of the data warehouse and analyze the source systems within this scope • Define the initial increment based on the political pressure, assumed business benefit and data volume • Implement base technical architecture and establish metadata, extraction, and load processes as required by increment • Create and populate the initial subject areas within the overall warehouse framework

  49. Incremental Approach to Warehouse Development • Multiple iterations • Shorter implementations • Validation of each phase Increment 1 Strategy Definition Analysis Design Build Iterative Production

  50. Data Warehousing Process Components • Methodology • Architecture • Extraction, Transformation, and Load (ETL) • Implementation • Operation and Support

More Related