1 / 27

Introduction to Data Warehousing and OLAP

This presentation provides an introduction to data warehousing and OLAP, including the theory, system features, and a demo. Topics covered include data warehouse characteristics, data cube computation, efficient data cube computation, and on-line analytical processing (OLAP).

markkjones
Download Presentation

Introduction to Data Warehousing and OLAP

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 資料倉儲介紹Data Warehousing and OLAP楊立偉教授台灣大學工管系

  2. Agenda • Introduction • Data Warehouse Theory • System Features • Demo • Discussions

  3. 1. Introduction

  4. 1.1 Introduction • A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management decisions

  5. 1.1 Introduction (cont’d) How are organizations using data warehouse ? • Increasing customer focus, which includes the analysis of customer buying patterns. • Repositioning products and managing product portfolios by comparing the performance of sales by time or regions, in order to fine-tune production strategies • Analyzing operations and looking for sources of profit • Managing the customer relationship, making environmental corrections, and managing the cost of corporate assets

  6. 1.2 Data Warehouse Characteristics • It is a database designed for analytical tasks, using data from multiple applications • It supports a relatively small number of users with relatively long interactions • Its usage is read-intensive • Its content is periodically updated

  7. 1.2 Data Warehouse Characteristics (cont’d) • It contains current and historical data to provide a historical perspective of information • It contains a few large tables • Each query frequently results in a large result set and involves frequent full table scan and multi-table joins

  8. 1.3 Datawarehousing • The Processing of constructing and using data warehouses Heterogeneous Data Sources Data Cleaning Data Integration And Consolidation Interactive Analysis Making Strategic Decisions Constructing Data warehouse Using Data Warehouse

  9. 1.4 Three-tier System Architecture

  10. 2. Data Warehouse Theory

  11. 2.1 Data Warehouse Theory • Why not use Database directly ? • The update-driven approach is inefficient. • Potentially expensive for frequent queries. • Use Data warehouse instead • The query-driven approach is enough for making strategic decisions. • Separate the operational DBMS for daily and critical operations.

  12. 2.2 Data Cube • A multidimensional, logical view of the data • Concept hierarchy • Multiple data granularity 多重的資料顆粒度 • Data summarization 資料加總 • Data generalization 資料一般化

  13. A 3-dimension Data Cube

  14. Drill-down on time data for Q1 • Roll-up on address

  15. Adding a dimension supplier

  16. 2.3 Efficient Data Cube Computation • The challenges : 2N combinations • Concept hierarchy and Aggregations makes it more complicated ! • Materialization of data cube 如何實作 • Materialize every, none, or some ? • Algorithms for selection • Based on size • Based on sharing, • Based on access frequency.

  17. 2.4 On-Line Analytical Processing (OLAP) • Fast on-line processing of data cubes or multi-dimensional databases • OLAP operations: • Drilling • Pivoting 樞紐分析 • Slicing and Dicing • Filtering, etc.

  18. 2.4 On-Line Analytical Processing (Cont’d) • A multidimensional, logical view of the data. • Interactive analysis of the data (drill, pivot, slice_dice, filter) and Quick response to OLAP queries. • Summarization and aggregations at every dimension intersection. • Retrieval and display of data in 2-D or 3-D cross-tabs, charts, and graphs, with easy pivoting of the axes. • Analytical modeling: deriving ratios, variance, etc. and involving data across many dimensions. • Forecasting, trend analysis, and statistical analysis.

  19. 3. System Feature

  20. 3.1 Data sources supported • ODBC-compatible DBMS • Oracle, Microsoft SQL, MySQL, IBM DB2, etc. • Files • MS Access, MS Excel, etc. • Text files (CSV-format)

  21. 3.2 Data Cleansing 資料清洗 • Database schema translation • Field selection and mapping • Field re-naming • Field aggregating and deriving • Data filtering • Data value conversion • Data value mapping • Data value function • Date value conversion and decomposition

  22. 3.3 Building of Data Cube • Support for multi-dimension data • Support for concept hierarchy

  23. 3.5 Interactive Front-end Tools • User-defined multi-dimension • User-defined dimension hierarchy • User-defined data granularity • Real-time graph capabilities • Bar chart • Pie chart • Line chart

  24. 3.6 Other features • Web-based OLAP GUI • Easy to access from Internet • Easy to integrated with other systems • Import / Export capability

  25. 4. Demo

  26. 5. Discussions

  27. 5.1 Roadmap • Integrated with Data mining • Major Group / Sales Analysis 主力客群 • Prospects Analysis and Forecast 潛在購買分析與預測 • Association of Customers and Sales 關聯分析 • Market Segment Recommendation 市場區隔 • Other Business Intelligence application • Integrated to e-Marketing • 1-to-1 Personalization & Recommendation 個人化推薦 • Target marketing 目標行銷 • Loyalty program 客戶忠誠度計劃

More Related