1 / 57

ADBMS

ADBMS. Data Warehousing OLAP Technology. What is Data Warehouse?. Defined in many different ways, but not rigorously. A decision support database that is maintained separately from the organization’s operational database

coral
Download Presentation

ADBMS

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ADBMS Data Warehousing OLAP Technology

  2. What is Data Warehouse? • Defined in many different ways, but not rigorously. • A decision support database that is maintained separately from the organization’s operational database • Support information processing by providing a solid platform of consolidated, historical data for analysis. • “A data warehouse is asubject-oriented, integrated, time-variant, and nonvolatilecollection of data in support of management’s decision-making process.”—W. H. Inmon • Data warehousing: • The process of constructing and using data warehouses

  3. Data Warehouse—Subject-Oriented • Organized around major subjects, such as customer, product, sales. • Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing. • Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.

  4. Data Warehouse—Integrated • Constructed by integrating multiple, heterogeneous data sources • relational databases, flat files, on-line transaction records • Data cleaning and data integration techniques are applied. • Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources • E.g., Hotel price: currency, tax, breakfast covered, etc. • When data is moved to the warehouse, it is converted.

  5. Data Warehouse—Time Variant • The time horizon for the data warehouse is significantly longer than that of operational systems. • Operational database: current value data. • Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years) • Every key structure in the data warehouse • Contains an element of time, explicitly or implicitly • But the key of operational data may or may not contain “time element”.

  6. Data Warehouse—Non-Volatile • A physically separate store of data transformed from the operational environment. • Operational update of data does not occur in the data warehouse environment. • Does not require transaction processing, recovery, and concurrency control mechanisms • Requires only two operations in data accessing: • initial loading of data and access of data.

  7. Data Warehouse vs. Heterogeneous DBMS • Traditional heterogeneous DB integration: • Build wrappers/mediators on top of heterogeneous databases • Query driven approach • When a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for individual heterogeneous sites involved, and the results are integrated into a global answer set • Complex information filtering, compete for resources • Data warehouse: update-driven, high performance • Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis

  8. Data Warehouse vs. Operational DBMS • OLTP (on-line transaction processing) • Major task of traditional relational DBMS • Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc. • OLAP (on-line analytical processing) • Major task of data warehouse system • Data analysis and decision making • Distinct features (OLTP vs. OLAP): • User and system orientation: customer vs. market • Data contents: current, detailed vs. historical, consolidated • Database design: ER + application vs. star + subject • View: current, local vs. evolutionary, integrated • Access patterns: update vs. read-only but complex queries

  9. OLTP vs. OLAP

  10. Data Warehouse Design

  11. OLAP

  12. Objectives • What is OLAP • Need for OLAP • Features & functions of OLAP • Different OLAP models • OLAP implementations

  13. Demand for OLAP • To develop DM, three approaches • In all approaches, Data Marts rest on Dimensional Model • Data Marts are sufficient for basic data analysis • Users need to go beyond such basic analysis

  14. Demand for OLAP • Need for Multidimensional Analysis • Fast Access & Powerful Calculations • Limitations of other analysis methods like: • SQL • Spreadsheets • Report Writers

  15. Demand for OLAP • Traditional tools of report writers, query products, spreadsheets, & language interfaces do not match the user expectations as far as performing multidimensional analysis with complex calculations is concerned. • Tools used with OLTP and basic DW environments do not match up to the task

  16. OLAP is the Answer! OLAP is a category of software technology that enables analysts, managers, and executives to gain insight into the data through fast, consistent, interactive, access in a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.

  17. Why is OLAP useful? • Facilitates multidimensional data analysis by pre-computing aggregates across many sets of dimensions • Provides for: • Greater speed and responsiveness • Improved user interactivity

  18. Data Warehouses • A data warehouse is based on a multidimensional data model which views data in the form of a data cube • A data cube allows data to be modeled and viewed in multiple dimensions • In data warehousing literature, an n-D base cube is called a base cuboid. The top most 0-D cuboid, which holds the highest-level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.

  19. all 0-D(apex) cuboid time item location supplier 1-D cuboids time,item time,location item,location location,supplier 2-D cuboids time,supplier item,supplier time,location,supplier time,item,location 3-D cuboids item,location,supplier time,item,supplier 4-D(base) cuboid time, item, location, supplier Lattice of Cuboids

  20. day 2 day 1 CUBE Multi-dimensional cube: Fact table view: dimensions = 3

  21. Aggregates • Add up amounts for day 1 • In SQL: SELECT sum(amt) FROM SALE • WHERE date = 1 81

  22. Aggregates • Add up amounts by day • In SQL: SELECT date, sum(amt) FROM SALE • GROUP BY date

  23. Aggregates • Operators: sum, count, max, min, median, avg • “Having” clause • Using dimension hierarchy • average by region (within store) • maximum by month (within date)

  24. rollup drill-down Cube Aggregation Example: computing sums day 2 . . . day 1 129

  25. Cube Operators day 2 . . . day 1 sale(c1,*,*) 129 sale(c2,p2,*) sale(*,*,*) sale(*,p1,*)

  26. Extended Cube * day 2 sale(*,p2,*) day 1

  27. day 2 day 1 Aggregation Using Hierarchies customer region country (customer c1 in Region A; customers c2, c3 in Region B)

  28. day 2 day 1 Pivoting Fact table view: Multi-dimensional cube:

  29. day 2 day 1 Cube Aggregates Lattice 129 all city product date city, product city, date product, date use greedy algorithm to decide what to materialize city, product, date

  30. Dimension Hierarchies all state city

  31. Dimension Hierarchies all product city date product, date city, product city, date state city, product, date state, date state, product state, product, date not all arcs shown...

  32. Interesting Hierarchy all years weeks quarters conceptual dimension table months days

  33. Date 2Qtr 1Qtr sum 3Qtr 4Qtr TV Product U.S.A PC VCR sum Canada Country Mexico sum All, All, All SAMPLE CUBE Total annual sales of TV in U.S.A. Total annual sales of PC in U.S.A. Total sales In U.S.A Total annual sales of VCR in U.S.A. Total Q1 sales In U.S.A Total sales In Canada Total Q1 sales In Canada Total sales In Mexico Total Q1 sales In Mexico Total Q2 sales In all countries TOTAL SALES Total Q1 sales In all countries

  34. OLAP Operations • Roll-Up • Drill-Down • Slice & Dice • Pivot • Drill-Across • Drill-Through

  35. OLAP Operations

  36. Slicing

  37. Dicing (Sub-cube)

  38. Roll-Up

  39. Drill-Down

  40. Other OLAP Operations • Drill-Across: Queries involving more than one fact table • Drill-Through: Makes use of SQL to drill through the bottom level of a data cube down to its back-end relational tables • Pivot (rotate): Pivot (also called "rotate") is a • visualization operation which rotates the data axes in • view in order to provide an alternative presentation of • the data. Other examples include rotating the axes in a • 3-D cube, or transforming a 3-D cube into a series of 2- • D planes.

  41. Other OLAP Operations • Moving Averages • Growth Rates • Depreciation • Currency Conversion • Statistical Functions • Top N or Bottom N queries

  42. Conceptual vs. Actual • The “cube” is a logical way of visualizing the data in an OLAP setting • Not how the data is actually represented on disk • Two ways of storing data: • ROLAP: Relational OLAP • MOLAP: Multidimensional OLAP

  43. OLAP & CUBE • Construction of the data cube is key to the operation of OLAP • The computation process creates a set of aggregates on the various dimensions of the data • The CUBE operator

  44. An example of the CUBE Operator

  45. The CUBE Operator • Proposed by Gray et al* • Effectively involves a series of GROUP-BY operations to aggregate data • Creates power set on all attributes according to: • A measure • An aggregator function *J. Gray, S. Chaudhuri, A. Bosworth, A. Layman,D. Reichart, M. Venkatrao, F. Pellow and H. Pirahesh. Data cube: A relational aggregation operator generalizing group-by, cross-tab and sub-totals. Data Mining and Knowledge Discovery, 1:29-54, 1997.

  46. CUBING Problem • Problem: this generates a lot of data and work (2n sets in total, where n is the number of dimensions) • Solution: optimized algorithms to run faster, consume less memory, and perform fewer I/Os.

  47. Efficient Computation of Data Cubes • ROLAP-based cubing algorithms (Agarwal et al’96) • Array-based cubing algorithm (Zhao et al’97) S. Agarwal, R. Agrawal, P. M. Deshpande, A.Gupta, J. F. Naughton, R. Ramakrishnan and S.Sarawagi. On the computation of multidimensional aggregates. In VLDB'96. Y. Zhao, P. M. Deshpande, and J. F. Naughton. An array-based algorithm for simultaneous multidimensional aggregates. In SIGMOD'97.

  48. Efficient Computation of Data Cubes • How many cuboids in a cube with 3 dimensions? • Answer: • As many group by operations? • No hierarchies involved!! • π(Li+1), where Li is the number of levels associated with dimension I • 10 dimensions & 4 levels for each dimension • Total Cuboids = 510

  49. Approaches to OLAP Servers • It is all about which DBMS you choose to store your data warehouse data • RDBMS – ROLAP • MDDB – MOLAP • BOTH - HOLAP

  50. Approaches to OLAP Servers Three possibilities for OLAP servers (1) Relational OLAP (ROLAP) • Relational and specialized relational DBMS to store and manage warehouse data • OLAP middleware to support missing pieces (2) Multidimensional OLAP (MOLAP) • Array-based storage structures • Direct access to array data structures (3) Hybrid OLAP (HOLAP) • Storing detailed data in RDBMS • Storing aggregated data in MDBMS • User access via MOLAP tools

More Related