1 / 37

DATA CUBE

DATA CUBE. DATA CUBE. Motivation Relational Data Extraction Few Extensions Group-by Fails CUBE ROLLUP. Motivation - Data Analysis. Look for unusual patterns in data. Visualization tools Should display data trends, clusters. N-dimensional space Allow Dimensionality reduction

elvina
Download Presentation

DATA CUBE

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. DATA CUBE Advanced Databases 584

  2. DATA CUBE Motivation Relational Data Extraction Few Extensions Group-by Fails CUBE ROLLUP Advanced Databases 584

  3. Motivation - Data Analysis Look for unusual patterns in data • Visualization tools • Should display data trends, clusters. • N-dimensional space • Allow Dimensionality reduction • Data Summarization Advanced Databases 584

  4. DATA CUBE Motivation Relational Data Extraction Few Extensions Group-by Fails CUBE ROLLUP Advanced Databases 584

  5. N-dimensional Data in 2D SQL tables N dimensions into N-attribute domain For Visualization, dimensionality reduction via aggregation to include data along neglected dimensions Advanced Databases

  6. N-dimensional Data in 2D SQL tables Query + Extract Each row in the above Weather table represents a weather measurement. SQL Standard Aggregators: COUNT() SUM() MIN() MAX() AVG() E.g: Select AVG(Temp) FROM Weather -> Returns a single Aggregate value for all tuples GROUP BY: -> Returns the aggregate value for each of the groups in the result-set, grouped by one or more columns. E.g: SELECT Time, Altitude, AVG(Temp) FROM Weather GROUP BY Time, Altitude; Complexity? Visualize Advanced Databases 584

  7. DATA CUBE Motivation Relational Data Extraction Few Extensions Group-by Fails CUBE ROLLUP Advanced Databases 584

  8. Few Extensions Informix Illustra – Handles for User defined CallBacks • Init(&handle): Allocates the handle and initializes the aggregate computation. • Iter(&handle, value): Aggregates the next value into the current aggregate. • value = Final(&handle): Computes and returns the resulting aggregate by using data saved in the handle. Red Brick systems • Rank(expression) • N_tile(expression, n) • Ratio_To_Total(expression) Advanced Databases 584

  9. DATA CUBE Motivation Relational Data Extraction Few Extensions Group-by Fails CUBE ROLLUP Advanced Databases 584

  10. Issues with GROUP BY- Histogram • Data analysis are difficult with these SQL aggregation constructs • Issue: SQL does not allow direct construction of aggregation over computed categories Advanced Databases 584

  11. Issues with GROUP BY- Histogram Suppose the user wants a grouping on day, area and Max(temp) for the day SELECT day, nation, MAX(Temp) FROM Weather GROUP BY Day(Time) AS day, Nation(Latitude, Longitude) AS nation; Not Possible in Standard SQL Advanced Databases 584

  12. Issues with GROUP BY- Histogram Solution : Compute indirectly from a table-valued expression which is then aggregated SELECT day, nation, MAX(Temp) FROM (SELECT Day(Time) AS day, Nation(Latitude, Longitude) AS nation, Temp FROM Weather) AS foo GROUP BY day, nation; Issue: Complex Nested Queries!! Advanced Databases 584

  13. Issues with GROUP BY- Rollups Not Relational! Rollup N states gives rise to 2^N possible aggregation columns Drill down Advanced Databases 584

  14. Issues with GROUP BY- Rollups Solution 1: Enormous Number of Domains Increase in number of columns ∞ power set of the number of aggregated attributes Advanced Databases 584

  15. Issues with GROUP BY- Rollups Solution 2: Pivot table in Excel Pivot on color column Explosion in number of Columns! ->Transposes the Spreadsheet -> Aggregate values based on their column ‘Values’ -> Increases as Pivot increases Advanced Databases 584

  16. Issues with GROUP BY- Rollups Another Solution: • Prevents Exponential Growth by ALL • Dummy Value “ALL” added to fill in super-aggregation terms Advanced Databases 584

  17. Issues with GROUP BY- Rollups SELECT `ALL', `ALL', `ALL', SUM(Sales) FROM Sales WHERE Model = `Chevy‘ UNION SELECT Model, `ALL', `ALL', SUM(Sales) FROM Sales WHERE Model = `Chevy' GROUP BY Model UNION SELECT Model, Year, `ALL', SUM(Sales) FROM Sales WHERE Model = `Chevy' GROUP BY Model, Year UNION SELECT Model, Year, Color, SUM(Sales) FROM Sales WHERE Model = `Chevy' GROUP BY Model, Year, Color; SQL statement to build this table from Sales Data Advanced Databases 584

  18. Issues with GROUP BY- Cross Tabulation • What is cross-tabulation?? • Symmetric aggregation result table • Cross-tab is two dimensional aggregation. If more aggregation values are added then it grows in dimension Advanced Databases 584

  19. Issues with GROUP BY- Cross Tabulation Equivalent Relational Representation in terms of ‘ALL’ values Explosion in terms of number of cross-tabs! Chevy Sales Cross Tab: Ford Sales Cross Tab: Advanced Databases 584

  20. Why CUBE instead of Group By’s • -> Group Bys solve aggregation • -> But roll-up and Cross-tab is daunting! • E.g A 6 Dimensional cross-tab requires 64-way union of 64 different GROUP BY operators!!!!! Solution: Lets look at the CUBE! Advanced Databases 584

  21. DATA CUBE Motivation Relational Data Extraction Few Extensions Group-by Fails CUBE ROLLUP Advanced Databases 584

  22. CUBE Operators Generalization of group-by, roll-up and cross-tab N-1 lower dimensional aggregates appear as points, lines, planes, cubes, hyper-cubes Advanced Databases 584

  23. Data Cubes : What, Why, Where, How • What? Operator in OLAP for fast analysis of data • Why? Requirement tomanipulate and analyze data from multiple perspectives. Traditional relational databases fit into this multi-dimensional data analysis??? • Where? Used in OLAP Advanced Databases 584

  24. Data Cubes : What, When, Why, How Advanced Databases 584

  25. OLAP • On-Line Analytical Processing - A type of data processing that allows decision makers to examine data according to the dimensions of the business • Data warehousing is a type of OLAP. • OLAP systems are built upon Multidimensional Databases - i.e., they are designed from the ground up to support the definition and query of multidimensional data cubes/structures. • The core of any OLAP system is an OLAP cube a.k.a Data cube Advanced Databases 584

  26. OLAP CUBE Company sales of a part to a customer at a store location Advanced Databases 584

  27. Advanced Databases 584

  28. Create a Data CUBE • Generate the power set (set of all subsets) of the aggregation columns by overloading GROUP BY operator. • The data cube operator builds a table containing all these aggregate values. The total • aggregate using function f() is represented as the tuple: • ALL, ALL, ALL, : : :, ALL, f(*) Advanced Databases 584

  29. Create a Data CUBE Advanced Databases 584

  30. Create a Data CUBE CUBE is a relational Operator If C1,C2,… CN are cardinalities of N attributes Then resulting cube relation is ∏(Ci +1) i.e initial SALES table is 2*3*3 = 18 rows The derived data cube 3*4*4 = 48 rows Advanced Databases 584

  31. CUBE-Syntax and Semantics Syntax: SELECT day, nation, MAX(Temp) FROM Weather GROUP BY CUBE Day(Time) AS day, Country(Latitude, Longitude) AS nation; Semantics: • Aggregates over all the <select list> in the GROUP BY . • It then unions in each super-aggregate of the global cube- substituting ALL for aggregation columns • -> If there are N attributes in <select list> there will be 2^N -1 super-aggregate values. Advanced Databases 584

  32. DATA CUBE Motivation Relational Data Extraction Few Extensions Group-by Fails CUBE ROLLUP Advanced Databases 584

  33. ROLL-UP Operator To compute only a set of aggregates like this, CUBE would be an overkill! SOLUTION: ROLLUP! Advanced Databases 584

  34. ROLL-UP Operator ROLLUP produces just the super-aggregates • Difference : • CUBE generates a result set that shows aggregates for all combinations of values in the selected columns. • ROLLUP generates a result set that shows aggregates for a hierarchy of values in the selected columns. (v1 ,v2 ,...,vn, f()), (v1 ,v2 ,...,ALL, f()), ... (v1 ,ALL,...,ALL, f()), (ALL,ALL,...,ALL, f()). Advantage: ->Unlike CUBE in which the answer set is Multi-dimensional…. ROLL UP is naturally sequential ->Operators like SUM(),AVG() work well with ROLLUP Advanced Databases 584

  35. CUBE and ROLLUP Algebra SELECT Manufacturer, Year, Month, Day, Color, Model, SUM(price) AS Revenue FROM Sales GROUP BY Manufacturer, ROLLUP Year(Time) AS Year, Month(Time) AS Month, Day(Time) AS Day, CUBE Color, Model; GroupBy CUBE ROLLUP Advanced Databases 584

  36. CUBE and ROLLUP Algebra GROUP BY <select list> ROLLUP <select list> CUBE <select list> “Powerful” Advanced Databases 584

  37. CUBE and ROLLUP Algebra A nice link illustrating the difference between ROLLUP and CUBE http://msdn.microsoft.com/en-US/library/ms189305(v=SQL.90).aspx Advanced Databases 584

More Related