550 likes | 1k Views
DATA CUBE. DATA CUBE. Motivation Relational Data Extraction Few Extensions Group-by Fails CUBE ROLLUP. Motivation - Data Analysis. Look for unusual patterns in data. Visualization tools Should display data trends, clusters. N-dimensional space Allow Dimensionality reduction
E N D
DATA CUBE Advanced Databases 584
DATA CUBE Motivation Relational Data Extraction Few Extensions Group-by Fails CUBE ROLLUP Advanced Databases 584
Motivation - Data Analysis Look for unusual patterns in data • Visualization tools • Should display data trends, clusters. • N-dimensional space • Allow Dimensionality reduction • Data Summarization Advanced Databases 584
DATA CUBE Motivation Relational Data Extraction Few Extensions Group-by Fails CUBE ROLLUP Advanced Databases 584
N-dimensional Data in 2D SQL tables N dimensions into N-attribute domain For Visualization, dimensionality reduction via aggregation to include data along neglected dimensions Advanced Databases
N-dimensional Data in 2D SQL tables Query + Extract Each row in the above Weather table represents a weather measurement. SQL Standard Aggregators: COUNT() SUM() MIN() MAX() AVG() E.g: Select AVG(Temp) FROM Weather -> Returns a single Aggregate value for all tuples GROUP BY: -> Returns the aggregate value for each of the groups in the result-set, grouped by one or more columns. E.g: SELECT Time, Altitude, AVG(Temp) FROM Weather GROUP BY Time, Altitude; Complexity? Visualize Advanced Databases 584
DATA CUBE Motivation Relational Data Extraction Few Extensions Group-by Fails CUBE ROLLUP Advanced Databases 584
Few Extensions Informix Illustra – Handles for User defined CallBacks • Init(&handle): Allocates the handle and initializes the aggregate computation. • Iter(&handle, value): Aggregates the next value into the current aggregate. • value = Final(&handle): Computes and returns the resulting aggregate by using data saved in the handle. Red Brick systems • Rank(expression) • N_tile(expression, n) • Ratio_To_Total(expression) Advanced Databases 584
DATA CUBE Motivation Relational Data Extraction Few Extensions Group-by Fails CUBE ROLLUP Advanced Databases 584
Issues with GROUP BY- Histogram • Data analysis are difficult with these SQL aggregation constructs • Issue: SQL does not allow direct construction of aggregation over computed categories Advanced Databases 584
Issues with GROUP BY- Histogram Suppose the user wants a grouping on day, area and Max(temp) for the day SELECT day, nation, MAX(Temp) FROM Weather GROUP BY Day(Time) AS day, Nation(Latitude, Longitude) AS nation; Not Possible in Standard SQL Advanced Databases 584
Issues with GROUP BY- Histogram Solution : Compute indirectly from a table-valued expression which is then aggregated SELECT day, nation, MAX(Temp) FROM (SELECT Day(Time) AS day, Nation(Latitude, Longitude) AS nation, Temp FROM Weather) AS foo GROUP BY day, nation; Issue: Complex Nested Queries!! Advanced Databases 584
Issues with GROUP BY- Rollups Not Relational! Rollup N states gives rise to 2^N possible aggregation columns Drill down Advanced Databases 584
Issues with GROUP BY- Rollups Solution 1: Enormous Number of Domains Increase in number of columns ∞ power set of the number of aggregated attributes Advanced Databases 584
Issues with GROUP BY- Rollups Solution 2: Pivot table in Excel Pivot on color column Explosion in number of Columns! ->Transposes the Spreadsheet -> Aggregate values based on their column ‘Values’ -> Increases as Pivot increases Advanced Databases 584
Issues with GROUP BY- Rollups Another Solution: • Prevents Exponential Growth by ALL • Dummy Value “ALL” added to fill in super-aggregation terms Advanced Databases 584
Issues with GROUP BY- Rollups SELECT `ALL', `ALL', `ALL', SUM(Sales) FROM Sales WHERE Model = `Chevy‘ UNION SELECT Model, `ALL', `ALL', SUM(Sales) FROM Sales WHERE Model = `Chevy' GROUP BY Model UNION SELECT Model, Year, `ALL', SUM(Sales) FROM Sales WHERE Model = `Chevy' GROUP BY Model, Year UNION SELECT Model, Year, Color, SUM(Sales) FROM Sales WHERE Model = `Chevy' GROUP BY Model, Year, Color; SQL statement to build this table from Sales Data Advanced Databases 584
Issues with GROUP BY- Cross Tabulation • What is cross-tabulation?? • Symmetric aggregation result table • Cross-tab is two dimensional aggregation. If more aggregation values are added then it grows in dimension Advanced Databases 584
Issues with GROUP BY- Cross Tabulation Equivalent Relational Representation in terms of ‘ALL’ values Explosion in terms of number of cross-tabs! Chevy Sales Cross Tab: Ford Sales Cross Tab: Advanced Databases 584
Why CUBE instead of Group By’s • -> Group Bys solve aggregation • -> But roll-up and Cross-tab is daunting! • E.g A 6 Dimensional cross-tab requires 64-way union of 64 different GROUP BY operators!!!!! Solution: Lets look at the CUBE! Advanced Databases 584
DATA CUBE Motivation Relational Data Extraction Few Extensions Group-by Fails CUBE ROLLUP Advanced Databases 584
CUBE Operators Generalization of group-by, roll-up and cross-tab N-1 lower dimensional aggregates appear as points, lines, planes, cubes, hyper-cubes Advanced Databases 584
Data Cubes : What, Why, Where, How • What? Operator in OLAP for fast analysis of data • Why? Requirement tomanipulate and analyze data from multiple perspectives. Traditional relational databases fit into this multi-dimensional data analysis??? • Where? Used in OLAP Advanced Databases 584
Data Cubes : What, When, Why, How Advanced Databases 584
OLAP • On-Line Analytical Processing - A type of data processing that allows decision makers to examine data according to the dimensions of the business • Data warehousing is a type of OLAP. • OLAP systems are built upon Multidimensional Databases - i.e., they are designed from the ground up to support the definition and query of multidimensional data cubes/structures. • The core of any OLAP system is an OLAP cube a.k.a Data cube Advanced Databases 584
OLAP CUBE Company sales of a part to a customer at a store location Advanced Databases 584
Create a Data CUBE • Generate the power set (set of all subsets) of the aggregation columns by overloading GROUP BY operator. • The data cube operator builds a table containing all these aggregate values. The total • aggregate using function f() is represented as the tuple: • ALL, ALL, ALL, : : :, ALL, f(*) Advanced Databases 584
Create a Data CUBE Advanced Databases 584
Create a Data CUBE CUBE is a relational Operator If C1,C2,… CN are cardinalities of N attributes Then resulting cube relation is ∏(Ci +1) i.e initial SALES table is 2*3*3 = 18 rows The derived data cube 3*4*4 = 48 rows Advanced Databases 584
CUBE-Syntax and Semantics Syntax: SELECT day, nation, MAX(Temp) FROM Weather GROUP BY CUBE Day(Time) AS day, Country(Latitude, Longitude) AS nation; Semantics: • Aggregates over all the <select list> in the GROUP BY . • It then unions in each super-aggregate of the global cube- substituting ALL for aggregation columns • -> If there are N attributes in <select list> there will be 2^N -1 super-aggregate values. Advanced Databases 584
DATA CUBE Motivation Relational Data Extraction Few Extensions Group-by Fails CUBE ROLLUP Advanced Databases 584
ROLL-UP Operator To compute only a set of aggregates like this, CUBE would be an overkill! SOLUTION: ROLLUP! Advanced Databases 584
ROLL-UP Operator ROLLUP produces just the super-aggregates • Difference : • CUBE generates a result set that shows aggregates for all combinations of values in the selected columns. • ROLLUP generates a result set that shows aggregates for a hierarchy of values in the selected columns. (v1 ,v2 ,...,vn, f()), (v1 ,v2 ,...,ALL, f()), ... (v1 ,ALL,...,ALL, f()), (ALL,ALL,...,ALL, f()). Advantage: ->Unlike CUBE in which the answer set is Multi-dimensional…. ROLL UP is naturally sequential ->Operators like SUM(),AVG() work well with ROLLUP Advanced Databases 584
CUBE and ROLLUP Algebra SELECT Manufacturer, Year, Month, Day, Color, Model, SUM(price) AS Revenue FROM Sales GROUP BY Manufacturer, ROLLUP Year(Time) AS Year, Month(Time) AS Month, Day(Time) AS Day, CUBE Color, Model; GroupBy CUBE ROLLUP Advanced Databases 584
CUBE and ROLLUP Algebra GROUP BY <select list> ROLLUP <select list> CUBE <select list> “Powerful” Advanced Databases 584
CUBE and ROLLUP Algebra A nice link illustrating the difference between ROLLUP and CUBE http://msdn.microsoft.com/en-US/library/ms189305(v=SQL.90).aspx Advanced Databases 584