On-Line Analytical Processing Salman Azhar



  1. On-Line Analytical Processing
     Salman Azhar
     Warehousing, Data Cubes, Data Mining
     These slides use some figures, definitions, and explanations from Elmasri-Navathe’s Fundamentals of Database Systems and Garcia-Molina, Ullman, and Widom’s Database Systems.
     Salman Azhar: Database Systems

  2. Overview
     • Traditional database systems
       • tuned to many small, simple queries
     • Some newer “analytic” applications
       • fewer, more time-consuming, complex queries
     • New architectures
       • developed to handle complex “analytic” queries efficiently

  3. The Data Warehouse
     • The most common form of data integration:
       • Copy sources into a single DB (the warehouse) and try to keep it up to date
       • Usual method: periodic reconstruction of the warehouse, perhaps overnight
     • A warehouse is essential for analytic queries

  4. OLTP
     • Most database operations involve On-Line Transaction Processing (OLTP).
       • Short, simple, frequent queries and/or modifications
       • Each involves a small number of tuples
     • Examples:
       • Looking up a phone number on the web
       • Sales at cash registers
       • Selling airline tickets

  5. OLAP
     • Increasing importance of On-Line Analytical Processing (OLAP) queries
       • Few, but complex, queries that may run for hours
       • Queries do not depend on having an absolutely up-to-date database
     • Sometimes loosely referred to as data mining

  6. OLAP Examples
     • Amazon analyzes purchases by its customers to recommend products of likely interest
       • Compares purchases between customers
       • Takes longer than customers are willing to wait
     • Wal-Mart looks for items with sales trends in a region or time period
       • Presents data to vendors
       • Used to determine ordering and inventory

  7. Common Architecture
     • Databases at branches handle OLTP
     • Local databases are copied to a central warehouse overnight (or periodically)
     • Analysts use the warehouse for OLAP
     [Figure: several branch OLTP databases, used by transaction users, feed one central OLAP warehouse, used by analysts]

  8. Data Warehouse
     [Figure: warehouse architecture; labels: Data Sources, Data Input, Staging Area, Data Warehouse, Data Marts, Data Access, User]

  9. Star Schemas
     • A star schema is a common organization for data at a warehouse
     • It consists of:
       • Fact table:
         • a very large accumulation of facts such as sales
         • often “insert-only”
       • Dimension tables:
         • smaller, generally static information
         • about the entities involved in the facts

  10. Star Schema
     [Figure: one fact table in the center, linked to five dimension tables]
     • Fact table:
       • Sales_Fact(TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, Sales Amount, Unit Sales, ...)
     • Dimension tables:
       • Employee_Dim(EmployeeKey, EmployeeID, ...)
       • Product_Dim(ProductKey, ProductID, ...)
       • Time_Dim(TimeKey, TheDate, ...)
       • Shipper_Dim(ShipperKey, ShipperID, ...)
       • Customer_Dim(CustomerKey, CustomerID, ...)

  11. Example: Star Schema
     • Suppose we want to record in a warehouse information about every car sale:
       • dealer, car, buyer, day, time, price paid
     • The fact table is a relation:
       • Sales(dealer, model, buyer, day, time, price)

  12. Example, Continued
     • The dimension tables include information about the dealer, car, and buyer “dimensions”:
       • Dealers(dealer, city, zip)
       • Cars(model, manf)
       • Buyers(buyer, city, phone)
     • Recall the fact table:
       • Sales(dealer, model, buyer, day, time, price)

  13. Dimensions and Dependent Attributes
     • Two classes of fact-table attributes:
       • Dimension attributes:
         • the key of a dimension table
         • in Sales(dealer, model, buyer, day, time, price): dealer, model, buyer, day, time
       • Dependent attributes:
         • a value determined by the dimension attributes of the row
         • in Sales(dealer, model, buyer, day, time, price): price
         • E.g., price is determined by the combination of dealer, model, buyer, day, time

  14. Example: Dependent Attribute
     • price is determined by the combination of dimension attributes:
       • dealer, car, buyer, and the time (the combination of the day and time attributes)
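
The car-sale star schema above can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the sample rows are made up, the plural table names and the `manf` column follow the later query slides, and the composite primary key is one assumed way of expressing that price depends on the dimension attributes.

```python
import sqlite3

# Hypothetical in-memory warehouse. All sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: small, mostly static descriptions of entities.
    CREATE TABLE Dealers(dealer TEXT PRIMARY KEY, city TEXT, zip TEXT);
    CREATE TABLE Cars(model TEXT PRIMARY KEY, manf TEXT);
    CREATE TABLE Buyers(buyer TEXT PRIMARY KEY, city TEXT, phone TEXT);

    -- Fact table: one row per sale. The REFERENCES clauses are documentation
    -- here; SQLite enforces them only after PRAGMA foreign_keys = ON.
    CREATE TABLE Sales(
        dealer TEXT REFERENCES Dealers(dealer),
        model  TEXT REFERENCES Cars(model),
        buyer  TEXT REFERENCES Buyers(buyer),
        day    TEXT,
        time   TEXT,
        price  REAL,
        -- Assumption: the dimension attributes together identify a sale,
        -- so price is a dependent attribute.
        PRIMARY KEY (dealer, model, buyer, day, time)
    );
""")

conn.execute("INSERT INTO Dealers VALUES ('AutoNation', 'Indianapolis', '46204')")
conn.execute("INSERT INTO Cars VALUES ('X5', 'BMW')")
conn.execute("INSERT INTO Buyers VALUES ('Ann', 'Carmel', '555-0100')")

sale = ("AutoNation", "X5", "Ann", "2024-01-05", "10:00", 60000.0)
conn.execute("INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?)", sale)

# A second row with the same dimension attributes but another price is rejected.
try:
    conn.execute("INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?)", sale[:5] + (1.0,))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

sales_count = conn.execute("SELECT COUNT(*) FROM Sales").fetchone()[0]
```

The rejected insert is the point of the sketch: once the dimension attributes are fixed, there is exactly one price.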

  15. Approaches to Building Warehouses
     • ROLAP = “relational OLAP”:
       • Tune a relational DBMS to support star schemas
     • MOLAP = “multidimensional OLAP”:
       • Use a specialized DBMS with a model such as the “data cube”

  16. ROLAP Techniques
     • Bitmap indexes:
       • For each key value of a dimension table (e.g., each model for relation Cars)
       • create a bit-vector telling which tuples of the fact table have that value
     • Materialized views:
       • Store the answers to several useful queries (views) in the warehouse itself
       • Stored views!
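
The bitmap-index idea can be sketched in a few lines of Python. This is a toy model, not how any particular DBMS stores its indexes: the function names and sample rows are hypothetical, and a Python integer stands in for each bit-vector.

```python
from collections import defaultdict


def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int used as a bit-vector.

    Bit i is set when fact-table row i has that value.
    """
    index = defaultdict(int)
    for i, row in enumerate(rows):
        index[row[column]] |= 1 << i
    return dict(index)


def matching_rows(bitmap):
    """Return the row numbers whose bits are set in `bitmap`."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical fact-table rows (only two dimension attributes shown).
sales = [
    {"dealer": "D1", "model": "Mini Cooper"},
    {"dealer": "D2", "model": "X5"},
    {"dealer": "D1", "model": "X5"},
    {"dealer": "D2", "model": "Mini Cooper"},
]

by_model = build_bitmap_index(sales, "model")
by_dealer = build_bitmap_index(sales, "dealer")

# A conjunctive filter becomes a bitwise AND of two bit-vectors.
x5_at_d1 = matching_rows(by_model["X5"] & by_dealer["D1"])
```

Combining conditions on several dimensions with AND and OR on bit-vectors, before touching the fact table, is what makes this kind of index attractive for star-schema queries.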

  17. Typical OLAP Queries
     • Often, OLAP queries begin with a “star join”:
       • the natural join of the fact table with all or most of the dimension tables
     • Recall the tables:
         Sales(dealer, model, buyer, day, time, price)
         Dealers(dealer, city, zip)
         Cars(model, manf)
         Buyers(buyer, city, phone)
     • Example:
         SELECT *
         FROM Sales, Dealers, Cars, Buyers
         WHERE Sales.dealer = Dealers.dealer
           AND Sales.model = Cars.model
           AND Sales.buyer = Buyers.buyer;

  18. Typical OLAP Queries (2)
     • The typical OLAP query will:
       • Start with a star join
       • Select for interesting tuples, based on dimension data
       • Group by one or more dimensions
       • Aggregate certain attributes of the result

  19. Example: OLAP Query
     • For each dealer in Indianapolis, find the total sales of each car manufactured by BMW
     • Filter:
       • city = 'Indianapolis' AND manf = 'BMW'
     • Grouping:
       • by dealer and car
     • Aggregation:
       • sum of price
     GROUP EXERCISE: Write the SQL query.
     Note: Do not turn over to the next page before attempting this exercise yourself!

  20. Example: In SQL
         SELECT dealer, model, SUM(price)
         FROM Sales NATURAL JOIN Dealers NATURAL JOIN Cars
         WHERE Dealers.city = 'Indianapolis'
           AND Cars.manf = 'BMW'
         GROUP BY dealer, model;
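
To see the four steps of a typical OLAP query (star join, filter, group, aggregate) run end to end, here is a small sketch using Python's `sqlite3` module. The sample rows are invented, and the column name `manf` is assumed to match the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dealers(dealer TEXT PRIMARY KEY, city TEXT, zip TEXT);
    CREATE TABLE Cars(model TEXT PRIMARY KEY, manf TEXT);
    CREATE TABLE Sales(dealer TEXT, model TEXT, buyer TEXT,
                       day TEXT, time TEXT, price REAL);

    -- Hypothetical sample data.
    INSERT INTO Dealers VALUES
        ('AutoNation', 'Indianapolis', '46204'),
        ('Carmel Motors', 'Carmel', '46032');
    INSERT INTO Cars VALUES ('X5', 'BMW'), ('330i', 'BMW'), ('Civic', 'Honda');
    INSERT INTO Sales VALUES
        ('AutoNation', 'X5', 'Ann', '2024-01-05', '10:00', 60000),
        ('AutoNation', 'X5', 'Bob', '2024-01-06', '11:00', 62000),
        ('AutoNation', '330i', 'Cat', '2024-01-06', '12:00', 45000),
        ('AutoNation', 'Civic', 'Dan', '2024-01-07', '09:00', 25000),
        ('Carmel Motors', 'X5', 'Eve', '2024-01-07', '10:00', 61000);
""")

# Join on the shared columns (dealer, model), filter on dimension data,
# group by two dimensions, and aggregate the dependent attribute.
# ORDER BY is added only to make the output deterministic.
bmw_totals = conn.execute("""
    SELECT dealer, model, SUM(price)
    FROM Sales NATURAL JOIN Dealers NATURAL JOIN Cars
    WHERE Dealers.city = 'Indianapolis' AND Cars.manf = 'BMW'
    GROUP BY dealer, model
    ORDER BY dealer, model
""").fetchall()
```

Note that adding Buyers to this NATURAL JOIN would also equate Dealers.city with Buyers.city, because both tables have a `city` column; the explicit join conditions of the star-join slide avoid that.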

  21. Using Materialized Views
     • A direct execution of this query from Sales and the dimension tables could take too long
     • If we create a materialized view that contains enough information, we may be able to answer our query much faster

  22. Example: Materialized View
     • Which views could help with our query?
     • Key issues:
       • It must join Sales, Dealers, and Cars, at least
       • It must group by at least dealer and car
       • It must not filter out Indianapolis dealers or BMW cars
       • It must not project out city or manf

  23. Example, Continued
     • Here is a materialized view that could help:
         CREATE VIEW vSales(dealer, city, car, manf, sales) AS
             SELECT dealer, city, model, manf, SUM(price) AS sales
             FROM Sales NATURAL JOIN Dealers NATURAL JOIN Cars
             GROUP BY dealer, city, model, manf;
     • Note: since dealer -> city and model -> manf, adding city and manf to the GROUP BY creates no additional groups. They are listed because we need city and manf in the SELECT.

  24. Example, Concluded
     • Here’s our query using the materialized view vSales:
         SELECT dealer, car, sales
         FROM vSales
         WHERE city = 'Indianapolis' AND manf = 'BMW';
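
SQLite has no materialized views, so the sketch below imitates one by storing the view's result in an ordinary table and rebuilding it on demand, much like the periodic warehouse reconstruction mentioned earlier. The helper names `refresh_vsales` and `bmw_sales_in_indianapolis` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dealers(dealer TEXT PRIMARY KEY, city TEXT, zip TEXT);
    CREATE TABLE Cars(model TEXT PRIMARY KEY, manf TEXT);
    CREATE TABLE Sales(dealer TEXT, model TEXT, buyer TEXT,
                       day TEXT, time TEXT, price REAL);
    INSERT INTO Dealers VALUES ('AutoNation', 'Indianapolis', '46204');
    INSERT INTO Cars VALUES ('X5', 'BMW'), ('Civic', 'Honda');
    INSERT INTO Sales VALUES
        ('AutoNation', 'X5', 'Ann', '2024-01-05', '10:00', 60000),
        ('AutoNation', 'Civic', 'Dan', '2024-01-07', '09:00', 25000);
""")


def refresh_vsales(conn):
    """Rebuild the stored answer to the vSales query from scratch."""
    conn.executescript("""
        DROP TABLE IF EXISTS vSales;
        CREATE TABLE vSales AS
            SELECT dealer, city, model AS car, manf, SUM(price) AS sales
            FROM Sales NATURAL JOIN Dealers NATURAL JOIN Cars
            GROUP BY dealer, city, model, manf;
    """)


def bmw_sales_in_indianapolis(conn):
    """Answer the example query from the stored view only."""
    return conn.execute("""
        SELECT dealer, car, sales
        FROM vSales
        WHERE city = 'Indianapolis' AND manf = 'BMW'
        ORDER BY dealer, car
    """).fetchall()


refresh_vsales(conn)
before = bmw_sales_in_indianapolis(conn)

# A new sale arrives; the stored view is stale until the next refresh.
conn.execute(
    "INSERT INTO Sales VALUES ('AutoNation', 'X5', 'Bob', '2024-01-08', '11:00', 62000)"
)
stale = bmw_sales_in_indianapolis(conn)

refresh_vsales(conn)
fresh = bmw_sales_in_indianapolis(conn)
```

The stale answer is the trade-off noted on the OLAP slide: analytic queries usually tolerate data that is not absolutely up to date.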

  25. MOLAP and Data Cubes
     • Keys of dimension tables are the dimensions of a hypercube
       • Example: for the Sales data, the four dimensions are Dealers, Cars, Buyers, and time
     • Dependent attributes (e.g., price) appear at the points of the cube

  26. Defining a Cube
     [Figure: a cube with three axes. Market dimension: Atlanta, Chicago, Denver, Detroit. Products dimension: Grapes, Cherries, Melons, Apples. Time dimension: Q1, Q2, Q3, Q4.]

  27. Querying a Cube
     [Figure: the SalesFact cube. Markets dimension: Atlanta, Chicago, Denver, Dallas. Products dimension: Grapes, Cherries, Melons, Apples. Time dimension: Q1, Q2, Q3, Q4.]

  28. Defining a Cube Slice
     [Figure: a slice of the cube. Markets dimension: Atlanta, Chicago, Denver, Detroit. Products dimension: Grapes, Cherries, Melons, Apples. Time dimension: Q1, Q2, Q3, Q4.]

  29. Working with Dimensions and Hierarchies
     • Dimensions allow you to:
       • Slice
       • Dice
     • Hierarchies allow you to:
       • Drill down
       • Drill up

  30. Marginals
     • The data cube also includes aggregation (typically SUM) along the margins of the cube
     • The marginals include aggregations over one dimension, two dimensions, …

  31. Example: Marginals
     • Our 4-dimensional Sales cube includes the sum of price over each dealer, each car, each buyer, and each time unit (perhaps days)
     • It would also have the sum of price over all dealer-model pairs, all dealer-buyer-day triples, …

  32. Structure of the Cube
     • Think of each dimension as having an additional value *
     • A point with one or more *’s in its coordinates aggregates over the dimensions with the *’s
     • Example:
       • Sales(“AutoNation”, “Mini Cooper”, *, *) holds the sum, over all buyers and all time, of the Mini Coopers bought at AutoNation
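
The `*` convention can be made concrete with a short Python sketch that computes every marginal of a tiny cube. The function name `build_cube` and the sample facts are hypothetical, and a real MOLAP system would not enumerate all 2^d coordinate patterns per fact this naively.

```python
from collections import defaultdict
from itertools import product


def build_cube(facts):
    """Return a dict from coordinates (possibly containing '*') to summed values.

    `facts` is an iterable of (coords, value) pairs, where `coords` is a tuple
    with one entry per dimension.
    """
    cube = defaultdict(float)
    for coords, value in facts:
        # Each dimension either keeps its value or is replaced by '*',
        # so one fact contributes to 2**len(coords) points of the cube.
        for mask in product((False, True), repeat=len(coords)):
            key = tuple("*" if starred else c for c, starred in zip(coords, mask))
            cube[key] += value
    return dict(cube)


# Hypothetical facts: (dealer, model, buyer, day) -> price.
facts = [
    (("AutoNation", "Mini Cooper", "Ann", "Mon"), 30000),
    (("AutoNation", "Mini Cooper", "Bob", "Tue"), 32000),
    (("AutoNation", "X5", "Ann", "Tue"), 60000),
    (("Carmel Motors", "Mini Cooper", "Cat", "Mon"), 31000),
]

cube = build_cube(facts)

# Sum over all buyers and all time of the Mini Coopers bought at AutoNation.
minis_at_autonation = cube[("AutoNation", "Mini Cooper", "*", "*")]
grand_total = cube[("*", "*", "*", "*")]
```

Points with no `*` are the original facts; each additional `*` moves one step further out along the margins.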

  33. Drill-Down
     • Drill-down = “de-aggregate” = break an aggregate into its constituents
     • Example:
       • having determined that AutoNation sells very few BMW cars,
       • break down its sales by particular BMW model

  34. Roll-Up
     • Roll-up = aggregate along one or more dimensions
     • Example:
       • given a table of how many Mini Coopers each buyer buys at each dealer,
       • roll it up into a table giving the total number of Mini Coopers bought by each buyer
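
Roll-up and drill-down are inverse ways of moving between levels of aggregation. The sketch below works on a small in-memory table; the helper names `roll_up` and `drill_down` and the counts are made up for illustration.

```python
from collections import Counter


def roll_up(table, keep):
    """Aggregate `table` (coordinate tuple -> count) onto the positions in `keep`."""
    out = Counter()
    for coords, n in table.items():
        out[tuple(coords[i] for i in keep)] += n
    return dict(out)


def drill_down(table, fixed):
    """Return the detailed rows behind an aggregate.

    `fixed` maps a coordinate position to the value it must have.
    """
    return {
        coords: n
        for coords, n in table.items()
        if all(coords[i] == v for i, v in fixed.items())
    }


# Hypothetical table: Mini Coopers bought, keyed by (buyer, dealer).
minis = {
    ("Ann", "AutoNation"): 1,
    ("Ann", "Carmel Motors"): 2,
    ("Bob", "AutoNation"): 1,
}

# Roll up over dealers: total Mini Coopers per buyer.
per_buyer = roll_up(minis, keep=(0,))

# Drill down into Ann's total: which dealers did she buy from?
ann_detail = drill_down(minis, {0: "Ann"})
```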

  35. Materialized Data-Cube Views
     • Data cubes invite materialized views that are aggregations in one or more dimensions
     • Dimensions may not be completely aggregated
       • an option is to group by an attribute of the dimension table

  36. Example
     • A materialized view for our Sales data cube might:
       • Aggregate by buyer completely
       • Not aggregate at all by car
       • Aggregate by time according to the week
       • Aggregate according to the city of the dealer
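
One possible reading of this view in SQL is sketched below with Python's `sqlite3` module. The table name `vCube`, the sample rows, and the use of SQLite's `strftime('%Y-%W', ...)` as the definition of "week" are all assumptions made for the example; the view is again stored as a plain table because SQLite lacks materialized views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dealers(dealer TEXT PRIMARY KEY, city TEXT, zip TEXT);
    CREATE TABLE Sales(dealer TEXT, model TEXT, buyer TEXT,
                       day TEXT, time TEXT, price REAL);

    -- Hypothetical sample data.
    INSERT INTO Dealers VALUES
        ('D1', 'Indianapolis', '46204'),
        ('D2', 'Indianapolis', '46205'),
        ('D3', 'Chicago', '60601');
    INSERT INTO Sales VALUES
        ('D1', 'X5', 'Ann', '2024-01-01', '10:00', 60000),
        ('D2', 'X5', 'Bob', '2024-01-03', '11:00', 62000),
        ('D1', 'X5', 'Cat', '2024-01-08', '12:00', 61000),
        ('D3', 'X5', 'Dan', '2024-01-02', '09:00', 59000);

    -- buyer: aggregated away completely (not in GROUP BY)
    -- model: not aggregated at all
    -- time:  aggregated to the week
    -- dealer: aggregated to the dealer's city
    CREATE TABLE vCube AS
        SELECT city, model, strftime('%Y-%W', day) AS week, SUM(price) AS sales
        FROM Sales NATURAL JOIN Dealers
        GROUP BY city, model, strftime('%Y-%W', day);
""")

cube_rows = conn.execute(
    "SELECT city, model, week, sales FROM vCube ORDER BY sales"
).fetchall()
```

The two Indianapolis dealers' sales in the same week collapse into one row, while the sale a week later stays separate.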

  37. Data Mining
     • Data mining is a popular term for queries that summarize big data sets in useful ways
     • Examples:
       • Clustering all Web pages by topic
       • Finding characteristics of fraudulent credit-card use

  38. Market-Basket Data
     • An important form of mining from relational data involves market baskets
       • sets of “items” that are purchased together as a customer leaves a store
     • A summary of basket data is its frequent itemsets
       • sets of items that often appear together in baskets

  39. Example: Market Baskets
     • If people often buy bread and butter together, the store can:
       • Put bread and butter near each other and put potato chips between the two
       • Run a sale on bread and raise the price of butter

  40. Finding Frequent Pairs
     • The simplest case is when we only want to find “frequent pairs” of items
     • Assume the data is in a relation Baskets(basket, item)
     • The support threshold s is the minimum number of baskets in which a pair must appear before we are interested

  41. Frequent Pairs in SQL
         SELECT b1.item, b2.item
         FROM Baskets b1, Baskets b2
         WHERE b1.basket = b2.basket
           AND b1.item < b2.item
         GROUP BY b1.item, b2.item
         HAVING COUNT(*) >= s;
     • FROM and WHERE: look for two Baskets tuples with the same basket and different items. The first item must precede the second, so we don’t count the same pair twice.
     • GROUP BY: create a group for each pair of items that appears in at least one basket.
     • HAVING: throw away pairs of items that do not appear at least s times.
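
The frequent-pairs query can be tried out with Python's `sqlite3` module. In this sketch the wrapper function `frequent_pairs`, the added COUNT(*) output column, the ORDER BY, and the sample baskets are illustrative additions; the threshold s is passed as a query parameter, and each item is assumed to appear at most once per basket.

```python
import sqlite3


def frequent_pairs(baskets, s):
    """Return (item1, item2, count) for pairs appearing in at least `s` baskets.

    `baskets` maps a basket id to a set of items.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Baskets(basket INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO Baskets VALUES (?, ?)",
        [(b, item) for b, items in baskets.items() for item in items],
    )
    rows = conn.execute(
        """
        SELECT b1.item, b2.item, COUNT(*)
        FROM Baskets b1, Baskets b2
        WHERE b1.basket = b2.basket      -- same basket
          AND b1.item < b2.item          -- each pair counted once
        GROUP BY b1.item, b2.item        -- one group per pair
        HAVING COUNT(*) >= ?             -- keep pairs with enough support
        ORDER BY b1.item, b2.item
        """,
        (s,),
    ).fetchall()
    conn.close()
    return rows


# Hypothetical baskets.
baskets = {
    1: {"bread", "butter", "chips"},
    2: {"bread", "butter"},
    3: {"bread", "milk"},
    4: {"butter", "chips"},
}

pairs = frequent_pairs(baskets, s=2)
```

With s = 2, only the bread-butter and butter-chips pairs survive; the self-join makes the work grow with the square of the basket size, which is why large baskets make this query expensive.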

  42. Summary
     • OLAP vs. OLTP
       • Two different worlds
     • Warehousing
     • Data Cubes
     • Data Mining
     • Materialized views
       • Storing aggregate data