1 / 64

From Transaction Processing to Support for Decision Making

From Transaction Processing to Support for Decision Making. CIS 671. Computerized Information Systems. Used to “run the business”. OSU Examples Personnel & Payroll (ARMS) Course Offerings Students, including course enrollments and grades (estimated $30M to replace) Inventory

fox
Download Presentation

From Transaction Processing to Support for Decision Making

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. From Transaction ProcessingtoSupport for Decision Making CIS 671 Decision Support Systems

  2. Computerized Information Systems • Used to “run the business”. • OSU Examples • Personnel & Payroll (ARMS) • Course Offerings • Students, including course enrollments and grades • (estimated $30M to replace) • Inventory • Transaction Processing Decision Support Systems

  3. 1st Generation DBMS • Designed for Transaction Processing • Hierarchical – IBM – IMS • Network • Management Information Systems • Added later • Mostly standard summary reports • Produced on a regular basis Decision Support Systems

  4. Relational DBMS • Codd – particularly designed for “ad hoc” queries • First uses for Transaction Processing • Transaction Data now available on-line • Use it to help Decision Making • Ad Hoc Decision Support Systems

  5. Decision Support Systems (DSS) • Use comprehensive view of all aspects of business. • Different business units • Historical data • Summary information • Classes of analysis tools: • Complex “traditional” SQL queries • Many “group-by” and “aggregation” queries (On Line Analytical Processing) • Exploratory data analysis - Data Mining Decision Support Systems

  6. Data Warehousing • Properties • Consolidated data from many sources • Spanning long time periods • Augmented with summary information • Size: several gigabytes to terabytes Decision Support Systems

  7. Data Warehouse Creation • Integrate schemas from different groups • Semantic mismatches • Different currencies • Different names for same attributes • Different structures for similar tables Decision Support Systems

  8. Data Warehouse Creation, cont. • Extract data from different operational databases and other external sources • Clean data - correct errors, fill in missing data • Transform data to match integrated schema • Load data into warehouse • Refresh data in a timely fashion • Purge very old data • Create metadata repository • May be so large that it is in a separate database Decision Support Systems

  9. Data Warehouse - Provide Variety of Analytical Tools • Complex “traditional” SQL queries • OLAP query engine • Data mining algorithm • Information visualization tools • Statistical packages • Report generators Decision Support Systems

  10. Data Mart • Departmental subset of a data warehouse • Top-down approach • Derive from the organization’s data warehouse • May be too hard to do all at once • Bottom-up approach • Initially create departmental data marts • Integrate data marts into organizational data warehouse • If not done carefully, may be hard to integrate Decision Support Systems

  11. OLTP Transaction oriented Thousands of users Small (MB to several GB) Current data Normalized data (many tables, few columns per table) Continuous update Simple to complex queries Data Warehouse Subject oriented Few users ( 100) Large (hundreds of GB to several TB) Historical data Denormalized data (few tables, many columns per table) Batch updates Usually very complex queries OLTP vs. Data Warehouse DBs(from Toby J. Teorey, Database Modeling & Design, Morgan Kaufmann, 1999, p. 212) Decision Support Systems

  12. Complex “traditional” SQL queries • Relational DBMS optimized for decision support • in contrast to a DBMS optimized for transaction processing • Example: • Teradata machine from NCR Decision Support Systems

  13. On Line Analytical Processing (OLAP)Multidimensional Databases (MDD) Decision Support Systems

  14. Example from Finkelstein [Fink95]: • Note that Branch, ProdID, Date  Sales, Returns • Note the multidimensionality of the SALES_INFO table. Decision Support Systems

  15. Dimension Hierarchies LOCATION Territory Region Branch TIME Year Quarter Week Month Date PRODUCT Category ProdID Decision Support Systems

  16. Possible queries: 1. How did product Widget sell in the last month, and how does this figure compare with sales over the last five years? How about by branch, region and territory? 2. Did this product sell better in different regions, and are there any regional trends? 3. Were there more returns of Widgets over the last year? Were these returns caused by defects? Were they manufactured in any particular plants? Decision Support Systems

  17. Additional Possible query: 4. Do commissions and pricing affect how sales persons sell the product? Do particular salespersons do a better job of selling the product? Note that a "multidimensional" spreadsheet would be useful. Codd called this type of problem On Line Analytical Processing (OLAP) in contrast to On Line Transaction Processing (TP). Decision Support Systems

  18. Codd's rules for OLAP: [Codd93] 1. Multi-Dimensional Concept View The user should be able to see the data as being multidimensional insofar as it should be easy to 'pivot' or 'slice and dice’. (See later.) 2. Transparency The OLAP functionality should be provided behind the user's existing software without adversely affecting the functionality of the 'host'. 3. Accessibility OLAP should allow the user to access diverse data stores but see the data within a common 'schema' provided by the OLAP tool. Decision Support Systems

  19. OLAP Rules, cont. 4. Consistent Reporting Performance There should not be significant degradation in performance with large numbers of dimensions or large quantities of data. 5. Client-Server Architecture Since much of the data is on mainframes, and the users work on PCs, the OLAP tool must be able to bring the two together! 6. Generic Dimensionality Data dimensions must all be treated equally. Functions available for one dimension must be available for others. Decision Support Systems

  20. OLAP Rules, cont. 7. Dynamic Sparse Matrix Handling The OLAP tool should be able to work out for itself the most efficient way to store sparse matrix data. 8. Multi User Support This is self-evident. 9. Unrestricted Cross-Dimensional Operations e.g., individual office overheads are allocated according to total corporate overheads divided in proportion to individual office sales. Decision Support Systems

  21. OLAP Rules, cont. 10. Intuitive Data Manipulation Navigation should be done by operations on individual cells rather than menus. 11. Flexible Reporting Row and column headings must be capable of more than one dimension each, and of displaying subsets of any dimension. 12. Unlimited Dimensions and Aggregation Levels At least 15 dimensions may be required, and within each there may be many hierarchical levels. Decision Support Systems

  22. Example from Finkelstein [Fink95]: • Note that Branch, ProdID, Date  Sales, Returns • Note the multidimensionality of the SALES_INFO table. Decision Support Systems

  23. “Pivoting”Cross TabulationSales by Date and Region Decision Support Systems

  24. “Drill Down”(narrower category)Replace Region by Branch. “Rollup” (more general category) Replace Region by Territory. Decision Support Systems

  25. OLAP Questions 1. Query language - how to say what's wanted. 2. Processing language - how to specify calculations: ratios, variances, . . . . 3. Data visualization - how to see the data. 4. Performance - time to process the query (5 second rule). Decision Support Systems

  26. OLAP References [Codd93] E. F. Codd, S. B. Codd, and C.T. Salley, "Providing OLAP to User Analysts: An IT Mandate," Codd & Date Inc., 1993. [Fink95] Richard Finkelstein, "MDD: Database Reaches the Next Dimension," DATABASE Programming and Design, 8(4), April 1995. Decision Support Systems

  27. Exploratory Data Analysis Data Mining • Find interesting trends or patterns in large data sets. • Statistics - Exploratory Data Analysis • Artificial Intelligence - Knowledge Discovery and Machine Learning • Much larger data sets Decision Support Systems

  28. Mining for Association Rules • Classic example • Market basket analysis • Record each customer transaction at a grocery store. • Try and identify sets of items purchased together. Decision Support Systems

  29. Association Rule: {coke}  {chips} People who buy coke usually buy chips. • Measures for Association Rule • {LHS}  {RHS} • Support: % of transactions containing this set of items. (2/5=40%) • Confidence: given all transactions containing LHS items, the % that also contain the RHS (2/3=67%) • Want both to be “reasonably” large. Decision Support Systems

  30. On-Line Analytical Processing (OLAP)Part II: CIS 671 Elmasri & Navathe §26.1 Decision Support Systems

  31. Multi-dimensional View of Data • Fact Table (also called cubes) • Dimension attributes • Dependent attributes (functions of the dimension attributes) • Dimension Tables, potentially one for each dimension Decision Support Systems

  32. OLAP Operations • Roll-up – increase the level of aggregation • Drill-down - decrease the level of aggregation • Slice-and-dice - selection and projection, i.e., reduce dimensionality of the data • Pivot – re-orient the dimensional view Decision Support Systems

  33. Implementation Approaches • Relational OLAP (ROLAP) Servers • Data stored in a relational • system • SQL extended • To allow easy OLAP query expression • To provide efficient OLAP query execution. • Multidimensional OLAP (MOLAP) • Systems directly store multidimensional data in special data structures • OLAP operations implemented directly on these data structures. • Hybrid OLAP (HOLAP) • Combines ROLAP and MOLAP. • Detail records (largest volume) in relational database. • Aggregations in separate, but connected”, MOLAP store. Decision Support Systems

  34. Product Example a Star Schema ProdNo ProdName ProdDescr Category CategoryDescr UnitPrice QOH Order OrderNo OrderDate Sales (Fact) table OrderNo SalespersonID CustomerNo ProdNo DateKey CityName Quantity TotalPrice Customer CustomerNo CustomerName CustomerAddress City Date DateKey Date Month Year Salesperson SalespersonID SalespersonName City Quota City CityName State Region Decision Support Systems

  35. Product Snowflake Schema Category ProdNo ProdName ProdDescr Category UnitPrice QOH Order CategoryName CategoryDescr OrderNo OrderDate Sales (Fact) table Customer OrderNo SalespersonID CustomerNo ProdNo DateKey CityName Quantity TotalPrice CustomerNo CustomerName CustomerAddress City Date Year Month DateKey Date Month Year Month Year Salesperson SalespersonID SalespersonName City Quota Region City State CityName State Region State Region Decision Support Systems

  36. Data Cubes • Precompute all possible aggregations. • Required extra storage is tolerable. • Little penalty to keep aggregate up-to-date if data does not change. • Normally some aggregation of raw data is done before it is entered into the data cube. Decision Support Systems

  37. Product Data Cube with Orders Accumulated Category ProdNo ProdName ProdDescr Category UnitPrice QOH CategoryName CategoryDescr Sales table Customer CustomerNo CustomerName CustomerAddress City SalespersonID CustomerNo ProdNo DateKey CityName Quantity TotalValue Date Year Month DateKey Date Month Salesperson Month Year SalespersonID SalespersonName City Quota Region City State CityName State Note that average for any aggregate can be calculated from TotalValue and Quantity. State Region Decision Support Systems

  38. Sample of Aggregates in the CUBE Sales (SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalValue) 22 11 100 2 ‘Columbus’ 3 300 CUBE(Sales) (SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalValue) 22 11 100 2 ‘Columbus’ 3 300 22 * 100 2 ‘Columbus’ 6 2222 22 * * 2 ‘Columbus’ 25 33000 * * * 2 ‘Columbus’ 75 90000 * * * * ‘Columbus’ 200 503444 Decision Support Systems

  39. How to answer query given the relation CUBE(Sales) Choose tuples in CUBE(Sales) with the following properties: • Query specifies valuev for attribute a • tuple t has v in its component for a. • Query groupsby attribute a • tuple t has any non-* value in its component for a. • Query has neithergroups by attribute a nor specifies value for a • tuple t has * value in its component for a. Decision Support Systems

  40. How to answer query given the relation CUBE(Sales) Cube(Sales) (SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalValue) 22 11 100 2 ‘Columbus’ 3 300 22 * 100 2 ‘Columbus’ 6 2222 22 * * 2 ‘Columbus’ 25 33000 * * * 2 ‘Columbus’ 75 90000 * * * * ‘Columbus’ 200 503444 select CustomerNo, avg(Price) from Sales where SalespersonID = 22 Group by CustomerNo Result(c, v/n) Cube(Sales) (SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalValue) 22 c * * * nv Decision Support Systems

  41. Cube Implementation by Materialized Views • Dimensions may have hierarchies. • Product, Category • City, State, Region Decision Support Systems

  42. Example: Materialized Views Cube(Sales) (SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalValue) City (CityName, State, Region) insert into SalesV1 select SalespersonID, CustomerNo, Month, State sum(Quantity) as Quantity, sum(TotalValue) as TotalValue from Sales join City on Sales.CityName = City.CityName group by SalespersonID, CustomerNo, Month, State; insert into SalesV2 select SalespersonID, CustomerNo, Month, Region sum(Quantity) as Quantity, sum(TotalValue) as TotalValue from Sales join City on Sales.CityName = City.CityName group by SalespersonID, CustomerNo, Month, Region; Decision Support Systems

  43. Example: Query 1 select SalespersonID, sum(TotalValue) from Sales group by SalespersonID; Answer by select SalespersonID, sum(TotalValue) from SalesV1 group by SalespersonID; or by select SalespersonID, sum(TotalValue) from SalesV2 group by SalespersonID; Decision Support Systems

  44. Example: Query 2 select SalespersonID, State, sum(TotalValue) from Sales group by SalespersonID, State; Answer only by select SalespersonID, State, sum(TotalValue) from SalesV1 group by SalespersonID, State; Decision Support Systems

  45. Example: Query 3 select SalespersonID, State, date, sum(TotalValue) from Sales group by SalespersonID, State, Date; Cannot be answered by either SalesV1 or SalesV2. Thus must use Sales itself. Decision Support Systems

  46. Lattice of Views All All Years Region Quarters State Weeks Months Days City Decision Support Systems

  47. Lattice of Materialized Views and Queries Q1 Q2 Q3 SalesV1 SalesV2 Sales Decision Support Systems

  48. OLAP ExampleGarcia-Molina, Ullman & Widom, Database System Implementation, Prentice Hall, 2000 • Automobile Sales Company: analyze sales of cars • Sales(serialNo, date, dealer, price) • Autos(serialNo, model, color) • Dealers(name, city, state) • Days(day, week, month, year) • ( 5, 27, 7, 2000) Fact Table Dimension Tables Time Dimension Table, probably not stored Decision Support Systems

  49. Assume a particular car model, say ‘Gobi’, is not selling as well as anticipated.How to analyze? • Maybe it’s the color. Slice for ‘Gobi. Dice for color. • select color, sum(price) • from Sales natural join Autos • where model = ‘Gobi’ • group by color; • Doesn’t show anything interesting. Decision Support Systems

  50. Gobi analysis, continuing • What about time? • Drill downfor month. • select color, month, sum(price) • from Sales natural join Autos • join Days on date = day • where model = ‘Gobi’ • group by color, month; • Suppose we discover red Gobis have not sold well recently. Decision Support Systems

More Related