1 / 20

OLAP & Data Warehouse

OLAP & Data Warehouse. Predefined reports. MIS 3500. Interactive data analysis. Operations’ data. Periodical transfers. Online Transaction Processing (OLTP): Querying Databases with 3NF tables. Online Analytical Processing (OLAP); Data warehousing; Data Mining.

lacey
Download Presentation

OLAP & Data Warehouse

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. OLAP & Data Warehouse Predefined reports MIS 3500 Interactive data analysis Operations’ data Periodical transfers Online Transaction Processing (OLTP): Querying Databases with 3NF tables Online Analytical Processing (OLAP); Data warehousing; Data Mining. Usually denormalized data. Flat files

  2. OLTP vs. OLAP

  3. Integrate data from different sources to get a larger picture of business Data aggregations (summaries on different dimensions) Ad hoc queries (support non-routine decision making) Statistical analysis (test hypotheses on relationships between pieces of data) Discover new relationships (data mining) Warehousing Goals

  4. Extraction, Transformation, and Transportation • Preparations performed on data Transform Transport Customers Extract Convert “Client” to “Customer” Apply standard product numbers Convert currencies Fix region codes Data warehouse: All data must be consistent. Transaction data from diverse systems.

  5. Three-Dimensional View of Data: Cube Category Customer Location Similar ideas used in crosstab query and pivot table. Sale Date

  6. Data Hierarchy Year Roll-up To get higher-level totals Levels Quarter Month Drill-down To get lower-level details Week Day

  7. Product Category Customer Location Star Design Dimension Table Dimension Table Hierarchical: Dimension tables can link only via fact table. Sale SaleDate SalePrice Quantity Fact Table Amount=SalePrice*Quantity Dimension Table Measures Amounts broken down by product category, period, and customer location.

  8. City CityID ZipCode City State Sale SaleID SaleDate EmployeeID CustomerID SalesTax Customer CustomerID Phone FirstName LastName Address ZipCode CityID Snowflake Design Network-like design: Dimension tables can link directly. Merchandise ItemID Description QuantityOnHand ListPrice Category OLAPItems SaleID ItemID Quantity SalePrice Amount

  9. Excel Pivot Table Reports Can place data in rows or columns. By grouping months, can instantly get quarterly or monthly totals.

  10. CUBE Option (SQL 99) SELECT Category, Month, Sum, GROUPING (Category) AS Gc, GROUPING (Month) AS Gm FROM … GROUP BY CUBE (Category, Month...) Category Month Amount Gc Gm Bird 1 135.00 0 0 Bird 2 45.00 0 0 … Bird (null) 32.00 0 0 Bird (null) 607.50 1 0 Cat 1 396.00 0 0 Cat 2 113.85 0 0 … Cat (null) 1293.30 1 0 (null) 1 1358.8 0 1 (null) 2 1508.94 0 1 (null) 3 2362.68 0 1 … (null) (null) 8451.79 1 1

  11. GROUPING SETS: Hiding Details SELECT Category, Month, Sum FROM … GROUP BY GROUPING SETS ( ROLLUP (Category), ROLLUP (Month), ( ) ) Category Month Amount Bird (null) 607.50 Cat (null) 1293.30 … (null) 1 1358.8 (null) 2 1508.94 (null) 3 2362.68 … (null) (null) 8451.79

  12. SQL RANK Functions SELECT Employee, SalesValue RANK() OVER (ORDER BY SalesValue DESC) AS rank DENSE_RANK() OVER (ORDER BY SalesValue DESC) AS dense FROM Sales ORDER BY SalesValue DESC, Employee; Employee SalesValue rank dense Jones 18,000 1 1 Smith 16,000 2 2 Black 16,000 2 2 White 14,000 4 3 DENSE_RANK does not skip numbers • Therefore, advances in SQL motivate DBMS vendors to support • OLAP and data warehousing.

  13. Data Mining • Goal: To discover unknown relationships in the data that can be used to make better decisions. • Exploratory analysis. • A bottom-up approach that scans the data to find relationships • Some statistical routines, but they are not sufficient • Statistics relies on averages • Sometimes the important data lies in more detailed pairs • Supervised by developer vs. unsupervised (self-organizing artificial neural networks)

  14. Common Techniques • 1. Classification/Prediction • 2. Association Rules/Market Basket Analysis • 3. Clustering

  15. 1. Classification(Prediction) • Purpose: “Classify” things that are causes and those that are effects. • Examples • Which borrowers/loans are most likely to be successful? • Which customers are most likely to want a new item? • Which companies are likely to file bankruptcy? • Which workers are likely to quit in the next six months? • Which startup companies are likely to succeed? • Which tax returns are fraudulent?

  16. Classification Process • Clearly identify the outcome/dependent variable. • Identify potential variables that might affect the outcome. • Use sample data to test and validate the model. • Regression/correlation analysis, decision tables and trees, etc.

  17. 2. Association/Market Basket • Purpose: Determine what events or items go together/co-occur. • Examples: • What items are customers likely to buy together? (Business use: Consider putting the two together to increase cross-selling.)

  18. Association Challenges • If an item is rarely purchased, any other item bought with it seems important. So combine items into categories. • Some relationships are obvious. • Burger and fries. • Some relationships are puzzling/meaningless. • Hardware store found that toilet rings sell well only when a new store first opened. But what does it mean?

  19. 3. Cluster Analysis • Purpose: Determine groups of people or some entities. • Examples • Are there groups of customers? (If so, we could target them; market segmentation) • Do the locations for our stores have elements in common? (If so, we can search for similar clusters for new locations.) • Do employees have common characteristics? (If so, we can hire similar, or dissimilar, people.) Large intercluster distance Small intracluster distance

More Related