1 / 18

Cube Intro

Cube Intro. OLAP. Term coined in mid-1990’s Main goal: support ad-hoc but complex querying performed by business analysts Interactive process of creating, managing, analyzing and reporting on data Extends spreadsheet-like analysis to work with huge amounts of data in a data warehouse.

thora
Download Presentation

Cube Intro

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Cube Intro

  2. OLAP • Term coined in mid-1990’s • Main goal: support ad-hoc but complex querying performed by business analysts • Interactive process of creating, managing, analyzing and reporting on data • Extends spreadsheet-like analysis to work with huge amounts of data in a data warehouse

  3. Decision Making • Effective decision • Choice that moves an organization closer to an agreed-on set of goals in a timely manner • Needs supports (Figure 1-1) • Goals – needs to be specific • Not to hit an iceberg – for Titanic or other boats • Measures – needs to be concrete • Away from any visible iceberg • Even better, say, 2 miles away from any visible iceberg • Feedback – needs to be timely and accurate • We hit an iceberg, rather than saw one – not timely enough • The boat can sink, actually, again too late to find out

  4. BI – Business Intelligence Is the delivery of accurate, useful information to the appropriate decision makers within necessary timeframe to support effective decision making, supported with predictions on what will likely be happening It is a superset of what can be provided by data mining SQL Server 2012 is a decent tool for this.

  5. Known vs. Unknown Questions • When using SQL statements or even Cubes, we know the questions and can formulate the SQL statements (most of the time) to dig out answers to the questions • Layout let answers – SQL • Data-led answers – the drill down in a cube dimension (if NW sales figure is low, which state is the main contributor of that?) • When we do not know what to ask, we use data mining – book’s definition is less accurate

  6. BI at different levels • Figures 2-1 to 2-3 • Author emphasized on cubes • The higher the level, the more like the use of data mining rather than cubes for historical data • KPIs (Key Performance Indicators) • What would be an Amazon’s possible top level KPI? • What would be an AT&T’s (the cell phone part of it) possible top level KPI? • Reading assignment • Read about the Maximum Miniatures, INC (page 19-22, no due date, but read before next quiz)

  7. What should be the sources • OLTP is the source • On line transaction Process • These are the things we have learned in CS420/IS420 • But OLTP, the direct data store is not suitable for BI because • Not to fight for computing resources of OLTP systems • OLTP may not have all the history readily available • BI may have to use data from several OLTP systems • The solution is generally a Data Mart

  8. Data marts Data marts are related sets of data that are grouped together and separated out from the main body of data. It can be a repository of OLTP data or portion of data from a data warehouse Data in a Data Mart are not meant for supporting real time operations Data in a Data mart are generally have gone through the data clearing process

  9. More on OLAP/Cube • Place key performance indicators (measures) into context (dimensions) • measures are pre-aggregated • data retrieval is significantly faster • modeled in a DW • The processed “cube” is made available to business analysts who can browse the data using a variety of tools, making ad hoc interactive and analytical processing

  10. OLAP versus OLTP

  11. OLAP versus OLTP

  12. Measures and dimensions • Measures: key performance indicators, such as sales amount, that you would like to evaluate • typically numerical, including volume, sales, and costs • a rule of thumb: if a number makes (business/social) sense when aggregated, then it is a measure • examples: • aggregate daily volume to month, quarter and year • aggregating telephone numbers would not make sense therefore, telephone numbers are not measures • postcode/Zip code: not a measure, but can be a dimension (?) • Totals in all the transactions would be a measure • Number of high school students by school, county, state should be a measure

  13. Measures and dimensions • Dimensions: categories of data analysis typical dimensions include product, time, region • a rule of thumb: when a report is requested to be ordered "by“ something, that something is usually a dimension • e.g. in a sales report: view sales by month, by region, so the two dimensions needed are time and region

  14. Star Schema • Dimensions and measures are physically represented by a star schema (typically) • arrange the dimension tables around a central fact table that contains the measures • a fact table contains a column for each measure as well as a column for each dimension • Dimensions are stored as keys of dimension tables

  15. Example • Dimension tables • Market (Store_ID, City, Region) • Product (Product_ID, Name, Category, Price) • Time (Time_ID, Week, Month, Quarter, Year) • Fact table • Sales (Store_ID, Product_ID, Time_ID, Amount) • Notes • Amount should be the measure • Time_ID is preprocessed to show at week or month level • This can be modified to show Snowflakes (month-quarter-year)

  16. OLAP Operations • Roll-up • Drill-down • Pivoting • slice/dice • slicing: selection on one or more dimensions, possibly with some dimensions projected out • For example, Sales in Oregon • dicing: a range selection in a cube (partition or group on one or more dimensions (e.g., dealers by state and cars by color) • For example, sales by region • Slicing and dicing – combining the two

  17. Example • supermarket sales • roll-up: summaries over all cities, regions • slice the cube to select sales only in day 3 • dice the cube to select sales only in week 2 (days 8-14), and group by regions

  18. MS OLAP Extensions

More Related