160 likes | 266 Views
Data Warehousing. Willem Visser RW334. Somebody is watching!. Everybody seems to be recording your every move Loyalty cards Cookies Facebook, Twitter,… Check out Collusion plug-in for Firefox They want to know how to market to you Same is true in business
E N D
Data Warehousing Willem VisserRW334
Somebody is watching! • Everybody seems to be recording your every move • Loyalty cards • Cookies • Facebook, Twitter,… • Check out Collusion plug-in for Firefox • They want to know how to market to you • Same is true in business • Know your data, know your business
Data Warehousing • Integrated repository of data to understand your business • Separate from the Operational Database • Supports decision making • Subject oriented • Time variant • Non-volatile
Features • Only necessary data to allow modeling and decision making • Coming from potentially many sources • Time component • Even though operationally there might not be • Data doesn’t change after loading • No operational updates • Periodic refresh
Operational vs Warehouse • Operational • Optimized for on-line transactional processing • OLTP • Warehouse • Optimized for online analytic processing • OLAP • Complex queries • Very large volumes
Data cube • Multi dimensional data • Not just 3D (but mostly shown as such)
Lattice of Cuboids all 0-D(apex) cuboid country product date 1-D cuboids product,date product,country date, country 2-D cuboids 3-D(base) cuboid product, date, country • How much of the cube is materialized before the query: • Full (complete cuboid) • None (materialized on the fly) • Partial Slide by Dr. Hany Saleeb
OLAP • “Querying and presenting text and numeric data from data warehouses in a dimensional cube-style” • Slicing a dimension:Per region, per product, per period • Drill-down:Country Region Town Suburb • Drill-up, drill-around, etc. • Visualization Slide by Cor Winkler
other sources Extract Transform Load Refresh Operational DBs Multi-Tiered Architecture Monitor & Integrator OLAP Server Metadata Analysis Query Reports Data mining Serve Data Warehouse Data Marts Data Sources Data Storage OLAP Engine Front-End Tools Slide by Dr. Hany Saleeb
Steps • Data extraction: • get data from multiple, heterogeneous, and external sources • Data cleaning: • detect errors in the data and rectify them when possible • Data transformation: • convert data from legacy or host format to warehouse format • Load: • sort, summarize, consolidate, compute views, check integrity, and build indices and partitions • Refresh • propagate the updates from the data sources to the warehouse
SALES FACT TABLE PRODUCT DIMENSION TIME DIMENSION time_key (FK) product_key (FK) store_key (FK) promo_key (FK) dollars units cost product_key (PK) SKU description brand category package_type size flavor time_key (PK) SQL_date day_of_week week_number month PRODUCT STORE DIMENSION store_key (PK) store_ID store_name address district region PROMOTION DIMENSION promotion_key (PK) promotion_name promotion_type price_treatment ad_treatment display_treatment coupon_type Total Cost $ 1,058 $ 2,200 $ 650 $ 1,848 $ 2,350 $ 580 District Atherton Atherton Atherton Belmont Belmont Belmont Brand Clean Fast More Power Zippy Clean Fast More Power Zippy Total Dollars $ 1,233 $ 2,239 $ 848 $ 2,097 $ 2,428 $ 633 Gross Profit $ 175 $ 39 $ 198 $ 249 $ 78 $ 53 Star-Schema Example Slide by Cor Winkler
Data Warehouse Usage • Three kinds of data warehouse applications • Information processing • supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs • Analytical processing • multidimensional analysis of data warehouse data • supports basic OLAP operations, slice-dice, drilling, pivoting • Data mining • knowledge discovery from hidden patterns • supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools. • Differences among the three tasks Slide by Dr. Hany Saleeb
Information Dashboards Slide by Cor Winkler
Information ExploitationBusiness Intelligence (BI) R Value of decision Intelligent agents Make it happen!Automatic response to business triggers Data Mining What might happen? Obscure data relationships and trends Analysis Why it happened? Dynamic slice&dice Complexity → Reporting What happened? Historical info ← # of users Complexity → Slide by Cor Winkler