Chapter 3 & 11 Data Management: Warehousing & Analysis
Data Management Topics • Problems with data and data mgmt • Data Warehouses • Data Analysis Methods
Value & Burden of Data • Data is ABSOLUTELY Necessary • For ALL organizational functions and systems • Source of information & knowledge creation efforts • Yet involves immense complexities! • What are some of these complexities? • Think of the origins and intended uses of data… • Why is there so much data available today? • Where does it all come from?
Data Management Problems • Data is fragmented in separate systems • What are some examples? • Data quality is often poor! Why? • Costs have been estimated in the Billions… • Mercedes database problems… • Decision makers make decisions from incomplete knowledge bases • Huge amounts of time are wasted trying to compile this data into usable form • Have you ever considered the cost of information?
What is a Data Warehouse? • Data warehouse provides a centralized data resource to meet needs of decision makers • Data is stored in a form accessible for processing such as DSS, KM • It is a process… • Includes Access, Retrieval, Analysis, and Visualization • … and it is an entity • Data is organized by subject or topic and then stored
Storage for a DW • A Physical Example: • One or more “rack units” • Each contains 500+TBytes • 200+ hard disk drives • 8+ host ports, 10+GB/sec • Cost: $200,000+ • A Logical Example…
Table: Complaints Call# 1 2 3 Name Date Problem Assigned_To Bob Jones 2/3/2002 Power Out Fred Meyers Pat Samuels 2/5/2002 Flickering Tom New …. …. A Relational Data Table
Table: Meter_Readings Table: Customers Key Key Table: Complaints Key Database is a Collection of Tables
A Data Warehouse is a Collection of Databases Data and Meta-Data about Databases SalesDB CustDB HRDB
Storing Data in a Warehouse • Video Clip: Walmart IS • Data problems must be fixed before storage! • There are 2 primary ways to store data • The first allows direct access to all (raw) data • The second contains processed/summarized data (most common) • What are potential tradeoffs of these approaches?
Data Analysis I: Data Mining • Find new patterns & relationships in large pools of data and infer rules from them • Rules guide decision making and forecasting • Typically yields five types of information • Classification, clustering, association, sequences, forecasting • What are examples of these? • Predict trends & behaviors • Use results of past promotions to identify most successful future targets for new promotion • What is the connection between diapers and beer?
Examples of Data Mining • AMEX mines data from 30M card holders • Purchase of a dress at Sacks 5th Ave might trigger a coupon to be included with bill for shoes at same store • Predicting which customers are likely to switch to competition based on purchase patterns • Credit Cards: Identifying fraudulent transactions • Banking: forecasting loan foreclosure rates • Manufacturing: predicting product failures • Healthcare: correlating demographics and illnesses (cancer clusters) • Broadcasting: which ads to run in which spaces • Video Clip: Govt Data Minining from CNN
Data Analysis II: OLAP • OnLine Analytical Processing • Multidimensional analysis to learn more about known patterns, trends, exceptions in company data • Involves analysis of hundreds of thousands of data items in complex (but known) relationships • Compares aggregate data in different perspectives • Largely retrospective in nature • Multidimensional storage more efficient than traditional database • How do OLAP and Data Mining fit together?
OLAP Examples • Company sells 4 products (nuts, bolts, washers, screws) in 3 regions (east, midwest, west) • Each aspect is a different dimension (prod, loc, act v proj sale) • Want to know actual sales by product for each region and compare them with projected sales (3D cube) for greater understanding • Office Depot reduced its inventory of PCs from 22 to 12 products, resulting in increased profits • Eliminated unnecessary inventory • Avoided markdowns on equipment that wasn’t selling • Remember current events: Lowes vs. Home Depot?? • Victoria’s Secret stock garment colors by a standard ratio (10 black : 1 ivory) • OLAP revealed that in certain markets, demand was the reverse of this ratio; tailoring to location, they increased sales
Data Analaysis III: Visualization • Managers and Systems Analysts like to see data in different graphical formats • Ability to quickly change graphical perspective • Dimensions: products, salespeople, geography • Measures: sales volume, net $, gross $, actual vs. forecast • Time: daily, weekly, monthly... • Visual analysis may reveal problems not yet detected with statistical analysis
Wrapping It All Up • Corporate data has been fragmented • Data Warehouses provide a centralized resource • Data Analysis Methods • OLAP • Data Mining • Visualization
For Next Time… • Chapter 3 Minicases: Sears & Dell • Next Time: Decision Support Systems