
Business Intelligence Technologies – Data Mining


ciara


Presentation Transcript


  1. Business Intelligence Technologies – Data Mining Lecture 1 Introduction

  2. Agenda • Course Objectives • Course Logistics • Case discussion • Introduction to BI Methods

  3. Discuss where you see data, and how companies dealt with the data they have.

  4. Data is Everywhere • Data in our daily life • Retailers • Manufacturers, supply chain • Financial services: credit card, credit score • Scientific data • remote sensors on a satellite, telescopes scanning the skies • gene expression data • scientific simulations generating terabytes of data • Surveillance camera • Insurance • Telecommunication: cell phone calls • Social networking • RFID video

  5. Course Objectives • How to uncover business intelligence from data. • Understand BI process • Learn popular BI methods • Master a data mining package • Connect with business problems

  6. Agenda • Course Objectives • Course Logistics • Case discussion • Introduction to BI Methods

  7. Course Logistics • Catherine Yang • yiyang@ucdavis.edu • Gallagher Hall, Room 3418 • 530-754-5967 • Office hours: • Walk-in • By appointment • Before and after class • Call me

  8. Class Resources • Class homepage: http://faculty.gsm.ucdavis.edu/~yiyang/teaching/269win2011/269win2011.html • post slides, additional articles, announcements, downloads • Text Book + Text Pak + Articles posted on class homepage

  9. Text Book Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, Second Edition Michael Berry and Gordon Linoff, 2004,  Wiley, ISBN 0471-470643 • Course Schedule, Due dates: • Open Syllabus

  10. Group Term Project • Group of 2-3 or individual • Identify a company to study • Focus: Data and Business Intelligence • Current practice • Your recommendations • Two phases • Phase 1: Describe the chosen company • Phase 2: Final report + class presentation

  11. Software • WEKA: free • Used for homework assignments • Supports both Windows and Mac • I’ll demo WEKA in most classes. • Tutorial available on course website • Every student is advised to install a copy in order to follow the class demos. • Microsoft Access is optional

  12. Grading • 15% Participation • 3: Excellent • 2: Good • 1: OK • 0: Absent with good reason and advance notification • -3: Absent with no reason • 60% Homework • 6 assignments • Problem solving, data analysis and/or case discussion. • 25% Term Project • Phase 1 report --- 5% • Final report --- 15% • Class presentation --- 5%

  13. Misc. Issues • Slides are available before class • Download or print them before class • Lectures may be different from the text book • Some materials in the lectures may not be in the book, so please focus in class • The book is a great reference book, not a bible • Finish assigned case readings before each class • Attendance is required • In-class random cold call

  14. Agenda • Course Objectives • Course Logistics • Case discussion • Introduction to BI Methods

  15. Case 1: Bank of America • Discussion Questions: • What is BoA trying to achieve? • What are the alternative solutions? Pros and cons of each? • What are the stages of data mining? Describe each. • What are the data mining techniques used, and what are the findings from each technique?

  16. Case 2: A Wireless Company • Discussion Questions: • What is the company trying to achieve? • How can data mining help? • Where did the data come from, and how are the data processed? • How is the data mining approach evaluated?

  17. Case 3: SUV • Discussion Questions: • What is the company trying to achieve? • How can data mining help? • What data files are used? What information is contained in these files? • How are the two data mining techniques combined, and why is the combination more powerful?

  18. Agenda • Course Objectives • Course Logistics • Case discussion • Introduction to BI Methods

  19. Business Intelligence Technologies • Enabling Technologies • Simple data summary • Database queries • Data Warehouse tools • Statistics • Data Mining

  20. Simple Data Summary • Histogram • Distribution • Average/Max/Min/Sum
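
The summaries named on this slide can be sketched in a few lines of Python. This is only an illustration: the `quantities` list is made-up data, not anything from the course materials.

```python
from collections import Counter
from statistics import mean

# Hypothetical daily sales quantities (illustrative values only).
quantities = [3, 7, 7, 2, 9, 4, 7, 3, 5, 3]

# Average / Max / Min / Sum in one place.
summary = {
    "average": mean(quantities),
    "max": max(quantities),
    "min": min(quantities),
    "sum": sum(quantities),
}

# Histogram / distribution: how often each value occurs.
histogram = Counter(quantities)

# Print a text histogram, one row per distinct value.
for value in sorted(histogram):
    print(f"{value:>2} | {'#' * histogram[value]}")
print(summary)
```

Summaries like these describe the data but do not by themselves explain why a value is high or low; that is where the later methods come in.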

  21. Data Tables in a Database Take the Following Database:

  22. Using database queries, we can get: The type of queries used to achieve the above:
      SELECT Description, Location, SUM(Quantity)
      FROM Purchases P, Product Pr, Store S
      WHERE P.ProdID = Pr.ProdID AND P.StoreID = S.StoreID
      GROUP BY Description, Location
      Other types of questions which can be answered using queries: • Return the stores with >1m revenue/day. • Rank the cities according to sales.
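
To try the join-and-group query from this slide, the sketch below runs it against an in-memory SQLite database using Python's standard `sqlite3` module. The table and column names come from the query itself; which columns belong to which table, and all of the row values, are assumptions made for illustration because the slide's tables are not reproduced in the transcript. The query is written with explicit `JOIN ... ON`, which is equivalent to the comma-join form on the slide.

```python
import sqlite3

# In-memory database with an assumed schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProdID INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE Store (StoreID INTEGER PRIMARY KEY, Location TEXT);
CREATE TABLE Purchases (ProdID INTEGER, StoreID INTEGER, Quantity INTEGER);
INSERT INTO Product VALUES (1, 'Milk'), (2, 'Bread');
INSERT INTO Store VALUES (10, 'Davis'), (20, 'Sacramento');
INSERT INTO Purchases VALUES (1, 10, 2), (1, 10, 3), (2, 10, 1), (1, 20, 4);
""")

# Total quantity sold per product description and store location.
rows = conn.execute("""
SELECT Description, Location, SUM(Quantity) AS TotalQuantity
FROM Purchases P
JOIN Product Pr ON P.ProdID = Pr.ProdID
JOIN Store S ON P.StoreID = S.StoreID
GROUP BY Description, Location
ORDER BY Description, Location
""").fetchall()

for description, location, total in rows:
    print(description, location, total)
```

The `ORDER BY` clause is added only so the output order is predictable; it is not part of the slide's query.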

  23. Data Warehouse Tools • Managers often don’t know how to write complex database queries to retrieve desired information. • Having to request queries from technical staff prevents managers from making quick decisions in this competitive world. • Data warehouse tools allow managers to view data in many ways without writing queries. • The terms data warehouse and OLAP are often used interchangeably, but they differ: a data warehouse stores the organization’s historical data for end-user analysis, while OLAP is a technology that enables a data warehouse to be used effectively for analysis using complex queries.

  24. Make Sure to Use the Right Dimension An analysis of the number of deaths per month revealed no patterns in data for a South African hospital. However, drilling down to deaths per hour revealed that, over the past 3 years, more people were dying on Wednesdays around 9am. The hospital subsequently discovered that the cleaning staff had been unplugging the life support machines to plug in the floor polishing equipment. (This is a true story.)
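
The effect of choosing the dimension can be shown with a small synthetic data set. The sketch below is not the hospital's data: it generates one background event per day at a varying hour and plants one extra event every Wednesday at 9am, then counts the same events by month and by (weekday, hour). All names and values are illustrative assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

start = datetime(2024, 1, 1)  # a Monday
events = []
for week in range(52):
    for day in range(7):
        date = start + timedelta(weeks=week, days=day)
        # Background: one event per day, at an hour that shifts over time.
        events.append(date.replace(hour=(day * 3 + week) % 24))
    # Planted pattern: one extra event every Wednesday at 9am.
    events.append(start + timedelta(weeks=week, days=2, hours=9))

# Coarse dimension: events per month look roughly flat.
by_month = Counter(e.strftime("%Y-%m") for e in events)

# Finer dimension: events per (weekday, hour) expose the planted spike.
by_slot = Counter((WEEKDAYS[e.weekday()], e.hour) for e in events)

print(sorted(by_month.items()))
print(by_slot.most_common(3))
```

The monthly counts differ by only a few events, while the (weekday, hour) view puts Wednesday 9am far ahead of every other slot.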

  25. Simpson’s Paradox • Simpson’s Paradox refers to the reversal of the direction of a comparison or an association when data from several groups are combined to form a single group. • This is caused by the different percentages in admission in the two tables - they really shouldn't be combined.
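
A small worked example makes the reversal concrete. The admission counts below are hypothetical (the slide's own tables are not reproduced in the transcript); they are chosen so that women have the higher admission rate in each department yet the lower rate once the departments are combined.

```python
# Hypothetical (admitted, applied) counts per department and group.
admissions = {
    "Dept A": {"men": (80, 100), "women": (9, 10)},
    "Dept B": {"men": (2, 10), "women": (30, 100)},
}

def rate(admitted, applied):
    """Admission rate as a fraction between 0 and 1."""
    return admitted / applied

# Rates within each department.
per_dept = {
    dept: {group: rate(*counts) for group, counts in groups.items()}
    for dept, groups in admissions.items()
}

# Rates after combining the departments into a single table.
overall = {}
for group in ("men", "women"):
    admitted = sum(admissions[dept][group][0] for dept in admissions)
    applied = sum(admissions[dept][group][1] for dept in admissions)
    overall[group] = rate(admitted, applied)

for dept, rates in per_dept.items():
    print(dept, {g: f"{r:.0%}" for g, r in rates.items()})
print("Combined", {g: f"{r:.1%}" for g, r in overall.items()})
```

The reversal happens because most men applied to the department that admits most applicants, while most women applied to the one that admits few, so the combined rates mostly reflect where each group applied.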

  26. Statistics & Data Mining Methods • Statistics • Correlation Analysis, Regression, Time series analysis • Data Mining Techniques • Aka. Business Analytics, business intelligence tech. • Data Mining aims to uncover previously unknown, valuable, and actionable patterns and trends. Output is generalized rules or (predictive or descriptive) models, induced from the data. • Association Rules (beer diaper) • Clustering (market segmentation) • Classification (whether a user will buy) • Others: Personalization, Link analysis (Google), Text mining
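
As a rough illustration of what an association rule such as "diapers → beer" measures, the sketch below computes the standard support, confidence, and lift of that rule over a handful of made-up transactions. The baskets are assumptions for illustration only, and this is plain counting, not a full rule-mining algorithm.

```python
# Hypothetical market baskets (illustrative only).
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

antecedent = {"diapers"}
consequent = {"beer"}

rule_support = support(antecedent | consequent)   # both items together
confidence = rule_support / support(antecedent)   # P(beer | diapers)
lift = confidence / support(consequent)           # above 1: positive association

print(f"support={rule_support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the two items appear together more often than they would if they were purchased independently.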

  27. What is data mining? • Informal definition: Finding patterns in data • More formal definition: Non-trivial process of identifying valid, novel, potentially useful, and understandable patterns in data • Business Intelligence: a process for increasing the competitive advantage of a business by intelligent use of available data in decision making. (one definition)

  28. What is a pattern? • Informal definition: Any structure that can be found in the data. e.g. • People with good credit ratings have fewer accidents • Risk = 0.93*prior_default + 0.23*num_cards –1.3* employed • On Friday nights male customers who buy diapers also tend to buy beer • Not every pattern is desirable • People with high income buy expensive cars
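
The risk formula on this slide is an example of a pattern expressed as a model. A minimal sketch of applying it follows; the coefficients are the ones on the slide, while the input encoding (0/1 indicators for `prior_default` and `employed`, a count for `num_cards`) and the sample applicants are assumptions.

```python
def risk_score(prior_default, num_cards, employed):
    """Risk = 0.93*prior_default + 0.23*num_cards - 1.3*employed.

    Assumed encoding: prior_default and employed are 0 or 1,
    num_cards is the number of credit cards held.
    """
    return 0.93 * prior_default + 0.23 * num_cards - 1.3 * employed

# Hypothetical applicants (illustrative only).
applicants = {
    "employed, no prior default, 2 cards": (0, 2, 1),
    "unemployed, prior default, 3 cards": (1, 3, 0),
}

for label, features in applicants.items():
    print(f"{label}: risk = {risk_score(*features):.2f}")
```

Under this encoding a higher score means higher risk: a prior default and more cards raise it, and being employed lowers it.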

  29. Examples from Different Industries • My consulting projects: • Chinese Supermarket Promotion Planning • Auto Lead Price Prediction • Distribution Center • Newspaper (the Boston Globe) • Airlines issuing credit cards to learn more about customers (do they travel a lot, do they use competitors’ product). • Financial market (Neural fair value) • Pfizer pharmaceuticals: • Construct a predictive model which tells patients their cholesterol risk score. High risk patients can request Lipitor, Pfizer’s cholesterol medication. • Fidelity: • Cross selling, when a customer calls, know what other services to offer

  30. An example – Building online user profiles: What data is needed? • Personal information, preferences & interests • Registration data, including demographic data • Customer ratings • Purchasing data • What was bought, when and where • Browsing & visitation data • Clickstream (Weblog files) • Build an integrated (360°) view of a customer • Collect customer data across all the communication channels

  31. Data Sources- Explicit vs. Implicit • Explicit: solicited from the user; easy to get but • Demographics, interests, etc. • Intrusive: inconvenience users • Misleading/deceptive: inaccurate information provided (inadvertently or on purpose) • Static: Preferences change over time • Implicit: collected automatically from touchpoints • Data based on users’ actions • Non-intrusive: transparent to users • Accurate/Factual: data “speaks” objectively (a hope) • Dynamic: Changes can be learned and included • Messy, need to figure out how to utilize these data • Privacy concerns

  32. Building Profiles Using Different Techniques • Factual information (simple summary, queries) • Demographic (e.g., name, address, age) • Behavioral (e.g., favorite type of book – adventure, largest transaction - $295) • Things learned from data (stat, data mining) • Rules, e.g., • If customer visits children’s book section of B&N from Amazon, she tends to go back soon • Sequences, e.g., • Usually, Joe visits page X, then Y, Z

  33. Steps for Data-driven Solutions • Finding information from data is not enough • Must respond to the information by taking actions • Turning: • Data into Information • Information into Action • Action into Value • Four-step process: 1, Identify the business problem 2, Analyze data to transform the data into actionable information 3, Act on the information 4, Measure the results

  34. 1, Identify the Business Problem • Business problems can often be big and vague • Data analysis tasks need to be more concrete • Sample business problems: • How to improve response rate to a direct marketing campaign? • Which ads to place on web pages in order to maximize ad revenue? • Understanding customer attrition/churn • Or more specific problems • What types of customers responded to our last campaign? • Where do the best customers live? • Are long waits in check-out lines a cause of customer attrition? • What products should be promoted with our XYZ product? • Another goal of this lecture is for you to think strategically about what business problems can be addressed using data.

  35. 2, Analyze Data to Transform it into Actionable Information • Success is making business sense of the data • Need to figure out the specific data analysis tasks used to address the business problems identified in the first step. • Deal with messy data • Don’t expect clean data. Data cleaning accounts for 70% of the effort • Consolidate data from different sources • Need to collect additional data? Handle missing values • Transform data to the right format for analysis • Implementation problems: • What information can different techniques bring out from the data? • What techniques to use? • How to use the techniques?
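
The consolidation and missing-value steps on this slide can be sketched as follows. The two sources (`crm` and `web`), their fields, and the choice of median imputation and a zero default for visits are all assumptions for illustration; real projects pick these rules per field.

```python
from statistics import median

# Two hypothetical sources keyed by customer id.
crm = {
    101: {"name": "Ann", "age": 34},
    102: {"name": "Bob", "age": None},   # age missing at the source
    103: {"name": "Cy", "age": 51},
}
web = {101: {"visits": 12}, 103: {"visits": 3}, 104: {"visits": 7}}

# Consolidate: one record per customer seen in either source.
merged = {}
for customer_id in sorted(crm.keys() | web.keys()):
    record = {"name": None, "age": None, "visits": 0}  # assumed defaults
    record.update(crm.get(customer_id, {}))
    record.update(web.get(customer_id, {}))
    merged[customer_id] = record

# Handle missing values: fill unknown ages with the median known age.
known_ages = [r["age"] for r in merged.values() if r["age"] is not None]
fill_age = median(known_ages)
for record in merged.values():
    if record["age"] is None:
        record["age"] = fill_age

for customer_id, record in merged.items():
    print(customer_id, record)
```

Imputation changes the data, so it is worth recording which values were filled in before any analysis is run on the merged table.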

  36. 3, Take Action • Taking action is the whole purpose of data analysis • With the information discovered from data, we can make better-informed decisions. • Examples • Select customers to target • Adjust inventory levels • Rearrange products on the shelves • Customize products for different segments • Adjust price levels

  37. 4, Measure Results • Assess the impact of the action taken • Often overlooked, ignored, skipped • Planning for the measurement should begin when analyzing the business opportunity, not after it is “all over” • Assessment questions (examples): • Did this campaign do what we hoped? • Did some offers work better than others? • Lower cost, increase profit?
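
One common way to answer "did this campaign do what we hoped?" is to compare the response rate of customers who received the offer with that of a held-out control group. The slide does not prescribe this method; the sketch below assumes such a control group exists and uses made-up counts.

```python
def response_rate(responders, contacted):
    """Fraction of contacted customers who responded."""
    return responders / contacted

# Hypothetical campaign results (illustrative only).
treated = {"contacted": 2000, "responders": 120}   # received the offer
control = {"contacted": 1000, "responders": 40}    # held out, no offer

treated_rate = response_rate(treated["responders"], treated["contacted"])
control_rate = response_rate(control["responders"], control["contacted"])

# Lift: how many times more likely a treated customer was to respond.
lift = treated_rate / control_rate

# Incremental responses attributable to the campaign among treated customers.
incremental = (treated_rate - control_rate) * treated["contacted"]

print(f"treated {treated_rate:.1%}, control {control_rate:.1%}")
print(f"lift {lift:.2f}, incremental responses {incremental:.0f}")
```

Setting up the control group has to happen before the campaign runs, which is why the slide asks for measurement to be planned at the start.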

  38. Business Value of Data • Companies invest in data-related hardware, software and services. • How can the return on that investment be quantified? • Realize value in data by transforming data to information and information to action. • It is not always easy to quantify the exact value data provides.

  39. Data Driven Applications and Business Models • Market Segmentation • Personalization/product recommendation • Google • Capital One • BroadVision • comScore • Tricision

  40. Take-Away Messages • Decisions should be supported by real data. • Don't assume; use real data to back up your decisions and avoid risks. • A lot can be learned from data. • Innovative business strategies can be derived from data.
