
Speakers: Prof Y V Hui, CityU Dr H P Lo, CityU Dr Sammy Yuen, CityU Dr K W Cheng, SAS Institute Mr Steven Parker, Standa




  1. Speakers:
     • Prof Y V Hui, CityU
     • Dr H P Lo, CityU
     • Dr Sammy Yuen, CityU
     • Dr K W Cheng, SAS Institute
     • Mr Steven Parker, Standard Chartered
     Knowledge Discovery Centre: CityU-SAS Partnership

  2. The Art and Science of Data Mining
     Y V Hui, City University of Hong Kong

  3. The Driving Forces
     • Specialization and focus in business
       - To satisfy the needs of customers
       - To improve and develop specific business strategies and processes
       - Personalization through mass customization

  4. The Driving Forces
     • Challenges
       - local and global competition
       - distributed business operations
       - product innovation
     • Technology development
     • Benefit, cost and risk on a product or customer basis

  5. Data Mining
     • Also known as knowledge discovery in databases. Data mining digs out valuable information from large and messy data. (Computer scientist’s definition)
     • Data mining is a knowledge discovery process. It’s the integration of business knowledge, people, information, statistics and computing technology.

  6. Data Mining is Hot
     • Ten Hottest Jobs, Time, 22 May 2000
     • 10 emerging areas of technology, MIT’s Magazine of Technology Review, Jan/Feb 2001

  7. Data Mining Philosophy
     • A powerful enabler of competitive advantage.
     • Data mining is driven from business knowledge.
     • Data mining is about enabling people to discover actionable information about their business.
     • Return of profit isn’t about algorithms.

  8. Scope of Data Mining
     • Management’s Decision World
       - Business outlook
       - Industry conditions
       - Product offering
       - Customer analysis
       - Strategic options
       - Competitive actions, etc.
     • Interface
       - Problem development and management
       - Reporting and evaluations
     • Data Miner’s Analytical World
       - Project design
       - Data collection and preparation
       - Model building
       - Validation

  9. Project Management
     • Cross-functional team
     • System architecture

  10. Successful applications
      • Business transaction
        - risks and opportunities
      • Customer relationship management
        - personalization, target marketing
      • Electronic commerce & web
        - web mining

  11. Successful applications
      • Science & engineering
      • Health care
      • Multi-media
      • Others

  12. Data Mining Process (diagram stages)
      • Understanding of business
      • Problem identification

  13. Understanding Your Business
      • Do we have a problem?
        - What is the current situation? Are there any undesirable situations that need attention?
        - Are there any conditions, processes, etc., that could be improved?
        - Are any problems foreseeable that could affect the business?
        - Are there any potential opportunities that the company may capitalize on?
      A problem is a learning opportunity.

  14. Understanding Your Problem
      • Operational or analytical
      • Conventional rule or knowledge discovery
      • Product based or customer based
      • Market research or data mining
      • Ownership of the information
      • Privacy
      • Added value

  15. Data Mining Process (diagram stages)
      • Collecting relevant information
      • Understanding of business
      • Problem identification

  16. Collecting Relevant Information
      • Data Search
      • Data Collection
      • Data Preparation
      • Data Mining Database

  17. Data Search
      • Exploring the problem space. Don’t let the data drive the problem.
      • Measurement
      • Exploring the data sources

  18. Data Collection
      • Data retrieval
      • Data audit
      • Data set assembly and data warehouse
      • Survey

  19. Data Preparation
      • Data representation
      • Data exploration
      • Data normalization
      • Data transformation
      • Imputation of missing data
      • Data tuning
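Two of the preparation steps listed on this slide, imputation of missing data and data normalization, can be illustrated with a minimal Python sketch. The slide does not say which methods the speakers had in mind; mean imputation and min-max scaling are assumed here only because they are the simplest common choices, and the `incomes` column is made-up data.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly onto [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical income column with one missing entry.
incomes = [20000, None, 40000, 60000]
filled = impute_mean(incomes)        # the gap is filled with the observed mean
scaled = min_max_normalize(filled)   # every value now lies between 0 and 1
```

Imputing before scaling keeps the filled value inside the observed range, so it cannot stretch the normalized scale.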

  20. Data Mining Database
      • Variable selection
      • Record selection
      • Data set partition
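Data set partition usually means splitting the records into training, validation and test sets before any model is built. The sketch below shows one way to do that; the 60/20/20 proportions, the function name `partition` and the fixed seed are illustrative assumptions, not values taken from the presentation.

```python
import random


def partition(records, train=0.6, valid=0.2, seed=0):
    """Shuffle records and split them into train, validation and test lists.

    The proportions are assumptions for illustration; whatever is left
    after the train and validation shares becomes the test set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * valid)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


# 100 hypothetical record identifiers.
train_set, valid_set, test_set = partition(range(100))
```

Shuffling before slicing matters when the source table is sorted, for example by date or customer id, because an unshuffled split would then give the three sets different distributions.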

  21. Data Mining Process (diagram stages)
      • Learning
      • Collecting relevant information
      • Model building
      • Understanding of business
      • Problem identification

  22. Model Building
      • Model based vs non-model based
        $(y_1, \ldots, y_p) = f(x_1, \ldots, x_q)$
        Inputs: $x_1, \ldots, x_q$
        Outputs: $y_1, \ldots, y_p$

  23. Model Building
      • Parametric vs nonparametric

  24. Model Building
      • Estimation vs trial and error
      • Directed vs undirected
      • Multidimensional analysis
      • Large data set vs small data set

  25. Data Mining Algorithms (diagram labels)
      • Online Analytical Processing
        - SQL
        - Query Tools
      • Discovery Driven Methods
        - Description
        - Prediction
      • Classification, Regressions, Visualization, Decision Trees, Clustering, Neural Networks, Association, Sequential Analysis

  26. Online Analytical Processing
      • Query and reporting
        Example of SQL query: How many credit-card customers who made purchases of over $1,000 on sporting goods in December have at least $20,000 of available credit?
      • Manual and validation driven
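The question on this slide can be written as an actual SQL query. The sketch below runs it against an in-memory SQLite database from Python. The table names, column names and sample rows are all hypothetical, and "purchases of over $1,000" is read as the customer's total December spending on sporting goods, which is one of several possible interpretations.

```python
import sqlite3

# Hypothetical schema and rows, invented only to make the query runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, available_credit REAL);
CREATE TABLE purchases (
    customer_id INTEGER, category TEXT, amount REAL, purchase_date TEXT
);
INSERT INTO customers VALUES (1, 25000), (2, 15000), (3, 30000);
INSERT INTO purchases VALUES
    (1, 'sporting goods', 1200, '2000-12-05'),
    (2, 'sporting goods', 1500, '2000-12-10'),
    (3, 'sporting goods',  400, '2000-12-11'),
    (3, 'groceries',      2000, '2000-12-12');
""")

# Count customers with at least $20,000 of available credit whose
# December sporting-goods purchases total more than $1,000.
query = """
SELECT COUNT(*)
FROM customers AS c
WHERE c.available_credit >= 20000
  AND (SELECT COALESCE(SUM(p.amount), 0)
       FROM purchases AS p
       WHERE p.customer_id = c.id
         AND p.category = 'sporting goods'
         AND strftime('%m', p.purchase_date) = '12') > 1000
"""
(count,) = conn.execute(query).fetchone()
conn.close()
```

The analyst has to know in advance which conditions to test, which is what the slide means by "manual and validation driven".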

  27. Estimation and Prediction
      • Statistical models
      • Neural network
      Example: Housing price valuation model
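As a minimal illustration of the statistical-model route, the sketch below fits a one-variable least-squares line that relates floor area to price. The presentation does not describe its valuation model; the single predictor, the function name `fit_line` and the data points are assumptions made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: floor area in square metres, price in millions.
areas = [50, 70, 90, 110]
prices = [3.1, 3.9, 5.2, 5.8]

intercept, slope = fit_line(areas, prices)
predicted = intercept + slope * 80  # estimated price of an 80 square-metre flat
```

A realistic valuation model would use many more inputs, such as location, age and floor level, but the estimation principle is the same.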

  28. Classification Algorithms
      • Statistical techniques
      • Neural networks
      • Genetic algorithms
      • Nearest neighbor method
      • Rule induction and decision tree
      Example: Customer segmentation and buying behavior description
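Of the techniques listed, the nearest neighbor method is the easiest to show in a few lines. The sketch below is a plain 1-nearest-neighbor classifier applied to a made-up customer segmentation task; the features (age, annual spending) and the segment labels are hypothetical.

```python
import math


def nearest_neighbor(train, query):
    """Return the label of the training point closest to `query` (1-NN).

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    _, label = min(train, key=lambda item: math.dist(item[0], query))
    return label


# Hypothetical customers: (age, annual spending) with a segment label.
customers = [
    ((25, 20000), "budget"),
    ((30, 25000), "budget"),
    ((45, 90000), "premium"),
    ((50, 120000), "premium"),
]

segment = nearest_neighbor(customers, (40, 80000))
```

The features are left unscaled here, so spending dominates the distance. In practice the inputs would normally be normalized first so that each variable contributes comparably.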

  29. Association Rules
      • Apriori algorithm
      Example: Market basket analysis, cross selling analysis
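The core of the Apriori algorithm is a level-wise search for frequent itemsets: itemsets of size k are built only from frequent itemsets of size k-1, since no superset of an infrequent itemset can be frequent. The sketch below is a compact, unoptimized version of that idea on made-up shopping baskets; rule generation from the frequent itemsets is left out.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
frequent = apriori(baskets, min_support=2)
```

With these baskets every single item and every pair meets the support threshold, while the three-item set appears in only one basket and is dropped.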

  30. Sequential Analysis
      • Count-all algorithm
      • Count-some algorithm
      Example: Attached mailing, add-on sales

  31. Algorithms Comparison
      • No single data mining algorithm outperforms all the others. Try different algorithms and draw conclusions from the results. Use your business knowledge.
      • Neural networks do no better than statistical models when the underlying structure is known. However, neural networks detect hidden interactions and nonlinearity. Use the prior information if available.

  32. Algorithms Comparison
      • Data mining algorithms cannot handle dependent records. Use the prior information. Statistical models help.
      • Data tuning and dimension reduction enhance data mining before and after the analysis. Statistical techniques help.

  33. Data Mining Process (diagram stages)
      • Learning
      • Collecting relevant data
      • Model building
      • Understanding of business
      • Problem identification
      • Business strategy and evaluation
      • Action

  34. Trends that Affect Data Mining
      • Data trends
        - data explosion
        - data types

  35. Trends that Affect Data Mining
      • Hardware trends
        - memory
        - processing speed
        - storage

  36. Trends that Affect Data Mining
      • Network trends
        - network connectivity
        - distributed databases
      • Wireless communication

  37. Trends that Affect Data Mining
      • Scientific computing trends
        - theory, experiment and simulation

  38. Trends that Affect Data Mining
      • Business trends
        - total quality management
        - customer relationship management
        - business process reengineering
        - enterprise resources planning
        - supply chain management
        - business intelligence and knowledge management
        - e-business and m-business

  39. Trends that Affect Data Mining
      • Privacy and Security

  40. Pot of Gold
      • The benefits of knowing one’s business and customers have become so critical that technologies are coming together to support data mining.
      • Data mining is not cybernetic magic that will turn your data into gold. It’s the process and result of knowledge production, knowledge discovery and knowledge management.
