UNDERSTANDING DATA MINING SOFTWARE II

Presentation Transcript


  1. UNDERSTANDING DATA MINING SOFTWARE II. Ekin Baykal, Nikhil Brahmbhatt, Jechand Chennupati, Joel Edgeman, Pushpendra Singh

  2. INTRODUCTION • Data Mining • CRISP-DM Model • Teradata Warehouse Miner • Show and Tell

  3. INTRODUCTION • The best search engine on the internet indexes only 16% of the sites. In 1999 the internet contained over 15 terabytes of data. (Nature, 1999b) • The quantity of data in GenBank, the international repository for genome sequences, doubles every 14 months. (Economist, 1999) • The Large Hadron Collider at CERN will generate 20 terabytes of test data each day for the next 15 years. (Nature, 1999a) Source: http://www.stt.nl/stt2_intl/projects/datm/datm.htm

  4. DATA MINING The process of identifying and interpreting intrinsic patterns in data to solve a business problem.

  5. HISTORICAL CHALLENGES • Lack of standards and business packaging • Inability of tools to scale up to the volumes of data • Noisy, missing, and faulty corporate data • Corporate warehousing has been slow to evolve • Databases designed for operational processing cannot scale up to voluminous analytical processing • Business doesn’t trust results that it can’t validate or understand • Data analysis and mining are typically niche-oriented processes that exist outside of business processes

  6. TODAY’S GROWING DEMAND • Technological advances in compute power and speed • Advanced data processing and management techniques • Greater user sophistication • BUT… • Most tools still work in their own proprietary environment • Most databases aren’t optimized for analytic processing • Businesses haven’t integrated data mining and knowledge discovery into their workflow • Lack of executive commitment

  7. WHERE DOES MINING FIT? [Diagram] • Data Warehouse Data: Name, Addr., # Prods., Tot. $, # Yrs. • Data Mined Intelligence: Prop. to buy Product X, Y, Z; Prof. Score; Churn Score; Cluster ID

  8. WHERE DOES MINING FIT? The intelligence from the analysis is incorporated back into the warehouse in the form of scores, predictions, forecasts, and descriptions. [Diagram labels: Operational Databases; DW; OLAP; DSS; Reports; Develop Analytical Model; BUILD; TEST & DEPLOY; USE-DEPLOY TO] Test model can be deployed as: • Code • Database triggers • Called module • One-time report
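
The write-back described on slide 8 can be sketched in a few lines. This is a minimal illustration, not the presenters' method or anything specific to Teradata: it uses Python's built-in sqlite3 module as a stand-in warehouse, and the `customer` table, its columns, and the scoring rule are all hypothetical placeholders for a real trained model.

```python
import sqlite3


def score_customers(conn):
    """Compute a toy churn score per customer and store it in the warehouse table."""
    rows = conn.execute(
        "SELECT id, num_products, years FROM customer"
    ).fetchall()
    for cust_id, num_products, years in rows:
        # Hypothetical rule standing in for a trained model:
        # fewer products and a shorter tenure give a higher churn score.
        score = round(1.0 / (1 + num_products + years), 3)
        conn.execute(
            "UPDATE customer SET churn_score = ? WHERE id = ?",
            (score, cust_id),
        )
    conn.commit()


# In-memory database used as a stand-in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT, "
    "num_products INTEGER, years INTEGER, churn_score REAL)"
)
conn.executemany(
    "INSERT INTO customer (id, name, num_products, years) VALUES (?, ?, ?, ?)",
    [(1, "A", 1, 0), (2, "B", 4, 5)],
)

score_customers(conn)
scores = dict(conn.execute("SELECT id, churn_score FROM customer"))
```

Once the score lives in the same table as the customer attributes, reports and OLAP queries can filter or sort on it like any other column, which is the point the slide makes about feeding intelligence back into the warehouse.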

  9. SUCCESSFUL MINING • The right people, an integrated technological environment, good tools, and sound business commitment. • To be successful and profitable, it must be a collaboration driven by the business, developed by mining analysts, and supported by IT. • Good quality data • The right tools: IT and analysts work together to determine which tools work best within the technical architecture.

  10. THE ANALYTIC ROADMAP [Diagram] • CUSTOMER, MARKETING, SALES: Channel Analysis; Sales Forecast; Loyalty; Buying Prop; Target Marketing; Cross-sell Strategy; Best Campaign; Mkt Basket Analysis; Churn Prop; Satisfact.; Rep Profiling; Best Practices; Profitab.; Lifetime Value; Campaign Effectiv.; Life Cycle; Sequence; Partner Profiling; Bundling • FINANCIAL, PRODUCT, EQUIPMENT: Profitab.; Retention; Supply/Demand; Price Point Analysis; Inventory Analysis; Shipper Profiling; Loss; Satisfac.; Bundling; New Product Projections; Timeline Optimiz.; Shipment Analysis; Forecasting; Lifetime Value; Product Optimization; Lifecycle Analysis; Warehouse Optimization; Maintenance Forecast

  11. DATA MINING SYSTEMS • Four generations of Data Mining Systems • First – Vector-valued data • Second – Databases & data warehouses • Third – Intranets and extranets • Fourth – Mobile & embedded computing devices Source: http://www.lac.uic.edu/~grossman/papers/esj-98.htm

  12. CRISP-DM MODEL • CRoss Industry Standard Process for Data Mining • Non-proprietary, documented, and freely available data mining model • Provides a “complete blueprint for conducting a data mining project” • Conceived by four leaders of the data mining market: Daimler-Benz, Integral Solutions, NCR, and OHRA

  13. CRISP-DM MODEL • Data Mining process organized into six phases • Business understanding • Data understanding • Data preparation • Modeling • Evaluation • Deployment
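
The six phases on slide 13 can be written down as an ordered enumeration. This is only a sketch of the nominal order: the `Phase`, `next_phase`, and `walk` names are made up for illustration, and in practice CRISP-DM projects move back and forth between phases rather than following a strict straight line.

```python
from enum import IntEnum
from typing import List, Optional


class Phase(IntEnum):
    """The six CRISP-DM phases in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that nominally follows `current`, or None after deployment."""
    if current is Phase.DEPLOYMENT:
        return None
    return Phase(current + 1)


def walk(start: Phase = Phase.BUSINESS_UNDERSTANDING) -> List[str]:
    """List phase names from `start` through deployment."""
    order = []
    phase: Optional[Phase] = start
    while phase is not None:
        order.append(phase.name)
        phase = next_phase(phase)
    return order


if __name__ == "__main__":
    for name in walk():
        print(name)
```

Starting `walk` from a later phase, for example `walk(Phase.DATA_PREPARATION)`, gives the remaining steps after a project loops back from evaluation.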

  14. CRISP-DM REFERENCE MODEL

  15. [Architecture diagram] • Client Platform (Windows NT 4.0, Windows 2000): Teradata Warehouse Miner Graphical User Interface; Teradata Warehouse Miner Interfaces; TeraMiner™ Stats COM Interface; ActiveX™ Private Interface; 3rd party / NCR CRM applications; Analytic Algorithm EXE Server; Matrix Builder EXE Server; Scoring & Evaluation EXE Server; Visualization EXE Server; Metadata Services; Teradata ODBC Driver • Teradata Platform (MP-RAS, Windows NT 4.0, Windows 2000): Teradata OLAP and Data Mining Assists; Teradata RDBMS Version 2 Release 3.1 or later; Teradata Data Dictionary; Teradata Source Data; Analytic Metadata Source: Teradata product documentation

  16. RESOURCES • Data Mining for Enterprise Solutions, Lelia Morrill, NCR Corporation, 2001 • The CRISP-DM Model: The New Blueprint for Data Mining, Colin Shearer, Journal of Data Warehousing, Vol. 5 No. 4, Fall 2000 (Abstract) • Data Mining (DATM), http://www.stt.nl/stt2_intl/projects/datm/datm.htm • Data Rich, Information Poor, http://www.eco.utexas.edu/~norman/BUS.FOR/course.mat/Alex/ • There's Gold in that Mountain of Data, Dan R. Greening, http://www.newarchitectmag.com/archives/2000/01/greening/ • Supporting the Data Mining Process with Next Generation Data Mining Systems, Robert Grossman, http://www.lac.uic.edu/~grossman/papers/esj-98.htm
