Introduction to SQL Server Data Mining. Nick Ward SQL Server & BI Product Specialist Microsoft Australia. Agenda. What is Data Mining? Why use Data Mining? Data Mining Tasks Data Mining Process SQL Server 2005 Data Mining Demonstration SQL Server 2005 Data Mining Discussion.
Introduction to SQL Server Data Mining • Nick Ward, SQL Server & BI Product Specialist, Microsoft Australia
Agenda • What is Data Mining? • Why use Data Mining? • Data Mining Tasks • Data Mining Process • SQL Server 2005 Data Mining Demonstration • SQL Server 2005 Data Mining Discussion
What is not Data Mining? • Ad-Hoc Query • Event Notifications • Multidimensional Analysis/Slice and Dice • Statistics • OLAP • Canned or ad-hoc reports
What is Data Mining? • “Data mining is the semi-automatic extraction of patterns, changes, associations, anomalies, and other statistically significant structures from large data sets.” R. Grossman • Also known as • Machine Learning • Predictive Analytics
Why Data Mining? [Figure labels: Disk, Processor, Time]
Types of Analysis • Query-Reporting-Analysis • “What happened?” • Simple Reports • Key Performance Indicators • OLAP Cubes – Slice/Dice • Real-Time - “What is happening?” • Events/Triggers • Data Mining • “What will happen?” • “How/why did this happen?”
Data Mining Tasks • Explores Your Data • Finds Patterns • Performs Predictions
Data Mining Tasks [Diagram: Training Data → DM Engine → Mining Model; Data To Predict + Mining Model → DM Engine → Predicted Data]
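The train-then-predict flow in this diagram can be sketched in a few lines of Python. This is only a toy stand-in for the DM Engine, not how SQL Server builds models: the "mining model" here is a lookup of the most frequent outcome per input value, and the column names (`age_band`, `card`) and sample rows are invented for illustration.

```python
from collections import Counter, defaultdict

def train(training_rows, input_col, target_col):
    """Build a toy 'mining model': for each input value, the most frequent target value."""
    counts = defaultdict(Counter)
    for row in training_rows:
        counts[row[input_col]][row[target_col]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}

def predict(model, rows_to_predict, input_col, default=None):
    """Apply the model to new rows; input values never seen in training get `default`."""
    return [model.get(row[input_col], default) for row in rows_to_predict]

# Hypothetical training data (not from the presentation).
training_data = [
    {"age_band": "young", "card": "Bronze"},
    {"age_band": "young", "card": "Bronze"},
    {"age_band": "young", "card": "Gold"},
    {"age_band": "senior", "card": "Gold"},
]

model = train(training_data, "age_band", "card")                      # Training Data -> Mining Model
predicted = predict(model, [{"age_band": "young"}, {"age_band": "senior"}], "age_band")
print(predicted)                                                      # Data To Predict -> Predicted Data
```

The point of the split is that the model is built once from historical data and can then be applied repeatedly to new cases.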
Customer Examples • ComputerFleet (Australia): Predict when hired equipment will be returned • Sanford Securities (Australia): Data mining automation • Clait Health Services: Identify patients likely to suffer deteriorating health for pro-active treatment • AIM Healthcare: Identify billing errors, duplicate payments etc. to minimize costs
Data Mining Tasks • Classification • Estimation • Segmentation • Association • Forecasting • Text Analysis
Data Mining Tasks: Classification • What type of membership card should I offer? • Which customers will respond to my mailing? • Is this transaction fraudulent? • Will I lose this customer? • Will this product be defective? • Why is my system failing? • Which patients' health will degrade?
Data Mining Tasks: Estimation • How much revenue will I get from this customer? • How long will this asset be in service? • What is the mean time to failure? • What is the particle density of this fluid?
Data Mining Tasks: Segmentation • Describe my customers • How can I differentiate my customers? • How can I organize my data in a manner that makes sense? • Is this record an outlier?
Data Mining Tasks: Association • What items are bought together? • Which services are used together? • What products should I recommend to my customers?
Data Mining Tasks: Forecasting • What are projected revenues for all products? • What are inventory levels next month?
Data Mining Tasks: Text Analysis • Analysis of unstructured data • Finds key terms and phrases in text • Conversion to structured data • Feed into other algorithms (Classification, Segmentation, Association) • How do I handle call center data? • How can I classify mail? • What can I do with web feedback?
Data Mining Process: CRISP-DM • Business Understanding • Data Understanding • Data Preparation • Modeling • Evaluation • Deployment [Diagram labels: “Doing Data Mining”, “Putting Data Mining to Work”, Data] www.crisp-dm.org
Value of Data Mining [Chart: Relative Business Value against Usability (Easy to Difficult) for Reports (Static), Reports (Adhoc), OLAP, and Data Mining; other labels: Business Knowledge, SQL Server 2005]
Data Mining User Interface • SQL Server BI Development Studio • Creation and exploration environment • Data Mining projects inside Visual Studio solutions with related projects • Source Control Integration • SQL Server Management Studio • Single place for management of all SQL Server technologies • Manage, Browse, and Query Data Mining Models
Data Mining Algorithms • Classification • Estimation • Segmentation • Association • Forecasting • Text Analysis
Data Mining Algorithms: Classification • Decision Trees • Neural Nets • Naïve Bayes • Logistic Regression
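Of the algorithms listed above, Naïve Bayes is the simplest to sketch for a classification question such as "Is this transaction fraudulent?". The following is a minimal textbook version for categorical attributes with add-one smoothing, written as an illustration only; it is not the SQL Server implementation, and the attribute names and rows are invented.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, target):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)   # (attribute, class) -> Counter of values
    values_seen = defaultdict(set)        # attribute -> distinct values in training data
    for row in rows:
        for attr, value in row.items():
            if attr != target:
                value_counts[(attr, row[target])][value] += 1
                values_seen[attr].add(value)
    return class_counts, value_counts, values_seen

def classify(model, case):
    """Return the class with the highest smoothed log-probability for `case`."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)                      # class prior
        for attr, value in case.items():
            k = len(values_seen[attr])                   # number of distinct values
            # add-one (Laplace) smoothing avoids zero probabilities
            score += math.log((value_counts[(attr, label)][value] + 1) / (n + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical transactions (not from the presentation).
rows = [
    {"amount": "high", "country": "foreign", "fraud": "yes"},
    {"amount": "high", "country": "foreign", "fraud": "yes"},
    {"amount": "low", "country": "home", "fraud": "no"},
    {"amount": "low", "country": "home", "fraud": "no"},
    {"amount": "high", "country": "home", "fraud": "no"},
]
nb_model = train_naive_bayes(rows, "fraud")
print(classify(nb_model, {"amount": "high", "country": "foreign"}))
print(classify(nb_model, {"amount": "low", "country": "home"}))
```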
Data Mining Algorithms: Estimation • Decision Trees • Neural Nets • Logistic Regression • Linear Regression
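Estimation predicts a number rather than a category. As a rough illustration of the Linear Regression entry above, here is an ordinary least-squares fit with a single input, in plain Python. The data (years as a customer against yearly revenue) is made up, and the product's own algorithm may differ in detail.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: years as a customer vs. yearly revenue.
years = [1, 2, 3, 4]
revenue = [110, 210, 310, 410]

slope, intercept = fit_line(years, revenue)
estimate_year_5 = slope * 5 + intercept   # estimated revenue for a five-year customer
print(slope, intercept, estimate_year_5)
```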
Data Mining Algorithms: Segmentation • Clustering • Sequence Clustering
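Clustering groups similar records without being told the groups in advance. One common technique is k-means, sketched below under simplifying assumptions: the first k points are used as starting centroids (real implementations usually choose them more carefully), the iteration count is fixed, and the sample points are invented. The presentation does not say which clustering method the product uses.

```python
def kmeans(points, k, iterations=10):
    """Basic k-means on tuples of numbers; returns (centroids, clusters)."""
    centroids = list(points[:k])  # assumption: deterministic start from the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customer measurements, e.g. (visits per month, items per visit).
points = [(1, 1), (1, 2), (9, 9), (10, 8), (2, 1), (9, 10)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

A record that sits far from every centroid is a candidate outlier, which is one way segmentation can help answer "Is this record an outlier?".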
Data Mining Algorithms: Association • Association Rules • Decision Trees
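Association rules are usually scored by support (how often the items appear together) and confidence (how often the right-hand item appears when the left-hand item does). The sketch below computes both for item pairs only, which is a simplification of full association-rule mining; the baskets and the `min_support` threshold are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def pair_rules(baskets, min_support=0.5):
    """Return {(lhs, rhs): (support, confidence)} for item pairs meeting min_support."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules[(a, b)] = (support, count / item_counts[a])  # a -> b
            rules[(b, a)] = (support, count / item_counts[b])  # b -> a
    return rules

# Hypothetical shopping baskets.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["bread", "milk"],
    ["butter"],
]
rules = pair_rules(baskets)
for (lhs, rhs), (support, confidence) in sorted(rules.items()):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

High-confidence rules such as "milk -> bread" in this toy data are the kind of result that feeds a "What products should I recommend?" feature.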
Data Mining Algorithms: Forecasting • Time Series
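To give a feel for time-series forecasting, here is simple exponential smoothing, a basic baseline method chosen only because it fits in a few lines. It is not the algorithm SQL Server uses, and the monthly figures and the `alpha` value are assumptions.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """One-step-ahead forecast: a running weighted average that favours recent values."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

# Hypothetical monthly inventory levels.
inventory = [100, 120, 110, 130]
next_month = exponential_smoothing_forecast(inventory, alpha=0.5)
print(next_month)
```

A larger `alpha` makes the forecast react faster to recent changes; a smaller one smooths out noise.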
Data Mining Algorithms: Text Analysis • Integration Services • Term Extraction Transform • Term Lookup Transform
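The idea behind term extraction, turning free text into structured term counts that other algorithms can consume, can be approximated as follows. This is a deliberately naive word-frequency sketch with a hand-picked stop-word list, not a description of how the Integration Services transforms work; the sample call-center notes are invented.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked stop-word list, for illustration only.
STOPWORDS = {"the", "is", "my", "will", "not", "on", "again"}

def extract_terms(documents, top_n=2):
    """Count non-stop-word terms across documents and return the most frequent ones."""
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(top_n)

# Hypothetical call-center notes.
notes = [
    "The printer is broken again",
    "My printer will not print",
    "Broken screen on the laptop",
]
terms = extract_terms(notes)
print(terms)
```

The resulting (term, count) pairs are structured data, so they could be fed into classification, segmentation, or association models.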
Data Mining Programmability • DMX Query Interface • OLEDB, ADO, ADO.Net, ADOMD.Net, XMLA
Dim cmd As ADOMD.Command
Dim reader As ADOMD.DataReader
cmd.Connection = conn
Set reader = cmd.ExecuteReader("Select Predict(Gender)…")
• Data Mining Object Model • Analysis Management Objects (AMO) • ADOMD.Net, Server ADOMD.Net • Direct access to Mining content • CLR User Defined Procedures execute on the server • Expandability • Plug-In Algorithms • Plug-In Viewers
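The slide's query is elided, so the following is a separate, hypothetical example of what a complete DMX singleton prediction query can look like, assembled as a string in Python. The model name `CustomerModel` and the input columns `Age` and `Car` are invented; only `Predict` and `NATURAL PREDICTION JOIN` are DMX constructs. The helper handles numbers and strings only and does not connect to a server; the resulting text would be passed to a client library such as ADOMD.Net.

```python
def dmx_literal(value):
    """Render a Python number or string as a DMX literal (strings are single-quoted)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)

def build_singleton_query(model, target, inputs):
    """Compose a DMX singleton prediction query for one case described by `inputs`."""
    columns = ", ".join(f"{dmx_literal(v)} AS [{name}]" for name, v in inputs.items())
    return (
        f"SELECT Predict([{target}]) FROM [{model}] "
        f"NATURAL PREDICTION JOIN (SELECT {columns}) AS t"
    )

# Hypothetical model and input columns (not from the presentation).
query = build_singleton_query("CustomerModel", "Gender", {"Age": 35, "Car": "Sedan"})
print(query)
```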
Session Summary • Data Mining is the automatic extraction of information from data for descriptive or predictive purposes • Data Mining addresses a wide variety of problems • SQL Server 2005 contains a full-featured set of data mining tools and APIs for the creation and deployment of data mining solutions.
Next Steps • SQL Server website: http://www.microsoft.com/sql • Virtual labs • Data Mining Tutorial • Find more info at: http://www.sqldatamining.com • Ask Questions: news:microsoft.public.sqlserver.datamining