
Big Data Analytics Module 4 – Data Mining and Predictive Analytics Including Mahout

Big Data Analytics Module 4 – Data Mining and Predictive Analytics Including Mahout. Saptak Sen, Microsoft; Bill Ramos, Advaiya. Agenda: overview of predictive analytics and data mining; how Microsoft supports predictive analytics; how Mahout fits into the picture; demos.


Presentation Transcript


  1. Big Data Analytics Module 4 – Data Mining and Predictive Analytics Including Mahout • Saptak Sen, Microsoft • Bill Ramos, Advaiya

  2. Agenda • Overview of predictive analytics & data mining • How Microsoft supports predictive analytics • How Mahout fits into the picture • Demos

  3. Data Mining

  4. Predicting future performance from historical data • Predictive analytics should address the likelihood of something happening in the future, even if it is just an instant later* • Use cases: legal discovery and document archiving; IT infrastructure and web app optimization; social network analysis; recommendation engines; churn analysis; weather forecasting for business planning; location-based tracking and services; personalized insurance; advertising analysis; fraud detection; equipment monitoring; pricing analysis • *Source: Ventana Research, Predictive Analytics Benchmark Research Report, March 2012.

  5. Data mining tool in SQL Server Analysis Services • Rich data mining algorithms, for clustering, classification, forecasting through time series analysis, and more • Rich developer experience

  6. Analysis Services Data Mining Algorithms • Classify: Decision Trees, Logistic Regression, Naïve Bayes, Neural Networks • Estimate: Decision Trees, Linear Regression, Logistic Regression, Neural Networks • Cluster: Clustering • Forecast: Time Series • Associate: Association Rules, Decision Trees
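
To make the "Associate" task on slide 6 concrete, here is a minimal sketch of the support and confidence measures that association rules are built on. It is plain Python for illustration only, not the Analysis Services implementation; the function name `pair_rules`, the `min_support` threshold, and the sample baskets are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return {(antecedent, consequent): (support, confidence)} for item pairs.

    support    = fraction of baskets containing both items
    confidence = fraction of baskets with the antecedent that also hold the consequent
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix the pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # drop pairs that are too rare to be interesting
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules


# Hypothetical market-basket data.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk"],
]
rules = pair_rules(baskets)
for (lhs, rhs), (support, confidence) in sorted(rules.items()):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every basket with butter also contains bread, so the rule butter -> bread has confidence 1.0, while the reverse rule is weaker because bread also appears without butter.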

  7. Data mining add-in for Excel • Ease of use through Excel • Rich data mining algorithms for clustering, prediction, forecasting, market basket analysis, and more • Scalable through integration with SSAS

  8. Algorithms: Data Mining Add-in for Excel

  9. Demo 1: Excel Data Mining Add-In • [Diagram: batch layer, speed layer, serving layer; labels: Flat files (.txt, .dat, .xlsx, etc.), Windows Azure HDInsight, Excel Data Mining Add-in, Microsoft Excel (Mining Add-in)]

  10. Mahout

  11. Mahout Applications • Scalable machine learning algorithms on the Hadoop platform • Algorithms for clustering, classification, and batch-based collaborative filtering using the map/reduce paradigm • Supports a wide range of use cases, from email spam filtering, to fraud detection, to recommendations for books or movies • Examples: regression, genetic, dimension reduction, matrices, pattern mining, classification, collocations, vector similarity, recommenders, clustering
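
The idea of batch collaborative filtering in the map/reduce style mentioned on slide 11 can be sketched in a few lines. This is a single-process Python illustration, not Mahout code or its API: Mahout runs such jobs distributed on Hadoop and offers richer similarity measures than the raw co-occurrence counts assumed here. The function names and the sample purchase history are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def map_cooccurrence(user_items):
    """Map step: emit ((item_a, item_b), 1) for every ordered pair in one user's history."""
    for a, b in permutations(sorted(set(user_items)), 2):
        yield (a, b), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each (item_a, item_b) key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return totals


def recommend(history, user, top_n=2):
    """Score unseen items by how often they co-occur with the user's items."""
    emitted = (kv for items in history.values() for kv in map_cooccurrence(items))
    cooccurrence = reduce_counts(emitted)
    seen = set(history[user])
    scores = defaultdict(int)
    for (a, b), count in cooccurrence.items():
        if a in seen and b not in seen:
            scores[b] += count
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase history: user -> items.
history = {
    "u1": ["book_a", "book_b"],
    "u2": ["book_a", "book_b", "book_c"],
    "u3": ["book_b", "book_c"],
    "u4": ["book_a", "book_d"],
}
print(recommend(history, "u1"))
```

The map step works on one user's history at a time and the reduce step only sums counts per key, which is what lets the same computation be spread across many machines.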

  12. Demo 2: Mahout • [Diagram: batch layer, speed layer, serving layer; labels: Flat files (.txt, .dat, .xlsx, etc.), Convert to Mahout input, HDInsight Consoles, Windows Azure HDInsight, Running Mahout job on Hadoop, Hadoop Command Window, Command Window to get output file, Output file]
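
The "Convert to Mahout input" step in Demo 2 is not spelled out in the transcript. As a rough sketch, assuming the job is a Mahout recommender that reads comma-separated `userID,itemID,preference` lines with numeric IDs, and assuming a hypothetical tab-separated source file of (user, item, rating) rows, the conversion might look like this in Python. The function name `to_mahout_csv` and the sample data are illustrative, not taken from the demo.

```python
import csv
import io


def to_mahout_csv(src, dst):
    """Rewrite tab-separated (user, item, rating) rows as userID,itemID,preference lines.

    Names are mapped to integer IDs in order of first appearance. The two
    mappings are returned so recommender output can be translated back to names.
    """
    user_ids, item_ids = {}, {}
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        user, item, rating = row
        uid = user_ids.setdefault(user, len(user_ids) + 1)
        iid = item_ids.setdefault(item, len(item_ids) + 1)
        writer.writerow([uid, iid, float(rating)])
    return user_ids, item_ids


# Hypothetical flat-file content; a real run would pass open file handles instead.
raw = "alice\tbook_a\t5\nbob\tbook_a\t3\nalice\tbook_b\t4\n"
out = io.StringIO()
user_ids, item_ids = to_mahout_csv(io.StringIO(raw), out)
print(out.getvalue(), end="")
```

Keeping the name-to-ID dictionaries matters because the job's output file refers to items by numeric ID only.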

  13. Learn more • Data Mining (SSAS): http://msdn.microsoft.com/en-us/library/bb510516.aspx • Microsoft SQL Server 2012 SP1 Data Mining Add-ins for Microsoft Office 2013: http://www.microsoft.com/en-us/download/details.aspx?id=35578 • Mahout on Windows Azure - Machine Learning Using Microsoft HDInsight: http://social.technet.microsoft.com/wiki/contents/articles/15102.mahout-on-windows-azure-machine-learning-using-microsoft-hdinsight.aspx

  14. Questions?
