
Apache MADlib AI/ML

This presentation gives an overview of the Apache MADlib AI/ML project. It explains Apache MADlib in terms of its functionality, architecture and dependencies, and also gives an SQL example.

Links for further information and connecting:

http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
https://nz.linkedin.com/pub/mike-frampton/20/630/385
https://open-source-systems.blogspot.com/

semtechs

Presentation Transcript


  1. What Is Apache MADlib? ● For scalable in-database analytics ● Open source Apache 2.0 license ● For machine learning in SQL ● At big data scale ● Offers graph, statistics, analytics, deep learning ● Provides data-parallel implementations ● For structured and unstructured data

  2. MADlib Prerequisites ● Currently supports these databases – PostgreSQL (needs the Python extension specified) – Greenplum (distributed db) – Apache HAWQ (v1.12+) (distributed db) ● Requires the GNU M4 Unix macro processor ● Works with Python 2.6 and 2.7

  3. MADlib Architecture

  4. MADlib Architecture ● MADlib has three main layers ● Python driver functions – Main entry point from user input – Largely responsible for algorithm flow control – Validating input parameters – Executing SQL statements – Evaluating the results – Potentially looping to execute more SQL statements ● Until a convergence criterion has been met
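
The driver flow on this slide (validate, execute SQL, evaluate, loop until convergence) can be sketched in a few lines. This is only an illustration, not MADlib code: it uses Python's built-in `sqlite3` in place of PostgreSQL, and the function `fit_mean_by_iteration`, the table `points` and the column `x` are hypothetical names chosen for the example. The "model" is deliberately trivial: a single number fitted to a column by gradient steps.

```python
import sqlite3


def fit_mean_by_iteration(conn, table, column, step=0.5,
                          tolerance=1e-6, max_iter=100):
    """Hypothetical driver: fits one number to a column via repeated SQL calls."""
    # Validate input parameters before touching the database.
    if not table.isidentifier() or not column.isidentifier():
        raise ValueError("table and column must be plain identifiers")
    if max_iter <= 0:
        raise ValueError("max_iter must be positive")

    estimate = 0.0
    for iteration in range(1, max_iter + 1):
        # Execute one SQL statement per iteration: the database aggregates
        # the average error of the current estimate over all rows.
        (gradient,) = conn.execute(
            f"SELECT AVG(? - {column}) FROM {table}", (estimate,)
        ).fetchone()
        if gradient is None:
            raise ValueError("table is empty")

        # Evaluate the result and update the estimate.
        estimate -= step * gradient

        # Stop looping once the convergence criterion is met.
        if abs(gradient) < tolerance:
            break
    return estimate, iteration


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x REAL)")
conn.executemany("INSERT INTO points VALUES (?)",
                 [(1.0,), (2.0,), (3.0,), (4.0,)])
estimate, iterations = fit_mean_by_iteration(conn, "points", "x")
print(estimate, iterations)
```

The heavy work (the aggregate over all rows) stays in the database, while Python only decides whether another pass is needed. That division of labour is the point of the sketch.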

  5. MADlib Architecture ● MADlib has three main layers ● C++ implementation functions – C++ definitions of the core functions/aggregates ● Needed for particular algorithms – Implemented in C++ rather than Python ● For performance reasons

  6. MADlib Architecture ● MADlib has three main layers ● C++ database abstraction layer – Provides a programming interface – Abstracts all the PostgreSQL internal details – Provides support for different back end platforms – Focuses on the internal functionality ● Rather than the platform integration logic

  7. MADlib Data Types and Transformations ● Arrays and Matrices ● Encoding Categorical Variables ● Path ● Pivot ● Sessionize ● Stemming
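
As an illustration of what "Sessionize" means in the list above, here is a minimal Python sketch of the general idea: events from the same user are grouped into numbered sessions, and a new session starts when the gap between consecutive events is too long. The function name, its arguments and the "gap greater than `max_gap`" rule are assumptions made for this example. MADlib's own sessionize function is called from SQL, and its parameters are not shown in the slides.

```python
def sessionize(events, max_gap):
    """Assign a per-user session number to (user_id, timestamp) events.

    Hypothetical helper: a new session starts when the time since the
    user's previous event is greater than max_gap.
    """
    last_seen = {}      # user_id -> timestamp of that user's previous event
    session_ids = {}    # user_id -> current session number for that user
    result = []
    # Sorting orders events by user, then by timestamp within each user.
    for user, timestamp in sorted(events):
        if user not in last_seen or timestamp - last_seen[user] > max_gap:
            session_ids[user] = session_ids.get(user, 0) + 1
        last_seen[user] = timestamp
        result.append((user, timestamp, session_ids[user]))
    return result


events = [("a", 0), ("a", 10), ("b", 5), ("a", 100)]
for row in sessionize(events, max_gap=30):
    print(row)
```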

  8. MADlib Graph Functionality ● All Pairs Shortest Path ● Breadth-First Search ● HITS ● Measures ● PageRank ● Single Source Shortest Path ● Weakly Connected Components
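
To make one of the graph functions above concrete, the following is a small sketch of the textbook PageRank power iteration in plain Python. It is not MADlib's implementation, which runs inside the database. The damping factor of 0.85, the tolerance, and the choice to spread the rank of nodes without outgoing links evenly over all nodes are assumptions of this example.

```python
def pagerank(edges, damping=0.85, tolerance=1e-9, max_iter=100):
    """Textbook PageRank by power iteration over (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    count = len(nodes)
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1.0 / count for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing links is shared by all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        # Each node passes its rank in equal parts along its outgoing links.
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tolerance:
            break
    return rank


ranks = pagerank([(1, 3), (2, 3), (3, 1)])
print(ranks)
```

In this toy graph node 3 receives links from both other nodes and ends up with the highest rank, while node 2, which nothing links to, keeps only the baseline share.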

  9. MADlib Model Selection / Sampling ● Model Selection – Cross Validation – Prediction Metrics – Train-Test Split ● Sampling – Balanced Sampling – Stratified Sampling
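
The train-test split and stratified sampling items above can be combined in one short sketch. It is written in plain Python for illustration only. The function `stratified_split` and its parameters are hypothetical and do not mirror MADlib's SQL interface. The idea shown is that each class contributes the same fraction of its rows to the test set, so class proportions are preserved.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, test_fraction=0.25, seed=0):
    """Split rows into (train, test), sampling each label group separately."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    train, test = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        # Take the same fraction from every label group.
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


rows = [(i, "a") for i in range(8)] + [(i, "b") for i in range(8, 12)]
train, test = stratified_split(rows, label_of=lambda row: row[1])
print(len(train), len(test))
```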

  10. MADlib Statistics / Supervised Learning ● Statistics – Descriptive Statistics – Inferential Statistics – Probability Functions ● Supervised Learning – Conditional Random Field – k-Nearest Neighbors – Neural Network – Regression Models – Support Vector Machines – Tree Methods

  11. MADlib Time Series / Unsupervised Learning ● Time Series Analysis – ARIMA ● Unsupervised Learning – Association Rules – Clustering – Dimensionality Reduction – Topic Modelling

  12. MADlib Utilities ● Columns to Vector ● Database Functions ● Linear Solvers ● Mini-Batch Preprocessor ● PMML Export ● Term Frequency ● Vector to Columns

  13. MADlib Deep Learning Example SQL ● First define the model configurations to train ● Meaning either model architectures or hyperparameters ● Load them into a model selection table ● The combination of model architectures and hyperparameters ● Constitutes the model configurations to train ● In the picture there are three model configurations ● Represented by the three different purple shapes
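
The step described on this slide, combining model architectures with hyperparameters and loading the combinations into a model selection table, can be sketched as follows. This is an illustration under stated assumptions, not the MADlib API: it uses Python's built-in `sqlite3` instead of PostgreSQL or Greenplum, and the table name `model_selection`, its columns, and the architecture and hyperparameter strings are all hypothetical.

```python
import itertools
import sqlite3


def load_model_selection(conn, architectures, hyperparameter_sets):
    """Store every architecture/hyperparameter pair as one configuration row."""
    conn.execute(
        "CREATE TABLE model_selection ("
        "config_id INTEGER PRIMARY KEY, "
        "architecture TEXT, "
        "hyperparameters TEXT)"
    )
    # The cross product of the two lists gives the configurations to train.
    combinations = itertools.product(architectures, hyperparameter_sets)
    conn.executemany(
        "INSERT INTO model_selection (architecture, hyperparameters) "
        "VALUES (?, ?)",
        combinations,
    )
    return conn.execute(
        "SELECT config_id, architecture, hyperparameters "
        "FROM model_selection ORDER BY config_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
rows = load_model_selection(
    conn,
    architectures=["mlp_small"],
    hyperparameter_sets=["lr=0.1", "lr=0.01", "lr=0.001"],
)
for row in rows:
    print(row)
```

One architecture with three hyperparameter settings yields three rows, matching the three model configurations mentioned on the slide.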

  14. MADlib Deep Learning Example SQL

  15. MADlib Deep Learning Example SQL ● Once we have model combinations ● In the model selection table ● Call the fit function to train the models – In parallel. ● In the picture the three orange shapes ● Represent the three models that have been trained
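
Training several configurations in parallel, as this slide describes, can be pictured with a toy example. The sketch below is not how MADlib schedules work inside the database. It uses a Python thread pool as a stand-in, a one-parameter model `y = weight * x` in place of a neural network, and made-up data. The names `train`, `fit_all` and `DATA` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy training data following y = 2 * x (assumed for the example).
DATA = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def train(config):
    """Fit y = weight * x by gradient descent for one configuration."""
    config_id, learning_rate, epochs = config
    weight = 0.0
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to weight.
        gradient = sum(2 * (weight * x - y) * x for x, y in DATA) / len(DATA)
        weight -= learning_rate * gradient
    loss = sum((weight * x - y) ** 2 for x, y in DATA) / len(DATA)
    return config_id, weight, loss


def fit_all(configs, workers=3):
    """Train every configuration concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(train, configs))


configs = [(1, 0.01, 50), (2, 0.05, 50), (3, 0.1, 50)]
results = fit_all(configs)
best = min(results, key=lambda result: result[2])
print(best)
```

Each configuration produces one trained model, so three configurations give three results, which can then be compared by loss to pick the best one.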

  16. MADlib Deep Learning Example SQL

  17. Available Books ● See “Big Data Made Easy” – Apress Jan 2015 ● See “Mastering Apache Spark” – Packt Oct 2015 ● See “Complete Guide to Open Source Big Data Stack” – Apress Jan 2018 ● Find the author on Amazon – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/ ● Connect on LinkedIn – www.linkedin.com/in/mike-frampton-38563020

  18. Connect ● Feel free to connect on LinkedIn – www.linkedin.com/in/mike-frampton-38563020 ● See my open source blog at – open-source-systems.blogspot.com/ ● I am always interested in – New technology – Opportunities – Technology-based issues – Big data integration
