
Introducing Apache Mahout

Scalable Machine Learning for All!

Grant Ingersoll

Lucid Imagination


Overview

What is Machine Learning?

Mahout


Definition

“Machine Learning is programming computers to optimize a performance criterion using example data or past experience”

Introduction to Machine Learning by E. Alpaydin

Subset of Artificial Intelligence

Draws on many other fields: computer science, biology, math, psychology, etc.


Types

Supervised

Using labeled training data, create a function that predicts the output for unseen inputs

Unsupervised

Using unlabeled data, create a function that describes hidden structure (e.g., groupings) in the data

Semi-Supervised

Uses labeled and unlabeled data


Characterizations

Lots of Data

Identifiable Features in that Data

Too big/costly for people to handle

People can still help


Clustering

Unsupervised

Find Natural Groupings

Documents

Search Results

People

Genetic traits in groups

Many, many more uses
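To make "find natural groupings" concrete, here is a minimal single-machine k-means sketch in Python. It only illustrates the idea; it is not Mahout's (Java, Hadoop-based) API, and the function name `kmeans` plus the choice to seed centroids with the first k points are assumptions made for brevity (real implementations usually seed randomly or from a cheaper pre-clustering).

```python
import math

def kmeans(points, k, iterations=10):
    """Plain, single-machine k-means; initial centroids are simply the first k points (hypothetical simplification)."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, clusters
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` separates the three points near the origin from the three near (10, 10).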


Example: Clustering

Google News


Collaborative Filtering

Unsupervised

Recommend people and products

User-User

A user similar to you likes X, so you might too

Item-Item

People who bought X also bought Y
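The item-item idea ("people who bought X also bought Y") can be sketched by counting how often pairs of items appear in the same purchase history. This is a hypothetical Python illustration, not Taste's API; `cooccurrence`, `also_bought`, and the alphabetical tie-break are assumptions.

```python
from collections import Counter
from itertools import permutations

def cooccurrence(baskets):
    """Count, for every ordered pair of items (x, y), how many baskets contain both."""
    counts = Counter()
    for basket in baskets:
        for x, y in permutations(set(basket), 2):
            counts[(x, y)] += 1
    return counts

def also_bought(item, baskets, top_n=3):
    """Items most often bought together with `item`, most frequent first."""
    counts = cooccurrence(baskets)
    related = [(y, n) for (x, y), n in counts.items() if x == item]
    related.sort(key=lambda pair: (-pair[1], pair[0]))  # ties broken alphabetically
    return [y for y, _ in related[:top_n]]
```

With baskets `[["book", "lamp"], ["book", "lamp", "pen"], ["book", "pen"], ["book", "lamp"]]`, people who bought "book" most often also bought "lamp", then "pen".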



Classification/Categorization

Many, many types

Spam Filtering

Named Entity Recognition

Phrase Identification

Sentiment Analysis

Classification into a Taxonomy
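Spam filtering is a classic classification task. A minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing is shown below, assuming whitespace tokenization and a tiny hypothetical training set. It is a conceptual Python illustration, not Mahout's Bayes classifier.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (label, text). Returns per-label document counts, word counts, and the vocabulary."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in docs:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab

def classify(text, model):
    """Return the label with the highest log-posterior score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)  # log prior
        for w in text.lower().split():
            # Add-one smoothing so a word unseen for this label does not zero out the score.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Trained on a few "spam" and "ham" messages, `classify("free money", model)` picks whichever label makes those words most probable.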


Example: NER

NER?

Excerpt from Yahoo News



Information Retrieval

Learning Ranking Functions

Learning Spelling Corrections

User Click Analysis and Tracking


Other

Image Analysis

Robotics

Games

Higher-level natural language processing

Many, many others


What is Apache Mahout?

A Mahout is an elephant trainer/driver/keeper, hence…

Hadoop (and other distributed techniques)

+

Machine Learning

=

Mahout


What?

Hadoop brings:

Map/Reduce API

HDFS

In other words, scalability and fault-tolerance

Mahout brings:

Library of machine learning algorithms

Examples
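The Map/Reduce model that Hadoop brings can be illustrated in a single process: a map step emits key/value pairs, the framework groups values by key (the shuffle), and a reduce step combines each group. The sketch below is plain Python word counting under that assumption; it does not use Hadoop's API, and all function names are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word, 1) for word in line.lower().split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in one process; Hadoop would spread these over a cluster."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because each mapper call and each reducer call is independent, a framework can run them on different machines and rerun any that fail, which is where the scalability and fault-tolerance come from.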


Why Mahout?

Many Open Source ML libraries either:

Lack Community

Lack Documentation and Examples

Lack Scalability

Lack the Apache License ;-)

Or are research-oriented


Why Mahout?

Intelligent Apps are the Present and Future

Thus, Mahout’s Goal is:

Scalable Machine Learning with Apache License


Current Status

What’s in it:

Simple Matrix/Vector library

Taste Collaborative Filtering

Clustering

Canopy/K-Means/Fuzzy K-Means/Mean-shift/Dirichlet

Classifiers

Naïve Bayes

Complementary NB

Evolutionary

Integration with Watchmaker for fitness function evaluation
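Canopy clustering, the first item in the clustering list, is usually described as a cheap pre-clustering with two distance thresholds, a loose T1 and a tight T2 (T1 > T2): take a point from the set as a canopy center, add every remaining point closer than T1 to that canopy, and remove every point closer than T2 from further consideration. The Python sketch below follows that description; it is not Mahout's implementation, and taking the first remaining point (rather than a random one) plus the name `canopies` are assumptions.

```python
import math

def canopies(points, t1, t2):
    """Canopy clustering with loose threshold t1 and tight threshold t2 (t1 > t2).
    Returns a list of (center, members); a point may land in more than one canopy."""
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    remaining = list(points)
    result = []
    while remaining:
        center = remaining.pop(0)  # hypothetical choice: first remaining point, not a random one
        # Loosely close points join this canopy.
        members = [center] + [p for p in remaining if math.dist(p, center) < t1]
        result.append((center, members))
        # Tightly close points can no longer become canopy centers.
        remaining = [p for p in remaining if math.dist(p, center) >= t2]
    return result
```

Points between T2 and T1 from a center stay in the pool, so they can also join later canopies; that overlap is what makes canopies a fast, rough first pass before a more precise algorithm such as k-means.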


How?

Examples

Taste

Clustering

Classification

Evolutionary


Taste: Movie Recommendations

Given users' ratings of movies, recommend other movies

http://lucene.apache.org/mahout/taste.html#demo
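A user-based recommender of the kind the demo runs can be sketched as: measure how similar each other user is to the target user, then predict ratings for unseen movies as a similarity-weighted average of those users' ratings. This Python sketch is not Taste's API; the names, the data layout (a dict of per-user rating dicts), and the use of cosine similarity over co-rated movies are assumptions (other measures, such as Pearson correlation, are also common).

```python
import math

def similarity(a, b):
    """Cosine similarity between two users' rating dicts, over movies both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(user, ratings, top_n=2):
    """Rank unseen movies by a similarity-weighted average of other users' ratings."""
    scores, weights = {}, {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = similarity(ratings[user], their_ratings)
        if sim <= 0:
            continue  # ignore users with no positive similarity
        for movie, rating in their_ratings.items():
            if movie not in ratings[user]:
                scores[movie] = scores.get(movie, 0.0) + sim * rating
                weights[movie] = weights.get(movie, 0.0) + sim
    predicted = {m: scores[m] / weights[m] for m in scores}
    return sorted(predicted, key=lambda m: (-predicted[m], m))[:top_n]
```

If alice and bob rate movies A and B identically and bob also loved C, C ranks first among alice's recommendations.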


Taste Demo

http://localhost:8080/mahout-taste-webapp/RecommenderServlet?userID=12&debug=true

http://localhost:8080/mahout-taste-webapp/RecommenderServlet?userID=43&debug=true


Clustering: Synthetic Control Data

http://archive.ics.uci.edu/ml/datasets/Synthetic+Control+Chart+Time+Series

Each clustering implementation has an example Job that can be run from <MAHOUT_HOME>/examples

o.a.mahout.clustering.syntheticcontrol.*

Outputs clusters…


Classification: NB and CNB Examples

20 Newsgroups

http://cwiki.apache.org/confluence/display/MAHOUT/TwentyNewsgroups

Wikipedia

http://cwiki.apache.org/confluence/display/MAHOUT/WikipediaBayesExample


Evolutionary

Traveling Salesman

http://cwiki.apache.org/confluence/display/MAHOUT/Traveling+Salesman

Class Discovery

http://cwiki.apache.org/confluence/display/MAHOUT/Class+Discovery
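The core of an evolutionary approach to the Traveling Salesman problem is a fitness function (here, total tour length) plus a way to generate candidate solutions. The Python sketch below uses a simple (1+1) evolutionary loop with swap mutation: mutate the current tour and keep the child if it is no worse. It is an assumption-laden illustration, not the Mahout/Watchmaker example, which uses its own genetic operators; all names are hypothetical.

```python
import math
import random

def tour_length(tour, cities):
    """Fitness for TSP: total length of the closed tour (lower is better)."""
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def mutate(tour, rng):
    """Swap two randomly chosen cities to produce a child tour."""
    child = list(tour)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def evolve(cities, generations=2000, seed=1):
    """(1+1) evolutionary loop: keep the child only if it is no worse than the parent."""
    rng = random.Random(seed)
    best = list(range(len(cities)))
    rng.shuffle(best)  # random starting tour
    for _ in range(generations):
        child = mutate(best, rng)
        if tour_length(child, cities) <= tour_length(best, cities):
            best = child
    return best
```

For four cities on the corners of a unit square, the loop settles on a perimeter tour of length 4. Evaluating the fitness function is the expensive, independent part of each generation, which is why it is the natural piece to distribute.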


What’s Next?

More Examples

Winnow/Perceptron (MAHOUT-85)

Text Clustering

Association Rules (MAHOUT-108)

Logistic Regression

Solr Integration (SOLR-769)

GSoC (Google Summer of Code)


When, Who

When? Now!

Mahout is growing

Who? You!

We want programmers who:

Are comfortable with math

Like to work on hard problems

We want others to:

Kick the tires


Where?

  • http://lucene.apache.org/mahout

    • Hadoop - http://hadoop.apache.org

  • http://cwiki.apache.org/MAHOUT

  • mahout-{user|dev}@lucene.apache.org

    • http://www.lucidimagination.com/search/p:mahout


Resources

“Programming Collective Intelligence” by Segaran

“Data Mining: Practical Machine Learning Tools and Techniques” by Witten and Frank

“Taming Text” by Ingersoll and Morton

