Introducing Apache Mahout

Scalable Machine Learning for All!

Grant Ingersoll

Lucid Imagination


Overview

What is Machine Learning?

Mahout


Definition

“Machine Learning is programming computers to optimize a performance criterion using example data or past experience”

Introduction to Machine Learning by E. Alpaydin

Subset of Artificial Intelligence

Many other fields: comp sci., biology, math, psychology, etc.


Types

Supervised

Using labeled training data, create a function that predicts the output for unseen inputs

Unsupervised

Using unlabeled data, create a function that predicts output, e.g. the natural grouping an input belongs to

Semi-Supervised

Uses labeled and unlabeled data


Characterizations

Lots of Data

Identifiable Features in that Data

Too big/costly for people to handle

People still can help


Clustering

Unsupervised

Find Natural Groupings

Documents

Search Results

People

Genetic traits in groups

Many, many more uses
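
As an illustration of "find natural groupings", here is a minimal k-means sketch in plain Python. It is not Mahout's implementation and does not run on Hadoop; the function name `kmeans`, its parameters, and the sample points are assumptions made up for this example.

```python
import random

def nearest(point, centroids):
    """Index of the centroid with the smallest squared distance to point."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
    )

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Hypothetical data: two visibly separate groups of 2-D points.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (7.9, 8.3), (8.2, 7.9)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
```

The number of clusters `k` has to be chosen up front, which is one reason a deck like this lists several clustering algorithms rather than one.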


Example: Clustering

Google News


Collaborative Filtering

Unsupervised

Recommend people and products

User-User

User likes X, you might too

Item-Item

People who bought X also bought Y
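
The item-item idea ("people who bought X also bought Y") can be sketched by counting how often two items share a purchase. This is a toy illustration, not the Taste API; `cooccurrence`, `also_bought`, and the baskets are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

def cooccurrence(baskets):
    """Count, for every item, how often each other item shares a basket with it."""
    counts = defaultdict(Counter)
    for basket in baskets:
        for x, y in permutations(set(basket), 2):
            counts[x][y] += 1
    return counts

def also_bought(counts, item, n=2):
    """Return up to n items most often bought together with item."""
    return [other for other, _ in counts[item].most_common(n)]

# Hypothetical purchase histories, one list per customer.
baskets = [
    ["tent", "stove", "lantern"],
    ["tent", "stove"],
    ["tent", "stove", "map"],
    ["tent", "lantern"],
    ["map", "compass"],
]
counts = cooccurrence(baskets)
print(also_bought(counts, "tent"))  # ['stove', 'lantern']
```

Raw counts favor popular items; real recommenders usually normalize them with a similarity measure.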


Example: Collaborative Filtering

Amazon.com


Classification/Categorization

Many, many types

Spam Filtering

Named Entity Recognition

Phrase Identification

Sentiment Analysis

Classification into a Taxonomy
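
Spam filtering is a common first example of classification. The sketch below is a small multinomial naive Bayes classifier with add-one smoothing, written in plain Python. It is not Mahout's classifier code; the class name `NaiveBayes`, its methods, and the training messages are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens, with add-one smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocabulary = set()

    def train(self, text, label):
        words = text.lower().split()
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocabulary.update(words)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Score is log P(label) plus the sum of log P(word | label).
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical labeled training data.
nb = NaiveBayes()
nb.train("win cash prize now", "spam")
nb.train("cheap prize click now", "spam")
nb.train("meeting agenda for monday", "ham")
nb.train("lunch on monday", "ham")
print(nb.classify("claim your cash prize"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied.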


Example: NER

NER?

Excerpt from Yahoo News


Example: Categorization


Info. Retrieval

Learning Ranking Functions

Learning Spelling Corrections

User Click Analysis and Tracking


Other

Image Analysis

Robotics

Games

Higher level natural language processing

Many, many others


What is Apache Mahout?

A Mahout is an elephant trainer/driver/keeper, hence…

Hadoop (and other distributed techniques) + Machine Learning = Mahout


What?

Hadoop brings:

Map/Reduce API

HDFS

In other words, scalability and fault-tolerance

Mahout brings:

Library of machine learning algorithms

Examples
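
To make the Map/Reduce model concrete, here is an in-memory sketch of its three stages: map, group by key, reduce. It only mimics the programming model in a single Python process and is not the Hadoop API; every function name and the input lines are hypothetical.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every record and yield the emitted (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}

def word_mapper(line):
    # Emit (word, 1) for every word in the line.
    for word in line.lower().split():
        yield word, 1

def sum_reducer(key, values):
    return sum(values)

# Hypothetical input; in Hadoop each line could live on a different node.
lines = ["Mahout rides Hadoop", "Hadoop stores data", "Mahout learns from data"]
counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), sum_reducer)
print(counts["hadoop"], counts["learns"])  # 2 1
```

Because each mapper call sees one record and each reducer call sees one key, the work can be split across machines, which is where the scalability comes from.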


Why Mahout?

Many Open Source ML libraries either:

Lack Community

Lack Documentation and Examples

Lack Scalability

Lack the Apache License ;-)

Or are research-oriented


Why Mahout?

Intelligent Apps are the Present and Future

Thus, Mahout’s Goal is:

Scalable Machine Learning with Apache License


Current Status

What’s in it:

Simple Matrix/Vector library

Taste Collaborative Filtering

Clustering

Canopy/K-Means/Fuzzy K-Means/Mean-shift/Dirichlet

Classifiers

Naïve Bayes

Complementary NB

Evolutionary

Integration with Watchmaker for fitness function


How?

Examples

Taste

Clustering

Classification

Evolutionary


Taste: Movie Recommendations

Given ratings by users of movies, recommend other movies

http://lucene.apache.org/mahout/taste.html#demo
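
The task "given ratings by users of movies, recommend other movies" can be sketched as a user-based recommender: find users with similar ratings, then rank unseen movies by a similarity-weighted average. This is a plain-Python illustration and not the Taste API; `cosine`, `recommend`, the user names, and the ratings are all made up.

```python
import math

def cosine(a, b):
    """Cosine similarity between two users' rating dictionaries."""
    dot = sum(a[m] * b[m] for m in set(a) & set(b))
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(ratings, user, n=3):
    """Rank movies the user has not rated by similarity-weighted average rating."""
    weighted, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for movie, rating in other_ratings.items():
            if movie not in ratings[user]:
                weighted[movie] = weighted.get(movie, 0.0) + sim * rating
                weights[movie] = weights.get(movie, 0.0) + sim
    scores = {movie: weighted[movie] / weights[movie] for movie in weighted}
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"Alien": 5, "Blade Runner": 5, "Amelie": 1},
    "bob": {"Alien": 5, "Blade Runner": 4, "Brazil": 5, "Notting Hill": 1},
    "cat": {"Amelie": 5, "Notting Hill": 5, "Alien": 1, "Brazil": 2},
}
print(recommend(ratings, "ann"))  # ['Brazil', 'Notting Hill']
```

Here "ann" rates like "bob", so bob's high rating for "Brazil" dominates the ranking.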


Taste Demo

http://localhost:8080/mahout-taste-webapp/RecommenderServlet?userID=12&debug=true

http://localhost:8080/mahout-taste-webapp/RecommenderServlet?userID=43&debug=true


Clustering: Synthetic Control Data

http://archive.ics.uci.edu/ml/datasets/Synthetic+Control+Chart+Time+Series

Each clustering impl. has an example Job for running in <MAHOUT_HOME>/examples

org.apache.mahout.clustering.syntheticcontrol.*

Outputs clusters…


Classification: NB and CNB Examples

20 Newsgroups

http://cwiki.apache.org/confluence/display/MAHOUT/TwentyNewsgroups

Wikipedia

http://cwiki.apache.org/confluence/display/MAHOUT/WikipediaBayesExample


Evolutionary

Traveling Salesman

http://cwiki.apache.org/confluence/display/MAHOUT/Traveling+Salesman

Class Discovery

http://cwiki.apache.org/confluence/display/MAHOUT/Class+Discovery
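
For readers new to evolutionary methods, here is a deliberately small sketch applied to the traveling salesman problem. It uses mutation and elitist selection only (no crossover), runs in one process, and does not use Watchmaker or Mahout; `tour_length`, `mutate`, `evolve`, and the city coordinates are assumptions for this example.

```python
import math
import random

def tour_length(tour, cities):
    """Total length of the closed tour that visits cities in the given order."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def mutate(tour, rng):
    """Return a copy of the tour with two randomly chosen positions swapped."""
    i, j = rng.sample(range(len(tour)), 2)
    child = list(tour)
    child[i], child[j] = child[j], child[i]
    return child

def evolve(cities, population_size=30, generations=200, seed=0):
    """Evolve a population of tours and return the shortest one found."""
    rng = random.Random(seed)
    n = len(cities)

    def fitness(tour):
        return tour_length(tour, cities)  # lower is fitter

    population = [rng.sample(range(n), n) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the fitter half unchanged (elitism).
        population.sort(key=fitness)
        survivors = population[: population_size // 2]
        # Variation: refill the population with mutated copies of survivors.
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        population = survivors + children
    return min(population, key=fitness)

# Hypothetical cities on the border of a 2x2 square; the shortest tour has length 8.
cities = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
best = evolve(cities)
print(best, tour_length(best, cities))
```

The fitness function is the part that has to be evaluated for every candidate in every generation, so it is the natural piece to distribute.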


What’s Next?

More Examples

Winnow/Perceptron (MAHOUT-85)

Text Clustering

Association Rules (MAHOUT-108)

Logistic Regression

Solr Integration (SOLR-769)

Google Summer of Code (GSoC)


When, Who

When? Now!

Mahout is growing

Who? You!

We want programmers who:

Are comfortable with math

Like to work on hard problems

We want others to:

Kick the tires


Where?

  • http://lucene.apache.org/mahout

    • Hadoop - http://hadoop.apache.org

  • http://cwiki.apache.org/MAHOUT

  • [email protected]

    • http://www.lucidimagination.com/search/p:mahout


Resources

“Programming Collective Intelligence” by Segaran

“Data Mining - Practical Machine Learning Tools and Techniques” by Witten and Frank

“Taming Text” by Ingersoll and Morton