
An introduction to Apache Mahout

An introduction to Apache Mahout: what is it and how does it work? What is machine intelligence? How can Mahout be installed and tested on Hadoop?

semtechs

Presentation Transcript


  1. Apache Mahout
     • What is it?
     • How does it work?
     • Machine Learning
     • Algorithms
     • Install
     www.semtech-solutions.co.nz info@semtech-solutions.co.nz

  2. Mahout – What is it?
     • Machine learning
     • For large data
     • Based on Hadoop
     • But can also work on a non-Hadoop cluster
     • Scalable
     • Released under the Apache License

  3. Mahout – How does it work?
     • Uses Hadoop MapReduce
     • Has many supplied algorithms
     • Supports four use cases
         • Recommendation mining
         • Clustering
         • Classification
         • Frequent itemset mining
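To make the MapReduce idea on this slide concrete, here is a minimal single-machine sketch of frequent itemset counting written as a map phase and a reduce phase. It is plain Python, not Mahout or Hadoop code; the function names, the baskets, and the `min_support` threshold are all assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction):
    """Emit (itemset, 1) for every 1-item and 2-item subset of a transaction."""
    items = sorted(set(transaction))
    for size in (1, 2):
        for itemset in combinations(items, size):
            yield itemset, 1


def reduce_phase(pairs):
    """Sum the counts emitted for each itemset key."""
    counts = defaultdict(int)
    for itemset, count in pairs:
        counts[itemset] += count
    return dict(counts)


def frequent_itemsets(transactions, min_support):
    """Keep only the itemsets that occur in at least min_support transactions."""
    emitted = (pair for t in transactions for pair in map_phase(t))
    counts = reduce_phase(emitted)
    return {k: v for k, v in counts.items() if v >= min_support}


# Hypothetical shopping baskets, invented for this example.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "eggs"],
]
result = frequent_itemsets(baskets, min_support=3)
print(result)
```

On a real cluster the map calls would run in parallel on separate slices of the data and the framework would group the emitted keys before the reduce step; here both phases simply run in one process.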

  4. Mahout – Machine Learning
     Machine learning – what does it mean?
     • A branch of artificial intelligence
     • Systems that learn from data
     • Classify data after learning
     • Learn from training data sets
     • Generalisation: the ability to classify unseen data sets after learning
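The learn-then-generalise cycle described on this slide can be sketched with a very small nearest-centroid classifier. This is an illustrative toy, not one of Mahout's classifiers; the data points and the labels "small" and "large" are invented for the example.

```python
def train(samples):
    """Learn one centroid (mean point) per label from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(model, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical data: the model only ever sees `training`.
training = [
    ([1.0, 1.0], "small"),
    ([1.5, 2.0], "small"),
    ([8.0, 8.0], "large"),
    ([9.0, 7.5], "large"),
]
# Held-out points used to check generalisation to unseen data.
unseen = [([2.0, 1.0], "small"), ([7.0, 9.0], "large")]

model = train(training)
accuracy = sum(classify(model, f) == label for f, label in unseen) / len(unseen)
print(model, accuracy)
```

Keeping the evaluation points out of the training set is what makes the accuracy figure a measure of generalisation rather than of memorisation.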

  5. Mahout – Algorithms
     Some of the available algorithms (among many others)
     • Collaborative filtering
         • Narrow sense: make predictions about user interests by collecting preferences
         • General: multi-agent collaboration for information filtering
     • Mean shift clustering
         • Mode seeking, used for visual tracking
     • Parallel frequent pattern mining
         • Find unique features
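As an illustration of collaborative filtering in the narrow sense, the sketch below predicts a missing rating from the ratings of similar users. It assumes a user-based approach with cosine similarity, which is only one of several possible choices and is not taken from Mahout's own recommender code; the users, items, and ratings are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    numerator = denominator = 0.0
    for other, prefs in ratings.items():
        if other == user or item not in prefs:
            continue
        weight = cosine(ratings[user], prefs)
        numerator += weight * prefs[item]
        denominator += weight
    return numerator / denominator if denominator else None


# Hypothetical preference data (ratings are positive, on a 1-5 scale).
ratings = {
    "ann": {"A": 5, "B": 3},
    "bob": {"A": 5, "B": 3, "C": 4},
    "cat": {"A": 1, "B": 5, "C": 2},
}
predicted = predict(ratings, "ann", "C")
print(predicted)
```

Because ann's tastes match bob's more closely than cat's, the prediction for item C lands between the two neighbours' ratings and nearer to bob's.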

  6. Mahout – Install
     So how do we install Mahout and test it?
     • Install Maven
         • sudo apt-get install maven3
     • Install Apache Mahout
         • You will need Subversion installed
         • svn co http://svn.apache.org/repos/asf/mahout/trunk
         • Go to the directory containing the pom.xml file
         • mvn install ## in ./trunk
     Full details available in the Mahout install guide on our web site shop

  7. Mahout – Test Install
     So let us run a test
     • cd $MAHOUT_HOME/examples/bin
     • ./build-reuters.sh
     • Choose option 1, kmeans clustering
     • Should finish with the output shown on the next slide

  8. Mahout – Test Install
     cd $MAHOUT_HOME/examples/bin ; ./build-reuters.sh
     Please call cluster-reuters.sh directly next time. This file is going away.
     Please select a number to choose the corresponding clustering algorithm
     1. kmeans clustering
     2. fuzzykmeans clustering
     3. lda clustering
     Enter your choice : 1
     ok. You chose 1 and we'll use kmeans Clustering
     .................................
     Inter-Cluster Density: NaN
     Intra-Cluster Density: 0.0
     CDbw Inter-Cluster Density: NaN
     CDbw Intra-Cluster Density: NaN
     CDbw Separation: NaN
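For readers unfamiliar with the kmeans option chosen in this test run, the following is a minimal sketch of the general k-means idea (repeatedly assign points to the nearest centroid, then move each centroid to the mean of its points). It is not Mahout's implementation and does not reproduce the Reuters example; the points, the value of `k`, and the naive choice of the first `k` points as starting centroids are assumptions for illustration.

```python
def kmeans(points, k, iterations=10):
    """Cluster points into k groups with a fixed number of refinement passes."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, clusters


# Hypothetical 2-D points forming two obvious groups.
points = [(1, 1), (9, 9), (1, 2), (8, 9), (2, 1), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

Real implementations usually pick starting centroids more carefully and stop when the assignments no longer change, rather than running a fixed number of passes.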

  9. Contact Us
     • Feel free to contact us at
         • www.semtech-solutions.co.nz
         • info@semtech-solutions.co.nz
     • We offer IT project consultancy
     • We are happy to hear about your problems
     • You pay only for the hours you need to solve your problems
