K-means Clustering

  1. K-means Clustering. Group 15: Swathi Gurram, Prajakta Purohit

  2. Goal • To implement K-means on Twister (iterative MapReduce) and on Hadoop (MapReduce) and measure how the choice of framework affects running time.

  3. Survey • Twister • Configurable, long-running (cacheable) map/reduce tasks • Pub/sub messaging-based communication and data transfers • Efficient support for iterative MapReduce computations • Combine phase to collect all reduce outputs • Data access via local disks

  4. Survey • Hadoop: a software framework that supports data-intensive distributed applications • Uses the MapReduce programming model • Has its own file system, HDFS (Hadoop Distributed File System), modeled on the Google File System and tailored for large files • Manages the distribution of processing and data across the cluster, splitting large files into smaller, more manageable blocks for processing

  5. Survey • HaLoop: a modified version of the Hadoop MapReduce framework • Provides caching options for loop-invariant data access • Lets users reuse major building blocks from their applications' Hadoop implementations • Has intra-job fault-tolerance mechanisms similar to Hadoop's • On average, reduces query runtimes by a factor of 1.85 compared with Hadoop

  6. K-means Clustering

  7. K-means Clustering
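
K-means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of the points assigned to it, repeating until the centroids stop moving. A minimal sequential sketch in Python (not the group's code; the names `kmeans`, `nearest`, and `squared_distance`, and the choice to pass in fixed initial centroids, are assumptions for illustration):

```python
# Sketch only: sequential Lloyd-style K-means; function names are hypothetical.

def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, initial_centroids, max_iter=100, tol=1e-9):
    """Sequential K-means starting from the given initial centroids."""
    centroids = [list(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append([sum(coord) / len(members) for coord in zip(*members)])
            else:
                new_centroids.append(old)
        # Stop once no centroid moved by more than `tol` (squared distance).
        shift = max(squared_distance(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids
```

With the points `(0,0), (0,1), (1,0), (10,10), (10,11), (11,10)` and initial centroids `(0,0)` and `(10,10)`, the sketch converges after two passes to centroids at about `(0.333, 0.333)` and `(10.333, 10.333)`.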

  8. Twister K-means
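
Twister's features fit K-means well: the data points do not change between iterations and can stay cached in long-running map tasks, while only the small set of centroids changes and must be sent out each round. The in-process Python simulation below sketches that structure under stated assumptions. It does not use the Twister API (Twister itself is a Java framework), and `CachedMapTask`, `reduce_task`, and `twister_style_kmeans` are hypothetical names.

```python
# Sketch only: simulates Twister-style iterative MapReduce in one process.
# Names are hypothetical; this is not the Twister API.
from collections import defaultdict


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CachedMapTask:
    """Stand-in for a long-running, cacheable map task: its partition of
    points (static data) is loaded once and reused on every iteration."""

    def __init__(self, partition):
        self.partition = [tuple(p) for p in partition]

    def map(self, centroids):
        """Add each cached point to the partial sum of its nearest centroid.
        Returns {centroid_id: (vector_sum, count)}."""
        partial = {}
        for p in self.partition:
            j = min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
            total, count = partial.get(j, ([0.0] * len(p), 0))
            partial[j] = ([t + x for t, x in zip(total, p)], count + 1)
        return partial


def reduce_task(key, values):
    """Merge the partial sums for one centroid and return its new position."""
    total = [sum(dim) for dim in zip(*(s for s, _ in values))]
    count = sum(n for _, n in values)
    return key, [t / count for t in total]


def twister_style_kmeans(partitions, initial_centroids, max_iter=100, tol=1e-9):
    # Configure the map tasks once; they stay "alive" across iterations.
    tasks = [CachedMapTask(part) for part in partitions]
    centroids = [list(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Broadcast the dynamic data (current centroids) to every cached task.
        shuffled = defaultdict(list)
        for task in tasks:
            for key, value in task.map(centroids).items():
                shuffled[key].append(value)
        # Reduce each key, then "combine": collect all reduce outputs in the driver.
        combined = dict(reduce_task(k, v) for k, v in shuffled.items())
        new_centroids = [combined.get(j, centroids[j]) for j in range(len(centroids))]
        shift = max(squared_distance(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids
```

Each map task emits per-centroid partial sums rather than individual points, so the data shuffled to the reducers grows with the number of centroids, not the number of points.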

  9. Hadoop K-means
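
On plain Hadoop, each K-means iteration is typically run as a separate MapReduce job launched by a driver program: the mappers re-read the input from HDFS on every iteration, and the updated centroids must be written out and read back by the next job. The in-process simulation below sketches that pattern under stated assumptions. It does not use the Hadoop Java API, and the names `mapper`, `reducer`, `run_job`, and `hadoop_style_kmeans` are hypothetical.

```python
# Sketch only: simulates one-job-per-iteration K-means in one process.
# Names are hypothetical; this is not the Hadoop API.
from collections import defaultdict


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mapper(line, centroids):
    """Parse one text record "x,y,..." and emit (nearest centroid id, (point, 1))."""
    point = [float(v) for v in line.split(",")]
    j = min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))
    yield j, (point, 1)


def reducer(key, values):
    """Sum the (vector, count) pairs for one centroid and average them."""
    values = list(values)
    total = [sum(dim) for dim in zip(*(v for v, _ in values))]
    count = sum(n for _, n in values)
    return key, [t / count for t in total]


def run_job(input_text, centroids):
    """One job: the input is re-read and re-parsed, mapped, shuffled by key, reduced."""
    shuffled = defaultdict(list)
    for line in input_text.splitlines():
        if line.strip():
            for key, value in mapper(line, centroids):
                shuffled[key].append(value)
    return dict(reducer(k, v) for k, v in shuffled.items())


def hadoop_style_kmeans(input_text, initial_centroids, max_iter=100, tol=1e-9):
    """Driver: launches a new job per iteration and checks convergence between jobs."""
    centroids = [list(c) for c in initial_centroids]
    for _ in range(max_iter):
        output = run_job(input_text, centroids)
        new_centroids = [output.get(j, centroids[j]) for j in range(len(centroids))]
        shift = max(squared_distance(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids
```

Given the points as text lines (`"0,0"`, `"0,1"`, ...) and initial centroids `(0,0)` and `(10,10)`, it converges to the same centroids as a sequential run. The re-parsing inside `run_job` stands in for the per-iteration job startup and input re-reading that Twister's cached, long-running map tasks are designed to avoid.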

  10. Implementation Timeline

  11. Validation methods

  12. Conclusion • The Twister framework is faster than Hadoop for iterative MapReduce applications such as K-means.

  13. References • http://salsahpc.indiana.edu • http://www.iterativemapreduce.org/samples.html • http://hadoop.apache.org/ • http://en.wikipedia.org/wiki/Apache_Hadoop • http://clue.cs.washington.edu/node/14 • http://code.google.com/p/haloop/ • http://www.cs.washington.edu/homes/billhowe/pubs/HaLoop.pdf

  14. Demo

  15. Thank you
