K-means Clustering Group 15 Swathi Gurram Prajakta Purohit
Goal • To program K-means on Twister (iterative MapReduce) and Hadoop (MapReduce) and see how the change of framework affects the implementation time.
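As a rough illustration of how one K-means iteration fits the MapReduce model, the Python sketch below assigns each point to its nearest centroid in a map step and averages each group in a reduce step. The function names (`kmeans_map`, `kmeans_reduce`, `kmeans`) and the in-memory driver loop are assumptions made for illustration; this is neither the Twister nor the Hadoop API.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def kmeans_map(point, centroids):
    """Map step: emit (index of nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def kmeans_reduce(points):
    """Reduce step: average all points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(points, centroids, max_iters=100, tol=1e-9):
    """Driver loop (hypothetical): repeat map + reduce until centroids stop moving."""
    for _ in range(max_iters):
        groups = defaultdict(list)
        for p in points:
            key, value = kmeans_map(p, centroids)
            groups[key].append(value)
        # A centroid with no assigned points keeps its old position.
        new_centroids = [
            kmeans_reduce(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))
        ]
        shift = max(dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids
```

Each pass of the driver loop corresponds to one MapReduce job, which is why the cost of starting a job and re-reading the input matters for an iterative algorithm such as K-means.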
Survey • Twister • Configurable long-running (cacheable) map/reduce tasks • Pub/sub messaging-based communication/data transfers • Efficient support for iterative MapReduce computation • Combine phase to collect all reduce outputs • Data access via local disks
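The map, reduce, and combine flow listed above can be sketched in plain Python for a one-dimensional K-means. Everything here is hypothetical (`CachedMapTask`, `reduce_centroid`, `combine`, `run_iterations`) and only mimics the structure: map tasks are created once and keep their static partition, only the small centroid list is sent each iteration, and a combine step gathers all reduce outputs for the driver. Pub/sub messaging is not modeled, and none of these names come from the Twister API.

```python
class CachedMapTask:
    """Hypothetical long-running map task that keeps its static partition in memory."""

    def __init__(self, partition):
        self.partition = list(partition)  # static data, held across iterations

    def run(self, centroids):
        """Return {centroid index: (partial sum, partial count)} for this partition."""
        partial = {}
        for x in self.partition:
            k = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            s, n = partial.get(k, (0.0, 0))
            partial[k] = (s + x, n + 1)
        return partial


def reduce_centroid(partials):
    """Reduce step: merge partial sums and counts for one centroid into a mean."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


def combine(reduce_outputs, old_centroids):
    """Combine step: collect every reduce output into one centroid list."""
    return [reduce_outputs.get(k, old_centroids[k]) for k in range(len(old_centroids))]


def run_iterations(partitions, centroids, iterations):
    tasks = [CachedMapTask(p) for p in partitions]  # created once, reused every iteration
    for _ in range(iterations):
        map_outputs = [t.run(centroids) for t in tasks]  # only centroids are passed in
        reduce_outputs = {}
        for k in range(len(centroids)):
            partials = [out[k] for out in map_outputs if k in out]
            if partials:
                reduce_outputs[k] = reduce_centroid(partials)
        centroids = combine(reduce_outputs, centroids)
    return centroids
```

The design point is that the large, unchanging input stays with the tasks, while the data that changes between iterations (the centroids) is small.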
Survey • Hadoop: a software framework that supports data-intensive distributed applications • Uses the MapReduce programming model • Has its own file system (HDFS, the Hadoop Distributed File System, based on the Google File System), which is specifically tailored for dealing with large files • Manages the distribution of processing and files, breaking files down into more manageable chunks for processing
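The idea of breaking a large file into chunks and spreading them across nodes can be sketched as follows. This is a toy model under stated assumptions: `split_into_blocks` and `assign_blocks` are hypothetical helpers, and the round-robin placement is a simplification, since real HDFS placement also accounts for replication and rack topology.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a byte string into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def assign_blocks(num_blocks: int, nodes):
    """Round-robin placement of block indices onto nodes (an assumption, not HDFS policy)."""
    placement = {node: [] for node in nodes}
    for b in range(num_blocks):
        placement[nodes[b % len(nodes)]].append(b)
    return placement


blocks = split_into_blocks(b"abcdefghij", 4)
placement = assign_blocks(len(blocks), ["node1", "node2"])
```

Each map task can then be scheduled on a node that already holds the block it processes, which is what lets the framework move computation to the data.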
Survey • HaLoop: a modified version of the Hadoop MapReduce framework • Provides caching options for loop-invariant data access • Lets users reuse major building blocks from their applications' Hadoop implementations • Has intra-job fault-tolerance mechanisms similar to Hadoop's • HaLoop reduces query runtimes by a factor of 1.85 compared with Hadoop
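To make the benefit of caching loop-invariant data concrete, the sketch below counts how often the input is loaded over several iterations with and without a cache. `run_job` and its `load_points` callback are hypothetical names used only for this illustration; the code does not reflect HaLoop's actual caching interface.

```python
def run_job(iterations, load_points, use_cache):
    """Run a trivial iterative job and count how often the invariant input is loaded."""
    cache = None
    loads = 0
    centroid = 0.0
    for _ in range(iterations):
        if use_cache and cache is not None:
            points = cache           # reuse the loop-invariant data
        else:
            points = load_points()   # stands in for re-reading input from disk
            loads += 1
            if use_cache:
                cache = points
        centroid = sum(points) / len(points)
    return centroid, loads


def sample_loader():
    """Hypothetical loader returning a small fixed data set."""
    return [1.0, 2.0, 3.0]


without_cache = run_job(5, sample_loader, use_cache=False)
with_cache = run_job(5, sample_loader, use_cache=True)
```

Both runs compute the same result; the cached run loads the input once instead of once per iteration, which is the saving that grows with the number of iterations.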
Conclusion • The Twister framework is faster than Hadoop for iterative MapReduce applications.