Explorations into Internet Distributed Computing

Kunal Agrawal, Ang Huey Ting, Li Guoliang, and Kevin Chu


Project Overview

Design and implement a simple internet distributed computing framework.

Compare application development for this environment with development for a traditional parallel computing environment.


Grapevine

An Internet Distributed Computing Framework

- Kunal Agrawal, Kevin Chu



Motivation

  • Supercomputers are very expensive

  • Large numbers of personal computers and workstations around the world are naturally networked via the internet

  • Huge amounts of computational resources are wasted because many computers spend most of their time idle

  • Growing interest in grid computing technologies



Internet Distributed Computing Issues

  • Node reliability

  • Network quality

  • Scalability

  • Security

  • Cross-platform portability of object code

  • Computing Paradigm Shift



[Diagram: a Client Application, a Server, and three Volunteers, each labeled Grapevine]


Grapevine Features

  • Written in Java

  • Parametrized Tasks

  • Inter-task communication

  • Result Reporting

  • Status Reporting


Unaddressed Issues

  • Node reliability

  • Load Balancing

  • Unintrusive Operation

  • Interruption Semantics

  • Deadlock


Meta Classifier

- Ang Huey Ting, Li Guoliang


Classifier

  • Function(instance) → {True, False}

  • Machine Learning Approach

    • Build a model on the training set

    • Use the model to classify new instances

  • Publicly available packages: WEKA (in Java), MLC++
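
The build-then-classify cycle above can be sketched in Java as follows. This is a minimal illustration, not WEKA or MLC++ code: the `BinaryClassifier` interface and the toy `MidpointLearner` are hypothetical names, and the learner only thresholds the first feature at the midpoint between the two class means.

```java
import java.util.List;

/** A binary classifier: maps an instance (here, a feature vector) to true or false. */
interface BinaryClassifier {
    boolean classify(double[] instance);
}

/** Hypothetical toy learner, used only to illustrate the train/classify split. */
final class MidpointLearner {
    /** Builds a model on the training set: a threshold on feature 0. */
    static BinaryClassifier train(List<double[]> instances, List<Boolean> labels) {
        double posSum = 0, negSum = 0;
        int pos = 0, neg = 0;
        for (int i = 0; i < instances.size(); i++) {
            if (labels.get(i)) {
                posSum += instances.get(i)[0];
                pos++;
            } else {
                negSum += instances.get(i)[0];
                neg++;
            }
        }
        if (pos == 0 || neg == 0) {
            throw new IllegalArgumentException("training set needs both classes");
        }
        double posMean = posSum / pos;
        double negMean = negSum / neg;
        double threshold = (posMean + negMean) / 2;
        boolean positiveAbove = posMean >= negMean;
        // The returned model classifies new instances by comparing to the threshold.
        return x -> (x[0] >= threshold) == positiveAbove;
    }
}
```

A real package replaces the toy learner with algorithms such as those listed later in the deck; the shape (train once, then classify many instances) stays the same.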


Meta Classifier

  • Assembly (ensemble) of classifiers

  • Usually gives better performance than a single classifier

  • Two ways of generating assembly of classifiers

    • Different training data sets

    • Different algorithms

  • Voting


Building Meta Classifier

  • Different Train Datasets - Bagging

    • Randomly generated ‘bags’

    • Selection with replacement

    • Create different ‘flavors’ of the training set

  • Different Algorithms

    • E.g. Naïve Bayesian, Neural Net, SVM

    • Different algorithms work well on different training sets
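
Selection with replacement can be sketched as below. The class and method names are hypothetical, and the sketch assumes each bag has the same size as the original training set, which the slides do not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class Bagging {
    /** Draws one bag by sampling with replacement (assumed size: same as the training set). */
    static <T> List<T> drawBag(List<T> trainingSet, Random rng) {
        List<T> bag = new ArrayList<>(trainingSet.size());
        for (int i = 0; i < trainingSet.size(); i++) {
            // The same example may be picked more than once; others may be left out.
            bag.add(trainingSet.get(rng.nextInt(trainingSet.size())));
        }
        return bag;
    }

    /** Draws several bags, each a different "flavor" of the training set. */
    static <T> List<List<T>> drawBags(List<T> trainingSet, int bagCount, long seed) {
        Random rng = new Random(seed);
        List<List<T>> bags = new ArrayList<>();
        for (int b = 0; b < bagCount; b++) {
            bags.add(drawBag(trainingSet, rng));
        }
        return bags;
    }
}
```

Fixing the seed makes the bags reproducible, which helps when comparing runs across machines.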


Why Parallelise?

  • Computationally intensive

    One classifier = 0.5 hr

    Meta classifier (assembly of 10 classifiers) = 10 × 0.5 hr = 5 hr when built sequentially

  • Distributed Environment - Grapevine

    • Build classifiers in parallel independently

    • Little communication required
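
The arithmetic above extends to a rough best-case estimate for a parallel build. The sketch below is an idealization with a hypothetical helper name: it assumes every classifier takes the same time to build and ignores communication, scheduling, and node failures.

```java
final class BuildTimeEstimate {
    /** Ideal wall-clock hours when independent builds are spread evenly over the nodes. */
    static double idealHours(int classifiers, int nodes, double hoursPerClassifier) {
        if (classifiers < 0 || nodes < 1) {
            throw new IllegalArgumentException("need classifiers >= 0 and nodes >= 1");
        }
        // Ceiling division: the busiest node builds this many classifiers in sequence.
        int rounds = (classifiers + nodes - 1) / nodes;
        return rounds * hoursPerClassifier;
    }
}
```

With the figures from the slide, `idealHours(10, 1, 0.5)` gives the sequential 5 hr, and `idealHours(10, 10, 0.5)` gives 0.5 hr when each of ten nodes builds one classifier.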


Distributed Meta Classifiers

  • WEKA: machine learning package

    • University of Waikato, New Zealand

    • http://www.cs.waikato.ac.nz/~ml/weka/

    • Implemented in Java

    • Includes most of the popular machine learning tools


Distributed Meta-Classifiers on Grapevine

Distributed Bagging

  • Generate different Bags

  • Define the bag and algorithm for each task

  • Submit tasks to Grapevine

  • Nodes build classifiers

  • Receive results

  • Perform voting
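
The steps above can be sketched end to end in plain Java. The Grapevine API is not shown in these slides, so this sketch uses a local thread pool (`ExecutorService`) as a stand-in for submitting tasks to volunteers. All names are hypothetical, a learner is modeled as a function from a bag to a classifier, and voting is assumed to be a simple majority with ties counted as false.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;
import java.util.function.Predicate;

final class DistributedBaggingSketch {
    /** Builds one classifier per learner, each on its own bag, in parallel. */
    static <T> List<Predicate<double[]>> buildEnsemble(
            List<T> train,
            List<Function<List<T>, Predicate<double[]>>> learners,
            int workers,
            long seed) {
        Random rng = new Random(seed);
        List<Callable<Predicate<double[]>>> tasks = new ArrayList<>();
        for (Function<List<T>, Predicate<double[]>> learner : learners) {
            // Generate a bag (sampling with replacement) and pair it with an algorithm.
            List<T> bag = new ArrayList<>();
            for (int i = 0; i < train.size(); i++) {
                bag.add(train.get(rng.nextInt(train.size())));
            }
            tasks.add(() -> learner.apply(bag));
        }
        // Stand-in for Grapevine: a local pool plays the role of the volunteers.
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Predicate<double[]>> ensemble = new ArrayList<>();
            // Submit all tasks, wait for the builds, and receive the results.
            for (Future<Predicate<double[]>> result : pool.invokeAll(tasks)) {
                ensemble.add(result.get());
            }
            return ensemble;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while building classifiers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a classifier build failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Majority vote over the ensemble; ties count as false (an assumption). */
    static boolean vote(List<Predicate<double[]>> ensemble, double[] instance) {
        int yes = 0;
        for (Predicate<double[]> classifier : ensemble) {
            if (classifier.test(instance)) {
                yes++;
            }
        }
        return 2 * yes > ensemble.size();
    }
}
```

Because each task carries its own bag and algorithm, the builds share no state, which matches the observation that little communication is required.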


Preliminary Study

  • Bagging on Quick Propagation in OpenMP

    • Implemented in C


Trial Domain

  • Benchmark corpus Reuters-21578 for Text Categorization

    • 9000+ train documents

    • 3000+ test documents

    • 90+ categories

    • Perform feature selection

    • Preprocess documents into feature vectors
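
One common way to turn a document into a feature vector is to count occurrences of each selected term. The slides do not say which representation was used, so the sketch below is only an assumed example: the class name, the tokenization rule, and the use of raw term counts are all illustrative choices, and the vocabulary is taken to be the output of feature selection.

```java
import java.util.List;
import java.util.Locale;

final class FeatureVectors {
    /** Counts how often each selected vocabulary term occurs in the document. */
    static double[] toVector(String document, List<String> vocabulary) {
        double[] vector = new double[vocabulary.size()];
        // Assumed tokenization: lowercase, split on anything that is not a letter or digit.
        for (String token : document.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            int index = vocabulary.indexOf(token);
            if (index >= 0) {
                vector[index]++;
            }
        }
        return vector;
    }
}
```

Terms outside the vocabulary are dropped, so every document maps to a vector of the same length regardless of its size.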


Summary

  • Successful internet distributed computing requires addressing many issues outside of traditional computer science

  • Distributed computing is not for everyone

