Liang Chen

Advisor: Gagan Agrawal

Computer Science & Engineering

A Grid-Based Middleware’s Support for Processing Distributed Data Streams
Introduction-Motivation
  • Data stream processing and analysis
    • Data stream: data arrive continuously and need to be processed in real-time
  • Data Stream Applications:
    • Online network Intrusion Detection
    • Sensor networks
    • Network Fault Management System for Telecommunication Network Elements
    • Computer Vision Based Surveillance
  • Common features of data streams
    • Continuous arrival
    • Enormous volume
    • Real-time constraints
    • Data sources could be distributed
Introduction-Motivation

[Figure: a Network Fault Management System analyzing alarm message streams from a switch network]

Introduction-Motivation

Computer Vision Based Surveillance

Introduction-Motivation
  • Challenges & possible solutions
    • Challenge 1: Data and/or computation intensive
    • Solution: Grid computing technologies
    • Challenge 2: Real-time analysis is required
    • Solution: Self-adaptation functionality is desired
Introduction-Motivation
  • From the point of view of developers of data stream applications:
    • They would like to concentrate on the applications themselves
    • They would rather not spend effort on
      • Grid computing
      • Adaptation functionality
Introduction-Our Approach
  • A middleware that is based on Grid standards and tools and provides self-adaptation functionality
  • The middleware is referred to as GATES (Grid-based AdapTive Execution on Streams)
    • Applications are automatically distributed to suitable computing nodes
    • Applications adapt automatically to a varying environment, without developers having to implement adaptation algorithms themselves
System Architecture and Design (From Application Perspective)
  • Break the task down into several sub-tasks that together form a pipeline
  • Implement each sub-task in Java
  • Write an XML configuration file so that the sub-tasks can be deployed automatically, i.e.,
    • specify how many stages (sub-tasks) the pipeline has
    • specify where the code implementing each sub-task resides
  • Launch the application by running a Java program (StreamClient.class) provided by GATES
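
The configuration step can be pictured with a small sketch. The XML layout used here (the element names `application` and `stage`, the attributes `name` and `code`, and the example URLs) is hypothetical; the slides do not show the actual GATES schema. The code only demonstrates reading the number of stages and the location of each stage's code with the JDK's DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class PipelineConfigSketch {

    // Hypothetical configuration file content; NOT the real GATES schema.
    static final String SAMPLE_XML =
          "<application name=\"surveillance\">"
        + "<stage name=\"sampling\" code=\"http://example.org/stages/SamplingStage.class\"/>"
        + "<stage name=\"analysis\" code=\"http://example.org/stages/AnalysisStage.class\"/>"
        + "</application>";

    /** Returns the code location of every stage, in pipeline (document) order. */
    static List<String> stageCodeLocations(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList stages = doc.getElementsByTagName("stage");
            List<String> locations = new ArrayList<>();
            for (int i = 0; i < stages.getLength(); i++) {
                Element stage = (Element) stages.item(i);
                locations.add(stage.getAttribute("code"));
            }
            return locations;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot read configuration", e);
        }
    }

    public static void main(String[] args) {
        List<String> locations = stageCodeLocations(SAMPLE_XML);
        System.out.println("number of stages: " + locations.size());
        locations.forEach(System.out::println);
    }
}
```

The number of stages falls out of the list length, so the two items the configuration must specify (stage count and code locations) are carried by one list of `stage` elements in this sketch.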
System Architecture and Design (Architecture)

[Figure: an application pipeline with Stage A, Stage B and Stage C, hosted by GATES Grid services A, B and C. Legend: buffers for applications; queues between Grid services; Grid services of the GATES; stages of an application.]
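
The pipeline structure in the figure (stages connected by queues) can be imitated in plain Java. This is only an illustration of the idea, not GATES code: the class `QueuePipelineSketch`, the `END` marker, and the arithmetic performed by each stage are all made up for the example, and ordinary threads with `LinkedBlockingQueue` stand in for Grid services and their queues.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.IntUnaryOperator;

final class QueuePipelineSketch {

    static final int END = Integer.MIN_VALUE; // marker that closes the stream

    /** Starts a stage thread that reads from in, applies op, and writes to out. */
    static Thread stage(BlockingQueue<Integer> in, BlockingQueue<Integer> out,
                        IntUnaryOperator op) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    int item = in.take();
                    if (item == END) {      // forward the marker and stop
                        out.put(END);
                        return;
                    }
                    out.put(op.applyAsInt(item));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    /** Sends input through stages A -> B -> C and collects what reaches the sink. */
    static List<Integer> run(int[] input) {
        BlockingQueue<Integer> source = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> aToB = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> bToC = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> sink = new LinkedBlockingQueue<>();
        stage(source, aToB, x -> x + 1); // Stage A
        stage(aToB, bToC, x -> x * 2);   // Stage B
        stage(bToC, sink, x -> x - 3);   // Stage C
        List<Integer> results = new ArrayList<>();
        try {
            for (int v : input) source.put(v);
            source.put(END);
            for (int r = sink.take(); r != END; r = sink.take()) results.add(r);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(new int[] {1, 2, 3})); // prints [1, 3, 5]
    }
}
```

Because every stage only touches its input and output queue, a slow stage shows up as a growing queue in front of it, which is the kind of load signal the self-adaptation discussion relies on.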

System Architecture and Design (Example)

public class SamplingStage implements StreamProcessing {

    void init() { … }

    void work(buffer in, buffer out) {
        while (true) {
            // Fetch the next image from the GATES input buffer
            Image img = getFromBufferInGATES(in);
            // Down-sample it with the current sampling ratio
            Image imgSample = Sampling(img, samplingRatio);
            // Hand the sampled image to the next stage
            putToBufferInGATES(imgSample, out);
        }
    }
}

The stage exposes its adaptation parameter to GATES through two calls:

GATES.informationAboutAdjustmentParameter(min, max, 1);  // describe the adjustment parameter to GATES
samplingRatio = GATES.getSuggestedParameter();            // read the value suggested by GATES

Self-adaptation Algorithm
  • Given the long-term factor of the queue at each stage, we want to improve how the value of an adaptation parameter is adjusted
    • Should the adaptation parameter be modified, and if so, in which direction?
    • How should the new value of the adaptation parameter be determined?
Enhanced Self-adaptation Algorithm
  • Should the adaptation parameter be modified, and if so, in which direction?
    • The answer depends on the load status of the queues at two consecutive stages
Enhanced Self-adaptation Algorithm

[Figure: queue load states at stages A, B and C and the performance parameter, grouped into convergent states and non-convergent states]

Enhanced Self-adaptation Algorithm
  • How to determine the new value for the adaptation parameter
    • Linear update: increase or decrease by a fixed value
      • Hard to find a proper fixed value
    • Previous method
    • Binary tree search
Enhanced Self-adaptation Algorithm

[Figure: binary search update. The current value lies between a left border and a right border; the new value is placed between the current value and the right border.]
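
The binary search update can be sketched as follows. The slides do not give the exact rule, so this assumes the usual bisection reading of the figure: when the parameter must grow, the current value becomes the new left border and the new value is the midpoint between it and the right border; when it must shrink, the mirror image applies. The class and method names are hypothetical.

```java
final class BinarySearchUpdateSketch {

    private double left;    // left border of the search interval
    private double right;   // right border of the search interval
    private double current; // current value of the adaptation parameter

    BinarySearchUpdateSketch(double min, double max, double initial) {
        if (!(min <= initial && initial <= max)) {
            throw new IllegalArgumentException("initial value must lie in [min, max]");
        }
        this.left = min;
        this.right = max;
        this.current = initial;
    }

    /** Moves the parameter halfway toward the right border (assumed rule). */
    double increase() {
        left = current;
        current = (current + right) / 2.0;
        return current;
    }

    /** Moves the parameter halfway toward the left border (assumed rule). */
    double decrease() {
        right = current;
        current = (left + current) / 2.0;
        return current;
    }

    double current() {
        return current;
    }

    public static void main(String[] args) {
        BinarySearchUpdateSketch p = new BinarySearchUpdateSketch(0.0, 1.0, 0.5);
        System.out.println(p.increase()); // 0.75
        System.out.println(p.decrease()); // 0.625
        System.out.println(p.increase()); // 0.6875
    }
}
```

Unlike a linear update, no fixed step size has to be chosen: the step halves each time the direction is confirmed or reversed. How the borders are reset when the environment changes is not covered by this sketch.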

Data Mining Applications & System Evaluation
  • Two data mining applications
    • Clustream: clustering data arriving in data streams
    • Dist-Freq-Counting: finding frequent itemsets from distributed streams
Resource Allocation Schemes
  • Problem Definition
    • Grid resource scheduling for pipelined, real-time, distributed streaming applications
    • Mapping workflows onto the Grid is an NP-complete problem
    • Static part: the resource allocation problem for GATES is to determine a deployment configuration
    • Dynamic part
Static Allocation Scheme
  • Static allocation problem: determining a deployment configuration
  • Objective: automatically generate a deployment configuration from information about the available resources
  • A deployment configuration specifies:
    • The number of data sources and their locations
    • The destination
    • The number of stages that make up the pipeline
    • The number of instances of each stage
    • How the instances connect to each other
    • The node where each instance is placed
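
One way to hold the listed items in a single data structure is sketched below. The class, field, and node names are invented for illustration and do not come from GATES; the point is only that the six items map onto a small object graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DeploymentConfigSketch {

    /** One instance of a stage, placed on a named node. */
    static final class Instance {
        final String node;                                    // node where the instance is placed
        final List<Instance> downstream = new ArrayList<>();  // how instances connect

        Instance(String node) {
            this.node = node;
        }
    }

    final List<String> sourceNodes;                         // data sources and their locations
    final String destinationNode;                           // the destination
    final List<List<Instance>> stages = new ArrayList<>();  // stages.get(i): instances of stage i

    DeploymentConfigSketch(List<String> sourceNodes, String destinationNode) {
        this.sourceNodes = new ArrayList<>(sourceNodes);
        this.destinationNode = destinationNode;
    }

    /** Appends a stage with one instance per given node and returns its instances. */
    List<Instance> addStage(String... nodes) {
        List<Instance> instances = new ArrayList<>();
        for (String node : nodes) {
            instances.add(new Instance(node));
        }
        stages.add(instances);
        return instances;
    }

    int stageCount() {
        return stages.size();
    }

    int instanceCount(int stage) {
        return stages.get(stage).size();
    }

    /** Two sources, a first stage with two instances, and a second stage with one. */
    static DeploymentConfigSketch example() {
        DeploymentConfigSketch c =
            new DeploymentConfigSketch(Arrays.asList("nodeA", "nodeB"), "nodeD");
        List<Instance> first = c.addStage("nodeA", "nodeB");
        List<Instance> second = c.addStage("nodeC");
        for (Instance i : first) {
            i.downstream.add(second.get(0)); // both first-stage instances feed nodeC
        }
        return c;
    }

    public static void main(String[] args) {
        DeploymentConfigSketch c = example();
        System.out.println("stages: " + c.stageCount());
        System.out.println("instances in stage 0: " + c.instanceCount(0));
    }
}
```

A static allocation scheme would then amount to a function that takes a description of the available resources and returns one such object.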
Static Allocation Scheme

Examples of deployment configurations

Related work
  • Grid Resource Allocation
    • Condor
    • Realtor
    • ACDS etc.
    • Main Differences: our work focuses on Grid resource allocation for workflow applications
  • Adaptation Through a Middleware
    • Cheng et al.’s adaptation framework
    • SWiFT
    • Conductor
    • DART
    • ROAM
    • Main Differences: our work focuses on general support for adaptation at run time
Summary
  • Grid computing could be an effective solution for distributed data stream processing
  • GATES
    • Distributed processing
    • Exploits Grid web services
    • Self-adaptation to meet the real-time constraints
    • Grid resource allocation schemes