Data-Intensive Computing: From Multi-Cores and GPGPUs to Cloud Computing and Deep Web

Gagan Agrawal


Data-Intensive Computing
  • Simply put: scalable analysis of large datasets
  • How is it different from / related to:
    • Databases:
      • Emphasis on processing of static datasets
    • Data Mining
      • Community focused more on algorithms, and not scalable implementations
    • High Performance / Parallel Computing
      • More focus on compute-intensive tasks, not I/O or large datasets
    • Datacenters
      • Use of large resources for hosting data, less on their use for processing
Why Now?
  • Amount of data is increasing rapidly
  • Cheap Storage
  • Better connectivity, easy to move large datasets on web/grids
  • Science shifting from compute-X to X-informatics
  • Business intelligence and analysis
  • Google’s Map-Reduce has created excitement
Architectural Context
  • Processor architecture has gone through a major change
    • No more scaling with clock speeds
    • Parallelism – multi-core / many-core is the trend
  • Accelerators like GPGPUs have become effective
  • More challenges for scaling any class of applications
Grid/Cloud/Utility Computing
  • Cloud computing is a major new trend in industry
    • Data and computation in a Cloud of resources
    • Pay for use model (like a utility)
  • Has roots in many developments over the last decade
    • Service-oriented computing, Software as a Service (SaaS)
    • Grid computing – use of wide-area resources
My Research Group
  • Data-intensive computing on emerging architectures
  • Data-intensive computing in Cloud Model
  • Data-integration and query processing – deep web data
  • Querying low-level datasets through automatic workflow composition
  • Adaptive computation – time as a constraint
  • Current students
    • 6 PhD students
    • 2 MS thesis students
    • Talking to several first year students
  • Past students
    • 7 PhDs completed between 2005 and 2008
  • FREERIDE: Data-intensive Computing on Cluster of Multi-cores
  • A system for exploiting GPGPUs for data-intensive computing
  • FREERIDE-G: Data-intensive computing on Cloud Environments
  • Quick overview of three other projects
FREERIDE - Motivation
  • Availability of very large datasets and the need to analyze them (data-intensive applications)
  • Adoption of multi-core processors and the inevitability of parallel programming
  • Need to abstract away the difficulties of parallel programming
  • A middleware for parallelizing data-intensive applications
  • Motivated by difficulties in implementing and performance-tuning data mining applications
  • Based on the observation that data mining, OLAP, and other scientific applications share a similar generalized reduction structure
SMP Techniques
  • Full replication (f-r): the obvious technique
  • Locking-based techniques
    • Full locking (f-l)
    • Optimized full locking (o-f-l)
    • Fixed locking (fi-l)
    • Cache-sensitive locking (hybrid of o-f-l and fi-l)
  • (A sketch contrasting replication with locking follows below)
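
The contrast between full replication and the locking-based techniques can be made concrete with a small sketch. This is a minimal illustration using standard C++ threads, not the FREERIDE implementation; the histogram-style reduction object, bin count, and thread count are all assumptions.

// Minimal sketch: full replication vs. fine-grained locking for a shared
// reduction object (histogram-style). Illustrative only; not FREERIDE code.
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

constexpr int kBins = 1024;      // size of the reduction object (assumed)
constexpr int kThreads = 8;      // number of worker threads (assumed)

// Full replication: each thread updates a private copy, merged at the end.
// result must have kBins entries; data values are assumed non-negative.
void reduce_replicated(const std::vector<int>& data, std::vector<long>& result) {
    std::vector<std::vector<long>> local(kThreads, std::vector<long>(kBins, 0));
    std::vector<std::thread> workers;
    for (int t = 0; t < kThreads; ++t)
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < data.size(); i += kThreads)
                local[t][data[i] % kBins] += 1;          // local update, no locks
        });
    for (auto& w : workers) w.join();
    for (int t = 0; t < kThreads; ++t)                   // global combination
        for (int b = 0; b < kBins; ++b) result[b] += local[t][b];
}

// Locking: one shared copy; a per-bin lock guards each update (full locking).
void reduce_locked(const std::vector<int>& data, std::vector<long>& result) {
    std::vector<std::mutex> locks(kBins);
    std::vector<std::thread> workers;
    for (int t = 0; t < kThreads; ++t)
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < data.size(); i += kThreads) {
                int b = data[i] % kBins;
                std::lock_guard<std::mutex> g(locks[b]); // per-element lock
                result[b] += 1;
            }
        });
    for (auto& w : workers) w.join();
}

Replication avoids synchronization but multiplies memory; the optimized and cache-sensitive locking variants listed above reduce the space and false-sharing costs of the per-element locks sketched here.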
Experimental Setup
  • Intel Xeon E5345 CPUs
  • Two quad-core processors per machine
  • Each core runs at 2.33 GHz
  • 6 GB main memory
  • Nodes in the cluster connected by InfiniBand
Summary of Results
  • Neither full replication nor cache-sensitive locking always wins; which performs better depends on the nature of the application
  • Cache-sensitive locking shows high overhead when there is little computation between updates to the reduction object
  • MPI processes compete well with the better of the two shared-memory techniques at smaller core counts, but incur communication overheads at larger core counts
Background: GPU Computing
  • Multi-core architectures are becoming more popular in high performance computing
  • GPU is inexpensive and fast
  • CUDA is a high-level language that supports programming on the GPU
Challenges of Data-intensive Computing on GPU
  • SIMD shared memory programming
  • 3 steps involved in the main loop (see the kernel sketch below):
    • Data read
    • Computing update
    • Writing update
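
To make these three steps concrete, here is a hedged CUDA sketch of a histogram-style reduction kernel. It is illustrative only and not the middleware's generated code; keeping a per-block copy of the reduction object in shared memory and binning the input values are assumptions.

// Illustrative CUDA kernel: read data, compute the update, write the update
// into a per-block reduction object held in shared memory (assumption).
__global__ void reduction_kernel(const float* data, int n,
                                 float* global_result, int num_bins) {
    extern __shared__ float block_result[];            // per-block reduction object
    for (int b = threadIdx.x; b < num_bins; b += blockDim.x)
        block_result[b] = 0.0f;
    __syncthreads();

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = idx; i < n; i += gridDim.x * blockDim.x) {
        float d = data[i];                              // 1. data read
        int bin = static_cast<int>(d) % num_bins;       // 2. computing update (assumes d >= 0)
        atomicAdd(&block_result[bin], 1.0f);            // 3. writing update
    }
    __syncthreads();

    for (int b = threadIdx.x; b < num_bins; b += blockDim.x)
        atomicAdd(&global_result[b], block_result[b]);  // combine into the global copy
}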
Complications of CUDA Programming
  • The user must have thorough knowledge of the GPU architecture and the CUDA programming model
  • Must specify the grid configuration
  • Must handle memory allocation and host-to-device copies (boilerplate of this kind is sketched below)
  • Must decide which data to copy into shared memory and how much shared memory to use
  • ……
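
For contrast, below is the kind of host-side boilerplate a CUDA programmer would otherwise write by hand and that the middleware aims to generate; it launches the kernel sketched above, and the block size and helper name are illustrative assumptions.

// Host-side boilerplate (illustrative): allocation, copies, grid configuration,
// shared-memory sizing, kernel launch, and copy-back.
#include <cuda_runtime.h>

void run_reduction(const float* h_data, int n, float* h_result, int num_bins) {
    float *d_data = nullptr, *d_result = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_result, num_bins * sizeof(float));
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_result, 0, num_bins * sizeof(float));

    int threads = 256;                                   // grid configuration
    int blocks = (n + threads - 1) / threads;
    size_t shmem = num_bins * sizeof(float);             // shared memory to use
    reduction_kernel<<<blocks, threads, shmem>>>(d_data, n, d_result, num_bins);

    cudaMemcpy(h_result, d_result, num_bins * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_result);
}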
Architecture of the Middleware
  • User input
  • Code analyzer
    • Analysis of variables (variable type and size)
    • Analysis of reduction functions (sequential code from the user)
  • Code Generator (generates CUDA code and the C++ code that invokes the kernel function)
Architecture of the Middleware

[Diagram: the Code Analyzer (in LLVM) includes a Variable Analyzer and analysis of variable access patterns and combination operations; taking variable information, reduction functions, and optional functions as input, the Code Generator produces the host program, kernel functions, and the grid configuration and kernel invocation.]

User Input
  • Variables to be used in the reduction function
  • Values of each variable (typically specified as lengths of arrays)
  • A sequential reduction function (sketched below)
  • Optional functions (initialization function, combination function, …)
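
As an illustration of this input, here is a hedged sketch of a sequential reduction function in the style of k-means; the signature and the layout of the reduction object are assumptions, not the middleware's actual interface.

// Illustrative user-supplied sequential reduction function (k-means style).
// Layout assumption: for each cluster, dim partial sums followed by a count.
void reduce(const float* points, int num_points, int dim,
            const float* centers, int k, float* reduction_object) {
    for (int p = 0; p < num_points; ++p) {
        int best = 0;
        float best_dist = 1e30f;
        for (int c = 0; c < k; ++c) {                     // find the nearest center
            float dist = 0.0f;
            for (int d = 0; d < dim; ++d) {
                float diff = points[p * dim + d] - centers[c * dim + d];
                dist += diff * diff;
            }
            if (dist < best_dist) { best_dist = dist; best = c; }
        }
        for (int d = 0; d < dim; ++d)                     // update the reduction object
            reduction_object[best * (dim + 1) + d] += points[p * dim + d];
        reduction_object[best * (dim + 1) + dim] += 1.0f; // count for this cluster
    }
}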
Analysis of Sequential Code
  • Extract the access pattern of each variable
  • Determine which data must be replicated
  • Identify the operator used for global combination
  • Compute how much shared memory to use and which data to copy into it (see the sketch below)
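
A hypothetical summary of what the analysis might derive for each variable is sketched below; the fields and the simple shared-memory planning rule are assumptions for illustration, not the actual LLVM-based analysis.

// Hypothetical per-variable analysis result and a naive shared-memory plan.
#include <cstddef>

enum class AccessPattern { ReadOnly, ReductionUpdate };

struct VariableInfo {
    const char*   name;       // e.g. "centers", "reduction_object" (illustrative)
    std::size_t   bytes;      // size declared by the user
    AccessPattern pattern;    // how the reduction function accesses it
    bool          replicate;  // updated variables get replicated copies
};

// Choose which variables to place in shared memory within a per-block budget.
std::size_t plan_shared_memory(const VariableInfo* vars, int n, std::size_t budget) {
    std::size_t used = 0;
    for (int i = 0; i < n; ++i)
        if (used + vars[i].bytes <= budget)
            used += vars[i].bytes;                        // copy this variable in
    return used;
}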
Experimental Results

[Figure: Speedup of k-means]

Emergence of Cloud and Utility Computing
  • Groups generating data
    • Use remote resources for storing data
    • Already popular with SDSC/SRB
  • Scientists interested in deriving results from data
    • Use distinct, but also remote, resources for processing
  • Remote Data Analysis paradigm
    • Data, computation, and user at different locations
    • Each unaware of the location of the others
Remote Data Analysis
  • Flexible use of resources
  • Does not overload the data repository
  • No unnecessary data movement
  • Avoids caching data that is processed only once
  • Challenge: tedious details
    • Data retrieval and caching
    • Use of parallel configurations
    • Use of heterogeneous resources
    • Performance issues

Can a grid middleware ease application development for remote data analysis and yet provide high performance?

Our Work
  • FREERIDE-G (Framework for Rapid Implementation of Datamining Engines in Grid)
    • Enables development of flexible and scalable remote data processing applications
    • Involves the middleware user, a repository cluster, and a compute cluster
  • Support for parallel configurations
    • For hosting data and for processing data
  • Transparent data movement
  • Integration with Grid/Web standards
  • Resource selection
    • Computing resources
    • Data replicas
  • Scheduling and load balancing
  • Data wrapping issues
FREERIDE (G) Processing Structure

Key observation: most data mining algorithms follow a canonical reduction loop (shown below).

Middleware API:

  • Subset of data to be processed
  • Reduction object
  • Local and global reduction operations
  • Iterator

Derived from precursor system FREERIDE

while ( ) {
    forall (data instances d) {
        (I, d') = process(d)
        R(I) = R(I) op d'
    }
}
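
A minimal, self-contained sketch of how an application expresses this loop through local and global reduction operations follows; the ReductionObject type and the function names are assumptions, not the actual FREERIDE API.

// Sketch of the canonical generalized-reduction structure as application code.
#include <cstddef>
#include <vector>

struct ReductionObject {                 // accumulator managed by the runtime
    std::vector<long> bins;
    explicit ReductionObject(std::size_t n) : bins(n, 0) {}
};

// Local reduction: invoked for every data instance d (assumes d >= 0).
void local_reduce(int d, ReductionObject& R) {
    std::size_t i = static_cast<std::size_t>(d) % R.bins.size(); // (I, d') = process(d)
    R.bins[i] += 1;                                              // R(I) = R(I) op d'
}

// Global reduction: merge copies produced by different threads or nodes.
void global_reduce(ReductionObject& mine, const ReductionObject& other) {
    for (std::size_t i = 0; i < mine.bins.size(); ++i)
        mine.bins[i] += other.bins[i];
}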




FREERIDE-G Evolution
  • FREERIDE: data stored locally
  • FREERIDE-G (ADR): ADR responsible for remote data retrieval
  • FREERIDE-G (SRB): SRB responsible for remote data retrieval
  • FREERIDE-G grid service: adds load balancing and data integration

Compute Node
  • More compute nodes than data hosts
  • Each node:
    • Registers I/O (from the index)
    • Connects to a data host
  • While there are chunks to process (loop sketched below):
    • Dispatch I/O request(s)
    • Poll pending I/O
    • Process retrieved chunks
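
A hedged sketch of this retrieval and processing loop is below; it uses std::async in place of the real data-host protocol (SRB), and fetch(), process_chunk(), and the in-flight limit are illustrative stand-ins.

// Compute-node loop (illustrative): overlap chunk retrieval with processing.
#include <cstddef>
#include <future>
#include <queue>
#include <vector>

// Stubs standing in for remote retrieval and the local reduction (assumptions).
std::vector<char> fetch(int chunk_id) { return std::vector<char>(1024, chunk_id & 0x7f); }
void process_chunk(const std::vector<char>& c, std::vector<long>& R) { R[0] += (long)c.size(); }

// R is the reduction object (assumed non-empty); my_chunks come from the index.
void compute_node_loop(const std::vector<int>& my_chunks, std::vector<long>& R) {
    std::queue<std::future<std::vector<char>>> pending;
    const std::size_t max_in_flight = 4;       // I/O requests kept outstanding
    std::size_t next = 0;
    while (next < my_chunks.size() || !pending.empty()) {
        // Dispatch I/O requests up to the in-flight limit.
        while (next < my_chunks.size() && pending.size() < max_in_flight)
            pending.push(std::async(std::launch::async, fetch, my_chunks[next++]));
        // Poll the oldest pending request, then process the retrieved chunk.
        pending.front().wait();
        process_chunk(pending.front().get(), R);
        pending.pop();
    }
}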
FREERIDE-G in Action

[Diagram: a compute node registers its I/O and establishes a connection with the data host (SRB Master and SRB Agents); then, while more chunks remain to process, I/O requests are dispatched, pending I/O is polled, and retrieved data chunks are processed on the compute node.]

Implementation Challenges
  • Interaction with the code repository
    • Simplified Wrapper and Interface Generator (SWIG)
    • XML descriptors of API functions
    • Each API function wrapped in its own class (sketched below)
  • Integration with MPICH-G2
    • Supports MPI
    • Deployed through Globus components (GRAM)
    • Hides potential heterogeneity in service startup and management
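
The "one class per API function" idea can be sketched as follows; the interface and names are hypothetical stand-ins for the SWIG-generated wrappers described by the XML descriptors.

// Hypothetical wrapper: each middleware API function exposed as its own class.
#include <string>

struct ApiFunction {                       // common interface seen by the repository
    virtual ~ApiFunction() = default;
    virtual std::string name() const = 0;
    virtual void invoke(void* args) = 0;   // argument layout comes from the XML descriptor
};

class LocalReduceWrapper : public ApiFunction {
public:
    std::string name() const override { return "local_reduce"; }
    void invoke(void* /*args*/) override {
        // unpack arguments and forward to the user's local reduction (omitted)
    }
};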
Experimental Setup
  • Organizational grid:
    • Data hosted on an Opteron 250 cluster
    • Processed on an Opteron 254 cluster
    • Connected by two 10 GB optical fibers
  • Goals:
    • Demonstrate parallel scalability of applications
    • Evaluate the overhead of the MPICH-G2 and Globus Toolkit deployment mechanisms
Deployment Overhead Evaluation
  • There is clearly a small overhead associated with using Globus and MPICH-G2 for middleware deployment:
    • K-means clustering with a 6.4 GB dataset: 18-20%
    • Vortex detection with a 14.8 GB dataset: 17-20%

Deep Web Data Integration
  • The emergence of the deep web
    • The deep web is huge
    • Different from the surface web
    • Challenges for integration:
      • Not accessible through search engines
      • Inter-dependences among deep web sources

Motivating Example

Given a gene, ERCC6, we want to know the amino acid occurring at the corresponding position in the orthologous gene of non-human mammals.

[Diagram: the query spans dependent sources, including Entrez Gene, AA positions for nonsynonymous SNPs, the encoded protein, the protein sequence, and the encoded orthologous protein.]

  • Inter-dependences between sources
  • Time-consuming if done manually
  • Requires an intelligent order of querying
  • Implicit sub-goals in the user query

Contributions:
  • Formulate the query planning problem for deep web databases with dependences
  • Propose a dynamic query planner
  • Develop cost models and an approximate planning algorithm (a simple greedy sketch follows below)
  • Integrate the algorithm with a deep web mining tool
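
As a rough illustration of planning over dependent sources, here is a greedy, cost-ordered sketch; it is not the dynamic planner or cost model developed in this work, and the types and selection rule are assumptions.

// Greedy ordering of deep web sources whose input dependences must be satisfied.
#include <algorithm>
#include <set>
#include <string>
#include <vector>

struct Source {
    std::string name;
    std::vector<std::string> inputs;   // attributes needed to query this source
    std::vector<std::string> outputs;  // attributes its answers provide
    double cost;                       // estimated querying cost
};

std::vector<Source> plan(std::vector<Source> sources, std::set<std::string> known) {
    std::vector<Source> order;
    bool progress = true;
    while (progress && !sources.empty()) {
        progress = false;
        auto best = sources.end();
        // Among sources whose inputs are already available, pick the cheapest.
        for (auto it = sources.begin(); it != sources.end(); ++it) {
            bool ready = std::all_of(it->inputs.begin(), it->inputs.end(),
                                     [&](const std::string& a) { return known.count(a) > 0; });
            if (ready && (best == sources.end() || it->cost < best->cost)) best = it;
        }
        if (best != sources.end()) {
            known.insert(best->outputs.begin(), best->outputs.end());
            order.push_back(*best);
            sources.erase(best);
            progress = true;
        }
    }
    return order;   // sources whose dependences could be satisfied, in query order
}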

HASTE Middleware Design Goals
  • Enable time-critical event handling to achieve the maximum benefit while satisfying the time constraint
  • Be compatible with Grid and Web services
  • Enable easy deployment and management with minimum human intervention
  • Be usable in a heterogeneous distributed environment

ICAC 2008

  • Several projects cutting across parallel computing, distributed computing, and databases/data mining
  • A number of opportunities for MS thesis, MS project, and PhD students
  • Relevant Courses
    • CSE 621/721
    • CSE 762
    • CSE 671 / 674