Data-Intensive Computing: From Clouds to GPUs
Gagan Agrawal (Ohio State University)

Presentation Transcript
Motivation
  • Growing need for analysis of large scale data
    • Scientific
    • Commercial
  • Data-intensive Supercomputing (DISC)
  • Map-Reduce has received a lot of attention
    • Database and data mining communities
    • High performance computing community
Motivation (2)
  • Processor Architecture Trends
    • Clock speeds are not increasing
    • Trend towards multi-core architectures
    • Accelerators like GPUs are very popular
    • Cluster of multi-cores / GPUs are becoming common
  • Trend Towards the Cloud
    • Use storage/computing/services from a provider
      • How many of you prefer Gmail over your cse/osu account for e-mail?
    • Utility model of computation
    • Need high-level APIs and adaptation
My Research Group
  • Data-intensive theme at multiple levels
    • Parallel Programming Models
    • Multi-cores and accelerators
    • Adaptive Middleware
    • Scientific Data Management / Workflows
    • Deep Web Integration and Analysis
Personnel
  • Currently
    • 10 PhD students
    • 4 MS thesis students
  • Graduated PhDs
    • 7 PhDs graduated between 2005 and 2008
This Talk
  • Parallel Programming API for Data-Intensive Computing
    • An alternative API and system to Google's Map-Reduce
    • Show actual comparison
  • Data-intensive Computing on Accelerators
    • Compilation for GPUs
  • Overview of other topics
    • Scientific Data Management
    • Adaptive and Streaming Middleware
    • Deep web
Map-Reduce
  • Simple API for (data-intensive) parallel programming
  • Computation is:
    • Apply map on each input data element
    • Produce (key,value) pair(s)
    • Sort them using the key
    • Apply reduce on each set of values sharing a distinct key
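
A minimal sequential sketch of these steps, using the standard word-count example (the function and variable names here are our illustration, not Google's API):

#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// map step: one input element -> a list of (key, value) pairs
std::vector<std::pair<std::string, int>> map_fn(const std::string& line) {
    std::vector<std::pair<std::string, int>> out;
    std::istringstream words(line);
    std::string w;
    while (words >> w) out.push_back({w, 1});    // emit (word, 1)
    return out;
}

int main() {
    std::vector<std::string> input = {"the quick fox", "the lazy dog"};

    // "sort by key": std::map keeps keys ordered and groups their values
    std::map<std::string, std::vector<int>> groups;
    for (const auto& line : input)
        for (const auto& kv : map_fn(line))
            groups[kv.first].push_back(kv.second);

    // reduce step: fold the value set of each distinct key
    for (const auto& g : groups) {
        int sum = 0;
        for (int v : g.second) sum += v;
        std::cout << g.first << ": " << sum << "\n";
    }
    return 0;
}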
Map-Reduce: Positives and Questions
  • Positives:
    • Simple API
      • Functional language based
      • Very easy to learn
    • Support for fault-tolerance
      • Important for very large-scale clusters
  • Questions
    • Performance?
      • Comparison with other approaches
    • Suitability for different classes of applications?
Class of Data-Intensive Applications
  • Many different types of applications
    • Data-center kind of applications
      • Data scans, sorting, indexing
    • More "compute-intensive" data-intensive applications
      • Machine learning, data mining, NLP
      • Map-reduce / Hadoop being widely used for this class
    • Standard Database Operations
      • A SIGMOD 2009 paper compares Hadoop with databases and OLAP systems
  • What is Map-reduce suitable for?
  • What are the alternatives?
    • MPI/OpenMP/Pthreads – too low level?
Hadoop Implementation
  • HDFS
    • Essentially GFS, but without file updates
    • Cannot be directly mounted by an existing operating system
  • Fault tolerance
    • Name node
    • Job Tracker
    • Task Tracker
FREERIDE: Goals
  • Framework for Rapid Implementation of Data Mining Engines
  • The ability to rapidly prototype a high-performance mining implementation
    • Distributed memory parallelization
    • Shared memory parallelization
    • Ability to process disk-resident datasets
  • Only modest modifications to a sequential implementation are needed for all three
  • Developed 2001-2003 at Ohio State

FREERIDE – Technical Basis
  • Popular data mining algorithms have a common canonical loop
  • Generalized Reduction
  • Can be used as the basis for supporting a common API
  • Demonstrated for Popular Data Mining and Scientific Data Processing Applications

While( ) {
  forall (data instances d) {
    I = process(d)
    R(I) = R(I) op f(d)
  }
  ...
}
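
For concreteness, one k-means iteration written in this generalized-reduction form might look as follows (our sketch, not the actual FREERIDE API; the reduction object R is the per-cluster coordinate sums and counts):

#include <array>
#include <vector>

constexpr int K = 4, DIM = 3;
using Point = std::array<double, DIM>;

// I = process(d): find the nearest current center
static int closest_center(const Point& d, const std::vector<Point>& centers) {
    int best = 0;
    double bestDist = 1e300;
    for (int i = 0; i < K; ++i) {
        double dist = 0;
        for (int j = 0; j < DIM; ++j)
            dist += (d[j] - centers[i][j]) * (d[j] - centers[i][j]);
        if (dist < bestDist) { bestDist = dist; best = i; }
    }
    return best;
}

// One pass of the reduction loop; sum and count form the reduction object R
void kmeans_iteration(const std::vector<Point>& data, std::vector<Point>& centers) {
    std::vector<Point> sum(K, Point{});
    std::vector<long> count(K, 0);
    for (const Point& d : data) {                         // forall (data instances d)
        int i = closest_center(d, centers);               // I = process(d)
        for (int j = 0; j < DIM; ++j) sum[i][j] += d[j];  // R(I) = R(I) op f(d)
        count[i] += 1;
    }
    for (int i = 0; i < K; ++i)                           // center update happens
        if (count[i] > 0)                                 // outside the reduction loop
            for (int j = 0; j < DIM; ++j)
                centers[i][j] = sum[i][j] / count[i];
}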

Comparing Processing Structure
  • The two processing structures are similar, but with subtle differences
Observations on Processing Structure
  • Map-Reduce is based on a functional idea
    • Do not maintain state
  • This can lead to sorting overheads
  • FREERIDE API is based on a programmer managed reduction object
    • Not as ‘clean’
    • But, avoids sorting
    • Can also help shared memory parallelization
    • Enables better fault recovery
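
A toy sketch (our illustration) that computes the same grouped sums both ways makes the contrast concrete:

#include <cstdio>
#include <map>
#include <vector>

int main() {
    std::vector<int> data = {1, 2, 3, 4, 5, 6};
    auto key = [](int d) { return d % 2; };  // stands in for process(d)

    // Map-Reduce style: emit (key, value) pairs; the runtime must group
    // them, which implies sorting/shuffling before reduce can run
    std::multimap<int, int> pairs;
    for (int d : data) pairs.insert({key(d), d});
    int mr[2] = {0, 0};
    for (const auto& kv : pairs) mr[kv.first] += kv.second;

    // FREERIDE style: update the programmer-managed reduction object in
    // place; no intermediate pairs, hence no sorting
    int R[2] = {0, 0};
    for (int d : data) R[key(d)] += d;       // R(I) = R(I) op f(d)

    std::printf("map-reduce: %d %d, reduction object: %d %d\n",
                mr[0], mr[1], R[0], R[1]);
    return 0;
}

The map-reduce version materializes intermediate (key, value) pairs and pays for grouping them; the reduction-object version folds each value in as soon as it is produced.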
Experiment Design
  • Tuning parameters in Hadoop
    • Input Split size
    • Max number of concurrent map tasks per node
    • Number of reduce tasks
  • For comparison, we used four applications
    • Data Mining: KMeans, KNN, Apriori
    • Simple data scan application: Wordcount
  • Experiments on a multi-core cluster
    • 8 cores per node (8 map tasks)
Results – Data Mining
  • KMeans: varying # of nodes

[Figure: avg. time per iteration (sec) vs. # of nodes; dataset: 6.4 GB, k = 1000, dim = 3]

Results – Data Mining (II)
  • Apriori: varying # of nodes

[Figure: avg. time per iteration (sec) vs. # of nodes; dataset: 900 MB, support level: 3%, confidence level: 9%]

Results – Data Mining (III)
  • KNN: varying # of nodes

[Figure: avg. time per iteration (sec) vs. # of nodes; dataset: 6.4 GB, k = 1000, dim = 3]

Results – Datacenter-like Application
  • Wordcount: varying # of nodes

[Figure: total time (sec) vs. # of nodes; dataset: 6.4 GB]

Scalability Comparison
  • KMeans: varying dataset size

[Figure: avg. time per iteration (sec) vs. dataset size; k = 100, dim = 3, on 8 nodes]

Scalability – Word Count
  • Wordcount: varying dataset size

[Figure: total time (sec) vs. dataset size; on 8 nodes]

Overhead Breakdown
  • Four components affect Hadoop's performance:
    • Initialization cost
    • I/O time
    • Sorting/grouping/shuffling
    • Computation time
  • What is the relative impact of each?
  • An Experiment with k-means
Analysis with K-means
  • Varying the number of clusters (k)

[Figure: avg. time per iteration (sec) vs. # of k-means clusters; dataset: 6.4 GB, dim = 3, on 16 nodes]

Analysis with K-means (II)
  • Varying the number of dimensions

[Figure: avg. time per iteration (sec) vs. # of dimensions; dataset: 6.4 GB, k = 1000, on 16 nodes]

Observations
  • Initialization costs and limited I/O bandwidth of HDFS are significant in Hadoop
  • Sorting is also an important limiting factor for Hadoop’s performance
This Talk
  • Parallel Programming API for Data-Intensive Computing
    • An alternative API and system to Google's Map-Reduce
    • Show actual comparison
  • Data-intensive Computing on Accelerators
    • Compilation for GPUs
  • Overview of other topics
    • Scientific Data Management
    • Adaptive and Streaming Middleware
    • Deep web

Background - GPU Computing

  • Many-core architectures and accelerators are becoming more popular
  • GPUs are inexpensive and fast
  • CUDA is a high-level language for GPU programming

CUDA Programming

  • Significant improvement over use of Graphics Libraries
  • But:
    • Need detailed knowledge of the GPU architecture and a new language
    • Must specify the grid configuration
    • Deal with memory allocation and movement
    • Explicit management of memory hierarchy
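
For readers who have not used CUDA, a minimal example (ours, not from the talk) shows how all of these obligations fall on the programmer:

#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: each thread scales one element
__global__ void scale(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    float* h = new float[n];
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));                            // explicit allocation
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);  // explicit movement

    int threads = 256;                                // grid configuration is the
    int blocks = (n + threads - 1) / threads;         // programmer's responsibility
    scale<<<blocks, threads>>>(d, 2.0f, n);

    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    std::printf("h[0] = %.1f\n", h[0]);               // prints 2.0
    cudaFree(d);
    delete[] h;
    return 0;
}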

Parallel Data Mining

  • Common structure of data mining applications (FREERIDE)

/* outer sequential loop */
while( ) {
  /* reduction loop */
  foreach (element e) {
    (i, val) = process(e);
    Reduc(i) = Reduc(i) op val;
  }
}
Porting on GPUs
  • High-level parallelization is straightforward
  • Details of Data Movement
  • Impact of Thread Count on Reduction time
  • Use of shared memory
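
A sketch of how such a ported reduction loop might look as a CUDA kernel, assuming a reduction object small enough for shared memory (our illustration; the names and the stand-in for process(d) are hypothetical):

// Each block keeps the reduction object in shared memory, threads combine
// into it with atomics, and each block then merges into the global copy.
__global__ void generalized_reduce(const float* data, int n, float* R, int rsize) {
    extern __shared__ float localR[];                // per-block reduction object
    for (int j = threadIdx.x; j < rsize; j += blockDim.x)
        localR[j] = 0.0f;
    __syncthreads();

    int stride = blockDim.x * gridDim.x;
    for (int d = blockIdx.x * blockDim.x + threadIdx.x; d < n; d += stride) {
        int i = ((int)data[d]) % rsize;              // I = process(d), illustrative
        atomicAdd(&localR[i], data[d]);              // R(I) = R(I) op f(d)
    }
    __syncthreads();

    for (int j = threadIdx.x; j < rsize; j += blockDim.x)
        atomicAdd(&R[j], localR[j]);                 // global combination
}

// Launch: the shared-memory size for the reduction object is passed explicitly:
//   generalized_reduce<<<blocks, threads, rsize * sizeof(float)>>>(d_data, n, d_R, rsize);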
Architecture of the System

[Diagram: system architecture connecting the user input, code analyzer (in LLVM), variable analyzer, variable information, variable access pattern and combination operations, code generator, kernel functions, reduction functions, grid configuration and kernel invocation, optional functions, host program, and the final executable]

User Input

  • Variables to be used in the reduction function
  • Values of each variable or size of array
  • A sequential reduction function
  • Optional functions (initialization function, combination function, …)

Analysis of Sequential Code

  • Extract the access features of each variable
  • Determine the data to be replicated
  • Determine the operator for the global combination
  • Identify variables to place in shared memory
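
As an illustration of what this analysis produces (our example, not the system's actual input syntax), consider a sequential reduction function for a histogram, annotated with what would be derived for each variable:

// Comments mark what the variable-access analysis would determine.
void reduce(const float* data, int n,  // read-only input: no replication needed
            int nbuckets,              // read-only scalar
            int* hist)                 // updated only with '+': this is the
                                       // reduction object; replicate per thread,
                                       // use '+' as the global combination
                                       // operator; small, so a shared-memory
                                       // candidate
{
    for (int d = 0; d < n; ++d) {
        int i = (int)(data[d] * nbuckets);   // assumes data in [0, 1)
        if (i >= nbuckets) i = nbuckets - 1;
        hist[i] += 1;                        // R(I) = R(I) op f(d), op = '+'
    }
}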
Memory Allocation and Copy

[Diagram: layout of the reduction object across threads T0 … T63; each thread needs its own copy, and the updates are copied back to host memory after the kernel reduction function returns]


Generating CUDA Code and C/C++ Code to Invoke the Kernel Function

  • Memory allocation and copy
  • Thread grid configuration (number of blocks and threads per block)
  • Global function
  • Kernel reduction function
  • Global combination
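
The talk does not show the generated code itself, but under the per-thread-copy scheme above it might take roughly this shape (a sketch; the kernel name, grid parameters, and helper are all hypothetical):

#include <cuda_runtime.h>
#include <vector>

// Hypothetical generated kernel: each thread owns a private copy of the
// reduction object at R + thread_id * rsize, so no atomics are needed.
__global__ void kernel_reduce(const float* data, int n, float* R, int rsize) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    float* myR = R + (size_t)t * rsize;
    int stride = blockDim.x * gridDim.x;
    for (int d = t; d < n; d += stride) {
        int i = ((int)data[d]) % rsize;      // I = process(d), illustrative
        myR[i] += data[d];                   // private update, no contention
    }
}

// Host side: allocate/copy, configure the grid, launch, then perform the
// global combination over the per-thread copies.
std::vector<float> run_reduction(const float* h_data, int n, int rsize) {
    const int blocks = 64, threads = 256, nthreads = blocks * threads;
    float *d_data, *d_R;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_R, (size_t)nthreads * rsize * sizeof(float));
    cudaMemset(d_R, 0, (size_t)nthreads * rsize * sizeof(float));
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    kernel_reduce<<<blocks, threads>>>(d_data, n, d_R, rsize);

    std::vector<float> h_R((size_t)nthreads * rsize);
    cudaMemcpy(h_R.data(), d_R, h_R.size() * sizeof(float), cudaMemcpyDeviceToHost);
    std::vector<float> R(rsize, 0.0f);
    for (int t = 0; t < nthreads; ++t)               // global combination
        for (int j = 0; j < rsize; ++j) R[j] += h_R[(size_t)t * rsize + j];
    cudaFree(d_data); cudaFree(d_R);
    return R;
}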

Optimizations

  • Using shared memory
  • Providing user-specified initialization functions and combination functions
  • Specifying variables that are allocated once

Applications

  • K-means clustering
  • EM clustering
  • PCA
Experimental Results

[Figure: speedup of k-means]

Deep Web Data Integration
  • The emergence of the deep web
    • The deep web is huge
    • Different from the surface web
    • Challenges for integration
      • Not accessible through search engines
      • Inter-dependences among deep web sources

Our Contributions
  • Structured Query Processing on Deep Web
  • Schema Mining/Matching
  • Analysis of Deep web data sources
Adaptive Middleware
  • Enable time-critical event handling to achieve the maximum benefit, while satisfying time/budget constraints
  • Be compatible with grid and web services
  • Enable easy deployment and management with minimum human intervention
  • Be usable in a heterogeneous distributed environment

ICAC 2008

Summary
  • Growing data is creating new challenges in HPC, grid and cloud environments
  • A number of topics are being addressed
  • Many opportunities for involvement
  • 888 meets Thursdays 5:00 – 6:00, DL 280