Parallel Programming on Computational Grids

Presentation Transcript
Outline
  • Grids
  • Application-level tools for grids
  • Parallel programming on grids
  • Case study: Ibis
Grids
  • Seamless integration of geographically distributed computers, databases, instruments
    • The name is an analogy with power grids
  • Highly active research area
    • Global Grid Forum
    • Globus middleware
    • Many European projects
      • e.g. Gridlab: Grid Application Toolkit and Testbed
    • VL-e (Virtual laboratory for e-Science) project
    • ….
Why Grids?
  • New distributed applications that use data or instruments across multiple administrative domains and that need much CPU power
    • Computer-enhanced instruments
    • Collaborative engineering
    • Browsing of remote datasets
    • Use of remote software
    • Data-intensive computing
    • Very large-scale simulation
    • Large-scale parameter studies
Web, Grids and e-Science
  • Web is about exchanging information
  • Grid is about sharing resources
    • Computers, data bases, instruments
  • e-Science supports experimental science by providing a virtual laboratory on top of Grids
    • Support for visualization, workflows, data management, security, authentication, high-performance computing
The big picture
  • [Figure: layered diagram. Labels: Application (three times), Potential generic part (three times), Management of comm. & computing (three times), Virtual Laboratory: application-oriented services, Grids: harness distributed resources]

Applications
  • e-Science experiments generate large amounts of data, which are often distributed and need much (parallel) processing
    • High-resolution imaging: ~ 1 GByte per measurement
    • Bio-informatics queries: 500 GByte per database
    • Satellite world imagery: ~ 5 TByte/year
    • Current particle physics: 1 PByte per year
    • LHC physics (2007): 10-30 PByte per year
Grid programming
  • The goal of a grid is to make resource sharing very easy (transparent)
  • Practice: grid programming is very difficult
    • Finding resources, running applications, dealing with heterogeneity and security, etc.
  • Grid middleware (Globus Toolkit) makes this somewhat easier, but is still low-level and changes frequently
  • Need application-level tools
Application-level tools
  • Builds on grid software infrastructure
  • Isolates users from dynamics of the grid hardware infrastructure
  • Generic (broad classes of applications)
  • Easy-to-use
Taxonomy of existing application-level tools
  • Grid programming models
    • RPC
    • Task parallelism
    • Message passing
    • Java programming
  • Grid application execution environments
    • Parameter sweeps
    • Workflow
    • Portals
Remote Procedure Call (RPC)
  • GridRPC: specialize RPC-style (client/server) programming for grids
    • Allows coarse-grain task parallelism & remote access
    • Extended with resource discovery, scheduling, etc.
  • Example: NetSolve
    • Solves numerical problems on a grid
  • Current development: use web technology (WSDL, SOAP) for grid services
  • Web and grid technology are merging
Task parallelism
  • Many systems for task parallelism (master-worker, replicated workers) exist for the grid
  • Examples
    • MW (master-worker)
    • Satin: divide&conquer (hierarchical master-worker)
Message passing
  • Several MPI implementations exist for the grid
  • PACX-MPI (Stuttgart):
    • Runs on heterogeneous systems
  • MagPIe (Thilo Kielmann):
    • Optimizes MPI’s collective communication for hierarchical wide-area systems
  • MPICH-G2:
    • Similar to PACX and MagPIe, implemented on Globus
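
To illustrate why collective operations benefit from knowing the cluster hierarchy, the following minimal sketch counts wide-area messages for a broadcast. It is not taken from MagPIe or any MPI implementation: the class `BroadcastCost` and both methods are hypothetical, and the "flat" scheme (the root sends directly to every node) is only one naive baseline chosen for comparison.

```java
// Hypothetical cost model: counts only messages that cross a wide-area link.
class BroadcastCost {
    // Flat broadcast (assumed baseline): the root, located in cluster 0,
    // sends one message to every node in every other cluster.
    static int flatWideAreaMessages(int[] clusterSizes) {
        int total = 0;
        for (int i = 1; i < clusterSizes.length; i++) {
            total += clusterSizes[i];
        }
        return total;
    }

    // Cluster-aware broadcast: the root sends one message per remote cluster
    // to a local coordinator, which forwards it over fast local links.
    static int clusterAwareWideAreaMessages(int[] clusterSizes) {
        return clusterSizes.length - 1;
    }
}
```

With cluster sizes of 72, 32, 32, 32 and 32 nodes, the flat scheme sends 128 wide-area messages and the cluster-aware scheme sends 4.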
Java programming
  • Java uses bytecode and is very portable
    • “Write once, run anywhere”
  • Can be used to handle heterogeneity
  • Many systems now have Java interfaces:
    • Globus (Globus Java Commodity Grid)
    • MPI (MPIJava, MPJ, …)
    • Gridlab Application Toolkit (Java GAT)
  • Ibis is a Java-centric grid programming system
Parameter sweep applications
  • Computations that are mostly independent
    • E.g. run same simulation many times with different parameters
  • Can tolerate high network latencies, can easily be made fault-tolerant
  • Many systems use this type of trivial parallelism to harness idle desktops
    • APST, SETI@home, Entropia, XtremWeb
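
A parameter sweep can be sketched on a single machine with a thread pool standing in for grid workers. This is only an illustration of the independence property, not how APST or the other systems work: `ParameterSweep`, `simulate` and `sweep` are hypothetical names, and `simulate` is a placeholder for a real simulation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParameterSweep {
    // Placeholder for one simulation run (assumption: a pure function of its parameter).
    static double simulate(double p) {
        return p * p;
    }

    // Runs simulate() once per parameter on a local thread pool and
    // returns the results in the same order as the input parameters.
    static List<Double> sweep(List<Double> params, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (double p : params) {
                Callable<Double> task = () -> simulate(p);
                futures.add(pool.submit(task));
            }
            List<Double> results = new ArrayList<>();
            for (Future<Double> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because no task depends on another, a failed run can simply be resubmitted, which is what makes this style easy to make fault-tolerant.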
Workflow applications
  • Link and compose diverse software tools and data formats
    • Connect modules and data-filters
  • Results in coarse-grain, dataflow-like parallelism that can be run efficiently on a grid
  • Several workflow management systems exist
    • E.g. Virtual Lab Amsterdam (predecessor of VL-e)
Portals
  • Graphical interfaces to the grid
  • Often application-specific
  • Also portals for resource brokering, file transfers, etc.
Outline
  • Grids
  • Application-level tools for grids
  • Parallel programming on grids
  • Case study: Ibis
Distributed supercomputing
  • Parallel processing on geographically distributed computing systems (grids)
  • Examples:
    • SETI@home, RSA-155, Entropia, Cactus
  • Currently limited to trivially parallel applications
  • Questions:
    • Can we generalize this to more HPC applications?
    • What high-level programming support is needed?
Grids versus supercomputers
  • Performance/scalability
    • Speedups on geographically distributed systems?
  • Heterogeneity
    • Different types of processors, operating systems, etc.
    • Different networks (Ethernet, Myrinet, WANs)
  • General grid issues
    • Resource management, co-allocation, firewalls, security, authorization, accounting, ….
Approaches
  • Performance/scalability
    • Exploit hierarchical structure of grids (previous project: Albatross)
  • Heterogeneity
    • Use Java + JVM (Java Virtual Machine) technology
  • General grid issues
    • Studied in many grid projects: VL-e, GridLab, GGF
Speedups on a grid?
  • Grids usually are hierarchical
    • Collections of clusters, supercomputers
    • Fast local links, slow wide-area links
  • Can optimize algorithms to exploit this hierarchy
    • Minimize wide-area communication
  • Successful for many applications
    • Did many experiments on a homogeneous wide-area test bed (DAS)
Example: N-body simulation
  • Much wide-area communication
    • Each node needs info about remote bodies

  • [Figure: two sites, Amsterdam and Delft, each with CPU 1 and CPU 2]

Trivial optimization

  • [Figure: two sites, Amsterdam and Delft, each with CPU 1 and CPU 2]

Wide-area optimizations
  • Message combining on wide-area links
  • Latency hiding on wide-area links
  • Collective operations for wide-area systems
    • Broadcast, reduction, all-to-all exchange
  • Load balancing

Conclusions:

    • Many applications can be optimized to run efficiently on a hierarchical wide-area system
    • Need better programming support
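
Message combining, the first optimization listed above, can be sketched as a small buffer in front of a wide-area link. This is a minimal illustration under stated assumptions, not code from any of the systems mentioned: `MessageCombiner` and its methods are hypothetical, and the actual network write is replaced by a counter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: buffers small messages and ships them as one wide-area send.
class MessageCombiner {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private int wideAreaSends = 0;

    MessageCombiner(int batchSize) {
        this.batchSize = batchSize;
    }

    // Queues a message; flushes automatically once batchSize messages are buffered.
    void send(String message) {
        buffer.add(message);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Ships all buffered messages as one combined wide-area send.
    // A real system would write the batch to the network here (assumption: omitted).
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        wideAreaSends++;
        buffer.clear();
    }

    int wideAreaSends() {
        return wideAreaSends;
    }
}
```

Sending 10 messages with a batch size of 4 and then calling `flush()` results in 3 wide-area sends instead of 10, at the cost of extra delay for the buffered messages.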
The Ibis system
  • High-level & efficient programming support for distributed supercomputing on heterogeneous grids
  • Use Java-centric approach + JVM technology
    • Inherently more portable than native compilation
      • “Write once, run anywhere”
    • Requires entire system to be written in pure Java
  • Optimized special-case solutions with native code
    • E.g. native communication libraries
Ibis programming support
  • Ibis provides
    • Remote Method Invocation (RMI)
    • Replicated objects (RepMI) - as in Orca
    • Group/collective communication (GMI) - as in MPI
    • Divide & conquer (Satin) - as in Cilk
  • All integrated in a clean, object-oriented way into Java, using special “marker” interfaces
    • Invoking native library (e.g. MPI) would give up Java’s “run anywhere” portability
Compiling/optimizing programs

  • [Figure: compilation pipeline. Labels: source, Java compiler, bytecode, bytecode rewriter, bytecode, JVM]

  • Optimizations are done by bytecode rewriting
    • E.g. compiler-generated serialization (as in Manta)
Satin: a parallel divide-and-conquer system on top of Ibis
  • Divide-and-conquer is inherently hierarchical
  • More general than master/worker
  • Satin: Cilk-like primitives (spawn/sync) in Java
  • New load balancing algorithm (CRS)
    • Cluster-aware random work stealing
Example

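interface FibInter {
    // Parameter type matches the implementation below (int, not long).
    public int fib(int n);
}

class Fib implements FibInter {
    // Must be public because it implements an interface method.
    public int fib(int n) {
        if (n < 2) return n;
        return fib(n - 1) + fib(n - 2);
    }
}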

Single-threaded Java


Example

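// Methods declared in an interface that extends Spawnable can be spawned.
interface FibInter extends ibis.satin.Spawnable {
    public int fib(int n);
}

class Fib extends ibis.satin.SatinObject implements FibInter {
    public int fib(int n) {
        if (n < 2) return n;
        int x = fib(n - 1);   // spawned call
        int y = fib(n - 2);   // spawned call
        sync();               // wait until x and y are available
        return x + y;
    }
}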

Java + divide&conquer
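
Satin needs the Ibis libraries and its bytecode rewriter, so the example above cannot be run on a plain JVM. As a rough analogue of the spawn/sync pattern, the sketch below uses the standard `java.util.concurrent` Fork/Join framework. Note the assumption: Fork/Join is a different technique that runs on shared memory within a single JVM, whereas Satin distributes spawned work across machines. The class name `ForkJoinFib` is hypothetical.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Single-JVM analogue of spawn/sync; not Satin and not distributed.
class ForkJoinFib extends RecursiveTask<Integer> {
    private final int n;

    ForkJoinFib(int n) {
        this.n = n;
    }

    @Override
    protected Integer compute() {
        if (n < 2) return n;
        ForkJoinFib left = new ForkJoinFib(n - 1);
        left.fork();                                // roughly "spawn": may run on another thread
        int y = new ForkJoinFib(n - 2).compute();   // computed by the current thread
        int x = left.join();                        // roughly "sync": wait for the forked task
        return x + y;
    }

    static int fib(int n) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinFib(n));
    }
}
```

Calling `ForkJoinFib.fib(10)` returns 55. As in the Satin version, the recursive structure of the sequential program is kept, and only the spawn and sync points are marked.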

Ibis implementation
  • Want to exploit Java’s “run anywhere” property, but
    • That requires a 100% pure Java implementation, without a single line of native code
    • Hard to use native communication (e.g. Myrinet) or native compiler/runtime system
  • Ibis approach:
    • Reasonably efficient pure Java solution (for any JVM)
    • Optimized solutions with native code for special cases
Status and current research
  • Programming models
    • ProActive (mobility)
    • Satin (divide&conquer)
  • Applications
    • Jem3D (ProActive)
    • Barnes-Hut, satisfiability solver (Satin)
    • Spectrometry, sequence alignment (RMI)
  • Communication
    • WAN-communication, performance-awareness
Challenges
  • How to make the system flexible enough
    • Run seamlessly on different hardware / protocols
  • Make the pure-Java solution efficient enough
    • Need fast local communication even for grid applications
  • Special-case optimizations
Fast communication in pure Java
  • Manta system [ACM TOPLAS Nov. 2001]
    • RMI at RPC speed, but using native compiler & RTS
  • Ibis does similar optimizations, but in pure Java
    • Compiler-generated serialization at bytecode level
      • 5-9x faster than using runtime type inspection
    • Reduce copying overhead
      • Zero-copy native implementation for primitive arrays
      • Pure-Java requires type-conversion (=copy) to bytes
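
The last point can be illustrated with standard `java.nio`: in pure Java, an `int[]` has to be converted, and therefore copied, into bytes before it can be written to a byte-oriented stream. This is a minimal sketch of that conversion, not Ibis’s actual serialization code; `IntArrayCodec` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;

// Hypothetical helper showing the copy that pure-Java type conversion implies.
class IntArrayCodec {
    // Copies every int into a freshly allocated byte array (big-endian, ByteBuffer's default).
    static byte[] toBytes(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES);
        buf.asIntBuffer().put(values);
        return buf.array();
    }

    // Reverse conversion on the receiving side; this is a second copy.
    static int[] fromBytes(byte[] bytes) {
        int[] out = new int[bytes.length / Integer.BYTES];
        ByteBuffer.wrap(bytes).asIntBuffer().get(out);
        return out;
    }
}
```

A native implementation can avoid both copies by handing the array’s memory directly to the network layer, which is the zero-copy case mentioned above.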
Grid experiences with Ibis
  • Using Satin divide-and-conquer system
    • Implemented with Ibis in pure Java, using TCP/IP
  • Application measurements on
    • DAS-2 (homogeneous)
    • Testbed from EC GridLab project (heterogeneous)
Distributed ASCI Supercomputer (DAS) 2
  • [Figure: five clusters connected by GigaPort (1-10 Gb)]
    • VU (72 nodes)
    • UvA (32)
    • Leiden (32)
    • Delft (32)
    • Utrecht (32)
Satin on GridLab
  • Heterogeneous European grid testbed
  • Implemented Satin/Ibis on GridLab, using TCP
  • Source: van Nieuwpoort et al., AGRIDM’03 (Workshop on Adaptive Grid Middleware, New Orleans, Sept. 2003)
GridLab
  • Latencies:
    • 9-200 ms (daytime), 9-66 ms (night)
  • Bandwidths:
    • 9-4000 KB/s
Experiences
  • No support for co-allocation yet (done manually)
  • Firewall problems everywhere
    • Currently: use a range of site-specific open ports
    • Can be solved with TCP splicing
  • Java indeed runs anywhere, modulo bugs in (old) JVMs
  • Need clever load balancing mechanism (CRS)
Cluster-aware Random Stealing
  • Use Cilk’s Random Stealing (RS) inside cluster
  • When idle
    • Send asynchronous wide-area steal message
    • Do random steals locally, and execute stolen jobs
    • Only 1 wide-area steal attempt in progress at a time
  • Prefetching adapts
    • More idle nodes → more prefetching
  • Source: van Nieuwpoort et al., ACM PPoPP’01
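
The victim-selection rules above can be sketched as a small state machine per node. This is a minimal illustration based only on the bullets in this slide, not the Satin implementation: `CrsNode`, its methods and the use of `-1` as a "no request" marker are all hypothetical, and the actual messaging and job queues are omitted.

```java
import java.util.Random;

// Hypothetical sketch of the victim-selection side of cluster-aware random stealing.
class CrsNode {
    private final int[] localPeers;    // node ids in the same cluster
    private final int[] remotePeers;   // node ids in other clusters
    private final Random rng;
    private boolean wideAreaStealPending = false;

    CrsNode(int[] localPeers, int[] remotePeers, long seed) {
        this.localPeers = localPeers.clone();
        this.remotePeers = remotePeers.clone();
        this.rng = new Random(seed);
    }

    // Called when the node becomes idle. Returns the remote node to which an
    // asynchronous steal request should be sent, or -1 if one is already in progress.
    int startWideAreaSteal() {
        if (wideAreaStealPending) {
            return -1;
        }
        wideAreaStealPending = true;
        return remotePeers[rng.nextInt(remotePeers.length)];
    }

    // While the wide-area request is outstanding, the node keeps stealing locally.
    int pickLocalVictim() {
        return localPeers[rng.nextInt(localPeers.length)];
    }

    // Called when the wide-area reply (a job or a failure) arrives.
    void wideAreaReplyArrived() {
        wideAreaStealPending = false;
    }
}
```

The `wideAreaStealPending` flag enforces the rule that only one wide-area steal attempt is in progress at a time, so slow wide-area round trips overlap with local work instead of blocking it.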
Performance on GridLab
  • Problem: how to define efficiency on a grid?
  • Our approach:
    • Benchmark each CPU with Raytracer on small input
    • Normalize CPU speeds (relative to a DAS-2 node)
    • Our case: 40 CPUs, equivalent to 24.7 DAS-2 nodes
    • Define:
      • T_perfect = sequential time / 24.7
      • efficiency = T_perfect / actual runtime
    • Also compare against single 25-node DAS-2 cluster
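
The efficiency definition above can be written out as a short calculation. In this sketch only the figure of 24.7 normalized nodes comes from the slide; the class `GridEfficiency`, its method names and the run times used in the usage note are hypothetical and chosen purely for illustration.

```java
// Hypothetical helper implementing the definitions from the slide.
class GridEfficiency {
    // Sum of CPU speeds, each expressed relative to one reference (DAS-2) node.
    static double normalizedNodes(double[] relativeSpeeds) {
        double sum = 0.0;
        for (double s : relativeSpeeds) {
            sum += s;
        }
        return sum;
    }

    // T_perfect = sequential time / normalized node count
    static double perfectTime(double sequentialSeconds, double normalizedNodes) {
        return sequentialSeconds / normalizedNodes;
    }

    // efficiency = T_perfect / actual runtime
    static double efficiency(double sequentialSeconds, double normalizedNodes, double actualSeconds) {
        return perfectTime(sequentialSeconds, normalizedNodes) / actualSeconds;
    }
}
```

For example, with a made-up sequential time of 2470 s on one reference node and 24.7 normalized nodes, T_perfect is 100 s; a made-up actual runtime of 125 s would then give an efficiency of 0.8.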
Results for Raytracer

RS = Random Stealing, CRS = Cluster-aware RS

Some statistics
  • Variations in execution times:
    • RS @ day: 0.5 - 1 hour
    • CRS @ day: less than 20 secs variation
  • Internet communication (total):
    • RS: 11,000 (night) - 150,000 (day) messages; 137 (night) - 154 (day) MB
    • CRS: 10,000 - 11,000 messages; 82 - 86 MB
Summary
  • Parallel computing on Grids (distributed supercomputing) is a challenging and promising research area
  • Many grid programming environments exist
  • Ibis: a Java-centric Grid programming environment
    • Portable (“run anywhere”) and efficient
  • Future work: Virtual Laboratory for e-Science (VL-e)
    • Government-funded (20 M.Euro) Dutch project