FutureGrid Status Report
Steven Hand (steven.hand@cl.cam.ac.uk)
Joint project with Jon Crowcroft (CUCL), Tim Harris (CUCL), Ian Pratt (CUCL), Andrew Herbert (MSR), Andy Parker (CeSC)
Grid systems architecture
• Common themes of self-organization and distribution
• Four motivating application areas:
  1. Massively-scalable middleware
  2. Advanced resource location mechanisms
  3. Automatic s/w replication and distribution
  4. Global data storage and publishing
• Experimental test beds (PlanetLab, UK eScience centres / access grid / JANET)
Common Techniques
• P2P (DHT) layer for distribution:
  • Using the Bamboo routing substrate (Intel Research) for passing messages between peers
  • Provides a fault-tolerant and scalable overlay network
  • Can route a message to any node in an n-node network in O(log n) hops, with O(log n) routing state at each node
• Location/Distance Service:
  • Basic idea: a Euclidean co-ordinate space for the Internet
  • Using PCA + lighthouse/virtual landmark techniques (see the sketch below)
  • Running PlanetLab measurements looking at sensitivity to the number of dimensions, etc.
  • Building a "plug-in" service for Bamboo neighbour selection
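The slides give no code for the location service; the following is a minimal sketch of the virtual-landmark idea, assuming each node has already measured RTTs to a common landmark set. The `embed` and `estimated_distance` helpers and the toy RTT matrix are illustrative assumptions, not part of the project.

```python
# Hedged sketch: landmark-based network coordinates with PCA reduction.
# Assumption (not from the slides): RTTs to k landmarks are pre-measured.
import numpy as np

def embed(rtts: np.ndarray, dims: int = 3) -> np.ndarray:
    """Map each node's vector of RTTs to the landmarks into a low-
    dimensional Euclidean coordinate via PCA (virtual landmarks)."""
    centred = rtts - rtts.mean(axis=0)           # centre per landmark
    # Principal axes = top right-singular vectors of the centred matrix.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:dims].T                 # project onto top axes

def estimated_distance(coords: np.ndarray, a: int, b: int) -> float:
    """Predict network distance as Euclidean distance in the embedding."""
    return float(np.linalg.norm(coords[a] - coords[b]))

# Toy usage: 5 nodes x 4 landmarks of measured RTTs (milliseconds).
rtts = np.array([[10, 40, 80, 30],
                 [12, 42, 78, 29],
                 [90, 15, 20, 70],
                 [88, 18, 22, 72],
                 [50, 50, 50, 50]], dtype=float)
coords = embed(rtts, dims=2)
print(estimated_distance(coords, 0, 1))   # nearby nodes -> small value
print(estimated_distance(coords, 0, 2))   # distant nodes -> larger value
```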
1. P2P Group Communication
• Built on Bamboo and the location service:
  • Use the location service (co-ordinates) to pick the best forwarding route
  • Use the RPF (Scribe) algorithm to build the tree (see the sketch below)
  • Tree delay is at most 2x that of the native IP multicast tree
  • Can build per-source trees, or a "centered" tree, based on group density, number of senders, number of receivers, ...
• Current status:
  • General system deployed and under test on PlanetLab
  • Whiteboard demo program works on top of this
• Next steps:
  • IP multicast tunnels across multicast-incapable 'chasms'
  • P2P overlay for vic/rat/access grid anticipated end of '04
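A minimal sketch of the RPF (Scribe-style) tree-building step: each subscriber routes a JOIN toward the group's root key, and every hop on the path adopts the previous hop as a child. The hard-coded node names and ROUTE table below are a hypothetical stand-in for Bamboo's key-based routing.

```python
# Hedged sketch of Scribe-style reverse-path-forwarding tree building.
children: dict[str, set[str]] = {}   # node -> children in the group tree

# Toy 5-node overlay: ROUTE[n] is n's next hop toward the group's root
# (the node whose id is numerically closest to hash(group)); E is root.
ROUTE = {"A": "C", "B": "C", "C": "E", "D": "E", "E": None}

def join(subscriber: str) -> None:
    """Route a JOIN toward the root; every hop on the path adopts the
    previous hop as a child, stopping once it is already in the tree."""
    node, prev = subscriber, None
    while node is not None:
        grafted = node in children          # already part of the tree?
        children.setdefault(node, set())
        if prev is not None:
            children[node].add(prev)
        if grafted:
            return                          # no need to route further
        prev, node = node, ROUTE[node]

def multicast(node: str, msg: str) -> None:
    """Deliver locally, then forward down the tree (a send, in practice)."""
    print(f"{node} received: {msg}")
    for child in children.get(node, ()):
        multicast(child, msg)

join("A"); join("B"); join("D")
multicast("E", "hello group")   # root fan-out follows the built tree
```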
2. Distributed resource location
1. Determine machine locations and resource availability
2. Translate to locations in a multi-dimensional search space
3. Partition/replicate the search space
4. Queries select portions of the search space
(A sketch of these four steps follows.)
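A minimal sketch of the four steps, assuming a uniform grid partition of the search space and hash-based cell ownership; the project's actual spatial index and partitioning scheme are not specified in the slides, so the `CELL`, `owner`, and attribute choices below are illustrative.

```python
# Hedged sketch: map each machine's measured attributes to a point in
# a multi-dimensional space, partition that space into grid cells
# owned by peers, and answer range queries cell by cell.
import hashlib
from itertools import product

CELL = 0.25   # side length of each grid cell in the unit hypercube

def cell_of(point):
    """Step 3: the grid cell containing a point."""
    return tuple(int(x / CELL) for x in point)

def owner(cell):
    """Assign each cell to a peer id by hashing (stand-in for the DHT)."""
    return hashlib.sha1(repr(cell).encode()).hexdigest()[:8]

# Steps 1-2: machines described as (cpu_free, mem_free) in [0, 1).
machines = {"m1": (0.9, 0.1), "m2": (0.3, 0.8), "m3": (0.85, 0.7)}
index = {}
for name, pt in machines.items():
    index.setdefault(cell_of(pt), []).append(name)

def range_query(lo, hi):
    """Step 4: yield machines whose point lies inside the box [lo, hi]."""
    cells = product(*(range(int(l / CELL), int(h / CELL) + 1)
                      for l, h in zip(lo, hi)))
    for cell in cells:
        # In a deployment this visit is an RPC to peer owner(cell).
        for name in index.get(cell, []):
            pt = machines[name]
            if all(l <= x <= h for l, x, h in zip(lo, pt, hi)):
                yield name

# Machines with >= 80% CPU free and >= 50% memory free:
print(list(range_query((0.8, 0.5), (1.0, 1.0))))   # -> ['m3']
```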
Current Focus
• Location-based resource co-allocation
  • Wish to choose a subset of available nodes according to resource availability and location
  • First filter the candidate set, then use a heuristic to solve constrained problems of the form far(near(S1,S2), near(S3,S4), C1)
• System built around a P2P spatial index
• Three-phase algorithm (phase 2 is sketched below):
  1. Find an approximate solution in terms of clusters
  2. Use simulated annealing to minimize the associated cost
  3. Select representative machine(s) for each cluster
• Results close to 'brute force' (average 10% error)
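A minimal sketch of the annealing phase, assuming an illustrative cost function that penalises distance within `near` pairs and rewards it between the two groups; the real constraint costs, cluster model, and move set are the project's own and are not given in the slides.

```python
# Hedged sketch of phase 2: simulated annealing over assignments of
# services S1..S4 to clusters, minimizing an illustrative cost for
# far(near(S1,S2), near(S3,S4)).
import math, random

clusters = {"c1": (0, 0), "c2": (0, 5), "c3": (9, 9)}   # cluster coords

def dist(a, b):
    (xa, ya), (xb, yb) = clusters[a], clusters[b]
    return math.hypot(xa - xb, ya - yb)

def cost(assign):
    """Near pairs pay for distance; the two groups earn credit for it."""
    return (dist(assign["S1"], assign["S2"])
            + dist(assign["S3"], assign["S4"])
            - dist(assign["S1"], assign["S3"]))

def anneal(steps=10_000, temp=10.0, cooling=0.999):
    assign = {s: random.choice(list(clusters))
              for s in ("S1", "S2", "S3", "S4")}
    cur = cost(assign)
    best, best_cost = dict(assign), cur
    for _ in range(steps):
        s = random.choice(list(assign))        # perturb one service
        old = assign[s]
        assign[s] = random.choice(list(clusters))
        new = cost(assign)
        # Accept improvements always; worsenings with probability
        # exp(-delta/temp), so early moves can escape local minima.
        if new < cur or random.random() < math.exp((cur - new) / temp):
            cur = new
            if new < best_cost:
                best, best_cost = dict(assign), new
        else:
            assign[s] = old                    # reject the move
        temp *= cooling
    return best, best_cost

print(anneal())   # e.g. S1,S2 co-located, S3,S4 co-located, groups apart
```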
3. P2P Computing
• Attempt to use P2P communication principles to gain similar benefits for grid computing
• Proceeding on three axes, targeting core computation, bioinformatics, and batch workloads
• Algorithm-specific load tolerance:
  • Want to allow decentralized, independent load shedding
  • Client submits a parallel computation to M > N nodes such that any N results suffice to produce a 'correct' result (see the sketch below)
  • The general case is intractable, so the focus is on algorithm-specific solutions
  • Current focus on matrix operations using erasure codes
  • Also considering sketches as an approximation technique
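A minimal sketch of the any-N-of-M idea for matrix operations, using a single parity block, the simplest erasure code; the slides do not say which code the project actually uses. Because y = Ax is linear, the parity of the input blocks is the parity of the output blocks, so one lost result can be reconstructed.

```python
# Hedged sketch: split A into N row-blocks, add one parity block
# (their sum), farm out all M = N + 1 tasks, and reconstruct the
# answer even if one result never comes back.
import numpy as np

N = 3
rng = np.random.default_rng(0)
A = rng.integers(0, 10, size=(6, 4)).astype(float)
x = rng.integers(0, 10, size=4).astype(float)

blocks = np.split(A, N)                 # N row-blocks of A
tasks = blocks + [sum(blocks)]          # M = N + 1 tasks (last = parity)

# Each node computes its block's product; simulate node 1 being shed.
results = [blk @ x for blk in tasks]
results[1] = None                       # lost / load-shed result

# Any N of the M results suffice: recover the missing block's product
# as the parity result minus the sum of the surviving data results.
missing = next(i for i, r in enumerate(results) if r is None)
if missing < N:
    others = [r for i, r in enumerate(results[:N]) if i != missing]
    results[missing] = results[N] - sum(others)

y = np.concatenate(results[:N])
assert np.allclose(y, A @ x)            # matches the direct product
print(y)
```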
P2P Computing (2)
• Indexing genomic sequences (3 × 10⁹ characters)
  • Based on suffix array indexes; supports string matching, motif detection, sequence alignments, etc. (see the sketch below)
  • Smaller memory requirements than state-of-the-art suffix trees
  • Distributed on-line construction using the P2P overlay
  • Caching issues (memory and swap) need investigation
• Batch-aware 'spread spectrum' storage
  • Observe that many batches share considerable data
  • Want to encourage client-driven distribution of data, but avoid centralized quotas and pathological storage use
  • Use the Palimpsest P2P storage system with 'soft guarantees'
  • Data is discarded under load, so it needs periodic refresh to persist
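A minimal sketch of suffix-array string matching; the naive O(n² log n) construction is fine for illustration, but genome-scale and distributed on-line construction need the specialised algorithms the slide alludes to. Requires Python 3.10+ for `bisect`'s `key` parameter.

```python
# Hedged sketch: suffix array + binary search for pattern matching.
import bisect

def build_suffix_array(text: str) -> list[int]:
    """All suffix start positions, sorted by the suffixes themselves."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find(text: str, sa: list[int], pattern: str) -> list[int]:
    """Positions where pattern occurs, via binary search over suffixes.
    The '\uffff' sentinel bounds suffixes prefixed by the pattern."""
    lo = bisect.bisect_left(sa, pattern, key=lambda i: text[i:])
    hi = bisect.bisect_right(sa, pattern + "\uffff",
                             key=lambda i: text[i:])
    return sorted(sa[lo:hi])

genome = "GATTACAGATTTACA"
sa = build_suffix_array(genome)
print(find(genome, sa, "ATT"))   # -> [1, 8]
```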
4. Global Data Storage
• Global-scale distributed file system
  • Mutability; shared directories; random access
  • Data permanence, quotas
  • Aggressive, localized caching in proportion to demand, while maintaining coherence
• Storage Nodes
  • Confederated, well connected, relatively stable
  • Each offers multiples of a unit of storage in return for quota it can distribute amongst its users
• Clients
  • Access via the nearest Storage Node
Basic Storage Technique
• Immutable data blocks, mutable index blocks
• Block Id is H(contents), or H(public key) for index blocks
• Insert a block by using Bamboo to route it to the node whose Id is nearest to the Id of the block (see the sketch below)
• Maintain replicas on adjacent nodes for redundancy
• Send storage vouchers to the user's accountant nodes
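A minimal sketch of content-addressed insertion with replication, using successor placement on a sorted Id ring as a stand-in for Bamboo's numeric-closeness routing; the 16-node ring and `REPLICAS` value are assumptions.

```python
# Hedged sketch: Id = H(contents); store at the node whose Id follows
# the block's Id on the ring, plus replicas on adjacent successors.
import bisect, hashlib

NODE_IDS = sorted(int(hashlib.sha1(f"node{i}".encode()).hexdigest(), 16)
                  for i in range(16))
REPLICAS = 3
store: dict[int, dict[int, bytes]] = {n: {} for n in NODE_IDS}

def block_id(contents: bytes) -> int:
    return int(hashlib.sha1(contents).hexdigest(), 16)  # Id = H(contents)

def insert(contents: bytes) -> int:
    bid = block_id(contents)
    # First node at or after the Id (wrapping), then its successors.
    i = bisect.bisect_left(NODE_IDS, bid) % len(NODE_IDS)
    for k in range(REPLICAS):
        node = NODE_IDS[(i + k) % len(NODE_IDS)]
        store[node][bid] = contents
    return bid

bid = insert(b"immutable data block")
print(sum(bid in blocks for blocks in store.values()))   # -> 3 replicas
```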
Content-based chunking
• Clients split each file with a content-based hash
  • Rabin fingerprint over a 48-byte sliding window (see the sketch below)
• Blocks with the same Id are reference counted
• Similar files share blocks
  • Reduces storage requirements
  • Improves caching performance
[Figure: read/write example over blocks B1-B7. A write that changes B2, B3 and B6 inserts new blocks B8 and B9 and withdraws the old ones; unchanged blocks such as B7 stay in place, and reads simply return the stored data]
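A minimal sketch of content-defined chunking with a rolling hash over a 48-byte window. A true Rabin fingerprint works over GF(2), so the polynomial rolling hash, boundary mask, and chunk-size parameters here are illustrative stand-ins; the boundary rule still depends only on content, which is what makes edits local.

```python
# Hedged sketch: declare a chunk boundary wherever the 48-byte window
# hash has its low bits all zero, so boundaries survive insertions.
import hashlib, random

WINDOW, BASE, MOD = 48, 257, (1 << 61) - 1
MASK = (1 << 11) - 1              # ~2 KiB average chunk size (assumed)
POW = pow(BASE, WINDOW - 1, MOD)  # for removing the outgoing byte

def chunks(data: bytes):
    h, start = 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * POW * BASE) % MOD  # slide window
        if i + 1 - start >= WINDOW and (h & MASK) == 0:
            yield data[start:i + 1]     # content-defined boundary
            start = i + 1
    if start < len(data):
        yield data[start:]              # trailing chunk

random.seed(1)
data = bytes(random.randrange(256) for _ in range(200_000))
ids = [hashlib.sha1(c).hexdigest() for c in chunks(data)]

# A small local edit: almost all block Ids are shared with the original.
data2 = data[:50_000] + b"EDIT" + data[50_000:]
ids2 = {hashlib.sha1(c).hexdigest() for c in chunks(data2)}
print(len(ids), len(ids2 & set(ids)))
```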
Summary and Future Work
• Attempt to push towards a 'Future GRID'
• Four 'strands' with common themes and (some) common infrastructure:
  • Group communication, resource co-allocation, load-flexible computing, global distributed storage
• All four strands making progress:
  • Early papers / tech reports in all cases
  • Bamboo and the location service deployed and under test
• Next steps include:
  • Move PlanetLab experiments to UK eScience infrastructure
  • Analysis and test of prototype designs/software
Caching
• Data is either returned directly, or via the previous hop if the block is "hot"
• Cached copies are "drawn out" from the primary store toward requestors
• Exploits local route convergence (see the sketch below)
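A minimal sketch of draw-out caching along a converged route: each node on the path counts the hits it serves, and once a block is hot the copy migrates one hop closer to the requestors. The hotness threshold, hit counting, and route representation are all assumptions; the slides do not give the actual policy.

```python
# Hedged sketch of draw-out caching on a converged overlay route.
HOT = 2   # hits before a copy is drawn out one hop (assumed threshold)

class Node:
    def __init__(self, name):
        self.name, self.store, self.hits = name, {}, {}

    def lookup(self, bid, route):
        """`route` is the remaining hops toward the block's home node."""
        if bid in self.store:                # primary or cached copy
            self.hits[bid] = self.hits.get(bid, 0) + 1
            return self.store[bid], self.hits[bid] >= HOT, self.name
        data, hot, src = route[0].lookup(bid, route[1:])
        if hot:
            self.store[bid] = data           # drawn out one hop closer
        return data, False, src

a, b, home = Node("A"), Node("B"), Node("H")
home.store["blk"] = b"payload"
for _ in range(5):
    _, _, served_by = a.lookup("blk", [b, home])
    print(served_by)   # H, H, B, B, A: the copy migrates toward A
```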
Mutable index blocks
• May describe an arbitrary file hierarchy
• Index block has an associated keypair (eFS, dFS)
• Insert an index block using the hash of the public key as its Id
• Authenticate an update by signing the insertion voucher using the private key (see the sketch below)
  • Voucher: H(blk), repl_factor, eFS, dFS
• May link to other index blocks
  • Merge contents
  • Organise according to access/update patterns

Example index block:
  <folder name="tim">
    <file name="hello.txt">
      <blocklist> <block o="234"> ...id... </block> </blocklist>
    </file>
    <folder name="bar">
      <file name="hello2.txt">
        <blocklist> ...id... </blocklist>
      </file>
    </folder>
    <overlay> <index path="."> ...id... </index> </overlay>
  </folder>
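A minimal sketch of authenticated index-block insertion, assuming Ed25519 signatures via the `cryptography` package and an ad-hoc voucher encoding; the slides only say that each index block has an associated keypair (eFS, dFS) and that its Id is the hash of the public key.

```python
# Hedged sketch: Id = H(public key); updates are accepted only when
# the voucher's signature verifies under that public key.
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey)
from cryptography.hazmat.primitives.serialization import (
    Encoding, PublicFormat)

private_key = Ed25519PrivateKey.generate()
public_bytes = private_key.public_key().public_bytes(
    Encoding.Raw, PublicFormat.Raw)

index_id = hashlib.sha1(public_bytes).hexdigest()   # Id = H(public key)

block = b"<folder name='tim'> ... </folder>"
# Illustrative voucher encoding: H(blk), replication factor, public key.
voucher = hashlib.sha1(block).digest() + b"|repl=3|" + public_bytes
signature = private_key.sign(voucher)

# A storage node accepts the update iff the signature verifies under
# the public key whose hash matches the index block's Id.
assert hashlib.sha1(public_bytes).hexdigest() == index_id
private_key.public_key().verify(signature, voucher)  # raises if invalid
print("update accepted for index", index_id[:12])
```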
Shared file spaces
• Users can only update their own index blocks
• Sharing works through overlaying:
  • Import another user's name space, modify it, re-export
  • Copy-on-write overlay (sketched below)
  • Active delete markers
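A minimal sketch of a copy-on-write overlay with active delete markers: reads fall through to the imported name space, writes land only in the overlay, and a tombstone hides a base entry without touching the owner's index blocks. The dict-based name space and tombstone representation are illustrative.

```python
# Hedged sketch of an overlaid, copy-on-write shared file space.
TOMBSTONE = object()   # active delete marker

class Overlay:
    def __init__(self, base: dict):
        self.base, self.delta = base, {}      # base is read-only to us

    def read(self, path):
        if path in self.delta:
            entry = self.delta[path]
            if entry is TOMBSTONE:
                raise KeyError(path)          # deleted in this overlay
            return entry
        return self.base[path]                # fall through to import

    def write(self, path, block_id):
        self.delta[path] = block_id           # copy-on-write: base untouched

    def delete(self, path):
        self.delta[path] = TOMBSTONE          # hide without touching base

tims_space = {"/hello.txt": "id-123", "/bar/hello2.txt": "id-456"}
mine = Overlay(tims_space)                    # import tim's name space
mine.write("/hello.txt", "id-789")            # my modified copy
mine.delete("/bar/hello2.txt")                # hidden only for me
print(mine.read("/hello.txt"))                # -> id-789
print(tims_space["/hello.txt"])               # -> id-123 (unchanged)
```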