
Introduction to Grid Computing


Presentation Transcript


  1. Slides adapted from Midwest Grid School Workshop 2008 (https://twiki.grid.iu.edu/bin/view/Education/MWGS08). Introduction to Grid Computing

  2. Overview: Supercomputers, Clusters, and Grids; Motivations and Applications; Grid Middlewares and Architectures; Job Management and Data Management; Grid Security; National Grids; Grid Workflow

  3. Computing “Clusters” are today’s Supercomputers. A typical cluster comprises: a cluster-management “frontend”; a few head nodes, gatekeepers, and other service nodes; I/O servers (typically RAID fileservers); lots of worker nodes; disk arrays; and tape backup robots.

  4. Cluster Architecture (diagram). Cluster users reach the cluster over standard Internet protocols. The head node(s) provide login access (ssh), the cluster scheduler (PBS, Condor, SGE), web services (http), and remote file access (scp, FTP, etc.). A shared cluster filesystem holds applications and data, and job execution requests and status flow between the head node and the compute nodes (Node 0 … Node N: 10 to 10,000 PCs with local disks).
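To make the head-node/scheduler picture concrete, here is a minimal sketch (Python, not from the slides) of submitting a small batch job to a PBS-style scheduler from the head node; the job name, resource request, and use of qsub on stdin are illustrative assumptions, and a Condor or SGE site would use condor_submit or its own qsub dialect instead.

    #!/usr/bin/env python
    # Minimal sketch: submit a batch job to a PBS-style cluster scheduler
    # from the head node. Job name, resource request, and walltime are
    # illustrative assumptions, not values taken from the slides.
    import subprocess

    job_script = """#!/bin/bash
    #PBS -N hello-grid
    #PBS -l nodes=1:ppn=1,walltime=00:05:00
    cd $PBS_O_WORKDIR
    hostname
    date
    """

    # qsub reads the job script on stdin and prints the new job id.
    result = subprocess.run(["qsub"], input=job_script, text=True,
                            capture_output=True, check=True)
    print("Submitted job:", result.stdout.strip())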

  5. Scaling up Science: Citation Network Analysis in Sociology. Work of James Evans, University of Chicago, Department of Sociology.

  6. Scaling up the analysis • Query and analysis of 25+ million citations • Work started on desktop workstations • Queries grew to month-long duration • With data distributed across the U of Chicago TeraPort cluster, 50 (faster) CPUs gave a 100X speedup • Many more methods and hypotheses can be tested! • Higher throughput and capacity enable deeper analysis and broader community access.

  7. Grids consist of distributed clusters (diagram). A grid client (application and user interface plus client-side middleware) speaks grid protocols to resource, workflow, and data catalogs and to multiple grid sites (Grid Site 1: Fermilab; Grid Site 2: São Paulo; … Grid Site N: UWisconsin), each of which runs grid service middleware in front of its own compute cluster and grid storage.

  8. Initial Grid driver: High Energy Physics (tiered LHC computing model; image courtesy Harvey Newman, Caltech). The detector produces on the order of PBytes/sec; there is a “bunch crossing” every 25 nsecs and about 100 “triggers” per second, each triggered event being ~1 MByte, so roughly 100 MBytes/sec flows from the online system into the Tier 0 CERN Computer Centre and its ~20 TIPS offline processor farm (1 TIPS is approximately 25,000 SpecInt95 equivalents). Data moves at ~622 Mbits/sec (or by air freight, deprecated) to Tier 1 regional centres (FermiLab at ~4 TIPS, plus France, Germany, and Italy), on to Tier 2 centres of ~1 TIPS each (e.g. Caltech), with HPSS mass storage at the Tier 0/1/2 centres, then to institute servers (~0.25 TIPS) with physics data caches at ~1 MBytes/sec, and finally to Tier 4 physicist workstations. Physicists work on analysis “channels”; each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server.

  9. Grids Provide Global Resources To Enable e-Science

  10. Grids can process vast datasets • Many HEP and astronomy experiments consist of: large datasets as inputs (find datasets); “transformations” which work on the input datasets (process); and the output datasets (store and publish) • The emphasis is on sharing these large datasets • Workflows of independent programs can be parallelized. (Figure: mosaic of M42 created on TeraGrid; Montage workflow of ~1200 jobs in 7 levels; NVO, NASA, ISI/Pegasus, Deelman et al.)

  11. For Example: Digital Astronomy • Digital observatories provide online archives of data at different wavelengths • Ask questions such as: what objects are visible in the infrared but not in the visible spectrum?

  12. Virtual Organizations • Groups of organizations that use the Grid to share resources for specific purposes • Support a single community • Deploy compatible technology and agree on working policies • Agreeing on common security policies is often the hard part • Deploy different network-accessible services: Grid Information, Grid Resource Brokering, Grid Monitoring, Grid Accounting

  13. The Grid Middleware Stack (and course modules), from top to bottom: Grid Application (M5), often including a Portal; Workflow system, explicit or ad hoc (M6); Job Management (M2) and Data Management (M3); Grid Information Services (M5); Grid Security Infrastructure (M4); Core Globus Services (M1); Standard Network Protocols and Web Services (M1).

  14. Grids consist of distributed clusters (the diagram from slide 7, repeated). A grid client speaks grid protocols to resource, workflow, and data catalogs and to grid sites (Fermilab, São Paulo, … UWisconsin), each running grid service middleware in front of its compute cluster and grid storage.

  15. Globus and Condor play key roles • Globus Toolkit provides the base middleware • Client tools which you can use from a command line • APIs (scripting languages, C, C++, Java, …) to build your own tools, or use directly from applications • Web service interfaces • Higher-level tools built from these basic components, e.g. Reliable File Transfer (RFT) • Condor provides both client- and server-side scheduling • In grids, Condor provides an agent to queue, schedule and manage work submission

  16. Grid architecture is evolving to a service-oriented approach (diagram: users compose application services into workflows, which are invoked on provisioned infrastructure). • Service-oriented Grid infrastructure: provision physical resources to support application workloads (beyond this workshop’s scope; see “Service-Oriented Science” by Ian Foster) • Service-oriented applications: wrap applications as services and compose applications into workflows. From “The Many Faces of IT as Service”, Foster, Tuecke, 2005.

  17. Workshop Module 2: Job and Resource Management

  18. GRAM provides a uniform interface to diverse cluster schedulers (diagram). Users in different VOs submit jobs through GRAM on the Grid, and GRAM hands them to whatever local scheduler each site runs: LSF, Condor, PBS, or plain UNIX fork() at Sites A through D.
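As an illustration of that uniform interface, the sketch below (Python) wraps the classic pre-WS GRAM client globus-job-run; the gatekeeper contact string and command are hypothetical, and a valid proxy certificate is assumed to already exist. The point is that the same client call reaches a PBS, Condor, or LSF back end unchanged.

    # Sketch only: run a command on a remote site through pre-WS GRAM.
    # The contact string and command are hypothetical; a valid proxy
    # certificate (grid-proxy-init) is assumed to exist already.
    import subprocess

    def run_on_grid(contact, *argv):
        """Submit argv to the GRAM job manager named in the contact string."""
        cmd = ["globus-job-run", contact, *argv]
        return subprocess.run(cmd, capture_output=True, text=True,
                              check=True).stdout

    # ".../jobmanager-pbs" routes to the site's PBS queue; a site that
    # advertises ".../jobmanager-condor" would need no client-side change.
    print(run_on_grid("gatekeeper.example.edu/jobmanager-pbs", "/bin/hostname"))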

  19. DAGMan • Directed Acyclic Graph Manager • DAGMan allows you to specify the dependencies between your Condor jobs, so it can manage them automatically for you (e.g., “Don’t run job B until job A has completed successfully”).
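A minimal sketch of that dependency idea, assuming two ordinary Condor submit files a.sub and b.sub already exist: the Python below writes a two-node DAGMan input file and hands it to condor_submit_dag, which will not start B until A completes successfully.

    # Sketch: generate and submit a two-job DAG ("don't run B until A succeeds").
    # a.sub and b.sub are assumed to be ordinary Condor submit files.
    import subprocess

    dag_text = """\
    JOB A a.sub
    JOB B b.sub
    PARENT A CHILD B
    """

    with open("diamond.dag", "w") as f:
        f.write(dag_text)

    # condor_submit_dag runs DAGMan itself as a Condor job that manages A and B.
    subprocess.run(["condor_submit_dag", "diamond.dag"], check=True)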

  20. Workshop Module 3: Data Management

  21. Data management services provide the mechanisms to find, move and share data • GridFTP: fast, flexible, secure, ubiquitous data transport; often embedded in higher-level services • RFT: reliable file transfer service using GridFTP • Replica Location Service: tracks multiple copies of data for speed and reliability • Storage Resource Manager: manages storage space allocation, aggregation, and transfer • Metadata management services are evolving
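As a small example of the transport layer listed first above, this sketch (Python, placeholder URLs, valid grid proxy assumed) shells out to the globus-url-copy client to pull one file from a GridFTP server to local disk; RFT layers retries and persistent transfer state on top of the same protocol.

    # Sketch: fetch a file over GridFTP with the globus-url-copy client.
    # Source and destination URLs are placeholders; a grid proxy is assumed.
    import subprocess

    src = "gsiftp://gridftp.example.edu/data/run042/events.root"
    dst = "file:///tmp/events.root"

    # -p 4 opens four parallel TCP streams, one of the tricks GridFTP uses
    # to fill long fat network pipes.
    subprocess.run(["globus-url-copy", "-p", "4", src, dst], check=True)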

  22. Grids replicate data files for faster access • Effective use of the grid resources (more parallelism) • Each logical file can have multiple physical copies • Avoids single points of failure • Manual or automatic replication • Automatic replication considers the demand for a file, transfer bandwidth, etc.
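To show what replica selection might look like from a client’s point of view, here is a toy Python sketch; the logical-to-physical catalog and the per-site cost numbers are fabricated placeholders for what a Replica Location Service plus bandwidth monitoring would actually supply.

    # Toy sketch of replica selection. The catalog and cost estimates are
    # fabricated stand-ins for an RLS lookup plus network measurements.

    replica_catalog = {
        "lfn:m42_mosaic.fits": [
            "gsiftp://se1.example.edu/store/m42_mosaic.fits",
            "gsiftp://se2.example.org/store/m42_mosaic.fits",
            "gsiftp://se3.example.net/store/m42_mosaic.fits",
        ],
    }

    # Pretend per-site transfer cost in seconds; lower is better.
    estimated_cost = {
        "se1.example.edu": 120.0,
        "se2.example.org": 480.0,
        "se3.example.net": 95.0,
    }

    def best_replica(logical_name):
        """Pick the physical copy with the lowest estimated transfer cost."""
        replicas = replica_catalog[logical_name]
        return min(replicas, key=lambda url: estimated_cost[url.split("/")[2]])

    print(best_replica("lfn:m42_mosaic.fits"))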

  23. Workshop Module 4: Grid Security

  24. Grid security is a crucial component • Problems being solved might be sensitive • Resources are typically valuable • Resources are located in distinct administrative domains • Each resource has its own policies, procedures, security mechanisms, etc. • Implementation must be broadly available & applicable • Standard, well-tested, well-understood protocols; integrated with a wide variety of tools
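In day-to-day use, the piece of this that users touch is a short-lived X.509 proxy credential. The sketch below (Python) just drives the standard grid-proxy-init and grid-proxy-info command-line tools; the 12-hour lifetime is an arbitrary example, and a user certificate and key are assumed to be installed already.

    # Sketch: create and inspect a short-lived GSI proxy certificate.
    # Assumes a user certificate/key are already installed (e.g. in ~/.globus).
    import subprocess

    # Create a proxy valid for 12 hours; prompts for the key passphrase.
    subprocess.run(["grid-proxy-init", "-valid", "12:00"], check=True)

    # Report the subject and remaining lifetime of the proxy just created.
    subprocess.run(["grid-proxy-info"], check=True)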

  25. Workshop Module 5: National Grid Cyberinfrastructure

  26. Open Science Grid • 50 sites (15,000 CPUs) and growing • 400 to >1000 concurrent jobs • Many applications plus CS experiments; includes long-running production operations • Up since October 2003 • (Chart: diverse job mix) • www.opensciencegrid.org

  27. TeraGrid provides vast resources via a number of huge computing facilities.

  28. To efficiently use a Grid, you must locate and monitor its resources • Check the availability of different grid sites • Discover different grid services • Check the status of “jobs” • Make better scheduling decisions with information maintained on the “health” of sites
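As a down-to-earth example of such a query, the sketch below (Python) asks a Condor collector, with an invented hostname, for the state of every execute slot it knows about and tallies how many are claimed versus unclaimed; grid-wide information services answer the same kind of question across sites.

    # Sketch: one way to check resource availability, using condor_status
    # against a hypothetical central collector and counting slot states.
    import subprocess
    from collections import Counter

    out = subprocess.run(
        ["condor_status", "-pool", "collector.example.edu",
         "-format", "%s\n", "State"],
        capture_output=True, text=True, check=True).stdout

    states = Counter(out.split())
    print("Unclaimed slots:", states.get("Unclaimed", 0))
    print("Claimed slots:  ", states.get("Claimed", 0))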

  29. Virtual Organization (VO) Concept • VO for each application or workload • Carve out and configure resources for a particular use and set of users

  30. OSG Resource Selection Service: VORS

  31. Workshop Module 6: Grid Workflow

  32. A typical workflow pattern in image analysis runs many filtering apps. (Figure: workflow courtesy James Dobson, Dartmouth Brain Imaging Center)

  33. Swift scripts: parallelism via foreach (slide annotations: name outputs based on inputs; process all dataset members in parallel)

      type imagefile;

      (imagefile output) flip(imagefile input) {
        app {
          convert "-rotate" "180" @input @output;
        }
      }

      imagefile observations[] <simple_mapper; prefix="orion">;
      imagefile flipped[]      <simple_mapper; prefix="orion-flipped">;

      foreach obs, i in observations {
        flipped[i] = flip(obs);
      }
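For readers more comfortable in Python than SwiftScript, here is a rough analogue of the same foreach pattern (not from the slides): each input image is flipped by an external ImageMagick convert call, and a process pool supplies the parallelism. File names and the pool size are illustrative assumptions.

    # Rough Python analogue of the Swift foreach above: flip every image
    # in parallel by shelling out to ImageMagick's convert. File names and
    # the degree of parallelism are illustrative.
    import glob
    import subprocess
    from multiprocessing import Pool

    def flip(path):
        out = path.replace("orion", "orion-flipped")
        subprocess.run(["convert", "-rotate", "180", path, out], check=True)
        return out

    if __name__ == "__main__":
        observations = sorted(glob.glob("orion*.jpg"))
        with Pool(processes=4) as pool:
            flipped = pool.map(flip, observations)
        print(flipped)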

  34. Conclusion: Why Grids? • New approaches to inquiry based on: deep analysis of huge quantities of data; interdisciplinary collaboration; large-scale simulation and analysis; smart instrumentation • Dynamically assemble the resources to tackle a new scale of problem • Enabled by access to resources & services without regard for location & other barriers

  35. Dr. Gabrielle Allen, Center for Computation & Technology and Department of Computer Science, Louisiana State University, gallen@cct.lsu.edu. Based on: Grid Intro and Fundamentals Review.
