
Grid Computing 1



  1. Grid Computing 1 • Grid Book, Chapters 1, 2, 3, 22 • “Implementing Distributed Synthetic Forces Simulations in Metacomputing Environments,” Brunett, Davis, Gottschalk, Messina, Kesselman • http://www.globus.org • CSE 160/Berman

  2. Outline • What is Grid computing? • Grid computing applications • Grid computing history • Issues in Grid computing • Condor, Globus, Legion • The next step

  3. What is Grid Computing? • A Computational Grid is a collection of distributed, possibly heterogeneous resources which can be used as an ensemble to execute large-scale applications • A Computational Grid is also called a “metacomputer”

  4. Computational Grids • The term computational grid comes from an analogy with the electric power grid: • Electric power is ubiquitous • You don’t need to know the source of the power (transformer, generator) or the power company that supplies it • The analogy breaks down in the area of performance • The ever-present search for cycles in HPC has two research foci: • “In the box” parallel computers – PetaFLOPS architectures • Increasing development of infrastructure and middleware to leverage the performance potential of distributed Computational Grids

  5. Grid Applications • Distributed Supercomputing • Distributed supercomputing applications couple multiple computational resources – supercomputers and/or workstations • Examples include: • SF Express (large-scale modeling of battle entities with complex interactive behavior for distributed interactive simulation) • Climate modeling (high resolution, long time scales, complex models)

  6. Distributed Supercomputing Example – SF Express • SF Express (Synthetic Forces Express) = large-scale distributed simulation of the behavior and movement of entities (tanks, trucks, airplanes, etc.) for interactive battle simulation • Entities require information about • State of terrain • Location and state of other entities • Info updated several times a second • Interest management allows entities to look only at relevant information, enabling scalability (see the sketch below)
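
A minimal Python sketch of the interest-management idea (names and structure are illustrative assumptions, not the actual SF Express code): entities subscribe to terrain regions, and a state update is delivered only to the subscribers of the affected region instead of being broadcast to every entity.

```python
from collections import defaultdict

class InterestManager:
    """Deliver state updates only to entities interested in a region."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # region -> ids of interested entities

    def declare_interest(self, entity_id, regions):
        # An entity registers the terrain regions it needs to observe.
        for region in regions:
            self.subscribers[region].add(entity_id)

    def publish(self, region, update):
        # Route the update to interested entities only, not to everyone;
        # this is what keeps communication localized as the entity count grows.
        return {eid: update for eid in self.subscribers[region]}

im = InterestManager()
im.declare_interest("tank-1", [(3, 7), (3, 8)])
im.declare_interest("jeep-9", [(10, 2)])
# Only tank-1 sees an update in region (3, 7):
print(im.publish((3, 7), {"entity": "tank-2", "pos": (3, 7)}))
```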

  7. SF Express • Large-scale SF Express run goals • Simulation of 50,000 entities in 8/97, 100,000 entities in 3/98 • Increase fidelity and resolution of simulation over previous runs • Improve • Refresh rate • Training environment responsiveness • Number of automatic behaviors • Ultimately use simulation for real-time planning as well as training • Large-scale runs are extremely resource-intensive

  8. SF Express Programming Issues • How should entities be mapped to computational resources? • Entities receive information based on “interests” • Communication reduced and localized based on “interest management” • A consistency model for entity information must be developed • Which entities can/should be replicated? • How should updates be performed?

  9. SF Express Distributed Application Architecture • [Figure: clusters of simulation nodes connected through interest managers and routers to data servers] • D = data server, I = interest management, R = router, S = simulation node

  10. 50,000 entity SF Express Run • 2 large-scale simulations run on August 11, 1997

  11. 50,000 entity SF Express Run • Simulation decomposed the terrain (Saudi Arabia, Kuwait, Iraq) contiguously among supercomputers • Each supercomputer simulated a specific area and exchanged interest and state information with the other supercomputers • All data exchanges were flow-controlled • Supercomputers were fully interconnected and dedicated for the experiment • Success depended on “moderate to significant system administration, interventions, competent system support personnel, and numerous phone calls.” • Subsequent Globus runs focused on improving data and control management and on operational issues for the wide area

  12. High-Throughput Applications • Grid used to schedule large numbers of independent or loosely coupled tasks with the goal of putting unused cycles to work • High-throughput applications include RSA key cracking, SETI@home (detection of extraterrestrial intelligence), and MCell

  13. High-Throughput Applications • The biggest master/slave parallel program in the world: master = website, slaves = individual computers (a minimal sketch of this pattern follows below)
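
A minimal master/worker sketch of the high-throughput pattern (illustrative only; a real system such as SETI@home adds wide-area transport, checkpointing, and result verification, and the work_unit predicate below is a made-up stand-in):

```python
from concurrent.futures import ProcessPoolExecutor

def work_unit(candidate):
    """Stand-in for one independent task, e.g. testing one candidate key."""
    return candidate, sum(ord(c) for c in candidate) % 97 == 0  # dummy test

if __name__ == "__main__":
    # The master hands independent units to idle workers; since tasks share
    # no state, any worker can take any unit whenever it has spare cycles.
    candidates = [f"key-{i}" for i in range(1000)]
    with ProcessPoolExecutor() as pool:
        hits = [c for c, ok in pool.map(work_unit, candidates, chunksize=50) if ok]
    print(f"{len(hits)} candidates passed")
```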

  14. High-Throughput Example – MCell • MCell – Monte Carlo simulation of cellular microphysiology. Simulation implemented as a large-scale parameter sweep.

  15. MCell • MCell architecture: simulations performed by independent processors with distinct parameter sets and shared input files

  16. MCell Programming Issues • How should we assign tasks to processors to optimize locality? (one possible heuristic is sketched below) • How can we use partial results during execution to steer the computation? • How do we mine all the resulting data from the experiments for results • During execution • After execution • How can we use all available resources?
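
One possible heuristic for the locality question, sketched under the assumption that tasks sharing a large input file should land on the same host so the file is staged only once (all names here are hypothetical, not part of MCell):

```python
from collections import defaultdict
from itertools import cycle

def assign_with_locality(tasks, hosts):
    """Group sweep tasks by shared input file, then deal the groups out
    round-robin, so each input file is staged to as few hosts as possible."""
    by_input = defaultdict(list)
    for task in tasks:
        by_input[task["input_file"]].append(task)
    schedule = defaultdict(list)
    next_host = cycle(hosts)
    for group in by_input.values():
        schedule[next(next_host)].extend(group)
    return schedule

# Nine sweep tasks over three shared input meshes, two hosts:
tasks = [{"id": i, "input_file": f"mesh{i % 3}.dat"} for i in range(9)]
print(dict(assign_with_locality(tasks, ["hostA", "hostB"])))
```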

  17. Data-Intensive Applications • Focus is on synthesizing new information from large amounts of physically distributed data • Examples include NILE (distributed system for high-energy physics experiments using data from CLEO), SAR/SRB applications (Grid version of MS TerraServer), and digital library applications

  18. Data-Intensive Example – SARA • SARA = Synthetic Aperture Radar Atlas • Application developed at JPL and SDSC • Goal: assemble/process files for the user’s desired image • Radar organized into tracks • User selects track of interest and properties to be highlighted • Raw data is filtered and converted to an image format • Image displayed in web browser

  19. SARA Application Architecture • [Figure: client connected to compute servers and data servers; computation servers and data servers are logical entities, not necessarily different nodes] • Application structure focused around optimizing the delivery and processing of distributed data

  20. SARA Programming Issues • [Figure: replicated data servers at OGI, UTK, and UCSD; AppLeS/NWS] • Which data server should replicated data be accessed from? • Should computation be done at the data server, or the data moved to a compute server, or something in between? (a back-of-envelope cost model is sketched below) • How big are the data files and how often will they be accessed?
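
A back-of-envelope cost model for the compute-vs-move question (all numbers are illustrative assumptions): moving the data pays off only when the transfer time plus the faster local compute time beats computing where the data already lives.

```python
def best_plan(data_bytes, bandwidth_bps, t_at_data_server, t_at_compute_server):
    """Pick the plan with the lower total time: process in place at the
    data server, or ship the data to a faster compute server first."""
    move_and_compute = data_bytes / bandwidth_bps + t_at_compute_server
    return ("move data to compute server"
            if move_and_compute < t_at_data_server
            else "compute at data server")

# A 2 GB radar file over a 45 Mb/s link; the data server is 3x slower:
print(best_plan(2e9, 45e6 / 8, t_at_data_server=300.0, t_at_compute_server=100.0))
# The transfer alone takes ~356 s, so here it is better to compute in place.
```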

  21. TeleImmersion • Focus is on the use of immersive virtual reality systems over a network • Combines generators, data sets, and simulations remote from the user’s display environment • Often used to support collaboration • Examples include • Interactive scientific visualization (“being there with the data”), industrial design, art and entertainment

  22. Teleimmersion Example – Combustion System Modeling • A shared collaborative space • Links people at multiple locations (e.g., Chicago and San Diego) • Share and steer scientific simulations on a supercomputer • Combustion code developed by Lori Freitag at ANL • Boiler application used to troubleshoot and design better products

  23. Early Experiences with Grid Computing • Gigabit Testbeds Program • In the late 80’s and early 90’s, the gigabit testbed program was developed as a joint NSF, DARPA, and CNRI (Corporation for National Research Initiatives, Bob Kahn) initiative • Goals were to • investigate potential architectures for a gigabit/sec network testbed • explore usefulness for end-users

  24. Gigabit Testbeds – Early 90’s • 6 testbeds formed: • CASA (southwest) • MAGIC (midwest) • BLANCA (midwest) • AURORA (northeast) • NECTAR (northeast) • VISTANET (southeast) • Each had a unique blend of applications research and networking/computer science research

  25. Gigabit Testbeds • [Figure: gigabit testbed sites]

  26. Gigabit Testbeds • [Figure: gigabit testbed sites, continued]

  27. I-WAY • First large-scale “modern” Grid experiment • Put together for SC’95 (the “Supercomputing” conference) • The I-WAY consisted of a Grid of 17 sites connected by vBNS • Over 60 applications ran on the I-WAY during SC’95

  28. I-WAY “Architecture” • Each I-WAY site was served by an I-POP (I-WAY Point of Presence) used for • authentication of distributed applications • distribution of associated libraries and other software • monitoring the connectivity of the I-WAY virtual network • Users could use single authentication and job submission across multiple sites, or they could work directly with the end systems • Scheduling done with a “human-in-the-loop”

  29. I-Soft – Software for the I-WAY • Kerberos-based authentication • I-POP initiated rsh to local resources • AFS for distribution of software and state • Central scheduler • Dedicated I-WAY nodes on each resource • Interface to local scheduler • Nexus-based communication libraries • MPI, CaveComm, CC++ • In many ways, the I-WAY experience formed the foundation of Globus

  30. I-WAY Application: Cloud Detection • Cloud detection from multimodal satellite data • Want to determine if a satellite image is clear, partially cloudy, or completely cloudy • Used a remote supercomputer to enhance instruments with • Real-time response • Enhanced function and accuracy (of pixel image) • Developed by C. Lee (Aerospace Corporation), C. Kesselman (Caltech), et al.

  31. PACIs • 2 NSF Supercomputer Centers (PACIs) – SDSC/NPACI and NCSA/Alliance – both committed to Grid computing • vBNS backbone between NCSA and SDSC running at OC-12, with connectivity to over 100 locations at speeds ranging from 45 Mb/s to 155 Mb/s and above

  32. PACI Grid • [Figure: the PACI Grid]

  33. NPACI Grid Activities • Metasystems Thrust Area is one of the NPACI technology thrust areas • Goal is to create an operational metasystem for NPACI • Metasystems players: • Globus (Kesselman) • Legion (Grimshaw) • AppLeS (Berman and Wolski) • Network Weather Service (Wolski)

  34. Alliance Grid Activities • Grid Task Force and Distributed Computing team are Alliance teams • Globus supported as the exclusive grid infrastructure by the Alliance • Grid concept pervasive throughout the Alliance • Access Grid developed for use by distributed collaborative groups • Alliance grid players include Foster (Globus), Livny (Condor), Stevens (ANL), Reed (Pablo), etc.

  35. Other Efforts • Centurion cluster = Legion testbed • Legion cluster housed at UVA • 128 533-MHz DEC Alphas • 128 dual 400-MHz Pentium II nodes • Fast Ethernet and Myrinet • Globus testbed = GUSTO, which supports Globus infrastructure and application development • 125 sites in 23 countries as of 2/2000 • Testbed aggregated from partner sites (including NPACI)

  36. GUSTO (Globus) Computational Grid • [Figure: map of GUSTO sites]

  37. IPG • IPG = Information Power Grid • NASA effort in grid computing • Globus supported as the underlying infrastructure • Application foci include aerospace design and environmental and space applications

  38. Research and Development Foci for the Grid • Applications • Questions revolve around design and development of “Grid-aware” applications • Different programming models: polyalgorithms, components, mixed languages, etc. • Program development environment and tools required for development and execution of performance-efficient applications • [Figure: Grid layers – Applications / Middleware / Infrastructure / Resources]

  39. Research and Development Foci for the Grid • Middleware • Questions revolve around the development of tools and environments which facilitate application performance • Software must be able to assess and utilize the dynamic performance characteristics of resources to support the application (a toy forecasting sketch follows below) • Agent-based computing and resource negotiation
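
As a small illustration of assessing dynamic performance characteristics, here is a toy forecaster in the spirit of (but far simpler than) the Network Weather Service, which combines a family of such predictors; the measurements below are invented.

```python
def forecast(history, window=5):
    """Predict the next measurement as the mean of the last few observations
    -- one of the simple predictors a forecasting service might combine."""
    recent = history[-window:]
    return sum(recent) / len(recent)

# Invented bandwidth measurements (Mb/s); the link becomes congested:
bandwidth = [92.1, 88.4, 90.7, 41.2, 43.8, 45.0]
print(f"expected next bandwidth: {forecast(bandwidth):.1f} Mb/s")
```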

  40. Research and Development Foci for the Grid • Infrastructure • Development of infrastructure that presents a “virtual machine” view of the Grid to users • Questions revolve around providing basic services to the user – security, remote file transfer, resource management, etc. – as well as exposing performance characteristics • Services must be supported across heterogeneous resources and must interoperate

  41. Research and Development Foci for the Grid • Resources • Questions revolve around heterogeneity and scale • New challenges focus on combining wireless and wired, static and dynamic, low-power and high-power, cheap and expensive resources • Performance characteristics of grid resources vary dramatically; integrating them to support the performance of individual and multiple applications is extremely challenging
