
Overview of Grid Computing

J. Charles Kesler, MCNC


Presentation Transcript


  1. Overview of Grid Computing J. Charles Kesler MCNC

  2. Overview • Introduction: Why Grids? • Applications for Grids • Basic Grid Architecture • Grid Platforms • Market Segments • Examples: Globus, OGSA, AVAKI • Building a Grid • Project Manager’s View • System Administrator’s View • Example: The North Carolina BioGrid Project • Grid Reference Resources

  3. Why Grids? From the Viewpoint of Research Computing • Researchers are buying clusters • A cluster for every researcher in many cases • Of course, a cluster comes with a non-trivial amount of storage • Computational power is like commodity Internet bandwidth – all readily available capacity will be consumed • But, there is a lot of capacity sitting idle in these cluster islands across organizations • Maintenance of clusters is often done inefficiently • …by someone who would prefer to be doing something other than systems administration

  4. Current State of Research Computing • Researchers are asking IT to… • Host and/or administer compute clusters • Host applications and datasets • Provide update and backup utilities for datasets • Optimize and/or port applications • Provide a front end for simplified access to resources • Provide tools for workflow automation • That is, IT could benefit from a "utility computing" model to deliver services to researchers

  5. Collaboration in the Research Community • Researchers at multiple universities are often working together on the same grants, so they need to share: • Hardware resources • Applications • Data sets • Results • This sharing has to happen across multiple, mutually distrustful administrative domains • The buzzword: Virtual Organization (“VO”)

  6. Grid Computing’s Potential for Research • Attributes • Single sign-on, security • Policy-based resource sharing • Unified view of data and computers (computers and data appear to be local) • Efficient access to large data sets (caching, replication) • [Diagram: virtual computers and virtual databases spanning UNC-W, Duke, NCSU, ECSU, NCCU, ECU, NCArts, UNC-G, WSSU, UNC-C, WCU, UNC-CH, UNC-P, ASU, UNC-A, WFU, FSU, NCAT]

  7. Grids According to the Experts “Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources.” From The Anatomy of the Grid by Foster, Kesselman and Tuecke “A grid is all about gathering together resources and making them accessible to users and applications.” Dr. Andrew Grimshaw, CTO Avaki

  8. Grids Are By Definition Heterogeneous • It’s about legacy resources, infrastructure, applications, policies, and procedures • The grid and its administrators must integrate in stealth mode…with • Firewalls • Filesystems • Queuing systems • Grumpy systems administrators • Tried and true applications

  9. What It Means To… • The end user: • Can transparently access resources in multiple VO’s • Can more easily collaborate with other researchers • The IT administrator: • Has a secure framework for implementing distributed resource sharing • Local resource administrators can control access to their resources • The manager: • Sees better utilization of capital resources • Has a tool that helps break down organizational barriers

  10. Challenges in Grid Computing • Reliable performance • Trust relationships between multiple security domains • Deployment and maintenance of grid middleware across hundreds or thousands of nodes • Access to data across WAN’s • Access to state information of remote processes • Workflow / dependency management • Distributed software and license management • Accounting and billing

  11. Overview • Introduction: Why Grids? • Applications for Grids • Basic Grid Architecture • Grid Platforms • Market Segments • Examples: Globus, OGSA, AVAKI • Building a Grid • Project Manager’s View • System Administrator’s View • Example: The North Carolina BioGrid Project • Grid Reference Resources

  12. Applications for a Grid • Generally, apps that work well on clusters can work well on grids • Non-interactive / batch jobs • Parallel computations with minimal interprocess communication and workflow dependencies • Reasonable data transfer requirements • Sensible economics

  13. Non-Interactive / Batch Jobs • Difficult to get a real-time UI for jobs running on the grid • A possible interactive application: spreadsheet computation • Want to take advantage of off-peak free cycles • Jobs run for several days, weeks or months • The user might prefer to be sleeping while the job runs! • Running processes might need to be interrupted or re-prioritized based on the current load on a grid compute engine • Idle thread / “screensaver” computing

  14. Parallel Computations • Application needs to be able to run as multiple, mostly independent pieces • Good Example: Parameter space study • Thousands++ of input files • Processed independently by the same application • Output file generated for each run (corresponding to an input file) • Analysis of the results reported in the output files to find the optimal solution • Need to build workflow management and results analysis tools around the grid-based components
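
The parameter space study pattern on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_case` is a hypothetical stand-in for the real application, the scoring formula is invented, and a local thread pool stands in for grid nodes. A real study would read input files and write one output file per run.

```python
from concurrent.futures import ThreadPoolExecutor


def run_case(params):
    """Stand-in for one independent application run (hypothetical).

    The "output" here is a score computed from the input parameters;
    a real run would produce an output file for its input file.
    """
    x, y = params
    return {"params": params, "score": (x - 3) ** 2 + (y + 1) ** 2}


def parameter_study(cases, workers=4):
    """Run every case independently, then analyze all outputs together."""
    # The pool stands in for grid nodes. No case depends on another,
    # so the order in which runs finish does not matter.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = list(pool.map(run_case, cases))
    # Results analysis step: pick the run with the lowest score.
    return min(outputs, key=lambda out: out["score"])


if __name__ == "__main__":
    inputs = [(x, y) for x in range(6) for y in range(-3, 3)]
    print(parameter_study(inputs))
```

The workflow pieces the slide asks for are the two steps around `run_case`: fanning the inputs out and collecting the outputs for analysis.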

  15. Minimal Interprocess Communications and Dependencies • Can’t depend on the network’s QoS • Can’t rely upon the order of execution and completion • Apps that need these things are better suited for tightly coupled compute platforms (e.g. SMP systems) • Grid can still be useful as a meta-scheduler and data source for such apps • e.g. the user submits the job to the grid queue and asks for the best available SMP resource
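
The meta-scheduler idea in the last bullet (submit to the grid queue, ask for the best available SMP resource) might look roughly like the sketch below. The `Resource` fields and the ranking rule (lowest load first, then most free CPUs) are assumptions made for illustration, not the behavior of any particular grid product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str
    kind: str       # e.g. "smp" or "cluster"
    free_cpus: int
    load: float     # 0.0 (idle) to 1.0 (saturated)


def pick_best(resources: List[Resource], kind: str,
              cpus_needed: int) -> Optional[Resource]:
    """Return the least-loaded resource of the requested kind, or None."""
    candidates = [r for r in resources
                  if r.kind == kind and r.free_cpus >= cpus_needed]
    if not candidates:
        return None
    # Assumed ranking: lowest load wins, ties broken by more free CPUs.
    return min(candidates, key=lambda r: (r.load, -r.free_cpus))


if __name__ == "__main__":
    pool = [
        Resource("smp-a", "smp", free_cpus=16, load=0.80),
        Resource("smp-b", "smp", free_cpus=32, load=0.25),
        Resource("cluster-1", "cluster", free_cpus=128, load=0.10),
    ]
    print(pick_best(pool, "smp", cpus_needed=8))
```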

  16. Reasonable Data Transfer Requirements • It is usually necessary to “stage” files and executables as part of running a grid job • Data transfer time should be small relative to each component job’s run time • Solution: Caching and replication -- but these are not perfect and can be non-trivial to implement • Another solution: schedule the job where the data is (instead of bringing the data to the job) • Might be required if the data is only licensed for some nodes • But, if instead the application is only licensed to run on particular nodes, then the data has to be brought to where the application is
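
A rough way to check the "transfer time small relative to run time" rule, and to apply the two licensing cases above, is sketched here. The function names, the sample sizes, and the tie-breaking rule (first node in sorted order) are all hypothetical.

```python
def staging_seconds(size_bytes, bandwidth_bits_per_s):
    """Time to move the data, ignoring latency and protocol overhead."""
    return size_bytes * 8 / bandwidth_bits_per_s


def staging_overhead(size_bytes, bandwidth_bits_per_s, run_seconds):
    """Staging time as a fraction of the component job's run time."""
    return staging_seconds(size_bytes, bandwidth_bits_per_s) / run_seconds


def choose_node(nodes_with_data, nodes_licensed_for_app):
    """Return (node, must_stage_data) for one job.

    Prefer a node that already holds the data and may run the application.
    Otherwise the application license is the harder constraint, so the
    data is brought to a node where the application is allowed to run.
    """
    both = sorted(set(nodes_with_data) & set(nodes_licensed_for_app))
    if both:
        return both[0], False
    return sorted(nodes_licensed_for_app)[0], True


if __name__ == "__main__":
    # 1 GB over 100 Mbps Ethernet for a one-hour job.
    print(staging_seconds(10**9, 100e6))          # 80.0 seconds
    print(staging_overhead(10**9, 100e6, 3600))   # about 2% of run time
    print(choose_node({"n1", "n2"}, {"n2", "n3"}))
```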

  17. The Bottom Line: Sensible Economics • To Grid or Not To Grid: Productivity Gains > Cost of Building Grid + Opportunity Costs of Resources
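
The inequality on this slide can be written as a one-line check. The dollar figures below are invented purely to show the arithmetic; estimating the three inputs is the hard part and is not addressed here.

```python
def grid_is_worthwhile(productivity_gains, build_cost, opportunity_cost):
    """True when expected gains exceed build cost plus opportunity cost."""
    return productivity_gains > build_cost + opportunity_cost


if __name__ == "__main__":
    # Hypothetical figures, all in the same currency and time period.
    print(grid_is_worthwhile(500_000, 300_000, 150_000))  # gains exceed 450,000
    print(grid_is_worthwhile(400_000, 300_000, 150_000))  # gains fall short
```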

  18. Some Costs and Benefits • Costs: • Grid middleware • Architects and developers • User training • Infrastructure hardware • Opportunity costs: would a big SMP box return better results for your problem? • Benefits: • Better utilization of existing capital resources • More efficient users • Ability to complete more work in the same amount of time • Performance near or sometimes as good as the big SMP box

  19. Overview • Introduction: Why Grids? • Applications for Grids • Basic Grid Architecture • Grid Platforms • Market Segments • Examples: Globus, OGSA, AVAKI • Building a Grid • Project Manager’s View • System Administrator’s View • Example: The North Carolina BioGrid Project • Grid Reference Resources

  20. The Single System Model • [Diagram: layered stack] • User Interface / API • Services: Authentication, Authorization, Accounting (3A); Resource Discovery (RD); Process Management (PM); Message Passing (MP); Data Management (DM) • Operating System • Storage, Compute

  21. What Makes a Cluster a Cluster? • Uses a Distributed Resource Manager (DRM) to manage job scheduling • Tightly coupled - High speed, low latency interconnect network • Shared storage for home directories, high throughput scratch space, applications • Fairly homogeneous - Configuration management is important! • Single administrative domain • User accounts managed with traditional mechanisms

  22. The Cluster Model • [Diagram] • Master node: User Interface/API; 3A, RD, PM, MP, DM; cluster DRM • Also shown: configuration management, shared storage • Four cluster nodes, each with: cluster DRM; 3A, RD, PM, MP, DM; operating system; storage and compute • Nodes joined by a high speed interconnect

  23. How is an Enterprise Grid Different from a Cluster? • Heterogeneous - Clusters, SMP, even workstations of dissimilar configurations, but all are tied together through a grid middleware layer • Lightly coupled - Connected via 100 or 1000Mbps Ethernet • Introduces a resource registry and grid security service • But usually only a single registry and security service for the grid • Not necessarily a single administrative domain

  24. The Enterprise Grid Model • [Diagram] • User Interface/API with 3A, RD, PM, MP, DM and a grid interface • Resource registry and security infrastructure • Clusters (cluster DRM) and SMP systems, each with a grid interface and 3A, RD, PM, MP, DM over its operating system, storage, and compute • Cluster nodes with cluster interfaces, each with AA, RD, PM, MP, DM over its operating system, storage, and compute • All connected over the enterprise LAN or WAN

  25. How is a Global Grid Different from an Enterprise Grid? • "Grid of Grids" - Collection of enterprise grids • Loosely coupled between sites - Not much control over QoS* • Mutually distrustful administrative domains • Multiple grid resource registries and grid security services *Not true for grids in the NCREN network!

  26. The Global Grid Model • [Diagram] • Sites A, B, and C, each on its own LAN, linked by a WAN • Each site has its own UI/API, resource registry (RR), and security infrastructure (SI) • Clusters and SMP systems at each site attach through grid interfaces

  27. Overview • Introduction: Why Grids? • Applications for Grids • Basic Grid Architecture • Grid Platforms • Market Segments • Examples: Globus, OGSA, AVAKI • Building a Grid • Project Manager’s View • System Administrator’s View • Example: The North Carolina BioGrid Project • Grid Reference Resources

  28. Grid Platforms -- Market Segments • One way to categorize grids: • Toolkits • Integrated Environments • Another way to look at grids: • Server Aggregation • Desktop Aggregation

  29. Where Platforms Fit in the Market • [Chart: platforms placed along two axes, Toolkits vs. Integrated Environments and Desktop Aggregation vs. Server Aggregation; positions not recoverable from the transcript] • Platforms shown: Platform LSF MultiCluster, Entropia, United Devices, DataSynapse, Avaki, Parabon, IBM Grid Toolbox, NMI, OGSA, Globus, BOINC

  30. The Early Adopter Market for Grid Technology • [Chart: same axes, Toolkits vs. Integrated Environments and Desktop Aggregation vs. Server Aggregation] • Private sector: pharmaceuticals, banking & finance, energy • Mix of industry and academia: life sciences, entertainment • Public sector: academia, government, national labs • Annotation near Integrated Environments: “(does anyone want this?)”

  31. Grid Platform Example: Globus Toolkit V2 • Primary development occurred at Argonne National Laboratory • Principals were Ian Foster and Carl Kesselman • Open source • But architecture development was a closed process • Toolkit approach: different “bundles” that can be installed depending upon what functions are desired • API through CoG (Commodity Grid) kits • Java, Python, CORBA, Perl, Matlab, Web services, JSP

  32. Globus Toolkit V2 • Majority of its use is in university and government research environments • Some vendors offer value-added versions • IBM Grid Toolbox • Platform Globus • NSF Middleware Initiative (NMI) is packaging pre-built Globus with other relevant components • NWS (Network Weather Service) • KX.509/KCA (Kerberos-X.509 integration) • Condor-G as a “metascheduler” • GSI-enabled OpenSSH

  33. Globus Toolkit V2 “Pillars” • Resource Management (GRAM) • Information Services (MDS) • Data Management (GASS) • Grid Security Infrastructure (GSI)

  34. Globus Toolkit V2 Stack • [Diagram: protocol stack, top to bottom] • GRAM (over HTTP), MDS (over LDAP), GASS/GridFTP (over FTP) • GSI • TLS/SSL • TCP/IP

  35. Globus Toolkit V2 Key Components: GRAM, MDS and GASS • Grid Resource Allocation Manager (GRAM) • Server-side: “gatekeeper” process that controls execution of job managers • Client-side: “globusrun” UI to launch jobs • Monitoring and Discovery Service (MDS) • GRIS: Grid Resource Information Service collects local info • GIIS: Grid Index Information Service collects GRIS info • Global Access to Secondary Storage (GASS) • GridFTP, implemented through “in.ftpd” daemon and “globus-url-copy” command • Files accessed through a URI, e.g. gsiftp://node1.ncbiogrid.org/data/ncbi/ecoli.nt
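
To make the GridFTP URI on this slide concrete, the sketch below splits it with Python's standard `urllib.parse` and builds (but does not run) a `globus-url-copy` command line. The helper names are hypothetical, the default port 2811 is an assumption to verify against your installation, and the `file://` destination form should be checked against the tool's own documentation.

```python
from urllib.parse import urlsplit

GRIDFTP_DEFAULT_PORT = 2811  # assumed default GridFTP control port


def parse_gsiftp(uri):
    """Split a gsiftp:// URI into host, port, and remote path."""
    parts = urlsplit(uri)
    if parts.scheme != "gsiftp":
        raise ValueError(f"not a gsiftp URI: {uri!r}")
    return {
        "host": parts.hostname,
        "port": parts.port or GRIDFTP_DEFAULT_PORT,
        "path": parts.path,
    }


def copy_command(source_uri, local_path):
    """Build an argument list for copying a remote file to a local path.

    The command is only constructed here, never executed.
    """
    return ["globus-url-copy", source_uri, "file://" + local_path]


if __name__ == "__main__":
    uri = "gsiftp://node1.ncbiogrid.org/data/ncbi/ecoli.nt"
    print(parse_gsiftp(uri))
    print(copy_command(uri, "/tmp/ecoli.nt"))
```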

  36. Globus Toolkit V2 Key Components: GSI • Uses a TLS/SSL-based PKI • All server resources (i.e. gatekeeper, GRIS) and users have a public key that has been digitally signed by the CA (the “certificate”) and a private key • “grid-cert-request” to generate key pair • User/sysadmin sends the public key to the CA • CA signs the public key with its private key and returns the signed certificate to the user/sysadmin • The user/sysadmin stores the signed certificate in the local filesystem • Certificate contains: the subject name, the subject’s public key, the CA’s name, and the CA’s signature

  37. Globus Toolkit V2 Key Components: GSI (continued) • Logging in to the grid (“grid-proxy-init”): • User creates a temporary public-private key pair • User’s private key is used to digitally sign the temporary public key -- this becomes the “proxy” certificate • This creates a chain of trust from the CA to the user to the proxy certificate • The proxy certificate and associated private key are transmitted with a job • The proxy certificate can be used to issue commands on remote servers on the user’s behalf (“delegation”) • On remote servers, there is a “grid-mapfile” that maps user cert subject names to local userids
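
The chain of trust and the grid-mapfile lookup can be illustrated structurally. The sketch below is not a verifier: it only checks that each certificate names the previous one as its issuer, with no signature or expiry checks. The `Cert` class, the sample subject names, and the `/CN=proxy` suffix on the proxy subject are assumptions for illustration. The mapfile parser assumes lines of the form `"<subject>" <userid>`.

```python
import shlex
from dataclasses import dataclass


@dataclass
class Cert:
    subject: str
    issuer: str


def chain_is_linked(chain):
    """Check that each certificate names the previous one as its issuer.

    Structural check only: a real verifier also validates every
    signature and validity period along the chain.
    """
    return all(child.issuer == parent.subject
               for parent, child in zip(chain, chain[1:]))


def parse_gridmap(text):
    """Parse lines of the assumed form: "<certificate subject>" <userid>."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject, userid = shlex.split(line)[:2]
        mapping[subject] = userid
    return mapping


if __name__ == "__main__":
    ca = Cert("/O=Grid/CN=Example CA", "/O=Grid/CN=Example CA")
    user = Cert("/O=Grid/OU=Example/CN=Jane Doe", ca.subject)
    proxy = Cert(user.subject + "/CN=proxy", user.subject)
    print(chain_is_linked([ca, user, proxy]))
    gridmap = parse_gridmap('"/O=Grid/OU=Example/CN=Jane Doe" jdoe\n')
    print(gridmap[user.subject])
```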

  38. Globus Toolkit V2 Additional Components • Grid Packaging Tools (GPT) • Used to build (“gpt-build”), install (“gpt-install”) and localize (“gpt-postinstall”) Globus components • MPICH-G2 • A Globus V2 enabled version of MPI (Message Passing Interface) • Based on MPICH • Utilizes GSI, MDS and GRAM

  39. Globus Toolkit V2 Network Services • [Diagram] • Grid nodes (four shown), each running: gatekeeper, GRIS, in.ftpd • Client node: GRAM client • GIIS server • Certificate authority • All connected over the network

  40. GRAM, MDS and GASS Interactions • [Diagram: a client (user / proxy) talking to the three services] • GRAM: gatekeeper, job manager, and process, reached via RSL/DUROC/HTTP 1.1 (job allocation, job management) • MDS: GIIS, GRIS, and resource, reached via LDAP (resource discovery) • GASS: GridFTP in.ftpd, reached via gsiftp (data transfer, data control)

  41. Globus Toolkit V2 Strengths and Weaknesses • Strengths: • Mindshare and collaboration in both industry & academia • Open source • Standards-based underpinnings (e.g. SSL, LDAP) • Flexibility and CoG APIs • Driving OGSA with heavy resource commitment from IBM • Weaknesses: • Significant effort required to get applications working on a grid • Not production quality at this time • No “metascheduler” -- user has to explicitly tell their jobs where to run

  42. Grid Platform Example: OGSA • Merging Grid and Web Services technologies • Developing open standards for grid computing • Sponsored by the GGF (organization modeled after IETF) • Primary working groups: OGSA and OGSI • Many vendors involved: IBM, Sun, Oracle, AVAKI, UD, etc. • (But ANL and IBM seem to have the upper hand) • Working with the W3C to extend web services • Still in alpha / early beta form • There will be both open source and commercial implementations • Open source: Globus 3 • Commercial: IBM (WebSphere), AVAKI, UD, etc.

  43. Some Key OGSA Concepts • Grid Service Handle (GSH) • GSH is a globally unique name assigned to every resource • Does not contain any protocol or instance specific information such as network address • Grid Service Reference (GSR) • Contains the instance-specific information (e.g. network address) • Only valid for a limited lifespan • Factory • Creates and manages grid services per user request • Returns the GSH and GSR for a new instance
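
The relationship between handle, reference, and factory can be sketched as a small in-memory model. This is a toy illustration of the three concepts above, not the OGSI interfaces: the class names, the `gsh:` naming scheme, the port arithmetic, and the renew-on-resolve behavior are all assumptions.

```python
import itertools
from dataclasses import dataclass


@dataclass
class GSR:
    address: str       # instance-specific information, e.g. network address
    expires_at: float  # a reference is only valid for a limited lifespan

    def is_valid(self, now):
        return now < self.expires_at


class Factory:
    """Creates service instances and tracks the GSH -> GSR mapping."""

    def __init__(self, host, lifetime_s=60.0):
        self.host = host
        self.lifetime_s = lifetime_s
        self._ids = itertools.count(1)
        self.registry = {}

    def create(self, now):
        """Create a new instance and return its (GSH, GSR) pair."""
        n = next(self._ids)
        gsh = f"gsh:example-service/{n}"  # stable name, no address inside
        gsr = GSR(f"{self.host}:{8000 + n}", now + self.lifetime_s)
        self.registry[gsh] = gsr
        return gsh, gsr

    def resolve(self, gsh, now):
        """Map a handle to a currently valid reference, renewing if stale."""
        gsr = self.registry[gsh]
        if not gsr.is_valid(now):
            gsr = GSR(gsr.address, now + self.lifetime_s)
            self.registry[gsh] = gsr
        return gsr


if __name__ == "__main__":
    factory = Factory("node1.example.org", lifetime_s=10.0)
    handle, ref = factory.create(now=0.0)
    print(handle, ref)
    print(factory.resolve(handle, now=20.0))
```

The point of the split is that a client can hold the handle indefinitely and obtain a fresh reference whenever the old one has expired.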

  44. OGSA / Globus 3.0 Preview Release • Implementation of the Grid Service Specification • Built on top of Apache Axis and Java CoG • Based on a J2EE environment; limited .NET and C support at this point • Globus Toolkit 3.0 expected release • Alpha - Jan 13, 2003 @ GlobusWorld • Final – June 2003

  45. OGSA / Globus 3.0 Stack • [Diagram: protocol stack, top to bottom] • GRAM, MDS, GASS/GridFTP • Grid Services Abstraction • SOAP + GSI • TLS/SSL, other transports • TCP/IP

  46. OGSA Example • [Diagram] • Actors: User A, User B • Components: Registry (GSH, GSR), Mapping Service, Application Factory Service, Auth Factory Service, Application Service Instance, Auth Service Instance • Labeled interactions: “Request to Create Auth Service”, “Request to Auth User”, “User Auth Info”

  47. Grid Platform Example: AVAKI • Original technology came from the Legion project at UVa (which was also used as part of NPACI); principal is Andrew Grimshaw (now CTO) • Integrated solution - load and run • Object-oriented architecture • Data Grid (v3.0) - new architecture meant as the stepping stone to OGSA; implemented with J2EE • Compute Grid (v2.6) - latest release of Legion-based technology; has compute and data grid integrated • Comprehensive Grid: 3.0 Data + 2.6 Compute Grids

  48. AVAKI 3.0 Data Grid Architecture • [Diagram] • Two AVAKI domain controllers, with LDAP (user info) and an interconnect to other grids • Grid servers (metadata) • Data access server (NFS) • Share servers exporting local directories: /dmf/edu, /data/ncbi, /home/edu, /data/riceblast, /local/data • Grid namespace: /grid, /grid/dmf/edu, /grid/home/edu, /grid/data, /grid/data/ncbi, /grid/data/riceblast

  49. AVAKI Strengths and Weaknesses • Strengths: • Vendor support • Easy to deploy • Data grid • Comprehensiveness • Works through firewalls (w/ its Proxy server) • Moving towards OGSA • Weaknesses: • Vendor is a relatively small company • Doesn't have significant mindshare • Currently does not publish its APIs

  50. Overview • Introduction: Why Grids? • Applications for Grids • Basic Grid Architecture • Grid Platforms • Market Segments • Examples: Globus, OGSA, AVAKI • Building a Grid • Project Manager’s View • System Administrator’s View • Example: The North Carolina BioGrid TestBed • Grid Reference Resources
