From Clusters to Grids

From Clusters to Grids. October 2003 – Linkoping, Sweden. Andrew Grimshaw, Department of Computer Science, University of Virginia; CTO & Founder, Avaki Corporation. Agenda: Grid Computing Background, Legion, Existing Systems & Standards, Summary.

Presentation Transcript


  1. From Clusters to Grids • October 2003 – Linkoping, Sweden • Andrew Grimshaw • Department of Computer Science, University of Virginia • CTO & Founder, Avaki Corporation

  2. Agenda • Grid Computing Background • Legion • Existing Systems & Standards • Summary

  3. Grid Computing

  4. First: What is a Grid System? A Grid system is a collection of distributed resources connected by a network. Examples of Distributed Resources: • Desktop • Handheld hosts • Devices with embedded processing resources such as digital cameras and phones • Tera-scale supercomputers

  5. What is a Grid? A grid gathers resources together and makes them accessible to users and applications. It lets users collaborate securely by sharing processing, applications, and data across heterogeneous systems and administrative domains, which gives faster application execution and easier access to data. • Compute Grids • Data Grids

  6. What are the characteristics of a Grid system? • Numerous Resources • Ownership by Mutually Distrustful Organizations & Individuals • Connected by Heterogeneous, Multi-Level Networks • Different Security Requirements & Policies Required • Different Resource Management Policies • Potentially Faulty Resources • Geographically Separated • Resources are Heterogeneous

  8. Technical Requirements of a Successful Grid Architecture • Simple • Secure • Scalable • Extensible • Site Autonomy • Persistence & I/O • Multi-Language • Legacy Support • Single Namespace • Transparency • Heterogeneity • Fault-tolerance & Exception Management • Manage Complexity!! Success requires an integrated solution AND flexible policy.

  9. Implication: Complexity is THE Critical Challenge • How should complexity be addressed?

  10. High-level versus low-level solutions • A low-level or “socket & shell” approach is low in robustness and high in time and cost to develop. • An integrated approach is high in robustness and low in time and cost to develop. • As application complexity increases, differences between the systems increase dramatically. [Figure: two charts comparing “Sockets & Shells” with “Integrated Solution”, one for time & cost and one for robustness, each on a low-to-high scale.]

  11. The Importance of Integration in a Grid Architecture • If separate pieces are used, then the programmer must integrate the solutions. • If all the pieces are not present, then the programmer must develop enough of the missing pieces to support the application. Bottom Line: Both raise the bar by putting the cognitive burden on the programmer.

  12. Misconceptions about Grids • Simple cycle aggregation • State of the art is essentially scheduling and queuing for CPU cluster management • These definitions sell short the promise of Grid technology • AVAKI believes grids are not just about aggregating and scheduling CPU cycles but also … • Virtualizing many types of resources, internally and across domains • Empowering anyone to have secure access to any and all resources through easy administration

  13. Compute Grid Categories • Sons of SETI@home: United Devices, Entropia, Data Synapse; low-end, desktop cycle aggregation; hard sell in corporate America • Cluster Load Management: LSF, PBS, SGE; high end, great for management of local clusters but not well proven in multi-cluster environments • As soon as you go outside of the local cluster to cross-domain multi-cluster, the game changes dramatically with the introduction of three major issues: Data, Security, and Administration. To address these issues, you need a fully-integrated solution, or a toolkit to build one.

  14. Typical Grid Scenarios • Global Grids: multiple enterprises, owners, platforms, domains, file systems, locations, and security policies (Legion, Avaki, Globus) • Enterprise Grids: single enterprise; multiple owners, platforms, domains, file systems, locations, and security policies (SUN SGE EE, Platform Multi-cluster) • Cluster & Departmental Grids: single owner, platform, domain, file system and location (SUN SGE, Platform LSF, PBS) • Desktop Cycle Aggregation: desktop only (United Devices, Entropia, Data Synapse)

  15. What are grids being used for today? • Multiple sites with multiple data sources (public and private) • Need secure access to data and applications for sharing • Have partnership relationships with other organizations: internal, partners, or customers • Computationally challenging applications • Distributed R&D groups across company, networks and geographies • Staging large files • Want to utilize and leverage heterogeneous compute resources • Need for accounting of resources • Need to handle multiple queuing systems • Considering purchasing compute cycles for spikes in demand

  16. Legion

  17. Legion Grid Software • Wide-area access to data, processing and application resources in a single, uniform operating environment that is secure and easy to administer. Legion Grid Capabilities • Wide-area data access • Distributed processing • Global naming • Policy-based administration • Resource accounting • Fine-grained security • Automatic failure detection and recovery [Diagram: users and applications reach servers, data, applications, desktops, clusters, and load management & queuing systems in Department A, Department B, a partner, and a vendor through the Legion grid.]

  18. Legion Combines Data and Compute Grid [Diagram: users and applications reach compute resources (HQ-1, PM-1, RD-2) and data (sequence_a, sequence_c) through the Legion grid, which spans servers, data, applications, desktops, clusters, and load management & queuing systems in Department A, Department B, a partner, and a vendor.]

  19. The Legion Data Grid

  20. Data Grid • Wide-area access to data at its source location based on business policies, eliminating manual copying and errors caused by accessing out-of-date copies. Data Grid Capabilities • Federates multiple data sources • Provides global naming • Works with local and virtual file systems – NFS, XFS, CIFS • Accesses data in DAS, NAS, SAN • Uses standard interfaces • Caches data locally [Diagram: users and applications reach servers, data, applications, desktops, and clusters in Department A, Department B, a partner, and a vendor through the Legion grid.]

  21. Data Grid Share • Legion Data Grid transparently handles client and application requests, maps them to the global namespace, and returns the data. • Data mapped to Grid namespace via Legion ExportDir [Diagram: users and applications above data held on Linux, NT, and Solaris hosts at Headquarters, Informatics Partner, Research Center, and Tools Vendor.]
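
The mapping described on this slide can be pictured with a small sketch. It is illustrative only: the class `GridNamespace`, its `export_dir` and `resolve` methods, and all host and path names are hypothetical and do not reproduce Legion's actual ExportDir interface.

```python
import posixpath


class GridNamespace:
    """Toy global namespace: grid paths map to (host, local path) pairs."""

    def __init__(self):
        self._exports = {}  # grid prefix -> (host, local directory)

    def export_dir(self, host, local_dir, grid_prefix):
        # Hypothetical stand-in for exporting a local directory into the grid.
        self._exports[grid_prefix.rstrip("/")] = (host, local_dir.rstrip("/"))

    def resolve(self, grid_path):
        # Pick the longest exported prefix that contains the requested path.
        path = posixpath.normpath(grid_path)
        for prefix in sorted(self._exports, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                host, local_dir = self._exports[prefix]
                return host, local_dir + path[len(prefix):]
        raise KeyError(f"no export covers {grid_path}")


ns = GridNamespace()
ns.export_dir("hq-linux", "/data/sequences", "/grid/hq/sequences")
ns.export_dir("partner-solaris", "/export/tools", "/grid/partner/tools")
print(ns.resolve("/grid/hq/sequences/sequence_a"))
```

A client only ever names `/grid/...` paths; which host and local directory serve the data stays hidden behind `resolve`.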

  22. Data Grid Access • Access files using standard NFS protocol or Legion commands - NFS security issues eliminated - Caches exploit semantics • Access files using global name • Access based on specified privileges [Diagram: users and applications pass through an access point with fine-grained security to reach data (sequence_a, sequence_b, sequence_c), applications (App_A, BLAST), servers, and clusters (PM-1, HQ-1, RD-2) at Headquarters, Informatics Partner, Research Center, and Tools Vendor.]
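
"Access based on specified privileges" can be read as a per-object access control list. The sketch below shows that reading under stated assumptions: `GridObject`, `grant`, and `allows` are hypothetical names, and the slides do not describe how Legion actually stores or checks privileges.

```python
from dataclasses import dataclass, field


@dataclass
class GridObject:
    name: str
    # privilege name -> set of users allowed to exercise it
    acl: dict = field(default_factory=dict)

    def grant(self, user, privilege):
        self.acl.setdefault(privilege, set()).add(user)

    def allows(self, user, privilege):
        # Deny by default: a user needs an explicit grant for each privilege.
        return user in self.acl.get(privilege, set())


seq = GridObject("/grid/hq/sequences/sequence_a")
seq.grant("alice", "read")
seq.grant("alice", "write")
seq.grant("bob", "read")

print(seq.allows("bob", "read"))
print(seq.allows("bob", "write"))
```

Keeping the list on each object, rather than on a whole file system, is what makes the control fine-grained in this sketch.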

  23. Legion-NFS: Data Grid Access using virtual NFS • Complexity = Servers + Clients • Clients mount grid • Servers share files to grid • Clients access data using NFS protocol • Wide-area access to data outside administrative domain [Diagram: data (sequence_a, sequence_c) shared among Department A, Department B, and a partner behind fine-grained security.]

  24. Keeping Data in the grid • Legion storage servers • Data is copied into Legion storage servers that execute on a set of hosts. • The particular set of hosts used is a configuration option - here five hosts are used • Access to the different files is completely independent and asynchronous • Very high sustained read/write bandwidth is possible using commodity resources [Diagram: a directory "/" containing files a through h, stored across five local disks.]
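
A minimal sketch of spreading files over a configurable set of storage hosts follows. The slide does not say how Legion chooses a host for each file, so the round-robin placement, the function `place_files`, and the host names are assumptions made for illustration.

```python
from itertools import cycle


def place_files(files, hosts):
    """Assign each file to one storage host, round-robin (assumed policy)."""
    placement = {}
    host_cycle = cycle(hosts)
    for name in files:
        placement[name] = next(host_cycle)
    return placement


hosts = [f"host{i}" for i in range(1, 6)]  # five hosts, as on the slide
files = list("abcdefgh")                    # files a..h under "/"
placement = place_files(files, hosts)
for name, host in placement.items():
    print(name, "->", host)
```

Because each file lives on a single host, reads of different files touch different disks and can proceed independently, which is where the aggregate bandwidth comes from.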

  25. I/O Performance • Read performance in NFS, Legion-NFS, and Legion I/O libraries. The x axis indicates the number of clients that simultaneously perform 1 MB reads on 10 MB files, and the y axis indicates total read bandwidth. All results are the average of multiple runs. All clients ran on 400 MHz Intel machines; the NFS server ran on an 800 MHz Intel server.

  26. Data Grid Benefits • Easy, convenient, wide-area access to data – regardless of location, administrative domain or platform • Eliminates time-consuming copying and obtaining accounts on machines where data resides • Provides access to the most recent data available • Eliminates confusion and errors caused by inconsistent naming of data • Caches remote data for improved performance • Requires no changes to legacy or commercial applications • Protects data with fine-grained security and limits access privileges to those required • Eases data administration and management • Eases migration to new storage technologies

  27. The Legion Compute Grid

  28. Compute Grid • Wide-area access to processing resources based on business policies, managing utilization of processing resources for fast, efficient job completion. Compute Grid Capabilities • Job scheduling and priority-based queuing • Easy integration with third party load management and queuing software • Automatic staging of data and applications • Efficient processing of both sequential and parallel applications • Failure detection and recovery • Usage accounting [Diagram: users and applications reach servers, applications, data, desktops, and clusters in Department A, Department B, a partner, and a vendor through the Legion grid.]

  29. Compute Grid Access • The grid: • Locates resources • Authenticates and grants access privileges • Stages applications and data • Detects failures and recovers • Writes output to specified location • Accounts for usage [Diagram: users and applications log in and submit jobs through fine-grained security; the grid layer provides scheduling, queuing, usage management, accounting, and recovery over an NT server (PM-1), a Solaris server (RD-2), and a Linux cluster (HQ-1) holding applications (App_A, BLAST) and data (sequence_a, sequence_c) at Headquarters, Informatics Partner, Research Center, and Tools Vendor.]
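
The steps listed on this slide can be sketched as one small function. This is a toy model under stated assumptions, not Legion code: `run_job`, the resource dictionaries, and the `execute` callables are hypothetical, and staging plus execution are folded into a single call.

```python
import time


def run_job(job, resources, user, max_attempts=3):
    """Walk one job through the steps listed on the slide."""
    usage = []  # accounting records: (user, resource, seconds)
    # Locate resources: keep only those that offer the requested application.
    candidates = [r for r in resources if job["app"] in r["apps"]]
    # Authenticate and grant access: keep resources that admit this user.
    candidates = [r for r in candidates if user in r["allowed_users"]]
    for resource in candidates[:max_attempts]:
        start = time.perf_counter()
        try:
            # Staging and execution are folded into one hypothetical callable.
            output = resource["execute"](job["app"], job["data"])
        except RuntimeError:
            # Detect the failure and recover by moving to the next resource.
            usage.append((user, resource["name"], time.perf_counter() - start))
            continue
        usage.append((user, resource["name"], time.perf_counter() - start))
        return {"output": output, "ran_on": resource["name"], "usage": usage}
    raise RuntimeError("job failed on all candidate resources")


def flaky(app, data):
    raise RuntimeError("host crashed")


def healthy(app, data):
    return f"{app} processed {data}"


resources = [
    {"name": "PM-1", "apps": {"BLAST"}, "allowed_users": {"alice"}, "execute": flaky},
    {"name": "HQ-1", "apps": {"BLAST", "App_A"}, "allowed_users": {"alice"}, "execute": healthy},
    {"name": "RD-2", "apps": {"App_A"}, "allowed_users": {"bob"}, "execute": healthy},
]

result = run_job({"app": "BLAST", "data": "sequence_a"}, resources, "alice")
print(result["ran_on"], "->", result["output"])
```

The failed attempt on PM-1 still produces an accounting record, so usage is tracked even when recovery moves the job elsewhere.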

  30. Tools - All are cross-platform • MPI • P-space studies - multi-run • Parallel C++ • Parallel object-based Fortran • CORBA binding • Object migration • Accounting • legion_make - remote builds • Fault-tolerant MPI libraries • post-mortem debugger • “console” objects • parallel 2D file objects • Collections

  31. One Favorite

  32. Related Work

  33. Related Work • Avaki • All distributed systems literature • Globus • AFS/DFS • LSF, PBS, …. • Global Grid Forum - OGSA

  34. Avaki Company Background • Grid Pioneers - a Legion spin-off • Over $20M capitalization • The only commercial grid software provider with a solution that addresses data access, security, and compute power challenges • Standards efforts leader [Graphic: standards organizations, customers, and partners.]

  35. AFS/DFS comparison with Legion Data Grid • AFS presumes that all files are kept in AFS, with no federation with other file systems. Legion allows data to be kept in Legion, or in an NFS, XFS, PFS, or Samba file system. • AFS presumes that all sites use Kerberos and that realms “trust” each other. Legion assumes nothing about the local authentication mechanism, and there is no need for cross-realm trust. • AFS semantics are fixed (copy on open). Legion can support multiple semantics; the default is Unix semantics. • AFS is volume oriented (sub-trees). Legion can be volume oriented or file oriented. • AFS caching semantics are not extensible. Legion caching semantics are extensible.

  36. Legion & Globus GT2 • Projects with many common goals: • Metacomputing (or the “Grid”) • Middleware for wide-area systems • Heterogeneous resource sets • Disjoint administrative domains • High-performance, large-scale applications

  37. Legion Specific Goals • Shared collaborative environment including shared file system • Fault-tolerance and high-availability • Both HPC applications and distributed applications • Complete security model including access control • Extensible • Integrated - create a meta-operating system

  38. Many “Similar” Features • Resource Management Support • Message-passing libraries • e.g., MPI • Distributed I/O Facilities • Globus GASS/remote I/O vs. Avaki Data Grid • Security Infrastructure

  39. Globus • The “toolkit” approach • Provide services as separate libraries • E.g. Nexus, GASS, LDAP • Pros: • Decoupled architecture • easy to add new services into the mix • Low buy-in: use only what you like! • In practice all the pieces use each other • Cons: • No unifying abstractions • very complex environment to learn in full • composition of services difficult as number of services grows • Interfaces keep changing due to ever evolving design • Does not cover space of problems

  40. Standards: GGF Background: • Grid standards are now being developed at the Global Grid Forum (GGF) • The in-development standard, Open Grid Services Infrastructure (OGSI), will extend Web Services (SOAP/XML, WSDL, etc.) • Names and a two-level name scheme • Factories and lifetime management • Mandatory set of interfaces, e.g., discovery interfaces • OGSA – Open Grid Services Architecture • Over-arching architecture • Still in development
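
A two-level name scheme with lifetime management can be sketched as a stable handle that resolves to a current, expiring reference. In OGSI these levels correspond roughly to the Grid Service Handle and the Grid Service Reference. The class `HandleResolver`, its methods, and the handle and URL strings below are hypothetical and are not taken from the OGSI specification.

```python
import time


class HandleResolver:
    """Toy resolver: stable handle -> current reference with an expiry time."""

    def __init__(self):
        self._table = {}

    def bind(self, handle, reference, lifetime_s, now=None):
        now = time.time() if now is None else now
        self._table[handle] = (reference, now + lifetime_s)

    def resolve(self, handle, now=None):
        now = time.time() if now is None else now
        reference, expires = self._table[handle]
        if now >= expires:
            # Lifetime management: an expired binding is dropped, not returned.
            del self._table[handle]
            raise LookupError(f"{handle} has expired")
        return reference


resolver = HandleResolver()
resolver.bind("handle:blast-service", "http://hq-1.example/blast", lifetime_s=60, now=0)
print(resolver.resolve("handle:blast-service", now=10))
# The service moves: the handle stays the same, only the reference changes.
resolver.bind("handle:blast-service", "http://rd-2.example/blast", lifetime_s=60, now=20)
print(resolver.resolve("handle:blast-service", now=30))
```

Clients hold only the handle, so a service can move or be recreated without breaking the names that users and applications already have.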

  41. Summary • Grids are about resource federation and sharing • Grids are here today. They are being used in production computing in industry to solve real problems and provide real value. • Compute Grids • Data Grids • We believe that users want high-level abstractions - and don’t want to think about the grid. • Need low activation energy and legacy support • There are a number of challenges to be solved - and different applications and organizations want to solve them differently • Policy heterogeneity • Strong separation of policy and mechanism • Several areas where really good policies are still lacking • Scheduling • Security and security policy interactions • Failure recovery (and the interaction of different policies)
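
Strong separation of policy and mechanism can be illustrated with scheduling, one of the areas the summary names. In this sketch the mechanism (`schedule`) never changes, while each organization supplies its own policy function. All names and host records are hypothetical.

```python
def least_loaded(hosts):
    """Policy: prefer the host with the fewest running jobs."""
    return min(hosts, key=lambda h: h["running"])


def prefer_local(hosts):
    """Policy: prefer a local-domain host, falling back to least loaded."""
    local = [h for h in hosts if h["domain"] == "local"]
    return least_loaded(local or hosts)


def schedule(job, hosts, policy):
    """Mechanism: place the job on whichever host the policy selects."""
    host = policy(hosts)
    host["running"] += 1
    return host["name"]


hosts = [
    {"name": "hq-1", "domain": "local", "running": 4},
    {"name": "partner-1", "domain": "remote", "running": 0},
]
print(schedule("job-1", hosts, least_loaded))
print(schedule("job-2", hosts, prefer_local))
```

The same two hosts yield different placements under the two policies, which is the kind of policy heterogeneity the summary says a grid has to accommodate.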
