
Grid Computing

Grid Computing. Hakan ÜNLÜ CMPE 511 Presentation Fall 2004. Overview. General Introduction to Grid Computing Introduction: Why Grids? Applications for Grids Basic Grid Architecture Grid Platforms & Standards Issues in Grid Computing Hardware: Blade Computers


Presentation Transcript


  1. Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

  2. Overview • General Introduction to Grid Computing • Introduction: Why Grids? • Applications for Grids • Basic Grid Architecture • Grid Platforms & Standards • Issues in Grid Computing • Hardware: Blade Computers • System Management: Globus Toolkit • Software: Scheduling

  3. What is Grid Computing? • Computational and Networking Infrastructure that is designed to provide pervasive, uniform and reliable access to data, computational and human resources distributed over wide area environments

  4. Grids Are By Definition Heterogeneous • It’s about legacy resources, infrastructure, applications, policies, and procedures • The grid and its administrators must integrate in stealth mode…with • Firewalls • Filesystems • Queuing systems • Grumpy systems administrators • Tried and true applications

  5. A Grid Example

  6. Challenges in Grid Computing • Reliable performance • Trust relationships between multiple security domains • Deployment and maintenance of grid middleware across hundreds or thousands of nodes • Access to data across WANs • Access to state information of remote processes • Workflow / dependency management • Distributed software and license management • Accounting and billing

  7. Applications for a Grid • Generally, apps that work well on clusters can work well on grids • Non-interactive / batch jobs • Parallel computations with minimal interprocess communication and workflow dependencies • Reasonable data transfer requirements • Sensible economics • Productivity Gains > Cost of Building Grid + Opportunity Costs of Resources
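
The "sensible economics" test on this slide is a single inequality. A minimal sketch in Python follows; the function name `grid_is_worthwhile` and all figures are made up for illustration and do not come from the presentation.

```python
def grid_is_worthwhile(productivity_gains, build_cost, opportunity_cost):
    """Return True when the slide's inequality holds:
    productivity gains > cost of building the grid + opportunity costs.
    All three arguments must use the same unit (e.g. dollars per year)."""
    return productivity_gains > build_cost + opportunity_cost


# Illustrative numbers only (hypothetical, not from the slides).
print(grid_is_worthwhile(500_000, 300_000, 150_000))
print(grid_is_worthwhile(500_000, 300_000, 250_000))
```

The opportunity cost term is the part that is easy to forget: resources lent to the grid are not available for whatever they were doing before.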

  8. Non-Interactive / Batch Jobs • Difficult to get a real-time UI for jobs running on the grid • A possible interactive application: spreadsheet computation • Want to take advantage of off-peak free cycles • Jobs run for several days, weeks or months • The user might prefer to be sleeping while the job runs! • Running processes might need to be interrupted or re-prioritized based on the current load on a grid compute engine • Idle thread / “screensaver” computing

  9. SETI@home

  10. Parallel Computations • Application needs to be able to run as multiple, mostly independent pieces • Can’t depend on the network’s Quality of Service • Can’t rely upon the order of execution and completion • Apps that need these things are better suited for tightly coupled compute platforms (e.g. SMP systems) • Grid can still be useful as a meta-scheduler and data source for such apps • e.g. the user submits the job to the grid queue and asks for the best available SMP resource
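
The requirement that pieces be mostly independent, with no reliance on completion order, can be sketched as follows. This is only an analogy: local threads stand in for grid nodes, and `process_piece` and `run_independent_pieces` are hypothetical names, not part of any grid toolkit.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_piece(piece):
    """Stand-in for one independent unit of work (here: a sum of squares)."""
    return sum(x * x for x in piece)


def run_independent_pieces(pieces, workers=4):
    """Run every piece on its own and merge results by piece index,
    so the answer does not depend on which piece finishes first."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(process_piece, p): i for i, p in enumerate(pieces)}
        for future in as_completed(futures):  # yields in completion order
            results[futures[future]] = future.result()
    return [results[i] for i in range(len(pieces))]


data = [range(0, 10), range(10, 20), range(20, 30)]
print(run_independent_pieces(data))
```

Because each result is keyed by its piece index, the merge step is correct regardless of scheduling, which is the property a grid job needs when neither network quality nor execution order can be assumed.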

  11. Some Costs and Benefits • Costs: • Grid Middleware • Architects and Developers • User Training • Infrastructure Hardware • Opportunity Costs • Would a big SMP box return better results for your problem? • Benefits: • Better Utilization of Existing Capital Resources • More Efficient Users • Ability to complete more work in the same amount of time • Performance near or sometimes as good as the big SMP box

  12. Basic Grid Architecture • Clusters and how grids are different from clusters • Departmental Grid Model • Enterprise Grid Model • Global Grid Model

  13. What Makes a Cluster a Cluster? • Uses a Distributed Resource Manager (DRM) to manage job scheduling • Tightly coupled - High speed, low latency interconnect network • Fairly homogeneous - Configuration management is important! • Single administrative domain

  14. The Cluster Model [Diagram: a Master Node (User Interface/API; 3A, RD, PM, MP, DM; Cluster DRM; Configuration Management; Shared Storage) connected over a High Speed Interconnect to four Cluster Nodes, each showing Cluster DRM; 3A, RD, PM, MP, DM; Operating System; Storage; Compute.]

  15. How is an Enterprise Grid Different from a Cluster? • Heterogeneous - Clusters, SMP, even workstations of dissimilar configurations, but all are tied together through a grid middleware layer • Lightly coupled - Connected via 100 or 1000 Mbps Ethernet • Introduces a resource registry and grid security service • But usually only a single registry and security service for the grid • Not necessarily a single administrative domain

  16. The Enterprise Grid Model [Diagram: a User Interface/API with Grid Interface, Resource Registry and Security Infrastructure, connected over the Enterprise LAN or WAN to grid-enabled resources. Labels on the resources: Grid Interface; 3A, RD, PM, MP, DM; Cluster DRM; Cluster Interface; AA, RD, PM, MP, DM; Operating System; Storage; Compute; SMP.]

  17. How is a Global Grid Different from an Enterprise Grid? • "Grid of Grids" - Collection of enterprise grids • Loosely coupled between sites - Not much control over Quality of Service • Mutually distrustful administrative domains • Multiple grid resource registries and grid security services

  18. The Global Grid Model [Diagram: Site A, Site B and Site C joined over a WAN. Each site is a LAN of Grid-enabled SMP and Cluster resources with its own UI/API, RR and SI.]

  19. Grid Platforms & Standards • The Global Grid Forum • http://www.gridforum.org/ • Globus Toolkit • DCML (Data Center Markup Language)

  20. Globus Toolkit V2 “Pillars” • Resource Management (GRAM) • Information Services (MDS) • Data Management (GASS) • Grid Security Infrastructure (GSI)

  21. Globus Toolkit V2 Stack [Diagram: GRAM over HTTP, MDS over LDAP and GASS/GridFTP over FTP, all layered on GSI, then TLS/SSL, then TCP/IP.]

  22. Globus Toolkit V2 Key Components: GRAM, MDS and GASS • Grid Resource Allocation Manager (GRAM) • Server-side: “gatekeeper” process that controls execution of job managers • Client-side: “globusrun” UI to launch jobs • Monitoring and Directory Service (MDS) • GRIS: Grid Resource Information Service collects local info • GIIS: Grid Index Information Service collects GRIS info • Global Access to Secondary Storage (GASS) • GridFTP, implemented through “in.ftpd” daemon and “globus-url-copy” command • Files accessed through a URI, e.g. gsiftp://node1.ncbiogrid.org/data/ncbi/ecoli.nt
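
As a rough illustration of the client-side tools named above, the sketch below only assembles command lines and never runs them. The helper names (`build_rsl`, `globusrun_command`, `url_copy_command`) are hypothetical. The `-o` and `-r` options of `globusrun` and the simple `&(executable=...)(count=...)` RSL form are written from memory of the GT2 documentation, so treat them as assumptions and check them against the installed toolkit. The local destination path is also made up.

```python
def build_rsl(executable, arguments=(), count=1):
    """Compose a simple RSL job description (hypothetical helper).
    Example result: '&(executable=/bin/hostname)(count=1)'."""
    parts = [f"(executable={executable})"]
    if arguments:
        parts.append("(arguments=" + " ".join(arguments) + ")")
    parts.append(f"(count={count})")
    return "&" + "".join(parts)


def globusrun_command(contact, rsl):
    """Argument list that would ask the gatekeeper at `contact` to run `rsl`.
    Assumption: -r names the resource contact, -o returns job output."""
    return ["globusrun", "-o", "-r", contact, rsl]


def url_copy_command(source_url, dest_url):
    """Argument list for a GridFTP transfer with globus-url-copy."""
    return ["globus-url-copy", source_url, dest_url]


rsl = build_rsl("/bin/hostname", count=2)
print(" ".join(globusrun_command("node1.ncbiogrid.org", rsl)))
print(" ".join(url_copy_command(
    "gsiftp://node1.ncbiogrid.org/data/ncbi/ecoli.nt",  # URI from the slide
    "file:///tmp/ecoli.nt",                             # hypothetical destination
)))
```

Both commands would need a valid GSI proxy credential before they could succeed on a real grid.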

  23. Globus Toolkit V2 Additional Components • Grid Packaging Tools (GPT) • Used to build (“gpt-build”), install (“gpt-install”) and localize (“gpt-postinstall”) Globus components • MPICH-G2 • A Globus V2 enabled version of MPI (Message Passing Interface) • Based on MPICH • Utilizes GSI, MDS and GRAM

  24. Globus Toolkit V2 Network Services [Diagram: four Grid Nodes on a Network, each running gatekeeper, GRIS and in.ftpd, alongside a Client Node (GRAM Client), a GIIS Server and a Certificate Authority.]

  25. GRAM, MDS and GASS Interactions [Diagram: the Client (user / proxy) reaches GRAM (gatekeeper, job manager; RSL/DUROC/HTTP 1.1) for job allocation and job management, MDS (GRIS, GIIS; LDAP) for resource discovery, and GASS (GridFTP, in.ftpd; gsiftp) for data transfer and data control. Each service sits in front of a process or resource.]

  26. Globus Toolkit V2 Strengths and Weaknesses • Strengths: • Mindshare and collaboration in both industry & academia • Open source • Standards-based underpinnings (e.g. SSL, LDAP) • Flexibility and CoG APIs • Driving OGSA with heavy resource commitment from IBM • Weaknesses: • Significant effort required to get applications working on a grid • Not production quality at this time • No “metascheduler” -- user has to explicitly tell their jobs where to run

  27. Issues in Grid Computing • Hardware: Blades

  28. Hardware Trends • HW Trends that enable Grids and Distributed Processing • There is a lot of idle computing power • Computers are now better connected • There are many different brands and configurations in any environment • Distributed Computing in turn gives rise to new HW architectures • Blade Computers

  29. What is a blade? • Inclusive chassis-based modular computing system that includes processors, memory, network interface cards and local storage on a single board. [Images: Blade; Blade Farm; Blade Chassis & Blades]

  30. Anatomy of a blade

  31. How far can it go?

  32. Advantages & Disadvantages • Advantages: • Low Cost (power, heat, data center space) • Physical Server Consolidation (save space, eliminate cables) • High Availability • Integrated Systems Management • Disadvantages: • Not suitable in small numbers • Need for standardization (for network connection and management)

  33. Blades & Grid • Each blade is a server that can run jobs. • Blades can be used to form clusters or grids. • With efficient management, different configurations of blades can be used in a single grid computer. • Easy to expand • Protects investment

  34. Issues in Grid Computing • System Management: Globus Toolkit

  35. Globus Toolkit V2 “Pillars” • Resource Management (GRAM) • Information Services (MDS) • Data Management (GASS) • Grid Security Infrastructure (GSI)

  36. Globus Toolkit V2 Stack [Diagram: GRAM over HTTP, MDS over LDAP and GASS/GridFTP over FTP, all layered on GSI, then TLS/SSL, then TCP/IP.]

  37. Globus Toolkit V2 Key Components: GRAM, MDS and GASS • Grid Resource Allocation Manager (GRAM) • Server-side: “gatekeeper” process that controls execution of job managers • Client-side: “globusrun” UI to launch jobs • Monitoring and Directory Service (MDS) • GRIS: Grid Resource Information Service collects local info • GIIS: Grid Index Information Service collects GRIS info • Global Access to Secondary Storage (GASS) • GridFTP, implemented through “in.ftpd” daemon and “globus-url-copy” command • Files accessed through a URI, e.g. gsiftp://node1.ncbiogrid.org/data/ncbi/ecoli.nt

  38. Globus Toolkit V2 Additional Components • Grid Packaging Tools (GPT) • Used to build (“gpt-build”), install (“gpt-install”) and localize (“gpt-postinstall”) Globus components • MPICH-G2 • A Globus V2 enabled version of MPI (Message Passing Interface) • Based on MPICH • Utilizes GSI, MDS and GRAM

  39. Globus Toolkit V2 Network Services [Diagram: four Grid Nodes on a Network, each running gatekeeper, GRIS and in.ftpd, alongside a Client Node (GRAM Client), a GIIS Server and a Certificate Authority.]

  40. GRAM, MDS and GASS Interactions [Diagram: the Client (user / proxy) reaches GRAM (gatekeeper, job manager; RSL/DUROC/HTTP 1.1) for job allocation and job management, MDS (GRIS, GIIS; LDAP) for resource discovery, and GASS (GridFTP, in.ftpd; gsiftp) for data transfer and data control. Each service sits in front of a process or resource.]

  41. Globus Toolkit V2 Strengths and Weaknesses • Strengths: • Mindshare and collaboration in both industry & academia • Open source • Standards-based underpinnings (e.g. SSL, LDAP) • Flexibility and CoG APIs • Driving OGSA with heavy resource commitment from IBM • Weaknesses: • Significant effort required to get applications working on a grid • Not production quality at this time • No “metascheduler” -- user has to explicitly tell their jobs where to run

  42. Issues in Grid Computing • Software: Scheduling

  43. Superscheduling • Superscheduling means scheduling resources in multiple administrative domains. • Various models • Submitting a job to a specific single machine • Submitting a job to single machines at multiple sites (with cancellation option) • Scheduling a single job to use multiple resources • Most common superscheduler: USERS

  44. Phases Of Superscheduling • Resource Discovery • Authorisation Filtering • Application Requirement Definition • Minimal Requirement Filtering • System Selection • Gathering Information (Query) • Select Systems to run on • Run the Job • Make an Advance Reservation (Optional) • Submit Job to Resources • Preparation Tasks • Monitor Progress • Job Completion • Completion Tasks Source: Global Grid Forum, Scheduling Working Group, 10 Actions When Scheduling, Schopf, 2001
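
The three phases above can be sketched as a small pipeline. This is a toy model under stated assumptions: the `Resource` fields, the shortest-queue selection policy and every function name are invented for illustration and are not taken from the Global Grid Forum document.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    allowed_users: set
    cpus: int
    memory_gb: int
    queue_length: int  # stand-in for dynamic information gathered by a query


def discover(resources, user, min_cpus, min_memory_gb):
    """Phase 1: authorisation filtering, then minimal requirement filtering."""
    authorised = [r for r in resources if user in r.allowed_users]
    return [r for r in authorised
            if r.cpus >= min_cpus and r.memory_gb >= min_memory_gb]


def select(candidates):
    """Phase 2: choose the candidate with the shortest queue
    (an assumed policy; real selection can weigh many more factors)."""
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.queue_length)


def run_job(resource, job_name):
    """Phase 3: placeholder that only lists the remaining steps in order."""
    return [f"submit {job_name} to {resource.name}", "preparation tasks",
            "monitor progress", "job completion", "completion tasks"]


resources = [
    Resource("cluster-a", {"alice", "bob"}, 64, 128, 12),
    Resource("smp-b", {"alice"}, 16, 256, 3),
    Resource("cluster-c", {"bob"}, 128, 512, 0),
]
chosen = select(discover(resources, "alice", min_cpus=8, min_memory_gb=64))
print(chosen.name)
print(run_job(chosen, "blast-run"))
```

The optional advance reservation step is left out; it would sit between selection and submission.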

  45. Scheduling Framework (Ranganathan & Foster 2003) • External Scheduler • Local Scheduler • Dataset Scheduler

  46. Scheduling And Replication Algorithms • External Scheduler • JobRandom • JobLeastLoaded • JobDataPresent • JobLocal • Dataset Scheduler • DataDoNothing: No Active Replication. Everything is on demand • DataRandom: Popular Datasets are replicated to Random Sites • DataLeastLoaded: Popular Datasets are sent to the least loaded sites.
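
A minimal sketch of how these policies might look in code. The slide gives only the names of the external scheduler policies, so their behaviour here is an interpretation of those names. The `Site` class, the load metric (queued jobs) and the fallback in `job_data_present` when no site holds the dataset are assumptions for illustration, not the authors' implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    load: int  # assumed metric: number of queued jobs
    datasets: set = field(default_factory=set)


def job_random(sites, local, dataset, rng=random):
    """JobRandom: send the job to a randomly chosen site."""
    return rng.choice(sites)


def job_least_loaded(sites, local, dataset):
    """JobLeastLoaded: send the job to the site with the lowest load."""
    return min(sites, key=lambda s: s.load)


def job_data_present(sites, local, dataset):
    """JobDataPresent: prefer sites that already hold the dataset.
    Assumption: fall back to all sites if none holds it."""
    holders = [s for s in sites if dataset in s.datasets]
    return min(holders or sites, key=lambda s: s.load)


def job_local(sites, local, dataset):
    """JobLocal: always run where the job was submitted."""
    return local


def data_least_loaded(sites, popular_dataset):
    """DataLeastLoaded: replicate a popular dataset to the least loaded site."""
    target = min(sites, key=lambda s: s.load)
    target.datasets.add(popular_dataset)
    return target


sites = [Site("A", 5, {"d1"}), Site("B", 1), Site("C", 3, {"d1"})]
print(job_least_loaded(sites, sites[0], "d1").name)
print(job_data_present(sites, sites[0], "d1").name)
print(data_least_loaded(sites, "d1").name)
```

DataDoNothing needs no code at all, and DataRandom would be `data_least_loaded` with the `min` call swapped for a random choice.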

  47. Simulation Results [Figures: Average Response Times; Average Data Transferred]

  48. Grid Computing Thank You and Questions?
