
Introduction to Grid Computing

Introduction to Grid Computing. Concurrent and Distributed Programming course, Mark Silberstein, CS, Technion.


Presentation Transcript


  1. Introduction to Grid Computing. Concurrent and Distributed Programming course. Mark Silberstein, CS, Technion

  2. Electric Power Grid analogy: a little bit of history
     • Beginning of the XX century
       • Electric power
       • Know how to generate and how to use
       • Problem for wide adoption: Generators
       • Solution: Electric power grid – INFRASTRUCTURE for power distribution and interface standardization
       • Integration of resources opens NEW opportunities
     • Beginning of the XXI century
       • Computational power
       • Know how to produce and how to use
       • Problem for high performance applications: High-end resources
       • Solution: Computational grid – INFRASTRUCTURE for pervasive and inexpensive access to high-end resources
     Mark Silberstein, CDP, Technion

  3. Grid Computing Vision
     • Electric Power Grid usage scenario
       • Plug in your Teapot (many): infinite electric power capacity
       • Turn it on: you don’t care WHO supplies the power
       • Drink your tea: water is inside the teapot
     • Typical Grid usage scenario
       • Plug your PC into Computation Grid: infinite power (CPU/Storage/etc…)
       • Start application: you don’t care where it is running
       • Get results: output is waiting for you locally

  4. What is Grid Computing?
     • Computational Grid is a collection of distributed (across geographical and administrative domains), heterogeneous resources which can be used as an ensemble to execute large-scale applications
     • Metacomputer – Virtualization of widely distributed resources

  5. PACI Grid

  6. Is it really such a NEW idea?
     • People connected computers together and used them long before Grid was introduced
     • BUT! Everything was done manually:
     • I need to run a simulation – Pre-Grid HOWTO Guide:
       • Call admin at the remote site to open an account
       • Stage your application and data to the remote site
       • Meanwhile storage is full, need to ask to remove old stuff
       • Different protocols
       • Reserve (another call to admin) CPU
       • Run job and pray that nothing fails
       • If everything is fine – stage back output
       • Call admin and pay
     • Do it for every site and with different protocols
     • Grid should provide AUTOMATION

  7. Scientific Grid Computing
     • Collaboration - “Virtual Organizations”
       • “I have CPU, you produce Data, she has Storage”
       • “I have X CPUs (Storage), you have Y CPUs (Storage). Use mine and I’ll use yours”
       • “I have Super Computer, but she has Visualization Cave.”
     • On-Demand computing
       • “My experiment requires many CPUs/Disk/anything. Let me use your resources for 2 days.”
     • Better resource utilization
       • “My computers are never used at night. You may use them when they are idle”
     • Sharing of Experimental Results
       • CERN collider will produce PBytes of results. Researchers all over the world want to analyze them

  8. Why Grid? Grid Applications
     • Distributed Supercomputing
       • Distributed Supercomputing applications couple multiple computational resources (supercomputers/clusters/workstations) over inter/intra net
       • Examples include:
         • SFExpress (large-scale modeling of battle entities with complex interactive behavior for distributed interactive simulation)
         • Climate Modeling (high resolution, long time scales, complex models)

  9. Why Grid? Grid Applications
     • High-Throughput Applications
       • Grid used to schedule large numbers of independent or loosely coupled tasks with the goal of putting unused cycles to work
       • High-throughput applications include RSA key cracking, SETI@home (detection of extra-terrestrial intelligence), MCell (Bioinformatics)

  10. Why Grid? Grid Applications
     • Data-Intensive Applications
       • Focus is on synthesizing new information from large amounts of physically distributed data (TERA/PETA bytes)
       • Examples include NILE (distributed system for high energy physics experiments using data from CLEO), SAR/SRB applications, digital library applications, CERN

  11. Grid Computing Challenges
     • Grid is yet another computing platform: META computer
     • Unusable without specialized software, just like any other conventional computer
     • What makes our computer usable?
       • Operating System + Drivers
       • Management Software
       • Applications

  12. Layered View of Computer Architecture
     [Figure: layer diagram. Recoverable labels, top to bottom:]
     • Applications
     • High-level Services and Tools: System utilities, User libraries
     • Core Services: H/W Abstraction Layer, I/O, Security, VM, Scheduling, OS Internal Object Management
     • Hardware: CPU, Peripherals, Memory, Buses

  13. Zoom on Core Services
     [Figure: the Core Services layer with annotations. Recoverable labels:]
     • Core Services: H/W Abstraction Layer, I/O, Security, VM, Scheduling, OS Internal Object Management
     • Annotations: Authentication, Authorization; Allocation policy; IPC, Communication, File System; Access to shared resources; Resources Access Protocols; Naming; Global Information

  14. Grids vs. “PC” ;))
     • Different administration domains
       • Security
     • Geographical distribution
       • Communication, Scheduler, Object Management
     • No global knowledge
       • Resource management, Naming
     • No centralized control
       • Resource management, Allocation policy
     • Heterogeneity
       • Resource access protocols, Resource Management
     • Scale
       • And all this for millions of resources!!

  15. Layered View of Grid Architecture
     [Figure: layer diagram. Recoverable labels, top to bottom:]
     • Applications
     • High-level Services and Tools: Resource Managers and Schedulers, Grid Compilers, High level communication, Grid Utilities, Data Replication, Grid Programming Libraries
     • Core Services: High performance I/O, Synchronization, Metacomputing Directory, Access to remote storage, Reservation, Security, Remote process management, Accounting
     • Local Services: Condor, TCP, UDP, LSF, PBS, Linux, AIX

  16. What is Grid Computing?
     • Computational Grid is a collection of distributed (across geographical and administrative domains), heterogeneous resources, implementing open Grid protocols to enable their use as part of metacomputer(s)

  17. Agenda
     • Core services
       • Globus architecture
     • High Level services and tools
       • Condor-G

  18. Globus Toolkit Components (Core Services == Globus)
     [Figure: core services matched to Globus Toolkit components. Recoverable labels:]
     • Core services: Metacomputing Directory, High performance I/O, Security, Remote process management, Access to remote storage
     • Toolkit components: MetaData Service, Globus I/O, Grid Security Infrastructure, Grid Resource Allocation Manager, Grid Access to Secondary Storage, GridFTP

  19. Globus Toolkit: Grid Core Services
     • Provides Core Grid Services
       • GSI – security infrastructure
       • GRAM, DUROC – generic interface for resource allocation
       • GASS + GridFTP – data transfer and secondary storage access
       • MDS: GRIS/GIIS – Meta Data service
       • Replica Management – Data replication and management
     • Provides C/Java/(Python soon) API to use and extend the services
     • Provides command-line utilities
     • MPICH-G2 – Grid enabled MPI
     • Supports numerous architectures (no M$ yet)

  20. Security Terminology
     • Authentication: Establishing identity
     • Authorization: Establishing rights
     • Accounting
     • Message protection
       • Message integrity
       • Message confidentiality
     • Digital signature
     • Public/private key
     • Certificate
     • Certificate Authority (CA)

  21. Public Key Based Authentication
     • User sends certificate over the wire
     • Other end sends user a challenge string
     • User encodes the challenge string with private key
       • Possession of private key means you can authenticate as subject in certificate
     • Public key is used to decode the challenge
       • If you can decode it, you know the subject
     • Treat your private key carefully!!
       • Private key is stored only in well-guarded places, and only in encrypted form
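The challenge/response exchange on this slide can be sketched in a few lines. This is only an illustration under loud assumptions: it uses textbook RSA with tiny primes and no padding, which is insecure, and the names `sign_challenge` and `verify_response` are hypothetical, not part of GSI or any library.

```python
import secrets

# Toy textbook-RSA key pair (tiny primes, no padding): illustration only.
P, Q = 61, 53
N = P * Q                            # public modulus
E = 17                               # public exponent (published in the certificate)
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (kept secret by the user)


def sign_challenge(challenge: int, private_exp: int, modulus: int) -> int:
    """User side: encode the challenge with the private key."""
    return pow(challenge, private_exp, modulus)


def verify_response(challenge: int, response: int,
                    public_exp: int, modulus: int) -> bool:
    """Other end: decode the response with the public key and compare."""
    return pow(response, public_exp, modulus) == challenge


# The verifier picks a fresh random challenge for every authentication.
challenge = secrets.randbelow(N - 2) + 2
response = sign_challenge(challenge, D, N)
print(verify_response(challenge, response, E, N))
```

A response produced without the private exponent does not decode back to the challenge, which is why possession of the private key is what proves the identity named in the certificate.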

  22. Grid Security Requirements
     • Single sign-on
       • User should authenticate only once
     • Delegation of authority
       • Simultaneous access to large pool of resources
     • Site autonomy
       • Respect and not override local site security
     • Authentication and Authorization
       • One-to-one identification and user specific policy

  23. Globus Security Infrastructure
     • Provides public key-based security system that layers on top of local site security
     • User identified to system using X.509 certificate (same as certificates used for Web) containing info about the duration of permissions, public key, signature of certificate authority
     • Each user has a Grid User ID, private key, certificate signed by a Certificate Authority (CA)
     • GSI allows for delegation of authority and single sign on – certificate chaining with certificate proxy
       • Proxy is another certificate, signed by user private key
       • Allows remote process to act on behalf of user, without password exposure
     • Site autonomy: Grid User ID should have mapping to local user at the resource in order to “log in”

  24. Mutual authentication
     • Users and resources each generate a certificate and get it signed by a trusted CA, one time
     • Certificate contains user’s name and public key
     • Grid coordinating authority operates CA
     • User and resources each maintain list of trusted CA certificates
     • This enables mutual authentication (authentication being the process by which a subject proves its identity to a requestor, typically through the use of a credential)

  25. Globus GSI
     • General scenario: User wants to execute on remote resources
     • How this happens securely:
       • User is authenticated by a CA – one time only
       • To achieve a single logon effect, user creates a temporary user proxy credential
       • User proxy has limited lifetime which user specifies
       • User proxy credential sent to gatekeeper of each desired resource
       • Gatekeeper sends copy of its certificate to user
       • Mutual Authentication - user checks gatekeeper’s certificate signature against trusted certificates; gatekeeper checks user signature against CA’s trusted certificates
       • Gatekeeper checks to see if user has permission to execute on that machine
       • If user has permission, then job is submitted to local job scheduler and job is started on remote machine
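The last two gatekeeper steps (check the proxy lifetime, then check permission by mapping the Grid identity to a local account) might look roughly like the sketch below. Signature verification is deliberately left out, and `ProxyCredential`, `LOCAL_ACCOUNTS` and `authorize` are hypothetical names for illustration, not Globus APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ProxyCredential:
    subject: str          # Grid identity (distinguished name) of the user
    not_after: datetime   # end of the limited lifetime chosen by the user


# Hypothetical site-local mapping from Grid identities to local accounts.
LOCAL_ACCOUNTS = {"/O=Grid/OU=Example/CN=Alice": "alice"}


def authorize(proxy: ProxyCredential, now: datetime) -> Optional[str]:
    """Return the local account to run as, or None if the request is refused."""
    if now >= proxy.not_after:
        return None                           # proxy expired: refuse
    return LOCAL_ACCOUNTS.get(proxy.subject)  # None when no local mapping exists


now = datetime(2003, 2, 20, 14, 0)
proxy = ProxyCredential("/O=Grid/OU=Example/CN=Alice", now + timedelta(hours=12))
print(authorize(proxy, now))
```

Keeping the mapping at the site is what preserves site autonomy: the Grid identity alone grants nothing unless the local administrator has listed it.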

  26. GSI in Action: “Create Processes at A and B that Communicate & Access Files at C”
     [Figure: flow diagram. Recoverable labels:]
     • User: single sign-on via “grid-id” & generation of proxy cred., or retrieval of proxy cred. from online repository; User Proxy holds the proxy credential
     • Remote process creation requests* go to the GSI-enabled gatekeepers at Site A (Kerberos, computer) and Site B (Unix, computer); each gatekeeper will: authorize, map to local id, create process, generate credentials
     • Processes at A and B hold a local id and a restricted proxy (plus a Kerberos ticket at Site A) and communicate* with each other
     • Remote file access request* goes to the GSI-enabled FTP server at Site C (Kerberos, storage system), which will: authorize, map to local id, access file
     • * With mutual authentication

  27. Globus Resource Allocation Manager
     • Resource Management services provide mechanism for remote job submission and management
     • 3 low level services:
       • GRAM (Globus Resource Allocation Manager)
         • Provides remote job submission, monitoring and management
       • DUROC (Dynamically Updated Request Online Co-allocator)
         • Provides simultaneous job submission and barrier
         • Layers on top of GRAM
       • RSL (Resource Specification Language)
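To give a feel for RSL, here is a small helper that renders a job description as an RSL conjunction of `(attribute=value)` relations. The helper names `to_rsl` and `_quote` are hypothetical, and the quoting rule is a simplified assumption about RSL syntax rather than a complete implementation of the language.

```python
def to_rsl(attributes: dict) -> str:
    """Render attributes as an RSL conjunction: &(name=value)(name=value)..."""
    parts = []
    for name, value in attributes.items():
        if isinstance(value, (list, tuple)):
            # A sequence becomes whitespace-separated values.
            rendered = " ".join(_quote(v) for v in value)
        else:
            rendered = _quote(value)
        parts.append(f"({name}={rendered})")
    return "&" + "".join(parts)


def _quote(value) -> str:
    """Quote values containing spaces or RSL punctuation (simplified rule)."""
    text = str(value)
    if any(ch in text for ch in ' ()&|+="'):
        return '"' + text.replace('"', '""') + '"'
    return text


print(to_rsl({"executable": "/bin/hostname", "count": 4}))
# prints: &(executable=/bin/hostname)(count=4)
```

The resulting string is what a client would hand to GRAM as the resource allocation request.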

  28. GRAM Requirements
     • Reliable invocation and cancellation
       • Only-once semantics
     • Monitoring and event notification
       • Process failure should propagate to the submission site
       • Deferred process invocation – state transitions
     • Reliable job manager
       • Job may keep running, but remote monitoring agent may fail
     • Heterogeneity of platforms
       • Generic interface to any local resource manager
     • Sandboxing
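The "state transitions" and "event notification" requirements can be pictured as a small state machine whose every change is pushed to subscribers at the submission site. The four states and the transition table below are a simplifying assumption for illustration and do not reproduce the exact GRAM job state model.

```python
from enum import Enum
from typing import Callable, List


class JobState(Enum):
    PENDING = "pending"   # accepted, process creation deferred
    ACTIVE = "active"     # running under the local resource manager
    DONE = "done"
    FAILED = "failed"


# Assumed transition table: terminal states have no successors.
ALLOWED = {
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}


class Job:
    def __init__(self) -> None:
        self.state = JobState.PENDING
        self._listeners: List[Callable[[JobState, JobState], None]] = []

    def subscribe(self, callback: Callable[[JobState, JobState], None]) -> None:
        """Register a callback invoked as callback(old_state, new_state)."""
        self._listeners.append(callback)

    def transition(self, new_state: JobState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}")
        old, self.state = self.state, new_state
        for notify in self._listeners:
            notify(old, new_state)   # failure propagates like any other event


job = Job()
job.subscribe(lambda old, new: print(f"{old.name} -> {new.name}"))
job.transition(JobState.ACTIVE)
job.transition(JobState.FAILED)
```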

  29. GRAM Components
     [Figure: component diagram with numbered flows 1 to 6 crossing the site boundary. Recoverable labels:]
     • Components: Client, Gatekeeper, Grid Security Infrastructure, Job Manager, RSL Library, Local Resource Manager, Processes
     • Flows: resource allocation request and process creation; create Job Manager; parse (RSL Library); request; allocate & create processes; monitor & control; event notification and control requests; opaque https contact string returned to the client

  30. Grid Information Infrastructure
     • Requirements
       • Resource discovery
         • All grid resources are registered
       • Resource selection
         • Should contain specific resource information
     • Challenges
       • Any information is always “already old”
       • Scalability
       • Fault-tolerance
       • Unknown information structure
       • Consistency
       • Access control

  31. Globus Information Infrastructure
     • MDS (Metacomputing Directory Service)
       • MDS stores information in entries; each entry describes some type of object (organization, person, network, computer, etc.)
       • Object class associated with each entry describes a set of entry attributes
       • Every entry is tagged with creation time and TTL
     • LDAP (Lightweight Directory Access Protocol) used to store information about resources
       • LDAP = hierarchical, tree-structured information model defining form and character of information

  32. MDS object

  33. Information Infrastructure Components
     • Information providers: Grid Resource Information Service (GRIS)
       • Run close to information source
       • Generate data in required format and store it in the Local Information Directory
       • Queries
         • Speak GRid Information Protocol (GRIP)
       • Perform soft-registration into Information Registries
         • Speak GRid Registration Protocol (GRRP)
     • Information Registries: Grid Index Information Service (GIIS) – Aggregates Info for Virtual Organization
       • Aggregate information about existing GRISes in VO
       • Provide hierarchical naming
       • May itself serve as GRIS for upper hierarchies
       • Forward all search requests to the low level GRISes
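Soft registration means a registry entry survives only while the provider keeps renewing it. A minimal sketch of that idea follows, with time passed in explicitly as seconds; the class `SoftStateRegistry` and its methods are hypothetical names, not the GRRP protocol or an MDS API.

```python
from typing import Dict, List


class SoftStateRegistry:
    """Index that forgets providers which stop re-registering."""

    def __init__(self) -> None:
        self._expires_at: Dict[str, float] = {}

    def register(self, provider: str, now: float, ttl: float) -> None:
        """Create or renew a registration valid until now + ttl."""
        self._expires_at[provider] = now + ttl

    def live_providers(self, now: float) -> List[str]:
        """Drop expired registrations and return the remaining providers."""
        self._expires_at = {
            p: t for p, t in self._expires_at.items() if t > now
        }
        return sorted(self._expires_at)


registry = SoftStateRegistry()
registry.register("host1", now=0, ttl=600)   # renews rarely, long TTL
registry.register("host2", now=0, ttl=60)    # must renew every minute
print(registry.live_providers(now=100))
# prints: ['host1']
```

No explicit "unregister" is needed: a crashed or disconnected provider simply ages out, which is what makes the scheme tolerant of failures.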

  34. How it all works
     [Figure: GIIS hierarchy over VO A, VO B and VO C, with eight GRIS instances below. Recoverable labels:]
     • Each GRIS periodically invokes scripts to obtain information (CPU, disk, …)
     • Each GRIS periodically registers with its GIIS (soft registration)
     • Queries are sent to a GIIS
     • GIIS listing: Host1: Vo-B, Host2: Vo-B, Host3: Vo-B
     • Sample entry: CPU=PIII FreeRAM=4GB Created=20.2.2003:14.00 TTL=10min

  35. GASS/GridFTP
     • Grid Access to Secondary Storage
       • GASS Cache
       • Provides transparent access to remote files
         • open(“ftp://..)
       • Lazy copy
       • Utilities to enforce consistency
     • FTP – open standard
       • Problem: low performance
     • GridFTP – FTP with high performance enhancements
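"Lazy copy" can be illustrated with a cache that transfers a file only on the first open and serves later opens locally. The sketch uses a local file copy as a stand-in for the network transfer and ignores name collisions and consistency checks; `LazyFileCache` is a hypothetical name, not the GASS API.

```python
import shutil
from pathlib import Path


class LazyFileCache:
    """Copy a remote file into the cache on first open, reuse it afterwards."""

    def __init__(self, cache_dir: Path, fetch=shutil.copyfile) -> None:
        self.cache_dir = cache_dir
        self.fetch = fetch        # stand-in for the real transfer (e.g. FTP)
        self.fetch_count = 0      # how many transfers were actually performed

    def open(self, remote: Path, mode: str = "r"):
        local = self.cache_dir / remote.name
        if not local.exists():
            # First access: transfer now, not at job submission time.
            self.fetch(remote, local)
            self.fetch_count += 1
        return open(local, mode)  # built-in open on the cached copy
```

A caller keeps using ordinary file objects, which is the "transparent access" part; the slide's "utilities to enforce consistency" would be needed to notice that the remote copy has changed.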

  36. Globus Toolkit Components: just to remind what we learnt (Core Services == Globus)
     [Figure: core services matched to Globus Toolkit components. Recoverable labels:]
     • Core services: Metacomputing Directory, High performance I/O, Security, Remote process management, Access to remote storage
     • Toolkit components: MetaData Service, Globus I/O, Grid Security Infrastructure, Grid Resource Allocation Manager, Grid Access to Secondary Storage, GridFTP

  37. Grid resource management
     • Raw grid infrastructure is useless without resource manager
     • Resource manager requirements
       • Resource discovery
       • Resource selection
       • Optimal job placement
       • Scheduling
       • ….
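Resource selection can be sketched as "filter by requirements, then rank". The ranking used here (most free CPUs wins) is an arbitrary heuristic chosen for illustration, not an optimal placement policy, and `Resource`, `JobRequest` and `select_resource` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str
    free_cpus: int
    free_ram_gb: float


@dataclass
class JobRequest:
    cpus: int
    ram_gb: float


def select_resource(job: JobRequest,
                    resources: List[Resource]) -> Optional[Resource]:
    """Pick a resource that satisfies the job, or None if nothing fits."""
    candidates = [
        r for r in resources
        if r.free_cpus >= job.cpus and r.free_ram_gb >= job.ram_gb
    ]
    if not candidates:
        return None
    # Heuristic rank: prefer the resource with the most free CPUs.
    return max(candidates, key=lambda r: r.free_cpus)


pool = [
    Resource("cluster-a", free_cpus=8, free_ram_gb=16),
    Resource("cluster-b", free_cpus=64, free_ram_gb=4),
    Resource("cluster-c", free_cpus=32, free_ram_gb=64),
]
chosen = select_resource(JobRequest(cpus=16, ram_gb=8), pool)
print(chosen.name if chosen else "no match")
# prints: cluster-c
```

In a real grid the `pool` would come from the information service, so the data is always somewhat stale and the choice can only be a best effort.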

  38. Global view of job invocation
     [Figure: flow diagram. Recoverable labels:]
     • Application submits RSL to the Resource Manager
     • Resource Manager exchanges queries & info with the Information Service
     • Resource Manager sends simple ground RSL to the GRAM instances; runtime monitoring; data and executable staging
     • Local resource managers behind GRAM: Condor, Linux, PBS

  39. Condor-G – Condor gateway into grid
     • Manual job invocation using Globus services is difficult
       • Manual data staging
       • No job restart after failure
       • Security issues
       • No queuing
       • High load on invocation machine

  40. Globus Universe
     • Run a job on a Grid resource
     • Features
       • Job management
       • Fault tolerance
       • Credential management
     • User specifies grid resources in submission file
     • Jobs are queued locally and then are executed on grid resource
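The combination of local queuing and fault tolerance can be sketched as a queue that puts a failed job back at the end until an attempt limit is reached. This is a toy model, not Condor-G's implementation: `run_queue` and the `submit` callback (which stands in for a remote submission and returns True on success) are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


def run_queue(jobs: List[str],
              submit: Callable[[str], bool],
              max_attempts: int = 3) -> Dict[str, bool]:
    """Submit queued jobs one by one, re-queueing failures up to max_attempts."""
    pending: Deque[Tuple[str, int]] = deque((job, 1) for job in jobs)
    results: Dict[str, bool] = {}
    while pending:
        job, attempt = pending.popleft()
        if submit(job):
            results[job] = True
        elif attempt < max_attempts:
            pending.append((job, attempt + 1))   # retry later
        else:
            results[job] = False                 # give up after the limit
    return results
```

Because the queue lives on the submission side, a failure at the remote site costs one retry rather than the whole job.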

  41. How It Works
     [Figure: recoverable labels: User Job; 600 Globus jobs; Condor-G (Schedd, GridManager); Grid Resource (GRAM, PBS)]

  42. Condor-G: problems
     • No resource selection
     • Job monitoring is restricted by GRAM
     • Cannot use checkpointing and remote system calls

  43. GlideIn
     • Run the Condor daemons on Grid resources as user jobs
     • Create your own personal Condor pool from temporarily-acquired Grid resources
     • Brings the full power of Condor to the Grid

  44. [Figure: a Globus Grid spanning LSF, PBS and Condor sites, with Condor-G on the user side]

  45. [Figure: same layout, with 600 Condor jobs queued at Condor-G]

  46. [Figure: same layout, next animation step: Globus Grid, Condor-G, 600 Condor jobs, LSF, PBS, Condor]

  47–50. [Figure, animation over four slides: same layout, with glide-ins started on the grid resources: Globus Grid, Condor-G, 600 Condor jobs, LSF, PBS, glide-ins, Condor]
