OGSA - A View From The Trenches Andrew Grimshaw GGF Architecture Area Co-Director January, 2005
Agenda • Background – quick • OGSA objectives and process • OGSA design teams • Opportunities for collaboration
What is an architecture? • In the computer systems world, an architecture is the definition of the components, their interactions, and the design philosophy used in the development of the whole system. In a grid (a high-performance, secure, shared, collaborative distributed system), the architecture defines the services, their interactions, and the design philosophy. In other words: what are the pieces of the puzzle, how do they fit together, and what does the puzzle look like when complete? One of our design philosophies is that the pieces can be replaced, extended, and tailored to particular use cases. Further, a systems architecture is the architecture on which applications, application services, and specialized views or profiles of the architecture are built. OGSA is a grid system architecture.
Success Requires an integrated model at the foundation. Architecture Requirements • Simple • Secure • Standards-based • Multiple interoperable implementations • Scalable • Extensible • Site Autonomy • Persistence & I/O • Multi-Language • Legacy Support • Transparency • Heterogeneity • Fault-tolerance & Exception Management Manage Complexity!!
OGSA Aims and Perspective • Goals • Interoperable solutions for Grid-based applications • Grid definitions sidebar • Addressing loosely coupled distributed computing • Philosophy • Standardization at the architectural level • Similar to profiling • Developed before and/or during standards development • Use existing standards and technology where possible • Use case driven gap analysis • Gaps are filled proactively • Not exclusively within the GGF (e.g. naming).
OGSA Process • Use Case Driven • 21 Detailed Use Cases (~ 6 pages each) • Tier 1 Available at: http://www.ggf.org/documents/GWD-I-E/GFD-I.029.pdf • Distributed Specification and Standardization • Identify and/or develop open and accessible standard specifications • Active current work in GGF, OASIS, W3C, and DMTF. • “Design Team” Working Model • Facilitate cross-fertilization within and outside GGF. • Avoid redundant work across applicable efforts • Focus mind share (the most valuable commodity) • e.g. DAIS-WG and OGSA-Data Design Team • Iterative Refinement • Abstract services evolving into concrete specifications • Documents: • OGSA: Use Cases, Informal Specification, Recommendation
OGSA – What is it? • Two streams • Profiles • Design Teams / Working Groups • Process for design team, working group, profile development interaction • Draw circle
Profiles • Define a usage pattern and include specifications developed by working groups both within and external to GGF. • Issue: How mature and “widely adopted”? • Three “in the pipe” • Basic • Data • Execution Management
Design Teams • Naming – the foundation on which distributed systems are built • Security – deeply dependent on WS-Security • Data of all types • Execution Management Services – EMS • Logging – spun off into a working group
“A Rose by any other name would smell as sweet” Terms • Resource • Abstract resource name • Human name (paths and attributes) • Resource address • Resource identity • Binding scheme • Bind time
Why names? • Transparencies • Location • Migration • Failure • Replication • Scalability • and so on
Distributed naming is a well-understood area - properties • Unique • Provide identity • Comparable • Location portable • Widely adopted • Scalable – high performance • Extensible • Dynamic binding • …. • Two- and three-level name schemes dominate
Two level schemes • Human name -> address • E.g., DNS, Unix file system (string->inode) • abstract name -> address
Three level schemes • Human -> abstract -> address • In OGSA, • Human -> address and Human -> abstract will likely be handled by RNS – Resource Naming Service being developed by the GGF GFS-WG
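The two-level and three-level schemes above can be sketched in a few lines. This is a minimal illustration, not the RNS interface: the class `NameService`, its methods, and the example path, URN, and host names are all hypothetical. It shows why the extra abstract level matters: when a resource migrates, only the abstract -> address binding changes, and the human name keeps resolving.

```python
from typing import Dict


class NameService:
    """Hypothetical three-level naming sketch: human -> abstract -> address."""

    def __init__(self) -> None:
        self._human_to_abstract: Dict[str, str] = {}
        self._abstract_to_address: Dict[str, str] = {}

    def register(self, human: str, abstract: str, address: str) -> None:
        # Bind a human-readable path to a stable abstract name,
        # and the abstract name to the current address.
        self._human_to_abstract[human] = abstract
        self._abstract_to_address[abstract] = address

    def rebind(self, abstract: str, new_address: str) -> None:
        # Migration: only the abstract -> address binding is updated.
        if abstract not in self._abstract_to_address:
            raise KeyError(abstract)
        self._abstract_to_address[abstract] = new_address

    def resolve(self, human: str) -> str:
        # Late binding: the address is looked up at resolve time.
        abstract = self._human_to_abstract[human]
        return self._abstract_to_address[abstract]


# Example names below are made up for illustration.
names = NameService()
names.register(
    "/projects/hep/run42.dat",
    "urn:example:resource:7f3a",
    "host-a.example.org:4000",
)
before = names.resolve("/projects/hep/run42.dat")

# The resource moves; the human and abstract names stay the same.
names.rebind("urn:example:resource:7f3a", "host-b.example.org:4000")
after = names.resolve("/projects/hep/run42.dat")
```

A two-level scheme would collapse the two dictionaries into one human -> address map, at the cost of having to update every human name when the resource moves.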
OGSA Security • Process is not moving rapidly • Partially because they are waiting on WS-Security • Maybe too focused on one set of use cases (big government labs working together) (my opinion)
OGSA Data & InfoD • Use case driven • Many different data “types” and usage scenarios, from HEP to business intelligence • Strong consensus emerging, with some issues still around metadata and information dissemination • Strawman services defined for flat files, interacting with GFS. Pushing for early specs. • Interacting with existing GGF WGs including GFS, GSM, DAIS, InfoD • Interaction begun with WSDM
Info Services • Troubleshooting • Event Management • Discovery • Logging – spun off
EMS Overview • Basic problem: provision, execute and manage services (including legacy applications) in a grid • Some use cases • start up a cache service • on-demand, utility computing • start up and manage a set of legacy applications • Want to be able to “instantiate” a service and have the grid figure it out, and provide management interfaces throughout the lifetime of the service
EMS addresses issues such as: • Where can a service execute? At which locations can it execute, given resource restrictions such as memory, CPU and binary type, available libraries, and available licenses? Given the above, what policy restrictions are in place that may further limit the candidate set of execution locations? • Where should the service execute? Once it is known where the service can execute, the question is where it should execute. This may involve different selection algorithms that optimize different objective functions or try to enforce different policies or service level agreements. • Prepare the service to execute. Just because a service can execute somewhere does not necessarily mean it can execute there without some setup. Setup could include deployment and configuration of binaries and libraries, staging data, or other operations to prepare the local execution environment to execute the service. • Get the service executing. Once everything is ready, actually start the service and register it in the appropriate places. • Manage (monitor, restart, move, etc.). Once the service is started, it must be managed and monitored. What if it fails, or fails to meet its service agreements? Should it be restarted in another location? What about state? Should the state be “checkpointed” periodically to ensure restartability? Is the service participating in some sort of fault-detection and recovery scheme?
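The first two questions ("where can" and "where should" a service execute) can be sketched as a filter followed by a selection. This is a rough illustration under stated assumptions, not an OGSA-defined interface: the `Location` and `Requirements` records, the function names, the site names, and the use of current load as the objective function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional


@dataclass(frozen=True)
class Location:
    """Hypothetical description of a candidate execution location."""
    name: str
    memory_gb: int
    cpus: int
    binary_type: str
    libraries: FrozenSet[str]
    licenses: FrozenSet[str]
    load: float


@dataclass(frozen=True)
class Requirements:
    """Hypothetical resource requirements of the service to be started."""
    memory_gb: int
    cpus: int
    binary_type: str
    libraries: FrozenSet[str] = frozenset()
    licenses: FrozenSet[str] = frozenset()


def can_execute(loc: Location, req: Requirements) -> bool:
    # "Where can it execute?": resource restrictions only.
    return (
        loc.memory_gb >= req.memory_gb
        and loc.cpus >= req.cpus
        and loc.binary_type == req.binary_type
        and req.libraries <= loc.libraries
        and req.licenses <= loc.licenses
    )


def candidate_set(
    locations: List[Location],
    req: Requirements,
    policy: Callable[[Location], bool],
) -> List[Location]:
    # Policy restrictions further limit the resource-feasible locations.
    return [loc for loc in locations if can_execute(loc, req) and policy(loc)]


def choose(
    candidates: List[Location],
    objective: Callable[[Location], float],
) -> Optional[Location]:
    # "Where should it execute?": minimize a pluggable objective function.
    return min(candidates, key=objective) if candidates else None


# Illustrative data; none of these sites or values come from the talk.
locations = [
    Location("site-a", 64, 16, "x86-linux", frozenset({"libmpi"}), frozenset({"matlab"}), 0.9),
    Location("site-b", 32, 8, "x86-linux", frozenset({"libmpi"}), frozenset(), 0.2),
    Location("site-c", 128, 32, "sparc-solaris", frozenset({"libmpi"}), frozenset(), 0.1),
    Location("site-d", 32, 8, "x86-linux", frozenset({"libmpi"}), frozenset(), 0.5),
]
request = Requirements(16, 4, "x86-linux", frozenset({"libmpi"}))
candidates = candidate_set(locations, request, policy=lambda loc: loc.name != "site-d")
chosen = choose(candidates, objective=lambda loc: loc.load)
```

Keeping the policy and the objective as plain callables mirrors the point that different selection algorithms may optimize different objective functions or enforce different policies.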
EMS Services fall into three sets • Resources that model processing, storage, executables, resource management, and provisioning • Job management and monitoring services; and • Resource selection services that collectively decide where to execute a service.
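The job management and monitoring set has to answer the "manage" questions: what happens when a service fails, and should it be restarted elsewhere with its state? The sketch below shows one possible answer, restart at another healthy location while keeping the last checkpoint. Everything here (`ManagedService`, `supervise_once`, the restart limit, the site names) is a hypothetical illustration, not a defined EMS service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ManagedService:
    """Hypothetical record kept by a job manager for one running service."""
    name: str
    location: str
    checkpoint: Dict[str, int] = field(default_factory=dict)
    restarts: int = 0


def supervise_once(
    service: ManagedService,
    is_healthy: Callable[[str], bool],
    alternatives: List[str],
    max_restarts: int = 3,
) -> bool:
    """One monitoring pass. Returns True if the service is runnable afterwards."""
    if is_healthy(service.location):
        return True
    if service.restarts >= max_restarts:
        # Give up rather than restart forever.
        return False
    for location in alternatives:
        if location != service.location and is_healthy(location):
            # Move the service; its checkpointed state travels with the record.
            service.location = location
            service.restarts += 1
            return True
    return False


# Illustrative run: site-a has failed, site-b is available.
healthy_sites = {"site-b"}
cache = ManagedService("cache-service", "site-a", checkpoint={"entries": 1200})
recovered = supervise_once(
    cache,
    is_healthy=lambda site: site in healthy_sites,
    alternatives=["site-a", "site-b"],
)
```

A real job manager would run such a pass periodically and would consult the resource selection services for the alternatives instead of taking a fixed list.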
Typical Pattern (diagram) • Job Manager • Candidate Set Generator (work-resource mapping) • Execution Planning Services • Information Services • Reservation • Provisioning (deployment, configuration) • Persistent State Handle Service • Service Container • Accounting Services
Opportunities For Collaboration • OMII and EGEE efforts intersect with OGSA design team efforts • We all win if we can come to consensus • EMS • The basic problem that everyone (Globus, SGE, LSF, Legion, EGEE, OMII) solves is the same. • Solutions have many similarities • EMS team spent quite a bit of time hammering those out • We’re here to make sure that OMII input is part of the design • Similarly for data