1. Slide 1 Grid-BGC Annual Review Year 1
2. Slide 2
3. Slide 3
4. Slide 4
5. Slide 5
6. Slide 6 These progress review slides are summaries of the quarterly reports filed in e-books.
7. Slide 7
8. Slide 8
9. Slide 9
10. Slide 10
11. Slide 11
12. Slide 12
13. Slide 13 System Requirements and Architecture (1/14) Requirements Gathering Process
High-level usage scenarios were developed.
Scenarios were refined into functional requirements.
Requirements were distilled into the Software Requirements Specification (SRS). Available on project website.
The SRS is a living document, which will require modifications throughout the project.
14. Slide 14 System Requirements and Architecture (2/14)
15. Slide 15 System Requirements and Architecture (3/14) System Design Goals
The system is being designed for a long lifespan, anticipating an extensive user base.
Manage changing technologies
Globus Toolkit (OGSA -> WSRF)
GUI Technologies (JSF / Portlets)
Computational Models
Manage changing resources
Computational Resources
Storage Resources
OGSA = Open Grid Services Architecture
WSRF = Web Services Resource Framework
JSF = JavaServer Faces
16. Slide 16 System Requirements and Architecture (4/14) System Design Drivers
The needs of our research community
Project Resources
Partitioning the system into components that map to the available resources on the project.
Partitioning components to facilitate development in isolated environments
Changing Security Policies
Need to be able to react to changing security policies and procedures throughout the project.
Some cases of partitioning include the work the CU group did, and how the interface to the models was laid out so that they can be developed in isolation.
17. Slide 17 System Requirements and Architecture (5/14)
18. Slide 18 System Requirements and Architecture (6/14)
19. Slide 19 System Requirements and Architecture (7/14) Software Architecture: Application Stack overview
20. Slide 20 System Requirements and Architecture (8/14) Software Architecture: User Interface Layer
Web Portal organizes workflow and provides a customized interface to users' projects and data objects.
Thin client architecture accommodates distributed user base.
Implementation based on JSP / Struts.
Initial prototype nearing completion:
User logins supported, with NCAR authentication.
Proof of concept for database connectivity.
Next stages:
Completion of workflow structure.
Submission to beta testers for feedback.
Revise to incorporate beta tester feedback.
JSP = JavaServer Pages
21. Slide 21 System Requirements and Architecture (9/14) Software Architecture: User Interface Layer
22. Slide 22 System Requirements and Architecture (10/14) Software Architecture: User Interface Layer
23. Slide 23 System Requirements and Architecture (11/14) Software Architecture: User Interface Layer
24. Slide 24 System Requirements and Architecture (12/14) Software Architecture: Application Logic Layer
Domain Model
Core Application Logic and Workflow Management.
Java Object Model.
Job Management Services
Contains services to execute the models on remote resources.
Uses the Globus Toolkit for the grid infrastructure.
Key abstraction for managing upcoming changes in the Globus Toolkit.
Domain Model = project and data object model
25. Slide 25 System Requirements and Architecture (13/14) Software Architecture: Data Management Layer
Data Mapping Services
RDBMS specific mapping.
Abstracts details specific to a particular database implementation.
File Storage Services
Manages the file storage resources for the system.
Manages online disk cache and NCAR MSS resources.
Currently we are mapping to the specifics of using Postgres.
RDBMS = relational database management system
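The idea of a mapping service that hides database-specific details can be illustrated with a minimal sketch. Everything named here is hypothetical (the `ProjectMapper` interface, the `project` table, its columns), and Python's built-in `sqlite3` stands in for Postgres only so the example runs without a server; the project's actual mapping layer is Java-based and is not shown in these slides.

```python
import sqlite3
from abc import ABC, abstractmethod


class ProjectMapper(ABC):
    """Hypothetical database-neutral interface seen by the application logic."""

    @abstractmethod
    def save(self, name, owner):
        """Persist one project record."""

    @abstractmethod
    def find_by_owner(self, owner):
        """Return the names of all projects belonging to an owner."""


class SqliteProjectMapper(ProjectMapper):
    """One concrete mapping; a Postgres mapper would implement the same interface."""

    def __init__(self, connection):
        self._conn = connection
        # Assumed schema, for illustration only.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS project (name TEXT, owner TEXT)"
        )

    def save(self, name, owner):
        self._conn.execute(
            "INSERT INTO project (name, owner) VALUES (?, ?)", (name, owner)
        )

    def find_by_owner(self, owner):
        rows = self._conn.execute(
            "SELECT name FROM project WHERE owner = ? ORDER BY name", (owner,)
        )
        return [row[0] for row in rows]


mapper = SqliteProjectMapper(sqlite3.connect(":memory:"))
mapper.save("montana-tiles", "alice")
mapper.save("alaska-tiles", "alice")
print(mapper.find_by_owner("alice"))
```

Because callers depend only on `ProjectMapper`, swapping the database would mean adding another subclass rather than changing application code.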
26. Slide 26 System Requirements and Architecture (14/14) Software Architecture: Data Storage Layer
Split into two components: relational database and file-based storage.
Design drivers
User interface responsiveness.
Minimize mass store access costs.
Relational database
Postgres 7.3
File storage
Online disk cache
NCAR mass storage system
27. Slide 27 Grid Services and Job Management (1/11) GridBGC Execution Engine Goals:
Export a Grid service for executing Daymet and Grid-BGC simulations
Support running multiple simulation engines and accepting requests from multiple user interfaces
Provide reliable tile-based simulation execution
28. Slide 28 Grid Services and Job Management (2/11) This service-based approach exports a GridBGC service as a Globus Grid service.
Clients simply request that simulations be run with specified parameters, and the system takes care of the details.
29. Slide 29 Grid Services and Job Management (3/11) Globus Managed Jobs vs. GridBGC Service:
Application-based approach: Globus Grid Resource Allocation and Management (GRAM) provides a Managed Job Execution service
GRAM uses User Hosting Environments
Allows running executables somewhere on the Grid
We selected a service-based approach
We don't require completely portable code
NCAR MSS is not globally available secondary storage; it requires machine-specific DataMover configuration
Instead, an installable GridBGC service is used by remote clients
30. Slide 30 Grid Services and Job Management (4/11) Grid-BGC Service: design approach
Provide a Globus-based Grid Service for running Daymet and Biome-BGC simulations.
Design goals:
Portable installable Grid Service and execution engine
Simple XML client interface for simulation specification
Reliable execution and data transfer
Submit and forget
We use the following Globus Toolkit components:
Grid Security Infrastructure (GSI) for authentication
GridFTP for data transfer
Web Services (WS) for the Grid Service
We separate the Grid Service from job execution: we run the simulations separately using a Job Engine
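As an illustration of what a simple XML simulation specification might look like from the client side, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`simulationRequest`, `parameters`, `tiles`, and so on) are assumptions made for this example; the slides do not show the actual Grid-BGC schema.

```python
import xml.etree.ElementTree as ET


def build_request(model, tile_ids, parameters):
    """Build a hypothetical XML simulation request and return it as a string."""
    root = ET.Element("simulationRequest", {"model": model})

    # One <parameter name="..."> element per simulation parameter.
    params = ET.SubElement(root, "parameters")
    for name, value in sorted(parameters.items()):
        ET.SubElement(params, "parameter", {"name": name}).text = str(value)

    # One <tile id="..."> element per tile to simulate.
    tiles = ET.SubElement(root, "tiles")
    for tile_id in tile_ids:
        ET.SubElement(tiles, "tile", {"id": str(tile_id)})

    return ET.tostring(root, encoding="unicode")


print(build_request("daymet", [11, 12], {"startYear": 1980, "endYear": 1990}))
```

A client following the submit-and-forget goal would send a document like this once and later query the service for status, rather than staying connected while the tiles run.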
31. Slide 31 Grid Services and Job Management (5/11)
32. Slide 32 Grid Services and Job Management (6/11) Simulation and Tile execution
Grid Service:
Listens for user job submissions and queries
Interacts with database only
Java-based Execution Engine:
Potential states:
Waiting and stalling
Data stage-in
Model execution
Data stage-out
Cleanup and finalization
33. Slide 33 Grid Services and Job Management (7/11) Potential Job States
Most tile jobs finish with Success.
Tile jobs may terminate with Errors
Unrecoverable problems (e.g. missing files, system code errors)
The error handler saves return codes and error messages for user queries.
Tile jobs may be Held for manual intervention
True anticipated transients
DataMover servers down, NCAR MSS down, etc.
Disk space issues, scheduled maintenance
Tiles held while administrator corrects situation
Tiles may resume from Held state
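The execution stages and job outcomes described on these two slides can be read as a small state machine. The sketch below is one possible interpretation, not the project's Java implementation: the class and method names are hypothetical, and it assumes a held tile resumes at the stage where it was held.

```python
from enum import Enum


class TileState(Enum):
    WAITING = "waiting"
    STAGE_IN = "data stage-in"
    EXECUTING = "model execution"
    STAGE_OUT = "data stage-out"
    CLEANUP = "cleanup and finalization"
    SUCCESS = "success"
    ERROR = "error"
    HELD = "held"


# Normal forward path through the stages listed on the slides.
NEXT_STAGE = {
    TileState.WAITING: TileState.STAGE_IN,
    TileState.STAGE_IN: TileState.EXECUTING,
    TileState.EXECUTING: TileState.STAGE_OUT,
    TileState.STAGE_OUT: TileState.CLEANUP,
    TileState.CLEANUP: TileState.SUCCESS,
}


class TileJob:
    def __init__(self, tile_id):
        self.tile_id = tile_id
        self.state = TileState.WAITING
        self.resume_state = None   # stage to return to after a hold
        self.error_message = None  # kept so users can query it later

    def advance(self):
        """Move to the next stage once the current one has completed."""
        if self.state not in NEXT_STAGE:
            raise ValueError(f"cannot advance from {self.state.name}")
        self.state = NEXT_STAGE[self.state]

    def hold(self):
        """Park the tile during a transient outage (e.g. mass storage down)."""
        if self.state not in NEXT_STAGE:
            raise ValueError(f"cannot hold from {self.state.name}")
        self.resume_state = self.state
        self.state = TileState.HELD

    def resume(self):
        """Return a held tile to the stage it was in (an assumption)."""
        if self.state is not TileState.HELD:
            raise ValueError("only held tiles can resume")
        self.state = self.resume_state
        self.resume_state = None

    def fail(self, message):
        """Terminate on an unrecoverable problem, saving the message."""
        self.error_message = message
        self.state = TileState.ERROR


job = TileJob("tile-0042")
job.advance()   # WAITING -> STAGE_IN
job.hold()      # administrator intervention needed
job.resume()    # back to STAGE_IN
print(job.state.value)
```

Keeping Held separate from Error reflects the distinction on the slide: held tiles wait for an administrator and then continue, while errors are terminal and only record diagnostics.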
34. Slide 34 Grid Services and Job Management (8/11) Administrative Support Applications
Additional web interface to JobEngine server
Grid Service also exports data for inclusion in main portal interface
35. Slide 35 Grid Services and Job Management (9/11) Grid Service System Testing
Automated client simulator
Generates random simulation and tile requests
Submits job via Globus Web Services
Transfers data from DataPortal using DataMover and GridFTP
Runs sample application using PBS on Hemisphere
Client simulator may be run on DataPortal or Hemisphere
Test configuration downloads 2-5 files of 10-20 MB each, requires 15-30 minutes of walltime, and uploads one 10-20 MB file
Testing methodology
Run through crontab, submits about 100 tiles/hour
Overall, we have run over 1000 simulated tiles through the system
Testing identified timeout problems with the DataMover system on Hemisphere
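A client simulator of the kind described above could generate its random tile requests along these lines. This is a hedged sketch: the function names and dictionary keys are invented for illustration, whole-megabyte sizes and whole-minute walltimes are a simplification, and only the numeric ranges come from the test configuration on this slide.

```python
import random


def random_tile_request(rng):
    """Return one simulated tile request within the slide's test ranges."""
    n_inputs = rng.randint(2, 5)  # 2-5 input files to download
    return {
        "input_sizes_mb": [rng.randint(10, 20) for _ in range(n_inputs)],
        "walltime_minutes": rng.randint(15, 30),
        "output_size_mb": rng.randint(10, 20),  # one file to upload
    }


def random_batch(rng, n_tiles):
    """Return a list of simulated tile requests, e.g. one hour's worth."""
    return [random_tile_request(rng) for _ in range(n_tiles)]


# A seeded generator keeps a test run reproducible.
batch = random_batch(random.Random(0), 100)
print(len(batch), "tile requests generated")
```

Passing the random generator in explicitly makes a failing run repeatable, which helps when chasing intermittent problems such as the DataMover timeouts noted above.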
36. Slide 36 Grid Services and Job Management (10/11) Challenges: DataMover Integration
DataMover from Lawrence Berkeley National Laboratory (LBL)
Grid transfer engine
Integrates with NCAR Mass Storage System
Runs local disk resource manager cache
Received interim release from LBL
Working directly with DataMover developers
Developers Alex Sim and Junmin Gu have been responsive and helpful.
37. Slide 37 Grid Services and Job Management (11/11)
38. Slide 38
39. Slide 39
40. Slide 40 Visualization and the interface with existing data systems are moving to Year 2, so that the system architecture is stable first.
41. Slide 41
42. Slide 42
43. Slide 43
44. Slide 44
45. Slide 45
46. Slide 46