
Grid-BGC Annual Review Year 1

gianna




Presentation Transcript


    1. Slide 1 Grid-BGC Annual Review Year 1

    2. Slide 2

    3. Slide 3

    4. Slide 4

    5. Slide 5

    6. Slide 6 These progress review slides are summaries of the quarterly reports filed in e-books.

    7. Slide 7

    8. Slide 8

    9. Slide 9

    10. Slide 10

    11. Slide 11

    12. Slide 12

    13. Slide 13 System Requirements and Architecture (1/14): Requirements Gathering Process. High-level usage scenarios were developed. Scenarios were refined into functional requirements. Requirements were distilled into the Software Requirements Specification (SRS), which is available on the project website. The SRS is a living document that will require modifications throughout the project.

    14. Slide 14 System Requirements and Architecture (2/14)

    15. Slide 15 System Requirements and Architecture (3/14): System Design Goals. The system is being designed for a long lifespan, anticipating an extensive user base. Manage changing technologies: Globus Toolkit (OGSA -> WSRF), GUI technologies (JSF / Portlets), computational models. Manage changing resources: computational resources, storage resources. Notes: OGSA = Open Grid Services Architecture; WSRF = Web Services Resource Framework; JSF = JavaServer Faces.

    16. Slide 16 System Requirements and Architecture (4/14): System Design Drivers. The needs of our research community. Project resources: partitioning the system into components that map to the available resources on the project; partitioning components to facilitate development in isolated environments. Changing security policies: need to be able to react to changing security policies and procedures throughout the project. Notes: Examples of partitioning include the work the CU group did and the way the interface to the models was laid out so that they can be developed in isolation.

    17. Slide 17 System Requirements and Architecture (5/14)

    18. Slide 18 System Requirements and Architecture (6/14)

    19. Slide 19 System Requirements and Architecture (7/14) Software Architecture: Application Stack overview

    20. Slide 20 System Requirements and Architecture (8/14): Software Architecture: User Interface Layer. The Web Portal organizes workflow and provides a customized interface to users' projects and data objects. Thin client architecture accommodates a distributed user base. Implementation based on JSP / Struts. Initial prototype nearing completion: user logins supported, with NCAR authentication; proof of concept for database connectivity. Next stages: completion of workflow structure; submission to beta testers for feedback; revision to incorporate beta tester feedback. Notes: JSP = JavaServer Pages.

    21. Slide 21 System Requirements and Architecture (9/14) Software Architecture: User Interface Layer

    22. Slide 22 System Requirements and Architecture (10/14) Software Architecture: User Interface Layer

    23. Slide 23 System Requirements and Architecture (11/14) Software Architecture: User Interface Layer

    24. Slide 24 System Requirements and Architecture (12/14): Software Architecture: Application Logic Layer. Domain Model: core application logic and workflow management; Java object model. Job Management Services: contains services to execute the models on remote resources; uses the Globus Toolkit for the grid infrastructure; key abstraction for managing upcoming changes in the Globus Toolkit. Notes: Domain Model = project and data object model.
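A minimal sketch of the "key abstraction" idea on this slide, written in Python for brevity even though the project's object model is Java. The class names `JobManager`, `OgsaJobManager`, and `WsrfJobManager` and the `submit` signature are hypothetical, and no real Globus Toolkit call is made; the point is only that workflow code depends on one interface while the toolkit binding behind it can change.

```python
from abc import ABC, abstractmethod
from typing import List


class JobManager(ABC):
    """Hypothetical interface that the application logic codes against."""

    @abstractmethod
    def submit(self, model: str, tile_id: int) -> str:
        """Submit one model run and return an opaque job handle."""


class OgsaJobManager(JobManager):
    # Stand-in for an OGSA-era binding; it only formats a handle.
    def submit(self, model: str, tile_id: int) -> str:
        return f"ogsa:{model}:{tile_id}"


class WsrfJobManager(JobManager):
    # Stand-in for a later WSRF-era binding behind the same interface.
    def submit(self, model: str, tile_id: int) -> str:
        return f"wsrf:{model}:{tile_id}"


def run_tiles(manager: JobManager, model: str, tile_ids: List[int]) -> List[str]:
    # Workflow code sees only JobManager, so replacing the binding
    # does not require changes here.
    return [manager.submit(model, tile_id) for tile_id in tile_ids]
```

Under these assumptions, moving from one toolkit generation to the next means passing a different `JobManager` instance to `run_tiles`.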

    25. Slide 25 System Requirements and Architecture (13/14): Software Architecture: Data Management Layer. Data Mapping Services: RDBMS-specific mapping; abstracts details specific to a particular database implementation. File Storage Services: manages the file storage resources for the system; manages online disk cache and NCAR MSS resources. Notes: Currently we are mapping to the specifics of using Postgres. RDBMS = relational database management system.

    26. Slide 26 System Requirements and Architecture (14/14): Software Architecture: Data Storage Layer. Split into two components: relational database and file-based storage. Design drivers: user interface responsiveness; minimize mass store access costs. Relational database: Postgres 7.3. File storage: online disk cache; NCAR mass storage system.
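The "minimize mass store access costs" driver can be illustrated with a toy disk cache placed in front of a slow retrieval call. This is a sketch under stated assumptions: the slides do not name an eviction policy, so least-recently-used is an assumption, and `DiskCache` and the `fetch_from_mass_store` callable are hypothetical names, not part of the project.

```python
from collections import OrderedDict


class DiskCache:
    """Toy LRU cache standing in for the online disk cache (assumed policy)."""

    def __init__(self, fetch_from_mass_store, capacity=2):
        self._fetch = fetch_from_mass_store  # hypothetical slow retrieval call
        self._capacity = capacity
        self._files = OrderedDict()
        self.mass_store_reads = 0            # counts the expensive accesses

    def get(self, path):
        if path in self._files:
            self._files.move_to_end(path)    # mark as most recently used
            return self._files[path]
        self.mass_store_reads += 1           # cache miss: go to mass storage
        data = self._fetch(path)
        self._files[path] = data
        if len(self._files) > self._capacity:
            self._files.popitem(last=False)  # evict the least recently used
        return data
```

Repeated requests for the same file are served from the cache, so only the first request reaches the mass store.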

    27. Slide 27 Grid Services and Job Management (1/11): GridBGC Execution Engine. Goals: export a Grid service for executing Daymet and Grid-BGC simulations; support running multiple simulation engines and accepting requests from multiple user interfaces; provide reliable tile-based simulation execution.

    28. Slide 28 Grid Services and Job Management (2/11) This service-based approach exports a GridBGC service as a Globus Grid service. Clients simply request that simulations be run with specified parameters, and the system takes care of the details.

    29. Slide 29 Grid Services and Job Management (3/11): Globus Managed Jobs vs. GridBGC Service. Application-based approach: Globus Grid Resource Allocation and Management (GRAM) provides a Managed Job Execution service; GRAM uses User Hosting Environments; allows running executables somewhere on the Grid. We selected a service-based approach: we don't require completely portable code; NCAR MSS is not globally available secondary storage, and it requires machine-specific DataMover configuration; instead, an installable GridBGC service is used by remote clients.

    30. Slide 30 Grid Services and Job Management (4/11): Grid-BGC Service: design approach. Provide a Globus-based Grid Service for running Daymet and Biome-BGC simulations. Design goals: portable, installable Grid Service and execution engine; simple XML client interface for simulation specification; reliable execution and data transfer; submit and forget. We use the following Globus Toolkit components: Grid Security Infrastructure (GSI) for authentication; GridFTP for data transfer; Web Services (WS) for the Grid Service. We separate the Grid Service from job execution: we run the simulations separately using a Job Engine.
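To make the "simple XML client interface for simulation specification" goal concrete, the sketch below builds and parses a small request document with Python's standard `xml.etree.ElementTree`. The slides do not show the actual schema, so the element and attribute names (`simulation`, `parameter`, `tiles`, `tile`) and the parameter `start_year` are purely illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def build_request(model, tile_ids, params):
    # Element and attribute names are illustrative, not the project's schema.
    root = ET.Element("simulation", {"model": model})
    for key, value in sorted(params.items()):
        ET.SubElement(root, "parameter", {"name": key}).text = str(value)
    tiles = ET.SubElement(root, "tiles")
    for tile_id in tile_ids:
        ET.SubElement(tiles, "tile", {"id": str(tile_id)})
    return ET.tostring(root, encoding="unicode")


def parse_request(xml_text):
    # Server-side counterpart: recover the model, tile list, and parameters.
    root = ET.fromstring(xml_text)
    params = {p.get("name"): p.text for p in root.findall("parameter")}
    tile_ids = [int(t.get("id")) for t in root.findall("tiles/tile")]
    return root.get("model"), tile_ids, params
```

A client would send the string returned by `build_request` and then disconnect, which matches the "submit and forget" goal; parameter values come back as strings because XML text carries no type information.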

    31. Slide 31 Grid Services and Job Management (5/11)

    32. Slide 32 Grid Services and Job Management (6/11): Simulation and Tile execution. Grid Service: listens for user job submissions and queries; interacts with database only. Java-based Execution Engine, potential states: waiting and stalling; data stage-in; model execution; data stage-out; cleanup and finalization.
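The execution engine states listed on this slide can be sketched as a small state machine. The state names come from the slide; the strictly linear transition table and the names `TileState`, `NEXT_STATE`, `advance`, and `run_tile` are assumptions made for illustration (the project's engine is Java-based, and Python is used here only for brevity).

```python
from enum import Enum


class TileState(Enum):
    WAITING = "waiting and stalling"
    STAGE_IN = "data stage-in"
    EXECUTING = "model execution"
    STAGE_OUT = "data stage-out"
    CLEANUP = "cleanup and finalization"


# Assumed linear ordering; the real engine may allow other transitions.
NEXT_STATE = {
    TileState.WAITING: TileState.STAGE_IN,
    TileState.STAGE_IN: TileState.EXECUTING,
    TileState.EXECUTING: TileState.STAGE_OUT,
    TileState.STAGE_OUT: TileState.CLEANUP,
}


def advance(state):
    """Return the following state, or None once cleanup has finished."""
    return NEXT_STATE.get(state)


def run_tile():
    """Walk one tile through every state and return the visited sequence."""
    state = TileState.WAITING
    visited = []
    while state is not None:
        visited.append(state)
        state = advance(state)
    return visited
```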

    33. Slide 33 Grid Services and Job Management (7/11): Potential Job States. Most tile jobs finish with Success. Tile jobs may terminate with Errors: unrecoverable problems (e.g. missing files, system code errors); the error handler will save return codes and error messages for user query. Tile jobs may be Held for manual intervention: true anticipated transients (DataMover servers down, NCAR MSS down, etc.; disk space issues, scheduled maintenance); tiles are held while an administrator corrects the situation; tiles may resume from the Held state.
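One way to read the Success / Error / Held distinction is as a classification of failures into transient and unrecoverable. The sketch below assumes that reading; the exception classes `TransientFailure` and `FatalFailure`, the `Tile` record, and the `attempt` and `resume` helpers are hypothetical names, not the project's code.

```python
class TransientFailure(Exception):
    """Anticipated outage, e.g. a storage service that is temporarily down."""


class FatalFailure(Exception):
    """Unrecoverable problem, e.g. a missing input file."""


class Tile:
    def __init__(self, tile_id):
        self.tile_id = tile_id
        self.status = "PENDING"
        self.message = ""  # saved so that users can query what went wrong


def attempt(tile, step):
    """Run one step of a tile job and record the resulting job state."""
    try:
        step()
    except TransientFailure as exc:
        tile.status, tile.message = "HELD", str(exc)   # wait for an administrator
    except FatalFailure as exc:
        tile.status, tile.message = "ERROR", str(exc)  # terminal state
    else:
        tile.status, tile.message = "SUCCESS", ""
    return tile.status


def resume(tile, step):
    """Retry a held tile after the underlying problem has been corrected."""
    if tile.status != "HELD":
        raise ValueError("only held tiles can be resumed")
    return attempt(tile, step)
```

Only the Held state is resumable in this sketch, which mirrors the slide's statement that tiles may resume from Held while Errors are terminal.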

    34. Slide 34 Grid Services and Job Management (8/11): Administrative Support Applications. Additional web interface to JobEngine server. Grid Service also exports data for inclusion in main portal interface.

    35. Slide 35 Grid Services and Job Management (9/11): Grid Service System Testing. Automated client simulator: generates random simulation and tile requests; submits jobs via Globus Web Services; transfers data from DataPortal using DataMover and GridFTP; runs a sample application using PBS on Hemisphere. The client simulator may be run on DataPortal or Hemisphere. The test configuration downloads two to five files of 10-20 MB each, requires 15-30 minutes of walltime, and uploads one 10-20 MB file. Testing methodology: run through crontab, submitting about 100 tiles/hour. Overall, we have run over 1000 simulated tiles through the system. Testing identified timeout problems with the DataMover system on Hemisphere.
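The test configuration above implies a rough load envelope, which the sketch below works out. The file counts, file sizes, walltime range, and the rate of about 100 tiles/hour come from the slide; the function names are hypothetical, the uniform random draws are an assumption, and the in-flight estimate assumes a steady submission rate (rate multiplied by walltime).

```python
import random


def random_tile_request(rng):
    """Draw one simulated tile request within the ranges quoted on the slide."""
    n_files = rng.randint(2, 5)
    return {
        "input_mb": [rng.uniform(10, 20) for _ in range(n_files)],
        "walltime_min": rng.uniform(15, 30),
        "output_mb": rng.uniform(10, 20),
    }


def hourly_load(tiles_per_hour=100):
    """Return (low, high) bounds implied by the test configuration."""
    return {
        # 2 files x 10 MB up to 5 files x 20 MB per tile
        "download_mb": (tiles_per_hour * 2 * 10, tiles_per_hour * 5 * 20),
        # one 10-20 MB result file per tile
        "upload_mb": (tiles_per_hour * 10, tiles_per_hour * 20),
        # steady-state tiles in flight = arrival rate x walltime
        "tiles_in_flight": (tiles_per_hour * 15 / 60, tiles_per_hour * 30 / 60),
    }
```

Under these assumptions, 100 tiles/hour corresponds to roughly 2,000-10,000 MB downloaded and 1,000-2,000 MB uploaded per hour, with about 25-50 tiles running at any moment.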

    36. Slide 36 Grid Services and Job Management (10/11): Challenges: DataMover Integration. DataMover from Lawrence Berkeley National Laboratory (LBL): Grid transfer engine; integrates with the NCAR Mass Storage System; runs a local disk resource manager cache. Received an interim release from LBL. Working directly with the DataMover developers: Alex Sim and Junmin Gu have been responsive and helpful.

    37. Slide 37 Grid Services and Job Management (11/11)

    38. Slide 38

    39. Slide 39

    40. Slide 40 Moving visualization and the interface with existing data systems to Year 2, to make sure that the architecture of our system is stable first.

    41. Slide 41

    42. Slide 42

    43. Slide 43

    44. Slide 44

    45. Slide 45

    46. Slide 46
