Remote HPC Computing Mr. Robert Burke
Relevant FNMOC Projects • Enterprise Operational Modeling (EOM) • Enable FNMOC exploitation of enterprise-wide HPC assets • Run models remotely at the Navy DSRC • Distribute data directly to customers from Navy DSRC • Fully Coupled COAMPS-OS Modeling Capability Initiative • Atmospheric Model Bridge Strategy • Interim solution until Earth System Prediction Capability (ESPC) • Anticipated by 2015 • Needed until at least 2020 for ESPC implementation
NOGAPS to ESPC Baseline and Assumptions • NOGAPS will be replaced with the Navy Global Environmental Model (NAVGEM) in 2011 • New semi-Lagrangian dynamical core and new physics • Resolution upgrades will continue if computational resources allow • Incorporation of new data will continue if available and supported • NUOPC ensemble and common standards lead to a national system • ESPC (or other next-generation system) is targeted for operational implementation by 2020-2025 • Anticipate a national modeling capability with Navy as contributor • Development and schedule of ESPC are uncertain • Bridge strategy required for Navy global NWP between 2013 and 2020 • Based on NAVGEM data assimilation cycle run at FNMOC, with extended forecasts run at DSRC • Goal is to maintain Navy competence while investing in ESPC • Computational, manpower, and R&D resources will constrain courses of action (COAs)
EOM Project Plans • FY11 EOM Plans • Operationalize COAMPS-OS for NAVO regions at the Navy DSRC • Data Management and Transfer • Job Management and Control • Information Assurance (IA) • Documentation – Processes, Approvals, SOPs • Demonstrate NOGAPS Ensemble at the Navy DSRC • FY12 and beyond EOM Plans • Optimize Operationalization among FNMOC, NAVO, and Navy DSRC • Data Management and Transfer • Job Management and Control • Information Assurance (IA) • Configuration Management • Operationalize other Compute Intensive Models at Navy DSRC • NAVGEM • Global ensemble • COAMPS-OS ensemble
EOM Data Plan • COAMPS-OS Operational Data Transfer Alternatives • Best solution: data transfer via ticketless, kerberized remote copy • Best data transfer performance • Can be completely automated with any scheduling mechanism • Bi-directional data transfer; either system can push or pull data • Requires one or more (scalable) A2 Emerald gateway nodes to be provisioned and kerberized • Navy ODAA (NAVNETWARCOM) waiver needed to address IA issues • Interim solution: data sources via CAGIPS and BFT • CAGIPS for all supported data types (currently NOGAPS initial and boundary conditions) • BFT for all data types • Backup data source: GODAE for NAVDAS atmospheric observations and NCODA ocean observations • Interim solution: data transfer back to FNMOC via DMZ
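The data plan amounts to a preference order with per-source data-type limits. The minimal sketch below encodes only that order: kerberized copy first, then the interim CAGIPS/BFT sources, then GODAE as backup. The source keys and data-type labels (`kerberized_rcp`, `nogaps_ic_bc`, and so on) are hypothetical. Treating kerberized copy as able to carry any data type is an assumption, since it is a transport rather than a data feed. No data is actually moved.

```python
# Minimal sketch, assuming hypothetical source keys and data-type labels.
# It models only the preference order on the EOM Data Plan slide; no data moves.
from __future__ import annotations

# Ordered from most to least preferred. supported=None means "any data type";
# treating kerberized remote copy that way is an assumption (it is a transport).
SOURCES = [
    ("kerberized_rcp", None),                             # best solution, pending ODAA waiver
    ("CAGIPS", {"nogaps_ic_bc"}),                         # interim: NOGAPS initial/boundary conditions
    ("BFT", None),                                        # interim: all data types
    ("GODAE", {"navdas_atmos_obs", "ncoda_ocean_obs"}),   # backup: observations only
]


def choose_source(data_type: str, available: set[str]) -> str | None:
    """Return the most preferred available source that supports data_type, else None."""
    for name, supported in SOURCES:
        if name in available and (supported is None or data_type in supported):
            return name
    return None


# Before the waiver, only the interim and backup sources are reachable.
interim = {"CAGIPS", "BFT", "GODAE"}
print(choose_source("nogaps_ic_bc", interim))       # CAGIPS
print(choose_source("ncoda_ocean_obs", interim))    # BFT
print(choose_source("ncoda_ocean_obs", {"GODAE"}))  # GODAE
```

BFT is ranked ahead of GODAE because the slide labels GODAE as the backup source. Once the waiver is granted, adding `kerberized_rcp` to the available set makes it win for every data type.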
EOM Job Management • COAMPS-OS Job Management Situation • NAVO runs COAMPS-dependent ocean models once daily • Model run mechanisms and paradigms • NAVO runs are time dependent, automated via script, and generally run without intervention • FNMOC runs are event dependent, tightly controlled and monitored • COAMPS-OS Operational Job Management Alternatives • PBS Pro remote execution without Supervisor Monitor Scheduler (SMS) • PBS Pro (unkerberized) already in use at both FNMOC and Navy DSRC • Longer-term plans for EOM should minimize software dependencies • Alternate control mechanisms, with greater operator involvement in initiating and controlling runs, are possible and necessary • Rapid Ocean Assessment Model Environment Relocatable (ROAMER) System • Script-based job monitoring system • Could be tailored and extended for FNMOC usage
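The two run paradigms differ mainly in what triggers a run. The minimal sketch below contrasts them. NAVO-style runs start once a day at or after a fixed clock time. FNMOC-style runs start only when every required input has arrived and no operator has placed a hold. The trigger time, input names, and the `operator_hold` flag are hypothetical. A real deployment would submit a PBS Pro job at this point rather than return a boolean.

```python
# Minimal sketch contrasting the two run paradigms; the trigger time, input names,
# and operator-hold flag are hypothetical. A real system would submit a PBS Pro
# job instead of returning a boolean.
from __future__ import annotations

from datetime import datetime, time


def navo_should_start(now: datetime, start_at: time, already_ran_today: bool) -> bool:
    """Time dependent: start once per day at or after a fixed clock time."""
    return not already_ran_today and now.time() >= start_at


def fnmoc_should_start(required: set[str], arrived: set[str],
                       operator_hold: bool = False) -> bool:
    """Event dependent: start only when all inputs are present and no hold is set."""
    return not operator_hold and required <= arrived


now = datetime(2011, 6, 1, 6, 30)
print(navo_should_start(now, time(6, 0), already_ran_today=False))  # True

needed = {"nogaps_ic", "nogaps_bc", "navdas_obs"}
print(fnmoc_should_start(needed, {"nogaps_ic", "nogaps_bc"}))            # False
print(fnmoc_should_start(needed, needed))                                # True
print(fnmoc_should_start(needed, needed | {"extra"}, operator_hold=True))  # False
```

The hold flag stands in for the "greater operator involvement" the slide calls for. It lets an operator block a run even when all of its data is ready.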
EOM IA • COAMPS-OS IA Situation • EOM Framework uses three types of connectivity • FNMOC and Navy DSRC connectivity (logon, data transfer, run models) • Data transfer from FNMOC to NAVO and NAVO to Navy DSRC • FNMOC Job Initiation, Control and Monitoring of DSRC model runs • FNMOC and Navy DSRC Maintain Different Security Postures • FNMOC part of operational community requiring full C&A with necessary demilitarized zones (DMZ), firewall, and border routers • Navy DSRC an R&D HPC center bound by HPCMP and DoD IA policies for R&D systems, currently with no DMZ or firewall • Navy DSRC does have NAVNETWARCOM ATO with residual risk rating of Low • EOM IA Special Requirements • Most Ports, Protocols and Services (PPS) required for connection of FNMOC workstations and FNMOC operational cluster to Navy DSRC are Navy network policy compliant • Data transfer and job management between MAC II (FNMOC) and MAC III (DSRC) systems
EOM IA Issues Explored • DoD IA Mission Assurance Categories • Mission Assurance Category I (MAC I) • Systems handling information vital to the mission effectiveness of deployed or contingency forces in terms of both content and timeliness • Require the most stringent protection measures • Not applicable to FNMOC • Mission Assurance Category II (MAC II) • Systems handling information important to the support of deployed or contingency forces • Consequences of loss of integrity are unacceptable • Loss of availability can be tolerated only for a short time • Require safeguards beyond best practices to ensure adequate assurance • FNMOC operational systems • Mission Assurance Category III (MAC III) • Systems handling information that is necessary for the conduct of day-to-day business but does not materially affect support to deployed or contingency forces in the short term • Consequences of loss of integrity could include delay or degradation of services or commodities enabling routine activities • Navy DSRC
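The special requirement on the IA slide arises because EOM links systems in different categories. The minimal sketch below encodes the three categories and flags a connection that crosses them, such as the FNMOC (MAC II) to Navy DSRC (MAC III) link. The enum comments paraphrase the category descriptions. The system names are illustrative. The idea that the more stringent category governs a cross-category link is an assumption made for illustration, not quoted policy.

```python
# Minimal sketch, assuming illustrative system names. The enum comments paraphrase
# the briefing; the "stricter category governs the link" rule is an assumption.
from enum import IntEnum


class MAC(IntEnum):
    I = 1    # vital to deployed/contingency forces; most stringent protection
    II = 2   # integrity loss unacceptable; availability loss tolerable only briefly
    III = 3  # day-to-day business; no material short-term effect on forces


SYSTEMS = {
    "FNMOC operational cluster": MAC.II,
    "Navy DSRC": MAC.III,
}


def crosses_categories(a: str, b: str) -> bool:
    """True when two connected systems belong to different MACs."""
    return SYSTEMS[a] != SYSTEMS[b]


def stricter(a: str, b: str) -> MAC:
    """Return the more stringent MAC of the pair (a lower value is more stringent)."""
    return min(SYSTEMS[a], SYSTEMS[b])


link = ("FNMOC operational cluster", "Navy DSRC")
print(crosses_categories(*link))  # True
print(stricter(*link).name)       # II
```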
EOM Considerations with the DSRC • EOM IA Strategy • Minimize software and PPS used exclusively for EOM • Trade off EOM functionality and ease of use to gain IA, maintainability, and mobility • Obtain ODAA approval for preferred data transfer alternatives • DSRC Technology • Hardware technology refresh cycle • DSRC typically three years • FNMOC 5-7 years • Software availability • DSRC a compute engine • FNMOC requirements • Job management • Process control • Configuration management
Summary • Leveraging remote HPC assets is part of a long-range strategy to deliver capability in a budget-constrained world • There are unique challenges presented by DoD Information Assurance requirements • By carefully choosing what jobs can run remotely, “cloud-like” computing is possible