Anatomy of a Climate Science Workflow: CASCADE
The CASCADE Team

SCIENCE DRIVER

The CASCADE climate group's charter has two major thrusts: detection and attribution of extreme events, and evaluation of climate models' ability to simulate extreme events. Both are grounded heavily in statistical methodologies and plan to utilize a scalable, HPC-aware software infrastructure to handle current and upcoming climate science challenges through a common unified workflow.

Addressing emerging requirements for the analysis of extreme events is a growing challenge in the climate science community. Data volumes, currently at terabytes, will only grow larger; processing at three- to six-hour intervals will become more frequent; and analysis will focus increasingly on high-resolution datasets (e.g., 1/4 to 1/8 degree and beyond). High-resolution, high-frequency analysis will be several orders of magnitude more demanding, resulting in a critical need for effective utilization of HPC resources and for a software infrastructure and workflow designed to take advantage of those resources.

DEMONSTRATION OF METHODS

CASCADE Workflow
• Optimization Strategies: DepCache, MPI, Threading

Architecture diagram: CASCADE Workflow Modules feed a Unified Workflow Service providing the D&A Workflow, Job Scheduler, Input/Output, Numpy, Fault Tolerance, Model Fidelity, Validation & Verification, Performance Analysis, Data Reduction, Data Tracking, Stats, and User, Data, and Resource Management.

DESIGN OF METHODS

The CASCADE software infrastructure team is tasked with providing a streamlined, interoperable infrastructure and expertise in scaling algorithms, simplifying the coordination of large ensemble runs for the analysis and computation of uncertainties, and leading deployment efforts, with a focus on modular, extensible components and an emphasis on usability.

This section highlights three characteristic instances of the CASCADE unified workflow: handling performance challenges, handling scalability challenges for model fidelity, and providing scalable statistical analysis routines. These components provide the building blocks for use cases from the members of each of the other CASCADE teams. The performance pipeline highlights the effort required to speed up analysis routines exercised hundreds to thousands of times; the model-fidelity pipeline highlights the effort required to provide scalable ensemble execution; and the statistical analysis pipeline highlights the effort to parallelize spatio-temporal statistical routines, such as extreme value analysis, which are then utilized within the model fidelity and detection & attribution work. The examples below show how resources and effort are shared within a more unified construction of work that manages effective utilization of resources, data movement, scheduling, and management. (A minimal sketch of the underlying MPI task-distribution pattern follows the SCIENCE IMPACT list below.)

Deployment diagram: remote analysis clients invoke modules (TECA; EVA, GEV; MpiWrapper/MpiTaskWrapper; Archiving & Distribution) through a shared Resource Configuration and Common Usage Templates.

SCIENCE IMPACT

Merging Separate Tasks
• Climate-Centric Workflow Environment
• Identification of Use Cases
• Extraction of Computational Algorithms
• Scaling & Optimization of Work
• Templated Workflow Configurations for Common Use Cases
• Abstraction of Services to HPC Environments
• Archiving, Distribution, and Verification Strategy
• TODO..
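
As a concrete illustration of the MPI optimization strategy named above, the following is a minimal sketch of round-robin task distribution for ensemble analysis, assuming mpi4py. The poster names MpiWrapper/MpiTaskWrapper modules but does not show their APIs, so analyze_member() and the file names below are hypothetical placeholders.

    from mpi4py import MPI

    def analyze_member(path):
        # Hypothetical per-member analysis routine (e.g., a TECA or EVA call).
        print(f"analyzing {path}")

    def main():
        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        size = comm.Get_size()

        # One task per ensemble member; round-robin assignment across ranks,
        # so routines exercised hundreds of times run concurrently.
        tasks = [f"ensemble_member_{i:03d}.nc" for i in range(100)]
        for task in tasks[rank::size]:
            analyze_member(task)

        comm.Barrier()  # synchronize before any downstream reduction

    if __name__ == "__main__":
        main()

Launched with, e.g., mpiexec -n 64 python ensemble_analysis.py, each rank processes every 64th task, so the same script scales from a workstation to an HPC allocation.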
D&A Model Statistics FILE System/HPSS I/O: Data Movement and Staging Reanalysis Data FILE System I/O - Spatio-Temporal: Multi: All Time, Ensemble Members One: All Grid + All Time • Parallel Execution: • Create Initial Condition Parallel Execution: Analysis over Yearly|Monthly|Hourly Data • Parallel Execution: • Run CESM • Parallel Execution: • Run CESM Analysis • Parallel Execution: • For Each Location • Execute R-based Extreme Value Analysis Algorithms • Analysis: • TECA| EVA| Custom Verification and Archiving • Comparative Analysis • Hindcast results from CESM runs at varying spatial resolution Analysis Data Reduction & Output: Output Multiple Results such return values and standard error to NETCDF • Distribution: • To ESGF Service at NERSC Repeat for a range of partial simulation resolutions BATCH PROCESING DATA MANAGEMENT PERFORMANCE METRICS CUSTOM ANALYSIS DATA TRACKING FAULT TOLERANCE PROVENANCE

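Finally, a minimal sketch of the Model pipeline's ensemble coordination, assuming a Slurm batch scheduler. sbatch, run_cesm.sh, and the directory layout are assumptions; CESM's actual case-setup tooling is not shown on the poster.

    import subprocess
    from pathlib import Path

    RESOLUTIONS = ["1deg", "0.25deg"]  # illustrative resolution sweep
    N_MEMBERS = 5                      # illustrative ensemble size

    for res in RESOLUTIONS:
        for member in range(N_MEMBERS):
            run_dir = Path(f"runs/{res}/member_{member:02d}")
            run_dir.mkdir(parents=True, exist_ok=True)
            # Each member gets its own run directory and batch job.
            script = run_dir / "job.sh"
            script.write_text(
                "#!/bin/bash\n"
                f"#SBATCH --job-name=cesm_{res}_{member:02d}\n"
                f"./run_cesm.sh --resolution {res} --member {member}\n"
            )
            subprocess.run(["sbatch", str(script)], check=True)

Sweeping the outer loop over resolutions mirrors the "repeat for a range of spatial simulation resolutions" step above, while the job scheduler and fault-tolerance services of the unified workflow track the submitted runs.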