Crystal Ball Panel

Al Geist, ORNL. SOS 7, March 6, 2003. ORNL Heterogeneous Distributed Computing Research.


Presentation Transcript


  1. SOS 7 Crystal Ball Panel: ORNL Heterogeneous Distributed Computing Research. Al Geist, ORNL, March 6, 2003.

  2. Look into the Future
     - Federated tera-clusters
     - Petascale systems
     - Adaptable software
     - HPC Linux
     - Fault tolerance
     - High-performance I/O
     (Eight Ball: "Reply Hazy, Try Again")

  3. Scalable Systems Software for Terascale Centers
     Problem areas: resource management, accounting & user management, system monitoring, system build & configure, job management.
     Participants: ORNL, ANL, LBNL, PNNL, SNL, LANL, Ames, IBM, Cray, Intel, Unlimited Scale, NCSA, PSC, SDSC.
     Goal: collectively (with labs, NSF centers, and industry) define standard interfaces between system components for interoperability, and create scalable, standardized management tools for efficiently running our large computing centers.
     Part of the DOE SciDAC effort: www.scidac.org/ScalableSystems

  4. Progress So Far on the Integrated Suite
     [Architecture diagram; working components and interfaces shown in bold on the slide.]
     Grid interfaces and meta services: Meta Scheduler, Meta Monitor, Meta Manager.
     Components: Accounting, Scheduler, System & Job Monitor, Node State Manager, Service Directory, Event Manager, Node Configuration & Build Manager, Allocation Management, Usage Reports, Validation & Testing, Process Manager, Job Queue Manager, Hardware Infrastructure Manager, Checkpoint/Restart.
     Infrastructure: authentication and communication (flagged "Important!" on the slide).
     Components talk through standard XML interfaces and can be written in any mixture of C, C++, Java, Perl, and Python; a sketch of such a message follows.
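
The slides don't specify the wire format, so here is a minimal sketch of what one such XML interface message could look like: a hypothetical registration that a component sends to the Service Directory. The element names, fields, and helper functions are illustrative assumptions, not the project's actual schema; the point is only that any language with an XML parser can join the suite.

```python
# Hypothetical XML component-interface message in the spirit of the suite.
# Element/attribute names are assumptions for illustration, not the real schema.
import xml.etree.ElementTree as ET

def build_registration(component: str, host: str, port: int) -> bytes:
    """Build the registration message a component might send to the Service Directory."""
    msg = ET.Element("register")
    ET.SubElement(msg, "component").text = component
    ET.SubElement(msg, "location", host=host, port=str(port))
    return ET.tostring(msg, encoding="utf-8")

def parse_registration(payload: bytes) -> dict:
    """Decode the same message on the directory side."""
    msg = ET.fromstring(payload)
    loc = msg.find("location")
    return {"component": msg.findtext("component"),
            "host": loc.get("host"),
            "port": int(loc.get("port"))}

if __name__ == "__main__":
    wire = build_registration("event-manager", "node042", 7620)
    print(wire.decode())             # what would travel over the authenticated channel
    print(parse_registration(wire))  # {'component': 'event-manager', ...}
```

Because the interface is plain XML carried over a common, authenticated communication layer, a Perl scheduler and a C++ process manager can interoperate without sharing any code.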

  5. Underneath It All: What Will the Scalable High-Performance OS Be?
     Rogue OS services and/or daemons are cited as a problem by existing computing centers.
     Wish list: single system image, adaptive OS, asymmetric kernels, a scalable file system.
     Candidates: Linux, a lightweight kernel (like Red, BG/L), the Scyld approach, other? See the Fast-OS effort.
     A sketch of one way to measure such interference follows.
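
One common way centers quantify rogue-daemon interference is a fixed-work-quantum probe: time many short, identical compute loops and treat the stretched outliers as detours taken by the OS. The sketch below assumes that idea; the loop size and outlier threshold are illustrative choices, not a standard benchmark.

```python
# Minimal OS-interference ("noise") probe: time identical fixed-work quanta
# and count the ones stretched far beyond the median. Loop size and the 10x
# threshold are illustrative assumptions.
import time

def probe(iterations=100_000, work=1_000):
    """Return elapsed nanoseconds for each fixed-work quantum."""
    samples = []
    for _ in range(iterations):
        t0 = time.perf_counter_ns()
        x = 0
        for i in range(work):        # fixed amount of pure compute
            x += i
        samples.append(time.perf_counter_ns() - t0)
    return samples

if __name__ == "__main__":
    s = sorted(probe())
    median = s[len(s) // 2]
    detours = sum(1 for v in s if v > 10 * median)   # quanta hit by interference
    print(f"median quantum: {median} ns, detours: {detours}, worst: {s[-1]} ns")
```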

  6. Scale Up and Fall Down
     [Figure: time vs. scale; system MTBF falls while checkpoint/restart time rises until the curves cross.]
     Fault tolerance is a serious issue when scaling to 100 TF and beyond; RAS becomes critical.
     Checkpointing eventually becomes ineffective (see the sketch below), so a fault-tolerance overhaul is needed.
     Needs: adaptive runtime, MPI fault tolerance, new FT paradigms.
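
A back-of-the-envelope calculation shows why the curves cross. Under an independent-failure model, system MTBF shrinks roughly as 1/N with node count while the cost of a checkpoint/restart cycle stays flat or grows; once that cost approaches the failure interval, no useful work gets done. The per-node MTBF and checkpoint cost below are illustrative assumptions, not measurements.

```python
# Sketch of the "Scale up and Fall Down" effect. Numbers are illustrative
# assumptions: 5-year per-node MTBF, 30-minute checkpoint+restart cycle.
NODE_MTBF_HOURS = 5 * 365 * 24   # each node fails about once every 5 years
CKPT_HOURS = 0.5                 # one checkpoint+restart cycle

for nodes in (100, 1_000, 10_000, 100_000):
    system_mtbf = NODE_MTBF_HOURS / nodes        # independent-failure approximation
    useful = max(0.0, system_mtbf - CKPT_HOURS)  # time left for science per interval
    print(f"{nodes:>7} nodes: system MTBF {system_mtbf:8.2f} h, "
          f"useful fraction {useful / system_mtbf:5.1%}")
```

At 100 nodes the checkpoint overhead is negligible; at 100,000 nodes the failure interval is shorter than the checkpoint itself and the useful fraction drops to zero, which is why the slide calls for adaptive runtimes and new FT paradigms.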

  7. Petascale Paths: General Purpose vs. Simple and Custom
     Software: a minimal OS with high performance but limited application support, versus a full OS tuned to the hardware that adapts on the fly; autonomic algorithms.
     Hardware: customized clusters for each group, versus a centralized general-purpose machine; "Internet in a box," or "out of the box."

  8. Big Science
     The final word: don't lose track of why we justify petascale systems.
     Science will ultimately be driven by computation, simulation, and modeling.
     Science drivers are key to success in HPC, and vice versa.
