On the Path to Petascale: Top Challenges to Scientific Discovery

Scott A. Klasky
NCCS Scientific Computing End-to-End Task Lead

1. Code Performance
  • From 2004 to 2008, computing power for codes like GTC will go up 3 orders of magnitude!
  • Two paths to petascale computing for most simulations:
    • More physics and larger problems.
    • Code coupling.
  • My personal definition of leadership-class computing:
    • “Simulation runs on >50% of cores, running for >10 hours.”
    • One ‘small’ simulation will cost $38,000 on a Pflop computer.
    • Science scales with processors.
  • XGC and GTC fusion simulations will run on 80% of cores for 80 hours ($400,000/simulation) (a cost sketch follows this list).
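
The dollar figures above imply a machine-time price that the slide does not state. Below is a back-of-the-envelope sketch of that arithmetic; the machine size and per-core-hour rate are assumptions, chosen so the 80%-for-80-hours run lands near $400,000 (the real pricing behind the $38,000 figure may differ).

# Back-of-the-envelope run-cost arithmetic. TOTAL_CORES and COST_PER_CORE_HOUR
# are assumptions (not stated on the slide), picked so that the large XGC/GTC
# run comes out near $400,000.
TOTAL_CORES = 100_000
COST_PER_CORE_HOUR = 0.0625  # dollars, assumed

def run_cost(core_fraction, hours):
    """Cost of using a fraction of the machine for a given number of hours."""
    return core_fraction * TOTAL_CORES * hours * COST_PER_CORE_HOUR

print(f"'Small' leadership run (50% of cores, 10 h): ${run_cost(0.5, 10):,.0f}")
print(f"XGC/GTC run (80% of cores, 80 h): ${run_cost(0.8, 80):,.0f}")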
Data Generated.
  • MTTF (mean time to failure) will be ~2 days.
  • Restarts contain the critical information needed to replay the simulation at different times.
    • Typical restarts = 1/10 of memory, dumped every 1 hour. (The big 3 apps support this claim.)
  • Analysis files dump every physical timestep, typically every 5 minutes of simulation.
    • Analysis files vary; we estimate that for ITER-size simulations, data output will be roughly 1 GB / 5 minutes.
  • Demand I/O < 5% of the calculation.
  • The total simulation will potentially produce ≈1280 TB of restart data + 960 GB of analysis data.
  • Need > (16*1024 + 12)/(3600 * 0.05) ≈ 91 GB/sec.
  • Asynchronous I/O is needed!!! (The big 3 apps (combustion, fusion, astro) allow buffers.)
    • Reduces the required I/O rate to (16*1024 + 12)/3600 ≈ 4.5 GB/sec, with lower overhead (see the sketch after this list).
    • Get the data off the HPC system and over to another system!
  • Produce HDF5 files on another system (too expensive for the HPC system).
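
A small sketch of the bandwidth arithmetic above: the 16*1024 GB term is the hourly restart volume (1/10 of memory each hour) and the 12 GB term is the hourly analysis output (1 GB every 5 minutes).

# Reproduces the slide's I/O bandwidth estimates (all sizes in GB, per hour of simulation).
restart_per_hour_gb = 16 * 1024   # hourly restart dump (~1/10 of memory)
analysis_per_hour_gb = 12         # ~1 GB every 5 minutes
io_budget = 0.05                  # I/O may consume <5% of the calculation

hourly_output_gb = restart_per_hour_gb + analysis_per_hour_gb
synchronous_rate = hourly_output_gb / (3600 * io_budget)  # all I/O squeezed into 5% of each hour
asynchronous_rate = hourly_output_gb / 3600               # buffered I/O spread over the whole hour

print(f"synchronous:  {synchronous_rate:.1f} GB/s")   # ~91 GB/s
print(f"asynchronous: {asynchronous_rate:.1f} GB/s")  # ~4.5 GB/s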
Workflow Automation is desperately needed (with high-speed data-in-transit techniques).
  • Need to integrate autonomics into workflows… (a minimal polling sketch follows this list).
  • Need to make it easy for the scientists.
  • Need to make it fault-tolerant/robust.
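
As a rough illustration of the autonomic, fault-tolerant behavior meant here, the hypothetical sketch below polls a run directory, moves finished output files to an analysis machine, and retries failed transfers. The paths, destination host, and use of bbcp are invented for the example.

# Hypothetical workflow-automation sketch: watch a run directory, move completed
# output files off the HPC system, and retry on failure. Paths, host name, and
# the 'bbcp' transfer tool are illustrative assumptions.
import subprocess, time
from pathlib import Path

RUN_DIR = Path("/lustre/scratch/gtc_run")   # assumed simulation output directory
DEST = "analysis-cluster:/data/gtc_run/"    # assumed destination
MAX_RETRIES = 3

seen = set()

def transfer(path: Path) -> bool:
    """Copy one file off the HPC system; return True on success."""
    for attempt in range(1, MAX_RETRIES + 1):
        result = subprocess.run(["bbcp", str(path), DEST])
        if result.returncode == 0:
            return True
        time.sleep(30 * attempt)            # simple back-off before retrying
    return False

while True:
    for f in sorted(RUN_DIR.glob("*.h5")):
        if f not in seen and transfer(f):
            seen.add(f)                     # only mark files that actually made it over
    time.sleep(60)                          # poll once a minute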
A few days in the life of a Sim Scientist. Day 1 - morning.
  • 8:00AM Get coffee; check to see if jobs are running.
    • ssh into jaguar.ccs.ornl.gov (job 1).
    • ssh into seaborg.nersc.gov (job 2) (this one is running, yay!).
    • Run gnuplot to see if the run is going OK on seaborg. This looks OK.
  • 9:00AM Look at data from an old run for post-processing.
    • Legacy code (IDL, Matlab) to analyze most data.
    • Visualize some of the data to see if there is anything interesting.
    • Is my job running on jaguar? I submitted this 4K-processor job 2 days ago!
  • 10:00AM scp some files from seaborg to my local cluster.
    • Luckily I only have 10 files (which are only 1 GB/file).
  • 10:30AM First file appears on my local machine for analysis.
    • Visualize data with Matlab. Seems to be OK. :)
  • 11:30AM See that the second file had trouble coming over.
    • scp the files over again… D'oh!
Day 1 - evening.
  • 1:00PM Look at the output from the second file.
    • Oops, I had a mistake in my input parameters.
    • Ssh into seaborg, kill job. Emacs the input, submit job.
    • Ssh into jaguar, see status. Cool, it’s running.
    • bbcp 2 files over to my local machine. (8 GB/file).
    • Gnuplot the data. This looks OK too, but I still need to see more information.
  • 1:30PM Files are on my cluster.
    • Run matlab on hdf5 output files. Looks good.
    • Write down some information in my notebook about the run.
    • Visualize some of the data. All looks good.
    • Go to meetings.
  • 4:00PM Return from meetings.
    • Ssh into jaguar. Run gnuplot. Still looks good.
    • Ssh into seaborg. My job still isn’t running……
  • 8:00PM Are my jobs running?
    • ssh into jaguar. Run gnuplot. Still looks good.
    • Ssh into seaborg. Cool. My job is running. Run gnuplot. Looks good this time!
And Later
  • 4:00AM yawn… is my job on jaguar done?
    • ssh into jaguar. Cool, the job is finished. Start bbcp'ing files over to my work machine (2 TB of data).
  • 8:00AM @@!#!@. bbcp is having trouble. Resubmit some of my bbcp transfers from jaguar to my local cluster.
  • 8:00AM (next day). Oops, I still need to get the rest of my 200 GB of data over to my machine.
  • 3:00PM My data is finally here!
    • Run Matlab. Run EnSight. Oops… something's wrong!!! Where did that instability come from?
  • 6:00PM finish screaming!
Need metadata integrated into the high-performance I/O, and integrated for simulation monitoring.
  • Typical monitoring
    • Look at volume-averaged quantities.
    • At 4 key times this quantity looks good.
    • The code had 1 error which didn't appear in the typical ASCII output used to generate this graph.
    • Typically users run gnuplot/grace to monitor output.
  • More advanced monitoring
    • Every 5 seconds, move 600 MB and process the data.
    • Really need to use an FFT for 3D data, and then process data + particles (a small FFT sketch follows this list).
      • Every 50 seconds (10 time steps), move & process the data.
      • 8 GB for 1/100 of the 30 billion particles.
    • Demand low overhead <5%!
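
A toy sketch of the in-transit processing described above: once a monitoring dump has been moved off the compute nodes, FFT the 3D field and reduce it to a small spectrum that a plotting tool can track. The array size is a made-up stand-in, and numpy is assumed to be available.

# Illustrative in-transit monitoring step: FFT a 3D field slab and reduce it to a
# 1D profile that a dashboard or gnuplot/grace can plot each time step.
import numpy as np

field = np.random.rand(256, 256, 256)           # stand-in for one monitoring dump (~134 MB of doubles)

spectrum = np.abs(np.fft.fftn(field)) ** 2      # 3D power spectrum
k_profile = spectrum.mean(axis=(1, 2))          # collapse to a 1D profile for quick plotting

# The point: the monitored reduction is tiny compared to the raw field that was moved.
print(f"raw field: {field.nbytes/1e6:.0f} MB -> monitored profile: {k_profile.nbytes/1e3:.1f} KB")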
Parallel Data Analysis.
  • Most applications use scalar (serial) data analysis:
    • IDL
    • Matlab
    • NCAR Graphics
  • Need techniques such as PCA (a minimal sketch follows this list).
  • Need help, since data analysis code is written quickly and changed often… no hardened versions… maybe…
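
A minimal PCA sketch of the kind of technique meant above, using numpy's SVD; the data array is a synthetic stand-in for per-particle or per-gridpoint diagnostics.

# PCA via SVD: find the directions of largest variance in a diagnostics table.
import numpy as np

data = np.random.rand(100_000, 8)              # 100k samples x 8 diagnostic quantities

centered = data - data.mean(axis=0)            # PCA works on mean-centered data
# Economy-size SVD: rows of vt are the principal directions, s**2 the (unnormalized) variances
_, s, vt = np.linalg.svd(centered, full_matrices=False)

explained = (s ** 2) / np.sum(s ** 2)
print("variance explained by each component:", np.round(explained, 3))

# Project onto the first two components for plotting or feature tracking
reduced = centered @ vt[:2].T                  # shape (100000, 2)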
New Visualization Challenges.
  • Finding the needle in the haystack.
    • Feature identification/tracking! (A thresholding sketch follows this list.)
  • Analysis of 5D+time phase space (with 1×10¹² particles)!
  • Real-time visualization of codes during execution.
  • Debugging Visualization.
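
As a hedged illustration of feature identification, the sketch below thresholds a field and labels connected regions with scipy.ndimage (assumed available); a real detector would use physics-aware criteria, and tracking would match labels across time steps.

# Simple "needle in the haystack" pass: threshold, then connected-component labeling.
import numpy as np
from scipy import ndimage

field = np.random.rand(128, 128, 128)                  # stand-in for one 3D output

blobs = field > np.percentile(field, 99.9)             # keep only the most intense 0.1% of cells
labels, n_features = ndimage.label(blobs)              # connected-component labeling
sizes = ndimage.sum(blobs, labels, index=range(1, n_features + 1))

print(f"{n_features} candidate features; largest spans {int(sizes.max())} cells")
# Tracking would match labeled features between successive time steps (e.g., by overlap).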
Where is my data?
  • ORNL, NERSC, HPSS (NERSC, ORNL), local cluster, or laptop?
  • We need to keep track of multiple copies.
  • We need to query the data: query-based visualization methods (a minimal catalog sketch follows this list).
  • We don't want to distinguish between different disks/tapes.
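
One hypothetical way to answer "where is my data?" is a small replica catalog: a table mapping each logical dataset to its physical copies, queried without caring whether a copy lives on disk or tape. The site names and paths below are invented for illustration.

# Tiny replica catalog sketch using SQLite from the standard library.
import sqlite3

db = sqlite3.connect("replicas.db")
db.execute("""CREATE TABLE IF NOT EXISTS replicas (
                 dataset TEXT, site TEXT, path TEXT, kind TEXT)""")  # kind: disk / tape / laptop

db.executemany("INSERT INTO replicas VALUES (?, ?, ?, ?)", [
    ("gtc_run42_restart_0080", "ORNL",  "/lustre/scratch/run42/restart.0080.h5", "disk"),
    ("gtc_run42_restart_0080", "HPSS",  "/hpss/proj/fusion/run42/restart.0080.h5", "tape"),
    ("gtc_run42_analysis",     "local", "/data/run42/analysis.h5", "disk"),
])
db.commit()

# Query: every copy of a dataset, regardless of which disk or tape it sits on.
for site, path in db.execute(
        "SELECT site, path FROM replicas WHERE dataset = ?", ("gtc_run42_restart_0080",)):
    print(site, path)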