FAST-OS: Petascale Single System Image

  1. FAST-OS: Petascale Single System Image
  Jeffrey S. Vetter (PI), Computer Science and Mathematics, Computing & Computational Sciences Directorate
  Team members: Nikhil Bhatia, Collin McCurdy, Hong Ong, Phil Roth, WeiKuan Yu

  2. Petascale Single System Image
  • What?
    • Make a collection of standard hardware look like one big machine, in as many ways as feasible
  • Why?
    • Provide a single solution to address all forms of clustering
    • Simultaneously address availability, scalability, manageability, and usability
  • Components
    • Virtual cluster using contemporary hypervisor technology
    • Performance and scalability of a parallel shared root file system
    • Paging behavior of applications, i.e., reducing TLB misses
    • Reducing "noise" from operating systems at scale

  3. Virtual clusters using contemporary hypervisor technology
  • Goal: virtual cluster of domUs on multiple dom0s
  • Enable preemptive migration of OS images for reliability, maintenance, and load balancing
  • Provide virtual login node, virtual service node, and virtual compute node images (x86_64, FC5-based)
  • Initial experimentation on two x86_64 RHEL4 hosts
  • Collect per-node/per-domain high-level information about host and guest domains
  • Use a copy-on-write scheme at the NFS server
  • Unionfs-based root file system trees, one per virtual cluster node
  • Investigate OProfile for active profiling; performance data should help drive migration decisions
  • Investigate the functionality of xentop and the libvirt API for scalable management of domUs (see the sketch after this slide)
  • Develop a management console for manual and algorithmic control
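The per-domain monitoring and manual migration control described in slide 3 could be driven through the libvirt Python bindings roughly as follows. This is a minimal sketch, not the project's implementation: the dom0 host URIs, the example domU name, and the lack of error handling are assumptions made purely for illustration.

```python
# Hedged sketch: collect high-level per-domain info from a Xen dom0 and
# trigger a live migration of one domU via libvirt. Host names ("node01",
# "node02") and the domU name are illustrative assumptions.
import libvirt

SRC_URI = "xen://node01/"   # assumed source dom0
DST_URI = "xen://node02/"   # assumed destination dom0

def domain_snapshot(conn):
    """Gather state, memory, vCPU count, and CPU time for each running domU."""
    stats = {}
    for dom_id in conn.listDomainsID():
        dom = conn.lookupByID(dom_id)
        state, max_mem, mem, vcpus, cpu_time = dom.info()
        stats[dom.name()] = {"state": state, "mem_kb": mem,
                             "vcpus": vcpus, "cpu_time_ns": cpu_time}
    return stats

def migrate_domain(name, src, dst):
    """Preemptively live-migrate one domU between dom0s."""
    dom = src.lookupByName(name)
    dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)

if __name__ == "__main__":
    src = libvirt.open(SRC_URI)
    dst = libvirt.open(DST_URI)
    print(domain_snapshot(src))
    # migrate_domain("compute-node-03", src, dst)   # hypothetical domU name
```

In practice these per-domain statistics are the kind of input a management console could feed into algorithmic (load-balancing or reliability-driven) migration decisions.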

  4. High-speed migration of virtual clusters with InfiniBand (IB)
  Goal: reduce the impact of virtualization on high-performance computing application performance
  • High bandwidth, low latency
  • Effective migration of the OS
  Phase 1: Xen virtualization with IB
  • IBM and OSU have a prototype version functioning over IA32; it supports only IB verbs and IPoIB
  • Need to port to x86_64
  • Need support for other IB components
  Phase 2: Xen migration over IB, prototype under development (see the sketch after this slide)
  • Migration of IB-capable Xen domains over TCP
  • Snapshot and close IB domains before migration; reopen and reconnect after migration
  • Address issues in handling user-space IB information: protection keys; IB contexts, queue pair numbers, and LIDs
  • Need to coordinate static domains, too
  Phase 3: Migration of IB-capable Xen domains over IB (live migration)
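A rough sketch of the Phase 2 sequence (snapshot and close IB state, migrate the domain over TCP, then reopen and reconnect) is shown below. The ib_* helpers are hypothetical stubs standing in for the user-space IB bookkeeping the slide lists (protection keys, queue pair numbers, LIDs); only the libvirt migration call is a real API.

```python
# Hedged sketch of the Phase 2 flow from slide 4, assuming libvirt-managed
# Xen domains. The ib_* functions are hypothetical placeholders, not real APIs.
import libvirt

def ib_snapshot(dom):
    """Placeholder: record user-space IB state (protection keys, QP numbers, LIDs)."""
    return {}

def ib_teardown(dom):
    """Placeholder: close IB contexts and queue pairs before the move."""

def ib_reconnect(dom, state):
    """Placeholder: reopen IB contexts and re-establish queue pairs afterwards."""

def migrate_ib_domain(name, src_uri="xen://node01/", dst_uri="xen://node02/"):
    src, dst = libvirt.open(src_uri), libvirt.open(dst_uri)
    dom = src.lookupByName(name)

    ib_state = ib_snapshot(dom)   # snapshot user-space IB info before migration
    ib_teardown(dom)              # close IB connections so the domain can move over TCP

    # Phase 2: migrate the (now IB-quiescent) domain over TCP.
    dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)

    ib_reconnect(dst.lookupByName(name), ib_state)  # reopen and reconnect after migration
```

Phase 3 would replace the TCP transfer with a live migration carried over IB itself.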

  5. Paging behavior of DOE applications
  Goal: understand the paging behavior of applications on large-scale systems
  Trace-based memory subsystem simulator (see the sketch after this slide)
  • Performs dynamic tracing using Intel's Pin
  • For each data memory reference, the simulator
    • executes code that looks up the data in the caches/TLB/PTEs
    • estimates the latency of misses
    • produces an estimate of cycles spent in the memory subsystem
  • Simulation offers these advantages
    • Allows evaluation of nonexistent configurations:
      • What if paging were turned off?
      • What if the Opteron used the Xeon's smaller-L1-cache-for-larger-TLB tradeoff?
  Potential weakness: inaccuracy
  • Sequential vs. parallel memory accesses
  • Intel (simulator) vs. Portland Group (XT3) compiler
  • Different runtime libraries (MPI)
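The latency-estimation step can be illustrated with the toy model below: replay a stream of data addresses through direct-mapped TLB and L1 models and accumulate estimated cycles. The real simulator is a Pin tool; the sizes and latencies here are illustrative assumptions, not the actual Opteron or Xeon parameters.

```python
# Minimal sketch of trace-based memory subsystem simulation: for each data
# reference, look up the TLB and L1 and charge an assumed miss latency.
PAGE_SIZE   = 4096       # small pages; the "large page" variants would use 2 MB
TLB_ENTRIES = 64
LINE_SIZE   = 64
L1_LINES    = 512        # 32 KB of 64-byte lines (assumed)
TLB_MISS_CYCLES = 30     # assumed page-walk cost
L1_MISS_CYCLES  = 100    # assumed cost of going to memory

def simulate(addresses):
    tlb, l1 = {}, {}
    cycles = 0
    for addr in addresses:
        vpn = addr // PAGE_SIZE
        if tlb.get(vpn % TLB_ENTRIES) != vpn:      # direct-mapped TLB lookup
            tlb[vpn % TLB_ENTRIES] = vpn
            cycles += TLB_MISS_CYCLES
        line = addr // LINE_SIZE
        if l1.get(line % L1_LINES) != line:        # direct-mapped L1 lookup
            l1[line % L1_LINES] = line
            cycles += L1_MISS_CYCLES
    return cycles

# Example: a 16-byte strided walk over 1 MB touches a new page every 4 KB.
print(simulate(range(0, 1 << 20, 16)))
```

Turning paging "off" in such a model amounts to removing the TLB charge entirely, which is exactly the kind of nonexistent configuration simulation makes it possible to evaluate.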

  6. Applications and methodology
  Applications:
  • High-Performance Computing and Communications benchmarks
    • Sequential: STREAM, DGEMM, FFT, RandomAccess
    • Parallel: MPI_RandomAccess, HPL, FFT, PTRANS
  • Climate: POP, CAM, HYCOM
  • Fusion: GYRO, GTC
  • Molecular dynamics: AMBER, LAMMPS
  Methodology:
  • 16-process version of each benchmark
  • Unmeasured: initialization code and cache warm-up; measured: three time steps
  • Eight variants, with results normalized to HW-large (see the sketch after this list):
    • HW: large and small pages
    • Simulated Opteron: large, small, and no pages
    • Simulated Opteron/Xeon hybrid: large, small, and no pages
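The normalization step is simple arithmetic: each variant's memory-subsystem cycle count is divided by the HW-large baseline. The numbers in the snippet below are made up solely to show the calculation, not measured results.

```python
# Sketch of normalizing variant results to the HW-large baseline.
# Cycle counts here are placeholders, not data from the study.
cycles = {
    "HW-large": 1.00e9,
    "HW-small": 1.12e9,
    "sim-Opteron-large": 1.05e9,
    "sim-Opteron-small": 1.18e9,
    "sim-Opteron-nopaging": 0.97e9,
}
baseline = cycles["HW-large"]
normalized = {variant: count / baseline for variant, count in cycles.items()}
print(normalized)   # values relative to HW-large (1.0 = same cost as the baseline)
```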

  7. Example results: HYCOM
  Initial simulation results match measurements and illustrate the differences among page configurations.

  8. Parallel root file system
  Goals of the study
  • Use a parallel file system to implement a shared root environment
  • Evaluate the performance of the parallel root file system (see the sketch after this slide)
  • Understand root I/O access patterns and potential scaling limits
  Current status
  • RootFS implemented using NFSv4, PVFS2, and Lustre 1.4.6
  • RootFS distributed using a ramdisk implementation
    • Modified the mkinitrd program locally (PVFS2 and Lustre)
    • Modified init scripts to mount the root at boot time (PVFS2 and Lustre)
  • Configured compute nodes to run a small Linux kernel
  • Released two Lustre 1.4.6.4 patches for kernel 2.6.15
  • File system performance measured using IOR, b_eff_io, and NPB I/O
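The throughput measurements referenced above use IOR, b_eff_io, and NPB I/O; the snippet below is a hedged, IOR-style sketch using mpi4py, in which each rank writes and reads its own contiguous block of a shared file. The mount point, block size, and script name are assumptions for illustration only.

```python
# IOR-style aggregate read/write throughput sketch with MPI-IO (mpi4py).
# Run, for example, with: mpirun -np 16 python ior_sketch.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
BLOCK = 64 * 1024 * 1024                      # 64 MB per rank (assumed)
path = "/mnt/lustre/ior_sketch.dat"           # assumed parallel FS mount point

buf = np.full(BLOCK, rank, dtype=np.uint8)
fh = MPI.File.Open(comm, path, MPI.MODE_CREATE | MPI.MODE_RDWR)

comm.Barrier()
t0 = MPI.Wtime()
fh.Write_at(rank * BLOCK, buf)                # independent write at a fixed offset
comm.Barrier()
write_time = MPI.Wtime() - t0

rbuf = np.empty(BLOCK, dtype=np.uint8)
comm.Barrier()
t0 = MPI.Wtime()
fh.Read_at(rank * BLOCK, rbuf)                # independent read of the same block
comm.Barrier()
read_time = MPI.Wtime() - t0
fh.Close()

if rank == 0:
    total_mb = size * BLOCK / 2**20
    print(f"write {total_mb / write_time:.1f} MB/s, read {total_mb / read_time:.1f} MB/s")
```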

  9. Example results: IOR read/write throughput

  10. Contacts
  Jeffrey S. Vetter, Principal Investigator, Computer Science and Mathematics, Computing & Computational Sciences Directorate, (865) 356-1649, vetter@ornl.gov
  Hong Ong, (865) 241-1115, hongong@ornl.gov
  Nikhil Bhatia, (865) 241-1535, bhatia@ornl.gov
  Collin McCurdy, (865) 241-6433, cmccurdy@ornl.gov
  Phil Roth, (865) 241-1543, rothpc@ornl.gov
  WeiKuan Yu, (865) 574-7990, wyu@ornl.gov
  Ong_SSI_0611
