Presentation Transcript


  1. Bandwidth Challenge at SC03
  NNSA ASCI: LANL, LLNL, SNL
  NSF: NCSA, SDSC
  Cluster File Systems; Acmemicro, Supermicro, Intel, S2io; DataDirect Networks; Foundry Networks
  Thank you to Qwest and SCinet

  2. The Science
  • Computational science teams in the NCSA and SDSC Alliances are asking for a single directory space or filesystem across multiple sites.
  • ASC applications that run at all three DOE labs need a global namespace and a common filesystem.
  • A scalable global parallel filesystem deployment allows the creation today of multi-cluster capabilities up to and exceeding 100 TeraFlops.
  • Enables at least 10x scaling of today's biggest clusters with a single filesystem.

  3. Distributed Filesystem and Computing
  [Diagram: MPI communication linking OpenMP compute clusters at SNL, LANL, LLNL, SDSC, and NCSA, each with local storage, all sharing a global distributed scalable I/O layer.]
  Idea: Provide a consistent high-performance programming and storage model for multiple platform generations and across multiple geographic sites!
  Idea: Incrementally increase computing performance and storage capacity independently over distance and time!
  (A minimal sketch of this programming and I/O model follows this slide.)
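The following is an illustrative sketch in C, not code from the presentation: each MPI rank computes locally with OpenMP threads, then all ranks write their slices collectively into one file on a shared global filesystem mount. The path /global/lustre/demo.dat and the buffer size are hypothetical.

```c
/* Hybrid MPI + OpenMP compute with collective MPI-IO to one shared file. */
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

#define N 1048576  /* doubles written per rank */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *buf = malloc(N * sizeof(double));

    /* Local computation: OpenMP threads fill this rank's buffer. */
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        buf[i] = rank + i * 1e-6;

    /* Global I/O: every rank writes its slice of a single file that lives
     * on the shared filesystem (mount path is an assumption). */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "/global/lustre/demo.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_Offset off = (MPI_Offset)rank * N * sizeof(double);
    MPI_File_write_at_all(fh, off, buf, N, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    free(buf);
    MPI_Finalize();
    return 0;
}
```

Because every site mounts the same namespace, the same program and the same path work unchanged whether the job runs at NCSA, SDSC, or a DOE lab.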

  4. Our BWC Configuration
  [Diagram: Lustre clients in the ASCI booth at SC03 (16 dual 3.2 GHz P4s, 16x10GigE into SCinet, 2x10GigE to the SCinet bandwidth display) connect over 2x10Gig TeraGrid and Abilene links to SDSC (San Diego) and NCSA (UIUC), each hosting an MDS, Lustre OSTs, and DataDirect disks (NCSA: 32 dual 3.2 GHz P4s on 64x1GigE); further Lustre clients at CFS Canada and CFS UK are reached over the Internet.]
  (A sketch of striping a file across OSTs follows this slide.)
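A hedged sketch of how a client spreads one file's bandwidth across several OSTs. It assumes liblustreapi's llapi_file_create; the header name, stripe parameters, and the /global/lustre path are assumptions that vary by Lustre version, not details given in the presentation.

```c
/* Create a Lustre file striped over multiple OSTs, then use plain POSIX I/O. */
#include <lustre/lustreapi.h>  /* header name differs in older Lustre releases */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/global/lustre/striped.dat";  /* hypothetical mount */

    /* 1 MiB stripes, default starting OST (-1), spread over 8 OSTs,
     * default RAID0 layout (0). */
    if (llapi_file_create(path, 1 << 20, -1, 8, 0) != 0)
        return 1;

    /* Writes to the file now fan out across the 8 OSTs. */
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return 1;
    static char block[1 << 20];
    memset(block, 0xAB, sizeof block);
    for (int i = 0; i < 64; i++)          /* 64 MiB total */
        write(fd, block, sizeof block);
    close(fd);
    return 0;
}
```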

  5. Real World Global Scalable Storage
  [Diagram: LLNL's OCF SGS File System Cluster (OFC) in B439: 128 OST heads and 400-600 terabytes of FC RAID (146/73/36 GB drives on 2Gig FC), plus an OCF metadata cluster (MDS nodes), all tied into a federated Gigabit Ethernet backbone (copper and MM/SM fiber) together with NAS systems, an HPSS archive, PFTP, and the LLNL external backbone. Compute clusters attach through login and gateway nodes: MCR (1,152-port QsNet Elan3, 1,116 dual P4 compute nodes) and ALC (960-port QsNet Elan3, 924 dual P4 compute nodes) each use 2 login nodes and 32 gateway nodes at 190 MB/s, delivering Lustre I/O over 2x1GbE (4 Gb-Enet per gateway); Thunder (1,024-port QsNet Elan4, 1,008 4-way Itanium2 compute nodes, B451) uses 2 login nodes and 16 Itanium2 gateway nodes; PVC (128-port Elan3) has 52 dual P4 nodes, 6 dual P4 render nodes, and a display.]
  (A rough aggregate-bandwidth estimate follows this slide.)
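A rough estimate from the gateway figures above (an inference, not a number stated on the slide): with 32 gateway nodes delivering roughly 190 MB/s each, MCR or ALC can sustain about

    32 x 190 MB/s ≈ 6.1 GB/s

of aggregate Lustre I/O per cluster, which is why the gateway count, not any single node, sets the delivered filesystem bandwidth.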

  6. What we've done that's new vs. the BWC criteria
  • TCP Measurement: aggregate filesystem bandwidth of more than 3 GB/s.
  • IP Quality: tuned the Lustre NAL (network abstraction layer) for local and long-distance latencies.
  • Innovative Implementation: the first Lustre global namespace over the TeraGrid.
  • Real World Application: a prototype for real application use, modeled on the LLNL Lustre deployment; ASC applications that run at all three DOE labs benefit from local and remote shared file access, minimizing file movement and replication of data.
  • Geographic Distribution: the first high-performance tri-continental Lustre deployment.
  • Geographic Implementation: a single filesystem spanning North America and Europe (UK).
  • Improved Methods: a consistent filesystem interface across the network.
  • First Time Demonstration: Lustre at SC03, the biggest distributed filesystem.
