
HPC at PNNL, March 2004

R. Scott Studham, Associate Director, Advanced Computing

April 13, 2004



HPC Systems at PNNL

  • Molecular Science Computing Facility

    • 11.8 TF Linux-based supercomputer using Intel Itanium2 processors and an Elan4 interconnect

      • A balanced system for our users: 500 TB disk, 6.8 TB memory

  • PNNL Advanced Computing Center

    • 128 Processor SGI Altix

    • NNSA-ASC “Spray Cool” Cluster



William R. Wiley Environmental Molecular Sciences Laboratory

  • Who are we?

    • A 200,000 square-foot U.S. Department of Energy national scientific user facility

    • Operated by Pacific Northwest National Laboratory in Richland, Washington

  • What we provide for you

    • Free access to over 100 state-of-the-art research instruments

    • A peer-review proposal process

    • Expert staff to assist or collaborate

  • Why use EMSL?

    • EMSL provides, under one roof, staff and instruments for fundamental research on physical, chemical, and biological processes.




HPCS2 Configuration

  • 1,976 next-generation Itanium® processors in 928 compute nodes
  • 11.8 TF peak performance, 6.8 TB total memory
  • Elan4 and Elan3 interconnects
  • Lustre filesystem on a 2Gb SAN with 53TB of disk
  • 2 system management nodes and 4 login nodes with 4Gb-Enet

The 11.8TF system is in full operation now.
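
As a rough cross-check of the peak figure (a back-of-the-envelope sketch; the ~1.5 GHz clock and 4 floating-point operations per cycle per Itanium2 are assumptions, not values stated on the slide):

\[
R_{\text{peak}} \approx 1976 \times 4\ \tfrac{\text{flops}}{\text{cycle}} \times 1.5\ \text{GHz} \approx 11.9\ \text{TFLOP/s},
\qquad
\frac{6.8\ \text{TB}}{1976\ \text{processors}} \approx 3.4\ \text{GB per processor}.
\]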




Who uses the MSCF, and what do they run?

[Chart: breakdown of MSCF usage by code (e.g., Gaussian) and user; FY02 numbers]



MSCF is focused on grand challenges

Fewer users, focused on longer, larger runs and big science.

More than 67% of the usage is for large jobs.

Demand for access to this resource is high.




World-class science is enabled by systems that deliver the fastest time-to-solution for our users' science.

Significant improvement in time-to-solution (25-45% for moderate numbers of processors) from upgrading the interconnect to Elan4.

  • Improved efficiency

  • Improved scalability

    HPCS2 is a science-driven computer architecture with the fastest time-to-solution for our users' science of any system we have benchmarked.
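
Read as a reduction in runtime (the slide does not spell out how the 25-45% is defined, so this is an illustrative interpretation), the upgrade turns, say, a 10-hour job into roughly

\[
10\ \text{h} \times (1 - 0.25) = 7.5\ \text{h}
\quad\text{to}\quad
10\ \text{h} \times (1 - 0.45) = 5.5\ \text{h}.
\]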




Accurate binding energies for large water clusters

These results provide unique information on the transition from the cluster to the liquid and solid phases of water.

Code: NWChem

Kernel: MP2 (Disk Bound)

Sustained Performance: ~0.6 Gflop/s per processor (10% of peak)

Choke Point: sustained 61 GB/s of disk IO and 400 TB of scratch space used.

The run took only 5 hours on 1,024 CPUs of the HP cluster. This is a capability-class problem that could not be completed on any other system.
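
Putting the slide's numbers together (assuming the 61 GB/s and ~0.6 GFLOP/s figures were sustained over the full 5-hour run, which the slide does not state explicitly):

\[
1024 \times 0.6\ \text{GFLOP/s} \approx 0.6\ \text{TFLOP/s aggregate},
\qquad
61\ \tfrac{\text{GB}}{\text{s}} \times 5\ \text{h} \times 3600\ \tfrac{\text{s}}{\text{h}} \approx 1.1\ \text{PB of scratch IO}.
\]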




Energy calculation of a protein complex

The Ras-RasGAP protein complex is a key switch in the signaling network initiated by the epidermal growth factor (EGF). This signal network controls cell death and differentiation, and mutations in the protein complex are responsible for 30% of all human tumors.

Code: NWChem

Kernel: Hartree-Fock

Time to solution: ~3 hours for one iteration on 1,400 processors

Computation of 107 residues of the full protein complex using approximately 15,000 basis functions. This is believed to be the largest calculation of its type.
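
For a sense of scale (a general Hartree-Fock estimate, not a figure from the slide): a single N x N double-precision matrix at N ≈ 15,000 basis functions already occupies

\[
N^2 \times 8\ \text{bytes} \approx (1.5 \times 10^{4})^2 \times 8\ \text{bytes} \approx 1.8\ \text{GB},
\]

and the number of two-electron integrals grows as roughly \(N^4/8 \approx 6 \times 10^{15}\), which is why such a calculation has to be distributed over ~1,400 processors rather than handled on any single node.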



Biogeochemistry: Membranes for Bioremediation

Molecular dynamics of a lipopolysaccharide (LPS) membrane, across system generations:

  • HPCS1: classical molecular dynamics of the LPS membrane of Pseudomonas aeruginosa and mineral
  • HPCS2: quantum mechanical/molecular mechanics molecular dynamics of the membrane plus mineral
  • HPCS3




A new trend is emerging

The MSCF provides a synergy between computational scientists and experimentalists.

  • With the expansion into biology, the need for storage has drastically increased.

  • EMSL users have stored >50TB in the past 8 months. More than 80% of the data is from experimentalists.

[Chart: projected growth trend for biology data storage, plotted on a log scale]
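
For the average ingest rate implied by that figure (a simple linear average; the slide's point is that the actual trend is accelerating):

\[
\frac{50\ \text{TB}}{8\ \text{months}} \approx 6\ \tfrac{\text{TB}}{\text{month}} \approx 200\ \tfrac{\text{GB}}{\text{day}}.
\]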



Storage Drivers: We support three different domains with different requirements

  • High Performance Computing – Chemistry

    • Low storage volumes (10 TB)

    • High performance storage (>500MB/s per client, GB/s aggregate)

    • POSIX access

  • High Throughput Proteomics – Biology

    • Large and rapidly growing storage volumes (PBs)

    • Write once, read rarely if used as an archive

    • Modest latency okay (<10s to data)

    • If analysis could be done in place, it would require faster storage

  • Atmospheric Radiation Measurement - Climate

    • Modest-sized storage requirements (100s of TB)

    • Shared with the community and replicated to ORNL




PNNL's Lustre Implementation

  • PNNL and the ASCI Tri-Labs are currently working with CFS and HP to develop Lustre.

  • Lustre has been in full production since last August and is used for aggressive IO from our supercomputer.

    • Highly stable

    • Still hard to manage

  • We are expanding our use of Lustre to act as the filesystem for our archival storage.

    • Deploying a ~400TB filesystem

660MB/s from a single client with a simple “dd” is faster than any local or global filesystem we have tested.

We are finally in the era where global filesystems provide faster access than local ones.
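
A minimal sketch of how a dd-style single-client streaming-write number such as the 660 MB/s above might be measured from a client node; the mount point, file size, and block size are illustrative assumptions, not values from the slides.

import os, time

# Illustrative parameters (assumptions, not from the slides): a test file on a
# hypothetical Lustre mount, written in 1 MiB blocks for a total of 8 GiB;
# roughly what "dd if=/dev/zero of=<file> bs=1M count=8192" does.
path = "/lustre/scratch/throughput_test.dat"
block = b"\0" * (1 << 20)          # 1 MiB of zeros
blocks = 8192                      # 8 GiB total

fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
start = time.time()
for _ in range(blocks):
    os.write(fd, block)
os.fsync(fd)                       # include the time for data to reach the servers
elapsed = time.time() - start
os.close(fd)

mb = blocks * len(block) / 1e6
print(f"wrote {mb:.0f} MB in {elapsed:.1f} s -> {mb / elapsed:.0f} MB/s")

Client-side page caching can inflate such numbers unless the file is much larger than client memory or direct IO is used; the fsync above only partially accounts for that.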




Security

  • Open computing requires a trust relationship between sites.

  • A user logs into siteA and uses ssh to reach siteB. If siteA is compromised, the attacker has probably sniffed the password for siteB.

    • Reaction #1: Teach users to minimize jumping through hosts they do not personally know are secure (why did the user trust siteA?)

    • Reaction #2: Implement one-time passwords (SecureID)

    • Reaction #3: Turn off open access (Earth simulator?)



Thoughts about one-time passwords

A couple of different hurdles to cross:

  • We would like to avoid having to force our users to carry a different SecureID card for each site they have access to.

  • However, the distributed nature of security (it is governed by local site policy) will probably result in something like this for the short term.

    As of April 8th, the MSCF has converted to the PNNL SecureID system for all remote ssh logins.

    Lots of FedEx’ed SecureID cards




Summary

  • HPCS2 is running well and the IO capabilities of the system are enabling chemistry and biology calculations that could not be run on any other system in the world.

  • Storage for proteomics is on a super-exponential trend.

  • Lustre is great: 660 MB/s from a single client, and we are building a 1/2 PB single filesystem.

  • We rapidly implemented SecureID authentication methods last week.


