
“Beowulfery” – Cluster Computing Using GoldSim to Solve Embarrassingly Parallel Problems

Presented to: GoldSim Users Conference – 2007, October 25, 2007, San Francisco, CA. Presented by: Patrick D. Mattie, M.S., P.G., Senior Member of Technical Staff, Sandia National Laboratories.



Presentation Transcript


  1. “Beowulfery” – Cluster Computing Using GoldSim to Solve Embarrassingly Parallel Problems. Presented to: GoldSim Users Conference – 2007, October 25, 2007, San Francisco, CA. Presented by: Patrick D. Mattie, M.S., P.G., Senior Member of Technical Staff, Sandia National Laboratories. Contributions by: Stefan Knopf, GTG, and Randy Dockter, SNL-YMP. OFFICIAL USE ONLY

  2. Presentation Outline • Cluster Computing Defined • GoldSim and Beowulf? • ‘COTS’ Cluster Computing using GoldSim • GoldSim and E.T.? • Example Cluster • TSPA-Wulf • What is next? Pushing the limits…

  3. Background: What is Cluster Computing? What is a Beowulf Cluster?

  4. Cluster Computing Defined • What is a compute cluster? • “Cluster” is a widely used term for independent computers combined into a unified system through software and networking. At the most fundamental level, when two or more computers are used together to solve a problem, they are considered a cluster. • Clusters are typically used for High Availability (HA), for greater reliability, or High Performance Computing (HPC), to provide greater computational power than a single computer can provide.

  5. Beowulf Class Cluster • A Beowulf-class cluster is a simple design for high-performance computing clusters on inexpensive personal computer hardware • Originally developed in 1994 by Thomas Sterling and Donald Becker at NASA • Beowulf clusters • are scalable performance clusters • based on commodity hardware • require no custom hardware or software • A Beowulf cluster is constructed from commodity computer hardware (Dell, HP, IBM, etc.) and can be as simple as two networked computers sharing a file system on the same LAN or as complex as thousands of nodes with high-speed, low-latency interconnects (networking) • Common uses are traditional technical applications such as simulations, biotechnology, and petroleum; financial market modeling; data mining; and stream processing • http://www.beowulf.org

  6. Advantages of a Beowulf Class Cluster • Less computation time than running a serial process • COTS – ‘Commodity Off the Shelf’ • Doesn’t require a big budget • Doesn’t require a specialized skill set • Can be built using existing computer resources and Local Area Networks (LAN) • Can be constructed over different system configurations/brands/resources • Useful for solving embarrassingly parallel problems

  7. Why do I need a cluster? • An embarrassingly parallel problem is one for which no particular effort is needed to segment the problem into a very large number of parallel tasks, and there is no essential dependency (or communication) between those parallel tasks • A Monte Carlo simulation is an embarrassingly parallel problem • For example: a 100-realization simulation can be broken into 100 separate problems, each solved independently from the others • http://en.wikipedia.org/wiki/Embarrassingly_parallel

  8. Why do I need a cluster? • A 100-realization run takes 1 minute per realization • One computer (or core): ~1.7 hours • Four computers (or cores): 25 minutes • Ten computers (or cores): 10 minutes
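The arithmetic on this slide can be sketched as a quick check. This is a minimal illustration of the ideal speedup for an embarrassingly parallel run; the function name is hypothetical, not part of GoldSim:

```python
import math

def wall_time_minutes(realizations, minutes_per_realization, workers):
    """Ideal wall-clock time when realizations split evenly across workers,
    rounded up to whole realizations (the busiest worker sets the pace)."""
    return math.ceil(realizations / workers) * minutes_per_realization

# The slide's example: 100 realizations at 1 minute each.
print(wall_time_minutes(100, 1, 1))   # 100 minutes (~1.7 hours)
print(wall_time_minutes(100, 1, 4))   # 25 minutes
print(wall_time_minutes(100, 1, 10))  # 10 minutes
```

This ignores per-realization transfer overhead, which in practice keeps real speedup slightly below the ideal.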

  9. Cluster Computing Using GoldSimPro • GoldSim Distributed Processing Module • The Distributed Processing Module uses multiple copies of GoldSim running on multiple machines (and/or multiple processes within a single machine that has a multi-core CPU) • Grid computing: a Master process distributes realizations to Slave processes

  10. Cluster Computing – Distributed Processing • “Distributed” or “grid” computing is, in general, a special type of parallel computing that relies on complete computers (with onboard CPU, storage, power supply, network interface, etc.) connected to a network (private, public, or the internet) by a conventional network interface, such as Ethernet. Examples include: • SETI@home Project: http://setiathome.ssl.berkeley.edu/ – Analyzing radio telescope data in search of extraterrestrial intelligence

  11. Cluster Computing Using GoldSimPro There are two versions of the Distributed Processing Module: • GoldSim DP (comes with all versions of GoldSim) • GoldSim DP Plus (licensed separately)

  12. “Beowulfery” – YMP & GoldSim: A Cluster Computing Example

  13. TSPA-Wulf – Cluster Configuration • Windows Server 2003 and Windows 2000 Advanced Server (3 GB) • Network simulations (master–slave) • About 220 Intel Xeon 3.6 GHz dual-processor nodes with 8 GB RAM per machine, on a GigE LAN • 60 Intel Xeon 3.0 GHz dual-processor dual-core nodes with 16 GB RAM per machine, on a GigE LAN • One realization per slave CPU – after a slave CPU finishes one realization it accepts another from the master server • 680 processors available (plus 62 legacy processors) • 742 total
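The "one realization per slave CPU, accept another when done" scheme is a greedy work queue. A minimal simulation of that dispatch policy, assuming uneven realization times (all names here are illustrative, not GoldSim's):

```python
import heapq

def greedy_wall_time(realization_minutes, n_cpus):
    """Simulate the master handing one realization to each idle slave CPU:
    whenever a CPU finishes, it takes the next realization off the queue.
    Returns the wall-clock time of the whole run."""
    finish = [0.0] * n_cpus          # each CPU's current finish time
    heapq.heapify(finish)
    for minutes in realization_minutes:
        earliest = heapq.heappop(finish)   # next CPU to go idle
        heapq.heappush(finish, earliest + minutes)
    return max(finish)

# 10 realizations of uneven length on 4 CPUs:
times = [90, 30, 60, 90, 45, 30, 60, 90, 45, 30]
print(greedy_wall_time(times, 4))  # 180.0 minutes
```

Dynamic dispatch like this keeps slow realizations from idling the faster CPUs, which is why it beats a fixed up-front split when realization times vary.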

  14. [image slide]

  15. [image slide]

  16. Running the Model – Overview • File Server: storage area for the TSPA model file; controlled storage area for the Parameter Database, DLLs, and input files; storage area for completed TSPA cases • Master Computer: cases are run by GoldSim as a distributed process from a directory on the Master • Slave Computers: individual realizations are run by GoldSim processes on the Slaves

  17. Set-Up on the Master Computer (transfers occur over the LAN from storage areas on the File Server to a directory on the Master computer) • (1) Manually move the TSPA model file to the Master computer • (2) Set up the model file to run a specific case • (3a) Global download of parameter values from the Parameter Database (parameter values, links to DLLs, links to input files) to the model file • (3b) Global download transfers input files and DLLs to the Master computer • (4) Document changes: conceptual write-up, check list, version control file

  18. Running – Transfers to Slaves (transfers occur over the LAN from a directory on the Master server to the Slave computers: PA02, PA03, PA04 – each with Networked1 and Networked2 directories – plus 144 other slave computers) • (1) At the start of the distributed process: a “Networked” directory is created for each processor on each Slave computer; a GoldSim slave process is started for each processor on each Slave computer; the model file, DLLs, and input files are transferred • (2) Information (i.e., LHS sampling) for each realization is transferred to slave processes as they become available

  19. Running – Transfers from Slaves (transfers occur over the LAN from the Slave computers back to the directory on the Master computer) • (1) .gsr files (one per realization) are transferred as each realization is completed • (2) GoldSim loads the .gsr files into the model file when all realizations are completed
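The collection pattern on slides 18–19 can be sketched as follows: each slave process writes one .gsr file per completed realization into its own “Networked” directory, and the master gathers them all at the end. The directory and file names below are illustrative stand-ins, not the actual TSPA layout:

```python
import tempfile
from pathlib import Path

# Hypothetical layout: three slave machines, two "Networked" process
# directories each, one .gsr result file per completed realization.
root = Path(tempfile.mkdtemp())
for slave in ("PA02", "PA03", "PA04"):
    for proc in ("Networked1", "Networked2"):
        d = root / slave / proc
        d.mkdir(parents=True)
        (d / f"real_{slave}_{proc}.gsr").write_text("result payload")

# Master-side collection pass: sweep every slave directory for results
# and merge them into one in-memory map (GoldSim merges into the model file).
gsr_files = sorted(root.rglob("*.gsr"))
merged = {p.stem: p.read_text() for p in gsr_files}
print(len(merged))  # 6 results collected, one per (slave, process) here
```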

  20. TSPA Model Architecture • File size and count • 645 input files (approximately 5 GB in size) • 14 DLLs • GoldSim file with no results (pre-run) is about 200 MB in size • GoldSim file after a run is about 5 to 6 GB in size (compressed); however, there is no intrinsic limitation other than the slowness of file manipulation on a 32-bit operating system

  21. TSPA-Wulf Benchmarks • 1,000 realizations @ 90 minutes per realization • 62.5 days to run in serial mode • 120 processors would take ~12.5 hours (99% faster) • A typical 1,000,000-year, 1,000-realization run (about 470 time steps) requires 24 hours on 150 CPUs (75 dual-processor single-core nodes, 32-bit, 2.8–3.0 GHz)
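The benchmark figures above check out arithmetically; a quick verification (not part of the original slides):

```python
def serial_days(realizations, minutes_each):
    """Total run time in days if realizations execute one after another."""
    return realizations * minutes_each / 60 / 24

def parallel_hours(realizations, minutes_each, cpus):
    """Ideal wall-clock hours when realizations divide evenly across CPUs."""
    return realizations * minutes_each / cpus / 60

print(serial_days(1000, 90))          # 62.5 days
print(parallel_hours(1000, 90, 120))  # 12.5 hours
# 12.5 h / (62.5 d * 24 h/d) = 0.83% of the serial time, i.e. ~99% faster
```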

  22. What comes next?

  23. SNL/GoldSim HPCC R&D • GoldSim evolution/migration to Microsoft HPC • Migration from 32-bit to 64-bit architecture? • Optimize modeling system for Microsoft HPC • Combined SNL/Microsoft/GoldSim task • Link GoldSim with the Microsoft CCS scheduler tool to automatically queue jobs and prioritize or re-prioritize job resources ‘on the fly’ • Microsoft’s developers working with GoldSim • True parallel processing? • Using OpenMP to take advantage of multi-core CPUs • Optimize HPC software for a large compute cluster • Combined SNL/Microsoft task
